From patchwork Tue Dec 15 20:33:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15E20C4361B for ; Tue, 15 Dec 2020 20:33:25 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8E59922AAE for ; Tue, 15 Dec 2020 20:33:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8E59922AAE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1E63B6B006E; Tue, 15 Dec 2020 15:33:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1968C6B0070; Tue, 15 Dec 2020 15:33:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 085AA6B0071; Tue, 15 Dec 2020 15:33:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0160.hostedemail.com [216.40.44.160]) by kanga.kvack.org (Postfix) with ESMTP id E0F286B006E for ; Tue, 15 Dec 2020 15:33:23 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id A8FED8249980 for ; Tue, 15 Dec 2020 20:33:23 +0000 (UTC) X-FDA: 77596666686.03.help43_491301827426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id 8A01628A4E9 for ; Tue, 15 Dec 2020 20:33:23 +0000 (UTC) X-HE-Tag: help43_491301827426 X-Filterd-Recvd-Size: 9231 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:22 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:20 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064402; bh=GFplm+lU1SFITInB15APrDjZrs/kpaiI5eHDW8X8mh4=; h=From:To:Subject:In-Reply-To:From; b=T5NjQI1ngv/DCkjc2h2GsBFAaZIbblhpjJ70mMhJkGxT9XIhRkYJZ9PNdkdWJTkDi e/6y4HiCIocwDBof2RsztyuatBV3k4Gf2VlRAwSWjjacvSZXUh0KgBpdFPgocsEBz7 3oQkoZsnb2aOX2EZWtMZOUWJCY24KA28aZD4LkI8= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 01/19] mm/thp: move lru_add_page_tail() to huge_memory.c Message-ID: <20201215203320.p3Xky-7P6%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/thp: move lru_add_page_tail() to huge_memory.c Patch series "per memcg lru lock", v21. This patchset includes 3 parts: 1, some code cleanup and minimum optimization as a preparation. 2, use TestCleanPageLRU as page isolation's precondition. 3, replace per node lru_lock with per memcg per node lru_lock. Current lru_lock is one for each of node, pgdat->lru_lock, that guard for lru lists, but now we had moved the lru lists into memcg for long time. Still using per node lru_lock is clearly unscalable, pages on each of memcgs have to compete each others for a whole lru_lock. This patchset try to use per lruvec/memcg lru_lock to repleace per node lru lock to guard lru lists, make it scalable for memcgs and get performance gain. Currently lru_lock still guards both lru list and page's lru bit, that's ok. but if we want to use specific lruvec lock on the page, we need to pin down the page's lruvec/memcg during locking. Just taking lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, we could take out the page's lru bit clear and use it as pin down action to block the memcg changes. That's the reason for new atomic func TestClearPageLRU. So now isolating a page need both actions: TestClearPageLRU and hold the lru_lock. The typical usage of this is isolate_migratepages_block() in compaction.c we have to take lru bit before lru lock, that serialized the page isolation in memcg page charge/migration which will change page's lruvec and new lru_lock in it. The above solution suggested by Johannes Weiner, and based on his new memcg charge path, then have this patchset. (Hugh Dickins tested and contributed much code from compaction fix to general code polish, thanks a lot!). Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box on v18, which has no much different with this v20. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Thanks to Hugh Dickins and Konstantin Khlebnikov, they both brought this idea 8 years ago, and others who gave comments as well: Daniel Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc. Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu, and Yun Wang. Hugh Dickins also shared his kbuild-swap case. This patch (of 19): lru_add_page_tail() is only used in huge_memory.c, defining it in other file with a CONFIG_TRANSPARENT_HUGEPAGE macro restrict just looks weird. Let's move it THP. And make it static as Hugh Dickins suggested. Link: https://lkml.kernel.org/r/1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-2-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Matthew Wilcox Cc: Mel Gorman Cc: Tejun Heo Cc: Konstantin Khlebnikov Cc: Daniel Jordan Cc: Shakeel Butt Cc: Joonsoo Kim Cc: Wei Yang Cc: Alexander Duyck Cc: "Chen, Rong A" Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Huang, Ying" Cc: Jann Horn Cc: Kirill A. Shutemov Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/swap.h | 2 -- mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++ mm/swap.c | 33 --------------------------------- 3 files changed, 30 insertions(+), 35 deletions(-) --- a/include/linux/swap.h~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc +++ a/include/linux/swap.h @@ -338,8 +338,6 @@ extern void lru_note_cost(struct lruvec unsigned int nr_pages); extern void lru_note_cost_page(struct page *); extern void lru_cache_add(struct page *); -extern void lru_add_page_tail(struct page *page, struct page *page_tail, - struct lruvec *lruvec, struct list_head *head); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); extern void lru_add_drain_cpu(int cpu); --- a/mm/huge_memory.c~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc +++ a/mm/huge_memory.c @@ -2359,6 +2359,36 @@ static void remap_page(struct page *page } } +static void lru_add_page_tail(struct page *page, struct page *page_tail, + struct lruvec *lruvec, struct list_head *list) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(PageCompound(page_tail), page); + VM_BUG_ON_PAGE(PageLRU(page_tail), page); + lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + + if (!list) + SetPageLRU(page_tail); + + if (likely(PageLRU(page))) + list_add_tail(&page_tail->lru, &page->lru); + else if (list) { + /* page reclaim is reclaiming a huge page */ + get_page(page_tail); + list_add_tail(&page_tail->lru, list); + } else { + /* + * Head page has not yet been counted, as an hpage, + * so we must account for each subpage individually. + * + * Put page_tail on the list at the correct position + * so they all end up in order. + */ + add_page_to_lru_list_tail(page_tail, lruvec, + page_lru(page_tail)); + } +} + static void __split_huge_page_tail(struct page *head, int tail, struct lruvec *lruvec, struct list_head *list) { --- a/mm/swap.c~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc +++ a/mm/swap.c @@ -977,39 +977,6 @@ void __pagevec_release(struct pagevec *p } EXPORT_SYMBOL(__pagevec_release); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* used by __split_huge_page_refcount() */ -void lru_add_page_tail(struct page *page, struct page *page_tail, - struct lruvec *lruvec, struct list_head *list) -{ - VM_BUG_ON_PAGE(!PageHead(page), page); - VM_BUG_ON_PAGE(PageCompound(page_tail), page); - VM_BUG_ON_PAGE(PageLRU(page_tail), page); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); - - if (!list) - SetPageLRU(page_tail); - - if (likely(PageLRU(page))) - list_add_tail(&page_tail->lru, &page->lru); - else if (list) { - /* page reclaim is reclaiming a huge page */ - get_page(page_tail); - list_add_tail(&page_tail->lru, list); - } else { - /* - * Head page has not yet been counted, as an hpage, - * so we must account for each subpage individually. - * - * Put page_tail on the list at the correct position - * so they all end up in order. - */ - add_page_to_lru_list_tail(page_tail, lruvec, - page_lru(page_tail)); - } -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, void *arg) { From patchwork Tue Dec 15 20:33:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975719 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 942B6C4361B for ; Tue, 15 Dec 2020 20:33:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 343B422CAF for ; Tue, 15 Dec 2020 20:33:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 343B422CAF Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A23306B0071; Tue, 15 Dec 2020 15:33:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9D1CC6B0072; Tue, 15 Dec 2020 15:33:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 874EA6B0073; Tue, 15 Dec 2020 15:33:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0201.hostedemail.com [216.40.44.201]) by kanga.kvack.org (Postfix) with ESMTP id 677F66B0071 for ; Tue, 15 Dec 2020 15:33:28 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 288DC363C for ; Tue, 15 Dec 2020 20:33:28 +0000 (UTC) X-FDA: 77596666896.17.burst06_190a59927426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 0D10D180D0194 for ; Tue, 15 Dec 2020 20:33:28 +0000 (UTC) X-HE-Tag: burst06_190a59927426 X-Filterd-Recvd-Size: 5011 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:27 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:24 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064406; bh=supK10J6PyZPRKp1A8xh0qJcUSeFlUi3DibL24SyoOc=; h=From:To:Subject:In-Reply-To:From; b=GZ+rkgwfNpCanPzyq/JBZafL7zEvPGiVQQdcdJsVyM0jz6Dug1rht4H1ug0RRDwsK tyk5xD3wHzkl6G3RLzEKHs9KZSa5wBejDYHwyVE7Uqshz+wGWGNI0UEfZigXXK5Bkx IkHfy98wiB/GxsXmCOX07WBwNBLNJpu+L1whpxjE= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 02/19] mm/thp: use head for head page in lru_add_page_tail() Message-ID: <20201215203324.vyhdW4WXY%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/thp: use head for head page in lru_add_page_tail() Since the first parameter is only used by head page, it's better to make it explicit. Link: https://lkml.kernel.org/r/1604566549-62481-3-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Reviewed-by: Matthew Wilcox (Oracle) Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/huge_memory.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) --- a/mm/huge_memory.c~mm-thp-use-head-for-head-page-in-lru_add_page_tail +++ a/mm/huge_memory.c @@ -2359,33 +2359,32 @@ static void remap_page(struct page *page } } -static void lru_add_page_tail(struct page *page, struct page *page_tail, +static void lru_add_page_tail(struct page *head, struct page *tail, struct lruvec *lruvec, struct list_head *list) { - VM_BUG_ON_PAGE(!PageHead(page), page); - VM_BUG_ON_PAGE(PageCompound(page_tail), page); - VM_BUG_ON_PAGE(PageLRU(page_tail), page); + VM_BUG_ON_PAGE(!PageHead(head), head); + VM_BUG_ON_PAGE(PageCompound(tail), head); + VM_BUG_ON_PAGE(PageLRU(tail), head); lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); if (!list) - SetPageLRU(page_tail); + SetPageLRU(tail); - if (likely(PageLRU(page))) - list_add_tail(&page_tail->lru, &page->lru); + if (likely(PageLRU(head))) + list_add_tail(&tail->lru, &head->lru); else if (list) { /* page reclaim is reclaiming a huge page */ - get_page(page_tail); - list_add_tail(&page_tail->lru, list); + get_page(tail); + list_add_tail(&tail->lru, list); } else { /* * Head page has not yet been counted, as an hpage, * so we must account for each subpage individually. * - * Put page_tail on the list at the correct position + * Put tail on the list at the correct position * so they all end up in order. */ - add_page_to_lru_list_tail(page_tail, lruvec, - page_lru(page_tail)); + add_page_to_lru_list_tail(tail, lruvec, page_lru(tail)); } } From patchwork Tue Dec 15 20:33:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975721 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00924C4361B for ; Tue, 15 Dec 2020 20:33:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 88E8B22CB1 for ; Tue, 15 Dec 2020 20:33:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88E8B22CB1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DD0496B0073; Tue, 15 Dec 2020 15:33:32 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D558D6B0074; Tue, 15 Dec 2020 15:33:32 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF7406B0075; Tue, 15 Dec 2020 15:33:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0165.hostedemail.com [216.40.44.165]) by kanga.kvack.org (Postfix) with ESMTP id A33E06B0073 for ; Tue, 15 Dec 2020 15:33:32 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 72B931E09 for ; Tue, 15 Dec 2020 20:33:32 +0000 (UTC) X-FDA: 77596667064.29.cork31_491752327426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 4129E180882B2 for ; Tue, 15 Dec 2020 20:33:32 +0000 (UTC) X-HE-Tag: cork31_491752327426 X-Filterd-Recvd-Size: 4811 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:31 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:29 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064410; bh=3U6xznw5k4S17kSCr9BgUv6ipQkATNVw1nVtIV16FHM=; h=From:To:Subject:In-Reply-To:From; b=01jblC9GQ03Io/ENuOkkyubNDUPWmEoG4cywWWKDTFb2bzJO/IfJbLBdCcr6Fu7FL BbSW+DrGTE9npo1bCtOM8LSSEHOngocIoCTXDua6AV/CQVzucTCktLEnzC7+Xd8Zpw KvRkJSMErilqTxKS7vcUxQ6GrPG9UE9Zf8FdC9NA= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 03/19] mm/thp: simplify lru_add_page_tail() Message-ID: <20201215203329.o2h4X80q8%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/thp: simplify lru_add_page_tail() Simplify lru_add_page_tail(), there are actually only two cases possible: split_huge_page_to_list(), with list supplied and head isolated from lru by its caller; or split_huge_page(), with NULL list and head on lru - because when head is racily isolated from lru, the isolator's reference will stop the split from getting any further than its page_ref_freeze(). So decide between the two cases by "list", but add VM_WARN_ON()s to verify that they match our lru expectations. [Hugh Dickins: rewrite commit log] Link: https://lkml.kernel.org/r/1604566549-62481-4-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Mika Penttilä Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/huge_memory.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) --- a/mm/huge_memory.c~mm-thp-simplify-lru_add_page_tail +++ a/mm/huge_memory.c @@ -2367,24 +2367,16 @@ static void lru_add_page_tail(struct pag VM_BUG_ON_PAGE(PageLRU(tail), head); lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); - if (!list) - SetPageLRU(tail); - - if (likely(PageLRU(head))) - list_add_tail(&tail->lru, &head->lru); - else if (list) { + if (list) { /* page reclaim is reclaiming a huge page */ + VM_WARN_ON(PageLRU(head)); get_page(tail); list_add_tail(&tail->lru, list); } else { - /* - * Head page has not yet been counted, as an hpage, - * so we must account for each subpage individually. - * - * Put tail on the list at the correct position - * so they all end up in order. - */ - add_page_to_lru_list_tail(tail, lruvec, page_lru(tail)); + /* head is still on lru (and we have it frozen) */ + VM_WARN_ON(!PageLRU(head)); + SetPageLRU(tail); + list_add_tail(&tail->lru, &head->lru); } } From patchwork Tue Dec 15 20:33:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975723 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3877BC4361B for ; Tue, 15 Dec 2020 20:33:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CEE5D22CB1 for ; Tue, 15 Dec 2020 20:33:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CEE5D22CB1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 62CF46B0075; Tue, 15 Dec 2020 15:33:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5B3D96B0078; Tue, 15 Dec 2020 15:33:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 47CF66B007B; Tue, 15 Dec 2020 15:33:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0033.hostedemail.com [216.40.44.33]) by kanga.kvack.org (Postfix) with ESMTP id 2C13E6B0075 for ; Tue, 15 Dec 2020 15:33:37 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D3378246A for ; Tue, 15 Dec 2020 20:33:36 +0000 (UTC) X-FDA: 77596667232.24.fear74_440c20027426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id A44ED1A4A0 for ; Tue, 15 Dec 2020 20:33:36 +0000 (UTC) X-HE-Tag: fear74_440c20027426 X-Filterd-Recvd-Size: 7731 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:35 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:33 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064415; bh=igu9+jhjsJ6nopit9khbzJNO1kBtIiPPb8A34akUTv4=; h=From:To:Subject:In-Reply-To:From; b=ANEjZUHUsNk0zRYjfw5d8vJ2Q4wLrz+xnc+7JWt4f7OR4EoW/zdCrtK+yDkv1J06f zutG3yjx2uAwIE/jFw66Ko35qq8VGz1LzXJG6LpKAmLJ0AT6FD3Ja2Qo3m14o5X55q A5rVuARIoQWqlBV2ZSKvtvOSKgvhaY9X+v/jPcTs= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 04/19] mm/thp: narrow lru locking Message-ID: <20201215203333.hyZtikIQM%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/thp: narrow lru locking lru_lock and page cache xa_lock have no obvious reason to be taken one way round or the other: until now, lru_lock has been taken before page cache xa_lock, when splitting a THP; but nothing else takes them together. Reverse that ordering: let's narrow the lru locking - but leave local_irq_disable to block interrupts throughout, like before. Hugh Dickins point: split_huge_page_to_list() was already silly, to be using the _irqsave variant: it's just been taking sleeping locks, so would already be broken if entered with interrupts enabled. So we can save passing flags argument down to __split_huge_page(). Why change the lock ordering here? That was hard to decide. One reason: when this series reaches per-memcg lru locking, it relies on the THP's memcg to be stable when taking the lru_lock: that is now done after the THP's refcount has been frozen, which ensures page memcg cannot change. Another reason: previously, lock_page_memcg()'s move_lock was presumed to nest inside lru_lock; but now lru_lock must nest inside (page cache lock inside) move_lock, so it becomes possible to use lock_page_memcg() to stabilize page memcg before taking its lru_lock. That is not the mechanism used in this series, but it is an option we want to keep open. [hughd@google.com: rewrite commit log] Link: https://lkml.kernel.org/r/1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Cc: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Alexander Duyck Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/huge_memory.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) --- a/mm/huge_memory.c~mm-thp-narrow-lru-locking +++ a/mm/huge_memory.c @@ -2446,7 +2446,7 @@ static void __split_huge_page_tail(struc } static void __split_huge_page(struct page *page, struct list_head *list, - pgoff_t end, unsigned long flags) + pgoff_t end) { struct page *head = compound_head(page); pg_data_t *pgdat = page_pgdat(head); @@ -2456,8 +2456,6 @@ static void __split_huge_page(struct pag unsigned int nr = thp_nr_pages(head); int i; - lruvec = mem_cgroup_page_lruvec(head, pgdat); - /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head); @@ -2469,6 +2467,11 @@ static void __split_huge_page(struct pag xa_lock(&swap_cache->i_pages); } + /* prevent PageLRU to go away from under us, and freeze lru stats */ + spin_lock(&pgdat->lru_lock); + + lruvec = mem_cgroup_page_lruvec(head, pgdat); + for (i = nr - 1; i >= 1; i--) { __split_huge_page_tail(head, i, lruvec, list); /* Some pages can be beyond i_size: drop them from page cache */ @@ -2488,6 +2491,8 @@ static void __split_huge_page(struct pag } ClearPageCompound(head); + spin_unlock(&pgdat->lru_lock); + /* Caller disabled irqs, so they are still disabled here */ split_page_owner(head, nr); @@ -2505,8 +2510,7 @@ static void __split_huge_page(struct pag page_ref_add(head, 2); xa_unlock(&head->mapping->i_pages); } - - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + local_irq_enable(); remap_page(head, nr); @@ -2652,12 +2656,10 @@ bool can_split_huge_page(struct page *pa int split_huge_page_to_list(struct page *page, struct list_head *list) { struct page *head = compound_head(page); - struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); struct deferred_split *ds_queue = get_deferred_split_queue(head); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; int count, mapcount, extra_pins, ret; - unsigned long flags; pgoff_t end; VM_BUG_ON_PAGE(is_huge_zero_page(head), head); @@ -2718,9 +2720,8 @@ int split_huge_page_to_list(struct page unmap_page(head); VM_BUG_ON_PAGE(compound_mapcount(head), head); - /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock_irqsave(&pgdata->lru_lock, flags); - + /* block interrupt reentry in xa_lock and spinlock */ + local_irq_disable(); if (mapping) { XA_STATE(xas, &mapping->i_pages, page_index(head)); @@ -2750,7 +2751,7 @@ int split_huge_page_to_list(struct page __dec_lruvec_page_state(head, NR_FILE_THPS); } - __split_huge_page(page, list, end, flags); + __split_huge_page(page, list, end); ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { @@ -2764,7 +2765,7 @@ int split_huge_page_to_list(struct page spin_unlock(&ds_queue->split_queue_lock); fail: if (mapping) xa_unlock(&mapping->i_pages); - spin_unlock_irqrestore(&pgdata->lru_lock, flags); + local_irq_enable(); remap_page(head, thp_nr_pages(head)); ret = -EBUSY; } From patchwork Tue Dec 15 20:33:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9F7AC4361B for ; Tue, 15 Dec 2020 20:33:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3A4B822CB8 for ; Tue, 15 Dec 2020 20:33:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3A4B822CB8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B5DE36B007B; Tue, 15 Dec 2020 15:33:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AE4C36B007D; Tue, 15 Dec 2020 15:33:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 986396B007E; Tue, 15 Dec 2020 15:33:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0227.hostedemail.com [216.40.44.227]) by kanga.kvack.org (Postfix) with ESMTP id 7BDA46B007B for ; Tue, 15 Dec 2020 15:33:42 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2C302181AEF30 for ; Tue, 15 Dec 2020 20:33:42 +0000 (UTC) X-FDA: 77596667484.10.brass42_3b0f38527426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 03EFD16A044 for ; Tue, 15 Dec 2020 20:33:41 +0000 (UTC) X-HE-Tag: brass42_3b0f38527426 X-Filterd-Recvd-Size: 5966 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:40 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:37 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064419; bh=yVxFEvuUVn3jmENrHzY0uWR+ELpuL33DI8lqyXbACNI=; h=From:To:Subject:In-Reply-To:From; b=To0UlzHeEHFxW90ysMyK0blDqvptIN0OpoQOaeeD+u7KyG/+UMA+6iv3AfNHvYAGM dgMwd6hy9lIPKJ1yc2naEr5m3CWbpDz9NrZEUFU2FPAY7GpAyVw5h4DQEXoNMWS7J6 wQbrFVEi7DZwjb1yt9zp3ySvsJxgI7TLRbPs5HzM= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 05/19] mm/vmscan: remove unnecessary lruvec adding Message-ID: <20201215203337.vwzTlQ4OE%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/vmscan: remove unnecessary lruvec adding We don't have to add a freeable page into lru and then remove from it. This change saves a couple of actions and makes the moving more clear. The SetPageLRU needs to be kept before put_page_testzero for list integrity, otherwise: #0 move_pages_to_lru #1 release_pages if !put_page_testzero if (put_page_testzero()) !PageLRU //skip lru_lock SetPageLRU() list_add(&page->lru,) list_add(&page->lru,) [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/1604566549-62481-6-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Tejun Heo Cc: Matthew Wilcox Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/vmscan.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) --- a/mm/vmscan.c~mm-vmscan-remove-unnecessary-lruvec-adding +++ a/mm/vmscan.c @@ -1851,26 +1851,30 @@ static unsigned noinline_for_stack move_ while (!list_empty(list)) { page = lru_to_page(list); VM_BUG_ON_PAGE(PageLRU(page), page); + list_del(&page->lru); if (unlikely(!page_evictable(page))) { - list_del(&page->lru); spin_unlock_irq(&pgdat->lru_lock); putback_lru_page(page); spin_lock_irq(&pgdat->lru_lock); continue; } - lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* + * The SetPageLRU needs to be kept here for list integrity. + * Otherwise: + * #0 move_pages_to_lru #1 release_pages + * if !put_page_testzero + * if (put_page_testzero()) + * !PageLRU //skip lru_lock + * SetPageLRU() + * list_add(&page->lru,) + * list_add(&page->lru,) + */ SetPageLRU(page); - lru = page_lru(page); - nr_pages = thp_nr_pages(page); - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); - list_move(&page->lru, &lruvec->lists[lru]); - - if (put_page_testzero(page)) { + if (unlikely(put_page_testzero(page))) { __ClearPageLRU(page); __ClearPageActive(page); - del_page_from_lru_list(page, lruvec, lru); if (unlikely(PageCompound(page))) { spin_unlock_irq(&pgdat->lru_lock); @@ -1878,11 +1882,19 @@ static unsigned noinline_for_stack move_ spin_lock_irq(&pgdat->lru_lock); } else list_add(&page->lru, &pages_to_free); - } else { - nr_moved += nr_pages; - if (PageActive(page)) - workingset_age_nonresident(lruvec, nr_pages); + + continue; } + + lruvec = mem_cgroup_page_lruvec(page, pgdat); + lru = page_lru(page); + nr_pages = thp_nr_pages(page); + + update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); + list_add(&page->lru, &lruvec->lists[lru]); + nr_moved += nr_pages; + if (PageActive(page)) + workingset_age_nonresident(lruvec, nr_pages); } /* From patchwork Tue Dec 15 20:33:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18EAAC4361B for ; Tue, 15 Dec 2020 20:33:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C73D222CAE for ; Tue, 15 Dec 2020 20:33:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C73D222CAE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6072E6B0080; Tue, 15 Dec 2020 15:33:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 59FF36B0082; Tue, 15 Dec 2020 15:33:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 40A996B0081; Tue, 15 Dec 2020 15:33:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id 20C306B007E for ; Tue, 15 Dec 2020 15:33:46 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E19208249980 for ; Tue, 15 Dec 2020 20:33:45 +0000 (UTC) X-FDA: 77596667610.13.spade31_1d17e6e27426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id BC3AC18140B70 for ; Tue, 15 Dec 2020 20:33:45 +0000 (UTC) X-HE-Tag: spade31_1d17e6e27426 X-Filterd-Recvd-Size: 5882 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:45 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:42 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064424; bh=nBG3JTxx41P3NkdiTpJL/2E1ZKUvgP0xDWYDNugdXGA=; h=From:To:Subject:In-Reply-To:From; b=ML92VQmQxblPj/4XWaGMQaZmAjB8Mi9TmjcHTftEIgcQNaIXYmc+tKJUUlTa2q26Y xfKMBW9g4gAyhSv/QPZj1BDlgErNC0gawSaJyWWAR4Xk63BW1iz09SXgtiqRT1zEdP dkNmw20YpDI0GWydwpVwvn4zsajlKB0pFI6ChGB0= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 06/19] mm/rmap: stop store reordering issue on page->mapping Message-ID: <20201215203342.X_2LUvA2i%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/rmap: stop store reordering issue on page->mapping Hugh Dickins and Minchan Kim observed a long time issue which discussed here, but actully the mentioned fix missed. https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop/ The store reordering may cause problem in the scenario: CPU 0 CPU1 do_anonymous_page page_add_new_anon_rmap() page->mapping = anon_vma + PAGE_MAPPING_ANON lru_cache_add_inactive_or_unevictable() spin_lock(lruvec->lock) SetPageLRU() spin_unlock(lruvec->lock) /* idletacking judged it as LRU * page so pass the page in * page_idle_clear_pte_refs */ page_idle_clear_pte_refs rmap_walk if PageAnon(page) Johannes give detailed examples how the store reordering could cause a trouble: "The concern is the SetPageLRU may get reorder before 'page->mapping' setting, That would make CPU 1 will observe at page->mapping after observing PageLRU set on the page. 1. anon_vma + PAGE_MAPPING_ANON That's the in-order scenario and is fine. 2. NULL That's possible if the page->mapping store gets reordered to occur after SetPageLRU. That's fine too because we check for it. 3. anon_vma without the PAGE_MAPPING_ANON bit That would be a problem and could lead to all kinds of undesirable behavior including crashes and data corruption. Is it possible? AFAICT the compiler is allowed to tear the store to page->mapping and I don't see anything that would prevent it. That said, I also don't see how the reader testing PageLRU under the lru_lock would prevent that in the first place. AFAICT we need that WRITE_ONCE() around the page->mapping assignment." [alex.shi@linux.alibaba.com: updated for comments change from Johannes] Link: https://lkml.kernel.org/r/e66ef2e5-c74c-6498-e8b3-56c37b9d2d15@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-7-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Johannes Weiner Acked-by: Hugh Dickins Cc: Matthew Wilcox Cc: Minchan Kim Cc: Vladimir Davydov Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/rmap.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/mm/rmap.c~mm-rmap-stop-store-reordering-issue-on-page-mapping +++ a/mm/rmap.c @@ -1054,8 +1054,14 @@ static void __page_set_anon_rmap(struct if (!exclusive) anon_vma = anon_vma->root; + /* + * page_idle does a lockless/optimistic rmap scan on page->mapping. + * Make sure the compiler doesn't split the stores of anon_vma and + * the PAGE_MAPPING_ANON type identifier, otherwise the rmap code + * could mistake the mapping for a struct address_space and crash. + */ anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; - page->mapping = (struct address_space *) anon_vma; + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma); page->index = linear_page_index(vma, address); } From patchwork Tue Dec 15 20:33:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975729 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C383AC4361B for ; Tue, 15 Dec 2020 20:33:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6E53022CAE for ; Tue, 15 Dec 2020 20:33:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6E53022CAE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EB78E6B0081; Tue, 15 Dec 2020 15:33:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E3EEE6B0082; Tue, 15 Dec 2020 15:33:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CE2956B0083; Tue, 15 Dec 2020 15:33:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id B17EB6B0081 for ; Tue, 15 Dec 2020 15:33:50 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 79AD3181AEF2A for ; Tue, 15 Dec 2020 20:33:50 +0000 (UTC) X-FDA: 77596667820.20.skirt71_070ec9227426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 31D88180C0611 for ; Tue, 15 Dec 2020 20:33:50 +0000 (UTC) X-HE-Tag: skirt71_070ec9227426 X-Filterd-Recvd-Size: 4542 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:49 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:47 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064428; bh=PMreW2R4Vi4QzUU4z1kWWF1Hlk7nQ75NLwcVJaoR06g=; h=From:To:Subject:In-Reply-To:From; b=Fg6KExjlL7q5L3vlDeC5fca0+M6jzLbrO3itgGKR7adWdPVWO/cw/VOkKAdiWkFW9 VMhbe5msNv3CtTeXPhD6Zd8Bje9+f6BNjKLBWBf2rt5qu+r9Kb5J0lM5h3qKG3lfiI 970IJqDAX8TcRdWG4BQAC7DV1IzXo1RbwYumFqn8= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 07/19] mm: page_idle_get_page() does not need lru_lock Message-ID: <20201215203347.kUAcWnZn0%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm: page_idle_get_page() does not need lru_lock It is necessary for page_idle_get_page() to recheck PageLRU() after get_page_unless_zero(), but holding lru_lock around that serves no useful purpose, and adds to lru_lock contention: delete it. See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the discussion that led to lru_lock there; but __page_set_anon_rmap() now uses WRITE_ONCE(), and I see no other risk in page_idle_clear_pte_refs() using rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but not entirely prevented by page_count() check in ksm.c's write_protect_page(): that risk being shared with page_referenced() and not helped by lru_lock). Link: https://lkml.kernel.org/r/1604566549-62481-8-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Johannes Weiner Acked-by: "Huang, Ying" Acked-by: Vlastimil Babka Cc: Vladimir Davydov Cc: Minchan Kim Cc: Alex Shi Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/page_idle.c | 4 ---- 1 file changed, 4 deletions(-) --- a/mm/page_idle.c~mm-page_idle_get_page-does-not-need-lru_lock +++ a/mm/page_idle.c @@ -32,19 +32,15 @@ static struct page *page_idle_get_page(unsigned long pfn) { struct page *page = pfn_to_online_page(pfn); - pg_data_t *pgdat; if (!page || !PageLRU(page) || !get_page_unless_zero(page)) return NULL; - pgdat = page_pgdat(page); - spin_lock_irq(&pgdat->lru_lock); if (unlikely(!PageLRU(page))) { put_page(page); page = NULL; } - spin_unlock_irq(&pgdat->lru_lock); return page; } From patchwork Tue Dec 15 20:33:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B0E5C4361B for ; Tue, 15 Dec 2020 20:33:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D2E422CAE for ; Tue, 15 Dec 2020 20:33:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D2E422CAE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BAFE08D0002; Tue, 15 Dec 2020 15:33:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B32C78D0001; Tue, 15 Dec 2020 15:33:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9D3808D0002; Tue, 15 Dec 2020 15:33:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id 7DA8F8D0001 for ; Tue, 15 Dec 2020 15:33:54 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 494FB181AEF30 for ; Tue, 15 Dec 2020 20:33:54 +0000 (UTC) X-FDA: 77596667988.29.books52_3214e2a27426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 37DD118084BB7 for ; Tue, 15 Dec 2020 20:33:54 +0000 (UTC) X-HE-Tag: books52_3214e2a27426 X-Filterd-Recvd-Size: 3812 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:53 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:51 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064433; bh=YsmLonhKn5rdyprldCYnMKLk4XtNdulv/3zrJjVmsFY=; h=From:To:Subject:In-Reply-To:From; b=d8RT0mGx8La0aXDXcTCuDkk2C7V+SVXCkbrMWeTBfpUwq8eJL2TXanaSwaxTOcc0d ShxM7eIEkjcjAZf3tcOna0KTNO97+Yf634BqgjnRFxO2GPvcNiru0+W5nlmP0yQeNU ck/RjNrT+NQX5yqB1wg4zJfQJUvgYAEYFxWankIQ= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 08/19] mm/memcg: add debug checking in lock_page_memcg Message-ID: <20201215203351.k_hk2O5De%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/memcg: add debug checking in lock_page_memcg Add a debug checking in lock_page_memcg, then we could get alarm if anything wrong here. Link: https://lkml.kernel.org/r/1604566549-62481-9-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/memcontrol.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/mm/memcontrol.c~mm-memcg-add-debug-checking-in-lock_page_memcg +++ a/mm/memcontrol.c @@ -2150,6 +2150,12 @@ again: if (unlikely(!memcg)) return NULL; +#ifdef CONFIG_PROVE_LOCKING + local_irq_save(flags); + might_lock(&memcg->move_lock); + local_irq_restore(flags); +#endif + if (atomic_read(&memcg->moving_account) <= 0) return memcg; From patchwork Tue Dec 15 20:33:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975733 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4A339C4361B for ; Tue, 15 Dec 2020 20:34:00 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D87C022CB1 for ; Tue, 15 Dec 2020 20:33:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D87C022CB1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 78AE48D0005; Tue, 15 Dec 2020 15:33:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 73DE58D0003; Tue, 15 Dec 2020 15:33:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6035F8D0005; Tue, 15 Dec 2020 15:33:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0041.hostedemail.com [216.40.44.41]) by kanga.kvack.org (Postfix) with ESMTP id 441528D0003 for ; Tue, 15 Dec 2020 15:33:59 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 048E3180AD81D for ; Tue, 15 Dec 2020 20:33:59 +0000 (UTC) X-FDA: 77596668198.22.bone99_53068e727426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id D711C18038E6B for ; Tue, 15 Dec 2020 20:33:58 +0000 (UTC) X-HE-Tag: bone99_53068e727426 X-Filterd-Recvd-Size: 11173 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:33:58 +0000 (UTC) Date: Tue, 15 Dec 2020 12:33:56 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064437; bh=5CUM/uATWapAyCB8TIQbp1W2SKHL3XAPrKb0rHFeVcE=; h=From:To:Subject:In-Reply-To:From; b=IjXT2CR5I+SPqTUUjzfkpuBaXMxdjTiokAH5BG9q0g3oDH+JeRWUOSrIvTIU5mKzX C8HJzmxJKx1GnpZFCgZDEHAWJRzpGRN87ZQwJKbw23n/Hn9MwSCLy9fnfcVLEsqPHV AkmITeOrA7S7vQYoATpZ7+oGw55ek80O7qxQ0LLY= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Message-ID: <20201215203356.TnUWPfD4b%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Fold the PGROTATED event collection into pagevec_move_tail_fn call back func like other funcs does in pagevec_lru_move_fn. Thus we could save func call pagevec_move_tail(). Now all usage of pagevec_lru_move_fn are same and no needs of its 3rd parameter. It's just simply the calling. No functional change. [lkp@intel.com: found a build issue in the original patch, thanks] Link: https://lkml.kernel.org/r/1604566549-62481-10-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/swap.c | 65 ++++++++++++++++++---------------------------------- 1 file changed, 23 insertions(+), 42 deletions(-) --- a/mm/swap.c~mm-swapc-fold-vm-event-pgrotated-into-pagevec_move_tail_fn +++ a/mm/swap.c @@ -204,8 +204,7 @@ int get_kernel_page(unsigned long start, EXPORT_SYMBOL_GPL(get_kernel_page); static void pagevec_lru_move_fn(struct pagevec *pvec, - void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg), - void *arg) + void (*move_fn)(struct page *page, struct lruvec *lruvec)) { int i; struct pglist_data *pgdat = NULL; @@ -224,7 +223,7 @@ static void pagevec_lru_move_fn(struct p } lruvec = mem_cgroup_page_lruvec(page, pgdat); - (*move_fn)(page, lruvec, arg); + (*move_fn)(page, lruvec); } if (pgdat) spin_unlock_irqrestore(&pgdat->lru_lock, flags); @@ -232,35 +231,22 @@ static void pagevec_lru_move_fn(struct p pagevec_reinit(pvec); } -static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) { - int *pgmoved = arg; - if (PageLRU(page) && !PageUnevictable(page)) { del_page_from_lru_list(page, lruvec, page_lru(page)); ClearPageActive(page); add_page_to_lru_list_tail(page, lruvec, page_lru(page)); - (*pgmoved) += thp_nr_pages(page); + __count_vm_events(PGROTATED, thp_nr_pages(page)); } } /* - * pagevec_move_tail() must be called with IRQ disabled. - * Otherwise this may cause nasty races. - */ -static void pagevec_move_tail(struct pagevec *pvec) -{ - int pgmoved = 0; - - pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved); - __count_vm_events(PGROTATED, pgmoved); -} - -/* * Writeback is about to end against a page which has been marked for immediate * reclaim. If it still appears to be reclaimable, move it to the tail of the * inactive list. + * + * rotate_reclaimable_page() must disable IRQs, to prevent nasty races. */ void rotate_reclaimable_page(struct page *page) { @@ -273,7 +259,7 @@ void rotate_reclaimable_page(struct page local_lock_irqsave(&lru_rotate.lock, flags); pvec = this_cpu_ptr(&lru_rotate.pvec); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_move_tail(pvec); + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); local_unlock_irqrestore(&lru_rotate.lock, flags); } } @@ -315,8 +301,7 @@ void lru_note_cost_page(struct page *pag page_is_file_lru(page), thp_nr_pages(page)); } -static void __activate_page(struct page *page, struct lruvec *lruvec, - void *arg) +static void __activate_page(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); @@ -340,7 +325,7 @@ static void activate_page_drain(int cpu) struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, __activate_page, NULL); + pagevec_lru_move_fn(pvec, __activate_page); } static bool need_activate_page_drain(int cpu) @@ -358,7 +343,7 @@ static void activate_page(struct page *p pvec = this_cpu_ptr(&lru_pvecs.activate_page); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, __activate_page, NULL); + pagevec_lru_move_fn(pvec, __activate_page); local_unlock(&lru_pvecs.lock); } } @@ -374,7 +359,7 @@ static void activate_page(struct page *p page = compound_head(page); spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL); + __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); spin_unlock_irq(&pgdat->lru_lock); } #endif @@ -525,8 +510,7 @@ void lru_cache_add_inactive_or_unevictab * be write it out by flusher threads as this is much more effective * than the single-page writeout from reclaim. */ -static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) { int lru; bool active; @@ -573,8 +557,7 @@ static void lru_deactivate_file_fn(struc } } -static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); @@ -591,8 +574,7 @@ static void lru_deactivate_fn(struct pag } } -static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { @@ -636,21 +618,21 @@ void lru_add_drain_cpu(int cpu) /* No harm done if a racing interrupt already did this */ local_lock_irqsave(&lru_rotate.lock, flags); - pagevec_move_tail(pvec); + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); local_unlock_irqrestore(&lru_rotate.lock, flags); } pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_fn); pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); + pagevec_lru_move_fn(pvec, lru_lazyfree_fn); activate_page_drain(cpu); } @@ -679,7 +661,7 @@ void deactivate_file_page(struct page *p pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); local_unlock(&lru_pvecs.lock); } } @@ -701,7 +683,7 @@ void deactivate_page(struct page *page) pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_fn); local_unlock(&lru_pvecs.lock); } } @@ -723,7 +705,7 @@ void mark_page_lazyfree(struct page *pag pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); + pagevec_lru_move_fn(pvec, lru_lazyfree_fn); local_unlock(&lru_pvecs.lock); } } @@ -977,8 +959,7 @@ void __pagevec_release(struct pagevec *p } EXPORT_SYMBOL(__pagevec_release); -static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) { enum lru_list lru; int was_unevictable = TestClearPageUnevictable(page); @@ -1037,7 +1018,7 @@ static void __pagevec_lru_add_fn(struct */ void __pagevec_lru_add(struct pagevec *pvec) { - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL); + pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn); } /** From patchwork Tue Dec 15 22:20:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975873 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBA69C4361B for ; Tue, 15 Dec 2020 22:20:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 846DC22D49 for ; Tue, 15 Dec 2020 22:20:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 846DC22D49 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 27E768D000D; Tue, 15 Dec 2020 17:20:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 22CEF8D000C; Tue, 15 Dec 2020 17:20:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F4D28D000D; Tue, 15 Dec 2020 17:20:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id E63588D000C for ; Tue, 15 Dec 2020 17:20:53 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B62A3180AD81A for ; Tue, 15 Dec 2020 22:20:53 +0000 (UTC) X-FDA: 77596937586.23.quiet38_30000ae27427 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 99C1137606 for ; Tue, 15 Dec 2020 22:20:53 +0000 (UTC) X-HE-Tag: quiet38_30000ae27427 X-Filterd-Recvd-Size: 5525 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 22:20:53 +0000 (UTC) Date: Tue, 15 Dec 2020 14:20:50 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608070852; bh=gxvTtWnSad0FaLs7/+b3NpyvCIteFLuV4/DnD865n8E=; h=From:To:Subject:In-Reply-To:From; b=pt/JJLnsEDpsw+81ZXGeyAinLLWtYDvO4ZwjS4rE6lR+tOsqDp3NDDWnnhx1oDc2L Zq5lB9Ioj0Hzk6IsWsTc4jrQVMI9KAOanaOBDhKPhbR3wowgWa50in2OUUKjLm3pR+ +RxD3LxYuLv+PQkDht2ugWNST1SSj9eKsuBX32KM= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 10/19] mm/lru: move lock into lru_note_cost Message-ID: <20201215222050.WZsb-8EB6%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/lru: move lock into lru_note_cost We have to move lru_lock into lru_note_cost, since it cycle up on memcg tree, for future per lruvec lru_lock replace. It's a bit ugly and may cost a bit more locking, but benefit from multiple memcg locking could cover the lost. Link: https://lkml.kernel.org/r/1604566549-62481-11-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Johannes Weiner Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/swap.c | 3 +++ mm/vmscan.c | 4 +--- mm/workingset.c | 2 -- 3 files changed, 4 insertions(+), 5 deletions(-) --- a/mm/swap.c~mm-lru-move-lock-into-lru_note_cost +++ a/mm/swap.c @@ -268,7 +268,9 @@ void lru_note_cost(struct lruvec *lruvec { do { unsigned long lrusize; + struct pglist_data *pgdat = lruvec_pgdat(lruvec); + spin_lock_irq(&pgdat->lru_lock); /* Record cost event */ if (file) lruvec->file_cost += nr_pages; @@ -292,6 +294,7 @@ void lru_note_cost(struct lruvec *lruvec lruvec->file_cost /= 2; lruvec->anon_cost /= 2; } + spin_unlock_irq(&pgdat->lru_lock); } while ((lruvec = parent_lruvec(lruvec))); } --- a/mm/vmscan.c~mm-lru-move-lock-into-lru_note_cost +++ a/mm/vmscan.c @@ -1971,19 +1971,17 @@ shrink_inactive_list(unsigned long nr_to nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false); spin_lock_irq(&pgdat->lru_lock); - move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - lru_note_cost(lruvec, file, stat.nr_pageout); item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); - spin_unlock_irq(&pgdat->lru_lock); + lru_note_cost(lruvec, file, stat.nr_pageout); mem_cgroup_uncharge_list(&page_list); free_unref_page_list(&page_list); --- a/mm/workingset.c~mm-lru-move-lock-into-lru_note_cost +++ a/mm/workingset.c @@ -381,9 +381,7 @@ void workingset_refault(struct page *pag if (workingset) { SetPageWorkingset(page); /* XXX: Move to lru_cache_add() when it supports new vs putback */ - spin_lock_irq(&page_pgdat(page)->lru_lock); lru_note_cost_page(page); - spin_unlock_irq(&page_pgdat(page)->lru_lock); inc_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file); } out: From patchwork Tue Dec 15 20:34:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEE14C4361B for ; Tue, 15 Dec 2020 20:34:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 755CA22CB9 for ; Tue, 15 Dec 2020 20:34:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 755CA22CB9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F07C08D0007; Tue, 15 Dec 2020 15:34:06 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E8E9E8D0006; Tue, 15 Dec 2020 15:34:06 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D302D8D0007; Tue, 15 Dec 2020 15:34:06 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0062.hostedemail.com [216.40.44.62]) by kanga.kvack.org (Postfix) with ESMTP id B8A598D0006 for ; Tue, 15 Dec 2020 15:34:06 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 60B3B363C for ; Tue, 15 Dec 2020 20:34:06 +0000 (UTC) X-FDA: 77596668492.06.drink76_5610dd327426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 3766710044B09 for ; Tue, 15 Dec 2020 20:34:06 +0000 (UTC) X-HE-Tag: drink76_5610dd327426 X-Filterd-Recvd-Size: 4105 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:05 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:02 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064444; bh=j/mg/P1x2UmA5zgSLCrFTLSVkF4qHliuS8Vdn/oXcPk=; h=From:To:Subject:In-Reply-To:From; b=eG1iQJ6nJGY6T1psSxSaMlDqx9mzhMylC5XhJiT1rq0mB6nVP+VSVVqNqrprC81UA TkDV+hvExQIbAAfnpUYpyr+4ZB8dsca8OcG3vC2bcNyj3bpXy/1ZGpck2fnQ5+Ij2l 99OtugSKN0y4Ghli3atFrJ2+O8o7QS5vAw1kig68= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Message-ID: <20201215203402.8d5-PP6AL%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/vmscan: remove lruvec reget in move_pages_to_lru Isolated page shouldn't be recharged by memcg since the memcg migration isn't possible at the time. All pages were isolated from the same lruvec (and isolation inhibits memcg migration). So remove unnecessary regetting. Thanks to Alexander Duyck for pointing this out. Link: https://lkml.kernel.org/r/1604566549-62481-12-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Konstantin Khlebnikov Cc: Michal Hocko Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/vmscan.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/mm/vmscan.c~mm-vmscan-remove-lruvec-reget-in-move_pages_to_lru +++ a/mm/vmscan.c @@ -1886,7 +1886,12 @@ static unsigned noinline_for_stack move_ continue; } - lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* + * All pages were isolated from the same lruvec (and isolation + * inhibits memcg migration). + */ + VM_BUG_ON_PAGE(mem_cgroup_page_lruvec(page, page_pgdat(page)) + != lruvec, page); lru = page_lru(page); nr_pages = thp_nr_pages(page); From patchwork Tue Dec 15 20:34:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF7E9C4361B for ; Tue, 15 Dec 2020 20:34:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 71AC022CBB for ; Tue, 15 Dec 2020 20:34:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71AC022CBB Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 085DE8D0009; Tue, 15 Dec 2020 15:34:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 036798D0008; Tue, 15 Dec 2020 15:34:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E684E8D0009; Tue, 15 Dec 2020 15:34:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0087.hostedemail.com [216.40.44.87]) by kanga.kvack.org (Postfix) with ESMTP id CD3938D0008 for ; Tue, 15 Dec 2020 15:34:10 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 994A68249980 for ; Tue, 15 Dec 2020 20:34:10 +0000 (UTC) X-FDA: 77596668660.10.glue90_441672227426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 75E0716A044 for ; Tue, 15 Dec 2020 20:34:10 +0000 (UTC) X-HE-Tag: glue90_441672227426 X-Filterd-Recvd-Size: 5348 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:09 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:07 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064449; bh=x2mmbCdpPJf2IlvVW3PZy7aYbI/LxsQFYS4ckvi7cqo=; h=From:To:Subject:In-Reply-To:From; b=LqrzDGgWrcTV0MNUtv5qHnq1QC05+hR7q/PKbuYoiKkN+B9lt74BhO3Rd8hXtKP2M J8cjupsbYRJxnYkCwevbjIcWp1xxKmyGOPxpt+GKpRlf+3ApTJoGKmzY493enooI+P NgltWCSXxTreEi4JenxDjrWluyH2NO67GNxOmg7o= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Message-ID: <20201215203407.nGAUjeqv5%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/mlock: remove lru_lock on TestClearPageMlocked In the func munlock_vma_page, comments mentained lru_lock needed for serialization with split_huge_pages. But the page must be PageLocked as well as pages in split_huge_page series funcs. Thus the PageLocked is enough to serialize both funcs. Further more, Hugh Dickins pointed: before splitting in split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes which protect the page from munlock. Thus, no needs to guard __split_huge_page_tail for mlock clean, just keep the lru_lock there for isolation purpose. LKP found a preempt issue on __mod_zone_page_state which need change to mod_zone_page_state. Thanks! Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Kirill A. Shutemov Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/mlock.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-) --- a/mm/mlock.c~mm-mlock-remove-lru_lock-on-testclearpagemlocked +++ a/mm/mlock.c @@ -187,40 +187,24 @@ static void __munlock_isolation_failed(s unsigned int munlock_vma_page(struct page *page) { int nr_pages; - pg_data_t *pgdat = page_pgdat(page); /* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); - VM_BUG_ON_PAGE(PageTail(page), page); - /* - * Serialize with any parallel __split_huge_page_refcount() which - * might otherwise copy PageMlocked to part of the tail pages before - * we clear it in the head page. It also stabilizes thp_nr_pages(). - */ - spin_lock_irq(&pgdat->lru_lock); - if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ - nr_pages = 1; - goto unlock_out; + return 0; } nr_pages = thp_nr_pages(page); - __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); + mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(&pgdat->lru_lock); + if (!isolate_lru_page(page)) __munlock_isolated_page(page); - goto out; - } - __munlock_isolation_failed(page); - -unlock_out: - spin_unlock_irq(&pgdat->lru_lock); + else + __munlock_isolation_failed(page); -out: return nr_pages - 1; } From patchwork Tue Dec 15 20:34:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5098CC4361B for ; Tue, 15 Dec 2020 20:34:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F41EB22CE3 for ; Tue, 15 Dec 2020 20:34:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F41EB22CE3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 913518D000B; Tue, 15 Dec 2020 15:34:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8C0888D000A; Tue, 15 Dec 2020 15:34:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 73AFA8D000B; Tue, 15 Dec 2020 15:34:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0091.hostedemail.com [216.40.44.91]) by kanga.kvack.org (Postfix) with ESMTP id 59A108D000A for ; Tue, 15 Dec 2020 15:34:15 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 1C8E4180AD81D for ; Tue, 15 Dec 2020 20:34:15 +0000 (UTC) X-FDA: 77596668870.10.teeth66_440108427426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id ED2E816A044 for ; Tue, 15 Dec 2020 20:34:14 +0000 (UTC) X-HE-Tag: teeth66_440108427426 X-Filterd-Recvd-Size: 4757 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:14 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:11 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064453; bh=0G01U/Boh0CSobjjtVUxYKR8XvOKNNRLFrDaW1rRS7g=; h=From:To:Subject:In-Reply-To:From; b=DkbemUi73ZEY9tyaM6w2pIF+aHv/3bO5UzM9Wa2B0q2tlem6fZYoVJyRSfAHvmJwX q0gDy4UFf37QjTzcjAblhW4l/nQTV9UX/RxHcGO0RBcrRYfQdjsTVJNxhDuFRMk8A8 /7eJ8a5xzGSKrTdDy74EYS5Rl0HtmEEjWRkesg5c= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 13/19] mm/mlock: remove __munlock_isolate_lru_page() Message-ID: <20201215203411.t0XUy3NBv%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/mlock: remove __munlock_isolate_lru_page() __munlock_isolate_lru_page() only has one caller, remove it to clean up and simplify code. Link: https://lkml.kernel.org/r/1604566549-62481-14-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Kirill A. Shutemov Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/mlock.c | 31 +++++++++---------------------- 1 file changed, 9 insertions(+), 22 deletions(-) --- a/mm/mlock.c~mm-mlock-remove-__munlock_isolate_lru_page +++ a/mm/mlock.c @@ -106,26 +106,6 @@ void mlock_vma_page(struct page *page) } /* - * Isolate a page from LRU with optional get_page() pin. - * Assumes lru_lock already held and page already pinned. - */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) -{ - if (PageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (getpage) - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - return true; - } - - return false; -} - -/* * Finish munlock after successful page isolation * * Page must be locked. This is a wrapper for try_to_munlock() @@ -296,9 +276,16 @@ static void __munlock_pagevec(struct pag * We already have pin from follow_page_mask() * so we can spare the get_page() here. */ - if (__munlock_isolate_lru_page(page, false)) + if (PageLRU(page)) { + struct lruvec *lruvec; + + ClearPageLRU(page); + lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + del_page_from_lru_list(page, lruvec, + page_lru(page)); continue; - else + } else __munlock_isolation_failed(page); } else { delta_munlocked++; From patchwork Tue Dec 15 20:34:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4BBEC4361B for ; Tue, 15 Dec 2020 20:34:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7FFE222D00 for ; Tue, 15 Dec 2020 20:34:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7FFE222D00 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0F9E48D000D; Tue, 15 Dec 2020 15:34:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 084C28D000C; Tue, 15 Dec 2020 15:34:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8E5D8D000D; Tue, 15 Dec 2020 15:34:19 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id CE7E58D000C for ; Tue, 15 Dec 2020 15:34:19 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 988D18249980 for ; Tue, 15 Dec 2020 20:34:19 +0000 (UTC) X-FDA: 77596669038.08.horn09_1904aa727426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 77ADE1819E626 for ; Tue, 15 Dec 2020 20:34:19 +0000 (UTC) X-HE-Tag: horn09_1904aa727426 X-Filterd-Recvd-Size: 9213 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:18 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:16 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064458; bh=YNnPbEldHd+o/dDMLSqQFxUWG7aTHHGGeMozOZHkt0Y=; h=From:To:Subject:In-Reply-To:From; b=M+708Sx0eNFEwvDPoW2NCqyTlAB6NljhesgoCRFbuaEw0NP5+WGSMqM5KabDmqW4z Jc+1HSb/yatA68D11tmIdqfe1VpMb455w0P538jJRDnyPie88pr73wMQrADf4S9yxI ItkCXuyuZoC7nikT57uDvLez2qNRCvOdKhKNAmxg= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 14/19] mm/lru: introduce TestClearPageLRU() Message-ID: <20201215203416.oJg1652MT%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/lru: introduce TestClearPageLRU() Currently lru_lock still guards both lru list and page's lru bit, that's ok. but if we want to use specific lruvec lock on the page, we need to pin down the page's lruvec/memcg during locking. Just taking lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, we will clear the lru bit out of locking and use it as pin down action to block the page isolation in memcg changing. So now a standard steps of page isolation is following: 1, get_page(); #pin the page avoid to be free 2, TestClearPageLRU(); #block other isolation like memcg change 3, spin_lock on lru_lock; #serialize lru list access 4, delete page from lru list; This patch start with the first part: TestClearPageLRU, which combines PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This function will be used as page isolation precondition to prevent other isolations some where else. Then there are may !PageLRU page on lru list, need to remove BUG() checking accordingly. There 2 rules for lru bit now: 1, the lru bit still indicate if a page on lru list, just in some temporary moment(isolating), the page may have no lru bit when it's on lru list. but the page still must be on lru list when the lru bit set. 2, have to remove lru bit before delete it from lru list. As Andrew Morton mentioned this change would dirty cacheline for a page which isn't on the LRU. But the loss would be acceptable in Rong Chen report: https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/ Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Michal Hocko Cc: Vladimir Davydov Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 1 mm/mlock.c | 3 -- mm/vmscan.c | 39 +++++++++++++++++------------------ 3 files changed, 21 insertions(+), 22 deletions(-) --- a/include/linux/page-flags.h~mm-lru-introduce-testclearpagelru +++ a/include/linux/page-flags.h @@ -334,6 +334,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD) __CLEARPAGEFLAG(Dirty, dirty, PF_HEAD) PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD) + TESTCLEARFLAG(LRU, lru, PF_HEAD) PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD) TESTCLEARFLAG(Active, active, PF_HEAD) PAGEFLAG(Workingset, workingset, PF_HEAD) --- a/mm/mlock.c~mm-lru-introduce-testclearpagelru +++ a/mm/mlock.c @@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pag * We already have pin from follow_page_mask() * so we can spare the get_page() here. */ - if (PageLRU(page)) { + if (TestClearPageLRU(page)) { struct lruvec *lruvec; - ClearPageLRU(page); lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); del_page_from_lru_list(page, lruvec, --- a/mm/vmscan.c~mm-lru-introduce-testclearpagelru +++ a/mm/vmscan.c @@ -1541,7 +1541,7 @@ unsigned int reclaim_clean_pages_from_li */ int __isolate_lru_page(struct page *page, isolate_mode_t mode) { - int ret = -EINVAL; + int ret = -EBUSY; /* Only take pages on the LRU. */ if (!PageLRU(page)) @@ -1551,8 +1551,6 @@ int __isolate_lru_page(struct page *page if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) return ret; - ret = -EBUSY; - /* * To minimise LRU disruption, the caller can indicate that it only * wants to isolate pages it will be able to operate on without @@ -1599,8 +1597,10 @@ int __isolate_lru_page(struct page *page * sure the page is not being freed elsewhere -- the * page release code relies on it. */ - ClearPageLRU(page); - ret = 0; + if (TestClearPageLRU(page)) + ret = 0; + else + put_page(page); } return ret; @@ -1666,8 +1666,6 @@ static unsigned long isolate_lru_pages(u page = lru_to_page(src); prefetchw_prev_lru_page(page, src, flags); - VM_BUG_ON_PAGE(!PageLRU(page), page); - nr_pages = compound_nr(page); total_scan += nr_pages; @@ -1764,21 +1762,18 @@ int isolate_lru_page(struct page *page) VM_BUG_ON_PAGE(!page_count(page), page); WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); - if (PageLRU(page)) { + if (TestClearPageLRU(page)) { pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - spin_lock_irq(&pgdat->lru_lock); + get_page(page); lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (PageLRU(page)) { - int lru = page_lru(page); - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, lru); - ret = 0; - } + spin_lock_irq(&pgdat->lru_lock); + del_page_from_lru_list(page, lruvec, page_lru(page)); spin_unlock_irq(&pgdat->lru_lock); + ret = 0; } + return ret; } @@ -4289,6 +4284,10 @@ void check_move_unevictable_pages(struct nr_pages = thp_nr_pages(page); pgscanned += nr_pages; + /* block memcg migration during page moving between lru */ + if (!TestClearPageLRU(page)) + continue; + if (pagepgdat != pgdat) { if (pgdat) spin_unlock_irq(&pgdat->lru_lock); @@ -4297,10 +4296,7 @@ void check_move_unevictable_pages(struct } lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (!PageLRU(page) || !PageUnevictable(page)) - continue; - - if (page_evictable(page)) { + if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); VM_BUG_ON_PAGE(PageActive(page), page); @@ -4309,12 +4305,15 @@ void check_move_unevictable_pages(struct add_page_to_lru_list(page, lruvec, lru); pgrescued += nr_pages; } + SetPageLRU(page); } if (pgdat) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); spin_unlock_irq(&pgdat->lru_lock); + } else if (pgscanned) { + count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } } EXPORT_SYMBOL_GPL(check_move_unevictable_pages); From patchwork Tue Dec 15 20:34:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975743 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48D80C4361B for ; Tue, 15 Dec 2020 20:34:25 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D6B9322AAE for ; Tue, 15 Dec 2020 20:34:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6B9322AAE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7157C8D000F; Tue, 15 Dec 2020 15:34:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6C5578D000E; Tue, 15 Dec 2020 15:34:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5B4F68D000F; Tue, 15 Dec 2020 15:34:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id 424778D000E for ; Tue, 15 Dec 2020 15:34:24 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0D72F363C for ; Tue, 15 Dec 2020 20:34:24 +0000 (UTC) X-FDA: 77596669248.10.wrist46_2f163a227426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id E196E16A0AB for ; Tue, 15 Dec 2020 20:34:23 +0000 (UTC) X-HE-Tag: wrist46_2f163a227426 X-Filterd-Recvd-Size: 10267 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:23 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:20 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064462; bh=Nw7aksrDz7LrQPm+ighfoLv9YoD24ca39i1zcMsdOuo=; h=From:To:Subject:In-Reply-To:From; b=cQVSDXDDkNcYfbzByCNtJoAOBIdqam4Z4JHP2GajXpYNER3cfQbGNA4aKa20CqKcz cKNkG60ZrEMiRNOKQzOFf4o0j0VosS1Mwl+S3gKwCy2I0566yBpbnahQ31XfKfWt// +NA3JlEoC4WnT862V4sjhFOn3P5wJVAXXwZqwiLM= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 15/19] mm/compaction: do page isolation first in compaction Message-ID: <20201215203420.HrcA96Ocz%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/compaction: do page isolation first in compaction Currently, compaction would get the lru_lock and then do page isolation which works fine with pgdat->lru_lock, since any page isoltion would compete for the lru_lock. If we want to change to memcg lru_lock, we have to isolate the page before getting lru_lock, thus isoltion would block page's memcg change which relay on page isoltion too. Then we could safely use per memcg lru_lock later. The new page isolation use previous introduced TestClearPageLRU() + pgdat lru locking which will be changed to memcg lru lock later. Hugh Dickins fixed following bugs in this patch's early version: Fix lots of crashes under compaction load: isolate_migratepages_block() must clean up appropriately when rejecting a page, setting PageLRU again if it had been cleared; and a put_page() after get_page_unless_zero() cannot safely be done while holding locked_lruvec - it may turn out to be the final put_page(), which will take an lruvec lock when PageLRU. And move __isolate_lru_page_prepare back after get_page_unless_zero to make trylock_page() safe: trylock_page() is not safe to use at this time: its setting PG_locked can race with the page being freed or allocated ("Bad page"), and can also erase flags being set by one of those "sole owners" of a freshly allocated page who use non-atomic __SetPageFlag(). Link: https://lkml.kernel.org/r/1604566549-62481-16-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Matthew Wilcox Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/swap.h | 2 - mm/compaction.c | 42 +++++++++++++++++++++++++++++++--------- mm/vmscan.c | 43 ++++++++++++++++++++--------------------- 3 files changed, 56 insertions(+), 31 deletions(-) --- a/include/linux/swap.h~mm-compaction-do-page-isolation-first-in-compaction +++ a/include/linux/swap.h @@ -356,7 +356,7 @@ extern void lru_cache_add_inactive_or_un extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode); +extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, --- a/mm/compaction.c~mm-compaction-do-page-isolation-first-in-compaction +++ a/mm/compaction.c @@ -890,6 +890,7 @@ isolate_migratepages_block(struct compac if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) { if (!cc->ignore_skip_hint && get_pageblock_skip(page)) { low_pfn = end_pfn; + page = NULL; goto isolate_abort; } valid_page = page; @@ -971,6 +972,21 @@ isolate_migratepages_block(struct compac if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) goto isolate_fail; + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto isolate_fail; + + if (__isolate_lru_page_prepare(page, isolate_mode) != 0) + goto isolate_fail_put; + + /* Try isolate the page */ + if (!TestClearPageLRU(page)) + goto isolate_fail_put; + /* If we already hold the lock, we can skip some rechecking */ if (!locked) { locked = compact_lock_irqsave(&pgdat->lru_lock, @@ -983,10 +999,6 @@ isolate_migratepages_block(struct compac goto isolate_abort; } - /* Recheck PageLRU and PageCompound under lock */ - if (!PageLRU(page)) - goto isolate_fail; - /* * Page become compound since the non-locked check, * and it's on LRU. It can only be a THP so the order @@ -994,16 +1006,13 @@ isolate_migratepages_block(struct compac */ if (unlikely(PageCompound(page) && !cc->alloc_contig)) { low_pfn += compound_nr(page) - 1; - goto isolate_fail; + SetPageLRU(page); + goto isolate_fail_put; } } lruvec = mem_cgroup_page_lruvec(page, pgdat); - /* Try isolate the page */ - if (__isolate_lru_page(page, isolate_mode) != 0) - goto isolate_fail; - /* The whole page is taken off the LRU; skip the tail pages. */ if (PageCompound(page)) low_pfn += compound_nr(page) - 1; @@ -1032,6 +1041,15 @@ isolate_success: } continue; + +isolate_fail_put: + /* Avoid potential deadlock in freeing page under lru_lock */ + if (locked) { + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + locked = false; + } + put_page(page); + isolate_fail: if (!skip_on_failure) continue; @@ -1068,9 +1086,15 @@ isolate_fail: if (unlikely(low_pfn > end_pfn)) low_pfn = end_pfn; + page = NULL; + isolate_abort: if (locked) spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (page) { + SetPageLRU(page); + put_page(page); + } /* * Updated the cached scanner pfn once the pageblock has been scanned --- a/mm/vmscan.c~mm-compaction-do-page-isolation-first-in-compaction +++ a/mm/vmscan.c @@ -1539,7 +1539,7 @@ unsigned int reclaim_clean_pages_from_li * * returns 0 on success, -ve errno on failure. */ -int __isolate_lru_page(struct page *page, isolate_mode_t mode) +int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) { int ret = -EBUSY; @@ -1591,22 +1591,9 @@ int __isolate_lru_page(struct page *page if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) return ret; - if (likely(get_page_unless_zero(page))) { - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (TestClearPageLRU(page)) - ret = 0; - else - put_page(page); - } - - return ret; + return 0; } - /* * Update LRU sizes after isolating pages. The LRU size updates must * be complete before mem_cgroup_update_lru_size due to a sanity check. @@ -1686,20 +1673,34 @@ static unsigned long isolate_lru_pages(u * only when the page is being freed somewhere else. */ scan += nr_pages; - switch (__isolate_lru_page(page, mode)) { + switch (__isolate_lru_page_prepare(page, mode)) { case 0: + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto busy; + + if (!TestClearPageLRU(page)) { + /* + * This page may in other isolation path, + * but we still hold lru_lock. + */ + put_page(page); + goto busy; + } + nr_taken += nr_pages; nr_zone_taken[page_zonenum(page)] += nr_pages; list_move(&page->lru, dst); break; - case -EBUSY: + default: +busy: /* else it is being freed elsewhere */ list_move(&page->lru, src); - continue; - - default: - BUG(); } } From patchwork Tue Dec 15 20:34:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975745 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83A09C4361B for ; Tue, 15 Dec 2020 20:34:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2189422510 for ; Tue, 15 Dec 2020 20:34:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2189422510 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AE4B68D0010; Tue, 15 Dec 2020 15:34:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A6E7A6B0070; Tue, 15 Dec 2020 15:34:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 95E4A8D0010; Tue, 15 Dec 2020 15:34:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0121.hostedemail.com [216.40.44.121]) by kanga.kvack.org (Postfix) with ESMTP id 7C08D6B006E for ; Tue, 15 Dec 2020 15:34:28 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 449E68249980 for ; Tue, 15 Dec 2020 20:34:28 +0000 (UTC) X-FDA: 77596669416.01.grass02_25060cc27426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id 224251004F904 for ; Tue, 15 Dec 2020 20:34:28 +0000 (UTC) X-HE-Tag: grass02_25060cc27426 X-Filterd-Recvd-Size: 7624 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:27 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:25 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064466; bh=BVx8MCtmmzSLRDN8wp4Qqb/rIkvBhWvTWbbeCe3Qf8A=; h=From:To:Subject:In-Reply-To:From; b=gmMhf9dq7KW+Ubc7gegYCYLNPRgtYocPribRbe0jCwcMuAaRXuxt+qiMW7Tm8UvaV mMctC1shhrIebe4L4XpnSybd+E2/dK0+3KubPoyD/s+ki/zNFs+hP0hcJok4K5rspr AZ2lARltA2t5lXmouzR9pTduZFuKlzjbbNJdy+aU= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Message-ID: <20201215203425._MFaJyUPh%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Hugh Dickins' found a memcg change bug on original version: If we want to change the pgdat->lru_lock to memcg's lruvec lock, we have to serialize mem_cgroup_move_account during pagevec_lru_move_fn. The possible bad scenario would like: cpu 0 cpu 1 lruvec = mem_cgroup_page_lruvec() if (!isolate_lru_page()) mem_cgroup_move_account spin_lock_irqsave(&lruvec->lru_lock <== wrong lock. So we need TestClearPageLRU to block isolate_lru_page(), that serializes the memcg change. and then removing the PageLRU check in move_fn callee as the consequence. __pagevec_lru_add_fn() is different from the others, because the pages it deals with are, by definition, not yet on the lru. TestClearPageLRU is not needed and would not work, so __pagevec_lru_add() goes its own way. Link: https://lkml.kernel.org/r/1604566549-62481-17-git-send-email-alex.shi@linux.alibaba.com Reported-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/swap.c | 44 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 9 deletions(-) --- a/mm/swap.c~mm-swapc-serialize-memcg-changes-in-pagevec_lru_move_fn +++ a/mm/swap.c @@ -222,8 +222,14 @@ static void pagevec_lru_move_fn(struct p spin_lock_irqsave(&pgdat->lru_lock, flags); } + /* block memcg migration during page moving between lru */ + if (!TestClearPageLRU(page)) + continue; + lruvec = mem_cgroup_page_lruvec(page, pgdat); (*move_fn)(page, lruvec); + + SetPageLRU(page); } if (pgdat) spin_unlock_irqrestore(&pgdat->lru_lock, flags); @@ -233,7 +239,7 @@ static void pagevec_lru_move_fn(struct p static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && !PageUnevictable(page)) { + if (!PageUnevictable(page)) { del_page_from_lru_list(page, lruvec, page_lru(page)); ClearPageActive(page); add_page_to_lru_list_tail(page, lruvec, page_lru(page)); @@ -306,7 +312,7 @@ void lru_note_cost_page(struct page *pag static void __activate_page(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { + if (!PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); int nr_pages = thp_nr_pages(page); @@ -362,7 +368,8 @@ static void activate_page(struct page *p page = compound_head(page); spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); + if (PageLRU(page)) + __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); spin_unlock_irq(&pgdat->lru_lock); } #endif @@ -519,9 +526,6 @@ static void lru_deactivate_file_fn(struc bool active; int nr_pages = thp_nr_pages(page); - if (!PageLRU(page)) - return; - if (PageUnevictable(page)) return; @@ -562,7 +566,7 @@ static void lru_deactivate_file_fn(struc static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { + if (PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); int nr_pages = thp_nr_pages(page); @@ -579,7 +583,7 @@ static void lru_deactivate_fn(struct pag static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && + if (PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { bool active = PageActive(page); int nr_pages = thp_nr_pages(page); @@ -1021,7 +1025,29 @@ static void __pagevec_lru_add_fn(struct */ void __pagevec_lru_add(struct pagevec *pvec) { - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn); + int i; + struct pglist_data *pgdat = NULL; + struct lruvec *lruvec; + unsigned long flags = 0; + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + struct pglist_data *pagepgdat = page_pgdat(page); + + if (pagepgdat != pgdat) { + if (pgdat) + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + pgdat = pagepgdat; + spin_lock_irqsave(&pgdat->lru_lock, flags); + } + + lruvec = mem_cgroup_page_lruvec(page, pgdat); + __pagevec_lru_add_fn(page, lruvec); + } + if (pgdat) + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + release_pages(pvec->pages, pvec->nr); + pagevec_reinit(pvec); } /** From patchwork Tue Dec 15 20:34:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975747 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27129C4361B for ; Tue, 15 Dec 2020 20:34:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AD2A3222BB for ; Tue, 15 Dec 2020 20:34:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AD2A3222BB Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 47B466B0070; Tue, 15 Dec 2020 15:34:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 409358D0011; Tue, 15 Dec 2020 15:34:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A66D6B0072; Tue, 15 Dec 2020 15:34:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0083.hostedemail.com [216.40.44.83]) by kanga.kvack.org (Postfix) with ESMTP id 08D316B0070 for ; Tue, 15 Dec 2020 15:34:33 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id C8861181AEF32 for ; Tue, 15 Dec 2020 20:34:32 +0000 (UTC) X-FDA: 77596669584.20.gate87_060b59427426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 878B5180C060E for ; Tue, 15 Dec 2020 20:34:32 +0000 (UTC) X-HE-Tag: gate87_060b59427426 X-Filterd-Recvd-Size: 32176 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:31 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:29 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064470; bh=2eO4O38Y0IYFQXKbSvtsg7ndNfXCdsCob4+qTUbP24I=; h=From:To:Subject:In-Reply-To:From; b=pFXndZhdd8xYtqLdWRNn3o9wgkBe+wQIzTPDc4I5r27yC2TkeTuv7NCDdqlTisOrA Kpc8h21HVky8TTk+yX4iYWXEXsZEjQgGdYqxgHpK/o2xPEC9W02w7JiwLcm+5ZQ3fC TIX8PsJN1SPtL9KTbswJVhBwmHElRJMT6uONNl6s= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Message-ID: <20201215203429.pD3dkJ0Zz%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alex Shi Subject: mm/lru: replace pgdat lru_lock with lruvec lock This patch moves per node lru_lock into lruvec, thus bring a lru_lock for each of memcg per node. So on a large machine, each of memcg don't have to suffer from per node pgdat->lru_lock competition. They could go fast with their self lru_lock. After move memcg charge before lru inserting, page isolation could serialize page's memcg, then per memcg lruvec lock is stable and could replace per node lru lock. In isolate_migratepages_block(), compact_unlock_should_abort and lock_page_lruvec_irqsave are open coded to work with compact_control. Also add a debug func in locking which may give some clues if there are sth out of hands. Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Hugh Dickins helped on the patch polish, thanks! [alex.shi@linux.alibaba.com: fix comment typo] Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Rong Chen Cc: Michal Hocko Cc: Vladimir Davydov Cc: Yang Shi Cc: Matthew Wilcox Cc: Konstantin Khlebnikov Cc: Daniel Jordan Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: Andrey Ryabinin Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Mel Gorman Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Tejun Heo Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Wei Yang Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 58 +++++++++++++++++ include/linux/mmzone.h | 3 mm/compaction.c | 56 ++++++++++------ mm/huge_memory.c | 11 +-- mm/memcontrol.c | 78 ++++++++++++++++++++++- mm/mlock.c | 22 ++++-- mm/mmzone.c | 1 mm/page_alloc.c | 1 mm/swap.c | 116 +++++++++++++++++------------------ mm/vmscan.c | 55 +++++++--------- 10 files changed, 275 insertions(+), 126 deletions(-) --- a/include/linux/memcontrol.h~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/include/linux/memcontrol.h @@ -491,6 +491,19 @@ struct mem_cgroup *get_mem_cgroup_from_m struct mem_cgroup *get_mem_cgroup_from_page(struct page *page); +struct lruvec *lock_page_lruvec(struct page *page); +struct lruvec *lock_page_lruvec_irq(struct page *page); +struct lruvec *lock_page_lruvec_irqsave(struct page *page, + unsigned long *flags); + +#ifdef CONFIG_DEBUG_VM +void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page); +#else +static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ +} +#endif + static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? container_of(css, struct mem_cgroup, css) : NULL; @@ -996,6 +1009,31 @@ static inline void mem_cgroup_put(struct { } +static inline struct lruvec *lock_page_lruvec(struct page *page) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock(&pgdat->__lruvec.lru_lock); + return &pgdat->__lruvec; +} + +static inline struct lruvec *lock_page_lruvec_irq(struct page *page) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock_irq(&pgdat->__lruvec.lru_lock); + return &pgdat->__lruvec; +} + +static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page, + unsigned long *flagsp) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp); + return &pgdat->__lruvec; +} + static inline struct mem_cgroup * mem_cgroup_iter(struct mem_cgroup *root, struct mem_cgroup *prev, @@ -1215,6 +1253,10 @@ static inline void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { } + +static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ +} #endif /* CONFIG_MEMCG */ /* idx can be of type enum memcg_stat_item or node_stat_item */ @@ -1296,6 +1338,22 @@ static inline struct lruvec *parent_lruv return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec)); } +static inline void unlock_page_lruvec(struct lruvec *lruvec) +{ + spin_unlock(&lruvec->lru_lock); +} + +static inline void unlock_page_lruvec_irq(struct lruvec *lruvec) +{ + spin_unlock_irq(&lruvec->lru_lock); +} + +static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, + unsigned long flags) +{ + spin_unlock_irqrestore(&lruvec->lru_lock, flags); +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); --- a/include/linux/mmzone.h~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/include/linux/mmzone.h @@ -276,6 +276,8 @@ enum lruvec_flags { struct lruvec { struct list_head lists[NR_LRU_LISTS]; + /* per lruvec lru_lock for memcg */ + spinlock_t lru_lock; /* * These track the cost of reclaiming one LRU - file or anon - * over the other. As the observed cost of reclaiming one LRU @@ -782,7 +784,6 @@ typedef struct pglist_data { /* Write-intensive fields used by page reclaim */ ZONE_PADDING(_pad1_) - spinlock_t lru_lock; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* --- a/mm/compaction.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/compaction.c @@ -804,7 +804,7 @@ isolate_migratepages_block(struct compac unsigned long nr_scanned = 0, nr_isolated = 0; struct lruvec *lruvec; unsigned long flags = 0; - bool locked = false; + struct lruvec *locked = NULL; struct page *page = NULL, *valid_page = NULL; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; @@ -868,11 +868,20 @@ isolate_migratepages_block(struct compac * contention, to give chance to IRQs. Abort completely if * a fatal signal is pending. */ - if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(&pgdat->lru_lock, - flags, &locked, cc)) { - low_pfn = 0; - goto fatal_pending; + if (!(low_pfn % SWAP_CLUSTER_MAX)) { + if (locked) { + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; + } + + if (fatal_signal_pending(current)) { + cc->contended = true; + + low_pfn = 0; + goto fatal_pending; + } + + cond_resched(); } if (!pfn_valid_within(low_pfn)) @@ -944,9 +953,8 @@ isolate_migratepages_block(struct compac if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, - flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } if (!isolate_movable_page(page, isolate_mode)) @@ -987,10 +995,19 @@ isolate_migratepages_block(struct compac if (!TestClearPageLRU(page)) goto isolate_fail_put; + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* If we already hold the lock, we can skip some rechecking */ - if (!locked) { - locked = compact_lock_irqsave(&pgdat->lru_lock, - &flags, cc); + if (lruvec != locked) { + if (locked) + unlock_page_lruvec_irqrestore(locked, flags); + + compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + locked = lruvec; + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); /* Try get exclusive access under lock */ if (!skip_updated) { @@ -1009,9 +1026,8 @@ isolate_migratepages_block(struct compac SetPageLRU(page); goto isolate_fail_put; } - } - - lruvec = mem_cgroup_page_lruvec(page, pgdat); + } else + rcu_read_unlock(); /* The whole page is taken off the LRU; skip the tail pages. */ if (PageCompound(page)) @@ -1045,8 +1061,8 @@ isolate_success: isolate_fail_put: /* Avoid potential deadlock in freeing page under lru_lock */ if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } put_page(page); @@ -1061,8 +1077,8 @@ isolate_fail: */ if (nr_isolated) { if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; @@ -1090,7 +1106,7 @@ isolate_fail: isolate_abort: if (locked) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(locked, flags); if (page) { SetPageLRU(page); put_page(page); --- a/mm/huge_memory.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/huge_memory.c @@ -2365,7 +2365,7 @@ static void lru_add_page_tail(struct pag VM_BUG_ON_PAGE(!PageHead(head), head); VM_BUG_ON_PAGE(PageCompound(tail), head); VM_BUG_ON_PAGE(PageLRU(tail), head); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + lockdep_assert_held(&lruvec->lru_lock); if (list) { /* page reclaim is reclaiming a huge page */ @@ -2449,7 +2449,6 @@ static void __split_huge_page(struct pag pgoff_t end) { struct page *head = compound_head(page); - pg_data_t *pgdat = page_pgdat(head); struct lruvec *lruvec; struct address_space *swap_cache = NULL; unsigned long offset = 0; @@ -2467,10 +2466,8 @@ static void __split_huge_page(struct pag xa_lock(&swap_cache->i_pages); } - /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock(&pgdat->lru_lock); - - lruvec = mem_cgroup_page_lruvec(head, pgdat); + /* lock lru list/PageCompound, ref freezed by page_ref_freeze */ + lruvec = lock_page_lruvec(head); for (i = nr - 1; i >= 1; i--) { __split_huge_page_tail(head, i, lruvec, list); @@ -2491,7 +2488,7 @@ static void __split_huge_page(struct pag } ClearPageCompound(head); - spin_unlock(&pgdat->lru_lock); + unlock_page_lruvec(lruvec); /* Caller disabled irqs, so they are still disabled here */ split_page_owner(head, nr); --- a/mm/memcontrol.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/memcontrol.c @@ -20,6 +20,9 @@ * Lockless page tracking & accounting * Unified hierarchy configuration model * Copyright (C) 2015 Red Hat, Inc., Johannes Weiner + * + * Per memcg lru locking + * Copyright (C) 2020 Alibaba, Inc, Alex Shi */ #include @@ -1330,6 +1333,23 @@ int mem_cgroup_scan_tasks(struct mem_cgr return ret; } +#ifdef CONFIG_DEBUG_VM +void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ + struct mem_cgroup *memcg; + + if (mem_cgroup_disabled()) + return; + + memcg = page_memcg(page); + + if (!memcg) + VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != root_mem_cgroup, page); + else + VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != memcg, page); +} +#endif + /** * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page * @page: the page @@ -1371,6 +1391,60 @@ out: } /** + * lock_page_lruvec - lock and return lruvec for a given page. + * @page: the page + * + * This series functions should be used in either conditions: + * PageLRU is cleared or unset + * or page->_refcount is zero + * or page is locked. + */ +struct lruvec *lock_page_lruvec(struct page *page) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock(&lruvec->lru_lock); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irq(struct page *page) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irq(&lruvec->lru_lock); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irqsave(&lruvec->lru_lock, *flags); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +/** * mem_cgroup_update_lru_size - account for adding or removing an lru page * @lruvec: mem_cgroup per zone lru vector * @lru: index of lru list the page is sitting on @@ -3281,10 +3355,8 @@ void obj_cgroup_uncharge(struct obj_cgro #endif /* CONFIG_MEMCG_KMEM */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* - * Because tail pages are not marked as "used", set it. We're under - * pgdat->lru_lock and migration entries setup in all page mappings. + * Because page_memcg(head) is not set on compound tails, set it now. */ void mem_cgroup_split_huge_fixup(struct page *head) { --- a/mm/mlock.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/mlock.c @@ -262,12 +262,12 @@ static void __munlock_pagevec(struct pag int nr = pagevec_count(pvec); int delta_munlocked = -nr; struct pagevec pvec_putback; + struct lruvec *lruvec = NULL; int pgrescued = 0; pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; @@ -277,10 +277,16 @@ static void __munlock_pagevec(struct pag * so we can spare the get_page() here. */ if (TestClearPageLRU(page)) { - struct lruvec *lruvec; + struct lruvec *new_lruvec; + + new_lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + if (new_lruvec != lruvec) { + if (lruvec) + unlock_page_lruvec_irq(lruvec); + lruvec = lock_page_lruvec_irq(page); + } - lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); del_page_from_lru_list(page, lruvec, page_lru(page)); continue; @@ -299,8 +305,12 @@ static void __munlock_pagevec(struct pag pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; } - __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); + if (lruvec) { + __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); + unlock_page_lruvec_irq(lruvec); + } else if (delta_munlocked) { + mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); + } /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); --- a/mm/mmzone.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/mmzone.c @@ -77,6 +77,7 @@ void lruvec_init(struct lruvec *lruvec) enum lru_list lru; memset(lruvec, 0, sizeof(struct lruvec)); + spin_lock_init(&lruvec->lru_lock); for_each_lru(lru) INIT_LIST_HEAD(&lruvec->lists[lru]); --- a/mm/page_alloc.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/page_alloc.c @@ -6870,7 +6870,6 @@ static void __meminit pgdat_init_interna init_waitqueue_head(&pgdat->pfmemalloc_wait); pgdat_page_ext_init(pgdat); - spin_lock_init(&pgdat->lru_lock); lruvec_init(&pgdat->__lruvec); } --- a/mm/swap.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/swap.c @@ -79,16 +79,14 @@ static DEFINE_PER_CPU(struct lru_pvecs, static void __page_cache_release(struct page *page) { if (PageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; unsigned long flags; - spin_lock_irqsave(&pgdat->lru_lock, flags); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irqsave(page, &flags); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(lruvec, flags); } __ClearPageWaiters(page); } @@ -207,32 +205,30 @@ static void pagevec_lru_move_fn(struct p void (*move_fn)(struct page *page, struct lruvec *lruvec)) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags = 0; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); - - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); - } + struct lruvec *new_lruvec; /* block memcg migration during page moving between lru */ if (!TestClearPageLRU(page)) continue; - lruvec = mem_cgroup_page_lruvec(page, pgdat); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); + } + (*move_fn)(page, lruvec); SetPageLRU(page); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } @@ -274,9 +270,15 @@ void lru_note_cost(struct lruvec *lruvec { do { unsigned long lrusize; - struct pglist_data *pgdat = lruvec_pgdat(lruvec); - spin_lock_irq(&pgdat->lru_lock); + /* + * Hold lruvec->lru_lock is safe here, since + * 1) The pinned lruvec in reclaim, or + * 2) From a pre-LRU page during refault (which also holds the + * rcu lock, so would be safe even if the page was on the LRU + * and could move simultaneously to a new lruvec). + */ + spin_lock_irq(&lruvec->lru_lock); /* Record cost event */ if (file) lruvec->file_cost += nr_pages; @@ -300,7 +302,7 @@ void lru_note_cost(struct lruvec *lruvec lruvec->file_cost /= 2; lruvec->anon_cost /= 2; } - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); } while ((lruvec = parent_lruvec(lruvec))); } @@ -364,13 +366,15 @@ static inline void activate_page_drain(i static void activate_page(struct page *page) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; page = compound_head(page); - spin_lock_irq(&pgdat->lru_lock); - if (PageLRU(page)) - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); - spin_unlock_irq(&pgdat->lru_lock); + if (TestClearPageLRU(page)) { + lruvec = lock_page_lruvec_irq(page); + __activate_page(page, lruvec); + unlock_page_lruvec_irq(lruvec); + SetPageLRU(page); + } } #endif @@ -860,8 +864,7 @@ void release_pages(struct page **pages, { int i; LIST_HEAD(pages_to_free); - struct pglist_data *locked_pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags; unsigned int lock_batch; @@ -871,11 +874,11 @@ void release_pages(struct page **pages, /* * Make sure the IRQ-safe lock-holding time does not get * excessive with a continuous string of pages from the - * same pgdat. The lock is held only if pgdat != NULL. + * same lruvec. The lock is held only if lruvec != NULL. */ - if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } page = compound_head(page); @@ -883,10 +886,9 @@ void release_pages(struct page **pages, continue; if (is_zone_device_page(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } /* * ZONE_DEVICE pages that return 'false' from @@ -907,27 +909,27 @@ void release_pages(struct page **pages, continue; if (PageCompound(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } __put_compound_page(page); continue; } if (PageLRU(page)) { - struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *new_lruvec; - if (pgdat != locked_pgdat) { - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, + new_lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + if (new_lruvec != lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); lock_batch = 0; - locked_pgdat = pgdat; - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); } - lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); @@ -937,8 +939,8 @@ void release_pages(struct page **pages, list_add(&page->lru, &pages_to_free); } - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); @@ -1026,26 +1028,24 @@ static void __pagevec_lru_add_fn(struct void __pagevec_lru_add(struct pagevec *pvec) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags = 0; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); + struct lruvec *new_lruvec; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); __pagevec_lru_add_fn(page, lruvec); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } --- a/mm/vmscan.c~mm-lru-replace-pgdat-lru_lock-with-lruvec-lock +++ a/mm/vmscan.c @@ -1764,14 +1764,12 @@ int isolate_lru_page(struct page *page) WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); if (TestClearPageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; get_page(page); - lruvec = mem_cgroup_page_lruvec(page, pgdat); - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page); del_page_from_lru_list(page, lruvec, page_lru(page)); - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); ret = 0; } @@ -1838,7 +1836,6 @@ static int too_many_isolated(struct pgli static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, struct list_head *list) { - struct pglist_data *pgdat = lruvec_pgdat(lruvec); int nr_pages, nr_moved = 0; LIST_HEAD(pages_to_free); struct page *page; @@ -1849,9 +1846,9 @@ static unsigned noinline_for_stack move_ VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); putback_lru_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); continue; } @@ -1873,9 +1870,9 @@ static unsigned noinline_for_stack move_ __ClearPageActive(page); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); destroy_compound_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); } else list_add(&page->lru, &pages_to_free); @@ -1952,7 +1949,7 @@ shrink_inactive_list(unsigned long nr_to lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); @@ -1964,14 +1961,14 @@ shrink_inactive_list(unsigned long nr_to __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned); __count_vm_events(PGSCAN_ANON + file, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); if (nr_taken == 0) return 0; nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); @@ -1980,7 +1977,7 @@ shrink_inactive_list(unsigned long nr_to __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); lru_note_cost(lruvec, file, stat.nr_pageout); mem_cgroup_uncharge_list(&page_list); @@ -2033,7 +2030,7 @@ static void shrink_active_list(unsigned lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); @@ -2044,7 +2041,7 @@ static void shrink_active_list(unsigned __count_vm_events(PGREFILL, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); while (!list_empty(&l_hold)) { cond_resched(); @@ -2090,7 +2087,7 @@ static void shrink_active_list(unsigned /* * Move pages back to the lru list. */ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_activate = move_pages_to_lru(lruvec, &l_active); nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); @@ -2101,7 +2098,7 @@ static void shrink_active_list(unsigned __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&l_active); free_unref_page_list(&l_active); @@ -2689,10 +2686,10 @@ again: /* * Determine the scan balance between anon and file LRUs. */ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&target_lruvec->lru_lock); sc->anon_cost = target_lruvec->anon_cost; sc->file_cost = target_lruvec->file_cost; - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&target_lruvec->lru_lock); /* * Target desirable inactive:active list ratios for the anon @@ -4268,16 +4265,15 @@ int node_reclaim(struct pglist_data *pgd */ void check_move_unevictable_pages(struct pagevec *pvec) { - struct lruvec *lruvec; - struct pglist_data *pgdat = NULL; + struct lruvec *lruvec = NULL; int pgscanned = 0; int pgrescued = 0; int i; for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); int nr_pages; + struct lruvec *new_lruvec; if (PageTransTail(page)) continue; @@ -4289,13 +4285,12 @@ void check_move_unevictable_pages(struct if (!TestClearPageLRU(page)) continue; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irq(&pgdat->lru_lock); - pgdat = pagepgdat; - spin_lock_irq(&pgdat->lru_lock); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irq(lruvec); + lruvec = lock_page_lruvec_irq(page); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); @@ -4309,10 +4304,10 @@ void check_move_unevictable_pages(struct SetPageLRU(page); } - if (pgdat) { + if (lruvec) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); } else if (pgscanned) { count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } From patchwork Tue Dec 15 20:34:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1D0BC4361B for ; Tue, 15 Dec 2020 20:34:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9670A222BB for ; Tue, 15 Dec 2020 20:34:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9670A222BB Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3B95F8D0013; Tue, 15 Dec 2020 15:34:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 340D68D0012; Tue, 15 Dec 2020 15:34:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1E3DB8D0013; Tue, 15 Dec 2020 15:34:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id 010E48D0012 for ; Tue, 15 Dec 2020 15:34:36 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id B999C629 for ; Tue, 15 Dec 2020 20:34:36 +0000 (UTC) X-FDA: 77596669752.25.magic09_3f0812427426 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 903571804E3B7 for ; Tue, 15 Dec 2020 20:34:36 +0000 (UTC) X-HE-Tag: magic09_3f0812427426 X-Filterd-Recvd-Size: 10185 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 20:34:35 +0000 (UTC) Date: Tue, 15 Dec 2020 12:34:33 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608064475; bh=mpOsvdVNyVZwb8pCK2adVRJD4Osq5c6wcgfx2B+scWc=; h=From:To:Subject:In-Reply-To:From; b=SDGlEwz83pKvHoRNaltzN1vRwQ6XDPp/QOEc6E7+lcEpssaoFxSCBHPUghlYm7wqX MSP4p7NVU9GTuvjy2YGAJi28oZPFGg7BgTlQsnapKJfTwiFFMVF0rGCAEVRg3RIvOK TkNpKvM6BymIMWQsW5RyPRbH42V6VvFrnJU9zVv4= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 18/19] mm/lru: introduce relock_page_lruvec() Message-ID: <20201215203433.gd0RZE7Fb%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexander Duyck Subject: mm/lru: introduce relock_page_lruvec() Add relock_page_lruvec() to replace repeated same code, no functional change. When testing for relock we can avoid the need for RCU locking if we simply compare the page pgdat and memcg pointers versus those that the lruvec is holding. By doing this we can avoid the extra pointer walks and accesses of the memory cgroup. In addition we can avoid the checks entirely if lruvec is currently NULL. [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/66d8e79d-7ec6-bfbc-1c82-bf32db3ae5b7@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alexander Duyck Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Thomas Gleixner Cc: Andrey Ryabinin Cc: Matthew Wilcox Cc: Mel Gorman Cc: Konstantin Khlebnikov Cc: Tejun Heo Cc: Andrea Arcangeli Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Jann Horn Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 52 +++++++++++++++++++++++++++++++++++ mm/mlock.c | 11 ------- mm/swap.c | 31 ++++---------------- mm/vmscan.c | 12 +------- 4 files changed, 61 insertions(+), 45 deletions(-) --- a/include/linux/memcontrol.h~mm-lru-introduce-the-relock_page_lruvec-function +++ a/include/linux/memcontrol.h @@ -485,6 +485,22 @@ out: struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *); +static inline bool lruvec_holds_page_lru_lock(struct page *page, + struct lruvec *lruvec) +{ + pg_data_t *pgdat = page_pgdat(page); + const struct mem_cgroup *memcg; + struct mem_cgroup_per_node *mz; + + if (mem_cgroup_disabled()) + return lruvec == &pgdat->__lruvec; + + mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); + memcg = page_memcg(page) ? : root_mem_cgroup; + + return lruvec->pgdat == pgdat && mz->memcg == memcg; +} + struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm); @@ -984,6 +1000,14 @@ static inline struct lruvec *mem_cgroup_ return &pgdat->__lruvec; } +static inline bool lruvec_holds_page_lru_lock(struct page *page, + struct lruvec *lruvec) +{ + pg_data_t *pgdat = page_pgdat(page); + + return lruvec == &pgdat->__lruvec; +} + static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { return NULL; @@ -1354,6 +1378,34 @@ static inline void unlock_page_lruvec_ir spin_unlock_irqrestore(&lruvec->lru_lock, flags); } +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irq(struct page *page, + struct lruvec *locked_lruvec) +{ + if (locked_lruvec) { + if (lruvec_holds_page_lru_lock(page, locked_lruvec)) + return locked_lruvec; + + unlock_page_lruvec_irq(locked_lruvec); + } + + return lock_page_lruvec_irq(page); +} + +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page, + struct lruvec *locked_lruvec, unsigned long *flags) +{ + if (locked_lruvec) { + if (lruvec_holds_page_lru_lock(page, locked_lruvec)) + return locked_lruvec; + + unlock_page_lruvec_irqrestore(locked_lruvec, *flags); + } + + return lock_page_lruvec_irqsave(page, flags); +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); --- a/mm/mlock.c~mm-lru-introduce-the-relock_page_lruvec-function +++ a/mm/mlock.c @@ -277,16 +277,7 @@ static void __munlock_pagevec(struct pag * so we can spare the get_page() here. */ if (TestClearPageLRU(page)) { - struct lruvec *new_lruvec; - - new_lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); - if (new_lruvec != lruvec) { - if (lruvec) - unlock_page_lruvec_irq(lruvec); - lruvec = lock_page_lruvec_irq(page); - } - + lruvec = relock_page_lruvec_irq(page, lruvec); del_page_from_lru_list(page, lruvec, page_lru(page)); continue; --- a/mm/swap.c~mm-lru-introduce-the-relock_page_lruvec-function +++ a/mm/swap.c @@ -210,19 +210,12 @@ static void pagevec_lru_move_fn(struct p for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct lruvec *new_lruvec; /* block memcg migration during page moving between lru */ if (!TestClearPageLRU(page)) continue; - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = lock_page_lruvec_irqsave(page, &flags); - } - + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); (*move_fn)(page, lruvec); SetPageLRU(page); @@ -918,17 +911,12 @@ void release_pages(struct page **pages, } if (PageLRU(page)) { - struct lruvec *new_lruvec; + struct lruvec *prev_lruvec = lruvec; - new_lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); - if (new_lruvec != lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, - flags); + lruvec = relock_page_lruvec_irqsave(page, lruvec, + &flags); + if (prev_lruvec != lruvec) lock_batch = 0; - lruvec = lock_page_lruvec_irqsave(page, &flags); - } VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); @@ -1033,15 +1021,8 @@ void __pagevec_lru_add(struct pagevec *p for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct lruvec *new_lruvec; - - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = lock_page_lruvec_irqsave(page, &flags); - } + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); __pagevec_lru_add_fn(page, lruvec); } if (lruvec) --- a/mm/vmscan.c~mm-lru-introduce-the-relock_page_lruvec-function +++ a/mm/vmscan.c @@ -1883,8 +1883,7 @@ static unsigned noinline_for_stack move_ * All pages were isolated from the same lruvec (and isolation * inhibits memcg migration). */ - VM_BUG_ON_PAGE(mem_cgroup_page_lruvec(page, page_pgdat(page)) - != lruvec, page); + VM_BUG_ON_PAGE(!lruvec_holds_page_lru_lock(page, lruvec), page); lru = page_lru(page); nr_pages = thp_nr_pages(page); @@ -4273,7 +4272,6 @@ void check_move_unevictable_pages(struct for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; int nr_pages; - struct lruvec *new_lruvec; if (PageTransTail(page)) continue; @@ -4285,13 +4283,7 @@ void check_move_unevictable_pages(struct if (!TestClearPageLRU(page)) continue; - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irq(lruvec); - lruvec = lock_page_lruvec_irq(page); - } - + lruvec = relock_page_lruvec_irq(page, lruvec); if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); From patchwork Tue Dec 15 22:21:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11975891 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61940C4361B for ; Tue, 15 Dec 2020 22:21:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F1D3322BE9 for ; Tue, 15 Dec 2020 22:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F1D3322BE9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8BC718D0002; Tue, 15 Dec 2020 17:21:35 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 86D2B8D0001; Tue, 15 Dec 2020 17:21:35 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 735248D0002; Tue, 15 Dec 2020 17:21:35 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 59F7D8D0001 for ; Tue, 15 Dec 2020 17:21:35 -0500 (EST) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 228A81EE6 for ; Tue, 15 Dec 2020 22:21:35 +0000 (UTC) X-FDA: 77596939350.02.form37_0e128e327427 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 09F1210097AA0 for ; Tue, 15 Dec 2020 22:21:35 +0000 (UTC) X-HE-Tag: form37_0e128e327427 X-Filterd-Recvd-Size: 17279 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 22:21:33 +0000 (UTC) Date: Tue, 15 Dec 2020 14:21:31 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1608070892; bh=1sj45ioq6+GBql/T8PI+jt0AUBEdEAXEXYa2uX3VwIQ=; h=From:To:Subject:In-Reply-To:From; b=SFYuDPusNnPpdbLHZGuO6xuT2q99a4cGGAra6p2lTGIy2V57nCejm7X3YnqHEbywH BkT3/u5+JmuYn1va0jnBFOOz+LiDV/O4Z79lt7MO/7h5E/x+R48IogNGxHHZQg7H10 TOIQKTg/ipcKz/Uw51XUnPpaCt/qls0lSZQ99d9E= From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com Subject: [patch 19/19] mm/lru: revise the comments of lru_lock Message-ID: <20201215222131.Un95p7p-j%akpm@linux-foundation.org> In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm/lru: revise the comments of lru_lock Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. I struggled to understand the comment above move_pages_to_lru() (surely it never calls page_referenced()), and eventually realized that most of it had got separated from shrink_active_list(): move that comment back. Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Tejun Heo Cc: Andrey Ryabinin Cc: Jann Horn Cc: Mel Gorman Cc: Matthew Wilcox Cc: Alexander Duyck Cc: Andrea Arcangeli Cc: "Chen, Rong A" Cc: Daniel Jordan Cc: "Huang, Ying" Cc: Joonsoo Kim Cc: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Konstantin Khlebnikov Cc: Michal Hocko Cc: Michal Hocko Cc: Mika Penttilä Cc: Minchan Kim Cc: Shakeel Butt Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Wei Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 ---- Documentation/admin-guide/cgroup-v1/memory.rst | 23 ++---- Documentation/trace/events-kmem.rst | 2 Documentation/vm/unevictable-lru.rst | 22 ++--- include/linux/mm_types.h | 2 include/linux/mmzone.h | 3 mm/filemap.c | 4 - mm/rmap.c | 4 - mm/vmscan.c | 41 ++++++----- 9 files changed, 51 insertions(+), 65 deletions(-) --- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst~mm-lru-revise-the-comments-of-lru_lock +++ a/Documentation/admin-guide/cgroup-v1/memcg_test.rst @@ -133,18 +133,9 @@ Under below explanation, we assume CONFI 8. LRU ====== - Each memcg has its own private LRU. Now, its handling is under global - VM's control (means that it's handled under global pgdat->lru_lock). - Almost all routines around memcg's LRU is called by global LRU's - list management functions under pgdat->lru_lock. - - A special function is mem_cgroup_isolate_pages(). This scans - memcg's private LRU and call __isolate_lru_page() to extract a page - from LRU. - - (By __isolate_lru_page(), the page is removed from both of global and - private LRU.) - + Each memcg has its own vector of LRUs (inactive anon, active anon, + inactive file, active file, unevictable) of pages from each node, + each LRU handled under a single lru_lock for that memcg and node. 9. Typical Tests. ================= --- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-lru-revise-the-comments-of-lru_lock +++ a/Documentation/admin-guide/cgroup-v1/memory.rst @@ -287,20 +287,17 @@ When oom event notifier is registered, e 2.6 Locking ----------- - lock_page_cgroup()/unlock_page_cgroup() should not be called under - the i_pages lock. +Lock order is as follows: - Other lock order is following: - - PG_locked. - mm->page_table_lock - pgdat->lru_lock - lock_page_cgroup. - - In many cases, just lock_page_cgroup() is called. - - per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by - pgdat->lru_lock, it has no lock of its own. + Page lock (PG_locked bit of page->flags) + mm->page_table_lock or split pte_lock + lock_page_memcg (memcg->move_lock) + mapping->i_pages lock + lruvec->lru_lock. + +Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by +lruvec->lru_lock; PG_lru bit of page->flags is cleared before +isolating a page from its LRU under lruvec->lru_lock. 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM) ----------------------------------------------- --- a/Documentation/trace/events-kmem.rst~mm-lru-revise-the-comments-of-lru_lock +++ a/Documentation/trace/events-kmem.rst @@ -69,7 +69,7 @@ When pages are freed in batch, the also Broadly speaking, pages are taken off the LRU lock in bulk and freed in batch with a page list. Significant amounts of activity here could indicate that the system is under memory pressure and can also indicate -contention on the zone->lru_lock. +contention on the lruvec->lru_lock. 4. Per-CPU Allocator Activity ============================= --- a/Documentation/vm/unevictable-lru.rst~mm-lru-revise-the-comments-of-lru_lock +++ a/Documentation/vm/unevictable-lru.rst @@ -33,7 +33,7 @@ reclaim in Linux. The problems have bee memory x86_64 systems. To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of -main memory will have over 32 million 4k pages in a single zone. When a large +main memory will have over 32 million 4k pages in a single node. When a large fraction of these pages are not evictable for any reason [see below], vmscan will spend a lot of time scanning the LRU lists looking for the small fraction of pages that are evictable. This can result in a situation where all CPUs are @@ -55,7 +55,7 @@ unevictable, either by definition or by The Unevictable Page List ------------------------- -The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list +The Unevictable LRU infrastructure consists of an additional, per-node, LRU list called the "unevictable" list and an associated page flag, PG_unevictable, to indicate that the page is being managed on the unevictable list. @@ -84,15 +84,9 @@ The unevictable list does not differenti swap-backed pages. This differentiation is only important while the pages are, in fact, evictable. -The unevictable list benefits from the "arrayification" of the per-zone LRU +The unevictable list benefits from the "arrayification" of the per-node LRU lists and statistics originally proposed and posted by Christoph Lameter. -The unevictable list does not use the LRU pagevec mechanism. Rather, -unevictable pages are placed directly on the page's zone's unevictable list -under the zone lru_lock. This allows us to prevent the stranding of pages on -the unevictable list when one task has the page isolated from the LRU and other -tasks are changing the "evictability" state of the page. - Memory Control Group Interaction -------------------------------- @@ -101,8 +95,8 @@ The unevictable LRU facility interacts w memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the lru_list enum. -The memory controller data structure automatically gets a per-zone unevictable -list as a result of the "arrayification" of the per-zone LRU lists (one per +The memory controller data structure automatically gets a per-node unevictable +list as a result of the "arrayification" of the per-node LRU lists (one per lru_list enum element). The memory controller tracks the movement of pages to and from the unevictable list. @@ -196,7 +190,7 @@ for the sake of expediency, to leave a u active/inactive LRU lists for vmscan to deal with. vmscan checks for such pages in all of the shrink_{active|inactive|page}_list() functions and will "cull" such pages that it encounters: that is, it diverts those pages to the -unevictable list for the zone being scanned. +unevictable list for the node being scanned. There may be situations where a page is mapped into a VM_LOCKED VMA, but the page is not marked as PG_mlocked. Such pages will make it all the way to @@ -328,7 +322,7 @@ If the page was NOT already mlocked, mlo page from the LRU, as it is likely on the appropriate active or inactive list at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put back the page - by calling putback_lru_page() - which will notice that the page -is now mlocked and divert the page to the zone's unevictable list. If +is now mlocked and divert the page to the node's unevictable list. If mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle it later if and when it attempts to reclaim the page. @@ -603,7 +597,7 @@ Some examples of these unevictable pages unevictable list in mlock_vma_page(). shrink_inactive_list() also diverts any unevictable pages that it finds on the -inactive lists to the appropriate zone's unevictable list. +inactive lists to the appropriate node's unevictable list. shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd after shrink_active_list() had moved them to the inactive list, or pages mapped --- a/include/linux/mm_types.h~mm-lru-revise-the-comments-of-lru_lock +++ a/include/linux/mm_types.h @@ -79,7 +79,7 @@ struct page { struct { /* Page cache and anonymous pages */ /** * @lru: Pageout list, eg. active_list protected by - * pgdat->lru_lock. Sometimes used as a generic list + * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ struct list_head lru; --- a/include/linux/mmzone.h~mm-lru-revise-the-comments-of-lru_lock +++ a/include/linux/mmzone.h @@ -113,8 +113,7 @@ static inline bool free_area_empty(struc struct pglist_data; /* - * zone->lock and the zone lru_lock are two of the hottest locks in the kernel. - * So add a wild amount of padding here to ensure that they fall into separate + * Add a wild amount of padding here to ensure datas fall into separate * cachelines. There are very few zone structures in the machine, so space * consumption is not a concern here. */ --- a/mm/filemap.c~mm-lru-revise-the-comments-of-lru_lock +++ a/mm/filemap.c @@ -102,8 +102,8 @@ * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->i_pages lock (try_to_unmap_one) - * ->pgdat->lru_lock (follow_page->mark_page_accessed) - * ->pgdat->lru_lock (check_pte_range->isolate_lru_page) + * ->lruvec->lru_lock (follow_page->mark_page_accessed) + * ->lruvec->lru_lock (check_pte_range->isolate_lru_page) * ->private_lock (page_remove_rmap->set_page_dirty) * ->i_pages lock (page_remove_rmap->set_page_dirty) * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) --- a/mm/rmap.c~mm-lru-revise-the-comments-of-lru_lock +++ a/mm/rmap.c @@ -28,12 +28,12 @@ * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) * anon_vma->rwsem * mm->page_table_lock or pte_lock - * pgdat->lru_lock (in mark_page_accessed, isolate_lru_page) * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in __set_page_dirty_buffers) - * mem_cgroup_{begin,end}_page_stat (memcg->move_lock) + * lock_page_memcg move_lock (in __set_page_dirty_buffers) * i_pages lock (widely used) + * lruvec->lru_lock (in lock_page_lruvec_irq) * inode->i_lock (in set_page_dirty's __mark_inode_dirty) * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) * sb_lock (within inode_lock in fs/fs-writeback.c) --- a/mm/vmscan.c~mm-lru-revise-the-comments-of-lru_lock +++ a/mm/vmscan.c @@ -1613,14 +1613,16 @@ static __always_inline void update_lru_s } /** - * pgdat->lru_lock is heavily contended. Some of the functions that + * Isolating page from the lruvec to fill in @dst list by nr_to_scan times. + * + * lruvec->lru_lock is heavily contended. Some of the functions that * shrink the lists perform better by taking out a batch of pages * and working on them outside the LRU lock. * * For pagecache intensive workloads, this function is the hottest * spot in the kernel (apart from copy_*_user functions). * - * Appropriate locks must be held before calling this function. + * Lru_lock must be held before calling this function. * * @nr_to_scan: The number of eligible pages to look through on the list. * @lruvec: The LRU vector to pull pages from. @@ -1814,25 +1816,11 @@ static int too_many_isolated(struct pgli } /* - * This moves pages from @list to corresponding LRU list. - * - * We move them the other way if the page is referenced by one or more - * processes, from rmap. - * - * If the pages are mostly unmapped, the processing is fast and it is - * appropriate to hold zone_lru_lock across the whole operation. But if - * the pages are mapped, the processing is slow (page_referenced()) so we - * should drop zone_lru_lock around each page. It's impossible to balance - * this, so instead we remove the pages from the LRU while processing them. - * It is safe to rely on PG_active against the non-LRU pages in here because - * nobody will play with that bit on a non-LRU page. - * - * The downside is that we have to touch page->_refcount against each page. - * But we had to alter page->flags anyway. + * move_pages_to_lru() moves pages from private @list to appropriate LRU list. + * On return, @list is reused as a list of pages to be freed by the caller. * * Returns the number of pages moved to the given lruvec. */ - static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, struct list_head *list) { @@ -2010,6 +1998,23 @@ shrink_inactive_list(unsigned long nr_to return nr_reclaimed; } +/* + * shrink_active_list() moves pages from the active LRU to the inactive LRU. + * + * We move them the other way if the page is referenced by one or more + * processes. + * + * If the pages are mostly unmapped, the processing is fast and it is + * appropriate to hold lru_lock across the whole operation. But if + * the pages are mapped, the processing is slow (page_referenced()), so + * we should drop lru_lock around each page. It's impossible to balance + * this, so instead we remove the pages from the LRU while processing them. + * It is safe to rely on PG_active against the non-LRU pages in here because + * nobody will play with that bit on a non-LRU page. + * + * The downside is that we have to touch page->_refcount against each page. + * But we had to alter page->flags anyway. + */ static void shrink_active_list(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc,