[04/19] mm/thp: narrow lru locking

Message ID

20201215203333.hyZtikIQM%akpm@linux-foundation.org (mailing list archive)

State

New, archived

Headers

Return-Path: <SRS0=3wKV=FT=kvack.org=owner-linux-mm@kernel.org>
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CEE5D22CB1
Date: Tue, 15 Dec 2020 12:33:33 -0800
From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
 alex.shi@linux.alibaba.com, alexander.duyck@gmail.com,
 aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org,
 hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com,
 khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com,
 kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net,
 mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com,
 minchan@kernel.org, mm-commits@vger.kernel.org,
 richard.weiyang@gmail.com, rong.a.chen@intel.com, shakeelb@google.com,
 tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org,
 vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org,
 yang.shi@linux.alibaba.com, ying.huang@intel.com
Subject: [patch 04/19] mm/thp: narrow lru locking
Message-ID: <20201215203333.hyZtikIQM%akpm@linux-foundation.org>
In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org>
User-Agent: s-nail v14.8.16
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

[01/19] mm/thp: move lru_add_page_tail() to huge_memory.c
[02/19] mm/thp: use head for head page in lru_add_page_tail()
[03/19] mm/thp: simplify lru_add_page_tail()
[04/19] mm/thp: narrow lru locking
[05/19] mm/vmscan: remove unnecessary lruvec adding
[06/19] mm/rmap: stop store reordering issue on page->mapping
[07/19] mm: page_idle_get_page() does not need lru_lock
[08/19] mm/memcg: add debug checking in lock_page_memcg
[09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
[10/19] mm/lru: move lock into lru_note_cost
[11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru
[12/19] mm/mlock: remove lru_lock on TestClearPageMlocked
[13/19] mm/mlock: remove __munlock_isolate_lru_page()
[14/19] mm/lru: introduce TestClearPageLRU()
[15/19] mm/compaction: do page isolation first in compaction
[16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
[17/19] mm/lru: replace pgdat lru_lock with lruvec lock
[18/19] mm/lru: introduce relock_page_lruvec()
[19/19] mm/lru: revise the comments of lru_lock

Commit Message

Andrew Morton Dec. 15, 2020, 8:33 p.m. UTC

From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: narrow lru locking

There is no obvious reason for lru_lock and the page cache xa_lock to be
taken in one order rather than the other: until now, lru_lock has been
taken before the page cache xa_lock when splitting a THP, but nothing
else takes them together.  Reverse that ordering and narrow the lru
locking, but keep local_irq_disable() blocking interrupts throughout, as
before.
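The new ordering can be checked with a small user-space model.  This is a
hypothetical lock-rank checker, not kernel code: `struct order_model`,
`model_lock()` and `model_unlock()` are invented names, and the ranks
simply encode the rule described above for the split path (interrupts off
first, then the page cache xa_lock, then lru_lock innermost).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks encoding the post-patch nesting in the THP split
 * path: a lock may only be taken if its rank is higher than that of the
 * innermost lock already held.  These are not kernel identifiers. */
enum rank { RANK_NONE = 0, RANK_XA_LOCK = 1, RANK_LRU_LOCK = 2 };

#define MAX_HELD 2

struct order_model {
	bool irqs_disabled;        /* models local_irq_disable() */
	enum rank held[MAX_HELD];  /* held locks, innermost last */
	int depth;
};

/* Try to take a lock of rank @r.  Returns false on an ordering
 * violation, or if interrupts are still on (both locks are taken with
 * interrupts disabled in this path). */
static bool model_lock(struct order_model *m, enum rank r)
{
	enum rank inner = m->depth ? m->held[m->depth - 1] : RANK_NONE;

	if (!m->irqs_disabled || r <= inner || m->depth == MAX_HELD)
		return false;
	m->held[m->depth++] = r;
	return true;
}

/* Locks must be released innermost first. */
static void model_unlock(struct order_model *m, enum rank r)
{
	assert(m->depth > 0 && m->held[m->depth - 1] == r);
	m->depth--;
}
```

Under this model the pre-patch order (lru_lock first, then xa_lock) is
rejected, which is exactly the inversion the patch makes.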

Hugh Dickins' point: it was already silly for split_huge_page_to_list()
to use the _irqsave variant.  It has just been taking sleeping locks, so
it would already be broken if entered with interrupts disabled.  So we
can avoid passing the flags argument down to __split_huge_page().
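Why the saved flags are redundant can be sketched with a single-CPU model
of the interrupt flag.  All `model_*` helpers and `split_old()` /
`split_new()` are hypothetical stand-ins for local_irq_save()/restore(),
local_irq_disable()/enable() and the two shapes of the split path; they
are not kernel APIs.  Because the function has already slept, interrupts
must be on at entry, so save/restore ends in the same state as
disable/enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_irq_restore(unsigned long flags) { irqs_enabled = flags; }
static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void) { irqs_enabled = true; }

/* Taking a sleeping lock (mutex, rwsem) is only legal with irqs on. */
static void model_might_sleep(void) { assert(irqs_enabled); }

/* Old shape: save the flags, later restore them. */
static void split_old(void)
{
	model_might_sleep();  /* sleeping locks taken earlier */
	unsigned long flags = model_irq_save();
	/* ... critical section ... */
	model_irq_restore(flags);
}

/* New shape: the entry state is known to be "enabled", so no flags. */
static void split_new(void)
{
	model_might_sleep();
	model_irq_disable();
	/* ... critical section ... */
	model_irq_enable();
}
```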

Why change the lock ordering here?  That was hard to decide.  One reason:
when this series reaches per-memcg lru locking, it relies on the THP's
memcg being stable when the lru_lock is taken.  The lru_lock is now taken
after the THP's refcount has been frozen, which ensures that the page's
memcg cannot change.
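The refcount argument can be illustrated with C11 atomics.  This is a
user-space sketch: `struct frozen_page` and the `model_*` functions are
hypothetical, only loosely mirroring page_ref_freeze() and
get_page_unless_zero(), and it assumes, for illustration only, that
anyone changing a page's memcg must first pin the page with a
get-unless-zero operation.  Once the count is frozen to zero, no such pin
can succeed, so the memcg stays put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page model; not struct page. */
struct frozen_page {
	atomic_int refcount;
	int memcg_id;
};

/* Loosely like page_ref_freeze(): succeeds only if the count is exactly
 * @expected, and atomically replaces it with 0. */
static bool model_ref_freeze(struct frozen_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Loosely like get_page_unless_zero(): never pins a frozen page. */
static bool model_get_unless_zero(struct frozen_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Assumed rule for this sketch: a memcg change must pin the page. */
static bool model_move_memcg(struct frozen_page *p, int new_id)
{
	if (!model_get_unless_zero(p))
		return false;
	p->memcg_id = new_id;
	atomic_fetch_sub(&p->refcount, 1);
	return true;
}
```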

Another reason: previously, lock_page_memcg()'s move_lock was presumed to
nest inside lru_lock; but now lru_lock must nest inside the page cache
lock, which in turn nests inside move_lock.  So it becomes possible to use
lock_page_memcg() to stabilize the page's memcg before taking its
lru_lock.  That is not the mechanism used in this series, but it is an
option we want to keep open.
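The option kept open here can be sketched in user space.  Everything
below is hypothetical (`struct model_memcg`, `struct memcg_page`, the
atomic_flag spinlock and the `model_*` helpers); the retry loop follows
the lock, recheck, retry pattern that lock_page_memcg() uses, after which
the memcg's own lru_lock is taken nested inside its move_lock.  The page
cache lock that sits between them is omitted, and the sketch assumes only
one mover changes a page's memcg at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal user-space spinlock built on atomic_flag. */
typedef struct { atomic_flag flag; } model_spinlock_t;

static void model_spin_init(model_spinlock_t *l) { atomic_flag_clear(&l->flag); }
static void model_spin_lock(model_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;
}
static bool model_spin_trylock(model_spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}
static void model_spin_unlock(model_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical per-memcg state: move_lock outside, lru_lock inside. */
struct model_memcg {
	model_spinlock_t move_lock;
	model_spinlock_t lru_lock;
};

struct memcg_page {
	_Atomic(struct model_memcg *) memcg;
};

/* Pin the page's memcg (lock, recheck, retry), then take that memcg's
 * lru_lock nested inside its move_lock. */
static struct model_memcg *model_lock_page_lru(struct memcg_page *page)
{
	struct model_memcg *memcg;

	for (;;) {
		memcg = atomic_load(&page->memcg);
		model_spin_lock(&memcg->move_lock);
		if (atomic_load(&page->memcg) == memcg)
			break;
		model_spin_unlock(&memcg->move_lock);  /* raced with a move */
	}
	model_spin_lock(&memcg->lru_lock);
	return memcg;
}

static void model_unlock_page_lru(struct model_memcg *memcg)
{
	model_spin_unlock(&memcg->lru_lock);
	model_spin_unlock(&memcg->move_lock);
}

/* A (single) mover holds the old memcg's move_lock while switching. */
static void model_move_page(struct memcg_page *page, struct model_memcg *to)
{
	struct model_memcg *from = atomic_load(&page->memcg);

	model_spin_lock(&from->move_lock);
	atomic_store(&page->memcg, to);
	model_spin_unlock(&from->move_lock);
}
```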

[hughd@google.com: rewrite commit log]
Link: https://lkml.kernel.org/r/1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- a/mm/huge_memory.c~mm-thp-narrow-lru-locking
+++ a/mm/huge_memory.c
@@ -2446,7 +2446,7 @@  static void __split_huge_page_tail(struc
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2456,8 +2456,6 @@  static void __split_huge_page(struct pag
 	unsigned int nr = thp_nr_pages(head);
 	int i;
 
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -2469,6 +2467,11 @@  static void __split_huge_page(struct pag
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	/* prevent PageLRU to go away from under us, and freeze lru stats */
+	spin_lock(&pgdat->lru_lock);
+
+	lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2488,6 +2491,8 @@  static void __split_huge_page(struct pag
 	}
 
 	ClearPageCompound(head);
+	spin_unlock(&pgdat->lru_lock);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
 
@@ -2505,8 +2510,7 @@  static void __split_huge_page(struct pag
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head, nr);
 
@@ -2652,12 +2656,10 @@  bool can_split_huge_page(struct page *pa
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2718,9 +2720,8 @@  int split_huge_page_to_list(struct page
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2750,7 +2751,7 @@  int split_huge_page_to_list(struct page
 				__dec_lruvec_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
@@ -2764,7 +2765,7 @@  int split_huge_page_to_list(struct page
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
 	}
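The resulting sequence of interrupt and lock operations in the patched
split path can be summarised as a trace.  This is a hypothetical sketch:
`record()`, `split_trace()` and the event strings are invented, and
split_queue_lock, refcount freezing, the tail-page loop and anonymous
THPs in the swap cache (which take their xa_lock inside
__split_huge_page()) are not modelled.  It shows lru_lock nesting inside
xa_lock on success, and the fail: path no longer touching lru_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 16
static const char *events[MAX_EVENTS];
static int nevents;

static void record(const char *name)
{
	assert(nevents < MAX_EVENTS);
	events[nevents++] = name;
}

/* @file_backed: the THP has a page cache mapping (xa_lock taken by the
 * caller); @can_split: the refcount freeze succeeded. */
static void split_trace(bool file_backed, bool can_split)
{
	nevents = 0;
	record("local_irq_disable");
	if (file_backed)
		record("xa_lock");
	if (can_split) {
		/* __split_huge_page(): lru_lock now nests inside xa_lock */
		record("spin_lock(lru_lock)");
		record("spin_unlock(lru_lock)");
		if (file_backed)
			record("xa_unlock");
		record("local_irq_enable");
	} else {
		/* fail: path, lru_lock was never taken */
		if (file_backed)
			record("xa_unlock");
		record("local_irq_enable");
	}
	record("remap_page");
}
```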
