
[06/13] mm/munlock: maintain page->mlock_count while unevictable

Message ID 3d204af4-664f-e4b0-4781-16718a2efb9c@google.com (mailing list archive)
State New
Series mm/munlock: rework of mlock+munlock page handling

Commit Message

Hugh Dickins Feb. 6, 2022, 9:40 p.m. UTC
Previous patches have been preparatory: now implement page->mlock_count.
The ordering of the "Unevictable LRU" is of no significance, and there is
no point holding unevictable pages on a list: place page->mlock_count to
overlay page->lru.prev (since page->lru.next is overlaid by compound_head,
which needs to be even so as not to satisfy PageTail - though 2 could be
added instead of 1 for each mlock, if that's ever an improvement).

But it's only safe to rely on or modify page->mlock_count while lruvec
lock is held and page is on unevictable "LRU" - we can save lots of edits
by continuing to pretend that there's an imaginary LRU here (there is an
unevictable count which still needs to be maintained, but not a list).

The mlock_count technique suffers from an unreliability much like with
page_mlock(): while someone else has the page off LRU, not much can
be done.  As before, err on the safe side (behave as if mlock_count 0),
and let try_to_unlock_one() move the page to unevictable if reclaim finds
out later on - a few misplaced pages don't matter, what we want to avoid
is imbalancing reclaim by flooding evictable lists with unevictable pages.

I am not a fan of "if (!isolate_lru_page(page)) putback_lru_page(page);":
if we have taken lruvec lock to get the page off its present list, then
we save everyone trouble (and however many extra atomic ops) by putting
it on its destination list immediately.
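
The lru.prev overlay described above can be modelled in plain userspace C. This is only a sketch with illustrative names (mini_page, __filler) - the real layout is in the mm_types.h hunk of the patch itself. Keeping the first word even matters because compound_head() tags tail pages by setting bit 0 of that slot.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical miniature of the union this patch adds to struct page:
 * the lru.next slot stays reserved for compound_head (kept even, so the
 * value never looks like a PageTail marker), while mlock_count reuses
 * the lru.prev slot.
 */
struct mini_page {
	union {
		struct list_head lru;
		struct {
			void *__filler;           /* overlays lru.next */
			unsigned int mlock_count; /* overlays lru.prev */
		};
	};
};
```

The offsetof checks below confirm the overlay lands on the intended slots, which is the whole point of the "__filler" padding member.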

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/mm_inline.h | 11 +++++--
 include/linux/mm_types.h  | 19 +++++++++--
 mm/huge_memory.c          |  5 ++-
 mm/memcontrol.c           |  3 +-
 mm/mlock.c                | 68 +++++++++++++++++++++++++++++++--------
 mm/mmzone.c               |  7 ++++
 mm/swap.c                 |  1 +
 7 files changed, 92 insertions(+), 22 deletions(-)

Comments

Vlastimil Babka Feb. 11, 2022, 12:27 p.m. UTC | #1
On 2/6/22 22:40, Hugh Dickins wrote:
> Previous patches have been preparatory: now implement page->mlock_count.
> The ordering of the "Unevictable LRU" is of no significance, and there is
> no point holding unevictable pages on a list: place page->mlock_count to
> overlay page->lru.prev (since page->lru.next is overlaid by compound_head,
> which needs to be even so as not to satisfy PageTail - though 2 could be
> added instead of 1 for each mlock, if that's ever an improvement).
> 
> But it's only safe to rely on or modify page->mlock_count while lruvec
> lock is held and page is on unevictable "LRU" - we can save lots of edits
> by continuing to pretend that there's an imaginary LRU here (there is an
> unevictable count which still needs to be maintained, but not a list).
> 
> The mlock_count technique suffers from an unreliability much like with
> page_mlock(): while someone else has the page off LRU, not much can
> be done.  As before, err on the safe side (behave as if mlock_count 0),
> and let try_to_unlock_one() move the page to unevictable if reclaim finds
> out later on - a few misplaced pages don't matter, what we want to avoid
> is imbalancing reclaim by flooding evictable lists with unevictable pages.
> 
> I am not a fan of "if (!isolate_lru_page(page)) putback_lru_page(page);":
> if we have taken lruvec lock to get the page off its present list, then
> we save everyone trouble (and however many extra atomic ops) by putting
> it on its destination list immediately.

Good point.

> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/mm_inline.h | 11 +++++--
>  include/linux/mm_types.h  | 19 +++++++++--
>  mm/huge_memory.c          |  5 ++-
>  mm/memcontrol.c           |  3 +-
>  mm/mlock.c                | 68 +++++++++++++++++++++++++++++++--------
>  mm/mmzone.c               |  7 ++++
>  mm/swap.c                 |  1 +
>  7 files changed, 92 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index b725839dfe71..884d6f6af05b 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -99,7 +99,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio)
>  
>  	update_lru_size(lruvec, lru, folio_zonenum(folio),
>  			folio_nr_pages(folio));
> -	list_add(&folio->lru, &lruvec->lists[lru]);
> +	if (lru != LRU_UNEVICTABLE)
> +		list_add(&folio->lru, &lruvec->lists[lru]);
>  }
>  
>  static __always_inline void add_page_to_lru_list(struct page *page,
> @@ -115,6 +116,7 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio)
>  
>  	update_lru_size(lruvec, lru, folio_zonenum(folio),
>  			folio_nr_pages(folio));
> +	/* This is not expected to be used on LRU_UNEVICTABLE */

Felt uneasy about this at first because it's just a _tail version of
lruvec_add_folio, and there's probably nothing fundamental preventing
users of the _tail variant from encountering unevictable pages. But if
the assumption is ever violated, the poisoned list head should make it
immediately obvious, so I guess that's fine.

>  	list_add_tail(&folio->lru, &lruvec->lists[lru]);
>  }
>  
> @@ -127,8 +129,11 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page,
>  static __always_inline
>  void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio)
>  {
> -	list_del(&folio->lru);
> -	update_lru_size(lruvec, folio_lru_list(folio), folio_zonenum(folio),
> +	enum lru_list lru = folio_lru_list(folio);
> +
> +	if (lru != LRU_UNEVICTABLE)
> +		list_del(&folio->lru);
> +	update_lru_size(lruvec, lru, folio_zonenum(folio),
>  			-folio_nr_pages(folio));
>  }
>
Vlastimil Babka Feb. 11, 2022, 6:07 p.m. UTC | #2
On 2/6/22 22:40, Hugh Dickins wrote:
> @@ -72,19 +91,40 @@ void mlock_page(struct page *page)
>   */
>  void munlock_page(struct page *page)
>  {
> +	struct lruvec *lruvec;
> +	int nr_pages = thp_nr_pages(page);
> +
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>  
> +	lock_page_memcg(page);

Hm this (and unlock_page_memcg() below) didn't catch my attention until I
saw that patch 10/13 removes it again. It also AFAICS wasn't present in the
code removed by patch 1. Am I missing something, or was it not necessary to
add it in the first place?

> +	lruvec = folio_lruvec_lock_irq(page_folio(page));
> +	if (PageLRU(page) && PageUnevictable(page)) {
> +		/* Then mlock_count is maintained, but might undercount */
> +		if (page->mlock_count)
> +			page->mlock_count--;
> +		if (page->mlock_count)
> +			goto out;
> +	}
> +	/* else assume that was the last mlock: reclaim will fix it if not */
> +
>  	if (TestClearPageMlocked(page)) {
> -		int nr_pages = thp_nr_pages(page);
> -
> -		mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
> -		if (!isolate_lru_page(page)) {
> -			putback_lru_page(page);
> -			count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
> -		} else if (PageUnevictable(page)) {
> -			count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> -		}
> +		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
> +		if (PageLRU(page) || !PageUnevictable(page))
> +			__count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
> +		else
> +			__count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> +	}
> +
> +	/* page_evictable() has to be checked *after* clearing Mlocked */
> +	if (PageLRU(page) && PageUnevictable(page) && page_evictable(page)) {
> +		del_page_from_lru_list(page, lruvec);
> +		ClearPageUnevictable(page);
> +		add_page_to_lru_list(page, lruvec);
> +		__count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages);
>  	}
> +out:
> +	unlock_page_lruvec_irq(lruvec);
> +	unlock_page_memcg(page);
>  }
>  
>  /*
Hugh Dickins Feb. 14, 2022, 5:42 a.m. UTC | #3
On Fri, 11 Feb 2022, Vlastimil Babka wrote:
> On 2/6/22 22:40, Hugh Dickins wrote:
> > @@ -115,6 +116,7 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio)
> >  
> >  	update_lru_size(lruvec, lru, folio_zonenum(folio),
> >  			folio_nr_pages(folio));
> > +	/* This is not expected to be used on LRU_UNEVICTABLE */
> 
> Felt uneasy about this at first because it's just a _tail version of
> lruvec_add_folio, and there's probably nothing fundamental preventing
> users of the _tail variant from encountering unevictable pages. But if
> the assumption is ever violated, the poisoned list head should make it
> immediately obvious, so I guess that's fine.

Yes, I could have made that one check against LRU_UNEVICTABLE too, but
thought we would rather see the crash on the poisoned list head: since
specifically choosing the tail of an unordered (and imaginary) list
raises questions - it might turn out to be best permitted, it might turn
out to require a rethink; but until there is a case, let's crash on it.

Hugh
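
The crash-on-use behaviour being relied on here can be sketched with a userspace model of the kernel's list poisoning. The poison values and the helper name are illustrative (the real LIST_POISON1/2 live in include/linux/poison.h); the point is that a deleted or never-initialized list head holds non-NULL, unmapped pointers, so any later list operation on it faults at once instead of silently corrupting memory.

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Kernel-style poison values, in the spirit of LIST_POISON1/2: non-NULL
 * so that bugs are not masked as "empty list", and pointing at unmapped
 * addresses so that any dereference faults immediately. */
#define LIST_POISON1 ((struct list_head *)0x100)
#define LIST_POISON2 ((struct list_head *)0x122)

/* Like the kernel's list_del(): unlink the entry, then poison both
 * pointers so further list operations on it crash. */
static void list_del_poison(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
```

This is what the lruvec_init() hunk in mm/mmzone.c achieves by calling list_del() on the LRU_UNEVICTABLE list head: the head itself becomes poison, so any accidental list_add()/list_del() against it crashes rather than threading pages whose lru fields now hold mlock_count.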
Hugh Dickins Feb. 14, 2022, 6:28 a.m. UTC | #4
On Fri, 11 Feb 2022, Vlastimil Babka wrote:
> On 2/6/22 22:40, Hugh Dickins wrote:
> > @@ -72,19 +91,40 @@ void mlock_page(struct page *page)
> >   */
> >  void munlock_page(struct page *page)
> >  {
> > +	struct lruvec *lruvec;
> > +	int nr_pages = thp_nr_pages(page);
> > +
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> >  
> > +	lock_page_memcg(page);
> 
> Hm this (and unlock_page_memcg() below) didn't catch my attention until I
> saw that patch 10/13 removes it again. It also AFAICS wasn't present in the
> code removed by patch 1. Am I missing something, or was it not necessary to
> add it in the first place?

Something is needed to stabilize page->memcg, whoops I'm way out of date,
folio->memcg_data, before trying to get the lock on the relevant lruvec.

In commit_charge(), Johannes gives us a choice between four tools:
	 * - the page lock
	 * - LRU isolation
	 * - lock_page_memcg()
	 * - exclusive reference

The original code was using TestClearPageLRU inside isolate_lru_page()
to do it (also happened to have the page lock, but one tool is enough).

But I chose to use lock_page_memcg() at this stage, because we want to
do the TestClearPageMlocked part of the business even when !PageLRU;
and I don't entirely love the TestClearPageLRU method, since one will
fail if two try it concurrently.
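
The "one will fail if two try it concurrently" property of the TestClearPageLRU method can be sketched with C11 atomics. This is a userspace model with an illustrative bit number, not the kernel's page-flags code: the essential semantics are atomically clearing the bit while reporting its previous value, so of two racing contexts exactly one sees "true" and wins exclusive ownership of the page's LRU state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number standing in for PG_lru in page->flags. */
#define PG_LRU_BIT 0

/*
 * Sketch of TestClearPageLRU() semantics: atomically clear the bit and
 * report whether it was previously set.  The atomic fetch-and makes the
 * read-modify-write indivisible, so concurrent callers cannot both win.
 */
static bool test_clear_lru(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_and(flags, ~(1UL << PG_LRU_BIT));
	return (old >> PG_LRU_BIT) & 1;
}
```

This also illustrates the drawback Hugh mentions: the loser of the race gets "false" and must back off entirely, whereas lock_page_memcg() lets both callers proceed in turn.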

Later, when doing the pagevec implementation, it seemed to become
more natural to use the TestClearPageLRU method - because that's how
pagevec_lru_move_fn() does it, or did I have a stronger reason for
making a different choice at that stage?  Maybe: I might have been
trying to keep the different functions as similar as possible.

But really we have too many ways to do that same thing, and my
choices may have been arbitrary, according to mood.  (When Alex Shi
popularized the TestClearPageLRU method, I did devise a patch which
used the lock_page_memcg() method throughout instead; but it was not
obviously better, so I didn't waste anyone else's time with it.)

I'm afraid that looking here again has led me to wonder whether
__munlock_page() in the final (10/13 pagevec) implementation is correct
to be using __operations in its !isolated case.  But I'll have to come
back and think about that another time, must push forward tonight.

Hugh

> 
> > +	lruvec = folio_lruvec_lock_irq(page_folio(page));
> > +	if (PageLRU(page) && PageUnevictable(page)) {
> > +		/* Then mlock_count is maintained, but might undercount */
> > +		if (page->mlock_count)
> > +			page->mlock_count--;
> > +		if (page->mlock_count)
> > +			goto out;
> > +	}
> > +	/* else assume that was the last mlock: reclaim will fix it if not */
> > +
> >  	if (TestClearPageMlocked(page)) {
> > -		int nr_pages = thp_nr_pages(page);
> > -
> > -		mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
> > -		if (!isolate_lru_page(page)) {
> > -			putback_lru_page(page);
> > -			count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
> > -		} else if (PageUnevictable(page)) {
> > -			count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> > -		}
> > +		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
> > +		if (PageLRU(page) || !PageUnevictable(page))
> > +			__count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
> > +		else
> > +			__count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> > +	}
> > +
> > +	/* page_evictable() has to be checked *after* clearing Mlocked */
> > +	if (PageLRU(page) && PageUnevictable(page) && page_evictable(page)) {
> > +		del_page_from_lru_list(page, lruvec);
> > +		ClearPageUnevictable(page);
> > +		add_page_to_lru_list(page, lruvec);
> > +		__count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages);
> >  	}
> > +out:
> > +	unlock_page_lruvec_irq(lruvec);
> > +	unlock_page_memcg(page);
> >  }
> >  
> >  /*

Patch

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index b725839dfe71..884d6f6af05b 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -99,7 +99,8 @@  void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio)
 
 	update_lru_size(lruvec, lru, folio_zonenum(folio),
 			folio_nr_pages(folio));
-	list_add(&folio->lru, &lruvec->lists[lru]);
+	if (lru != LRU_UNEVICTABLE)
+		list_add(&folio->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void add_page_to_lru_list(struct page *page,
@@ -115,6 +116,7 @@  void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio)
 
 	update_lru_size(lruvec, lru, folio_zonenum(folio),
 			folio_nr_pages(folio));
+	/* This is not expected to be used on LRU_UNEVICTABLE */
 	list_add_tail(&folio->lru, &lruvec->lists[lru]);
 }
 
@@ -127,8 +129,11 @@  static __always_inline void add_page_to_lru_list_tail(struct page *page,
 static __always_inline
 void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio)
 {
-	list_del(&folio->lru);
-	update_lru_size(lruvec, folio_lru_list(folio), folio_zonenum(folio),
+	enum lru_list lru = folio_lru_list(folio);
+
+	if (lru != LRU_UNEVICTABLE)
+		list_del(&folio->lru);
+	update_lru_size(lruvec, lru, folio_zonenum(folio),
 			-folio_nr_pages(folio));
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5140e5feb486..475bdb282769 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -85,7 +85,16 @@  struct page {
 			 * lruvec->lru_lock.  Sometimes used as a generic list
 			 * by the page owner.
 			 */
-			struct list_head lru;
+			union {
+				struct list_head lru;
+				/* Or, for the Unevictable "LRU list" slot */
+				struct {
+					/* Always even, to negate PageTail */
+					void *__filler;
+					/* Count page's or folio's mlocks */
+					unsigned int mlock_count;
+				};
+			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
 			pgoff_t index;		/* Our offset within mapping. */
@@ -241,7 +250,13 @@  struct folio {
 		struct {
 	/* public: */
 			unsigned long flags;
-			struct list_head lru;
+			union {
+				struct list_head lru;
+				struct {
+					void *__filler;
+					unsigned int mlock_count;
+				};
+			};
 			struct address_space *mapping;
 			pgoff_t index;
 			void *private;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d6477f48a27e..9afca0122723 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2300,8 +2300,11 @@  static void lru_add_page_tail(struct page *head, struct page *tail,
 	} else {
 		/* head is still on lru (and we have it frozen) */
 		VM_WARN_ON(!PageLRU(head));
+		if (PageUnevictable(tail))
+			tail->mlock_count = 0;
+		else
+			list_add_tail(&tail->lru, &head->lru);
 		SetPageLRU(tail);
-		list_add_tail(&tail->lru, &head->lru);
 	}
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09d342c7cbd0..b10590926177 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1257,8 +1257,7 @@  struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
  * @nr_pages: positive when adding or negative when removing
  *
  * This function must be called under lru_lock, just before a page is added
- * to or just after a page is removed from an lru list (that ordering being
- * so as to allow it to check that lru_size 0 is consistent with list_empty).
+ * to or just after a page is removed from an lru list.
  */
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 				int zid, int nr_pages)
diff --git a/mm/mlock.c b/mm/mlock.c
index db936288b8a0..0d3ae04b1f4e 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -54,16 +54,35 @@  EXPORT_SYMBOL(can_do_mlock);
  */
 void mlock_page(struct page *page)
 {
+	struct lruvec *lruvec;
+	int nr_pages = thp_nr_pages(page);
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
 	if (!TestSetPageMlocked(page)) {
-		int nr_pages = thp_nr_pages(page);
-
 		mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
-		count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
-		if (!isolate_lru_page(page))
-			putback_lru_page(page);
+		__count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
+	}
+
+	/* There is nothing more we can do while it's off LRU */
+	if (!TestClearPageLRU(page))
+		return;
+
+	lruvec = folio_lruvec_lock_irq(page_folio(page));
+	if (PageUnevictable(page)) {
+		page->mlock_count++;
+		goto out;
 	}
+
+	del_page_from_lru_list(page, lruvec);
+	ClearPageActive(page);
+	SetPageUnevictable(page);
+	page->mlock_count = 1;
+	add_page_to_lru_list(page, lruvec);
+	__count_vm_events(UNEVICTABLE_PGCULLED, nr_pages);
+out:
+	SetPageLRU(page);
+	unlock_page_lruvec_irq(lruvec);
 }
 
 /**
@@ -72,19 +91,40 @@  void mlock_page(struct page *page)
  */
 void munlock_page(struct page *page)
 {
+	struct lruvec *lruvec;
+	int nr_pages = thp_nr_pages(page);
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
+	lock_page_memcg(page);
+	lruvec = folio_lruvec_lock_irq(page_folio(page));
+	if (PageLRU(page) && PageUnevictable(page)) {
+		/* Then mlock_count is maintained, but might undercount */
+		if (page->mlock_count)
+			page->mlock_count--;
+		if (page->mlock_count)
+			goto out;
+	}
+	/* else assume that was the last mlock: reclaim will fix it if not */
+
 	if (TestClearPageMlocked(page)) {
-		int nr_pages = thp_nr_pages(page);
-
-		mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
-		if (!isolate_lru_page(page)) {
-			putback_lru_page(page);
-			count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
-		} else if (PageUnevictable(page)) {
-			count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
-		}
+		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
+		if (PageLRU(page) || !PageUnevictable(page))
+			__count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
+		else
+			__count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
+	}
+
+	/* page_evictable() has to be checked *after* clearing Mlocked */
+	if (PageLRU(page) && PageUnevictable(page) && page_evictable(page)) {
+		del_page_from_lru_list(page, lruvec);
+		ClearPageUnevictable(page);
+		add_page_to_lru_list(page, lruvec);
+		__count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages);
 	}
+out:
+	unlock_page_lruvec_irq(lruvec);
+	unlock_page_memcg(page);
 }
 
 /*
diff --git a/mm/mmzone.c b/mm/mmzone.c
index eb89d6e018e2..40e1d9428300 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -81,6 +81,13 @@  void lruvec_init(struct lruvec *lruvec)
 
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->lists[lru]);
+	/*
+	 * The "Unevictable LRU" is imaginary: though its size is maintained,
+	 * it is never scanned, and unevictable pages are not threaded on it
+	 * (so that their lru fields can be reused to hold mlock_count).
+	 * Poison its list head, so that any operations on it would crash.
+	 */
+	list_del(&lruvec->lists[LRU_UNEVICTABLE]);
 }
 
 #if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS)
diff --git a/mm/swap.c b/mm/swap.c
index ff4810e4a4bc..682a03301a2c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1062,6 +1062,7 @@  static void __pagevec_lru_add_fn(struct folio *folio, struct lruvec *lruvec)
 	} else {
 		folio_clear_active(folio);
 		folio_set_unevictable(folio);
+		folio->mlock_count = !!folio_test_mlocked(folio);
 		if (!was_unevictable)
 			__count_vm_events(UNEVICTABLE_PGCULLED, nr_pages);
 	}