
[113/131] mm: balance LRU lists based on relative thrashing

Message ID: 20200603230303.kSkT62Lb5%akpm@linux-foundation.org
State: New, archived
Series: [001/131] mm/slub: fix a memory leak in sysfs_slab_add()

Commit Message

Andrew Morton June 3, 2020, 11:03 p.m. UTC
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: balance LRU lists based on relative thrashing

Since the LRUs were split into anon and file lists, the VM has been
balancing between page cache and anonymous pages based on per-list ratios
of scanned vs.  rotated pages.  In most cases that tips page reclaim
towards the list that is easier to reclaim and has the fewest actively
used pages, but there are a few problems with it:

1. Refaults and LRU rotations are weighted the same way, even though
   one costs IO and the other costs a bit of CPU.

2. The less we scan an LRU list based on already observed rotations,
   the more we increase the sampling interval for new references, and
   rotations become even more likely on that list. This can enter a
   death spiral in which we stop looking at one list completely until
   the other one is all but annihilated by page reclaim.

Since commit a528910e12ec ("mm: thrash detection-based file cache sizing")
we have refault detection for the page cache.  Along with swapin events,
they are good indicators of when the file or anon list, respectively, is
too small for its workingset and needs to grow.

For example, if the page cache is thrashing, the cache pages need more
time in memory, while there may be colder pages on the anonymous list. 
Likewise, if swapped pages are faulting back in, it indicates that we are
reclaiming anonymous pages too aggressively and should back off.

Replace LRU rotations with refaults and swapins as the basis for the
relative reclaim cost of the two LRUs.  This will have the VM target the
list balance that incurs the least amount of IO in aggregate.
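
To make the resulting math concrete, here is a simplified sketch of how the
scan pressure for the two lists is derived from the noted cost points and
swappiness.  balance_pressure() is only an illustrative helper; the real
logic lives inline in get_scan_count(), see the hunk below:

	/*
	 * Illustrative sketch only.  Pressure on each list is inversely
	 * proportional to its recorded reclaim cost, weighted by
	 * swappiness (0..200).
	 */
	static void balance_pressure(unsigned long anon_cost, unsigned long file_cost,
				     int swappiness, unsigned long *ap, unsigned long *fp)
	{
		unsigned long totalcost = anon_cost + file_cost;
		unsigned long anon_prio = swappiness;
		unsigned long file_prio = 200 - anon_prio;

		/* Costs are fed by lru_note_cost() on refaults and swapins. */
		*ap = anon_prio * (totalcost + 1) / (anon_cost + 1);
		*fp = file_prio * (totalcost + 1) / (file_cost + 1);
	}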

Link: http://lkml.kernel.org/r/20200520232525.798933-12-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/swap.h |    3 +--
 mm/swap.c            |   11 +++++++----
 mm/swap_state.c      |    5 +++++
 mm/vmscan.c          |   39 ++++++++++-----------------------------
 mm/workingset.c      |    4 ++++
 5 files changed, 27 insertions(+), 35 deletions(-)

Comments

Alex Shi June 9, 2020, 9:15 a.m. UTC | #1
On 2020/6/4 7:03 AM, Andrew Morton wrote:
>  
> +	/* XXX: Move to lru_cache_add() when it supports new vs putback */

Hi Hannes,

Sorry, I'm a bit lost here; could you explain your idea a bit more?

> +	spin_lock_irq(&page_pgdat(page)->lru_lock);
> +	lru_note_cost(page);
> +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
> +


What could go wrong here without the lru_lock?

Thanks
Alex
Johannes Weiner June 9, 2020, 2:45 p.m. UTC | #2
On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
> 
> 
> On 2020/6/4 7:03 AM, Andrew Morton wrote:
> >  
> > +	/* XXX: Move to lru_cache_add() when it supports new vs putback */
> 
> Hi Hannes,
> 
> Sorry, I'm a bit lost here; could you explain your idea a bit more?
> 
> > +	spin_lock_irq(&page_pgdat(page)->lru_lock);
> > +	lru_note_cost(page);
> > +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
> > +
> 
> 
> What could go wrong here without the lru_lock?

It'll just be part of the existing LRU locking in
pagevec_lru_move_fn(), when the new pages are added to the LRU in
batch. See this older patch for example:

https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/

I didn't include it in this series to reduce conflict with Joonsoo's
WIP series that also operates in this area and does something similar:

https://lkml.org/lkml/2020/4/3/63
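
Roughly, and purely as a sketch, the cost noting would then ride on the lock
that is already held for the whole pagevec; the "new page" flag below is
hypothetical plumbing that this series does not add, and the function is
stripped down from the real __pagevec_lru_add_fn():

	/*
	 * Hypothetical sketch: lru_lock is already held for the whole
	 * pagevec here, so no extra locking is needed for the cost note.
	 * (Simplified; the real function does more bookkeeping.)
	 */
	static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
					 void *arg)
	{
		bool newpage = arg != NULL;	/* hypothetical: set by lru_cache_add() */

		add_page_to_lru_list(page, lruvec, page_lru(page));
		if (newpage)			/* new pages only, not putback */
			lru_note_cost(page);
	}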
Joonsoo Kim June 10, 2020, 5:23 a.m. UTC | #3
On Tue, Jun 9, 2020 at 11:46 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
> >
> >
> > On 2020/6/4 7:03 AM, Andrew Morton wrote:
> > >
> > > +   /* XXX: Move to lru_cache_add() when it supports new vs putback */
> >
> > Hi Hannes,
> >
> > Sorry, I'm a bit lost here; could you explain your idea a bit more?
> >
> > > +   spin_lock_irq(&page_pgdat(page)->lru_lock);
> > > +   lru_note_cost(page);
> > > +   spin_unlock_irq(&page_pgdat(page)->lru_lock);
> > > +
> >
> >
> > What could go wrong here without the lru_lock?
>
> It'll just be part of the existing LRU locking in
> pagevec_lru_move_fn(), when the new pages are added to the LRU in
> batch. See this older patch for example:
>
> https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/
>
> I didn't include it in this series to reduce conflict with Joonsoo's
> WIP series that also operates in this area and does something similar:

Thanks!

> https://lkml.org/lkml/2020/4/3/63

I haven't completed the rebase of my series but I guess that referenced patch
"https://lkml.org/lkml/2020/4/3/63" would be removed in the next version.

Before the I/O cost model, a new anonymous page contributed to the LRU reclaim
balance. But now a new anonymous page doesn't contribute to the I/O cost,
so this adjusting patch would not be needed anymore.

If anyone wants to change this part,
"/* XXX: Move to lru_cache_add() when it supports new vs putback */", feel free
to do it.

Thanks.
Alex Shi June 11, 2020, 3:28 a.m. UTC | #4
On 2020/6/10 1:23 PM, Joonsoo Kim wrote:
> On Tue, Jun 9, 2020 at 11:46 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
>>>
>>>
> > > On 2020/6/4 7:03 AM, Andrew Morton wrote:
>>>>
>>>> +   /* XXX: Move to lru_cache_add() when it supports new vs putback */
>>>
>>> Hi Hannes,
>>>
>>> Sorry, I'm a bit lost here; could you explain your idea a bit more?
>>>
>>>> +   spin_lock_irq(&page_pgdat(page)->lru_lock);
>>>> +   lru_note_cost(page);
>>>> +   spin_unlock_irq(&page_pgdat(page)->lru_lock);
>>>> +
>>>
>>>
>>> What could go wrong here without the lru_lock?

The reason I want to know what the lru_lock protects here is that we currently have
5 LRU lists but only one lock guarding them, which causes a lot of contention when
different apps are active on a server.

I guess we originally had only one lru_lock because 5 locks would cause cacheline
bouncing if we packed them together, or waste a bit of cacheline space if we gave
each its own cacheline. But now that we have qspinlock, each CPU just spins on its
own cacheline without interfering with the others, which would greatly relieve the
performance drop from cacheline bouncing.

And we could use bits in page->mapping to store which LRU list the page is on.
As a quick thought, I guess that besides the 5 locks for the 5 lists, we would still
need 1 more lock for the common lruvec data, or for other things that rely on
lru_lock now, like mlock and hpage_nr_pages().
That's the reason I want to know everything that runs under lru_lock. :)
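
As a rough sketch of the layout I have in mind (field names here are just
placeholders, not a concrete proposal):

	/* Rough sketch only -- placeholder names, not real code. */
	struct pglist_data {
		/* ... existing fields ... */
		spinlock_t	lru_locks[NR_LRU_LISTS];	/* one lock per LRU list */
		spinlock_t	lruvec_lock;			/* common lruvec data, mlock, etc. */
	};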

Any comments for this idea? :)

Thanks
Alex


>>
>> It'll just be part of the existing LRU locking in
>> pagevec_lru_move_fn(), when the new pages are added to the LRU in
>> batch. See this older patch for example:
>>
>> https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/
>>
>> I didn't include it in this series to reduce conflict with Joonsoo's
>> WIP series that also operates in this area and does something similar:
> 
> Thanks!
> 
>> https://lkml.org/lkml/2020/4/3/63
> 
> I haven't completed the rebase of my series but I guess that referenced patch
> "https://lkml.org/lkml/2020/4/3/63" would be removed in the next version.

Thanks a lot for the info, Johannes & Joonsoo! A long history for an interesting idea. :)

> 
> Before the I/O cost model, a new anonymous page contributed to the LRU reclaim
> balance. But now a new anonymous page doesn't contribute to the I/O cost,
> so this adjusting patch would not be needed anymore.
> 
> If anyone wants to change this part,
> "/* XXX: Move to lru_cache_add() when it supports new vs putback */", feel free
> to do it.

Patch

--- a/include/linux/swap.h~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/include/linux/swap.h
@@ -334,8 +334,7 @@  extern unsigned long nr_free_pagecache_p
 
 
 /* linux/mm/swap.c */
-extern void lru_note_cost(struct lruvec *lruvec, bool file,
-			  unsigned int nr_pages);
+extern void lru_note_cost(struct page *);
 extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
--- a/mm/swap.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/swap.c
@@ -278,12 +278,15 @@  void rotate_reclaimable_page(struct page
 	}
 }
 
-void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
+void lru_note_cost(struct page *page)
 {
-	if (file)
-		lruvec->file_cost += nr_pages;
+	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+
+	/* Record new data point */
+	if (page_is_file_lru(page))
+		lruvec->file_cost++;
 	else
-		lruvec->anon_cost += nr_pages;
+		lruvec->anon_cost++;
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
--- a/mm/swap_state.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/swap_state.c
@@ -440,6 +440,11 @@  struct page *__read_swap_cache_async(swp
 		goto fail_unlock;
 	}
 
+	/* XXX: Move to lru_cache_add() when it supports new vs putback */
+	spin_lock_irq(&page_pgdat(page)->lru_lock);
+	lru_note_cost(page);
+	spin_unlock_irq(&page_pgdat(page)->lru_lock);
+
 	/* Caller will initiate read into locked page */
 	SetPageWorkingset(page);
 	lru_cache_add(page);
--- a/mm/vmscan.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/vmscan.c
@@ -1958,12 +1958,6 @@  shrink_inactive_list(unsigned long nr_to
 	move_pages_to_lru(lruvec, &page_list);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-	/*
-	 * Rotating pages costs CPU without actually
-	 * progressing toward the reclaim goal.
-	 */
-	lru_note_cost(lruvec, 0, stat.nr_activate[0]);
-	lru_note_cost(lruvec, 1, stat.nr_activate[1]);
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
@@ -2079,11 +2073,6 @@  static void shrink_active_list(unsigned
 	 * Move pages back to the lru list.
 	 */
 	spin_lock_irq(&pgdat->lru_lock);
-	/*
-	 * Rotating pages costs CPU without actually
-	 * progressing toward the reclaim goal.
-	 */
-	lru_note_cost(lruvec, file, nr_rotated);
 
 	nr_activate = move_pages_to_lru(lruvec, &l_active);
 	nr_deactivate = move_pages_to_lru(lruvec, &l_inactive);
@@ -2298,22 +2287,23 @@  static void get_scan_count(struct lruvec
 	scan_balance = SCAN_FRACT;
 
 	/*
-	 * With swappiness at 100, anonymous and file have the same priority.
-	 * This scanning priority is essentially the inverse of IO cost.
+	 * Calculate the pressure balance between anon and file pages.
+	 *
+	 * The amount of pressure we put on each LRU is inversely
+	 * proportional to the cost of reclaiming each list, as
+	 * determined by the share of pages that are refaulting, times
+	 * the relative IO cost of bringing back a swapped out
+	 * anonymous page vs reloading a filesystem page (swappiness).
+	 *
+	 * With swappiness at 100, anon and file have equal IO cost.
 	 */
 	anon_prio = swappiness;
 	file_prio = 200 - anon_prio;
 
 	/*
-	 * OK, so we have swap space and a fair amount of page cache
-	 * pages.  We use the recently rotated / recently scanned
-	 * ratios to determine how valuable each cache is.
-	 *
 	 * Because workloads change over time (and to avoid overflow)
 	 * we keep these statistics as a floating average, which ends
-	 * up weighing recent references more than old ones.
-	 *
-	 * anon in [0], file in [1]
+	 * up weighing recent refaults more than old ones.
 	 */
 
 	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
@@ -2328,15 +2318,6 @@  static void get_scan_count(struct lruvec
 		lruvec->file_cost /= 2;
 		totalcost /= 2;
 	}
-
-	/*
-	 * The amount of pressure on anon vs file pages is inversely
-	 * proportional to the assumed cost of reclaiming each list,
-	 * as determined by the share of pages that are likely going
-	 * to refault or rotate on each list (recently referenced),
-	 * times the relative IO cost of bringing back a swapped out
-	 * anonymous page vs reloading a filesystem page (swappiness).
-	 */
 	ap = anon_prio * (totalcost + 1);
 	ap /= lruvec->anon_cost + 1;
 
--- a/mm/workingset.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/workingset.c
@@ -365,6 +365,10 @@  void workingset_refault(struct page *pag
 	/* Page was active prior to eviction */
 	if (workingset) {
 		SetPageWorkingset(page);
+		/* XXX: Move to lru_cache_add() when it supports new vs putback */
+		spin_lock_irq(&page_pgdat(page)->lru_lock);
+		lru_note_cost(page);
+		spin_unlock_irq(&page_pgdat(page)->lru_lock);
 		inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
 	}
 out: