
[v2,3/3] mm, lru_gen: try to prefetch next page when canning LRU

Message ID 20240111183321.19984-4-ryncsn@gmail.com (mailing list archive)
State New
Series mm, lru_gen: batch update pages when aging

Commit Message

Kairui Song Jan. 11, 2024, 6:33 p.m. UTC
From: Kairui Song <kasong@tencent.com>

Prefetching for the inactive/active LRU has long existed; apply the same
optimization to MGLRU.
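
The pattern can be illustrated outside the kernel with a minimal userspace
sketch. Everything here is hypothetical and only mirrors the shape of the
change: struct item stands in for a folio, scan_from_tail() for the scan
loop, and prefetch_for_write() wraps the GCC/Clang builtin
__builtin_prefetch(addr, 1) as a substitute for the kernel's prefetchw().
While the current entry is being updated, the entry that will be visited
next is hinted into the cache for writing, unless the current entry is
already the first one on the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only what the sketch needs. */
struct item {
	struct item *prev, *next;	/* circular doubly linked list links */
	unsigned long flags;		/* field the scan is about to modify */
};

/* Userspace substitute for prefetchw(): hint a prefetch for write. */
static inline void prefetch_for_write(const void *addr)
{
#if defined(__GNUC__)
	__builtin_prefetch(addr, 1);
#else
	(void)addr;			/* no-op where the builtin is missing */
#endif
}

/* Make head an empty circular list (head acts as a sentinel). */
static void list_init(struct item *head)
{
	head->prev = head;
	head->next = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct item *node, struct item *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Walk from the tail toward the head, setting bit in every entry.
 * Before touching the current entry, prefetch the flags word of the
 * entry that will be visited next. Returns the number of entries seen.
 */
static size_t scan_from_tail(struct item *head, unsigned long bit)
{
	size_t scanned = 0;
	struct item *cur;

	assert(head != NULL);

	for (cur = head->prev; cur != head; ) {
		struct item *prev = cur->prev;

		/* Skip the hint when cur is the first entry on the list. */
		if (prev != head)
			prefetch_for_write(&prev->flags);

		cur->flags |= bit;
		scanned++;
		cur = prev;
	}
	return scanned;
}
```

Calling scan_from_tail(&head, 1UL) on a list built with list_init() and
list_add_tail() visits every entry once. The prefetch is only a hint, so
the result is the same with or without it; only timing may differ.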

Ramdisk-based swap test in a 4G memcg on an EPYC 7K62 with:

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6

Average result of 18 test runs:

Before:           44017.78 Ops/sec
After patch 1-3:  44890.50 Ops/sec (+1.8%)

Ramdisk fio test in a 4G memcg on an EPYC 7K62 with:

  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=5m --group_reporting

Before this patch:
bw (  MiB/s): min= 7644, max= 9293, per=100.00%, avg=8777.77, stdev=16.59, samples=9568
iops        : min=1956954, max=2379053, avg=2247108.51, stdev=4247.22, samples=9568

After this patch (+7.5%):
bw (  MiB/s): min= 8462, max= 9902, per=100.00%, avg=9444.77, stdev=16.43, samples=9568
iops        : min=2166433, max=2535135, avg=2417858.23, stdev=4205.15, samples=9568

The benefit of prefetching depends heavily on timing and architecture, so it
may only help in certain cases. An additional test showed at least no
regression for the series:

Ramdisk memtier test as above, in an 8G memcg on an Intel i7-9700:

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys --key-minimum=1 \
    --key-maximum=36000000 --key-pattern=P:P -c 1 -t 12 \
    --ratio 1:0 --pipeline 8 -d 1024 -x 4

Average result of 12 test runs:

Before:           61241.96 Ops/sec
After patch 1-3:  61268.53 Ops/sec (+0.0%)

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/vmscan.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Kairui Song Jan. 11, 2024, 6:35 p.m. UTC | #1
Kairui Song <ryncsn@gmail.com> wrote on Fri, Jan 12, 2024 at 02:33:
>
> From: Kairui Song <kasong@tencent.com>
>
> Prefetching for the inactive/active LRU has long existed; apply the same
> optimization to MGLRU.
>
> Ramdisk-based swap test in a 4G memcg on an EPYC 7K62 with:
>

Hi, my apologies. I realized right after sending the patch that I forgot
to fix the typo in the title. It should be:
mm, lru_gen: try to prefetch next page when scanning LRU

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 57b6549946c3..4ef83db40adb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3773,10 +3773,12 @@  static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			else
+			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			new_gen = folio_inc_gen(lruvec, folio, false, &batch);
 			lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
@@ -4452,10 +4454,12 @@  static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
 			scanned += delta;
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			else
+			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
 				sorted += delta;