[v3,0/2] improving dynamic zswap shrinker protection scheme

Message ID 20240805232243.2896283-1-nphamcs@gmail.com (mailing list archive)

Nhat Pham Aug. 5, 2024, 11:22 p.m. UTC
v3: No (intended) functional change
  * Small cleanups, renamings, etc. (suggested by Yosry Ahmed)
v2:
  * Add more details in comments, patch changelog, documentation, etc.
    about the second chance scheme and its ability to modulate the
    writeback rate (patch 1) (suggested by Yosry Ahmed).
  * Move the referenced bit (patch 1) (suggested by Yosry Ahmed).

When experimenting with the memory-pressure-based (i.e., "dynamic") zswap
shrinker in production, we observed a sharp increase in the number of
swapins, which led to a performance regression. We were able to trace this
regression to the following problems with the shrinker's warm page
protection scheme:

1. The protection decays far too rapidly, and the decay is coupled with
   zswap stores, leading to anomalous patterns in which a small batch of
   zswap stores effectively erases all the protection in place for the
   warmer pages in the zswap LRU.

   This observation has also been corroborated upstream by Takero Funaki
   (in [1]).

2. We inaccurately track the number of swapped in pages, missing the
   non-pivot pages that are part of the readahead window, while counting
   the pages that are found in the zswap pool.
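
A rough illustration of the miscounting in problem 2, as a standalone sketch
rather than kernel code. The struct and both helpers below are hypothetical
names made up for this example; they only model a readahead window in which
some pages are served from zswap and others are read from disk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page in a swap readahead window. */
struct ra_page {
	bool in_zswap;	/* true: served from the zswap pool; false: read from disk */
};

/*
 * Accounting as described in problem 2: one swapin per fault, charged
 * for the pivot page only, whether or not that page was found in zswap.
 */
static size_t count_pivot_only(const struct ra_page *win, size_t n, size_t pivot)
{
	(void)win;	/* the window contents are never consulted */
	return pivot < n ? 1 : 0;
}

/*
 * Accounting at the point of the disk read: every page in the window
 * that is not in zswap counts, pivot or not.
 */
static size_t count_disk_reads(const struct ra_page *win, size_t n)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++)
		if (!win[i].in_zswap)
			nr++;
	return nr;
}
```

For a four-page window whose pivot is found in zswap while two of the other
pages come from disk, the first helper reports one swapin and the second
reports two, so the pivot-only count is off in both directions at once.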


To alleviate these two issues, this patch series improves the dynamic zswap
shrinker in the following manner:

1. Replace the protection size tracking scheme with a second chance
   algorithm. This new scheme removes the need for haphazard stats
   decaying, and automatically adjusts the pace of page aging with memory
   pressure, and the writeback rate with pool activity: slowing down when
   the pool is dominated by zswpouts, and speeding up when the pool is
   dominated by stale entries.

2. Fix the tracking of the number of swapins to take into account
   non-pivot pages in the readahead window.
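
For readers unfamiliar with second chance algorithms, here is a minimal
standalone sketch of the idea behind change 1. It is not the code in patch 1:
the struct, field names and helpers are hypothetical, the LRU is modeled as a
plain array (index 0 is the coldest entry), and a real implementation would
typically rotate a skipped entry within the LRU rather than leave it in place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zswap entry; not the kernel's struct. */
struct entry {
	bool referenced;	/* set on store, cleared on the shrinker's first visit */
	bool written_back;	/* set once the entry is evicted to backing swap */
};

/* A store marks the entry as recently used. */
static void entry_store(struct entry *e)
{
	e->referenced = true;
	e->written_back = false;
}

/*
 * One shrinker pass over up to nr_to_scan entries. An entry whose
 * referenced bit is set gets a second chance: the bit is cleared and the
 * entry is skipped. An entry found without the bit is written back.
 * Returns the number of entries written back in this pass.
 */
static size_t shrink_pass(struct entry *lru, size_t n, size_t nr_to_scan)
{
	size_t written = 0;

	for (size_t i = 0; i < n && i < nr_to_scan; i++) {
		if (lru[i].written_back)
			continue;
		if (lru[i].referenced) {
			lru[i].referenced = false;	/* second chance */
			continue;
		}
		lru[i].written_back = true;
		written++;
	}
	return written;
}
```

In this model a freshly stored entry always survives the first pass that
visits it and is only written back on a later pass. A pool full of recent
stores therefore yields little writeback per pass, while stale entries are
evicted as soon as they are scanned, with no separate decay counter involved.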

With these two changes in place, in a kernel-building benchmark without
any cold data added, the number of swapins is reduced by 64.12%. This
translates to a 10.32% reduction in build time. We also observe a 3%
reduction in kernel CPU time.

In another benchmark, with cold data added (to gauge the new algorithm's
ability to offload cold data), the new second chance scheme outperforms
the old protection scheme by around 0.7%, and actually writes back around
21% more pages to the backing swap device. So the new scheme is just as
good as, if not better than, the old scheme on this front as well.

[1]: https://lore.kernel.org/linux-mm/CAPpodddcGsK=0Xczfuk8usgZ47xeyf4ZjiofdT+ujiyz6V2pFQ@mail.gmail.com/

Nhat Pham (2):
  zswap: implement a second chance algorithm for dynamic zswap shrinker
  zswap: track swapins from disk more accurately

 include/linux/zswap.h |  16 +++----
 mm/page_io.c          |  11 ++++-
 mm/swap_state.c       |   8 +---
 mm/zswap.c            | 108 ++++++++++++++++++++++++------------------
 4 files changed, 82 insertions(+), 61 deletions(-)


base-commit: cca1345bd26a67fc61a92ff0c6d81766c259e522