[1/2] mm: zswap: increase shrinking protection for zswap swapins only

Message ID 20240320020823.337644-1-yosryahmed@google.com (mailing list archive)
State New
Series [1/2] mm: zswap: increase shrinking protection for zswap swapins only

Commit Message

Yosry Ahmed March 20, 2024, 2:08 a.m. UTC
Currently, the number of protected zswap entries corresponding to an
lruvec is incremented every time we swap in a page. This happens
regardless of whether or not the page originated in zswap. Hence,
swapins from disk will lead to increasing protection on potentially
stale zswap entries. Furthermore, the increased shrinking protection can
lead to more pages skipping zswap and going to disk, eventually leading
to even more swapins from disk and starting a vicious circle.

Instead, only increase the protection when pages are loaded from zswap.
This also has a nice side effect of removing zswap_folio_swapin() and
replacing it with a static helper that is only called from zswap_load().

No problems were observed in practice; this was found through code
inspection.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/zswap.h |  2 --
 mm/swap_state.c       |  8 ++------
 mm/zswap.c            | 10 +++-------
 3 files changed, 5 insertions(+), 15 deletions(-)
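
In short, the patch moves the protection bump out of the generic swapin
readahead path and into zswap_load(). Condensed from the full diff at the
bottom of this page (not the literal hunks):

/* Before: every readahead swapin bumped the lruvec's protection. */
if (unlikely(page_allocated)) {
	zswap_folio_swapin(folio);	/* nr_zswap_protected++ */
	swap_read_folio(folio, false, NULL);
}

/* After: only a successful zswap load bumps it, via a new static helper. */
static void zswap_lruvec_inc_protected(struct lruvec *lruvec)
{
	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
}

/* ...called at the end of zswap_load(), after folio_mark_dirty(folio): */
zswap_lruvec_inc_protected(folio_lruvec(folio));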

Comments

Johannes Weiner March 20, 2024, 9:50 a.m. UTC | #1
On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote:
> Currently, the number of protected zswap entries corresponding to an
> lruvec are incremented every time we swapin a page.

Correct. This is the primary signal that the shrinker is being too
aggressive in moving entries to disk and should slow down...?

> This happens regardless of whether or not the page originated in
> zswap. Hence, swapins from disk will lead to increasing protection
> on potentially stale zswap entries. Furthermore, the increased
> shrinking protection can lead to more pages skipping zswap and going
> to disk, eventually leading to even more swapins from disk and
> starting a vicious circle.

How does shrinker protection affect zswap stores?

On the contrary, I would expect this patch to create a runaway
shrinker. The more aggressively it moves entries out to disk, the
lower the rate of zswap loads, the more aggressively it moves more
entries out to disk.

> Instead, only increase the protection when pages are loaded from zswap.
> This also has a nice side effect of removing zswap_folio_swapin() and
> replacing it with a static helper that is only called from zswap_load().
> 
> No problems were observed in practice, this was found through code
> inspection.

This is missing test results :)
Nhat Pham March 20, 2024, 2:48 p.m. UTC | #2
On Wed, Mar 20, 2024 at 2:51 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote:
> > Currently, the number of protected zswap entries corresponding to an
> > lruvec are incremented every time we swapin a page.
>
> Correct. This is the primary signal that the shrinker is being too
> aggressive in moving entries to disk and should slow down...?

Yup. Currently, there are two scenarios in which we increase zswap
protection area:

1. zswap lru movement - for instance, if a page is rotated in the LRU
for some reason. When this happens, we increment the protection
size, so that the page at the tail end of the protected area does not
lose its protection because of (potentially) spurious LRU churn.
2. swapin - when this happens, it is a signal that the shrinker is
being too... enthusiastic in its reclaiming action. We should be more
conservative, hence the increase in protection.

I think there's some confusion around this, because we use the
same-ish mechanism for two different events. Maybe I should have put
yet another fat comment at the callsites of zswap_folio_swapin() too
:)
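
For reference, the two paths look roughly like this. The
zswap_folio_swapin() body is taken from the diff below; the
zswap_lru_add() part is paraphrased and the exact decay formula is
omitted:

/*
 * 1. A new entry is added to the zswap LRU (zswap_lru_add()): the counter
 *    is incremented and then decayed against the current LRU size, so the
 *    protected region cannot grow without bound.
 *
 * 2. A page is swapped in (any swapin, before this patch):
 */
void zswap_folio_swapin(struct folio *folio)
{
	struct lruvec *lruvec;

	if (folio) {
		lruvec = folio_lruvec(folio);
		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
	}
}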

>
> > This happens regardless of whether or not the page originated in
> > zswap. Hence, swapins from disk will lead to increasing protection
> > on potentially stale zswap entries. Furthermore, the increased

Hmmm my original thinking was that, had we protected the swapped-in
page back when it was still in zswap, we would have prevented this IO.
And since the pages in zswap are (at least conceptually) "warmer" than
the swapped-in page, it is appropriate to increase the zswap
protection.

In fact, I was toying with the idea to max out the protection on
swap-in to temporarily cease all zswap shrinking, but that is perhaps
overkill :)

> > shrinking protection can lead to more pages skipping zswap and going

I think this can only happen when the protection is so strong that the
zswap pool is full, right?

In practice I have never seen this happen though. Did you observe it?
We have fairly aggressive lru-size-based decaying of the protection as
well, so that should not happen...

Also technically the protection only applies to the dynamic shrinker -
the capacity-driven shrinking is not affected (although it's not very
effective - perhaps we can rework this?)
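
Concretely, the counter only discounts what the dynamic shrinker reports
as freeable. A simplified sketch of zswap_shrinker_count() (identifiers
like zswap_list_lru are from memory and may not match the exact tree; the
writeback-disabled and same-filled bookkeeping is omitted):

static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
					  struct shrink_control *sc)
{
	struct lruvec *lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
	unsigned long nr_freeable, nr_protected;

	nr_protected =
		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
	nr_freeable = list_lru_shrink_count(&zswap_list_lru, sc);

	/* Protected entries are simply not reported as reclaimable. */
	return nr_freeable > nr_protected ? nr_freeable - nr_protected : 0;
}

The limit-driven path (zswap_store() hitting the pool limit and scheduling
shrink_worker()) does not consult nr_zswap_protected, which is what "the
capacity-driven shrinking is not affected" refers to.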

> > to disk, eventually leading to even more swapins from disk and
> > starting a vicious circle.
>
> How does shrinker protection affect zswap stores?
>
> On the contrary, I would expect this patch to create a runaway
> shrinker. The more aggressively it moves entries out to disk, the
> lower the rate of zswap loads, the more aggressively it moves more
> entries out to disk.
>
> > Instead, only increase the protection when pages are loaded from zswap.
> > This also has a nice side effect of removing zswap_folio_swapin() and
> > replacing it with a static helper that is only called from zswap_load().
> >
> > No problems were observed in practice, this was found through code
> > inspection.
>
> This is missing test results :)

Yeah it'd be nice to see benchmark numbers :)

I mean all this protection is heuristics anyway, so maybe I'm just
being overly conservative. But only numbers can show this.
Yosry Ahmed March 20, 2024, 7:28 p.m. UTC | #3
On Wed, Mar 20, 2024 at 05:50:53AM -0400, Johannes Weiner wrote:
> On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote:
> > Currently, the number of protected zswap entries corresponding to an
> > lruvec are incremented every time we swapin a page.
> 
> Correct. This is the primary signal that the shrinker is being too
> aggressive in moving entries to disk and should slow down...?

Right, I think that's where I was confused. See below :)

> 
> > This happens regardless of whether or not the page originated in
> > zswap. Hence, swapins from disk will lead to increasing protection
> > on potentially stale zswap entries. Furthermore, the increased
> > shrinking protection can lead to more pages skipping zswap and going
> > to disk, eventually leading to even more swapins from disk and
> > starting a vicious circle.
> 
> How does shrinker protection affect zswap stores?
> 
> On the contrary, I would expect this patch to create a runaway
> shrinker. The more aggressively it moves entries out to disk, the
> lower the rate of zswap loads, the more aggressively it moves more
> entries out to disk.

I think I found the source of my confusion. As you described, the
intention of the protection is to detect that we are doing writeback
more aggressively than we should. This is indicated by swapins
(especially those from disk).

However, this assumes that pages swapped in from disk had gone there by
shrinking/writeback. What I was focused on was pages that end up on disk
because of the zswap limit. In this case, a disk swapin is actually a
sign that we swapped hot pages to disk instead of putting them in zswap,
which probably indicates that we should shrink zswap more aggressively
to make room for these pages.

I guess the general assumption is that with the shrinker at play, pages
skipping zswap due to the limit become the unlikely scenario -- which
makes sense. However, pages that end up on disk because they skipped
zswap should have a higher likelihood of being swapped in, because they
are hotter by definition.

Ideally, we would be able to tell during swapin if this page was sent
to disk due to writeback or skipping zswap and increase or decrease
protection accordingly -- but this isn't the case.

So perhaps the answer is to decrease the protection when pages skip
zswap instead? Or is this not necessary because we kick off a background
shrinker anyway? IIUC the latter will stop once we reach the acceptance
threshold, but it might be a good idea to increase shrinking beyond
that under memory pressure.
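
Purely to illustrate the "decrease protection when pages skip zswap"
idea, a hypothetical counterpart to the helper added in this patch
(zswap_lruvec_dec_protected() does not exist, this is just a sketch):

/*
 * Hypothetical: lower the protection, clamped at zero, whenever a page
 * bypasses zswap because the pool limit was hit, so the shrinker is
 * allowed to make room for pages that are presumably hotter.
 */
static void zswap_lruvec_dec_protected(struct lruvec *lruvec)
{
	atomic_long_t *nr_protected =
		&lruvec->zswap_lruvec_state.nr_zswap_protected;
	long old = atomic_long_read(nr_protected);

	while (old > 0 && !atomic_long_try_cmpxchg(nr_protected, &old, old - 1))
		;	/* retry; try_cmpxchg updated 'old' for us */
}

Something like this could be called from the zswap_store() path that
falls back to swap because of the limit, unless the background shrinker
already covers that case well enough.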

Also, in an ideal world I guess it would be better to distinguish
swapins from disk vs. zswap? Swapins from disk should lead to a bigger
increase in protection IIUC (assuming they happened due to writeback).

Sorry if my analysis is still off, I am still familiarizing myself with
the shrinker heuristics :)

> 
> > Instead, only increase the protection when pages are loaded from zswap.
> > This also has a nice side effect of removing zswap_folio_swapin() and
> > replacing it with a static helper that is only called from zswap_load().
> > 
> > No problems were observed in practice, this was found through code
> > inspection.
> 
> This is missing test results :)

I intentionally wanted to send out the patch first to get some feedback,
knowing that I probably missed something. In retrospect, this should
have been an RFC patch. Patch 2 should be irrelevant, though; I only
bundled them because they touch the same area.
Yosry Ahmed March 20, 2024, 7:32 p.m. UTC | #4
On Wed, Mar 20, 2024 at 07:48:13AM -0700, Nhat Pham wrote:
> On Wed, Mar 20, 2024 at 2:51 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote:
> > > Currently, the number of protected zswap entries corresponding to an
> > > lruvec are incremented every time we swapin a page.
> >
> > Correct. This is the primary signal that the shrinker is being too
> > aggressive in moving entries to disk and should slow down...?
> 
> Yup. Currently, there are two scenarios in which we increase zswap
> protection area:
> 
> 1. zswap lru movement - for instance, if a page for some reasons is
> rotated in the LRU. When this happens, we increment the protection
> size, so that the page at the tail end of the protected area does not
> lose its protection because of (potentially) spurious LRU churnings.

I don't think we update the protected area during rotations anymore,
only when a new entry is added to the LRU (with potential decay).

> 2. swapin - when this happens, it is a signal that the shrinker is
> being too... enthusiastic in its reclaiming action. We should be more
> conservative, hence the increase in protection.
> 
> I think there's some confusion around this, because we use the
> same-ish mechanism for two different events. Maybe I should have put
> yet another fat comment at the callsites of zswap_folio_swapin() too
> :)

I think it makes sense. The confusion was mostly around the
interpretation of finding a page on disk. I was focused on pages that
skip zswap, but the main intention was for pages that were written back,
as I explained in my response to Johannes.

> 
> >
> > > This happens regardless of whether or not the page originated in
> > > zswap. Hence, swapins from disk will lead to increasing protection
> > > on potentially stale zswap entries. Furthermore, the increased
> 
> Hmmm my original thinking was that, had we protected the swapped-in
> page back when it was still in zswap, we would have prevented this IO.
> And since the pages in zswap are (at least conceptually) "warmer" than
> the swapped in page, it is appropriate to increase the zswap
> protection.
> 
> In fact, I was toying with the idea to max out the protection on
> swap-in to temporarily cease all zswap shrinking, but that is perhaps
> overkill :)

This would be problematic for pages that skip zswap, IIUC; we'd want
more shrinking in that case.

> 
> > > shrinking protection can lead to more pages skipping zswap and going
> 
> I think this can only happen when the protection is so strong that the
> zswap pool is full, right?

Yeah that's what I had in mind. 

> 
> In practice I have never seen this happen though. Did you observe it?
> We have a fairly aggressive lru-size-based protection decaying as
> well, so that should not happen...

No this was all code inspection as I mentioned :)

> 
> Also technically the protection only applies to the dynamic shrinker -
> the capacity-driven shrinking is not affected (although it's not very
> effective - perhaps we can rework this?)

That logic is flawed anyway imo due to the acceptance threshold. We
should really get rid of that as we discussed before.

Patch

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 2a85b941db975..1f020b5427e3d 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -34,7 +34,6 @@  int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
-void zswap_folio_swapin(struct folio *folio);
 bool is_zswap_enabled(void);
 #else
 
@@ -58,7 +57,6 @@  static inline int zswap_swapon(int type, unsigned long nr_pages)
 static inline void zswap_swapoff(int type) {}
 static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
 static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
-static inline void zswap_folio_swapin(struct folio *folio) {}
 
 static inline bool is_zswap_enabled(void)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bfc7e8c58a6d3..32e151054ec47 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -696,10 +696,8 @@  struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, false, NULL);
-	}
 	return folio;
 }
 
@@ -872,10 +870,8 @@  static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, false, NULL);
-	}
 	return folio;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index b31c977f53e9c..323f1dea43d22 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -773,14 +773,9 @@  void zswap_lruvec_state_init(struct lruvec *lruvec)
 	atomic_long_set(&lruvec->zswap_lruvec_state.nr_zswap_protected, 0);
 }
 
-void zswap_folio_swapin(struct folio *folio)
+static void zswap_lruvec_inc_protected(struct lruvec *lruvec)
 {
-	struct lruvec *lruvec;
-
-	if (folio) {
-		lruvec = folio_lruvec(folio);
-		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	}
+	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
 }
 
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
@@ -1644,6 +1639,7 @@  bool zswap_load(struct folio *folio)
 	zswap_entry_free(entry);
 
 	folio_mark_dirty(folio);
+	zswap_lruvec_inc_protected(folio_lruvec(folio));
 
 	return true;
 }