
mm: zswap: support exclusive loads

Message ID: 20230530210251.493194-1-yosryahmed@google.com
State: New
Series: mm: zswap: support exclusive loads

Commit Message

Yosry Ahmed May 30, 2023, 9:02 p.m. UTC
Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
removed support for exclusive loads from frontswap as it was not used.

Bring back exclusive loads support to frontswap by adding an
exclusive_loads argument to frontswap_ops. Add support for exclusive
loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.

Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().

With exclusive loads, we avoid having two copies of the same page in
memory (compressed & uncompressed) after faulting it in from zswap. On
the other hand, if the page is to be reclaimed again without being
dirtied, it will be re-compressed. Compression is not usually slow, and
a page that was just faulted in is less likely to be reclaimed again
soon.

Suggested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/frontswap.h |  1 +
 mm/Kconfig                | 13 +++++++++++++
 mm/frontswap.c            |  7 ++++++-
 mm/zswap.c                | 23 +++++++++++++++--------
 4 files changed, 35 insertions(+), 9 deletions(-)
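
Condensed from the patch at the bottom of this page, the new load path in
__frontswap_load() boils down to:

	ret = frontswap_ops->load(type, offset, page);
	if (ret == 0) {
		inc_frontswap_loads();
		if (frontswap_ops->exclusive_loads) {
			/* the uncompressed page is now the only copy */
			SetPageDirty(page);
			/* clear frontswap's bit for this swap offset */
			__frontswap_clear(sis, offset);
		}
	}

zswap itself drops its rbtree entry inside zswap_frontswap_load(), via the
new zswap_invalidate_entry() helper, when CONFIG_ZSWAP_EXCLUSIVE_LOADS is
enabled.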

Comments

Andrew Morton May 30, 2023, 9:15 p.m. UTC | #1
On Tue, 30 May 2023 21:02:51 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:

> Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
> removed support for exclusive loads from frontswap as it was not used.
> 
> Bring back exclusive loads support to frontswap by adding an
> exclusive_loads argument to frontswap_ops. Add support for exclusive
> loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.

Why is this Kconfigurable?  Why not just enable the feature for all
builds?

> Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
> into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().
> 
> With exclusive loads, we avoid having two copies of the same page in
> memory (compressed & uncompressed) after faulting it in from zswap. On
> the other hand, if the page is to be reclaimed again without being
> dirtied, it will be re-compressed. Compression is not usually slow, and
> a page that was just faulted in is less likely to be reclaimed again
> soon.
> 
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
>  	  The selection made here can be overridden by using the kernel
>  	  command line 'zswap.enabled=' option.
>  
> +config ZSWAP_EXCLUSIVE_LOADS
> +	bool "Invalidate zswap entries when pages are loaded"
> +	depends on ZSWAP
> +	help
> +	  If selected, when a page is loaded from zswap, the zswap entry is
> +	  invalidated at once, as opposed to leaving it in zswap until the
> +	  swap entry is freed.
> +
> +	  This avoids having two copies of the same page in memory
> +	  (compressed and uncompressed) after faulting in a page from zswap.
> +	  The cost is that if the page was never dirtied and needs to be
> +	  swapped out again, it will be re-compressed.

So it's a speed-vs-space tradeoff?  I'm not sure how users are to
decide whether they want this.  Did we help them as much as possible?
Yosry Ahmed May 30, 2023, 9:20 p.m. UTC | #2
On Tue, May 30, 2023 at 2:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 30 May 2023 21:02:51 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
> > removed support for exclusive loads from frontswap as it was not used.
> >
> > Bring back exclusive loads support to frontswap by adding an
> > exclusive_loads argument to frontswap_ops. Add support for exclusive
> > loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.
>
> Why is this Kconfigurable?  Why not just enable the feature for all
> builds?

I assumed that some users want the current behavior, where reclaiming
clean pages that were once in zswap would be faster. If no one cares,
I can remove the config option and have it always on.

>
> > Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
> > into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().
> >
> > With exclusive loads, we avoid having two copies of the same page in
> > memory (compressed & uncompressed) after faulting it in from zswap. On
> > the other hand, if the page is to be reclaimed again without being
> > dirtied, it will be re-compressed. Compression is not usually slow, and
> > a page that was just faulted in is less likely to be reclaimed again
> > soon.
> >
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> >         The selection made here can be overridden by using the kernel
> >         command line 'zswap.enabled=' option.
> >
> > +config ZSWAP_EXCLUSIVE_LOADS
> > +     bool "Invalidate zswap entries when pages are loaded"
> > +     depends on ZSWAP
> > +     help
> > +       If selected, when a page is loaded from zswap, the zswap entry is
> > +       invalidated at once, as opposed to leaving it in zswap until the
> > +       swap entry is freed.
> > +
> > +       This avoids having two copies of the same page in memory
> > +       (compressed and uncompressed) after faulting in a page from zswap.
> > +       The cost is that if the page was never dirtied and needs to be
> > +       swapped out again, it will be re-compressed.
>
> So it's a speed-vs-space tradeoff?  I'm not sure how users are to
> decide whether they want this.  Did we help them as much as possible?

Yes, it is a reclaim speed vs. space tradeoff.

My intuition is that it should be more useful to have this enabled, as
the memory savings should be more important than having reclaim be a
little bit faster in some specific situations. We can make the
configuration on by default if others agree.

I would imagine users would turn this configuration on and observe
memory usage of zswap vs. reclaim speed, and decide based on the
numbers.

Johannes Weiner May 30, 2023, 11:54 p.m. UTC | #3
On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
>  
>  	/* Try loading from each implementation, until one succeeds. */
>  	ret = frontswap_ops->load(type, offset, page);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		inc_frontswap_loads();
> +		if (frontswap_ops->exclusive_loads) {
> +			SetPageDirty(page);
> +			__frontswap_clear(sis, offset);
> +		}

Somewhat tangential, but is there still a point to the frontswap
layer? It seems usecases other than zswap have never materialized, at
least not in tree. Life would be a lot easier if we were to just
hardcode the zswap callbacks in the swap functions.

It's not the patch's fault, but it highlights the boilerplate the
indirection causes. ->load() already has the page and could just dirty
it directly. Instead now both layers have to handle invalidation,
which is a vector for bugs.

Can somebody think of reasons to keep it? If not, I'd take a stab at
removing it.
Yosry Ahmed May 30, 2023, 11:56 p.m. UTC | #4
On Tue, May 30, 2023 at 4:54 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> > @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
> >
> >       /* Try loading from each implementation, until one succeeds. */
> >       ret = frontswap_ops->load(type, offset, page);
> > -     if (ret == 0)
> > +     if (ret == 0) {
> >               inc_frontswap_loads();
> > +             if (frontswap_ops->exclusive_loads) {
> > +                     SetPageDirty(page);
> > +                     __frontswap_clear(sis, offset);
> > +             }
>
> Somewhat tangential, but is there still a point to the frontswap
> layer? It seems usecases other than zswap have never materialized, at
> least not in tree. Life would be a lot easier if we were to just
> hardcode the zswap callbacks in the swap functions.
>
> It's not the patch's fault, but it highlights the boilerplate the
> indirection causes. ->load() already has the page and could just dirty
> it directly. Instead now both layers have to handle invalidation,
> which is a vector for bugs.
>
> Can somebody think of reasons to keep it? If not, I'd take a stab at
> removing it.

I was intending to eventually remove it as part of the swap
abstraction proposal I am working on, but this will take some time. If
you want to remove it now, even better :)
Christoph Hellwig May 31, 2023, 6 a.m. UTC | #5
On Tue, May 30, 2023 at 07:54:47PM -0400, Johannes Weiner wrote:
> Somewhat tangential, but is there still a point to the frontswap
> layer? It seems usecases other than zswap have never materialized, at
> least not in tree. Life would be a lot easier if we were to just
> hardcode the zswap callbacks in the swap functions.

I've been wanting to remove it for a while, as it really is rather
pointless.
Johannes Weiner June 7, 2023, 2:13 p.m. UTC | #6
On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
>  	  The selection made here can be overridden by using the kernel
>  	  command line 'zswap.enabled=' option.
>  
> +config ZSWAP_EXCLUSIVE_LOADS
> +	bool "Invalidate zswap entries when pages are loaded"
> +	depends on ZSWAP
> +	help
> +	  If selected, when a page is loaded from zswap, the zswap entry is
> +	  invalidated at once, as opposed to leaving it in zswap until the
> +	  swap entry is freed.
> +
> +	  This avoids having two copies of the same page in memory
> +	  (compressed and uncompressed) after faulting in a page from zswap.
> +	  The cost is that if the page was never dirtied and needs to be
> +	  swapped out again, it will be re-compressed.
> +
>  choice
>  	prompt "Default compressor"
>  	depends on ZSWAP
> diff --git a/mm/frontswap.c b/mm/frontswap.c
> index 279e55b4ed87..e5d6825110f4 100644
> --- a/mm/frontswap.c
> +++ b/mm/frontswap.c
> @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
>  
>  	/* Try loading from each implementation, until one succeeds. */
>  	ret = frontswap_ops->load(type, offset, page);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		inc_frontswap_loads();
> +		if (frontswap_ops->exclusive_loads) {
> +			SetPageDirty(page);
> +			__frontswap_clear(sis, offset);
> +		}
> +	}
>  	return ret;

This would be a much more accessible feature (distro kernels,
experimenting, adapting to different workloads) if it were runtime
switchable.

That should be possible, right? As long as frontswap and zswap are
coordinated, this can be done on a per-entry basis:

	exclusive = READ_ONCE(frontswap_ops->exclusive_loads);
	ret = frontswap_ops->load(type, offset, page, exclusive);
	if (ret == 0) {
		if (exclusive) {
			SetPageDirty(page);
			__frontswap_clear(sis, offset);
		}
	}
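On the zswap side, the per-entry coordination could be as small as passing
that flag into the load hook and reusing zswap_invalidate_entry(); a rough
sketch of the tail of zswap_frontswap_load(), where the "exclusive"
parameter is illustrative rather than part of this patch:

	freeentry:
		spin_lock(&tree->lock);
		/* drop the compressed copy only on a successful exclusive load */
		if (!ret && exclusive)
			zswap_invalidate_entry(tree, entry);
		zswap_entry_put(tree, entry);
		spin_unlock(&tree->lock);

		return ret;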
Yosry Ahmed June 7, 2023, 4:33 p.m. UTC | #7
On Wed, Jun 7, 2023 at 7:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> > @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> >         The selection made here can be overridden by using the kernel
> >         command line 'zswap.enabled=' option.
> >
> > +config ZSWAP_EXCLUSIVE_LOADS
> > +     bool "Invalidate zswap entries when pages are loaded"
> > +     depends on ZSWAP
> > +     help
> > +       If selected, when a page is loaded from zswap, the zswap entry is
> > +       invalidated at once, as opposed to leaving it in zswap until the
> > +       swap entry is freed.
> > +
> > +       This avoids having two copies of the same page in memory
> > +       (compressed and uncompressed) after faulting in a page from zswap.
> > +       The cost is that if the page was never dirtied and needs to be
> > +       swapped out again, it will be re-compressed.
> > +
> >  choice
> >       prompt "Default compressor"
> >       depends on ZSWAP
> > diff --git a/mm/frontswap.c b/mm/frontswap.c
> > index 279e55b4ed87..e5d6825110f4 100644
> > --- a/mm/frontswap.c
> > +++ b/mm/frontswap.c
> > @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
> >
> >       /* Try loading from each implementation, until one succeeds. */
> >       ret = frontswap_ops->load(type, offset, page);
> > -     if (ret == 0)
> > +     if (ret == 0) {
> >               inc_frontswap_loads();
> > +             if (frontswap_ops->exclusive_loads) {
> > +                     SetPageDirty(page);
> > +                     __frontswap_clear(sis, offset);
> > +             }
> > +     }
> >       return ret;
>
> This would be a much more accessible feature (distro kernels,
> experimenting, adapting to different workloads) if it were runtime
> switchable.
>
> That should be possible, right? As long as frontswap and zswap are
> coordinated, this can be done on a per-entry basis:
>
>         exclusive = READ_ONCE(frontswap_ops->exclusive_loads);
>         ret = frontswap_ops->load(type, offset, page, exclusive);
>         if (ret == 0) {
>                 if (exclusive) {
>                         SetPageDirty(page);
>                         __frontswap_clear(sis, offset);
>                 }
>         }

Sure, I can do that. My thought was that this isn't something that
someone usually wants to change at runtime, but at least for
experimentation, it does make things a lot easier. I can add a zswap
module parameter.

I am guessing that, for consistency with other zswap settings and for
convenience, I can keep the config option to set the default value,
e.g. CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON or something. Does this
make sense?
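
A minimal sketch of that runtime knob in mm/zswap.c, assuming the parameter
name and the CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON symbol proposed above
(neither is part of the posted patch):

	/* Runtime-tunable via /sys/module/zswap/parameters/exclusive_loads,
	 * defaulting to the proposed Kconfig setting.
	 */
	static bool zswap_exclusive_loads_enabled = IS_ENABLED(
			CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
	module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);

With that, the IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS) check in
zswap_frontswap_load() would become a READ_ONCE() of the parameter, and the
tradeoff could be flipped per workload without a rebuild.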

Patch

diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index a631bac12220..289561e12cad 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -13,6 +13,7 @@  struct frontswap_ops {
 	int (*load)(unsigned, pgoff_t, struct page *); /* load a page */
 	void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */
 	void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */
+	bool exclusive_loads; /* pages are invalidated after being loaded */
 };
 
 int frontswap_register_ops(const struct frontswap_ops *ops);
diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..92c30879bf67 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -46,6 +46,19 @@  config ZSWAP_DEFAULT_ON
 	  The selection made here can be overridden by using the kernel
 	  command line 'zswap.enabled=' option.
 
+config ZSWAP_EXCLUSIVE_LOADS
+	bool "Invalidate zswap entries when pages are loaded"
+	depends on ZSWAP
+	help
+	  If selected, when a page is loaded from zswap, the zswap entry is
+	  invalidated at once, as opposed to leaving it in zswap until the
+	  swap entry is freed.
+
+	  This avoids having two copies of the same page in memory
+	  (compressed and uncompressed) after faulting in a page from zswap.
+	  The cost is that if the page was never dirtied and needs to be
+	  swapped out again, it will be re-compressed.
+
 choice
 	prompt "Default compressor"
 	depends on ZSWAP
diff --git a/mm/frontswap.c b/mm/frontswap.c
index 279e55b4ed87..e5d6825110f4 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -216,8 +216,13 @@  int __frontswap_load(struct page *page)
 
 	/* Try loading from each implementation, until one succeeds. */
 	ret = frontswap_ops->load(type, offset, page);
-	if (ret == 0)
+	if (ret == 0) {
 		inc_frontswap_loads();
+		if (frontswap_ops->exclusive_loads) {
+			SetPageDirty(page);
+			__frontswap_clear(sis, offset);
+		}
+	}
 	return ret;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 59da2a415fbb..fba80330afd1 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1329,6 +1329,16 @@  static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	goto reject;
 }
 
+static void zswap_invalidate_entry(struct zswap_tree *tree,
+				   struct zswap_entry *entry)
+{
+	/* remove from rbtree */
+	zswap_rb_erase(&tree->rbroot, entry);
+
+	/* drop the initial reference from entry creation */
+	zswap_entry_put(tree, entry);
+}
+
 /*
  * returns 0 if the page was successfully decompressed
  * return -1 on entry not found or error
@@ -1403,6 +1413,8 @@  static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 		count_objcg_event(entry->objcg, ZSWPIN);
 freeentry:
 	spin_lock(&tree->lock);
+	if (!ret && IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS))
+		zswap_invalidate_entry(tree, entry);
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);
 
@@ -1423,13 +1435,7 @@  static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 		spin_unlock(&tree->lock);
 		return;
 	}
-
-	/* remove from rbtree */
-	zswap_rb_erase(&tree->rbroot, entry);
-
-	/* drop the initial reference from entry creation */
-	zswap_entry_put(tree, entry);
-
+	zswap_invalidate_entry(tree, entry);
 	spin_unlock(&tree->lock);
 }
 
@@ -1472,7 +1478,8 @@  static const struct frontswap_ops zswap_frontswap_ops = {
 	.load = zswap_frontswap_load,
 	.invalidate_page = zswap_frontswap_invalidate_page,
 	.invalidate_area = zswap_frontswap_invalidate_area,
-	.init = zswap_frontswap_init
+	.init = zswap_frontswap_init,
+	.exclusive_loads = IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS),
 };
 
 /*********************************