[v2,1/2] mm/memcontrol: respect zswap.writeback setting from parent cg too

Message ID 20240816144344.18135-1-me@yhndnzj.com (mailing list archive)
State New

Commit Message

Mike Yuan Aug. 16, 2024, 2:44 p.m. UTC
Currently, the behavior of zswap.writeback with respect to
the cgroup hierarchy seems a bit odd. Unlike zswap.max,
it doesn't honor the value from parent cgroups. This
surfaced when people tried to globally disable zswap writeback,
i.e. to reserve physical swap space only for hibernation [1]:
disabling zswap.writeback only for the root cgroup results
in subcgroups with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced
the MemoryZSwapWriteback= systemd unit setting [2] for
controlling the knob. The patch assumed that the kernel would
enforce the value of parent cgroups. It could probably be
worked around from systemd's side, by going up the slice unit
tree and inheriting the value. Yet I think it's more sensible
to make it behave consistently with zswap.max and friends.

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Changes in v2:
- Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
- Updated Documentation/admin-guide/cgroup-v2.rst to reflect the change

Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/

Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Mike Yuan <me@yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
 mm/memcontrol.c                         | 9 ++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)


base-commit: d07b43284ab356daf7ec5ae1858a16c1c7b6adab

Comments

Yosry Ahmed Aug. 19, 2024, 7:09 p.m. UTC | #1
On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
>
> Currently, the behavior of zswap.writeback wrt.
> the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> it doesn't honor the value from parent cgroups. This
> surfaced when people tried to globally disable zswap writeback,
> i.e. reserve physical swap space only for hibernation [1] -
> disabling zswap.writeback only for the root cgroup results
> in subcgroups with zswap.writeback=1 still performing writeback.
>
> The inconsistency became more noticeable after I introduced
> the MemoryZSwapWriteback= systemd unit setting [2] for
> controlling the knob. The patch assumed that the kernel would
> enforce the value of parent cgroups. It could probably be
> workarounded from systemd's side, by going up the slice unit
> tree and inheriting the value. Yet I think it's more sensible
> to make it behave consistently with zswap.max and friends.
>
> [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> [2] https://github.com/systemd/systemd/pull/31734
>
> Changes in v2:
> - Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
> - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the change
>
> Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
>
> Signed-off-by: Mike Yuan <me@yhndnzj.com>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

LGTM,
Acked-by: Yosry Ahmed <yosryahmed@google.com>

> ---
>  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
>  mm/memcontrol.c                         | 9 ++++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 86311c2907cd..80906cea4264 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1719,7 +1719,10 @@ The following nested keys are defined.
>    memory.zswap.writeback
>         A read-write single value file. The default value is "1". The
>         initial value of the root cgroup is 1, and when a new cgroup is
> -       created, it inherits the current value of its parent.
> +       created, it inherits the current value of its parent. Note that
> +       this setting is hierarchical, i.e. the writeback would be
> +       implicitly disabled for child cgroups if the upper hierarchy
> +       does so.
>
>         When this is set to 0, all swapping attempts to swapping devices
>         are disabled. This included both zswap writebacks, and swapping due
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f29157288b7d..327b2b030639 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
>  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
>  {
>         /* if zswap is disabled, do not block pages going to the swapping device */
> -       return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
> +       if (!zswap_is_enabled())
> +               return true;

This is orthogonal to this patch, but I just realized that we
completely ignore memory.zswap.writeback if zswap is disabled. This
means that if a cgroup has disabled writeback and zswap is then globally
disabled for some reason, we stop respecting the cgroup knob. I guess
the rationale could be that we want to help get pages out of zswap as
much as possible to honor zswap's disablement? Nhat, did I get that
right?

I feel like it's a little bit odd to be honest, but I don't have a
strong opinion on it. Maybe we should document this behavior better.


> +
> +       for (; memcg; memcg = parent_mem_cgroup(memcg))
> +               if (!READ_ONCE(memcg->zswap_writeback))
> +                       return false;
> +
> +       return true;
>  }
>
>  static u64 zswap_current_read(struct cgroup_subsys_state *css,
>
> base-commit: d07b43284ab356daf7ec5ae1858a16c1c7b6adab
> --
> 2.46.0
>
>
Mike Yuan Aug. 20, 2024, 9:38 a.m. UTC | #2
On 2024-08-19 at 12:09 -0700, Yosry Ahmed wrote:
> On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
> > 
> > Currently, the behavior of zswap.writeback wrt.
> > the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> > it doesn't honor the value from parent cgroups. This
> > surfaced when people tried to globally disable zswap writeback,
> > i.e. reserve physical swap space only for hibernation [1] -
> > disabling zswap.writeback only for the root cgroup results
> > in subcgroups with zswap.writeback=1 still performing writeback.
> > 
> > The inconsistency became more noticeable after I introduced
> > the MemoryZSwapWriteback= systemd unit setting [2] for
> > controlling the knob. The patch assumed that the kernel would
> > enforce the value of parent cgroups. It could probably be
> > workarounded from systemd's side, by going up the slice unit
> > tree and inheriting the value. Yet I think it's more sensible
> > to make it behave consistently with zswap.max and friends.
> > 
> > [1]
> > https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> > [2] https://github.com/systemd/systemd/pull/31734
> > 
> > Changes in v2:
> > - Actually base on latest tree (is_zswap_enabled() ->
> > zswap_is_enabled())
> > - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the
> > change
> > 
> > Link to v1:
> > https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
> > 
> > Cc: Nhat Pham <nphamcs@gmail.com>
> > Cc: Yosry Ahmed <yosryahmed@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > 
> > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> > Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> 
> LGTM,
> Acked-by: Yosry Ahmed <yosryahmed@google.com>
> 
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
> >  mm/memcontrol.c                         | 9 ++++++++-
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst
> > b/Documentation/admin-guide/cgroup-v2.rst
> > index 86311c2907cd..80906cea4264 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1719,7 +1719,10 @@ The following nested keys are defined.
> >    memory.zswap.writeback
> >         A read-write single value file. The default value is "1".
> > The
> >         initial value of the root cgroup is 1, and when a new
> > cgroup is
> > -       created, it inherits the current value of its parent.
> > +       created, it inherits the current value of its parent. Note
> > that
> > +       this setting is hierarchical, i.e. the writeback would be
> > +       implicitly disabled for child cgroups if the upper
> > hierarchy
> > +       does so.
> > 
> >         When this is set to 0, all swapping attempts to swapping
> > devices
> >         are disabled. This included both zswap writebacks, and
> > swapping due
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f29157288b7d..327b2b030639 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct
> > obj_cgroup *objcg, size_t size)
> >  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> >  {
> >         /* if zswap is disabled, do not block pages going to the
> > swapping device */
> > -       return !zswap_is_enabled() || !memcg || READ_ONCE(memcg-
> > >zswap_writeback);
> > +       if (!zswap_is_enabled())
> > +               return true;
> 
> This is orthogonal to this patch, but I just realized that we
> completely ignore memory.zswap_writeback if zswap is disabled. This
> means that if a cgroup has disabled writeback, then zswap is globally
> disabled for some reason, we stop respecting the cgroup knob. I guess
> the rationale could be that we want to help get pages out of zswap as
> much as possible to honor zswap's disablement? Nhat, did I get that
> right?

Hmm, I think the current behavior makes more sense. If zswap is
completely disabled, it seems intuitive that zswap-related knobs lose
their effect.

> I feel like it's a little bit odd to be honest, but I don't have a
> strong opinion on it. Maybe we should document this behavior better.

But clarifying this in the documentation certainly sounds good :)

> 
> > +
> > +       for (; memcg; memcg = parent_mem_cgroup(memcg))
> > +               if (!READ_ONCE(memcg->zswap_writeback))
> > +                       return false;
> > +
> > +       return true;
> >  }
> > 
> >  static u64 zswap_current_read(struct cgroup_subsys_state *css,
> > 
> > base-commit: d07b43284ab356daf7ec5ae1858a16c1c7b6adab
> > --
> > 2.46.0
> > 
> >
Nhat Pham Aug. 20, 2024, 3:26 p.m. UTC | #3
On Tue, Aug 20, 2024 at 5:38 AM Mike Yuan <me@yhndnzj.com> wrote:
>
> On 2024-08-19 at 12:09 -0700, Yosry Ahmed wrote:
> > On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
> > >
> > > Currently, the behavior of zswap.writeback wrt.
> > > the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> > > it doesn't honor the value from parent cgroups. This
> > > surfaced when people tried to globally disable zswap writeback,
> > > i.e. reserve physical swap space only for hibernation [1] -
> > > disabling zswap.writeback only for the root cgroup results
> > > in subcgroups with zswap.writeback=1 still performing writeback.
> > >
> > > The inconsistency became more noticeable after I introduced
> > > the MemoryZSwapWriteback= systemd unit setting [2] for
> > > controlling the knob. The patch assumed that the kernel would
> > > enforce the value of parent cgroups. It could probably be
> > > workarounded from systemd's side, by going up the slice unit
> > > tree and inheriting the value. Yet I think it's more sensible
> > > to make it behave consistently with zswap.max and friends.
> > >
> > > [1]
> > > https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> > > [2] https://github.com/systemd/systemd/pull/31734
> > >
> > > Changes in v2:
> > > - Actually base on latest tree (is_zswap_enabled() ->
> > > zswap_is_enabled())
> > > - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the
> > > change
> > >
> > > Link to v1:
> > > https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
> > >
> > > Cc: Nhat Pham <nphamcs@gmail.com>
> > > Cc: Yosry Ahmed <yosryahmed@google.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > >
> > > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> > > Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> >
> > LGTM,
> > Acked-by: Yosry Ahmed <yosryahmed@google.com>
> >
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
> > >  mm/memcontrol.c                         | 9 ++++++++-
> > >  2 files changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst
> > > b/Documentation/admin-guide/cgroup-v2.rst
> > > index 86311c2907cd..80906cea4264 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1719,7 +1719,10 @@ The following nested keys are defined.
> > >    memory.zswap.writeback
> > >         A read-write single value file. The default value is "1".
> > > The
> > >         initial value of the root cgroup is 1, and when a new
> > > cgroup is
> > > -       created, it inherits the current value of its parent.
> > > +       created, it inherits the current value of its parent. Note
> > > that
> > > +       this setting is hierarchical, i.e. the writeback would be
> > > +       implicitly disabled for child cgroups if the upper
> > > hierarchy
> > > +       does so.
> > >
> > >         When this is set to 0, all swapping attempts to swapping
> > > devices
> > >         are disabled. This included both zswap writebacks, and
> > > swapping due
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index f29157288b7d..327b2b030639 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct
> > > obj_cgroup *objcg, size_t size)
> > >  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> > >  {
> > >         /* if zswap is disabled, do not block pages going to the
> > > swapping device */
> > > -       return !zswap_is_enabled() || !memcg || READ_ONCE(memcg-
> > > >zswap_writeback);
> > > +       if (!zswap_is_enabled())
> > > +               return true;
> >
> > This is orthogonal to this patch, but I just realized that we
> > completely ignore memory.zswap_writeback if zswap is disabled. This
> > means that if a cgroup has disabled writeback, then zswap is globally
> > disabled for some reason, we stop respecting the cgroup knob. I guess
> > the rationale could be that we want to help get pages out of zswap as
> > much as possible to honor zswap's disablement? Nhat, did I get that
> > right?
>
> Hmm, I think the current behavior makes more sense. If zswap is
> completely
> disabled, it seems intuitive that zswap-related knobs lose their
> effect.

Mike is right here. It's less a behavioral decision and more a
this-can-confuse-users kind of thing :) At least that was my rationale
when I wrote this.

If users still want to disable swap, they can do that with
memory.swap.max = 0, which is probably better because it would fail
earlier at the swap slot allocation step.

>
> > I feel like it's a little bit odd to be honest, but I don't have a
> > strong opinion on it. Maybe we should document this behavior better.
>
> But clarify this in the documentation certainly sounds good :)

But yes, better documentation == happy Nhat :)
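
The memory.swap.max = 0 alternative mentioned above can also be applied
from a program rather than a shell. The following is a minimal sketch,
not part of the patch: write_cgroup_knob() is a hypothetical helper name,
and the caller is assumed to pass the full path of a cgroup v2 control
file that it has permission to write.

```c
#include <stdio.h>

/* Hypothetical helper (not a kernel or libc API): write `value` followed
 * by a newline to the control file at `path`.
 * Returns 0 on success and -1 on any failure. */
int write_cgroup_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%s\n", value) >= 0;
	/* fclose() flushes, so a rejected write may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A call such as
write_cgroup_knob("/sys/fs/cgroup/example.slice/memory.swap.max", "0")
would then disable swap for that cgroup. Both the mount point and the
example.slice name are assumptions made for illustration.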

Patch

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..80906cea4264 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1719,7 +1719,10 @@  The following nested keys are defined.
   memory.zswap.writeback
 	A read-write single value file. The default value is "1". The
 	initial value of the root cgroup is 1, and when a new cgroup is
-	created, it inherits the current value of its parent.
+	created, it inherits the current value of its parent. Note that
+	this setting is hierarchical, i.e. the writeback would be
+	implicitly disabled for child cgroups if the upper hierarchy
+	does so.
 
 	When this is set to 0, all swapping attempts to swapping devices
 	are disabled. This included both zswap writebacks, and swapping due
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f29157288b7d..327b2b030639 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5320,7 +5320,14 @@  void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
 bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 {
 	/* if zswap is disabled, do not block pages going to the swapping device */
-	return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
+	if (!zswap_is_enabled())
+		return true;
+
+	for (; memcg; memcg = parent_mem_cgroup(memcg))
+		if (!READ_ONCE(memcg->zswap_writeback))
+			return false;
+
+	return true;
 }
 
 static u64 zswap_current_read(struct cgroup_subsys_state *css,
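
The ancestor walk introduced by the patch can be tried out in userspace.
The sketch below is only a model under stated assumptions: struct cg_model
and model_writeback_enabled() are hypothetical names standing in for
struct mem_cgroup and mem_cgroup_zswap_writeback_enabled(), the parent
pointer replaces parent_mem_cgroup(), and the zswap_enabled argument
replaces zswap_is_enabled(). READ_ONCE() is omitted because the model is
single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup; only the two
 * fields needed for the walk are modeled. */
struct cg_model {
	struct cg_model *parent;	/* NULL for the root cgroup */
	bool zswap_writeback;		/* value of memory.zswap.writeback */
};

/* Mirrors the patched logic: with zswap disabled the knob is ignored;
 * otherwise writeback is allowed only if no cgroup on the path from cg
 * up to the root has it turned off. */
bool model_writeback_enabled(const struct cg_model *cg, bool zswap_enabled)
{
	if (!zswap_enabled)
		return true;

	for (; cg; cg = cg->parent)
		if (!cg->zswap_writeback)
			return false;

	return true;
}
```

With a root that has zswap_writeback set to false and a child that has it
set to true, the model reports writeback as disabled for the child while
zswap is enabled, and as enabled once zswap_enabled is false. A NULL
cgroup pointer yields true, matching the !memcg case of the old one-line
expression.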