Message ID | 20240816144344.18135-1-me@yhndnzj.com (mailing list archive) |
---|---|
State | New |
Series | [v2,1/2] mm/memcontrol: respect zswap.writeback setting from parent cg too |
On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
>
> Currently, the behavior of zswap.writeback wrt.
> the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> it doesn't honor the value from parent cgroups. This
> surfaced when people tried to globally disable zswap writeback,
> i.e. reserve physical swap space only for hibernation [1] -
> disabling zswap.writeback only for the root cgroup results
> in subcgroups with zswap.writeback=1 still performing writeback.
>
> The inconsistency became more noticeable after I introduced
> the MemoryZSwapWriteback= systemd unit setting [2] for
> controlling the knob. The patch assumed that the kernel would
> enforce the value of parent cgroups. It could probably be
> workarounded from systemd's side, by going up the slice unit
> tree and inheriting the value. Yet I think it's more sensible
> to make it behave consistently with zswap.max and friends.
>
> [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> [2] https://github.com/systemd/systemd/pull/31734
>
> Changes in v2:
> - Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
> - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the change
>
> Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
>
> Signed-off-by: Mike Yuan <me@yhndnzj.com>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

LGTM,
Acked-by: Yosry Ahmed <yosryahmed@google.com>

> ---
>  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
>  mm/memcontrol.c                         | 9 ++++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 86311c2907cd..80906cea4264 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1719,7 +1719,10 @@ The following nested keys are defined.
>    memory.zswap.writeback
>  	A read-write single value file. The default value is "1". The
>  	initial value of the root cgroup is 1, and when a new cgroup is
> -	created, it inherits the current value of its parent.
> +	created, it inherits the current value of its parent. Note that
> +	this setting is hierarchical, i.e. the writeback would be
> +	implicitly disabled for child cgroups if the upper hierarchy
> +	does so.
>
>  	When this is set to 0, all swapping attempts to swapping devices
>  	are disabled. This included both zswap writebacks, and swapping due
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f29157288b7d..327b2b030639 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
>  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
>  {
>  	/* if zswap is disabled, do not block pages going to the swapping device */
> -	return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
> +	if (!zswap_is_enabled())
> +		return true;

This is orthogonal to this patch, but I just realized that we
completely ignore memory.zswap_writeback if zswap is disabled. This
means that if a cgroup has disabled writeback, then zswap is globally
disabled for some reason, we stop respecting the cgroup knob. I guess
the rationale could be that we want to help get pages out of zswap as
much as possible to honor zswap's disablement? Nhat, did I get that
right?

I feel like it's a little bit odd to be honest, but I don't have a
strong opinion on it. Maybe we should document this behavior better.

> +
> +	for (; memcg; memcg = parent_mem_cgroup(memcg))
> +		if (!READ_ONCE(memcg->zswap_writeback))
> +			return false;
> +
> +	return true;
>  }
>
>  static u64 zswap_current_read(struct cgroup_subsys_state *css,
>
> base-commit: d07b43284ab356daf7ec5ae1858a16c1c7b6adab
> --
> 2.46.0
>
>
On 2024-08-19 at 12:09 -0700, Yosry Ahmed wrote:
> On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
> >
> > Currently, the behavior of zswap.writeback wrt.
> > the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> > it doesn't honor the value from parent cgroups. This
> > surfaced when people tried to globally disable zswap writeback,
> > i.e. reserve physical swap space only for hibernation [1] -
> > disabling zswap.writeback only for the root cgroup results
> > in subcgroups with zswap.writeback=1 still performing writeback.
> >
> > The inconsistency became more noticeable after I introduced
> > the MemoryZSwapWriteback= systemd unit setting [2] for
> > controlling the knob. The patch assumed that the kernel would
> > enforce the value of parent cgroups. It could probably be
> > workarounded from systemd's side, by going up the slice unit
> > tree and inheriting the value. Yet I think it's more sensible
> > to make it behave consistently with zswap.max and friends.
> >
> > [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> > [2] https://github.com/systemd/systemd/pull/31734
> >
> > Changes in v2:
> > - Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
> > - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the change
> >
> > Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
> >
> > Cc: Nhat Pham <nphamcs@gmail.com>
> > Cc: Yosry Ahmed <yosryahmed@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> >
> > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> > Reviewed-by: Nhat Pham <nphamcs@gmail.com>
>
> LGTM,
> Acked-by: Yosry Ahmed <yosryahmed@google.com>
>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
> >  mm/memcontrol.c                         | 9 ++++++++-
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 86311c2907cd..80906cea4264 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1719,7 +1719,10 @@ The following nested keys are defined.
> >    memory.zswap.writeback
> >  	A read-write single value file. The default value is "1". The
> >  	initial value of the root cgroup is 1, and when a new cgroup is
> > -	created, it inherits the current value of its parent.
> > +	created, it inherits the current value of its parent. Note that
> > +	this setting is hierarchical, i.e. the writeback would be
> > +	implicitly disabled for child cgroups if the upper hierarchy
> > +	does so.
> >
> >  	When this is set to 0, all swapping attempts to swapping devices
> >  	are disabled. This included both zswap writebacks, and swapping due
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f29157288b7d..327b2b030639 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
> >  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> >  {
> >  	/* if zswap is disabled, do not block pages going to the swapping device */
> > -	return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
> > +	if (!zswap_is_enabled())
> > +		return true;
>
> This is orthogonal to this patch, but I just realized that we
> completely ignore memory.zswap_writeback if zswap is disabled. This
> means that if a cgroup has disabled writeback, then zswap is globally
> disabled for some reason, we stop respecting the cgroup knob. I guess
> the rationale could be that we want to help get pages out of zswap as
> much as possible to honor zswap's disablement? Nhat, did I get that
> right?

Hmm, I think the current behavior makes more sense. If zswap is
completely disabled, it seems intuitive that zswap-related knobs lose
their effect.

> I feel like it's a little bit odd to be honest, but I don't have a
> strong opinion on it. Maybe we should document this behavior better.

But clarify this in the documentation certainly sounds good :)

>
> > +
> > +	for (; memcg; memcg = parent_mem_cgroup(memcg))
> > +		if (!READ_ONCE(memcg->zswap_writeback))
> > +			return false;
> > +
> > +	return true;
> >  }
> >
> >  static u64 zswap_current_read(struct cgroup_subsys_state *css,
> >
> > base-commit: d07b43284ab356daf7ec5ae1858a16c1c7b6adab
> > --
> > 2.46.0
> >
> >
On Tue, Aug 20, 2024 at 5:38 AM Mike Yuan <me@yhndnzj.com> wrote:
>
> On 2024-08-19 at 12:09 -0700, Yosry Ahmed wrote:
> > On Fri, Aug 16, 2024 at 7:44 AM Mike Yuan <me@yhndnzj.com> wrote:
> > >
> > > Currently, the behavior of zswap.writeback wrt.
> > > the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> > > it doesn't honor the value from parent cgroups. This
> > > surfaced when people tried to globally disable zswap writeback,
> > > i.e. reserve physical swap space only for hibernation [1] -
> > > disabling zswap.writeback only for the root cgroup results
> > > in subcgroups with zswap.writeback=1 still performing writeback.
> > >
> > > The inconsistency became more noticeable after I introduced
> > > the MemoryZSwapWriteback= systemd unit setting [2] for
> > > controlling the knob. The patch assumed that the kernel would
> > > enforce the value of parent cgroups. It could probably be
> > > workarounded from systemd's side, by going up the slice unit
> > > tree and inheriting the value. Yet I think it's more sensible
> > > to make it behave consistently with zswap.max and friends.
> > >
> > > [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
> > > [2] https://github.com/systemd/systemd/pull/31734
> > >
> > > Changes in v2:
> > > - Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
> > > - Updated Documentation/admin-guide/cgroup-v2.rst to reflect the change
> > >
> > > Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
> > >
> > > Cc: Nhat Pham <nphamcs@gmail.com>
> > > Cc: Yosry Ahmed <yosryahmed@google.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > >
> > > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> > > Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> >
> > LGTM,
> > Acked-by: Yosry Ahmed <yosryahmed@google.com>
> >
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst | 5 ++++-
> > >  mm/memcontrol.c                         | 9 ++++++++-
> > >  2 files changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 86311c2907cd..80906cea4264 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1719,7 +1719,10 @@ The following nested keys are defined.
> > >    memory.zswap.writeback
> > >  	A read-write single value file. The default value is "1". The
> > >  	initial value of the root cgroup is 1, and when a new cgroup is
> > > -	created, it inherits the current value of its parent.
> > > +	created, it inherits the current value of its parent. Note that
> > > +	this setting is hierarchical, i.e. the writeback would be
> > > +	implicitly disabled for child cgroups if the upper hierarchy
> > > +	does so.
> > >
> > >  	When this is set to 0, all swapping attempts to swapping devices
> > >  	are disabled. This included both zswap writebacks, and swapping due
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index f29157288b7d..327b2b030639 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
> > >  bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> > >  {
> > >  	/* if zswap is disabled, do not block pages going to the swapping device */
> > > -	return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
> > > +	if (!zswap_is_enabled())
> > > +		return true;
> >
> > This is orthogonal to this patch, but I just realized that we
> > completely ignore memory.zswap_writeback if zswap is disabled. This
> > means that if a cgroup has disabled writeback, then zswap is globally
> > disabled for some reason, we stop respecting the cgroup knob. I guess
> > the rationale could be that we want to help get pages out of zswap as
> > much as possible to honor zswap's disablement? Nhat, did I get that
> > right?
>
> Hmm, I think the current behavior makes more sense. If zswap is
> completely disabled, it seems intuitive that zswap-related knobs
> lose their effect.

Mike is right here. It's less of a behavioral decision, but more of a
this-can-confuse-users kind of thing :) At least that's my rationale
when I wrote this.

If users want to disable swap still, they can still do that with
memory.swap.max = 0, which is probably better because it would fail
earlier at the swap slot allocation step.

>
> > I feel like it's a little bit odd to be honest, but I don't have a
> > strong opinion on it. Maybe we should document this behavior better.
>
> But clarify this in the documentation certainly sounds good :)

But yes, better documentation == happy Nhat :)
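
For readers following the zswap-disabled discussion, the interaction between the two knobs can be written down as a toy decision function. This is only a sketch of the behavior as described in the thread, not the kernel's actual code path: struct cg_knobs and may_reach_swap_device() are hypothetical names, the hierarchy is ignored, and the ordering of the checks is an assumption based on the remark that memory.swap.max = 0 fails earlier, at swap slot allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup settings; field names mirror the cgroup v2 files. */
struct cg_knobs {
	bool zswap_writeback;   /* memory.zswap.writeback */
	unsigned long swap_max; /* memory.swap.max in bytes; 0 forbids swap */
};

/*
 * Toy decision order (an assumption, not kernel code):
 * 1. swap_max == 0 refuses at swap slot allocation, whatever zswap does;
 * 2. with zswap globally disabled, the writeback knob is ignored;
 * 3. otherwise the writeback knob decides.
 */
bool may_reach_swap_device(bool zswap_enabled, const struct cg_knobs *k)
{
	assert(k != NULL);

	if (k->swap_max == 0)
		return false;
	if (!zswap_enabled)
		return true;
	return k->zswap_writeback;
}
```

Under this model, a cgroup with zswap.writeback=0 starts swapping to the device again once zswap is globally disabled, while swap.max=0 keeps it off swap in both cases.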
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..80906cea4264 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1719,7 +1719,10 @@ The following nested keys are defined.
   memory.zswap.writeback
 	A read-write single value file. The default value is "1". The
 	initial value of the root cgroup is 1, and when a new cgroup is
-	created, it inherits the current value of its parent.
+	created, it inherits the current value of its parent. Note that
+	this setting is hierarchical, i.e. the writeback would be
+	implicitly disabled for child cgroups if the upper hierarchy
+	does so.
 
 	When this is set to 0, all swapping attempts to swapping devices
 	are disabled. This included both zswap writebacks, and swapping due
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f29157288b7d..327b2b030639 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5320,7 +5320,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
 bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 {
 	/* if zswap is disabled, do not block pages going to the swapping device */
-	return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
+	if (!zswap_is_enabled())
+		return true;
+
+	for (; memcg; memcg = parent_mem_cgroup(memcg))
+		if (!READ_ONCE(memcg->zswap_writeback))
+			return false;
+
+	return true;
 }
 
 static u64 zswap_current_read(struct cgroup_subsys_state *css,
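
To see the behavioral difference outside the kernel, the old and new checks can be modelled in plain userspace C. This is a minimal sketch under stated assumptions: struct cg_model is a hypothetical stand-in for struct mem_cgroup that keeps only a parent pointer and the knob value, the zswap_enabled parameter replaces zswap_is_enabled(), and READ_ONCE() is dropped because the model is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only what the walk needs. */
struct cg_model {
	struct cg_model *parent; /* NULL for the root cgroup */
	bool zswap_writeback;    /* value of memory.zswap.writeback */
};

/* Pre-patch logic: only the cgroup's own value is consulted. */
bool writeback_enabled_flat(bool zswap_enabled, const struct cg_model *cg)
{
	return !zswap_enabled || !cg || cg->zswap_writeback;
}

/* Post-patch logic: any ancestor with writeback == 0 disables it. */
bool writeback_enabled_hier(bool zswap_enabled, const struct cg_model *cg)
{
	/* If zswap is disabled, do not block pages going to the swap device. */
	if (!zswap_enabled)
		return true;

	for (; cg; cg = cg->parent)
		if (!cg->zswap_writeback)
			return false;

	return true;
}
```

With a root whose knob is 0 and a child whose knob is 1, the flat check reports writeback as enabled for the child and the hierarchical check reports it as disabled, which is the case from the commit message.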