diff mbox series

mm, memcg: fix inconsistent oom event behavior

Message ID 20200412140427.6732-1-laoar.shao@gmail.com (mailing list archive)
State New, archived

Commit Message

Yafang Shao April 12, 2020, 2:04 p.m. UTC
A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
memory.events") changed the behavior of memcg events so that
memory.events now accounts for subtrees. But the oom_kill event is
special, as it is used in both cgroup1 and cgroup2. In cgroup1, it is
displayed in memory.oom_control. The file memory.oom_control exists in
both the root memcg and non-root memcgs, unlike memory.events, which
exists only in non-root memcgs. That commit is fine for cgroup2, but not
for cgroup1, where it causes inconsistent behavior between the root memcg
and non-root memcgs.
Let's restore the original behavior for cgroup1.

Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/memcontrol.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Shakeel Butt April 13, 2020, 5:05 p.m. UTC | #1
On Sun, Apr 12, 2020 at 7:04 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> memory.events") changes the behavior of memcg events, which will
> consider subtrees in memory.events. But oom_kill event is a special one
> as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> in memory.oom_control. The file memory.oom_control is in both root memcg
> and non root memcg, that is different with memory.event as it only in
> non-root memcg. That commit is okay for cgroup2, but it is not okay for
> cgroup1 as it will cause inconsistent behavior between root memcg and
> non-root memcg.

I still couldn't understand the cgroup v1's root vs non_root behavior
change. The behavior change I see is the hierarchical one i.e.
MEMCG_OOM_KILL event in the descendant will cause the notification and
count increment in the ancestors even in the cgroup v1. I suppose we
don't want that behavior change in v1.

> Let's recover the original behavior for cgroup1.
>
> Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/memcontrol.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8c340e6b347f..a0ae080a67d1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
>                 atomic_long_inc(&memcg->memory_events[event]);
>                 cgroup_file_notify(&memcg->events_file);
>
> -               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> +                   !cgroup_subsys_on_dfl(memory_cgrp_subsys))
>                         break;
>         } while ((memcg = parent_mem_cgroup(memcg)) &&
>                  !mem_cgroup_is_root(memcg));
> --
> 2.18.2
>
Chris Down April 13, 2020, 7:31 p.m. UTC | #2
Hi Yafang,

Yafang Shao writes:
>A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
>memory.events") changes the behavior of memcg events, which will
>consider subtrees in memory.events. But oom_kill event is a special one
>as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
>in memory.oom_control. The file memory.oom_control is in both root memcg
>and non root memcg, that is different with memory.event as it only in
>non-root memcg. That commit is okay for cgroup2, but it is not okay for
>cgroup1 as it will cause inconsistent behavior between root memcg and
>non-root memcg.
>Let's recover the original behavior for cgroup1.

Can you please explain the practical ramifications of this and show an 
explicitly laid out example of how this manifests, with numbers and scenarios? 
It's unclear to me that this is a real problem as is -- it may be, but there 
certainly needs to be more information.

Thanks,

Chris
Yafang Shao April 14, 2020, 12:35 a.m. UTC | #3
On Tue, Apr 14, 2020 at 1:06 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Sun, Apr 12, 2020 at 7:04 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > memory.events") changes the behavior of memcg events, which will
> > consider subtrees in memory.events. But oom_kill event is a special one
> > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > in memory.oom_control. The file memory.oom_control is in both root memcg
> > and non root memcg, that is different with memory.event as it only in
> > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > cgroup1 as it will cause inconsistent behavior between root memcg and
> > non-root memcg.
>
> I still couldn't understand the cgroup v1's root vs non_root behavior
> change. The behavior change I see is the hierarchical one i.e.
> MEMCG_OOM_KILL event in the descendant will cause the notification and
> count increment in the ancestors even in the cgroup v1.

For the non-root memcg, its memory.oom_control(oom_kill) includes its
descendants' oom_kill, but for root memcg, it doesn't include its
descendants' oom_kill. That means, memory.oom_control(oom_kill) has
different meanings in different memcgs. That is inconsistent.

[snip]
> I suppose we
> don't want that behavior change in v1.
>

That is another topic. I think this feature would be welcome in cgroup1
if we could fully support it, for example by adding memory.events.local
to cgroup1 as well, but as far as I know cgroup1 is frozen.

> > Let's recover the original behavior for cgroup1.
> >
> > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/memcontrol.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 8c340e6b347f..a0ae080a67d1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
> >                 atomic_long_inc(&memcg->memory_events[event]);
> >                 cgroup_file_notify(&memcg->events_file);
> >
> > -               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > +               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> > +                   !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> >                         break;
> >         } while ((memcg = parent_mem_cgroup(memcg)) &&
> >                  !mem_cgroup_is_root(memcg));
> > --
> > 2.18.2
> >


Thanks
Yafang
Yafang Shao April 14, 2020, 12:41 a.m. UTC | #4
On Tue, Apr 14, 2020 at 3:31 AM Chris Down <chris@chrisdown.name> wrote:
>
> Hi Yafang,
>
> Yafang Shao writes:
> >A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> >memory.events") changes the behavior of memcg events, which will
> >consider subtrees in memory.events. But oom_kill event is a special one
> >as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> >in memory.oom_control. The file memory.oom_control is in both root memcg
> >and non root memcg, that is different with memory.event as it only in
> >non-root memcg. That commit is okay for cgroup2, but it is not okay for
> >cgroup1 as it will cause inconsistent behavior between root memcg and
> >non-root memcg.
> >Let's recover the original behavior for cgroup1.
>
> Can you please explain the practical ramifications of this and show an
> explicitly laid out example of how this manifests, with numbers and scenarios?
> It's unclear to me that this is a real problem as is -- it may be, but there
> certainly needs to be more information.
>

Here's an example.

     root memcg
     /
  memcg foo
   /
memcg bar

Suppose there's an oom_kill in memcg bar. The oom_kill counts will then be:

     root memcg : memory.oom_control(oom_kill)  0
     /
  memcg foo : memory.oom_control(oom_kill)  1
   /
memcg bar : memory.oom_control(oom_kill)  1

The user then has to know whether the memcg is the root or not. If it is
the root memcg, memory.oom_control(oom_kill) counts its local events
only, while if it is a non-root memcg, memory.oom_control(oom_kill)
includes all of its descendants' oom_kill events.
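
The counts in this example can be reproduced with a small userspace model
of the loop in memcg_memory_event(). This is only a sketch under stated
assumptions: struct toy_memcg, toy_is_root() and toy_oom_kill_event() are
hypothetical stand-ins rather than kernel code, and the local_only
argument stands for the whole break condition touched by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only what the sketch needs. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root memcg */
	long oom_kill;			/* models memory_events[MEMCG_OOM_KILL] */
};

static bool toy_is_root(const struct toy_memcg *memcg)
{
	return memcg->parent == NULL;
}

/*
 * Models the propagation loop in memcg_memory_event(). When local_only
 * is true the walk stops after the memcg that saw the event; otherwise
 * it continues through the ancestors but never increments the root.
 */
static void toy_oom_kill_event(struct toy_memcg *memcg, bool local_only)
{
	do {
		memcg->oom_kill++;

		if (local_only)
			break;
	} while ((memcg = memcg->parent) && !toy_is_root(memcg));
}
```

With a root <- foo <- bar chain, calling toy_oom_kill_event(&bar, false)
models the current cgroup1 behavior (root 0, foo 1, bar 1). Passing true
models the behavior with this patch, where only bar is incremented.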


Thanks
Yafang
Shakeel Butt April 14, 2020, 12:53 a.m. UTC | #5
On Mon, Apr 13, 2020 at 5:36 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Apr 14, 2020 at 1:06 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Sun, Apr 12, 2020 at 7:04 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > > memory.events") changes the behavior of memcg events, which will
> > > consider subtrees in memory.events. But oom_kill event is a special one
> > > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > > in memory.oom_control. The file memory.oom_control is in both root memcg
> > > and non root memcg, that is different with memory.event as it only in
> > > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > > cgroup1 as it will cause inconsistent behavior between root memcg and
> > > non-root memcg.
> >
> > I still couldn't understand the cgroup v1's root vs non_root behavior
> > change. The behavior change I see is the hierarchical one i.e.
> > MEMCG_OOM_KILL event in the descendant will cause the notification and
> > count increment in the ancestors even in the cgroup v1.
>
> For the non-root memcg, its memory.oom_control(oom_kill) includes its
> descendants' oom_kill, but for root memcg, it doesn't include its
> descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> different meanings in different memcgs. That is inconsistent.
>
> [snip]
> > I suppose we
> > don't want that behavior change in v1.
> >
>
> That is another topic. I think this feature is welcomed to cgroup1, if
> we can fully support it, for example by adding memory.events.local
> into cgroup1 as well, but as far as I know the cgroup1 is frozen.
>

Please note that after your patch the non-root memcg's
memory.oom_control(oom_kill) will not include the descendant's
oom_kill anymore. The non-root and root memcg's
memory.oom_control(oom_kill) will not be hierarchical anymore but
consistent. I think that was the intention of the patch, right?

> > > Let's recover the original behavior for cgroup1.
> > >
> > > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > > Cc: Chris Down <chris@chrisdown.name>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Shakeel Butt <shakeelb@google.com>
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  include/linux/memcontrol.h | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 8c340e6b347f..a0ae080a67d1 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
> > >                 atomic_long_inc(&memcg->memory_events[event]);
> > >                 cgroup_file_notify(&memcg->events_file);
> > >
> > > -               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > > +               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> > > +                   !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > >                         break;
> > >         } while ((memcg = parent_mem_cgroup(memcg)) &&
> > >                  !mem_cgroup_is_root(memcg));
> > > --
> > > 2.18.2
> > >
>
>
> Thanks
> Yafang
Yafang Shao April 14, 2020, 12:57 a.m. UTC | #6
On Tue, Apr 14, 2020 at 8:53 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Apr 13, 2020 at 5:36 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Apr 14, 2020 at 1:06 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Sun, Apr 12, 2020 at 7:04 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > > > memory.events") changes the behavior of memcg events, which will
> > > > consider subtrees in memory.events. But oom_kill event is a special one
> > > > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > > > in memory.oom_control. The file memory.oom_control is in both root memcg
> > > > and non root memcg, that is different with memory.event as it only in
> > > > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > > > cgroup1 as it will cause inconsistent behavior between root memcg and
> > > > non-root memcg.
> > >
> > > I still couldn't understand the cgroup v1's root vs non_root behavior
> > > change. The behavior change I see is the hierarchical one i.e.
> > > MEMCG_OOM_KILL event in the descendant will cause the notification and
> > > count increment in the ancestors even in the cgroup v1.
> >
> > For the non-root memcg, its memory.oom_control(oom_kill) includes its
> > descendants' oom_kill, but for root memcg, it doesn't include its
> > descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> > different meanings in different memcgs. That is inconsistent.
> >
> > [snip]
> > > I suppose we
> > > don't want that behavior change in v1.
> > >
> >
> > That is another topic. I think this feature is welcomed to cgroup1, if
> > we can fully support it, for example by adding memory.events.local
> > into cgroup1 as well, but as far as I know the cgroup1 is frozen.
> >
>
> Please note that after your patch the non-root memcg's
> memory.oom_control(oom_kill) will not include the descendant's
> oom_kill anymore. The non-root and root memcg's
> memory.oom_control(oom_kill) will not be hierarchical anymore but
> consistent. I think that was the intention of the patch, right?
>

Right. If we can't fully support it in cgroup1, then let's not touch
its original behavior.

> > > > Let's recover the original behavior for cgroup1.
> > > >
> > > > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > > > Cc: Chris Down <chris@chrisdown.name>
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Cc: Shakeel Butt <shakeelb@google.com>
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  include/linux/memcontrol.h | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > index 8c340e6b347f..a0ae080a67d1 100644
> > > > --- a/include/linux/memcontrol.h
> > > > +++ b/include/linux/memcontrol.h
> > > > @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
> > > >                 atomic_long_inc(&memcg->memory_events[event]);
> > > >                 cgroup_file_notify(&memcg->events_file);
> > > >
> > > > -               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > > > +               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> > > > +                   !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > > >                         break;
> > > >         } while ((memcg = parent_mem_cgroup(memcg)) &&
> > > >                  !mem_cgroup_is_root(memcg));
> > > > --
> > > > 2.18.2
> > > >
> >
> >


Thanks
Yafang
Shakeel Butt April 14, 2020, 1:07 a.m. UTC | #7
On Mon, Apr 13, 2020 at 5:58 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Apr 14, 2020 at 8:53 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 5:36 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Tue, Apr 14, 2020 at 1:06 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > On Sun, Apr 12, 2020 at 7:04 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > > > > memory.events") changes the behavior of memcg events, which will
> > > > > consider subtrees in memory.events. But oom_kill event is a special one
> > > > > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > > > > in memory.oom_control. The file memory.oom_control is in both root memcg
> > > > > and non root memcg, that is different with memory.event as it only in
> > > > > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > > > > cgroup1 as it will cause inconsistent behavior between root memcg and
> > > > > non-root memcg.
> > > >
> > > > I still couldn't understand the cgroup v1's root vs non_root behavior
> > > > change. The behavior change I see is the hierarchical one i.e.
> > > > MEMCG_OOM_KILL event in the descendant will cause the notification and
> > > > count increment in the ancestors even in the cgroup v1.
> > >
> > > For the non-root memcg, its memory.oom_control(oom_kill) includes its
> > > descendants' oom_kill, but for root memcg, it doesn't include its
> > > descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> > > different meanings in different memcgs. That is inconsistent.
> > >
> > > [snip]
> > > > I suppose we
> > > > don't want that behavior change in v1.
> > > >
> > >
> > > That is another topic. I think this feature is welcomed to cgroup1, if
> > > we can fully support it, for example by adding memory.events.local
> > > into cgroup1 as well, but as far as I know the cgroup1 is frozen.
> > >
> >
> > Please note that after your patch the non-root memcg's
> > memory.oom_control(oom_kill) will not include the descendant's
> > oom_kill anymore. The non-root and root memcg's
> > memory.oom_control(oom_kill) will not be hierarchical anymore but
> > consistent. I think that was the intention of the patch, right?
> >
>
> Right. If we can't fully support it in cgroup1, then let's don't touch
> its original behavior.
>

Agreed.

> > > > > Let's recover the original behavior for cgroup1.
> > > > >
> > > > > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > > > > Cc: Chris Down <chris@chrisdown.name>
> > > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > > Cc: Shakeel Butt <shakeelb@google.com>
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

> > > > > ---
> > > > >  include/linux/memcontrol.h | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > > index 8c340e6b347f..a0ae080a67d1 100644
> > > > > --- a/include/linux/memcontrol.h
> > > > > +++ b/include/linux/memcontrol.h
> > > > > @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
> > > > >                 atomic_long_inc(&memcg->memory_events[event]);
> > > > >                 cgroup_file_notify(&memcg->events_file);
> > > > >
> > > > > -               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > > > > +               if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> > > > > +                   !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > > > >                         break;
> > > > >         } while ((memcg = parent_mem_cgroup(memcg)) &&
> > > > >                  !mem_cgroup_is_root(memcg));
> > > > > --
> > > > > 2.18.2
> > > > >
> > >
> > >
>
>
> Thanks
> Yafang
Chris Down April 14, 2020, 6:19 p.m. UTC | #8
To be clear, you're correct that this wasn't intended to result in any changes 
on cgroup v1, so I'm not against the change. Especially for stable, though, I'd 
like to understand what the real results and ramifications are here.
Yafang Shao April 18, 2020, 12:23 a.m. UTC | #9
On Wed, Apr 15, 2020 at 2:19 AM Chris Down <chris@chrisdown.name> wrote:
>
> To be clear, you're correct that this wasn't intended to result in any changes
> on cgroup v1, so I'm not against the change. Especially for stable, though, I'd
> like to understand what the real results and ramifications are here.

As explained above, any user tool that parses memory.oom_control is
affected by this behavioral change, and what's worse, the change is not
documented anywhere. I don't object if we decide that is not enough
to justify Cc: stable.
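
For illustration, a tool of the kind mentioned above might extract the
counter roughly as follows. This is a minimal sketch, not an existing
tool: parse_oom_kill() is a hypothetical helper, and it assumes the
cgroup1 memory.oom_control text consists of "name value" lines such as
oom_kill_disable, under_oom and oom_kill.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the oom_kill counter from the text of a cgroup1
 * memory.oom_control file (assumed "name value" line format).
 * Returns 0 and stores the value in *count on success, or -1 if no
 * oom_kill line is present.
 */
static int parse_oom_kill(const char *text, long *count)
{
	const char *line = text;

	while (line && *line) {
		/* "oom_kill_disable N" does not match: '_' is not a number. */
		if (sscanf(line, "oom_kill %ld", count) == 1)
			return 0;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A tool that sums or compares this value across memcgs has to know whether
the number is local or hierarchical, which is why the root versus
non-root difference matters to it.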

Thanks
Yafang

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8c340e6b347f..a0ae080a67d1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -798,7 +798,8 @@  static inline void memcg_memory_event(struct mem_cgroup *memcg,
 		atomic_long_inc(&memcg->memory_events[event]);
 		cgroup_file_notify(&memcg->events_file);
 
-		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
+		    !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 			break;
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));