
[v15,2/2] Add oom victim's memcg to the oom context information

Message ID 1542799799-36184-2-git-send-email-ufo19890607@gmail.com (mailing list archive)
State New, archived
Series: [v15,1/2] Reorganize the oom report in dump_header

Commit Message

禹舟键 Nov. 21, 2018, 11:29 a.m. UTC
From: yuzhoujian <yuzhoujian@didichuxing.com>

The current oom report doesn't display the victim's memcg context during a
global OOM situation. While this information is not strictly needed, it
can be really helpful in containerized environments for locating which
container has lost a process. Now that we have a single line for the oom
context, we can trivially add both the oom memcg (either global_oom or
the specific memcg that hit its hard limit) and task_memcg, which is the
victim's memcg.

Below is the single line output in the oom report after this patch.
- global oom context information:
oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,global_oom,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
- memcg oom context information:
oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
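
A condensed sketch of how this line is assembled (lifted from the
dump_oom_summary() and mem_cgroup_print_oom_context() hunks below, not
verbatim kernel code; the comments note which fields each call
contributes):

	pr_info("oom-kill:constraint=%s,nodemask=%*pbl",
		oom_constraint_text[oc->constraint],
		nodemask_pr_args(oc->nodemask));
	cpuset_print_current_mems_allowed();	/* ,cpuset=...,mems_allowed=... */
	mem_cgroup_print_oom_context(oc->memcg, victim);	/* ,global_oom or ,oom_memcg=...; then ,task_memcg=... */
	pr_cont(",task=%s,pid=%d,uid=%d\n", victim->comm, victim->pid,
		from_kuid(&init_user_ns, task_uid(victim)));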

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
---
 include/linux/memcontrol.h | 11 +++++++++--
 mm/memcontrol.c            | 33 ++++++++++++++++++++-------------
 mm/oom_kill.c              |  3 ++-
 3 files changed, 31 insertions(+), 16 deletions(-)

Comments

Michal Hocko Nov. 22, 2018, 1:39 p.m. UTC | #1
On Wed 21-11-18 19:29:59, ufo19890607@gmail.com wrote:
> From: yuzhoujian <yuzhoujian@didichuxing.com>
> 
> The current oom report doesn't display the victim's memcg context during a
> global OOM situation. While this information is not strictly needed, it
> can be really helpful in containerized environments for locating which
> container has lost a process. Now that we have a single line for the oom
> context, we can trivially add both the oom memcg (either global_oom or
> the specific memcg that hit its hard limit) and task_memcg, which is the
> victim's memcg.
> 
> Below is the single line output in the oom report after this patch.
> - global oom context information:
> oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,global_oom,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
> - memcg oom context information:
> oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
> 
> Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>

I thought I had acked this one already.
Acked-by: Michal Hocko <mhocko@suse.com>
禹舟键 Nov. 23, 2018, 6:11 a.m. UTC | #2
Hi Michal
I just rebased the patch onto the latest version.


Michal Hocko <mhocko@kernel.org> wrote on Thu, Nov 22, 2018 at 9:39 PM:

> On Wed 21-11-18 19:29:59, ufo19890607@gmail.com wrote:
> > From: yuzhoujian <yuzhoujian@didichuxing.com>
> >
> > The current oom report doesn't display the victim's memcg context during a
> > global OOM situation. While this information is not strictly needed, it
> > can be really helpful in containerized environments for locating which
> > container has lost a process. Now that we have a single line for the oom
> > context, we can trivially add both the oom memcg (either global_oom or
> > the specific memcg that hit its hard limit) and task_memcg, which is the
> > victim's memcg.
> >
> > Below is the single line output in the oom report after this patch.
> > - global oom context information:
> > oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,global_oom,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
> > - memcg oom context information:
> > oom-kill:constraint=<constraint>,nodemask=<nodemask>,cpuset=<cpuset>,mems_allowed=<mems_allowed>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
> >
> > Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
>
> I thought I had acked this one already.
> Acked-by: Michal Hocko <mhocko@suse.com>
> --
> Michal Hocko
> SUSE Labs
>
Tetsuo Handa Dec. 19, 2018, 7:23 a.m. UTC | #3
Andrew, will you fold the diff below into "mm, oom: add oom victim's memcg to the oom context information"?

From add1e8daddbfc5186417dbc58e9e11e7614868f8 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 19 Dec 2018 16:09:31 +0900
Subject: [PATCH] mm, oom: Use pr_cont() in mem_cgroup_print_oom_context().

The one-line summary of the OOM killer context is split across two
lines because KERN_CONT (pr_cont()) is not used.

[   23.346650] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0
[   23.346691] ,global_oom,task_memcg=/,task=firewalld,pid=5096,uid=0
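
In other words, pr_info() always opens a new printk record, while
pr_cont() (printk(KERN_CONT ...)) appends to the record that is still
open. A minimal illustration, not part of the fix itself:

	pr_info("oom-kill:constraint=CONSTRAINT_NONE");	/* opens a record */
	pr_info(",global_oom");	/* bug: opens a second record, hence the extra line */
	pr_cont(",global_oom");	/* fix: appends to the still-open record */

With the fix, the report above becomes a single record:

oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=firewalld,pid=5096,uid=0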

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b860dd4f7..4afd597 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1306,10 +1306,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
 	rcu_read_lock();
 
 	if (memcg) {
-		pr_info(",oom_memcg=");
+		pr_cont(",oom_memcg=");
 		pr_cont_cgroup_path(memcg->css.cgroup);
 	} else
-		pr_info(",global_oom");
+		pr_cont(",global_oom");
 	if (p) {
 		pr_cont(",task_memcg=");
 		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
Michal Hocko Dec. 19, 2018, 9:39 a.m. UTC | #4
On Wed 19-12-18 16:23:39, Tetsuo Handa wrote:
> Andrew, will you fold the diff below into "mm, oom: add oom victim's memcg to the oom context information"?
> 
> From add1e8daddbfc5186417dbc58e9e11e7614868f8 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 19 Dec 2018 16:09:31 +0900
> Subject: [PATCH] mm, oom: Use pr_cont() in mem_cgroup_print_oom_context().
> 
> The one-line summary of the OOM killer context is split across two
> lines because KERN_CONT (pr_cont()) is not used.
> 
> [   23.346650] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0
> [   23.346691] ,global_oom,task_memcg=/,task=firewalld,pid=5096,uid=0
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Sorry, I missed that during review. Thanks for catching this!

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b860dd4f7..4afd597 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1306,10 +1306,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
>  	rcu_read_lock();
>  
>  	if (memcg) {
> -		pr_info(",oom_memcg=");
> +		pr_cont(",oom_memcg=");
>  		pr_cont_cgroup_path(memcg->css.cgroup);
>  	} else
> -		pr_info(",global_oom");
> +		pr_cont(",global_oom");
>  	if (p) {
>  		pr_cont(",task_memcg=");
>  		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> -- 
> 1.8.3.1

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7ab2120..83ae11c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -526,9 +526,11 @@  unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
 
 unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
 
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
 				struct task_struct *p);
 
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg);
+
 static inline void mem_cgroup_enter_user_fault(void)
 {
 	WARN_ON(current->in_user_fault);
@@ -970,7 +972,12 @@  static inline unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg)
 }
 
 static inline void
-mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p)
+{
+}
+
+static inline void
+mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e1469b..b860dd4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1293,32 +1293,39 @@  static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
 /**
- * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
+ * mem_cgroup_print_oom_context: Print OOM information relevant to
+ * memory controller.
  * @memcg: The memory cgroup that went over limit
  * @p: Task that is going to be killed
  *
  * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
  * enabled
  */
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p)
 {
-	struct mem_cgroup *iter;
-	unsigned int i;
-
 	rcu_read_lock();
 
+	if (memcg) {
+		pr_info(",oom_memcg=");
+		pr_cont_cgroup_path(memcg->css.cgroup);
+	} else
+		pr_info(",global_oom");
 	if (p) {
-		pr_info("Task in ");
+		pr_cont(",task_memcg=");
 		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
-		pr_cont(" killed as a result of limit of ");
-	} else {
-		pr_info("Memory limit reached of cgroup ");
 	}
-
-	pr_cont_cgroup_path(memcg->css.cgroup);
-	pr_cont("\n");
-
 	rcu_read_unlock();
+}
+
+/**
+ * mem_cgroup_print_oom_meminfo: Print OOM memory information relevant to
+ * memory controller.
+ * @memcg: The memory cgroup that went over limit
+ */
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+	unsigned int i;
 
 	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
 		K((u64)page_counter_read(&memcg->memory)),
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2c686d2..6fd1ead 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -435,6 +435,7 @@  static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
 			oom_constraint_text[oc->constraint],
 			nodemask_pr_args(oc->nodemask));
 	cpuset_print_current_mems_allowed();
+	mem_cgroup_print_oom_context(oc->memcg, victim);
 	pr_cont(",task=%s,pid=%d,uid=%d\n", victim->comm, victim->pid,
 		from_kuid(&init_user_ns, task_uid(victim)));
 }
@@ -449,7 +450,7 @@  static void dump_header(struct oom_control *oc, struct task_struct *p)
 
 	dump_stack();
 	if (is_memcg_oom(oc))
-		mem_cgroup_print_oom_info(oc->memcg, p);
+		mem_cgroup_print_oom_meminfo(oc->memcg);
 	else {
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
 		if (is_dump_unreclaim_slabs())