mm: be more informative in OOM task list

Message ID 7de14c6cac4a486c04149f37948e3a76028f3fa5.1530461087.git.rfreire@redhat.com (mailing list archive)
State New, archived

Commit Message

Rodrigo Freire July 1, 2018, 4:09 p.m. UTC
The default page memory unit of OOM task dump events might not be
intuitive for the non-initiated when debugging OOM events. Add
a small printk prior to the task dump informing that the memory
units are actually memory _pages_.

Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)
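
For readers decoding a task dump by hand, the conversion implied by the new header line can be sketched in userspace C. This is only an illustrative sketch, not kernel code: pages_to_kib() and system_page_size() are hypothetical helper names, and the 4096-byte fallback is an assumption that does not hold on every architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Convert a page count (as printed in the total_vm and rss columns)
 * to kB for a given page size in bytes.
 * Hypothetical helper, not part of the kernel.
 */
static uint64_t pages_to_kib(uint64_t pages, uint64_t page_size_bytes)
{
	/* Page sizes are expected to be whole multiples of 1 kB. */
	assert(page_size_bytes % 1024 == 0);
	return pages * (page_size_bytes / 1024);
}

/* Page size of the running system, as reported by sysconf(). */
static uint64_t system_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	/* Assumption: fall back to 4096 bytes if sysconf() fails. */
	return sz > 0 ? (uint64_t)sz : 4096;
}
```

With 4096-byte pages, an rss value of 1000 corresponds to 4000 kB rather than 1000 kB, which is the misreading the patch is meant to prevent. The pgtables_bytes column is, as its name says, in bytes.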

Comments

Michal Hocko July 2, 2018, 9:30 a.m. UTC | #1
On Sun 01-07-18 13:09:40, Rodrigo Freire wrote:
> The default page memory unit of OOM task dump events might not be
> intuitive for the non-initiated when debugging OOM events. Add
> a small printk prior to the task dump informing that the memory
> units are actually memory _pages_.

Does this really help? I understand the OOM report might not be the
easiest thing to grasp, but wouldn't it be much better to actually add
documentation clarifying each part of it?

> Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84081e7..b4d9557 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -392,6 +392,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  	struct task_struct *p;
>  	struct task_struct *task;
>  
> +	pr_info("Tasks state (memory values in pages):\n");
>  	pr_info("[ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>  	rcu_read_lock();
>  	for_each_process(p) {
> -- 
> 1.8.3.1
Rodrigo Freire July 2, 2018, 11:22 a.m. UTC | #2
Hello Michal,

----- Original Message ----- 
> From: "Michal Hocko" <mhocko@kernel.org>
> To: "Rodrigo Freire" <rfreire@redhat.com>
> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> Sent: Monday, July 2, 2018 6:30:43 AM
> Subject: Re: [PATCH] mm: be more informative in OOM task list
>
> On Sun 01-07-18 13:09:40, Rodrigo Freire wrote:
> > The default page memory unit of OOM task dump events might not be
> > intuitive for the non-initiated when debugging OOM events. Add
> > a small printk prior to the task dump informing that the memory
> > units are actually memory _pages_.
>
> Does this really help? I understand the OOM report might not be the
> easiest thing to grasp, but wouldn't it be much better to actually add
> documentation clarifying each part of it?

That would be great: after a quick grep -ri for oom in Documentation,
I found several files, each containing its own OOM behaviour modifier
configurations. But Documentation indeed lacks a central, canonical file
that documents the OOM Killer behavior and workflows.

However, I still stand by my proposed patch: it is unobtrusive, incurs
no performance cost, and clarifies the output. I recently worked on a
case (full disclosure: I am a far cry from an MM expert) where the sum
of the RSS values made sense when interpreted as kB rather than pages.
Reason: there were processes sharing (a good amount of) memory regions,
which skewed the interpretation. That misled not only me but some other
colleagues as well: the units were only sorted out after actually
inspecting the source code.

This patch is user-friendly and can be a great time saver to others in
the community.

I kindly request the ACKed-by ;-)

Have a great week,

- RF.
Michal Hocko July 2, 2018, 11:29 a.m. UTC | #3
On Mon 02-07-18 07:22:13, Rodrigo Freire wrote:
> Hello Michal,
> 
> ----- Original Message ----- 
> > From: "Michal Hocko" <mhocko@kernel.org>
> > To: "Rodrigo Freire" <rfreire@redhat.com>
> > Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> > Sent: Monday, July 2, 2018 6:30:43 AM
> > Subject: Re: [PATCH] mm: be more informative in OOM task list
> >
> > On Sun 01-07-18 13:09:40, Rodrigo Freire wrote:
> > > The default page memory unit of OOM task dump events might not be
> > > intuitive for the non-initiated when debugging OOM events. Add
> > > a small printk prior to the task dump informing that the memory
> > > units are actually memory _pages_.
> >
> > Does this really help? I understand the OOM report might not be the
> > easiest thing to grasp, but wouldn't it be much better to actually add
> > documentation clarifying each part of it?
> 
> That would be great: after a quick grep -ri for oom in Documentation,
> I found several files, each containing its own OOM behaviour modifier
> configurations. But Documentation indeed lacks a central, canonical file
> that documents the OOM Killer behavior and workflows.
> 
> However, I still stand by my proposed patch: it is unobtrusive, incurs
> no performance cost, and clarifies the output. I recently worked on a
> case (full disclosure: I am a far cry from an MM expert) where the sum
> of the RSS values made sense when interpreted as kB rather than pages.
> Reason: there were processes sharing (a good amount of) memory regions,
> which skewed the interpretation. That misled not only me but some other
> colleagues as well: the units were only sorted out after actually
> inspecting the source code.
> 
> This patch is user-friendly and can be a great time saver to others in
> the community.

Well, all other counters we print are in page units unless explicitly
kB. So I am not sure we really need to do anything but document the
output better. Maybe others will find it more important though.
Rodrigo Freire July 2, 2018, 11:39 a.m. UTC | #4
Hello Michal!

----- Original Message ----- 
> From: "Michal Hocko" <mhocko@kernel.org>
> To: "Rodrigo Freire" <rfreire@redhat.com>
> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> Sent: Monday, July 2, 2018 8:29:06 AM
> Subject: Re: [PATCH] mm: be more informative in OOM task list
>
> On Mon 02-07-18 07:22:13, Rodrigo Freire wrote:
> > Hello Michal,
> >
> > ----- Original Message -----
> > > From: "Michal Hocko" <mhocko@kernel.org>
> > > To: "Rodrigo Freire" <rfreire@redhat.com>
> > > Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> > > Sent: Monday, July 2, 2018 6:30:43 AM
> > > Subject: Re: [PATCH] mm: be more informative in OOM task list
> > >
> > > On Sun 01-07-18 13:09:40, Rodrigo Freire wrote:
> > > > The default page memory unit of OOM task dump events might not be
> > > > intuitive for the non-initiated when debugging OOM events. Add
> > > > a small printk prior to the task dump informing that the memory
> > > > units are actually memory _pages_.
> > >
> > > Does this really help? I understand the OOM report might not be the
> > > easiest thing to grasp, but wouldn't it be much better to actually add
> > > documentation clarifying each part of it?
> >
> > That would be great: after a quick grep -ri for oom in Documentation,
> > I found several files, each containing its own OOM behaviour modifier
> > configurations. But Documentation indeed lacks a central, canonical file
> > that documents the OOM Killer behavior and workflows.
> >
> > However, I still stand by my proposed patch: it is unobtrusive, incurs
> > no performance cost, and clarifies the output. I recently worked on a
> > case (full disclosure: I am a far cry from an MM expert) where the sum
> > of the RSS values made sense when interpreted as kB rather than pages.
> > Reason: there were processes sharing (a good amount of) memory regions,
> > which skewed the interpretation. That misled not only me but some other
> > colleagues as well: the units were only sorted out after actually
> > inspecting the source code.
> >
> > This patch is user-friendly and can be a great time saver to others in
> > the community.
>
> Well, all other counters we print are in page units unless explicitly
> kB. 

Your statement is correct, and I thought about that too. But then came
the doubt:
* Maybe someone forgot to state that these values are in kB?

> So I am not sure we really need to do anything but document the
> output better. Maybe others will find it more important though.

The thing is, it also led some other colleagues (a few!) to think the
very same as me. That raised the flag and made me write the patch:
the output was indeed misleading.
And you may not have an MM and OOM-versed specialist available all the
time! ;-)

I still ask you to reconsider.

My best regards,

- RF.
David Rientjes July 4, 2018, 1:34 a.m. UTC | #5
On Sun, 1 Jul 2018, Rodrigo Freire wrote:

> The default page memory unit of OOM task dump events might not be
> intuitive for the non-initiated when debugging OOM events. Add
> a small printk prior to the task dump informing that the memory
> units are actually memory _pages_.
> 
> Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84081e7..b4d9557 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -392,6 +392,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  	struct task_struct *p;
>  	struct task_struct *task;
>  
> +	pr_info("Tasks state (memory values in pages):\n");
>  	pr_info("[ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>  	rcu_read_lock();
>  	for_each_process(p) {

As the author of dump_tasks(), and having seen these values misinterpreted
on more than one occasion, I think this is a valuable addition.

Could you also expand out the "pid" field to allow for seven digits 
instead of five?  I think everything else is aligned.

Feel free to add

Acked-by: David Rientjes <rientjes@google.com>

to a v2.
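
The pid widening requested above amounts to changing a width specifier in the row format. A minimal userspace sketch, assuming a cut-down row: struct task_row and format_task_row() are hypothetical names, and the real dump_tasks() row prints more columns through pr_info().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for one row of the OOM task dump. */
struct task_row {
	int pid;
	unsigned long total_vm;	/* in pages */
	unsigned long rss;	/* in pages */
	const char *name;
};

/*
 * Format one row with the pid right-aligned in a seven-character field
 * ("%7d") instead of five ("%5d"), so seven-digit pids do not push the
 * remaining columns out of alignment.
 * Returns the number of characters snprintf() would have written.
 */
static int format_task_row(char *buf, size_t len, const struct task_row *row)
{
	assert(buf != NULL && row != NULL);
	return snprintf(buf, len, "[%7d] %8lu %8lu %s",
			row->pid, row->total_vm, row->rss, row->name);
}
```

The column header would need the same treatment, with the "pid" label padded so its brackets span the wider field.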
Rafael Aquini July 4, 2018, 2:36 p.m. UTC | #6
On Tue, Jul 03, 2018 at 06:34:48PM -0700, David Rientjes wrote:
> On Sun, 1 Jul 2018, Rodrigo Freire wrote:
> 
> > The default page memory unit of OOM task dump events might not be
> > intuitive for the non-initiated when debugging OOM events. Add
> > a small printk prior to the task dump informing that the memory
> > units are actually memory _pages_.
> > 
> > Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
> > ---
> >  mm/oom_kill.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 84081e7..b4d9557 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -392,6 +392,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
> >  	struct task_struct *p;
> >  	struct task_struct *task;
> >  
> > +	pr_info("Tasks state (memory values in pages):\n");
> >  	pr_info("[ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
> >  	rcu_read_lock();
> >  	for_each_process(p) {
> 
> As the author of dump_tasks(), and having seen these values misinterpreted
> on more than one occasion, I think this is a valuable addition.
> 
> Could you also expand out the "pid" field to allow for seven digits 
> instead of five?  I think everything else is aligned.
> 
> Feel free to add
> 
> Acked-by: David Rientjes <rientjes@google.com>
> 
> to a v2.
>

Same here, for a v2:
 
Acked-by: Rafael Aquini <aquini@redhat.com>

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84081e7..b4d9557 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -392,6 +392,7 @@  static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 	struct task_struct *p;
 	struct task_struct *task;
 
+	pr_info("Tasks state (memory values in pages):\n");
 	pr_info("[ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
 	rcu_read_lock();
 	for_each_process(p) {