[1/2] mm, oom: marks all killed tasks as oom victims

Message ID	20190107143802.16847-2-mhocko@kernel.org (mailing list archive)
State	New, archived
Headers
Return-Path: <owner-linux-mm@kvack.org>
Received-SPF: pass (google.com: domain of mstsxfx@gmail.com designates 209.85.220.65 as permitted sender) client-ip=209.85.220.65;
From: Michal Hocko <mhocko@kernel.org>
To: <linux-mm@kvack.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH 1/2] mm, oom: marks all killed tasks as oom victims
Date: Mon, 7 Jan 2019 15:38:01 +0100
Message-Id: <20190107143802.16847-2-mhocko@kernel.org>
In-Reply-To: <20190107143802.16847-1-mhocko@kernel.org>
References: <20190107143802.16847-1-mhocko@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Series	oom, memcg: do not report racy no-eligible OOM
[0/2] oom, memcg: do not report racy no-eligible OOM
[1/2] mm, oom: marks all killed tasks as oom victims
[2/2] memcg: do not report racy no-eligible OOM tasks
[3/2] memcg: Facilitate termination of memcg OOM victims.

Commit Message

Michal Hocko Jan. 7, 2019, 2:38 p.m. UTC

From: Michal Hocko <mhocko@suse.com>

Historically we have called mark_oom_victim only on the main task
selected as the oom victim, because oom victims have access to memory
reserves and granting that access to all killed tasks could deplete
the reserves very quickly and cause even larger problems.

Since only partial access to memory reserves is allowed now, this risk
no longer exists, so all tasks killed along with the oom victim can be
marked as oom victims as well.

The primary motivation is that process groups which do not share
signals would then behave more like standard thread groups wrt oom
handling (i.e. tsk_is_oom_victim will work the same way for them).

- Use find_lock_task_mm to stabilize mm as suggested by Tetsuo

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 6 ++++++
 1 file changed, 6 insertions(+)
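
Editorial sketch of the motivation above, as a userspace toy model rather than kernel code. It assumes that victim status is recorded per signal struct, so a process that shares the mm but not the signal struct is not seen as a victim unless it is marked explicitly. All toy_* names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; every name here is hypothetical. */
struct toy_mm { int id; };

struct toy_signal {
	struct toy_mm *oom_mm;	/* non-NULL once a task using this signal struct is marked */
};

struct toy_task {
	struct toy_mm *mm;		/* address space, possibly shared between processes */
	struct toy_signal *signal;	/* shared only by threads of one thread group */
};

/* Model of marking: record the mm in the per-signal state, once. */
static void toy_mark_oom_victim(struct toy_task *t)
{
	if (!t->signal->oom_mm)
		t->signal->oom_mm = t->mm;
}

/* Model of the query: victim status is looked up through the signal struct. */
static bool toy_is_oom_victim(const struct toy_task *t)
{
	return t->signal->oom_mm != NULL;
}

/*
 * Kill everything that shares the victim's mm. With mark_all == false only
 * the selected victim is marked (the historical behaviour described above);
 * with mark_all == true every killed task is marked too (the proposal).
 */
static void toy_oom_kill(struct toy_task *victim, struct toy_task *tasks,
			 size_t n, bool mark_all)
{
	toy_mark_oom_victim(victim);
	for (size_t i = 0; i < n; i++) {
		struct toy_task *p = &tasks[i];

		/* Skip unrelated tasks and the victim's own thread group. */
		if (p->mm != victim->mm || p->signal == victim->signal)
			continue;
		/* The kernel would deliver SIGKILL to p at this point. */
		if (mark_all)
			toy_mark_oom_victim(p);
	}
}
```

In this model, two tasks that share one toy_mm but have separate toy_signal structs disagree about toy_is_oom_victim() after toy_oom_kill(..., false) and agree after toy_oom_kill(..., true).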

Comments

Tetsuo Handa Jan. 7, 2019, 8:58 p.m. UTC | #1

On 2019/01/07 23:38, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Historically we have called mark_oom_victim only on the main task
> selected as the oom victim, because oom victims have access to memory
> reserves and granting that access to all killed tasks could deplete
> the reserves very quickly and cause even larger problems.
> 
> Since only partial access to memory reserves is allowed now, this risk
> no longer exists, so all tasks killed along with the oom victim can be
> marked as oom victims as well.
> 
> The primary motivation is that process groups which do not share
> signals would then behave more like standard thread groups wrt oom
> handling (i.e. tsk_is_oom_victim will work the same way for them).
> 
> - Use find_lock_task_mm to stabilize mm as suggested by Tetsuo
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/oom_kill.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f0e8cd9edb1a..0246c7a4e44e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -892,6 +892,7 @@ static void __oom_kill_process(struct task_struct *victim)
>  	 */
>  	rcu_read_lock();
>  	for_each_process(p) {
> +		struct task_struct *t;
>  		if (!process_shares_mm(p, mm))
>  			continue;
>  		if (same_thread_group(p, victim))
> @@ -911,6 +912,11 @@ static void __oom_kill_process(struct task_struct *victim)
>  		if (unlikely(p->flags & PF_KTHREAD))
>  			continue;
>  		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
> +		t = find_lock_task_mm(p);
> +		if (!t)
> +			continue;
> +		mark_oom_victim(t);
> +		task_unlock(t);

Thank you for updating this patch. This patch is correct from the point of
view of avoiding the TIF_MEMDIE race. But if I recall correctly, the reason
we did not do this was to avoid depleting memory reserves. And we still
grant full access to memory reserves in the CONFIG_MMU=n case. Shouldn't the
changelog mention the CONFIG_MMU=n case?

>  	}
>  	rcu_read_unlock();
>  
>
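
Editorial sketch of the distinction raised in this comment, as a userspace toy model rather than kernel code. It contrasts partial access to reserves (the case the changelog relies on) with full access (the CONFIG_MMU=n case mentioned above). The toy_* names are hypothetical, and the "half of the reserve" factor is an assumption chosen for illustration, not a statement about the allocator's exact arithmetic.

```c
#include <stdbool.h>

/* Hypothetical access levels for this illustration only. */
enum toy_reserve_access {
	TOY_RESERVE_NONE,	/* ordinary allocation: must stay above the watermark */
	TOY_RESERVE_PARTIAL,	/* oom victim with an MMU: may dip partway into reserves */
	TOY_RESERVE_FULL,	/* oom victim without an MMU: no watermark check */
};

/* Which level a task gets, given its victim status and the MMU configuration. */
static enum toy_reserve_access toy_victim_access(bool is_oom_victim, bool has_mmu)
{
	if (!is_oom_victim)
		return TOY_RESERVE_NONE;
	return has_mmu ? TOY_RESERVE_PARTIAL : TOY_RESERVE_FULL;
}

/* Would an allocation be allowed with free_pages left and a min watermark? */
static bool toy_alloc_allowed(long free_pages, long min_wmark,
			      enum toy_reserve_access access)
{
	switch (access) {
	case TOY_RESERVE_FULL:
		return free_pages > 0;
	case TOY_RESERVE_PARTIAL:
		/* Assumption for illustration: victims may use half of the reserve. */
		return free_pages > min_wmark / 2;
	default:
		return free_pages > min_wmark;
	}
}
```

Under these assumptions, marking more tasks as victims with an MMU only lets them share a bounded slice of the reserve, whereas in the no-MMU case each marked task could drain it completely, which is the concern voiced here.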

Michal Hocko Jan. 8, 2019, 8:11 a.m. UTC | #2

On Tue 08-01-19 05:58:41, Tetsuo Handa wrote:
> Thank you for updating this patch. This patch is correct from the point of
> view of avoiding the TIF_MEMDIE race. But if I recall correctly, the reason
> we did not do this was to avoid depleting memory reserves. And we still
> grant full access to memory reserves in the CONFIG_MMU=n case. Shouldn't the
> changelog mention the CONFIG_MMU=n case?

Like so many times before. Does nommu matter in this context at all? You
keep bringing it up without actually trying to understand that nommu is
so special that reserves for those architectures are of very limited
use. I do not really see much point mentioning nommu in every oom patch.

Or do you know of a nommu oom killer bug out there? I would be more than
curious. Seriously.

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f0e8cd9edb1a..0246c7a4e44e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -892,6 +892,7 @@  static void __oom_kill_process(struct task_struct *victim)
 	 */
 	rcu_read_lock();
 	for_each_process(p) {
+		struct task_struct *t;
 		if (!process_shares_mm(p, mm))
 			continue;
 		if (same_thread_group(p, victim))
@@ -911,6 +912,11 @@  static void __oom_kill_process(struct task_struct *victim)
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+		t = find_lock_task_mm(p);
+		if (!t)
+			continue;
+		mark_oom_victim(t);
+		task_unlock(t);
 	}
 	rcu_read_unlock();
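
Editorial sketch of the pattern the added lines follow, as a userspace toy model rather than kernel code. It assumes that the lookup helper walks the threads of a process, returns one that still has an mm with its lock held, and returns NULL if none is left. The toy_* names and the boolean "lock" are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_thread {
	void *mm;	/* NULL once the thread has released its address space */
	bool locked;	/* stands in for task_lock()/task_unlock() */
	bool marked;	/* stands in for the effect of mark_oom_victim() */
};

/* Return the first thread that still has an mm, with its lock held, or NULL. */
static struct toy_thread *toy_find_lock_thread_mm(struct toy_thread *threads,
						  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_thread *t = &threads[i];

		t->locked = true;
		if (t->mm)
			return t;	/* caller must unlock */
		t->locked = false;
	}
	return NULL;
}

/* Mirror of the hunk above: look up, bail out if nothing is left, mark, unlock. */
static bool toy_mark_group(struct toy_thread *threads, size_t n)
{
	struct toy_thread *t = toy_find_lock_thread_mm(threads, n);

	if (!t)
		return false;	/* every thread already dropped its mm */
	t->marked = true;	/* in this model the mm is stable while locked */
	t->locked = false;
	return true;
}
```

The point of the model is the ordering: the mark is applied only while the lock is held on a thread whose mm is known to be present, and a group whose threads have all exited is skipped.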
