diff mbox series

[2/2] memcg: do not report racy no-eligible OOM tasks

Message ID 20190107143802.16847-3-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show
Series oom, memcg: do not report racy no-eligible OOM | expand

Commit Message

Michal Hocko Jan. 7, 2019, 2:38 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

Tetsuo has reported [1] that a single process group memcg might easily
swamp the log with no-eligible oom victim reports due to race between
the memcg charge and oom_reaper

Thread 1		Thread2				oom_reaper
try_charge		try_charge
			  mem_cgroup_out_of_memory
			    mutex_lock(oom_lock)
  mem_cgroup_out_of_memory
    mutex_lock(oom_lock)
			      out_of_memory
			        select_bad_process
				oom_kill_process(current)
				  wake_oom_reaper
							  oom_reap_task
							  MMF_OOM_SKIP->victim
			    mutex_unlock(oom_lock)
    out_of_memory
      select_bad_process # no task

If Thread1 didn't race it would bail out from try_charge and force the
charge. We can achieve the same by checking tsk_is_oom_victim inside
the oom_lock and therefore close the race.

[1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Tetsuo Handa Jan. 7, 2019, 8:59 p.m. UTC | #1
On 2019/01/07 23:38, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Tetsuo has reported [1] that a single process group memcg might easily
> swamp the log with no-eligible oom victim reports due to race between
> the memcg charge and oom_reaper

This explanation is outdated. I reported that one memcg OOM killer can
kill all processes in that memcg. I expect the changelog to be updated.

> 
> Thread 1		Thread2				oom_reaper
> try_charge		try_charge
> 			  mem_cgroup_out_of_memory
> 			    mutex_lock(oom_lock)
>   mem_cgroup_out_of_memory
>     mutex_lock(oom_lock)
> 			      out_of_memory
> 			        select_bad_process
> 				oom_kill_process(current)
> 				  wake_oom_reaper
> 							  oom_reap_task
> 							  MMF_OOM_SKIP->victim
> 			    mutex_unlock(oom_lock)
>     out_of_memory
>       select_bad_process # no task
> 
> If Thread1 didn't race it would bail out from try_charge and force the
> charge. We can achieve the same by checking tsk_is_oom_victim inside
> the oom_lock and therefore close the race.
> 
> [1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index af7f18b32389..90eb2e2093e7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		.gfp_mask = gfp_mask,
>  		.order = order,
>  	};
> -	bool ret;
> +	bool ret = true;
>  
>  	mutex_lock(&oom_lock);

And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
victims", mark_oom_victim() will be called on current thread even if
we used mutex_lock_killable(&oom_lock) here, like you said

  mutex_lock_killable would take care of exiting task already. I would
  then still prefer to check for mark_oom_victim because that is not racy
  with the exit path clearing signals. I can update my patch to use
  _killable lock variant if we are really going with the memcg specific
  fix.

. If current thread is not yet killed by the OOM killer but can terminate
without invoking the OOM killer, using mutex_lock_killable(&oom_lock) here
saves some processes. What is the race you are referring by "racy with the
exit path clearing signals" ?

> +
> +	/*
> +	 * multi-threaded tasks might race with oom_reaper and gain
> +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> +	 * to out_of_memory failure if the task is the last one in
> +	 * memcg which would be a false possitive failure reported
> +	 */

Not only out_of_memory() failure. Current thread needlessly tries to
select next OOM victim. out_of_memory() failure is nothing but a result
of no eligible candidate case.

> +	if (tsk_is_oom_victim(current))
> +		goto unlock;
> +
>  	ret = out_of_memory(&oc);
> +
> +unlock:
>  	mutex_unlock(&oom_lock);
>  	return ret;
>  }
>
Michal Hocko Jan. 8, 2019, 8:14 a.m. UTC | #2
On Tue 08-01-19 05:59:49, Tetsuo Handa wrote:
> On 2019/01/07 23:38, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Tetsuo has reported [1] that a single process group memcg might easily
> > swamp the log with no-eligible oom victim reports due to race between
> > the memcg charge and oom_reaper
> 
> This explanation is outdated. I reported that one memcg OOM killer can
> kill all processes in that memcg. I expect the changelog to be updated.

I am open to refinements. Any specific wording you have in mind?

> > 
> > Thread 1		Thread2				oom_reaper
> > try_charge		try_charge
> > 			  mem_cgroup_out_of_memory
> > 			    mutex_lock(oom_lock)
> >   mem_cgroup_out_of_memory
> >     mutex_lock(oom_lock)
> > 			      out_of_memory
> > 			        select_bad_process
> > 				oom_kill_process(current)
> > 				  wake_oom_reaper
> > 							  oom_reap_task
> > 							  MMF_OOM_SKIP->victim
> > 			    mutex_unlock(oom_lock)
> >     out_of_memory
> >       select_bad_process # no task
> > 
> > If Thread1 didn't race it would bail out from try_charge and force the
> > charge. We can achieve the same by checking tsk_is_oom_victim inside
> > the oom_lock and therefore close the race.
> > 
> > [1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/memcontrol.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index af7f18b32389..90eb2e2093e7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		.gfp_mask = gfp_mask,
> >  		.order = order,
> >  	};
> > -	bool ret;
> > +	bool ret = true;
> >  
> >  	mutex_lock(&oom_lock);
> 
> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
> victims", mark_oom_victim() will be called on current thread even if
> we used mutex_lock_killable(&oom_lock) here, like you said
> 
>   mutex_lock_killable would take care of exiting task already. I would
>   then still prefer to check for mark_oom_victim because that is not racy
>   with the exit path clearing signals. I can update my patch to use
>   _killable lock variant if we are really going with the memcg specific
>   fix.
> 
> . If current thread is not yet killed by the OOM killer but can terminate
> without invoking the OOM killer, using mutex_lock_killable(&oom_lock) here
> saves some processes. What is the race you are referring by "racy with the
> exit path clearing signals" ?

This is unrelated to the patch.
 
> > +
> > +	/*
> > +	 * multi-threaded tasks might race with oom_reaper and gain
> > +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> > +	 * to out_of_memory failure if the task is the last one in
> > +	 * memcg which would be a false possitive failure reported
> > +	 */
> 
> Not only out_of_memory() failure. Current thread needlessly tries to
> select next OOM victim. out_of_memory() failure is nothing but a result
> of no eligible candidate case.

So?

Let me ask again. Does this solve the issue you are seeing? I really do
not want to end in nit picking endless thread again and would like to
move on.

> > +	if (tsk_is_oom_victim(current))
> > +		goto unlock;
> > +
> >  	ret = out_of_memory(&oc);
> > +
> > +unlock:
> >  	mutex_unlock(&oom_lock);
> >  	return ret;
> >  }
> >
kernel test robot Jan. 8, 2019, 8:35 a.m. UTC | #3
Hi Michal,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.0-rc1 next-20190108]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/oom-memcg-do-not-report-racy-no-eligible-OOM/20190108-092805
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
   include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
   mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
   mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock

vim +/__oom_kill_process +918 mm/oom_kill.c

1af8bb43 Michal Hocko          2016-07-28  845  
5989ad7b Roman Gushchin        2018-08-21  846  static void __oom_kill_process(struct task_struct *victim)
^1da177e Linus Torvalds        2005-04-16  847  {
5989ad7b Roman Gushchin        2018-08-21  848  	struct task_struct *p;
647f2bdf David Rientjes        2012-03-21  849  	struct mm_struct *mm;
bb29902a Tetsuo Handa          2016-03-25  850  	bool can_oom_reap = true;
^1da177e Linus Torvalds        2005-04-16  851  
6b0c81b3 David Rientjes        2012-07-31  852  	p = find_lock_task_mm(victim);
6b0c81b3 David Rientjes        2012-07-31  853  	if (!p) {
6b0c81b3 David Rientjes        2012-07-31  854  		put_task_struct(victim);
647f2bdf David Rientjes        2012-03-21  855  		return;
6b0c81b3 David Rientjes        2012-07-31  856  	} else if (victim != p) {
6b0c81b3 David Rientjes        2012-07-31  857  		get_task_struct(p);
6b0c81b3 David Rientjes        2012-07-31  858  		put_task_struct(victim);
6b0c81b3 David Rientjes        2012-07-31  859  		victim = p;
6b0c81b3 David Rientjes        2012-07-31  860  	}
647f2bdf David Rientjes        2012-03-21  861  
880b7689 Tetsuo Handa          2015-11-05  862  	/* Get a reference to safely compare mm after task_unlock(victim) */
647f2bdf David Rientjes        2012-03-21  863  	mm = victim->mm;
f1f10076 Vegard Nossum         2017-02-27  864  	mmgrab(mm);
8e675f7a Konstantin Khlebnikov 2017-07-06  865  
8e675f7a Konstantin Khlebnikov 2017-07-06  866  	/* Raise event before sending signal: task reaper must see this */
8e675f7a Konstantin Khlebnikov 2017-07-06  867  	count_vm_event(OOM_KILL);
fe6bdfc8 Roman Gushchin        2018-06-14  868  	memcg_memory_event_mm(mm, MEMCG_OOM_KILL);
8e675f7a Konstantin Khlebnikov 2017-07-06  869  
426fb5e7 Tetsuo Handa          2015-11-05  870  	/*
cd04ae1e Michal Hocko          2017-09-06  871  	 * We should send SIGKILL before granting access to memory reserves
cd04ae1e Michal Hocko          2017-09-06  872  	 * in order to prevent the OOM victim from depleting the memory
cd04ae1e Michal Hocko          2017-09-06  873  	 * reserves from the user space under its control.
426fb5e7 Tetsuo Handa          2015-11-05  874  	 */
079b22dc Eric W. Biederman     2018-09-03  875  	do_send_sig_info(SIGKILL, SEND_SIG_PRIV, victim, PIDTYPE_TGID);
16e95196 Johannes Weiner       2015-06-24  876  	mark_oom_victim(victim);
eca56ff9 Jerome Marchand       2016-01-14  877  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
647f2bdf David Rientjes        2012-03-21  878  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
647f2bdf David Rientjes        2012-03-21  879  		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
eca56ff9 Jerome Marchand       2016-01-14  880  		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
eca56ff9 Jerome Marchand       2016-01-14  881  		K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
647f2bdf David Rientjes        2012-03-21  882  	task_unlock(victim);
647f2bdf David Rientjes        2012-03-21  883  
647f2bdf David Rientjes        2012-03-21  884  	/*
647f2bdf David Rientjes        2012-03-21  885  	 * Kill all user processes sharing victim->mm in other thread groups, if
647f2bdf David Rientjes        2012-03-21  886  	 * any.  They don't get access to memory reserves, though, to avoid
647f2bdf David Rientjes        2012-03-21  887  	 * depletion of all memory.  This prevents mm->mmap_sem livelock when an
647f2bdf David Rientjes        2012-03-21  888  	 * oom killed thread cannot exit because it requires the semaphore and
647f2bdf David Rientjes        2012-03-21  889  	 * its contended by another thread trying to allocate memory itself.
647f2bdf David Rientjes        2012-03-21  890  	 * That thread will now get access to memory reserves since it has a
647f2bdf David Rientjes        2012-03-21  891  	 * pending fatal signal.
647f2bdf David Rientjes        2012-03-21  892  	 */
4d4048be Oleg Nesterov         2014-01-21  893  	rcu_read_lock();
c319025a Oleg Nesterov         2015-11-05  894  	for_each_process(p) {
00508538 Michal Hocko          2019-01-07  895  		struct task_struct *t;
4d7b3394 Oleg Nesterov         2015-11-05  896  		if (!process_shares_mm(p, mm))
c319025a Oleg Nesterov         2015-11-05  897  			continue;
c319025a Oleg Nesterov         2015-11-05  898  		if (same_thread_group(p, victim))
c319025a Oleg Nesterov         2015-11-05  899  			continue;
1b51e65e Michal Hocko          2016-10-07  900  		if (is_global_init(p)) {
aac45363 Michal Hocko          2016-03-25  901  			can_oom_reap = false;
862e3073 Michal Hocko          2016-10-07  902  			set_bit(MMF_OOM_SKIP, &mm->flags);
a373966d Michal Hocko          2016-07-28  903  			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
a373966d Michal Hocko          2016-07-28  904  					task_pid_nr(victim), victim->comm,
a373966d Michal Hocko          2016-07-28  905  					task_pid_nr(p), p->comm);
647f2bdf David Rientjes        2012-03-21  906  			continue;
aac45363 Michal Hocko          2016-03-25  907  		}
1b51e65e Michal Hocko          2016-10-07  908  		/*
1b51e65e Michal Hocko          2016-10-07  909  		 * No use_mm() user needs to read from the userspace so we are
1b51e65e Michal Hocko          2016-10-07  910  		 * ok to reap it.
1b51e65e Michal Hocko          2016-10-07  911  		 */
1b51e65e Michal Hocko          2016-10-07  912  		if (unlikely(p->flags & PF_KTHREAD))
1b51e65e Michal Hocko          2016-10-07  913  			continue;
079b22dc Eric W. Biederman     2018-09-03  914  		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
00508538 Michal Hocko          2019-01-07  915  		t = find_lock_task_mm(p);
00508538 Michal Hocko          2019-01-07  916  		if (!t)
00508538 Michal Hocko          2019-01-07  917  			continue;
00508538 Michal Hocko          2019-01-07 @918  		mark_oom_victim(t);
00508538 Michal Hocko          2019-01-07  919  		task_unlock(t);
647f2bdf David Rientjes        2012-03-21  920  	}
6b0c81b3 David Rientjes        2012-07-31  921  	rcu_read_unlock();
647f2bdf David Rientjes        2012-03-21  922  
aac45363 Michal Hocko          2016-03-25  923  	if (can_oom_reap)
36324a99 Michal Hocko          2016-03-25  924  		wake_oom_reaper(victim);
aac45363 Michal Hocko          2016-03-25  925  
880b7689 Tetsuo Handa          2015-11-05  926  	mmdrop(mm);
6b0c81b3 David Rientjes        2012-07-31  927  	put_task_struct(victim);
^1da177e Linus Torvalds        2005-04-16  928  }
647f2bdf David Rientjes        2012-03-21  929  #undef K
^1da177e Linus Torvalds        2005-04-16  930  

:::::: The code at line 918 was first introduced by commit
:::::: 00508538cb045f28a2f60e5d2dff98b77b0da725 mm, oom: marks all killed tasks as oom victims

:::::: TO: Michal Hocko <mhocko@suse.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Michal Hocko Jan. 8, 2019, 9:39 a.m. UTC | #4
On Tue 08-01-19 16:35:42, kbuild test robot wrote:
[...]
> All warnings (new ones prefixed by >>):
> 
>    include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
>    include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
>    mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
>    mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
> >> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock

What exactly does this warning say? I do not see anything wrong about
the code. find_lock_task_mm returns a locked task when t != NULL and
mark_oom_victim doesn't do anything about the locking. Am I missing
something or the warning is just confused?

[...]
> 00508538 Michal Hocko          2019-01-07  915  		t = find_lock_task_mm(p);
> 00508538 Michal Hocko          2019-01-07  916  		if (!t)
> 00508538 Michal Hocko          2019-01-07  917  			continue;
> 00508538 Michal Hocko          2019-01-07 @918  		mark_oom_victim(t);
> 00508538 Michal Hocko          2019-01-07  919  		task_unlock(t);
> 647f2bdf David Rientjes        2012-03-21  920  	}
Tetsuo Handa Jan. 8, 2019, 10:39 a.m. UTC | #5
On 2019/01/08 17:14, Michal Hocko wrote:
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index af7f18b32389..90eb2e2093e7 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>>  		.gfp_mask = gfp_mask,
>>>  		.order = order,
>>>  	};
>>> -	bool ret;
>>> +	bool ret = true;
>>>  
>>>  	mutex_lock(&oom_lock);
>>
>> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
>> victims", mark_oom_victim() will be called on current thread even if
>> we used mutex_lock_killable(&oom_lock) here, like you said
>>
>>   mutex_lock_killable would take care of exiting task already. I would
>>   then still prefer to check for mark_oom_victim because that is not racy
>>   with the exit path clearing signals. I can update my patch to use
>>   _killable lock variant if we are really going with the memcg specific
>>   fix.
>>
>> . If current thread is not yet killed by the OOM killer but can terminate
>> without invoking the OOM killer, using mutex_lock_killable(&oom_lock) here
>> saves some processes. What is the race you are referring by "racy with the
>> exit path clearing signals" ?
> 
> This is unrelated to the patch.

Ultimately related! This is the reasoning why your patch should be preferred
over my patch.

For example, if memcg OOM events in different domains are pending, already
OOM-killed threads needlessly wait for pending memcg OOM events in different
domains. An out_of_memory() call is slow because it involves printk(). With
slow serial consoles, out_of_memory() might take more than a second. I consider
that allowing killed processes to call mmput() from exit_mm() from do_exit()
quickly (instead of waiting for pending memcg OOM events in different domains
at mem_cgroup_out_of_memory()) helps calling __mmput() (which can reclaim more
memory than the OOM reaper can reclaim) quickly. Unless what you call "racy" is
problematic, I don't see reasons not to apply my patch. So, please please answer
what you are referring to with "racy".
Michal Hocko Jan. 8, 2019, 11:46 a.m. UTC | #6
On Tue 08-01-19 19:39:58, Tetsuo Handa wrote:
> On 2019/01/08 17:14, Michal Hocko wrote:
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index af7f18b32389..90eb2e2093e7 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >>>  		.gfp_mask = gfp_mask,
> >>>  		.order = order,
> >>>  	};
> >>> -	bool ret;
> >>> +	bool ret = true;
> >>>  
> >>>  	mutex_lock(&oom_lock);
> >>
> >> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
> >> victims", mark_oom_victim() will be called on current thread even if
> >> we used mutex_lock_killable(&oom_lock) here, like you said
> >>
> >>   mutex_lock_killable would take care of exiting task already. I would
> >>   then still prefer to check for mark_oom_victim because that is not racy
> >>   with the exit path clearing signals. I can update my patch to use
> >>   _killable lock variant if we are really going with the memcg specific
> >>   fix.
> >>
> >> . If current thread is not yet killed by the OOM killer but can terminate
> >> without invoking the OOM killer, using mutex_lock_killable(&oom_lock) here
> >> saves some processes. What is the race you are referring by "racy with the
> >> exit path clearing signals" ?
> > 
> > This is unrelated to the patch.
> 
> Ultimately related! This is the reasoning why your patch should be preferred
> over my patch.

No! I've said I do not mind using mutex_lock_killable on top of this
patch. I just want to have this fix minimal.
Chen, Rong A Jan. 11, 2019, 12:23 a.m. UTC | #7
On 01/08/2019 05:39 PM, Michal Hocko wrote:
> On Tue 08-01-19 16:35:42, kbuild test robot wrote:
> [...]
>> All warnings (new ones prefixed by >>):
>>
>>     include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
>>     include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
>>     mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
>>     mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
>>>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock
> What exactly does this warning say? I do not see anything wrong about
> the code. find_lock_task_mm returns a locked task when t != NULL and
> mark_oom_victim doesn't do anything about the locking. Am I missing
> something or the warning is just confused?

Thanks for your reply. It looks like a false positive. We'll look into it.

Best Regards,
Rong Chen

>
> [...]
>> 00508538 Michal Hocko          2019-01-07  915  		t = find_lock_task_mm(p);
>> 00508538 Michal Hocko          2019-01-07  916  		if (!t)
>> 00508538 Michal Hocko          2019-01-07  917  			continue;
>> 00508538 Michal Hocko          2019-01-07 @918  		mark_oom_victim(t);
>> 00508538 Michal Hocko          2019-01-07  919  		task_unlock(t);
>> 647f2bdf David Rientjes        2012-03-21  920  	}
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7f18b32389..90eb2e2093e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1387,10 +1387,22 @@  static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	mutex_lock(&oom_lock);
+
+	/*
+	 * multi-threaded tasks might race with oom_reaper and gain
+	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
+	 * to out_of_memory failure if the task is the last one in
+	 * memcg which would be a false possitive failure reported
+	 */
+	if (tsk_is_oom_victim(current))
+		goto unlock;
+
 	ret = out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }