mm, oom: remove sleep from under oom_lock

Message ID 20180710094341.GD14284@dhcp22.suse.cz (mailing list archive)
State New, archived

Commit Message

Michal Hocko July 10, 2018, 9:43 a.m. UTC
On Mon 09-07-18 15:49:53, David Rientjes wrote:
> On Mon, 9 Jul 2018, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
> > munlock and oom reaper unmap, v3") we have a strong synchronization
> > between the oom_killer and victim's exiting because both have to take
> > the oom_lock. Therefore the original heuristic to sleep for a short time
> > in out_of_memory doesn't serve the original purpose.
> > 
> > Moreover Tetsuo has noticed that the short sleep can be more harmful
> > than actually useful. Hammering the system with many processes can lead
> > to a starvation when the task holding the oom_lock can block for a
> > long time (minutes) and block any further progress because the
> > oom_reaper depends on the oom_lock as well.
> > 
> > Drop the short sleep from out_of_memory when we hold the lock. Keep the
> > sleep when the trylock fails to throttle the concurrent OOM paths a bit.
> > This should be solved in a more reasonable way (e.g. sleep proportional
> > to the time spent in the active reclaiming etc.) but this is a much more
> > complex thing to achieve. This is a quick fixup to remove stale code.
> > 
> > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> This reminds me:
> 
> mm/oom_kill.c
> 
> int sysctl_oom_dump_tasks = 1;
> 
> DEFINE_MUTEX(oom_lock);
> 
> #ifdef CONFIG_NUMA
> 
> Would you mind documenting oom_lock to specify what it's protecting?

What do you think about the following?

Comments

David Rientjes July 10, 2018, 6:55 p.m. UTC | #1
On Tue, 10 Jul 2018, Michal Hocko wrote:

> What do you think about the following?
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ed9d473c571e..32e6f7becb40 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -53,6 +53,14 @@ int sysctl_panic_on_oom;
>  int sysctl_oom_kill_allocating_task;
>  int sysctl_oom_dump_tasks = 1;
>  
> +/*
> + * Serializes oom killer invocations (out_of_memory()) from all contexts to
> + * prevent from over eager oom killing (e.g. when the oom killer is invoked
> + * from different domains).
> + *
> + * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
> + * and mark_oom_victim
> + */
>  DEFINE_MUTEX(oom_lock);
>  
>  #ifdef CONFIG_NUMA

I think it's better, thanks.  However, does it address the question about 
why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
helpful to mention synchronization between reaping triggered from 
oom_reaper and by exit_mmap().

David Rientjes July 10, 2018, 9:12 p.m. UTC | #2
On Tue, 10 Jul 2018, David Rientjes wrote:

> I think it's better, thanks.  However, does it address the question about 
> why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
> helpful to mention synchronization between reaping triggered from 
> oom_reaper and by exit_mmap().
> 

Actually, can't we remove the need to take oom_lock in exit_mmap() if 
__oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already 
set, bail out immediately?
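
The test-and-set idea above can be sketched in user-space C. This is only an illustration under stated assumptions, not kernel code: `struct sketch_mm`, `SKETCH_MMF_UNSTABLE`, `sketch_mm_init()` and `sketch_reap()` are hypothetical stand-ins, and C11 `atomic_fetch_or()` plays the role that a primitive such as `test_and_set_bit()` on `mm->flags` would presumably play in the kernel. Whether the flag alone is enough to drop oom_lock from exit_mmap() is the open question of this thread; the sketch only shows the "second caller bails out" behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MMF_UNSTABLE bit in mm->flags. */
#define SKETCH_MMF_UNSTABLE (1UL << 0)

/* Hypothetical stand-in for the parts of mm_struct used here. */
struct sketch_mm {
	atomic_ulong flags;	/* stand-in for mm->flags */
	int reaped_count;	/* how many times the reap body actually ran */
};

static void sketch_mm_init(struct sketch_mm *mm)
{
	assert(mm != NULL);
	atomic_init(&mm->flags, 0UL);
	mm->reaped_count = 0;
}

/* Atomically set the flag; return true if it was already set. */
static bool sketch_test_and_set_unstable(struct sketch_mm *mm)
{
	unsigned long old = atomic_fetch_or(&mm->flags, SKETCH_MMF_UNSTABLE);

	return (old & SKETCH_MMF_UNSTABLE) != 0;
}

/*
 * Stand-in for a reap entry point that may be reached from two paths
 * (the reaper thread and the exit path). Returns true if this caller
 * did the reaping, false if it bailed out because the flag was set.
 */
static bool sketch_reap(struct sketch_mm *mm)
{
	assert(mm != NULL);
	if (sketch_test_and_set_unstable(mm))
		return false;	/* someone else got here first */

	mm->reaped_count++;	/* placeholder for the real unmapping work */
	return true;
}
```

Only the first call to `sketch_reap()` on a given `struct sketch_mm` does the work; every later call returns `false` without touching the counter. The sketch says nothing about what the losing caller must wait for before it proceeds, which is the part a lock would otherwise cover.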

Michal Hocko July 11, 2018, 8:59 a.m. UTC | #3
On Tue 10-07-18 14:12:28, David Rientjes wrote:
> On Tue, 10 Jul 2018, David Rientjes wrote:
> 
> > I think it's better, thanks.  However, does it address the question about 
> > why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
> > helpful to mention synchronization between reaping triggered from 
> > oom_reaper and by exit_mmap().
> > 
> 
> Actually, can't we remove the need to take oom_lock in exit_mmap() if 
> __oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already 
> set, bail out immediately?

I think we do not really depend on oom_lock anymore in
__oom_reap_task_mm.  The race it was originally added for (mmget_not_zero
vs. exit path) is no longer a problem. I didn't really get to evaluate
it deeper though. There are just too many things going on in parallel.

Tetsuo was proposing some patches to remove the lock but those patches
had some other problems. If we have a simple patch to remove the
oom_lock from the oom reaper then I will review it. I am not sure I can
come up with a patch myself in a few days.

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ed9d473c571e..32e6f7becb40 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -53,6 +53,14 @@  int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 
+/*
+ * Serializes oom killer invocations (out_of_memory()) from all contexts to
+ * prevent from over eager oom killing (e.g. when the oom killer is invoked
+ * from different domains).
+ *
+ * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
+ * and mark_oom_victim
+ */
 DEFINE_MUTEX(oom_lock);
 
 #ifdef CONFIG_NUMA
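
The serialization described in the new comment, together with the commit message's "no sleep while holding the lock, throttle only when the trylock fails" rule, can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel implementation: `struct sketch_oom_state` and the `sketch_*` functions are hypothetical, the mutex is modelled by a C11 `atomic_flag` so the sketch needs no threading library, and a counter stands in for the short sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; the atomic_flag stands in for oom_lock. */
struct sketch_oom_state {
	atomic_flag lock;
	int kills;	/* how many times the stand-in OOM killer ran */
	int throttled;	/* how many callers backed off instead */
};

/* Try to take the lock without blocking; true on success. */
static bool sketch_trylock(struct sketch_oom_state *s)
{
	assert(s != NULL);
	return !atomic_flag_test_and_set(&s->lock);
}

static void sketch_unlock(struct sketch_oom_state *s)
{
	assert(s != NULL);
	atomic_flag_clear(&s->lock);
}

/*
 * Stand-in for an allocation slow path that may invoke the OOM killer.
 * Returns true if this caller ran the killer, false if it backed off.
 */
static bool sketch_alloc_may_oom(struct sketch_oom_state *s)
{
	if (!sketch_trylock(s)) {
		/*
		 * Another context is already handling OOM. Back off
		 * briefly; a real implementation would sleep here, the
		 * counter only records that the throttle path was taken.
		 */
		s->throttled++;
		return false;
	}

	/*
	 * Lock held: do the work and release it right away. No sleep
	 * in this branch, so other lock users are not held up.
	 */
	s->kills++;
	sketch_unlock(s);
	return true;
}
```

A caller that finds the lock taken returns immediately after the throttle step and is expected to retry its allocation, while the lock holder never sleeps with the lock held.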