diff mbox

RFT mm/oomkill: Don't skip the reaper

Message ID 20180529095528.30302-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 29, 2018, 9:55 a.m. UTC
If we find a task that has already been selected for reaping, consider
that it may still free some memory. Currently, we skip such tasks
believing that we've already extracted as memory free pages as possible
from before hitting a livelock. In practice, at least on single user
systems deliberating exercising oom, such processes have only begun to
recover their pages and are still the worst hogs on the system. If we
skip them, we just select yet more victims, and with sufficient
enthusiasm may end up with all available processes marked for reaping,
panicking in the process.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 mm/oom_kill.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

Comments

Chris Wilson June 7, 2018, 4:53 p.m. UTC | #1
Quoting Chris Wilson (2018-05-29 10:55:28)
> If we find a task that has already been selected for reaping, consider
> that it may still free some memory. Currently, we skip such tasks
> believing that we've already extracted as memory free pages as possible
> from before hitting a livelock. In practice, at least on single user
> systems deliberating exercising oom, such processes have only begun to
> recover their pages and are still the worst hogs on the system. If we
> skip them, we just select yet more victims, and with sufficient
> enthusiasm may end up with all available processes marked for reaping,
> panicking in the process.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>

I've stuck this in this topic/core-for-CI for further testing and so we
can make some progress on the mlock failures in igt.
-Chris
diff mbox

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..2ca231202af9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -49,6 +49,8 @@ 
 #define CREATE_TRACE_POINTS
 #include <trace/events/oom.h>
 
+static bool task_will_free_mem(struct task_struct *task);
+
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
@@ -201,19 +203,27 @@  unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
 
+	/*
+	 * If we cannot lock the task's mm, it is in the process of being
+	 * removed, so select it as the target of the repear and see if it
+	 * recovers enough memory to continue.
+	 */
 	p = find_lock_task_mm(p);
 	if (!p)
-		return 0;
+		return ULONG_MAX;
+
+	if (task_will_free_mem(p)) {
+		task_unlock(p);
+		return ULONG_MAX;
+	}
 
 	/*
 	 * Do not even consider tasks which are explicitly marked oom
-	 * unkillable or have been already oom reaped or the are in
-	 * the middle of vfork
+	 * unkillable or are in the middle of vfork (and so still share their
+	 * parent's mm).
 	 */
 	adj = (long)p->signal->oom_score_adj;
-	if (adj == OOM_SCORE_ADJ_MIN ||
-			test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
-			in_vfork(p)) {
+	if (adj == OOM_SCORE_ADJ_MIN || in_vfork(p)) {
 		task_unlock(p);
 		return 0;
 	}
@@ -806,11 +816,12 @@  static bool task_will_free_mem(struct task_struct *task)
 		return false;
 
 	/*
-	 * This task has already been drained by the oom reaper so there are
-	 * only small chances it will free some more
+	 * This task has already been selected for the oom reaper, but
+	 * the reaper is unable to make progress. Continue to dogpile
+	 * on top until it succeeds.
 	 */
 	if (test_bit(MMF_OOM_SKIP, &mm->flags))
-		return false;
+		return true;
 
 	if (atomic_read(&mm->mm_users) <= 1)
 		return true;