From patchwork Fri Dec 20 06:26:12 2019
X-Patchwork-Id: 11304845
From: zgpeng.linux@gmail.com
To: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, shakeelb@google.com
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, zgpeng
Subject: [PATCH] oom: choose a more suitable process to kill while all processes are not killable
Date: Fri, 20 Dec 2019 14:26:12 +0800
Message-Id: <1576823172-25943-1-git-send-email-zgpeng.linux@gmail.com>

From: zgpeng

In several business scenarios we have found that when an OOM occurs in a
cgroup, the process that consumes the most memory in that cgroup is not
the one killed first. Analysis showed that every process in the cgroup
has oom_score_adj set to -998, and oom_badness() clamps any negative
points value to 1. For these processes, which are not meant to be killed,
the kernel therefore cannot tell which one consumes the most memory.
As a result, a process with low memory consumption may be killed first.
A similar problem appears on machines with large memory even when
oom_score_adj is much less negative, such as -1: the adjustment is scaled
to -(totalpages / 1000) pages, so every process using less memory than
that ends up with negative points and is clamped to 1 as well.

The existing OOM killer can be improved for these scenarios. This patch
changes oom_badness() so that negative points are no longer clamped and
tasks with negative scores can still be ranked by memory consumption.
When an OOM occurs, the process consuming the most memory is then killed
first.

Signed-off-by: zgpeng
---
 drivers/tty/sysrq.c |  1 +
 fs/proc/base.c      |  6 ++++--
 include/linux/oom.h |  4 ++--
 mm/memcontrol.c     |  1 +
 mm/oom_kill.c       | 21 +++++++++------------
 mm/page_alloc.c     |  1 +
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 573b205..1fe79d9 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -367,6 +367,7 @@ static void moom_callback(struct work_struct *ignored)
 		.memcg = NULL,
 		.gfp_mask = gfp_mask,
 		.order = -1,
+		.chosen_points = LONG_MIN,
 	};
 
 	mutex_lock(&oom_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ebea950..2c7c4e0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -548,9 +548,11 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 			  struct pid *pid, struct task_struct *task)
 {
 	unsigned long totalpages = totalram_pages() + total_swap_pages;
-	unsigned long points = 0;
+	long points = 0;
 
-	points = oom_badness(task, totalpages) * 1000 / totalpages;
+	points = oom_badness(task, totalpages);
+	points = points > 0 ? points : 1;
+	points = points * 1000 / totalpages;
 	seq_printf(m, "%lu\n", points);
 
 	return 0;
diff --git a/include/linux/oom.h b/include/linux/oom.h
index c696c26..2d2b898 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -48,7 +48,7 @@ struct oom_control {
 	/* Used by oom implementation, do not set */
 	unsigned long totalpages;
 	struct task_struct *chosen;
-	unsigned long chosen_points;
+	long chosen_points;
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
@@ -107,7 +107,7 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 
 bool __oom_reap_task_mm(struct mm_struct *mm);
 
-extern unsigned long oom_badness(struct task_struct *p,
+extern long oom_badness(struct task_struct *p,
 		unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_control *oc);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5b5f74..73e2381 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1563,6 +1563,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.memcg = memcg,
 		.gfp_mask = gfp_mask,
 		.order = order,
+		.chosen_points = LONG_MIN,
 	};
 	bool ret;
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 71e3ace..160f364 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -195,17 +195,17 @@ static bool is_dump_unreclaim_slabs(void)
  * predictable as possible. The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-unsigned long oom_badness(struct task_struct *p, unsigned long totalpages)
+long oom_badness(struct task_struct *p, unsigned long totalpages)
 {
 	long points;
 	long adj;
 
 	if (oom_unkillable_task(p))
-		return 0;
+		return LONG_MIN;
 
 	p = find_lock_task_mm(p);
 	if (!p)
-		return 0;
+		return LONG_MIN;
 
 	/*
 	 * Do not even consider tasks which are explicitly marked oom
@@ -217,7 +217,7 @@ unsigned long oom_badness(struct task_struct *p, unsigned long totalpages)
 			test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
 			in_vfork(p)) {
 		task_unlock(p);
-		return 0;
+		return LONG_MIN;
 	}
 
 	/*
@@ -232,11 +232,7 @@ unsigned long oom_badness(struct task_struct *p, unsigned long totalpages)
 	adj *= totalpages / 1000;
 	points += adj;
 
-	/*
-	 * Never return 0 for an eligible task regardless of the root bonus and
-	 * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here).
-	 */
-	return points > 0 ? points : 1;
+	return points;
 }
 
 static const char * const oom_constraint_text[] = {
@@ -309,7 +305,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
 	struct oom_control *oc = arg;
-	unsigned long points;
+	long points;
 
 	if (oom_unkillable_task(task))
 		goto next;
@@ -335,12 +331,12 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	 * killed first if it triggers an oom, then select it.
 	 */
 	if (oom_task_origin(task)) {
-		points = ULONG_MAX;
+		points = LONG_MAX;
 		goto select;
 	}
 
 	points = oom_badness(task, oc->totalpages);
-	if (!points || points < oc->chosen_points)
+	if (points == LONG_MIN || points < oc->chosen_points)
 		goto next;
 
 select:
@@ -1126,6 +1122,7 @@ void pagefault_out_of_memory(void)
 		.memcg = NULL,
 		.gfp_mask = 0,
 		.order = 0,
+		.chosen_points = LONG_MIN,
 	};
 
 	if (mem_cgroup_oom_synchronize(true))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4785a8a..63ccb2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3797,6 +3797,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 		.memcg = NULL,
 		.gfp_mask = gfp_mask,
 		.order = order,
+		.chosen_points = LONG_MIN,
 	};
 	struct page *page;
 