From patchwork Sat Aug 4 13:29:46 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10555765
From: Tetsuo Handa
To: linux-mm@kvack.org
Cc: Tetsuo Handa, David Rientjes, Michal Hocko, Roman Gushchin
Subject: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.
Date: Sat, 4 Aug 2018 22:29:46 +0900
Message-Id: <1533389386-3501-4-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
In-Reply-To: <1533389386-3501-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
References: <1533389386-3501-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

David Rientjes has been complaining about the current behavior: the OOM
killer selects the next OOM victim as soon as MMF_OOM_SKIP is set, even
if __oom_reap_task_mm() returned without making any progress.

To address this problem, this patch adds a timeout that uses, as
feedback, whether the OOM score of an OOM victim's memory keeps
decreasing over time after MMF_OOM_SKIP has been set by the OOM reaper
or exit_mmap(). The OOM killer keeps waiting for a victim as long as its
score decreases, and gives up on it only after the score has failed to
decrease in 30 consecutive checks taken at least HZ / 10 apart.
Signed-off-by: Tetsuo Handa
Cc: Michal Hocko
Cc: David Rientjes
Cc: Roman Gushchin
---
 include/linux/sched.h |  3 ++
 mm/oom_kill.c         | 81 ++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 589fe78..70c7dfd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1174,6 +1174,9 @@ struct task_struct {
 #endif
 	int				pagefault_disabled;
 	struct list_head		oom_victim_list;
+	unsigned long			last_oom_compared;
+	unsigned long			last_oom_score;
+	unsigned char			oom_reap_stall_count;
 #ifdef CONFIG_VMAP_STACK
 	struct vm_struct		*stack_vm_area;
 #endif
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 783f04d..7cad886 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -49,6 +49,12 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+static inline unsigned long oom_victim_mm_score(struct mm_struct *mm)
+{
+	return get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS) +
+		mm_pgtables_bytes(mm) / PAGE_SIZE;
+}
+
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
@@ -230,8 +236,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
-		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	points = oom_victim_mm_score(p->mm);
 	task_unlock(p);
 
 	/* Normalize to oom_score_adj units */
@@ -571,15 +576,6 @@ static void oom_reap_task(struct task_struct *tsk)
 	while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
 		schedule_timeout_idle(HZ/10);
 
-	if (attempts <= MAX_OOM_REAP_RETRIES ||
-	    test_bit(MMF_OOM_SKIP, &mm->flags))
-		goto done;
-
-	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
-		task_pid_nr(tsk), tsk->comm);
-	debug_show_all_locks();
-
-done:
 	/*
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
@@ -631,6 +627,9 @@ static void mark_oom_victim(struct task_struct *tsk)
 	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
 		mmgrab(tsk->signal->oom_mm);
 		set_bit(MMF_OOM_VICTIM, &mm->flags);
+		tsk->last_oom_compared = jiffies;
+		tsk->last_oom_score = oom_victim_mm_score(mm);
+		tsk->oom_reap_stall_count = 0;
 		get_task_struct(tsk);
 		list_add(&tsk->oom_victim_list, &oom_victim_list);
 	}
@@ -867,7 +866,6 @@ static void __oom_kill_process(struct task_struct *victim)
 	mmdrop(mm);
 	put_task_struct(victim);
 }
-#undef K
 
 /*
  * Kill provided task unless it's secured by setting
@@ -999,33 +997,74 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+static bool victim_mm_stalling(struct task_struct *p, struct mm_struct *mm)
+{
+	unsigned long score;
+
+	if (time_before(jiffies, p->last_oom_compared + HZ / 10))
+		return false;
+	score = oom_victim_mm_score(mm);
+	if (score < p->last_oom_score)
+		p->oom_reap_stall_count = 0;
+	else
+		p->oom_reap_stall_count++;
+	p->last_oom_score = oom_victim_mm_score(mm);
+	p->last_oom_compared = jiffies;
+	if (p->oom_reap_stall_count < 30)
+		return false;
+	pr_info("Gave up waiting for process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
+		task_pid_nr(p), p->comm, K(mm->total_vm),
+		K(get_mm_counter(mm, MM_ANONPAGES)),
+		K(get_mm_counter(mm, MM_FILEPAGES)),
+		K(get_mm_counter(mm, MM_SHMEMPAGES)));
+	return true;
+}
+
 static bool oom_has_pending_victims(struct oom_control *oc)
 {
-	struct task_struct *p;
+	struct task_struct *p, *tmp;
+	bool ret = false;
+	bool gaveup = false;
 
 	if (is_sysrq_oom(oc))
 		return false;
 	/*
-	 * Since oom_reap_task()/exit_mmap() will set MMF_OOM_SKIP, let's
-	 * wait for pending victims until MMF_OOM_SKIP is set or __mmput()
-	 * completes.
+	 * Wait for pending victims until __mmput() completes or stalled
+	 * too long.
 	 */
-	list_for_each_entry(p, &oom_victim_list, oom_victim_list) {
+	list_for_each_entry_safe(p, tmp, &oom_victim_list, oom_victim_list) {
+		struct mm_struct *mm = p->signal->oom_mm;
+
 		if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
 			continue;
-		if (!test_bit(MMF_OOM_SKIP, &p->signal->oom_mm->flags)) {
+		ret = true;
 #ifdef CONFIG_MMU
+		/*
+		 * Since the OOM reaper exists, we can safely wait until
+		 * MMF_OOM_SKIP is set.
+		 */
+		if (!test_bit(MMF_OOM_SKIP, &mm->flags)) {
 			if (!oom_reap_target) {
 				get_task_struct(p);
 				oom_reap_target = p;
 				trace_wake_reaper(p->pid);
 				wake_up(&oom_reaper_wait);
 			}
-#endif
-			return true;
+			continue;
 		}
+#endif
+		/* We can wait as long as OOM score is decreasing over time. */
+		if (!victim_mm_stalling(p, mm))
+			continue;
+		gaveup = true;
+		list_del(&p->oom_victim_list);
+		/* Drop a reference taken by mark_oom_victim(). */
+		put_task_struct(p);
 	}
-	return false;
+	if (gaveup)
+		debug_show_all_locks();
+
+	return ret;
 }
 
 /**
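
The list walk added to oom_has_pending_victims() unlinks victims while
iterating, which is why it switches to list_for_each_entry_safe(). A
minimal userspace sketch of the same shape follows, assuming a plain singly
linked list instead of the kernel's struct list_head, free() standing in
for put_task_struct(), and two booleans standing in for
oom_unkillable_task() and for the combined MMF_OOM_SKIP /
victim_mm_stalling() check. struct victim and prune_victims() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task linked on oom_victim_list. */
struct victim {
	int pid;
	bool unkillable;  /* models oom_unkillable_task() returning true */
	bool give_up;     /* models the MMF_OOM_SKIP test plus victim_mm_stalling() */
	struct victim *next;
};

/*
 * Walks the list, unlinking and freeing victims we give up on. The successor
 * is saved before the current node may be freed, which is the guarantee
 * list_for_each_entry_safe() provides. Returns true if any eligible victim
 * was seen (the caller should keep waiting); *gaveup reports whether at
 * least one victim was dropped.
 */
static bool prune_victims(struct victim **head, bool *gaveup)
{
	struct victim **link = head;  /* the link that currently points at p */
	struct victim *p, *tmp;
	bool ret = false;

	assert(head != NULL && gaveup != NULL);
	*gaveup = false;
	for (p = *head; p != NULL; p = tmp) {
		tmp = p->next;            /* save the successor first */
		if (p->unkillable) {
			link = &p->next;      /* keep p, move on */
			continue;
		}
		ret = true;               /* set before the give-up check, as in the patch */
		if (!p->give_up) {
			link = &p->next;
			continue;
		}
		*gaveup = true;
		*link = tmp;              /* unlink p (models list_del()) */
		free(p);                  /* models put_task_struct() dropping the list's reference */
	}
	return ret;
}
```

As in the patch, a victim dropped during a pass still counts toward the
return value, so that call still reports a pending victim; only the next
call no longer sees it.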