From patchwork Wed Jan 16 10:55:21 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10765723
From: Tetsuo Handa
To: Michal
 Hocko, Andrew Morton
Cc: Johannes Weiner, David Rientjes, linux-mm@kvack.org, Tetsuo Handa,
 Yong-Taek Lee
Subject: [PATCH] mm, oom: Tolerate processes sharing mm with different view of oom_score_adj.
Date: Wed, 16 Jan 2019 19:55:21 +0900
Message-Id: <1547636121-9229-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
X-Mailer: git-send-email 1.8.3.1

This patch reverts both commit 44a70adec910d692 ("mm, oom_adj: make sure
processes sharing mm have same view of oom_score_adj") and commit
97fd49c2355ffded ("mm, oom: kill all tasks sharing the mm") in order to
close a race and reduce the latency in __set_oom_adj(). It also prints the
warning in __oom_kill_process() only when MMF_OOM_SKIP is not yet set on
the mm, in order to minimize the latency there.

Commit 36324a990cf578b5 ("oom: clear TIF_MEMDIE after oom_reaper managed
to unmap the address space") introduced the worst case mentioned in
44a70adec910d692. But since the OOM killer skips an mm with MMF_OOM_SKIP
set, only administrators can trigger the worst case.

Since 44a70adec910d692 did not take latency into account, we can hold RCU
for minutes and trigger RCU stall warnings by calling printk() on many
thousands of thread groups. Even without calling printk(), the latency is
a problem, as reported by Yong-Taek Lee [1]. I also noticed that
44a70adec910d692 is racy, and fixing the race would require a global lock,
which is too costly for such rare events.

If the worst case in 44a70adec910d692 happens, it is at an administrator's
request. Therefore, tolerate the worst case and speed up __set_oom_adj().
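
For illustration only (not part of the patch): the behaviour described
above for __oom_kill_process() can be modelled in user space roughly as
follows. This is a sketch under assumptions, not kernel code. The names
struct model_mm, struct model_proc and model_kill_sharers() are
hypothetical stand-ins, the oom_skip field models the MMF_OOM_SKIP bit,
and locking, RCU and signal delivery are left out entirely.

```c
#include <stdbool.h>
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical user-space stand-ins for the kernel structures. */
struct model_mm {
	bool oom_skip;		/* models the MMF_OOM_SKIP bit */
};

struct model_proc {
	int pid;
	int tgid;		/* thread group id */
	int oom_score_adj;
	bool is_init;		/* models is_global_init() */
	bool killed;
	struct model_mm *mm;
};

/*
 * Model of the "kill everything sharing the victim's mm" loop.
 * A sharer that is init or has OOM_SCORE_ADJ_MIN is not killed; the mm
 * is marked as skipped and the warning is printed only if the mark was
 * not already set. Returns the number of warnings printed.
 */
static int model_kill_sharers(struct model_proc *victim,
			      struct model_proc *procs, int n,
			      bool *can_oom_reap)
{
	struct model_mm *mm = victim->mm;
	int warnings = 0;

	*can_oom_reap = true;
	victim->killed = true;
	for (int i = 0; i < n; i++) {
		struct model_proc *p = &procs[i];

		if (p->mm != mm)		/* does not share the mm */
			continue;
		if (p->tgid == victim->tgid)	/* same thread group */
			continue;
		if (p->is_init || p->oom_score_adj == OOM_SCORE_ADJ_MIN) {
			*can_oom_reap = false;
			if (!mm->oom_skip) {
				printf("oom killer %d has mm pinned by %d\n",
				       victim->pid, p->pid);
				warnings++;
			}
			mm->oom_skip = true;
			continue;
		}
		p->killed = true;
	}
	return warnings;
}
```

In this model, if two processes with OOM_SCORE_ADJ_MIN share the victim's
mm, only the first one triggers the message, and a later call on the same
mm prints nothing because the skip mark is already set.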

[1] https://lkml.kernel.org/r/20181008011931epcms1p82dd01b7e5c067ea99946418bc97de46a@epcms1p8

Signed-off-by: Tetsuo Handa
Reported-by: Yong-Taek Lee
Nacked-by: Michal Hocko
---
 fs/proc/base.c     | 46 ----------------------------------------------
 include/linux/mm.h |  2 --
 mm/oom_kill.c      | 10 ++++++----
 3 files changed, 6 insertions(+), 52 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 633a634..41ece8f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1020,7 +1020,6 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
 static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
 {
 	static DEFINE_MUTEX(oom_adj_mutex);
-	struct mm_struct *mm = NULL;
 	struct task_struct *task;
 	int err = 0;
 
@@ -1050,55 +1049,10 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
 		}
 	}
 
-	/*
-	 * Make sure we will check other processes sharing the mm if this is
-	 * not vfrok which wants its own oom_score_adj.
-	 * pin the mm so it doesn't go away and get reused after task_unlock
-	 */
-	if (!task->vfork_done) {
-		struct task_struct *p = find_lock_task_mm(task);
-
-		if (p) {
-			if (atomic_read(&p->mm->mm_users) > 1) {
-				mm = p->mm;
-				mmgrab(mm);
-			}
-			task_unlock(p);
-		}
-	}
-
 	task->signal->oom_score_adj = oom_adj;
 	if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE))
 		task->signal->oom_score_adj_min = (short)oom_adj;
 	trace_oom_score_adj_update(task);
-
-	if (mm) {
-		struct task_struct *p;
-
-		rcu_read_lock();
-		for_each_process(p) {
-			if (same_thread_group(task, p))
-				continue;
-
-			/* do not touch kernel threads or the global init */
-			if (p->flags & PF_KTHREAD || is_global_init(p))
-				continue;
-
-			task_lock(p);
-			if (!p->vfork_done && process_shares_mm(p, mm)) {
-				pr_info("updating oom_score_adj for %d (%s) from %d to %d because it shares mm with %d (%s). Report if this is unexpected.\n",
-						task_pid_nr(p), p->comm,
-						p->signal->oom_score_adj, oom_adj,
-						task_pid_nr(task), task->comm);
-				p->signal->oom_score_adj = oom_adj;
-				if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE))
-					p->signal->oom_score_adj_min = (short)oom_adj;
-			}
-			task_unlock(p);
-		}
-		rcu_read_unlock();
-		mmdrop(mm);
-	}
 err_unlock:
 	mutex_unlock(&oom_adj_mutex);
 	put_task_struct(task);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb640..28879c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2690,8 +2690,6 @@ static inline int in_gate_area(struct mm_struct *mm, unsigned long addr)
 }
 #endif	/* __HAVE_ARCH_GATE_AREA */
 
-extern bool process_shares_mm(struct task_struct *p, struct mm_struct *mm);
-
 #ifdef CONFIG_SYSCTL
 extern int sysctl_drop_caches;
 int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f0e8cd9..c7005b1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -478,7 +478,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
  * task's threads: if one of those is using this mm then this task was also
  * using it.
  */
-bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
+static bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 {
 	struct task_struct *t;
 
@@ -896,12 +896,14 @@ static void __oom_kill_process(struct task_struct *victim)
 			continue;
 		if (same_thread_group(p, victim))
 			continue;
-		if (is_global_init(p)) {
+		if (is_global_init(p) ||
+		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
 			can_oom_reap = false;
-			set_bit(MMF_OOM_SKIP, &mm->flags);
-			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
+			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
+				pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
 					task_pid_nr(victim), victim->comm,
 					task_pid_nr(p), p->comm);
+			set_bit(MMF_OOM_SKIP, &mm->flags);
 			continue;
 		}
 		/*