From patchwork Thu Apr 22 12:33:45 2021
X-Patchwork-Submitter: Peter Enderborg
X-Patchwork-Id: 12218449
Subject: [RFC PATCH] Android OOM helper proof of concept
From: peter enderborg
To: Michal Hocko, Shakeel Butt
CC: Johannes Weiner, Roman Gushchin, Linux MM, Andrew Morton, Cgroups,
    David Rientjes, LKML, Suren Baghdasaryan, Greg Thelen, Dragos Sbirlea,
    Priya Duraisamy
Date: Thu, 22 Apr 2021 14:33:45 +0200

On 4/21/21 4:29 PM, Michal Hocko wrote:
> On Wed 21-04-21 06:57:43, Shakeel Butt wrote:
>> On Wed, Apr 21, 2021 at 12:16 AM Michal Hocko wrote:
>> [...]
>>>> To decide when to kill, the oom-killer has to read a lot of metrics.
>>>> It has to open a lot of files to read them and there will definitely
>>>> be new allocations involved in those operations. For example reading
>>>> memory.stat does a page size allocation. Similarly, to perform action
>>>> the oom-killer may have to read cgroup.procs file which again has
>>>> allocation inside it.
>>> True but many of those can be avoided by opening the file early. At
>>> least seq_file based ones will not allocate later if the output size
>>> doesn't increase. Which should be the case for many. I think it is a
>>> general improvement to push those who allocate during read to an open
>>> time allocation.
>>>
>> I agree that this would be a general improvement but it is not always
>> possible (see below).
> It would be still great to invest into those improvements. And I would
> be really grateful to learn about bottlenecks from the existing kernel
> interfaces you have found on the way.
>
>>>> Regarding sophisticated oom policy, I can give one example of our
>>>> cluster level policy. For robustness, many user facing jobs run a lot
>>>> of instances in a cluster to handle failures. Such jobs are tolerant
>>>> to some amount of failures but they still have requirements to not let
>>>> the number of running instances below some threshold. Normally killing
>>>> such jobs is fine but we do want to make sure that we do not violate
>>>> their cluster level agreement. So, the userspace oom-killer may
>>>> dynamically need to confirm if such a job can be killed.
>>> What kind of data do you need to examine to make those decisions?
>>>
>> Most of the time the cluster level scheduler pushes the information to
>> the node controller which transfers that information to the
>> oom-killer. However based on the freshness of the information the
>> oom-killer might request to pull the latest information (IPC and RPC).
> I cannot imagine any OOM handler to be reliable if it has to depend on
> other userspace component with a lower resource priority. OOM handlers
> are fundamentally complex components which has to reduce their
> dependencies to the bare minimum.

I think we very much need an OOM killer that can help out, but it is
essential that it also plays by Android rules.

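As a rough illustration of the "open early" idea from the quoted
discussion, a userspace handler could open its stat files at startup and
later re-read them with pread() into a buffer it already owns, so the
handler itself does no allocation on the hot path. This is a minimal
sketch and not part of the patch: stat_open_early() and stat_reread() are
hypothetical names, and whether the kernel side allocates during the read
still depends on the file's implementation, as noted in the quote.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict C modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a stat file once, well before memory is tight. */
int stat_open_early(const char *path)
{
    return open(path, O_RDONLY);
}

/*
 * Hypothetical helper: re-read the file from offset 0 into a caller-owned
 * buffer. Nothing is opened or allocated in userspace here. A single
 * pread() is issued, so the buffer is assumed large enough for the file.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t stat_reread(int fd, char *buf, size_t cap)
{
    ssize_t n;

    assert(buf != NULL && cap > 0);
    n = pread(fd, buf, cap - 1, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';  /* keep the result usable as a C string */
    return n;
}
```

A handler would call stat_open_early() during initialisation and keep the
descriptor and buffer around, then call stat_reread() each time it needs a
fresh view.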
This is an RFC patch that interacts with the OOM killer.

From 09f3a2e401d4ed77e95b7cea7edb7c5c3e6a0c62 Mon Sep 17 00:00:00 2001
From: Peter Enderborg
Date: Thu, 22 Apr 2021 14:15:46 +0200
Subject: [PATCH] mm/oom: Android oomhelper

This is a proof of concept of a pre-OOM killer that kills tasks strictly
in oom_score_adj order if the score is positive. It acts as a lifeline
when userspace does not perform optimally.
---
 drivers/staging/Makefile              |  1 +
 drivers/staging/oomhelper/Makefile    |  2 +
 drivers/staging/oomhelper/oomhelper.c | 65 +++++++++++++++++++++++++++
 mm/oom_kill.c                         |  4 +-
 4 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 drivers/staging/oomhelper/Makefile
 create mode 100644 drivers/staging/oomhelper/oomhelper.c

diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 2245059e69c7..4a5449b42568 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -47,3 +47,4 @@ obj-$(CONFIG_QLGE)        += qlge/
 obj-$(CONFIG_WIMAX)        += wimax/
 obj-$(CONFIG_WFX)        += wfx/
 obj-y                += hikey9xx/
+obj-y                += oomhelper/
diff --git a/drivers/staging/oomhelper/Makefile b/drivers/staging/oomhelper/Makefile
new file mode 100644
index 000000000000..ee9b361957f8
--- /dev/null
+++ b/drivers/staging/oomhelper/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-y    += oomhelper.o
diff --git a/drivers/staging/oomhelper/oomhelper.c b/drivers/staging/oomhelper/oomhelper.c
new file mode 100644
index 000000000000..5a3fe0270cb8
--- /dev/null
+++ b/drivers/staging/oomhelper/oomhelper.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/* proof of concept of an Android-aware oom killer */
+/* Author: peter.enderborg@sony.com */
+
+#include <linux/init.h>         /* __init, subsys_initcall() */
+#include <linux/notifier.h>     /* struct notifier_block, NOTIFY_OK */
+#include <linux/oom.h>          /* register_oom_notifier(), find_lock_task_mm() */
+#include <linux/sched/task.h>   /* get_task_struct(), task_unlock() */
+void wake_oom_reaper(struct task_struct *tsk); /* needs to be made public ... */
+void __oom_kill_process(struct task_struct *victim, const char *message);
+
+static int oomhelper_oom_notify(struct notifier_block *self,
+                                unsigned long notused, void *param)
+{
+    struct task_struct *tsk;
+    struct task_struct *selected = NULL;
+    int highest = 0;
+
+    pr_info("invited\n");
+    rcu_read_lock();
+    for_each_process(tsk) {
+        struct task_struct *candidate;
+        if (tsk->flags & PF_KTHREAD)
+            continue;
+
+        /* Ignore task if coredump in progress */
+        if (tsk->mm && tsk->mm->core_state)
+            continue;
+        candidate = find_lock_task_mm(tsk);
+        if (!candidate)
+            continue;
+
+        if (highest < candidate->signal->oom_score_adj) {
+            /* for testing, don't kill level 0 */
+            highest = candidate->signal->oom_score_adj;
+            selected = candidate;
+            pr_info("new selected %d %d\n", selected->pid,
+                    selected->signal->oom_score_adj);
+        }
+        task_unlock(candidate);
+    }
+    if (selected) {
+        get_task_struct(selected);
+    }
+    rcu_read_unlock();
+    if (selected) {
+        pr_info("oomhelper killing: %d\n", selected->pid);
+        __oom_kill_process(selected, "oomhelper");
+    }
+
+    return NOTIFY_OK;
+}
+
+static struct notifier_block oomhelper_oom_nb = {
+    .notifier_call = oomhelper_oom_notify
+};
+
+static int __init oomhelper_register_oom_notifier(void)
+{
+    register_oom_notifier(&oomhelper_oom_nb);
+    pr_info("oomhelper installed\n");
+    return 0;
+}
+
+subsys_initcall(oomhelper_register_oom_notifier);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fa1cf18bac97..a5f7299af9a3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -658,7 +658,7 @@ static int oom_reaper(void *unused)
     return 0;
 }
 
-static void wake_oom_reaper(struct task_struct *tsk)
+void wake_oom_reaper(struct task_struct *tsk)
 {
     /* mm is already queued? */
     if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
@@ -856,7 +856,7 @@ static bool task_will_free_mem(struct task_struct *task)
     return ret;
 }
 
-static void __oom_kill_process(struct task_struct *victim, const char *message)
+void __oom_kill_process(struct task_struct *victim, const char *message)
 {
     struct task_struct *p;
     struct mm_struct *mm;
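
The selection rule used in oomhelper_oom_notify() can be tried outside the
kernel with a small userspace model. The sketch below is only an
illustration under stated assumptions: struct task_info and pick_victim()
are made-up names, the task list is a plain array rather than the kernel's
process list, and no locking, RCU, or coredump check is modelled. It keeps
the same rule as the patch: skip kernel threads, consider only scores
strictly above 0, and pick the highest oom_score_adj.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the fields the patch looks at. */
struct task_info {
    int pid;
    int oom_score_adj;  /* same range as the kernel value: -1000..1000 */
    int is_kthread;     /* models the PF_KTHREAD flag test */
};

/*
 * Return the task with the highest strictly positive oom_score_adj,
 * or NULL if no task qualifies. On a tie the first task seen wins,
 * because the comparison is strict, as in the patch.
 */
const struct task_info *pick_victim(const struct task_info *tasks, size_t n)
{
    const struct task_info *selected = NULL;
    int highest = 0;    /* starting at 0 means level 0 is never killed */
    size_t i;

    assert(tasks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tasks[i].is_kthread)
            continue;
        if (highest < tasks[i].oom_score_adj) {
            highest = tasks[i].oom_score_adj;
            selected = &tasks[i];
        }
    }
    return selected;
}
```

Starting the running maximum at 0 is what makes the helper a "pre" killer:
when every task has a score of 0 or below, nothing is selected and the
regular OOM killer is left to decide.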