From patchwork Fri Jul 13 23:07:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Rientjes
X-Patchwork-Id: 10524225
Date: Fri, 13 Jul 2018 16:07:25 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch v3 -mm 1/6] mm, memcg: introduce per-memcg oom policy tunable
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)

The cgroup aware oom killer is needlessly enforced for the entire system
by a mount option. It's unnecessary to force the system into a single
oom policy: either cgroup aware, or the traditional process aware.

This patch introduces a memory.oom_policy tunable for all mem cgroups.
It is currently a no-op: it can only be set to "none", which is its
default policy. It will be expanded in the next patch to define cgroup
aware oom killer behavior for its subtree.

This is an extensible interface that can be used to define cgroup aware
assessment of mem cgroup subtrees or the traditional process aware
assessment.

Reading memory.oom_policy will specify the list of available policies.

Another benefit of such an approach is that an admin can lock in a
certain policy for the system or for a mem cgroup subtree and can
delegate the policy decision to the user to determine if the kill should
originate from a subcontainer, as indivisible memory consumers
themselves, or selection should be done per process.

Signed-off-by: David Rientjes
---
 Documentation/admin-guide/cgroup-v2.rst | 11 ++++++++
 include/linux/memcontrol.h              | 11 ++++++++
 mm/memcontrol.c                         | 35 +++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1098,6 +1098,17 @@ PAGE_SIZE multiple when read back.
 	If cgroup-aware OOM killer is not enabled, ENOTSUPP error
 	is returned on attempt to access the file.
 
+  memory.oom_policy
+
+	A read-write single string file which exists on all cgroups. The
+	default value is "none".
+
+	If "none", the OOM killer will use the default policy to choose a
+	victim; that is, it will choose the single process with the largest
+	memory footprint adjusted by /proc/pid/oom_score_adj (see
+	Documentation/filesystems/proc.txt). This is the same policy as if
+	memory cgroups were not even mounted.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined. Unless specified
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -66,6 +66,14 @@ enum mem_cgroup_protection {
 	MEMCG_PROT_MIN,
 };
 
+enum memcg_oom_policy {
+	/*
+	 * No special oom policy, process selection is determined by
+	 * oom_badness()
+	 */
+	MEMCG_OOM_POLICY_NONE,
+};
+
 struct mem_cgroup_reclaim_cookie {
 	pg_data_t *pgdat;
 	int priority;
@@ -234,6 +242,9 @@ struct mem_cgroup {
 	/* OOM-Killer disable */
 	int oom_kill_disable;
 
+	/* OOM policy for this subtree */
+	enum memcg_oom_policy oom_policy;
+
 	/*
 	 * Treat the sub-tree as an indivisible memory consumer,
 	 * kill all belonging tasks if the memory cgroup selected
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4649,6 +4649,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (parent) {
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
+		memcg->oom_policy = parent->oom_policy;
 	}
 	if (parent && parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
@@ -5810,6 +5811,34 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int memory_oom_policy_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
+
+	switch (policy) {
+	case MEMCG_OOM_POLICY_NONE:
+	default:
+		seq_puts(m, "[none]\n");
+	};
+	return 0;
+}
+
+static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = nbytes;
+
+	buf = strstrip(buf);
+	if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5857,6 +5886,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = memory_stat_show,
 	},
+	{
+		.name = "oom_policy",
+		.flags = CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_oom_policy_show,
+		.write = memory_oom_policy_write,
+	},
 	{ }	/* terminate */
 };
 