From patchwork Fri Jul 13 23:07:35 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Rientjes
X-Patchwork-Id: 10524245
Date: Fri, 13 Jul 2018 16:07:35 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch v3 -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer

The cgroup-aware oom killer currently considers the set of allowed nodes
for the allocation that triggers the oom killer and discounts usage from
disallowed nodes when comparing cgroups.

If a cgroup has both the cpuset and memory controllers enabled, it may be
possible to restrict allocations to a subset of nodes, for example.  Some
latency-sensitive users use cpusets to allocate only local memory, almost
to the point of oom even though there is an abundance of available free
memory on other nodes.

The same is true for processes that mbind(2) their memory to a set of
allowed nodes.

This yields very inconsistent results by considering usage from each mem
cgroup (and perhaps its subtree) for the allocation's set of allowed nodes
for its mempolicy.
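
To make the inconsistency concrete, here is a minimal userspace sketch.  It
is not kernel code: struct toy_memcg, toy_badness() and toy_select() are
hypothetical names invented for this illustration, and a plain bitmask
stands in for nodemask_t (a mask of 0 plays the role of a NULL nodemask).
It only models the per-node filtering that the diff removes from
memcg_oom_badness().

```c
#define NR_NODES 2

/* Hypothetical stand-in for a memcg: pages charged on each NUMA node. */
struct toy_memcg {
	const char *name;
	long pages[NR_NODES];
};

/*
 * Sum the usage of one cgroup.  A mask of 0 means "no filtering"; any
 * other mask counts only the nodes whose bit is set, mirroring the
 * nodemask check that this patch drops.
 */
static long toy_badness(const struct toy_memcg *cg, unsigned int mask)
{
	long points = 0;
	int nid;

	for (nid = 0; nid < NR_NODES; nid++) {
		if (mask && !(mask & (1u << nid)))
			continue;
		points += cg->pages[nid];
	}
	return points;
}

/* Return the index of the highest-scoring cgroup, or -1 if n == 0. */
static int toy_select(const struct toy_memcg *cgs, int n, unsigned int mask)
{
	long best_score = -1;
	int best = -1;
	int i;

	for (i = 0; i < n; i++) {
		long score = toy_badness(&cgs[i], mask);

		if (score > best_score) {
			best_score = score;
			best = i;
		}
	}
	return best;
}
```

Take cgroup A with 100 pages on node 0 and nothing elsewhere, and cgroup B
with 10 pages on node 0 and 100000 pages on node 1.  For an allocation
restricted to node 0, toy_select(cgs, 2, 1u << 0) picks A (100 vs. 10),
while toy_select(cgs, 2, 0) picks B (100 vs. 100010), the larger overall
consumer.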

Allocating a single page for a vma that is bound with mbind(2) to a
now-oom node can cause a cgroup that is restricted to that node by its
cpuset controller to be oom killed when other cgroups may have much higher
overall usage.

The cgroup-aware oom killer is described as killing the largest memory
consuming cgroup (or subtree) without mentioning the mempolicy of the
allocation.  For now, disregard the mempolicy.  It would be possible to
add an additional oom policy for NUMA awareness if it would be generally
useful later with the extensible interface.

Signed-off-by: David Rientjes
---
 mm/memcontrol.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2819,19 +2819,15 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
 	return ret;
 }

-static long memcg_oom_badness(struct mem_cgroup *memcg,
-			      const nodemask_t *nodemask)
+static long memcg_oom_badness(struct mem_cgroup *memcg)
 {
 	const bool is_root_memcg = memcg == root_mem_cgroup;
 	long points = 0;
 	int nid;
-	pg_data_t *pgdat;

 	for_each_node_state(nid, N_MEMORY) {
-		if (nodemask && !node_isset(nid, *nodemask))
-			continue;
+		pg_data_t *pgdat = NODE_DATA(nid);

-		pgdat = NODE_DATA(nid);
 		if (is_root_memcg) {
 			points += node_page_state(pgdat, NR_ACTIVE_ANON) +
 				node_page_state(pgdat, NR_INACTIVE_ANON);
@@ -2867,8 +2863,7 @@ static long memcg_oom_badness(struct mem_cgroup *memcg,
  *   >0: memcg is eligible, and the returned value is an estimation
  *       of the memory footprint
  */
-static long oom_evaluate_memcg(struct mem_cgroup *memcg,
-			       const nodemask_t *nodemask)
+static long oom_evaluate_memcg(struct mem_cgroup *memcg)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -2902,7 +2897,7 @@ static long oom_evaluate_memcg(struct mem_cgroup *memcg,
 	if (eligible <= 0)
 		return eligible;

-	return memcg_oom_badness(memcg, nodemask);
+	return memcg_oom_badness(memcg);
 }

 static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
@@ -2962,7 +2957,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 		if (memcg_has_children(iter))
 			continue;

-		score = oom_evaluate_memcg(iter, oc->nodemask);
+		score = oom_evaluate_memcg(iter);

 		/*
 		 * Ignore empty and non-eligible memory cgroups.
@@ -2991,8 +2986,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)

 	if (oc->chosen_memcg != INFLIGHT_VICTIM) {
 		if (root == root_mem_cgroup) {
-			group_score = oom_evaluate_memcg(root_mem_cgroup,
-							 oc->nodemask);
+			group_score = oom_evaluate_memcg(root_mem_cgroup);
 			if (group_score > leaf_score) {
 				/*
 				 * Discount the sum of all leaf scores to find