From patchwork Tue Jun 30 21:25:06 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634821
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton,
    David Rientjes, Michal Hocko
Subject: [PATCH 01/12] mm/mempolicy: Add comment for missing LOCAL
Date: Tue, 30 Jun 2020 14:25:06 -0700
Message-Id: <20200630212517.308045-2-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

MPOL_LOCAL is a bit weird because it is simply a different name for an
existing behavior (preferred policy with no node mask). It has been
this way since it was added here:
commit 479e2802d09f ("mm: mempolicy: Make MPOL_LOCAL a real policy")

It is so similar to MPOL_PREFERRED, in fact, that when the policy is
created in mpol_new, the mode is set as PREFERRED, and an internal
state representing LOCAL doesn't exist.

To prevent future explorers from scratching their heads as to why
MPOL_LOCAL isn't defined in the mpol_ops table, add a small comment
explaining the situation.

v2: Change comment to refer to mpol_new (Michal)

Cc: Andrew Morton
Cc: David Rientjes
Acked-by: Michal Hocko
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 381320671677..bde193278301 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -427,6 +427,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.create = mpol_new_bind,
 		.rebind = mpol_rebind_nodemask,
 	},
+	/* [MPOL_LOCAL] - see mpol_new() */
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
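
For readers new to the mempolicy UAPI, here is a minimal userspace
sketch (not part of the patch; it assumes libnuma's <numaif.h> wrapper
and a NUMA-enabled kernel) of the equivalence the commit message
describes:

#include <numaif.h>
#include <stdio.h>

int main(void)
{
	/* MPOL_PREFERRED with no nodemask means "allocate on the
	 * node of the running CPU"... */
	if (set_mempolicy(MPOL_PREFERRED, NULL, 0))
		perror("set_mempolicy(MPOL_PREFERRED)");

	/* ...which is exactly what MPOL_LOCAL requests by name, and
	 * why the kernel stores it internally as PREFERRED. */
	if (set_mempolicy(MPOL_LOCAL, NULL, 0))
		perror("set_mempolicy(MPOL_LOCAL)");

	return 0;
}

Build with: gcc demo.c -lnuma
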
From patchwork Tue Jun 30 21:25:07 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634819
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Dave Hansen, Andrew Morton,
    Vlastimil Babka, Ben Widawsky
Subject: [PATCH 02/12] mm/mempolicy: convert single preferred_node to full nodemask
Date: Tue, 30 Jun 2020 14:25:07 -0700
Message-Id: <20200630212517.308045-3-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

From: Dave Hansen

The NUMA APIs currently allow passing in a "preferred node" as a
single bit set in a nodemask. If more than one bit is set, bits after
the first are ignored. Internally, this is implemented as a single
integer: mempolicy->preferred_node.

This single node is generally OK for location-based NUMA where memory
being allocated will eventually be operated on by a single CPU.
However, in systems with multiple memory types, folks want to target a
*type* of memory instead of a location. For instance, someone might
want some high-bandwidth memory but not care about the CPU next to
which it is allocated. Or, they want a cheap, high-capacity allocation
and want to target all NUMA nodes which have persistent memory in
volatile mode. In both of these cases, the application wants to target
a *set* of nodes, but does not want strict MPOL_BIND behavior as that
could lead to OOM killer or SIGSEGV.
To get that behavior, an MPOL_PREFERRED mode is desirable, but one
that honors multiple nodes set in the nodemask. The first step in that
direction is to be able to internally store multiple preferred nodes,
which is implemented in this patch. This should not have any
functional changes and just switches the internal representation of
mempolicy->preferred_node from an integer to a nodemask called
'mempolicy->preferred_nodes'.

This is not a pie-in-the-sky dream for an API. This was a response to
a specific ask of more than one group at Intel. Specifically:

1. There are existing libraries that target memory types such as
   https://github.com/memkind/memkind. These are known to suffer from
   SIGSEGVs when memory is low on targeted memory "kinds" that span
   more than one node. The MCDRAM on a Xeon Phi in "Cluster on Die"
   mode is an example of this.
2. Volatile-use persistent memory users want to have a memory policy
   which is targeted at either "cheap and slow" (PMEM) or "expensive
   and fast" (DRAM). However, they do not want to experience
   allocation failures when the targeted type is unavailable.
3. Allocate-then-run. Generally, we let the process scheduler decide
   on which physical CPU to run a task. That location provides a
   default allocation policy, and memory availability is not generally
   considered when placing tasks. For situations where memory is
   valuable and constrained, some users want to allocate memory first,
   *then* allocate close compute resources to the allocation. This is
   the reverse of the normal (CPU) model. Accelerators such as GPUs
   that operate on core-mm-managed memory are interested in this
   model.

v2:
Fix spelling errors in commit message. (Ben)
clang-format. (Ben)
Integrated bit from another patch. (Ben)
Update the docs to reflect the internal data structure change (Ben)
Don't advertise MPOL_PREFERRED_MANY in UAPI until we can handle it (Ben)
Added more to the commit message (Dave)

Cc: Andrew Morton
Cc: Vlastimil Babka
Signed-off-by: Dave Hansen
Co-developed-by: Ben Widawsky
Signed-off-by: Ben Widawsky
---
 .../admin-guide/mm/numa_memory_policy.rst |  6 +--
 include/linux/mempolicy.h                 |  4 +-
 mm/mempolicy.c                            | 40 ++++++++++---------
 3 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 067a90a1499c..1ad020c459b8 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -205,9 +205,9 @@ MPOL_PREFERRED
 	of increasing distance from the preferred node based on
 	information provided by the platform firmware.
 
-	Internally, the Preferred policy uses a single node--the
-	preferred_node member of struct mempolicy.  When the internal
-	mode flag MPOL_F_LOCAL is set, the preferred_node is ignored
+	Internally, the Preferred policy uses a nodemask--the
+	preferred_nodes member of struct mempolicy.  When the internal
+	mode flag MPOL_F_LOCAL is set, the preferred_nodes are ignored
 	and the policy is interpreted as local allocation.
"Local" allocation policy can be viewed as a Preferred policy that starts at the node containing the cpu where the allocation diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index ea9c15b60a96..c66ea9f4c61e 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -47,8 +47,8 @@ struct mempolicy { unsigned short mode; /* See MPOL_* above */ unsigned short flags; /* See set_mempolicy() MPOL_F_* above */ union { - short preferred_node; /* preferred */ - nodemask_t nodes; /* interleave/bind */ + nodemask_t preferred_nodes; /* preferred */ + nodemask_t nodes; /* interleave/bind */ /* undefined for default */ } v; union { diff --git a/mm/mempolicy.c b/mm/mempolicy.c index bde193278301..1098995f234d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -205,7 +205,7 @@ static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) else if (nodes_empty(*nodes)) return -EINVAL; /* no allowed nodes */ else - pol->v.preferred_node = first_node(*nodes); + pol->v.preferred_nodes = nodemask_of_node(first_node(*nodes)); return 0; } @@ -345,22 +345,26 @@ static void mpol_rebind_preferred(struct mempolicy *pol, const nodemask_t *nodes) { nodemask_t tmp; + nodemask_t preferred_node; + + /* MPOL_PREFERRED uses only the first node in the mask */ + preferred_node = nodemask_of_node(first_node(*nodes)); if (pol->flags & MPOL_F_STATIC_NODES) { int node = first_node(pol->w.user_nodemask); if (node_isset(node, *nodes)) { - pol->v.preferred_node = node; + pol->v.preferred_nodes = nodemask_of_node(node); pol->flags &= ~MPOL_F_LOCAL; } else pol->flags |= MPOL_F_LOCAL; } else if (pol->flags & MPOL_F_RELATIVE_NODES) { mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); - pol->v.preferred_node = first_node(tmp); + pol->v.preferred_nodes = tmp; } else if (!(pol->flags & MPOL_F_LOCAL)) { - pol->v.preferred_node = node_remap(pol->v.preferred_node, - pol->w.cpuset_mems_allowed, - *nodes); + nodes_remap(tmp, pol->v.preferred_nodes, + pol->w.cpuset_mems_allowed, preferred_node); + pol->v.preferred_nodes = tmp; pol->w.cpuset_mems_allowed = *nodes; } } @@ -913,7 +917,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) break; case MPOL_PREFERRED: if (!(p->flags & MPOL_F_LOCAL)) - node_set(p->v.preferred_node, *nodes); + *nodes = p->v.preferred_nodes; /* else return empty node mask for local allocation */ break; default: @@ -1906,9 +1910,9 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) { - if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) - nd = policy->v.preferred_node; - else { + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) { + nd = first_node(policy->v.preferred_nodes); + } else { /* * __GFP_THISNODE shouldn't even be used with the bind policy * because we might easily break the expectation to stay on the @@ -1953,7 +1957,7 @@ unsigned int mempolicy_slab_node(void) /* * handled MPOL_F_LOCAL above */ - return policy->v.preferred_node; + return first_node(policy->v.preferred_nodes); case MPOL_INTERLEAVE: return interleave_nodes(policy); @@ -2087,7 +2091,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) if (mempolicy->flags & MPOL_F_LOCAL) nid = numa_node_id(); else - nid = mempolicy->v.preferred_node; + nid = first_node(mempolicy->v.preferred_nodes); init_nodemask_of_node(mask, nid); break; @@ -2225,7 +2229,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, * node in its nodemask, we 
 		 * node in its nodemask, we allocate the standard way.
 		 */
 		if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL))
-			hpage_node = pol->v.preferred_node;
+			hpage_node = first_node(pol->v.preferred_nodes);
 
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
@@ -2364,7 +2368,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 		/* a's ->flags is the same as b's */
 		if (a->flags & MPOL_F_LOCAL)
 			return true;
-		return a->v.preferred_node == b->v.preferred_node;
+		return nodes_equal(a->v.preferred_nodes, b->v.preferred_nodes);
 	default:
 		BUG();
 		return false;
@@ -2508,7 +2512,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		if (pol->flags & MPOL_F_LOCAL)
 			polnid = numa_node_id();
 		else
-			polnid = pol->v.preferred_node;
+			polnid = first_node(pol->v.preferred_nodes);
 		break;
 
 	case MPOL_BIND:
@@ -2825,7 +2829,7 @@ void __init numa_policy_init(void)
 			.refcnt = ATOMIC_INIT(1),
 			.mode = MPOL_PREFERRED,
 			.flags = MPOL_F_MOF | MPOL_F_MORON,
-			.v = { .preferred_node = nid, },
+			.v = { .preferred_nodes = nodemask_of_node(nid), },
 		};
 	}
 
@@ -2991,7 +2995,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	if (mode != MPOL_PREFERRED)
 		new->v.nodes = nodes;
 	else if (nodelist)
-		new->v.preferred_node = first_node(nodes);
+		new->v.preferred_nodes = nodemask_of_node(first_node(nodes));
 	else
 		new->flags |= MPOL_F_LOCAL;
 
@@ -3044,7 +3048,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		if (flags & MPOL_F_LOCAL)
 			mode = MPOL_LOCAL;
 		else
-			node_set(pol->v.preferred_node, nodes);
+			nodes_or(nodes, nodes, pol->v.preferred_nodes);
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
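
The shape of the conversion is easier to see outside the kernel. Below
is a toy model (ordinary C with a plain bitmask, not the kernel's
nodemask_t API) of the representation change: the old single preferred
node becomes a mask with exactly one bit set, so first_node() recovers
the old value.

#include <stdio.h>

typedef unsigned long nodemask;	/* toy stand-in for nodemask_t */

static nodemask nodemask_of_node(int nid) { return 1UL << nid; }
static int first_node(nodemask m) { return __builtin_ctzl(m); }

int main(void)
{
	int preferred_node = 2;	/* old representation: one integer */
	nodemask preferred_nodes = nodemask_of_node(preferred_node);

	/* Every reader of the old field now calls first_node(), which
	 * is why this patch has no functional change. */
	printf("first_node = %d\n", first_node(preferred_nodes));
	return 0;
}
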
From patchwork Tue Jun 30 21:25:08 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634823
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Dave Hansen, Andrew Morton, Ben Widawsky
Subject: [PATCH 03/12] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
Date: Tue, 30 Jun 2020 14:25:08 -0700
Message-Id: <20200630212517.308045-4-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

From: Dave Hansen

MPOL_PREFERRED honors only a single node set in the nodemask. Add the
bare define for a new mode which will allow more than one. The patch
does all the plumbing without actually adding the new policy type.

v2:
Plumb most of MPOL_PREFERRED_MANY without exposing UAPI (Ben)
Fixes for checkpatch (Ben)

Cc: Andrew Morton
Signed-off-by: Dave Hansen
Co-developed-by: Ben Widawsky
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1098995f234d..33bf29ddfab2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -31,6 +31,9 @@
  *                but useful to set in a VMA when you have a non default
  *                process policy.
  *
+ * preferred many Try a set of nodes first before normal fallback. This is
+ *                similar to preferred without the special case.
+ *
  * default        Allocate on the local node first, or when on a VMA
  *                use the process policy. This is what Linux always did
  *                in a NUMA aware kernel and still does by, ahem, default.
@@ -105,6 +108,8 @@
 
 #include "internal.h"
 
+#define MPOL_PREFERRED_MANY MPOL_MAX
+
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
@@ -175,7 +180,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
 static const struct mempolicy_operations {
 	int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
 	void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
-} mpol_ops[MPOL_MAX];
+} mpol_ops[MPOL_MAX + 1];
 
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
@@ -415,7 +420,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 	mmap_write_unlock(mm);
 }
 
-static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
+static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
 	[MPOL_DEFAULT] = {
 		.rebind = mpol_rebind_default,
 	},
@@ -432,6 +437,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.rebind = mpol_rebind_nodemask,
 	},
 	/* [MPOL_LOCAL] - see mpol_new() */
+	[MPOL_PREFERRED_MANY] = {
+		.create = NULL,
+		.rebind = NULL,
+	},
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
@@ -915,6 +924,9 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	case MPOL_INTERLEAVE:
 		*nodes = p->v.nodes;
 		break;
+	case MPOL_PREFERRED_MANY:
+		*nodes = p->v.preferred_nodes;
+		break;
 	case MPOL_PREFERRED:
 		if (!(p->flags & MPOL_F_LOCAL))
 			*nodes = p->v.preferred_nodes;
@@ -1910,7 +1922,9 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 {
-	if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) {
+	if ((policy->mode == MPOL_PREFERRED ||
+	     policy->mode == MPOL_PREFERRED_MANY) &&
+	    !(policy->flags & MPOL_F_LOCAL)) {
 		nd = first_node(policy->v.preferred_nodes);
 	} else {
 		/*
@@ -1953,6 +1967,7 @@ unsigned int mempolicy_slab_node(void)
 		return node;
 
 	switch (policy->mode) {
+	case MPOL_PREFERRED_MANY:
 	case MPOL_PREFERRED:
 		/*
 		 * handled MPOL_F_LOCAL above
@@ -2087,6 +2102,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	task_lock(current);
 	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
+	case MPOL_PREFERRED_MANY:
+		*mask = mempolicy->v.preferred_nodes;
+		break;
 	case MPOL_PREFERRED:
 		if (mempolicy->flags & MPOL_F_LOCAL)
 			nid = numa_node_id();
@@ -2141,6 +2159,9 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	 * nodes in mask.
 	 */
 		break;
+	case MPOL_PREFERRED_MANY:
+		ret = nodes_intersects(mempolicy->v.preferred_nodes, *mask);
+		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		ret = nodes_intersects(mempolicy->v.nodes, *mask);
@@ -2225,10 +2246,13 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		 * node and don't fall back to other nodes, as the cost of
 		 * remote accesses would likely offset THP benefits.
 		 *
-		 * If the policy is interleave, or does not allow the current
-		 * node in its nodemask, we allocate the standard way.
+		 * If the policy is interleave or multiple preferred nodes, or
+		 * does not allow the current node in its nodemask, we allocate
+		 * the standard way.
 		 */
-		if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL))
+		if ((pol->mode == MPOL_PREFERRED ||
+		     pol->mode == MPOL_PREFERRED_MANY) &&
+		    !(pol->flags & MPOL_F_LOCAL))
 			hpage_node = first_node(pol->v.preferred_nodes);
 
 		nmask = policy_nodemask(gfp, pol);
@@ -2364,6 +2388,9 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		return !!nodes_equal(a->v.nodes, b->v.nodes);
+	case MPOL_PREFERRED_MANY:
+		return !!nodes_equal(a->v.preferred_nodes,
+				     b->v.preferred_nodes);
 	case MPOL_PREFERRED:
 		/* a's ->flags is the same as b's */
 		if (a->flags & MPOL_F_LOCAL)
@@ -2532,6 +2559,8 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		polnid = zone_to_nid(z->zone);
 		break;
 
+		/* case MPOL_PREFERRED_MANY: */
+
 	default:
 		BUG();
 	}
@@ -2883,6 +2912,7 @@ static const char * const policy_modes[] =
 	[MPOL_BIND] = "bind",
 	[MPOL_INTERLEAVE] = "interleave",
 	[MPOL_LOCAL] = "local",
+	[MPOL_PREFERRED_MANY] = "prefer (many)",
 };
 
 
@@ -2962,6 +2992,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 		if (!nodelist)
 			err = 0;
 		goto out;
+	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND:
 		/*
 		 * Insist on a nodelist
@@ -3044,6 +3075,9 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 	switch (mode) {
 	case MPOL_DEFAULT:
 		break;
+	case MPOL_PREFERRED_MANY:
+		WARN_ON(flags & MPOL_F_LOCAL);
+		fallthrough;
 	case MPOL_PREFERRED:
 		if (flags & MPOL_F_LOCAL)
 			mode = MPOL_LOCAL;
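
The table-extension trick above is worth spelling out. A compilable
sketch (mode values follow the UAPI enum order; the struct name here
is hypothetical): the new mode is defined as MPOL_MAX, one past the
last UAPI mode, so the ops table grows to MPOL_MAX + 1 entries while
the UAPI itself stays unchanged.

enum {
	MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND,
	MPOL_INTERLEAVE, MPOL_LOCAL, MPOL_MAX
};					/* existing UAPI modes */

#define MPOL_PREFERRED_MANY MPOL_MAX	/* internal-only, for now */

struct mempolicy_operations_sketch {
	int (*create)(void);
	void (*rebind)(void);
};

static const struct mempolicy_operations_sketch ops[MPOL_MAX + 1] = {
	/* ...one entry per UAPI mode, as in the real table... */
	[MPOL_PREFERRED_MANY] = { .create = 0, .rebind = 0 },
};

Designated initializers keep the mode-to-ops mapping explicit, so
indexing the array by mode stays valid as modes are added.
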
From patchwork Tue Jun 30 21:25:09 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634829
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Dave Hansen, Andrew Morton,
    Randy Dunlap, Ben Widawsky
Subject: [PATCH 04/12] mm/mempolicy: allow preferred code to take a nodemask
Date: Tue, 30 Jun 2020 14:25:09 -0700
Message-Id: <20200630212517.308045-5-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

From: Dave Hansen

Create a helper function (mpol_new_preferred_many()) which is usable
both by the old, single-node MPOL_PREFERRED and the new
MPOL_PREFERRED_MANY. Enforce the old single-node MPOL_PREFERRED
behavior in the "new" version of mpol_new_preferred(), which calls
mpol_new_preferred_many().
Cc: Andrew Morton
Cc: Randy Dunlap
Signed-off-by: Dave Hansen
Signed-off-by: Ben Widawsky
Reported-by: kernel test robot
---
 mm/mempolicy.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 33bf29ddfab2..1ad6e446d8f6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -203,17 +203,30 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
 	return 0;
 }
 
-static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
+static int mpol_new_preferred_many(struct mempolicy *pol,
+				   const nodemask_t *nodes)
 {
 	if (!nodes)
 		pol->flags |= MPOL_F_LOCAL;	/* local allocation */
 	else if (nodes_empty(*nodes))
 		return -EINVAL;			/*  no allowed nodes */
 	else
-		pol->v.preferred_nodes = nodemask_of_node(first_node(*nodes));
+		pol->v.preferred_nodes = *nodes;
 	return 0;
 }
 
+static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
+{
+	if (nodes) {
+		/* MPOL_PREFERRED can only take a single node: */
+		nodemask_t tmp = nodemask_of_node(first_node(*nodes));
+
+		return mpol_new_preferred_many(pol, &tmp);
+	}
+
+	return mpol_new_preferred_many(pol, NULL);
+}
+
 static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 {
 	if (nodes_empty(*nodes))
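
In the toy mask model from earlier, the wrapper pattern this patch
introduces looks like the following (a sketch, not the kernel
functions; names are hypothetical). The "many" helper accepts the
whole mask, and the single-node wrapper truncates to the lowest set
bit before delegating:

typedef unsigned long nodemask;	/* toy stand-in, as before */

static int new_preferred_many(nodemask *out, const nodemask *nodes)
{
	if (!nodes)
		return 0;	/* local allocation */
	if (!*nodes)
		return -1;	/* no allowed nodes */
	*out = *nodes;		/* keep the whole mask */
	return 0;
}

static int new_preferred(nodemask *out, const nodemask *nodes)
{
	nodemask tmp;

	if (!nodes)
		return new_preferred_many(out, NULL);
	/* single-node policy: isolate the lowest set bit */
	tmp = *nodes & -*nodes;
	return new_preferred_many(out, &tmp);
}
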
From patchwork Tue Jun 30 21:25:10 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634827
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Dave Hansen, Andrew Morton, Ben Widawsky
Subject: [PATCH 05/12] mm/mempolicy: refactor rebind code for PREFERRED_MANY
Date: Tue, 30 Jun 2020 14:25:10 -0700
Message-Id: <20200630212517.308045-6-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

From: Dave Hansen

Again, this extracts the "only one node must be set" behavior of
MPOL_PREFERRED. It retains virtually all of the existing code so it
can be used by MPOL_PREFERRED_MANY as well.

v2:
Fixed typos in commit message. (Ben)
Merged bits from other patches.
(Ben)
Annotate mpol_rebind_preferred_many as unused (Ben)

Cc: Andrew Morton
Signed-off-by: Dave Hansen
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1ad6e446d8f6..d320a02fd35b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -359,14 +359,11 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
 	pol->v.nodes = tmp;
 }
 
-static void mpol_rebind_preferred(struct mempolicy *pol,
-						const nodemask_t *nodes)
+static void mpol_rebind_preferred_common(struct mempolicy *pol,
+					 const nodemask_t *preferred_nodes,
+					 const nodemask_t *nodes)
 {
 	nodemask_t tmp;
-	nodemask_t preferred_node;
-
-	/* MPOL_PREFERRED uses only the first node in the mask */
-	preferred_node = nodemask_of_node(first_node(*nodes));
 
 	if (pol->flags & MPOL_F_STATIC_NODES) {
 		int node = first_node(pol->w.user_nodemask);
@@ -381,12 +378,30 @@ static void mpol_rebind_preferred(struct mempolicy *pol,
 		pol->v.preferred_nodes = tmp;
 	} else if (!(pol->flags & MPOL_F_LOCAL)) {
 		nodes_remap(tmp, pol->v.preferred_nodes,
-			    pol->w.cpuset_mems_allowed, preferred_node);
+			    pol->w.cpuset_mems_allowed, *preferred_nodes);
 		pol->v.preferred_nodes = tmp;
 		pol->w.cpuset_mems_allowed = *nodes;
 	}
 }
 
+/* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */
+static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol,
+						      const nodemask_t *nodes)
+{
+	mpol_rebind_preferred_common(pol, nodes, nodes);
+}
+
+static void mpol_rebind_preferred(struct mempolicy *pol,
+				  const nodemask_t *nodes)
+{
+	nodemask_t preferred_node;
+
+	/* MPOL_PREFERRED uses only the first node in 'nodes' */
+	preferred_node = nodemask_of_node(first_node(*nodes));
+
+	mpol_rebind_preferred_common(pol, &preferred_node, nodes);
+}
+
 /*
  * mpol_rebind_policy - Migrate a policy to a different set of nodes
 *
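
One small note on the __maybe_unused annotation above: the new rebind
helper has no caller yet (it gets wired into mpol_ops later in the
series), and the kernel's __maybe_unused expands to a compiler
attribute that suppresses the unused-function warning in the meantime.
A minimal standalone illustration (the helper name here is
hypothetical):

#define __maybe_unused __attribute__((unused))

/* A static function with no callers yet: without the annotation,
 * gcc's -Wunused-function would fire until a later patch uses it. */
static void __maybe_unused future_rebind_helper(void)
{
}
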
From patchwork Tue Jun 30 21:25:11 2020
X-Patchwork-Submitter: Ben Widawsky <ben.widawsky@intel.com>
X-Patchwork-Id: 11634835
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Dave Hansen
Subject: [PATCH 06/12] mm/mempolicy: kill v.preferred_nodes
Date: Tue, 30 Jun 2020 14:25:11 -0700
Message-Id: <20200630212517.308045-7-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

Now that preferred_nodes is just a mask, and policies are mutually
exclusive, there is no reason to have a separate mask.

This patch is optional. It definitely helps clean up code in future
patches, but there is no functional difference to leaving it with the
previous name. I do believe it helps demonstrate the exclusivity of
the fields.
Cc: Michal Hocko
Cc: Dave Hansen
Signed-off-by: Ben Widawsky
---
 include/linux/mempolicy.h |   6 +-
 mm/mempolicy.c            | 112 ++++++++++++++++++--------------------
 2 files changed, 55 insertions(+), 63 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index c66ea9f4c61e..892f78f39f84 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -46,11 +46,7 @@ struct mempolicy {
 	atomic_t refcnt;
 	unsigned short mode; 	/* See MPOL_* above */
 	unsigned short flags;	/* See set_mempolicy() MPOL_F_* above */
-	union {
-		nodemask_t	 preferred_nodes; /* preferred */
-		nodemask_t	 nodes;		/* interleave/bind */
-		/* undefined for default */
-	} v;
+	nodemask_t nodes;	/* interleave/bind/many */
 	union {
 		nodemask_t cpuset_mems_allowed;	/* relative to these nodes */
 		nodemask_t user_nodemask;	/* nodemask passed by user */

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d320a02fd35b..e71ebc906ff0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -199,7 +199,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
 {
 	if (nodes_empty(*nodes))
 		return -EINVAL;
-	pol->v.nodes = *nodes;
+	pol->nodes = *nodes;
 	return 0;
 }
 
@@ -211,7 +211,7 @@ static int mpol_new_preferred_many(struct mempolicy *pol,
 	else if (nodes_empty(*nodes))
 		return -EINVAL;			/*  no allowed nodes */
 	else
-		pol->v.preferred_nodes = *nodes;
+		pol->nodes = *nodes;
 	return 0;
 }
 
@@ -231,7 +231,7 @@ static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 {
 	if (nodes_empty(*nodes))
 		return -EINVAL;
-	pol->v.nodes = *nodes;
+	pol->nodes = *nodes;
 	return 0;
 }
 
@@ -348,15 +348,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
 	else if (pol->flags & MPOL_F_RELATIVE_NODES)
 		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
 	else {
-		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
-								*nodes);
+		nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
+			    *nodes);
 		pol->w.cpuset_mems_allowed = *nodes;
 	}
 
 	if (nodes_empty(tmp))
 		tmp = *nodes;
 
-	pol->v.nodes = tmp;
+	pol->nodes = tmp;
 }
 
 static void mpol_rebind_preferred_common(struct mempolicy *pol,
@@ -369,17 +369,17 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol,
 		int node = first_node(pol->w.user_nodemask);
 
 		if (node_isset(node, *nodes)) {
-			pol->v.preferred_nodes = nodemask_of_node(node);
+			pol->nodes = nodemask_of_node(node);
 			pol->flags &= ~MPOL_F_LOCAL;
 		} else
 			pol->flags |= MPOL_F_LOCAL;
 	} else if (pol->flags & MPOL_F_RELATIVE_NODES) {
 		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
-		pol->v.preferred_nodes = tmp;
+		pol->nodes = tmp;
 	} else if (!(pol->flags & MPOL_F_LOCAL)) {
-		nodes_remap(tmp, pol->v.preferred_nodes,
-			    pol->w.cpuset_mems_allowed, *preferred_nodes);
-		pol->v.preferred_nodes = tmp;
+		nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
+			    *preferred_nodes);
+		pol->nodes = tmp;
 		pol->w.cpuset_mems_allowed = *nodes;
 	}
 }
@@ -950,14 +950,14 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	switch (p->mode) {
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		*nodes = p->v.nodes;
+		*nodes = p->nodes;
 		break;
 	case MPOL_PREFERRED_MANY:
-		*nodes = p->v.preferred_nodes;
+		*nodes = p->nodes;
 		break;
 	case MPOL_PREFERRED:
 		if (!(p->flags & MPOL_F_LOCAL))
-			*nodes = p->v.preferred_nodes;
+			*nodes = p->nodes;
 		/* else return empty node mask for local allocation */
 		break;
 	default:
@@ -1043,7 +1043,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 			*policy = err;
 		} else if (pol == current->mempolicy &&
 				pol->mode == MPOL_INTERLEAVE) {
-			*policy = next_node_in(current->il_prev, pol->v.nodes);
+			*policy = next_node_in(current->il_prev, pol->nodes);
 		} else {
 			err = -EINVAL;
 			goto out;
@@ -1918,14 +1918,14 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 	BUG_ON(dynamic_policy_zone == ZONE_MOVABLE);
 
 	/*
-	 * if policy->v.nodes has movable memory only,
+	 * if policy->nodes has movable memory only,
 	 * we apply policy when gfp_zone(gfp) = ZONE_MOVABLE only.
 	 *
-	 * policy->v.nodes is intersect with node_states[N_MEMORY].
+	 * policy->nodes is intersect with node_states[N_MEMORY].
 	 * so if the following test faile, it implies
-	 * policy->v.nodes has movable memory only.
+	 * policy->nodes has movable memory only.
 	 */
-	if (!nodes_intersects(policy->v.nodes, node_states[N_HIGH_MEMORY]))
+	if (!nodes_intersects(policy->nodes, node_states[N_HIGH_MEMORY]))
 		dynamic_policy_zone = ZONE_MOVABLE;
 
 	return zone >= dynamic_policy_zone;
@@ -1939,9 +1939,9 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
 	if (unlikely(policy->mode == MPOL_BIND) &&
-			apply_policy_zone(policy, gfp_zone(gfp)) &&
-			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
-		return &policy->v.nodes;
+	    apply_policy_zone(policy, gfp_zone(gfp)) &&
+	    cpuset_nodemask_valid_mems_allowed(&policy->nodes))
+		return &policy->nodes;
 
 	return NULL;
 }
@@ -1953,7 +1953,7 @@ static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 	if ((policy->mode == MPOL_PREFERRED ||
 	     policy->mode == MPOL_PREFERRED_MANY) &&
 	    !(policy->flags & MPOL_F_LOCAL)) {
-		nd = first_node(policy->v.preferred_nodes);
+		nd = first_node(policy->nodes);
 	} else {
 		/*
 		 * __GFP_THISNODE shouldn't even be used with the bind policy
@@ -1972,7 +1972,7 @@ static unsigned interleave_nodes(struct mempolicy *policy)
 	unsigned next;
 	struct task_struct *me = current;
 
-	next = next_node_in(me->il_prev, policy->v.nodes);
+	next = next_node_in(me->il_prev, policy->nodes);
 	if (next < MAX_NUMNODES)
 		me->il_prev = next;
 	return next;
@@ -2000,7 +2000,7 @@ unsigned int mempolicy_slab_node(void)
 		/*
 		 * handled MPOL_F_LOCAL above
 		 */
-		return first_node(policy->v.preferred_nodes);
+		return first_node(policy->nodes);
 
 	case MPOL_INTERLEAVE:
 		return interleave_nodes(policy);
@@ -2016,7 +2016,7 @@ unsigned int mempolicy_slab_node(void)
 		enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
 		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
 		z = first_zones_zonelist(zonelist, highest_zoneidx,
-							&policy->v.nodes);
+					 &policy->nodes);
 		return z->zone ? zone_to_nid(z->zone) : node;
 	}
 
@@ -2027,12 +2027,12 @@ unsigned int mempolicy_slab_node(void)
 
 /*
  * Do static interleaving for a VMA with known offset @n.  Returns the n'th
- * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the
+ * node in pol->nodes (starting from n=0), wrapping around if n exceeds the
 * number of present nodes.
 */
 static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
 {
-	unsigned nnodes = nodes_weight(pol->v.nodes);
+	unsigned nnodes = nodes_weight(pol->nodes);
 	unsigned target;
 	int i;
 	int nid;
@@ -2040,9 +2040,9 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
 	if (!nnodes)
 		return numa_node_id();
 	target = (unsigned int)n % nnodes;
-	nid = first_node(pol->v.nodes);
+	nid = first_node(pol->nodes);
 	for (i = 0; i < target; i++)
-		nid = next_node(nid, pol->v.nodes);
+		nid = next_node(nid, pol->nodes);
 	return nid;
 }
 
@@ -2098,7 +2098,7 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 	} else {
 		nid = policy_node(gfp_flags, *mpol, numa_node_id());
 		if ((*mpol)->mode == MPOL_BIND)
-			*nodemask = &(*mpol)->v.nodes;
+			*nodemask = &(*mpol)->nodes;
 	}
 	return nid;
 }
@@ -2131,19 +2131,19 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
 	case MPOL_PREFERRED_MANY:
-		*mask = mempolicy->v.preferred_nodes;
+		*mask = mempolicy->nodes;
 		break;
 	case MPOL_PREFERRED:
 		if (mempolicy->flags & MPOL_F_LOCAL)
 			nid = numa_node_id();
 		else
-			nid = first_node(mempolicy->v.preferred_nodes);
+			nid = first_node(mempolicy->nodes);
 		init_nodemask_of_node(mask, nid);
 		break;
 
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		*mask = mempolicy->v.nodes;
+		*mask = mempolicy->nodes;
 		break;
 
 	default:
@@ -2188,11 +2188,11 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	 */
 		break;
 	case MPOL_PREFERRED_MANY:
-		ret = nodes_intersects(mempolicy->v.preferred_nodes, *mask);
+		ret = nodes_intersects(mempolicy->nodes, *mask);
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		ret = nodes_intersects(mempolicy->v.nodes, *mask);
+		ret = nodes_intersects(mempolicy->nodes, *mask);
 		break;
 	default:
 		BUG();
@@ -2281,7 +2281,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		if ((pol->mode == MPOL_PREFERRED ||
 		     pol->mode == MPOL_PREFERRED_MANY) &&
 		    !(pol->flags & MPOL_F_LOCAL))
-			hpage_node = first_node(pol->v.preferred_nodes);
+			hpage_node = first_node(pol->nodes);
 
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
@@ -2415,15 +2415,14 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	switch (a->mode) {
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		return !!nodes_equal(a->v.nodes, b->v.nodes);
+		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_PREFERRED_MANY:
-		return !!nodes_equal(a->v.preferred_nodes,
-				     b->v.preferred_nodes);
+		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_PREFERRED:
 		/* a's ->flags is the same as b's */
 		if (a->flags & MPOL_F_LOCAL)
 			return true;
-		return nodes_equal(a->v.preferred_nodes, b->v.preferred_nodes);
+		return nodes_equal(a->nodes, b->nodes);
 	default:
 		BUG();
 		return false;
@@ -2567,7 +2566,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		if (pol->flags & MPOL_F_LOCAL)
 			polnid = numa_node_id();
 		else
-			polnid = first_node(pol->v.preferred_nodes);
+			polnid = first_node(pol->nodes);
 		break;
 
 	case MPOL_BIND:
@@ -2578,12 +2577,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		 * else select nearest allowed node, if any.
 		 * If no allowed nodes, use current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->nodes))
 			goto out;
-		z = first_zones_zonelist(
-				node_zonelist(numa_node_id(), GFP_HIGHUSER),
-				gfp_zone(GFP_HIGHUSER),
-				&pol->v.nodes);
+		z = first_zones_zonelist(node_zonelist(numa_node_id(),
+						       GFP_HIGHUSER),
+					 gfp_zone(GFP_HIGHUSER), &pol->nodes);
 		polnid = zone_to_nid(z->zone);
 		break;
 
@@ -2784,11 +2782,9 @@ int mpol_set_shared_policy(struct shared_policy *info,
 	struct sp_node *new = NULL;
 	unsigned long sz = vma_pages(vma);
 
-	pr_debug("set_shared_policy %lx sz %lu %d %d %lx\n",
-		 vma->vm_pgoff,
-		 sz, npol ? npol->mode : -1,
-		 npol ? npol->flags : -1,
-		 npol ? nodes_addr(npol->v.nodes)[0] : NUMA_NO_NODE);
+	pr_debug("set_shared_policy %lx sz %lu %d %d %lx\n", vma->vm_pgoff, sz,
+		 npol ? npol->mode : -1, npol ? npol->flags : -1,
+		 npol ? nodes_addr(npol->nodes)[0] : NUMA_NO_NODE);
 
 	if (npol) {
 		new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, npol);
@@ -2882,11 +2878,11 @@ void __init numa_policy_init(void)
 				     0, SLAB_PANIC, NULL);
 
 	for_each_node(nid) {
-		preferred_node_policy[nid] = (struct mempolicy) {
+		preferred_node_policy[nid] = (struct mempolicy){
 			.refcnt = ATOMIC_INIT(1),
 			.mode = MPOL_PREFERRED,
 			.flags = MPOL_F_MOF | MPOL_F_MORON,
-			.v = { .preferred_nodes = nodemask_of_node(nid), },
+			.nodes = nodemask_of_node(nid),
 		};
 	}
 
@@ -3052,9 +3048,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	 * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
 	 */
 	if (mode != MPOL_PREFERRED)
-		new->v.nodes = nodes;
+		new->nodes = nodes;
 	else if (nodelist)
-		new->v.preferred_nodes = nodemask_of_node(first_node(nodes));
+		new->nodes = nodemask_of_node(first_node(nodes));
 	else
 		new->flags |= MPOL_F_LOCAL;
 
@@ -3110,11 +3106,11 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		if (flags & MPOL_F_LOCAL)
 			mode = MPOL_LOCAL;
 		else
-			nodes_or(nodes, nodes, pol->nodes);
+			nodes_or(nodes, nodes, pol->nodes);
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		nodes = pol->v.nodes;
+		nodes = pol->nodes;
 		break;
 	default:
 		WARN_ON_ONCE(1);
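
The shape of the struct change, abridged from the diff (the _before
and _after names are hypothetical, and the typedef is a stand-in for
the kernel type): once both union members are the same type and the
policies using them are mutually exclusive, the union adds nothing and
collapses to a single field.

typedef unsigned long nodemask_t;	/* stand-in for the kernel type */

struct mempolicy_before {
	union {
		nodemask_t preferred_nodes;	/* preferred */
		nodemask_t nodes;		/* interleave/bind */
	} v;
};

struct mempolicy_after {
	nodemask_t nodes;			/* interleave/bind/many */
};
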
This patch begins the real plumbing for handling this new policy. Now that the
internal representation for preferred nodes and bound nodes is the same, and we
can envision how multiple preferred nodes will behave, there are obvious places
where we can simply reuse the bind behavior.

In v1 of this series, the moral equivalent was: "mm: Finish handling
MPOL_PREFERRED_MANY". Like that patch, this one attempts to implement the
easiest spots for the new policy. Unlike it, this one simply reuses the BIND
handling.
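For illustration, here is a minimal standalone C sketch of the pattern this
reuse relies on: once bound and preferred nodes share one field, the BIND,
INTERLEAVE and PREFERRED_MANY switch arms can collapse into shared fall-through
cases. The mode and struct names below only mirror the kernel's; the program is
an illustrative stand-in, not kernel code.

#include <stdio.h>

enum mode { MODE_BIND, MODE_INTERLEAVE, MODE_PREFERRED_MANY };

struct policy {
	enum mode mode;
	unsigned long nodes;		/* stand-in for a nodemask_t */
};

/* One switch arm serves all three modes now that storage is shared. */
static unsigned long policy_nodes(const struct policy *p)
{
	switch (p->mode) {
	case MODE_BIND:
	case MODE_INTERLEAVE:
	case MODE_PREFERRED_MANY:
		return p->nodes;
	}
	return 0;
}

int main(void)
{
	struct policy p = { .mode = MODE_PREFERRED_MANY, .nodes = 0x6 };

	printf("nodes: %#lx\n", policy_nodes(&p));
	return 0;
}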
Cc: Andrew Morton
Cc: Dave Hansen
Cc: Michal Hocko
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e71ebc906ff0..3b38c9c4e580 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -950,8 +950,6 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	switch (p->mode) {
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		*nodes = p->nodes;
-		break;
 	case MPOL_PREFERRED_MANY:
 		*nodes = p->nodes;
 		break;
@@ -1938,7 +1936,8 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
-	if (unlikely(policy->mode == MPOL_BIND) &&
+	if (unlikely(policy->mode == MPOL_BIND ||
+		     policy->mode == MPOL_PREFERRED_MANY) &&
 	    apply_policy_zone(policy, gfp_zone(gfp)) &&
 	    cpuset_nodemask_valid_mems_allowed(&policy->nodes))
 		return &policy->nodes;
@@ -1995,7 +1994,6 @@ unsigned int mempolicy_slab_node(void)
 		return node;

 	switch (policy->mode) {
-	case MPOL_PREFERRED_MANY:
 	case MPOL_PREFERRED:
 		/*
 		 * handled MPOL_F_LOCAL above
@@ -2005,6 +2003,7 @@ unsigned int mempolicy_slab_node(void)
 	case MPOL_INTERLEAVE:
 		return interleave_nodes(policy);

+	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND: {
 		struct zoneref *z;

@@ -2020,6 +2019,7 @@ unsigned int mempolicy_slab_node(void)
 		return z->zone ? zone_to_nid(z->zone) : node;
 	}

+
 	default:
 		BUG();
 	}
@@ -2130,9 +2130,6 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	task_lock(current);
 	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
-	case MPOL_PREFERRED_MANY:
-		*mask = mempolicy->nodes;
-		break;
 	case MPOL_PREFERRED:
 		if (mempolicy->flags & MPOL_F_LOCAL)
 			nid = numa_node_id();
@@ -2143,6 +2140,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)

 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
+	case MPOL_PREFERRED_MANY:
 		*mask = mempolicy->nodes;
 		break;

@@ -2186,12 +2184,11 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	 * Thus, it's possible for tsk to have allocated memory from
 	 * nodes in mask.
 	 */
-		break;
-	case MPOL_PREFERRED_MANY:
 		ret = nodes_intersects(mempolicy->nodes, *mask);
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
+	case MPOL_PREFERRED_MANY:
 		ret = nodes_intersects(mempolicy->nodes, *mask);
 		break;
 	default:
@@ -2415,7 +2412,6 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	switch (a->mode) {
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_PREFERRED_MANY:
 		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_PREFERRED:
@@ -2569,6 +2565,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		polnid = first_node(pol->nodes);
 		break;

+	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND:

 		/*
@@ -2585,8 +2582,6 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		polnid = zone_to_nid(z->zone);
 		break;

-	/* case MPOL_PREFERRED_MANY: */
-
 	default:
 		BUG();
 	}
@@ -3099,15 +3094,13 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 	switch (mode) {
 	case MPOL_DEFAULT:
 		break;
-	case MPOL_PREFERRED_MANY:
-		WARN_ON(flags & MPOL_F_LOCAL);
-		fallthrough;
 	case MPOL_PREFERRED:
 		if (flags & MPOL_F_LOCAL)
 			mode = MPOL_LOCAL;
 		else
 			nodes_or(nodes, nodes, pol->nodes);
 		break;
+	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		nodes = pol->nodes;

From patchwork Tue Jun 30 21:25:13 2020
From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Vlastimil Babka
Subject: [PATCH 08/12] mm/mempolicy: Create a page allocator for policy
Date: Tue, 30 Jun 2020 14:25:13 -0700
Message-Id: <20200630212517.308045-9-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>
This patch adds a helper function which takes care of handling multiple
preferred nodes. It will be called by future patches that need to handle this,
specifically VMA-based page allocation and task-based page allocation. Huge
pages don't quite fit the same pattern because they use different underlying
page allocation functions.

This consumes the previous interleave-policy-specific allocation function to
make a one-stop shop for policy-based allocation. For now, only the interleave
policy will use it, so there should be no functional change yet. However, if
bisection points to issues in the next few commits, this patch is the likely
culprit.

Similar functionality is offered via policy_node() and policy_nodemask(). By
themselves, however, neither can achieve this fallback style over sets of
nodes.
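As a rough standalone sketch of the table in the new helper, the following C
program encodes which preferred-nid/nodemask pair each mode hands the
allocator per pass; pick_args() is hypothetical and only models what
alloc_pages_policy() is described as passing down.

#include <stdbool.h>
#include <stdio.h>

enum mode { MODE_DEFAULT, MODE_PREFERRED, MODE_BIND, MODE_PREFERRED_MANY };

struct args {
	int nid;			/* -1 means "local node" */
	bool use_mask;			/* pass the policy nodemask? */
	bool may_fail;			/* __GFP_RETRY_MAYFAIL analogue */
};

/* Encode the table: which nid/mask pair each mode hands the allocator. */
static struct args pick_args(enum mode m, bool second_pass)
{
	struct args a = { .nid = -1, .use_mask = false, .may_fail = false };

	switch (m) {
	case MODE_DEFAULT:
		break;				/* local nid, no mask */
	case MODE_PREFERRED:
		a.nid = 0;			/* "best" node placeholder */
		break;
	case MODE_BIND:
		a.use_mask = true;		/* local nid, pol->nodes */
		break;
	case MODE_PREFERRED_MANY:
		if (!second_pass) {		/* round 1: masked, may fail */
			a.use_mask = true;
			a.may_fail = true;
		}				/* round 2: local, no mask */
		break;
	}
	return a;
}

int main(void)
{
	for (int pass = 0; pass < 2; pass++) {
		struct args a = pick_args(MODE_PREFERRED_MANY, pass);

		printf("pass %d: mask=%d may_fail=%d\n", pass + 1,
		       a.use_mask, a.may_fail);
	}
	return 0;
}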
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 60 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3b38c9c4e580..1009cf90ad37 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2199,22 +2199,56 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	return ret;
 }

-/* Allocate a page in interleaved policy.
-   Own path because it needs to do special accounting.
- */
-static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
-					unsigned nid)
+/* Handle page allocation for all but interleaved policies */
+static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
+				       unsigned int order, int preferred_nid)
 {
 	struct page *page;
+	gfp_t gfp_mask = gfp;

-	page = __alloc_pages(gfp, order, nid);
-	/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
-	if (!static_branch_likely(&vm_numa_stat_key))
+	if (pol->mode == MPOL_INTERLEAVE) {
+		page = __alloc_pages(gfp, order, preferred_nid);
+		/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
+		if (!static_branch_likely(&vm_numa_stat_key))
+			return page;
+		if (page && page_to_nid(page) == preferred_nid) {
+			preempt_disable();
+			__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
+			preempt_enable();
+		}
 		return page;
-	if (page && page_to_nid(page) == nid) {
-		preempt_disable();
-		__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
-		preempt_enable();
 	}
+
+	VM_BUG_ON(preferred_nid != NUMA_NO_NODE);
+
+	preferred_nid = numa_node_id();
+
+	/*
+	 * There is a two pass approach implemented here for
+	 * MPOL_PREFERRED_MANY. In the first pass we pretend the preferred
+	 * nodes are bound, but allow the allocation to fail. The below table
+	 * explains how this is achieved.
+	 *
+	 * | Policy                        | preferred nid | nodemask   |
+	 * |-------------------------------|---------------|------------|
+	 * | MPOL_DEFAULT                  | local         | NULL       |
+	 * | MPOL_PREFERRED                | best          | NULL       |
+	 * | MPOL_INTERLEAVE               | ERR           | ERR        |
+	 * | MPOL_BIND                     | local         | pol->nodes |
+	 * | MPOL_PREFERRED_MANY           | best          | pol->nodes |
+	 * | MPOL_PREFERRED_MANY (round 2) | local         | NULL       |
+	 * +-------------------------------+---------------+------------+
+	 */
+	if (pol->mode == MPOL_PREFERRED_MANY)
+		gfp_mask |= __GFP_RETRY_MAYFAIL;
+
+	page = __alloc_pages_nodemask(gfp_mask, order,
+				      policy_node(gfp, pol, preferred_nid),
+				      policy_nodemask(gfp, pol));
+
+	if (unlikely(!page && pol->mode == MPOL_PREFERRED_MANY))
+		page = __alloc_pages_nodemask(gfp, order, preferred_nid, NULL);
+
 	return page;
 }

@@ -2256,8 +2290,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned nid;

 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
+		page = alloc_pages_policy(pol, gfp, order, nid);
 		mpol_cond_put(pol);
-		page = alloc_page_interleave(gfp, order, nid);
 		goto out;
 	}

@@ -2341,7 +2375,7 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 	 * nor system default_policy
 	 */
 	if (pol->mode == MPOL_INTERLEAVE)
-		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
+		page = alloc_pages_policy(pol, gfp, order, interleave_nodes(pol));
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_node(gfp, pol, numa_node_id()),

From patchwork Tue Jun 30 21:25:14 2020
From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Vlastimil Babka
Subject: [PATCH 09/12] mm/mempolicy: Thread allocation for many preferred
Date: Tue, 30 Jun 2020 14:25:14 -0700
Message-Id: <20200630212517.308045-10-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>
In order to support MPOL_PREFERRED_MANY as the mode used by set_mempolicy(2),
alloc_pages_current() needs to support it. This patch does that by using the
new helper function to allocate properly based on policy.

All the actual machinery to make this work was part of
("mm/mempolicy: Create a page allocator for policy")
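Once exposed, task-wide usage would look roughly like the sketch below.
MPOL_PREFERRED_MANY is defined by hand because pre-series headers do not carry
it; the value assumes the enum layout from the final patch of this series.
Build with -lnuma.

#include <numaif.h>		/* set_mempolicy() */
#include <stdio.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5	/* assumed slot, added just before MPOL_MAX */
#endif

int main(void)
{
	unsigned long nodes = (1UL << 0) | (1UL << 1);	/* prefer nodes 0-1 */

	/* Fails with EINVAL on kernels without this series. */
	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8 * sizeof(nodes)))
		perror("set_mempolicy");
	return 0;
}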
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1009cf90ad37..5fb70e6599a6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2365,7 +2365,7 @@ EXPORT_SYMBOL(alloc_pages_vma);
 struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 {
 	struct mempolicy *pol = &default_policy;
-	struct page *page;
+	int nid = NUMA_NO_NODE;

 	if (!in_interrupt() && !(gfp & __GFP_THISNODE))
 		pol = get_task_policy(current);
@@ -2375,13 +2375,10 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 	 * nor system default_policy
 	 */
 	if (pol->mode == MPOL_INTERLEAVE)
-		page = alloc_pages_policy(pol, gfp, order, interleave_nodes(pol));
-	else
-		page = __alloc_pages_nodemask(gfp, order,
-				policy_node(gfp, pol, numa_node_id()),
-				policy_nodemask(gfp, pol));
+		nid = interleave_nodes(pol);
+
+	return alloc_pages_policy(pol, gfp, order, nid);

-	return page;
 }
 EXPORT_SYMBOL(alloc_pages_current);

From patchwork Tue Jun 30 21:25:15 2020
From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrea Arcangeli, Andrew Morton,
 David Rientjes, Mel Gorman, Vlastimil Babka
Subject: [PATCH 10/12] mm/mempolicy: VMA allocation for many preferred
Date: Tue, 30 Jun 2020 14:25:15 -0700
Message-Id: <20200630212517.308045-11-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>
This patch implements MPOL_PREFERRED_MANY for alloc_pages_vma(). Like
alloc_pages_current(), alloc_pages_vma() needs to support policy-based
decisions if they've been configured via mbind(2). The temporary "hack" of
treating MPOL_PREFERRED and MPOL_PREFERRED_MANY identically can now be removed
as well.

All the actual machinery to make this work was part of
("mm/mempolicy: Create a page allocator for policy")
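A corresponding per-VMA sketch via mbind(2), under the same assumption about
the MPOL_PREFERRED_MANY value as the earlier sketch; build with -lnuma.

#include <numaif.h>		/* mbind() */
#include <stdio.h>
#include <sys/mman.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5	/* assumed value, see the UAPI patch */
#endif

int main(void)
{
	size_t len = 1 << 21;
	unsigned long nodes = (1UL << 0) | (1UL << 2);	/* prefer nodes 0, 2 */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	/* Allocations for this range now try nodes 0 and 2 first. */
	if (mbind(p, len, MPOL_PREFERRED_MANY, &nodes, 8 * sizeof(nodes), 0))
		perror("mbind");
	return 0;
}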
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: David Rientjes
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5fb70e6599a6..51ac0d4a2eda 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2281,8 +2281,6 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
 	struct mempolicy *pol;
 	struct page *page;
-	int preferred_nid;
-	nodemask_t *nmask;

 	pol = get_vma_policy(vma, addr);

@@ -2296,6 +2294,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	}

 	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
+		nodemask_t *nmask;
 		int hpage_node = node;

 		/*
@@ -2309,10 +2308,26 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		 * does not allow the current node in its nodemask, we allocate
 		 * the standard way.
 		 */
-		if ((pol->mode == MPOL_PREFERRED ||
-		     pol->mode == MPOL_PREFERRED_MANY) &&
-		    !(pol->flags & MPOL_F_LOCAL))
+		if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) {
 			hpage_node = first_node(pol->nodes);
+		} else if (pol->mode == MPOL_PREFERRED_MANY) {
+			struct zoneref *z;
+
+			/*
+			 * In this policy, with direct reclaim, the normal
+			 * policy based allocation will do the right thing - try
+			 * twice using the preferred nodes first, and all nodes
+			 * second.
+			 */
+			if (gfp & __GFP_DIRECT_RECLAIM) {
+				page = alloc_pages_policy(pol, gfp, order, NUMA_NO_NODE);
+				goto out;
+			}
+
+			z = first_zones_zonelist(node_zonelist(numa_node_id(), GFP_HIGHUSER),
+						 gfp_zone(GFP_HIGHUSER), &pol->nodes);
+			hpage_node = zone_to_nid(z->zone);
+		}

 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
@@ -2338,9 +2353,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		}
 	}

-	nmask = policy_nodemask(gfp, pol);
-	preferred_nid = policy_node(gfp, pol, node);
-	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
+	page = alloc_pages_policy(pol, gfp, order, NUMA_NO_NODE);
 	mpol_cond_put(pol);
 out:
 	return page;

From patchwork Tue Jun 30 21:25:16 2020
From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Mike Kravetz
Subject: [PATCH 11/12] mm/mempolicy: huge-page allocation for many preferred
Date: Tue, 30 Jun 2020 14:25:16 -0700
Message-Id: <20200630212517.308045-12-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>
This patch implements the missing huge page allocation functionality while
obeying the preferred node semantics. Like the previous patches, this uses a
fallback mechanism to try multiple preferred nodes first, and then all other
nodes. It cannot use the helper function introduced earlier because huge page
allocation already has its own helpers, and consolidating them would have cost
more lines of code and effort than it saved.

One oddity in this patch: it cannot yet test for MPOL_PREFERRED_MANY directly,
because that mode is not yet part of the exposed UAPI. Instead of making the
define global, the check is simply updated in the UAPI patch.
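The retry idiom used at both call sites, reduced to a standalone sketch;
try_alloc() is a hypothetical stand-in for
dequeue_huge_page_nodemask()/alloc_surplus_huge_page().

#include <stdio.h>
#include <stdlib.h>

#define RETRY_MAYFAIL 0x1	/* stand-in for __GFP_RETRY_MAYFAIL */

/* Pretend allocator: fail any masked request to force the fallback. */
static void *try_alloc(unsigned int flags, const unsigned long *mask)
{
	return mask ? NULL : malloc(64);
}

static void *alloc_preferred_many(const unsigned long *preferred)
{
	/* Pass 1: restricted to the preferred nodes, allowed to fail. */
	void *page = try_alloc(RETRY_MAYFAIL, preferred);

	/* Pass 2: no mask, any node will do. */
	if (!page)
		page = try_alloc(0, NULL);
	return page;
}

int main(void)
{
	unsigned long preferred = 0x6;
	void *page = alloc_preferred_many(&preferred);

	printf("%s\n", page ? "allocated on fallback" : "failed");
	free(page);
	return 0;
}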
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Mike Kravetz
Signed-off-by: Ben Widawsky
---
 mm/hugetlb.c   | 20 +++++++++++++++++---
 mm/mempolicy.c |  3 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..46e94675de44 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1103,7 +1103,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 				unsigned long address, int avoid_reserve,
 				long chg)
 {
-	struct page *page;
+	struct page *page = NULL;
 	struct mempolicy *mpol;
 	gfp_t gfp_mask;
 	nodemask_t *nodemask;
@@ -1124,7 +1124,14 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+		page = dequeue_huge_page_nodemask(h, gfp_mask | __GFP_RETRY_MAYFAIL,
+						  nid, nodemask);
+		if (!page)
+			page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
+	} else {
+		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+	}
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetPagePrivate(page);
 		h->resv_huge_pages--;
@@ -1972,7 +1979,14 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;

 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+		page = alloc_surplus_huge_page(h, gfp_mask | __GFP_RETRY_MAYFAIL,
+					       nid, nodemask);
+		if (!page)
+			page = alloc_surplus_huge_page(h, gfp_mask, nid, NULL);
+	} else {
+		page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+	}
 	mpol_cond_put(mpol);

 	return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 51ac0d4a2eda..53390c2e0aca 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2097,7 +2097,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 				huge_page_shift(hstate_vma(vma)));
 	} else {
 		nid = policy_node(gfp_flags, *mpol, numa_node_id());
-		if ((*mpol)->mode == MPOL_BIND)
+		if ((*mpol)->mode == MPOL_BIND ||
+		    (*mpol)->mode == MPOL_PREFERRED_MANY)
 			*nodemask = &(*mpol)->nodes;
 	}
 	return nid;

From patchwork Tue Jun 30 21:25:17 2020
From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Dave Hansen,
 David Hildenbrand, Vlastimil Babka
Subject: [PATCH 12/12] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Date: Tue, 30 Jun 2020 14:25:17 -0700
Message-Id: <20200630212517.308045-13-ben.widawsky@intel.com>
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>
This patch adds a new mode to the existing mempolicy modes,
MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal admin-guide
with this patch. Eventually, the man pages for mbind(2), get_mempolicy(2),
set_mempolicy(2) and numactl(8) will also have text about this mode; those
shall contain the canonical reference.

NUMA systems continue to become more prevalent. New technologies like PMEM
make finer-grained control over memory access patterns increasingly desirable.
MPOL_PREFERRED_MANY allows userspace to specify a set of nodes that will be
tried first when performing allocations. If those allocations fail, all
remaining nodes will be tried. It's a straightforward API which solves many of
the presumptive needs of system administrators wanting to optimize workloads
on such machines. The mode will work either per VMA, or per thread.

Generally speaking, this is similar to the way MPOL_BIND works, except the
user will only get a SIGSEGV if all nodes in the system are unable to satisfy
the allocation request.
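A sketch of setting the mode and reading it back with get_mempolicy(2); as in
the earlier sketches, the MPOL_PREFERRED_MANY value is assumed from this
patch's enum layout. Build with -lnuma.

#include <numaif.h>		/* set_mempolicy(), get_mempolicy() */
#include <stdio.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5	/* assumed: added just before MPOL_MAX */
#endif

int main(void)
{
	unsigned long nodes = 0x3, out = 0;
	int mode = -1;

	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8 * sizeof(nodes))) {
		perror("set_mempolicy");
		return 1;
	}
	if (get_mempolicy(&mode, &out, 8 * sizeof(out), NULL, 0)) {
		perror("get_mempolicy");
		return 1;
	}
	printf("mode=%d nodes=%#lx\n", mode, out);	/* expect 5, 0x3 */
	return 0;
}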
Cc: Andrew Morton
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 .../admin-guide/mm/numa_memory_policy.rst | 16 ++++++++++++----
 include/uapi/linux/mempolicy.h            |  6 +++---
 mm/hugetlb.c                              |  4 ++--
 mm/mempolicy.c                            | 14 ++++++--------
 4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 1ad020c459b8..b69963a37fc8 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,14 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.

+MPOL_PREFERRED_MANY
+	This mode specifies that the allocation should be attempted from the
+	nodemask specified in the policy. If that allocation fails, the kernel
+	will search other nodes, in order of increasing distance from the first
+	set bit in the nodemask based on information provided by the platform
+	firmware. It is similar to MPOL_PREFERRED with the main exception that
+	it is an error to have an empty nodemask.
+
 NUMA memory policy supports the following optional mode flags:

 MPOL_F_STATIC_NODES
@@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.

 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+	change in the set of allowed nodes, the preferred nodemask (Preferred
+	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes.  This may result in nodes
+	being used that were previously undesired.

 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774af61e..ad3eee651d4e 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -16,13 +16,13 @@
 */

 /* Policies */
 enum {
 	MPOL_DEFAULT,
 	MPOL_PREFERRED,
 	MPOL_BIND,
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
+	MPOL_PREFERRED_MANY,
 	MPOL_MAX,	/* always last member of enum */
 };

 /* Flags for set_mempolicy */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 46e94675de44..7b75231fe277 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1124,7 +1124,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = dequeue_huge_page_nodemask(h, gfp_mask | __GFP_RETRY_MAYFAIL,
 						  nid, nodemask);
 		if (!page)
@@ -1979,7 +1979,7 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;

 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = alloc_surplus_huge_page(h, gfp_mask | __GFP_RETRY_MAYFAIL,
 					       nid, nodemask);
 		if (!page)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 53390c2e0aca..b2a4c07cf811 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -108,8 +108,6 @@

 #include "internal.h"

-#define MPOL_PREFERRED_MANY MPOL_MAX
-
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
@@ -180,7 +178,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
 static const struct mempolicy_operations {
 	int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
 	void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
-} mpol_ops[MPOL_MAX + 1];
+} mpol_ops[MPOL_MAX];

 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
@@ -385,8 +383,8 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol,
 }

 /* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */
-static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol,
-						      const nodemask_t *nodes)
+static void mpol_rebind_preferred_many(struct mempolicy *pol,
+				       const nodemask_t *nodes)
 {
 	mpol_rebind_preferred_common(pol, nodes, nodes);
 }
@@ -448,7 +446,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 	mmap_write_unlock(mm);
 }

-static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
+static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 	[MPOL_DEFAULT] = {
 		.rebind = mpol_rebind_default,
 	},
@@ -466,8 +464,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
 	},
 	/* [MPOL_LOCAL] - see mpol_new() */
 	[MPOL_PREFERRED_MANY] = {
-		.create = NULL,
-		.rebind = NULL,
+		.create = mpol_new_preferred_many,
+		.rebind = mpol_rebind_preferred_many,
 	},
 };