From patchwork Sat Dec 9 06:59:22 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485960
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com, Srinivasulu Thanneeru
Subject: [PATCH v2 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving
Date: Sat, 9 Dec 2023 01:59:22 -0500
Message-Id: <20231209065931.3458-3-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>
From: Rakie Kim

When a system has multiple NUMA nodes and becomes bandwidth hungry,
MPOL_INTERLEAVE can be a good option. However, if those NUMA nodes
consist of different types of memory, such as local DRAM and CXL memory
together, the current round-robin interleaving policy does not maximize
overall bandwidth, because the memory types have different bandwidth
characteristics. Interleaving can be more efficient when the allocation
policy follows each NUMA node's bandwidth weight rather than allocating
1:1 in round-robin fashion.

This patch introduces a new memory policy, MPOL_WEIGHTED_INTERLEAVE,
which enables weighted interleaving between NUMA nodes. Weighted
interleave distributes memory proportionally across multiple NUMA
nodes, preferably apportioned to match the bandwidth capacity of each
node from the perspective of the accessing node.

For example, if a system has 1 CPU node (0) and 2 memory nodes (0,1),
with a relative bandwidth of (100GB/s, 50GB/s) respectively, the
appropriate weight distribution is (2:1).
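The (2:1) figure is the bandwidth ratio reduced to its smallest integers. The sketch below shows that arithmetic. It assumes weights are derived by dividing every bandwidth by their greatest common divisor. The patch itself does not compute weights: it reads them from sysfs. `bw_to_weights` and `gcd_u` are hypothetical helpers used only for illustration.

```c
#include <assert.h>

/* Greatest common divisor (Euclid); gcd_u(0, x) == x. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper, not part of the patch: reduce per-node bandwidths
 * to the smallest integer weights with the same ratio. Assumes each
 * reduced weight fits in an unsigned char (the weight type used by the
 * patch). If every bandwidth is 0, all weights default to 1.
 */
static void bw_to_weights(const unsigned int *bw, unsigned char *weights,
			  int nr_nodes)
{
	unsigned int g = 0;
	int i;

	assert(nr_nodes > 0);
	for (i = 0; i < nr_nodes; i++)
		g = gcd_u(g, bw[i]);
	for (i = 0; i < nr_nodes; i++)
		weights[i] = g ? (unsigned char)(bw[i] / g) : 1;
}
```

For bandwidths (100, 50) the GCD is 50, so the weights come out as (2, 1).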
Weights are read from the global weight array exposed by the sysfs
extension:

  /sys/kernel/mm/mempolicy/weighted_interleave/

The policy then allocates pages according to the configured weights.
For example, if the weights are (2,1), then 2 pages will be allocated
on node0 for every 1 page allocated on node1.

The new mode MPOL_WEIGHTED_INTERLEAVE can be used in set_mempolicy(2)
and mbind(2).

There are 3 integration points:

weighted_interleave_nodes:
    Counts the number of allocations as they occur and applies the
    weight for the current node. When the weight reaches 0, it switches
    to the next node.
    Applied by `mempolicy_slab_node()` and `policy_nodemask()`.

weighted_interleave_nid:
    Gets the total weight of the nodemask as well as each individual
    node weight, then calculates the node based on the given index.
    Applied by `policy_nodemask()` and `mpol_misplaced()`.

bulk_array_weighted_interleave:
    Gets the total weight of the nodemask as well as each individual
    node weight, then calculates the number of "interleave rounds" as
    well as any delta ("partial round"). Calculates the number of pages
    for each node and allocates them.

    If a node was scheduled for interleave via
    weighted_interleave_nodes, the current weight (pol->wil.cur_weight)
    is allocated first, before the remaining bulk calculation is done.
    This simplifies the calculation at the cost of an additional
    allocation call.

One piece of complexity is the interaction with a recent refactor,
which split the logic that acquires the "ilx" (interleave index) of an
allocation from the actual application of the interleave. The
interleave index is calculated by `get_vma_policy()`, while the node is
selected later by the relevant weighted_interleave function.
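The three integration points can be modelled in plain C to check the arithmetic. This is a simplified sketch, not the kernel implementation:

- Nodes are dense indices 0..n-1 instead of a nodemask_t.
- Weights come from a caller-supplied array instead of iw_table[].
- `struct wi_state` and the `wi_*` helpers are hypothetical names.
- The bulk helper starts at node 0 and ignores any carried-over pol->wil.cur_weight.
- Every weight is assumed to be at least 1.

```c
#include <assert.h>

/*
 * Simplified userspace model, not kernel code: nodes are dense indices
 * 0..n-1, weights[i] >= 1 stands in for iw_table[], and the hypothetical
 * struct wi_state stands in for current->il_prev + pol->wil.cur_weight.
 */
struct wi_state {
	int il_prev;              /* node whose weight was last used up */
	unsigned char cur_weight; /* allocations left on the in-progress node */
};

/* Modelled on weighted_interleave_nodes(): one node per call, stateful. */
static int wi_next_node(struct wi_state *s, const unsigned char *weights,
			int n)
{
	int next = (s->il_prev + 1) % n;   /* like next_node_in() */

	if (!s->cur_weight)
		s->cur_weight = weights[next];
	s->cur_weight--;
	if (!s->cur_weight)
		s->il_prev = next;         /* weight used up: move on next call */
	return next;
}

/* Sum of all node weights; asserts that it is non-zero. */
static unsigned int wi_total(const unsigned char *weights, int n)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += weights[i];
	assert(total > 0);
	return total;
}

/* Modelled on weighted_interleave_nid(): maps an index to a node. */
static int wi_nid(const unsigned char *weights, int n, unsigned long ilx)
{
	unsigned long target = ilx % wi_total(weights, n);
	int nid = 0;

	/* Skip whole nodes until target falls inside one node's weight. */
	while (target >= weights[nid]) {
		target -= weights[nid];
		nid++;
	}
	return nid;
}

/*
 * Modelled on the bulk path, starting at node 0 with no carried-over
 * cur_weight: each node gets weight * rounds pages plus its share of
 * the partial round (delta), handed out in node order.
 */
static void wi_bulk_counts(const unsigned char *weights, int n,
			   unsigned long nr_pages, unsigned long *counts)
{
	unsigned int total = wi_total(weights, n);
	unsigned long rounds = nr_pages / total;
	unsigned long delta = nr_pages % total;
	int i;

	for (i = 0; i < n; i++) {
		unsigned long extra = delta < weights[i] ? delta : weights[i];

		counts[i] = weights[i] * rounds + extra;
		delta -= extra;
	}
}
```

Take weights (2,1), with il_prev initialised to the last node (do_set_mempolicy() does the same with MAX_NUMNODES-1):

- Successive wi_next_node() calls yield nodes 0, 0, 1, 0, 0, 1.
- wi_nid() maps indices 0..3 to nodes 0, 0, 1, 0.

For weights (5,2) and 10 pages, wi_bulk_counts() yields 8 pages on node 0 and 2 on node 1. That is one full round of 7 pages plus a partial round of 3, and the partial round fits within node 0's weight.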
Suggested-by: Hasan Al Maruf
Signed-off-by: Rakie Kim
Co-developed-by: Honggyu Kim
Signed-off-by: Honggyu Kim
Co-developed-by: Hyeongtak Ji
Signed-off-by: Hyeongtak Ji
Co-developed-by: Gregory Price
Signed-off-by: Gregory Price
Co-developed-by: Srinivasulu Thanneeru
Signed-off-by: Srinivasulu Thanneeru
Co-developed-by: Ravi Jonnalagadda
Signed-off-by: Ravi Jonnalagadda
---
 .../admin-guide/mm/numa_memory_policy.rst |  11 +
 include/linux/mempolicy.h                 |   5 +
 include/uapi/linux/mempolicy.h            |   1 +
 mm/mempolicy.c                            | 197 +++++++++++++++++-
 4 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index eca38fa81e0f..d2c8e712785b 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -250,6 +250,17 @@ MPOL_PREFERRED_MANY
 	can fall back to all existing numa nodes. This is effectively
 	MPOL_PREFERRED allowed for a mask rather than a single node.
 
+MPOL_WEIGHTED_INTERLEAVE
+	This mode operates the same as MPOL_INTERLEAVE, except that
+	interleaving behavior is executed based on weights set in
+	/sys/kernel/mm/mempolicy/weighted_interleave/
+
+	Weighted interleave allocates pages on nodes according to
+	their weight. For example, if nodes [0,1] are weighted [5,2]
+	respectively, 5 pages will be allocated on node0 for every
+	2 pages allocated on node1. This can better distribute data
+	according to bandwidth on heterogeneous memory systems.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 931b118336f4..ba09167e80f7 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -54,6 +54,11 @@ struct mempolicy {
 		nodemask_t cpuset_mems_allowed;	/* relative to these nodes */
 		nodemask_t user_nodemask;	/* nodemask passed by user */
 	} w;
+
+	/* Weighted interleave settings */
+	struct {
+		unsigned char cur_weight;
+	} wil;
 };
 
 /*
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index a8963f7ef4c2..1f9bb10d1a47 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -23,6 +23,7 @@ enum {
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
 	MPOL_PREFERRED_MANY,
+	MPOL_WEIGHTED_INTERLEAVE,
 	MPOL_MAX,	/* always last member of enum */
 };
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 28dfae195beb..b4d94646e6a2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -305,6 +305,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 	policy->mode = mode;
 	policy->flags = flags;
 	policy->home_node = NUMA_NO_NODE;
+	policy->wil.cur_weight = 0;
 
 	return policy;
 }
@@ -417,6 +418,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.create = mpol_new_nodemask,
 		.rebind = mpol_rebind_preferred,
 	},
+	[MPOL_WEIGHTED_INTERLEAVE] = {
+		.create = mpol_new_nodemask,
+		.rebind = mpol_rebind_nodemask,
+	},
 };
 
 static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
@@ -838,7 +843,8 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
 
 	old = current->mempolicy;
 	current->mempolicy = new;
-	if (new && new->mode == MPOL_INTERLEAVE)
+	if (new && (new->mode == MPOL_INTERLEAVE ||
+		    new->mode == MPOL_WEIGHTED_INTERLEAVE))
 		current->il_prev = MAX_NUMNODES-1;
 	task_unlock(current);
 	mpol_put(old);
@@ -864,6 +870,7 @@ static void get_policy_nodemask(struct mempolicy *pol, nodemask_t *nodes)
 	case MPOL_INTERLEAVE:
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_WEIGHTED_INTERLEAVE:
 		*nodes = pol->nodes;
 		break;
 	case MPOL_LOCAL:
@@ -948,6 +955,13 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 		} else if (pol == current->mempolicy &&
 				pol->mode == MPOL_INTERLEAVE) {
 			*policy = next_node_in(current->il_prev, pol->nodes);
+		} else if (pol == current->mempolicy &&
+				(pol->mode == MPOL_WEIGHTED_INTERLEAVE)) {
+			if (pol->wil.cur_weight)
+				*policy = current->il_prev;
+			else
+				*policy = next_node_in(current->il_prev,
+						       pol->nodes);
 		} else {
 			err = -EINVAL;
 			goto out;
@@ -1777,7 +1791,8 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
 	pol = __get_vma_policy(vma, addr, ilx);
 	if (!pol)
 		pol = get_task_policy(current);
-	if (pol->mode == MPOL_INTERLEAVE) {
+	if (pol->mode == MPOL_INTERLEAVE ||
+	    pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
 		*ilx += vma->vm_pgoff >> order;
 		*ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
 	}
@@ -1827,6 +1842,24 @@ bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 	return zone >= dynamic_policy_zone;
 }
 
+static unsigned int weighted_interleave_nodes(struct mempolicy *policy)
+{
+	unsigned int next;
+	struct task_struct *me = current;
+
+	next = next_node_in(me->il_prev, policy->nodes);
+	if (next == MAX_NUMNODES)
+		return next;
+
+	if (!policy->wil.cur_weight)
+		policy->wil.cur_weight = iw_table[next];
+
+	policy->wil.cur_weight--;
+	if (!policy->wil.cur_weight)
+		me->il_prev = next;
+	return next;
+}
+
 /* Do dynamic interleaving for a process */
 static unsigned int interleave_nodes(struct mempolicy *policy)
 {
@@ -1861,6 +1894,9 @@ unsigned int mempolicy_slab_node(void)
 	case MPOL_INTERLEAVE:
 		return interleave_nodes(policy);
 
+	case MPOL_WEIGHTED_INTERLEAVE:
+		return weighted_interleave_nodes(policy);
+
 	case MPOL_BIND:
 	case MPOL_PREFERRED_MANY:
 	{
@@ -1885,6 +1921,41 @@ unsigned int mempolicy_slab_node(void)
 	}
 }
 
+static unsigned int weighted_interleave_nid(struct mempolicy *pol, pgoff_t ilx)
+{
+	nodemask_t nodemask = pol->nodes;
+	unsigned int target, weight_total = 0;
+	int nid;
+	unsigned char weights[MAX_NUMNODES];
+	unsigned char weight;
+
+	barrier();
+
+	/* first ensure we have a valid nodemask */
+	nid = first_node(nodemask);
+	if (nid == MAX_NUMNODES)
+		return nid;
+
+	/* Then collect weights on stack and calculate totals */
+	for_each_node_mask(nid, nodemask) {
+		weight = iw_table[nid];
+		weight_total += weight;
+		weights[nid] = weight;
+	}
+
+	/* Finally, calculate the node offset based on totals */
+	target = (unsigned int)ilx % weight_total;
+	nid = first_node(nodemask);
+	while (target) {
+		weight = weights[nid];
+		if (target < weight)
+			break;
+		target -= weight;
+		nid = next_node_in(nid, nodemask);
+	}
+	return nid;
+}
+
 /*
  * Do static interleaving for interleave index @ilx.  Returns the ilx'th
  * node in pol->nodes (starting from ilx=0), wrapping around if ilx
@@ -1953,6 +2024,11 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
 		*nid = (ilx == NO_INTERLEAVE_INDEX) ?
 			interleave_nodes(pol) : interleave_nid(pol, ilx);
 		break;
+	case MPOL_WEIGHTED_INTERLEAVE:
+		*nid = (ilx == NO_INTERLEAVE_INDEX) ?
+			weighted_interleave_nodes(pol) :
+			weighted_interleave_nid(pol, ilx);
+		break;
 	}
 
 	return nodemask;
@@ -2014,6 +2090,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
+	case MPOL_WEIGHTED_INTERLEAVE:
 		*mask = mempolicy->nodes;
 		break;
 
@@ -2113,7 +2190,8 @@ struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 		 * If the policy is interleave or does not allow the current
 		 * node in its nodemask, we allocate the standard way.
 		 */
-		if (pol->mode != MPOL_INTERLEAVE &&
+		if ((pol->mode != MPOL_INTERLEAVE &&
+		    pol->mode != MPOL_WEIGHTED_INTERLEAVE) &&
 		    (!nodemask || node_isset(nid, *nodemask))) {
 			/*
 			 * First, try to allocate THP only on local node, but
@@ -2249,6 +2327,106 @@ static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
 	return total_allocated;
 }
 
+static unsigned long alloc_pages_bulk_array_weighted_interleave(gfp_t gfp,
+		struct mempolicy *pol, unsigned long nr_pages,
+		struct page **page_array)
+{
+	struct task_struct *me = current;
+	unsigned long total_allocated = 0;
+	unsigned long nr_allocated;
+	unsigned long rounds;
+	unsigned long node_pages, delta;
+	unsigned char weight;
+	unsigned char weights[MAX_NUMNODES];
+	unsigned int weight_total = 0;
+	unsigned long rem_pages = nr_pages;
+	nodemask_t nodes = pol->nodes;
+	int nnodes, node, prev_node = me->il_prev;
+	int i;
+
+	/* Stabilize the nodemask on the stack */
+	barrier();
+
+	nnodes = nodes_weight(nodes);
+
+	/* Collect weights and save them on stack so they don't change */
+	for_each_node_mask(node, nodes) {
+		weight = iw_table[node];
+		weight_total += weight;
+		weights[node] = weight;
+	}
+
+	/* Continue allocating from most recent node and adjust the nr_pages */
+	if (pol->wil.cur_weight) {
+		node = next_node_in(me->il_prev, nodes);
+		node_pages = pol->wil.cur_weight;
+		if (node_pages > rem_pages)
+			node_pages = rem_pages;
+		nr_allocated = __alloc_pages_bulk(gfp, node, NULL, node_pages,
+						  NULL, page_array);
+		page_array += nr_allocated;
+		total_allocated += nr_allocated;
+		/* if that's all the pages, no need to interleave */
+		if (rem_pages <= pol->wil.cur_weight) {
+			pol->wil.cur_weight -= rem_pages;
+			return total_allocated;
+		}
+		/* Otherwise we adjust nr_pages down, and continue from there */
+		rem_pages -= pol->wil.cur_weight;
+		pol->wil.cur_weight = 0;
+		prev_node = node;
+	}
+
+	/* Now we can continue allocating as if from 0 instead of an offset */
+	rounds = rem_pages / weight_total;
+	delta = rem_pages % weight_total;
+	for (i = 0; i < nnodes; i++) {
+		node = next_node_in(prev_node, nodes);
+		weight = weights[node];
+		node_pages = weight * rounds;
+		if (delta) {
+			if (delta > weight) {
+				node_pages += weight;
+				delta -= weight;
+			} else {
+				node_pages += delta;
+				delta = 0;
+			}
+		}
+		/* We may not make it all the way around */
+		if (!node_pages)
+			break;
+		/* If an over-allocation would occur, floor it */
+		if (node_pages + total_allocated > nr_pages) {
+			node_pages = nr_pages - total_allocated;
+			delta = 0;
+		}
+		nr_allocated = __alloc_pages_bulk(gfp, node, NULL, node_pages,
+						  NULL, page_array);
+		page_array += nr_allocated;
+		total_allocated += nr_allocated;
+		prev_node = node;
+	}
+
+	/*
+	 * Finally, we need to update me->il_prev and pol->wil.cur_weight
+	 * if there were overflow pages, but not equivalent to the node
+	 * weight, set the cur_weight to node_weight - delta and the
+	 * me->il_prev to the previous node. Otherwise if it was perfect
+	 * we can simply set il_prev to node and cur_weight to 0
+	 */
+	if (node_pages) {
+		me->il_prev = prev_node;
+		node_pages %= weight;
+		pol->wil.cur_weight = weight - node_pages;
+	} else {
+		me->il_prev = node;
+		pol->wil.cur_weight = 0;
+	}
+
+	return total_allocated;
+}
+
 static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
 		struct mempolicy *pol, unsigned long nr_pages,
 		struct page **page_array)
@@ -2289,6 +2467,11 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
 		return alloc_pages_bulk_array_interleave(gfp, pol,
 							 nr_pages, page_array);
 
+	if (pol->mode == MPOL_WEIGHTED_INTERLEAVE)
+		return alloc_pages_bulk_array_weighted_interleave(gfp, pol,
+								  nr_pages,
+								  page_array);
+
 	if (pol->mode == MPOL_PREFERRED_MANY)
 		return alloc_pages_bulk_array_preferred_many(gfp,
 				numa_node_id(), pol, nr_pages, page_array);
@@ -2364,6 +2547,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	case MPOL_INTERLEAVE:
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_WEIGHTED_INTERLEAVE:
 		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_LOCAL:
 		return true;
@@ -2500,6 +2684,10 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 		polnid = interleave_nid(pol, ilx);
 		break;
 
+	case MPOL_WEIGHTED_INTERLEAVE:
+		polnid = weighted_interleave_nid(pol, ilx);
+		break;
+
 	case MPOL_PREFERRED:
 		if (node_isset(curnid, pol->nodes))
 			goto out;
@@ -2874,6 +3062,7 @@ static const char * const policy_modes[] =
 	[MPOL_PREFERRED]  = "prefer",
 	[MPOL_BIND]       = "bind",
 	[MPOL_INTERLEAVE] = "interleave",
+	[MPOL_WEIGHTED_INTERLEAVE] = "weighted interleave",
 	[MPOL_LOCAL]      = "local",
 	[MPOL_PREFERRED_MANY]  = "prefer (many)",
 };
@@ -2933,6 +3122,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 		}
 		break;
 	case MPOL_INTERLEAVE:
+	case MPOL_WEIGHTED_INTERLEAVE:
 		/*
 		 * Default to online nodes with memory if no nodelist
 		 */
@@ -3043,6 +3233,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 	case MPOL_PREFERRED_MANY:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
+	case MPOL_WEIGHTED_INTERLEAVE:
 		nodes = pol->nodes;
 		break;
 	default:

From patchwork Sat Dec 9 06:59:23 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485961
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com
Subject: [PATCH v2 03/11] mm/mempolicy: refactor sanitize_mpol_flags for reuse
Date: Sat, 9 Dec 2023 01:59:23 -0500
Message-Id: <20231209065931.3458-4-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

Split sanitize_mpol_flags() into sanitize and validate steps.

Sanitize is used by mbind() and set_mempolicy() to split the combined
(int mode) argument into mode and mode_flags, and then validates them.
Validate checks mode and mode_flags that have already been split.
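The split can be exercised outside the kernel with a userspace transcription of the two helpers. This is a sketch, not kernel code. The flag values and mode enum are copied by hand from include/uapi/linux/mempolicy.h as of this series, and that copy is an assumption: verify it against your headers.

The sketch differs from the patch in one respect. Here sanitize_mpol_flags() range-checks the full int before narrowing it to unsigned short. A value with stray high bits, such as 0x10000 | MPOL_BIND, is therefore rejected rather than truncated to a valid mode.

```c
#include <assert.h>
#include <errno.h>

/*
 * Values copied by hand from include/uapi/linux/mempolicy.h as of this
 * series (an assumption; check your kernel headers). MPOL_F_MOF and
 * MPOL_F_MORON are kernel-internal flags.
 */
enum {
	MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_INTERLEAVE, MPOL_LOCAL,
	MPOL_PREFERRED_MANY, MPOL_WEIGHTED_INTERLEAVE, MPOL_MAX,
};
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_F_NUMA_BALANCING	(1 << 13)
#define MPOL_MODE_FLAGS	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | \
			 MPOL_F_NUMA_BALANCING)
#define MPOL_F_MOF	(1 << 3)
#define MPOL_F_MORON	(1 << 4)

/* Validate an already split mode/flags pair (what new syscalls would use). */
static int validate_mpol_flags(unsigned short mode, unsigned short *flags)
{
	if (mode >= MPOL_MAX)
		return -EINVAL;
	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
		return -EINVAL;
	if (*flags & MPOL_F_NUMA_BALANCING) {
		if (mode != MPOL_BIND)
			return -EINVAL;
		*flags |= (MPOL_F_MOF | MPOL_F_MORON);	/* add internal flags */
	}
	return 0;
}

/* Split the combined (mode | flags) int used by set_mempolicy/mbind. */
static int sanitize_mpol_flags(int *mode_arg, unsigned short *flags)
{
	int mode = *mode_arg & ~MPOL_MODE_FLAGS;

	*flags = *mode_arg & MPOL_MODE_FLAGS;
	/* Range-check the full int before narrowing to unsigned short. */
	if ((unsigned int)mode >= MPOL_MAX)
		return -EINVAL;
	*mode_arg = mode;
	return validate_mpol_flags((unsigned short)mode, flags);
}
```

Example results:

- MPOL_BIND | MPOL_F_NUMA_BALANCING splits into mode MPOL_BIND plus the balancing flag, and validation adds MPOL_F_MOF | MPOL_F_MORON.
- MPOL_WEIGHTED_INTERLEAVE | MPOL_F_NUMA_BALANCING is rejected with -EINVAL.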
Validate will be reused for new syscalls that accept already split mode and mode_flags. Signed-off-by: Gregory Price --- mm/mempolicy.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index b4d94646e6a2..65d023720e83 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1463,24 +1463,39 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode, return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0; } -/* Basic parameter sanity check used by both mbind() and set_mempolicy() */ -static inline int sanitize_mpol_flags(int *mode, unsigned short *flags) +/* + * Basic parameter sanity check used by mbind/set_mempolicy + * May modify flags to include internal flags (e.g. MPOL_F_MOF/F_MORON) + */ +static inline int validate_mpol_flags(unsigned short mode, unsigned short *flags) { - *flags = *mode & MPOL_MODE_FLAGS; - *mode &= ~MPOL_MODE_FLAGS; - - if ((unsigned int)(*mode) >= MPOL_MAX) + if ((unsigned int)(mode) >= MPOL_MAX) return -EINVAL; if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES)) return -EINVAL; if (*flags & MPOL_F_NUMA_BALANCING) { - if (*mode != MPOL_BIND) + if (mode != MPOL_BIND) return -EINVAL; *flags |= (MPOL_F_MOF | MPOL_F_MORON); } return 0; } +/* + * Used by mbind/set_memplicy to split and validate mode/flags + * set_mempolicy combines (mode | flags), split them out into separate + * fields and return just the mode in mode_arg and flags in flags. 
+ */
+static inline int sanitize_mpol_flags(int *mode_arg, unsigned short *flags)
+{
+	unsigned short mode = (*mode_arg & ~MPOL_MODE_FLAGS);
+
+	*flags = *mode_arg & MPOL_MODE_FLAGS;
+	*mode_arg = mode;
+
+	return validate_mpol_flags(mode, flags);
+}
+
 static long kernel_mbind(unsigned long start, unsigned long len,
 			 unsigned long mode, const unsigned long __user *nmask,
 			 unsigned long maxnode, unsigned int flags)

From patchwork Sat Dec 9 06:59:26 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485962
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com
Subject: [PATCH v2 06/11] mm/mempolicy: allow home_node to be set by mpol_new
Date: Sat, 9 Dec 2023 01:59:26 -0500
Message-Id: <20231209065931.3458-7-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

This patch adds the plumbing into mpol_new() to allow the home_node
field of the argument structure to set the mempolicy home node.

The syscall sys_set_mempolicy_home_node was added to allow a home node
to be registered for a vma.
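Because the patched mpol_new() copies args->home_node unconditionally, every caller has to set that field, which is why the diff adds `home_node = NUMA_NO_NODE` even right after `memset(&args, 0, sizeof(args))`: a zeroed field would mean node 0, which is a valid node. The pattern can be modelled in userspace as below; `struct policy_args`, `struct policy`, `policy_args_init()` and `policy_new()` are hypothetical, simplified stand-ins, not kernel types or functions. `NUMA_NO_NODE` uses the kernel's value of -1.

```c
#include <string.h>

#define NUMA_NO_NODE (-1) /* the kernel's "no node" value */

/* Hypothetical, simplified stand-ins for mempolicy_args / mempolicy. */
struct policy_args {
	unsigned short mode;
	unsigned short mode_flags;
	int home_node;
};

struct policy {
	unsigned short mode;
	unsigned short flags;
	int home_node;
};

/* Zero the arguments, then restore the one field whose "unset" value is not 0. */
static void policy_args_init(struct policy_args *args)
{
	memset(args, 0, sizeof(*args));
	args->home_node = NUMA_NO_NODE;
}

/* Like the patched mpol_new(): home_node is taken from the arguments as-is. */
static void policy_new(struct policy *pol, const struct policy_args *args)
{
	pol->mode = args->mode;
	pol->flags = args->mode_flags;
	pol->home_node = args->home_node;
}
```

A single initializer such as policy_args_init() is one way to avoid repeating the assignment at every call site; the patch instead sets the field explicitly in each caller.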

For the set_mempolicy2 and mbind2 syscalls, it would be useful to add
this as an extension, allowing the user to submit a fully formed
mempolicy configuration in a single call rather than requiring multiple
calls to configure a mempolicy.

This will become particularly useful if/when pidfd interfaces for
changing a process's mempolicy from outside the task appear, because
each call that changes the mempolicy atomically swaps that policy into
the task rather than mutating the existing policy.

Signed-off-by: Gregory Price
---
 mm/mempolicy.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ce5b7963e9b5..446167dcebdc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -308,6 +308,7 @@ static struct mempolicy *mpol_new(struct mempolicy_args *args)
 	policy->flags = flags;
 	policy->home_node = NUMA_NO_NODE;
 	policy->wil.cur_weight = 0;
+	policy->home_node = args->home_node;
 
 	return policy;
 }
@@ -1621,6 +1622,7 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
 	args.mode = lmode;
 	args.mode_flags = mode_flags;
 	args.policy_nodes = &nodes;
+	args.home_node = NUMA_NO_NODE;
 
 	return do_set_mempolicy(&args);
 }
@@ -2980,6 +2982,8 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 		margs.mode = mpol->mode;
 		margs.mode_flags = mpol->flags;
 		margs.policy_nodes = &mpol->w.user_nodemask;
+		margs.home_node = NUMA_NO_NODE;
+
 		/* contextualize the tmpfs mount point mempolicy to this file */
 		npol = mpol_new(&margs);
 		if (IS_ERR(npol))
@@ -3138,6 +3142,7 @@ void __init numa_policy_init(void)
 	memset(&args, 0, sizeof(args));
 	args.mode = MPOL_INTERLEAVE;
 	args.policy_nodes = &interleave_nodes;
+	args.home_node = NUMA_NO_NODE;
 
 	if (do_set_mempolicy(&args))
 		pr_err("%s: interleaving failed\n", __func__);
@@ -3152,6 +3157,7 @@ void numa_default_policy(void)
 
 	memset(&args, 0, sizeof(args));
 	args.mode = MPOL_DEFAULT;
+	args.home_node = NUMA_NO_NODE;
 
 	do_set_mempolicy(&args);
 }
@@ -3274,6 +3280,8 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	margs.mode = mode;
 	margs.mode_flags = mode_flags;
 	margs.policy_nodes = &nodes;
+	margs.home_node = NUMA_NO_NODE;
+
 	new = mpol_new(&margs);
 	if (IS_ERR(new))
 		goto out;

From patchwork Sat Dec 9 06:59:27 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485963
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com, Frank van der Linden
Subject: [PATCH v2 07/11] mm/mempolicy: add userland mempolicy arg structure
Date: Sat, 9 Dec 2023 01:59:27 -0500
Message-Id: <20231209065931.3458-8-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

This patch adds the new user-API argument structure intended for
set_mempolicy2 and mbind2.

  struct mpol_args {
          __u16 mode;
          __u16 mode_flags;
          __s32 home_node;          /* mbind2: policy home node */
          __aligned_u64 pol_nodes;  /* nodemask pointer */
          __u64 pol_maxnodes;
          __u64 addr;               /* get_mempolicy2: policy address */
          __s32 policy_node;        /* get_mempolicy2: policy node info */
          __s32 addr_node;          /* get_mempolicy2: memory range policy */
  };

This structure is intended to be extensible as new mempolicy extensions
are added.

For example, set_mempolicy_home_node was added to allow vma mempolicies
to have a preferred/home node assigned. This structure allows that
setting to be applied at the time the mempolicy is set, rather than
requiring additional calls to modify the policy.

Full breakdown of arguments as of this patch:
  mode:         Mempolicy mode (e.g. MPOL_DEFAULT, MPOL_INTERLEAVE)

  mode_flags:   Flags previously OR'd into mode in set_mempolicy
                (e.g. MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES)

  home_node:    for mbind2. Allows setting a policy's home node with
                the use of MPOL_MF_HOME_NODE

  pol_nodes:    Policy nodemask (a user pointer carried in a u64)

  pol_maxnodes: Max number of nodes in the policy nodemask

  policy_node:  for get_mempolicy2. Returns extended information about
                a policy that was previously reported by passing
                MPOL_F_NODE to get_mempolicy. Instead of overriding the
                mode value, this is returned in its own field.

  addr:         for get_mempolicy2. Used with MPOL_F_ADDR to run
                get_mempolicy against the vma the address belongs to
                instead of the task.

  addr_node:    for get_mempolicy2. Returns the node the address
                belongs to. Previously, get_mempolicy() would override
                the output value of (mode) if MPOL_F_ADDR and
                MPOL_F_NODE were set. Instead, mpol_args reports this
                by default whenever MPOL_F_ADDR is set, and MPOL_F_NODE
                is no longer needed.
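In the uapi header, pol_nodes is declared as `__aligned_u64` rather than as a C pointer, so userspace stores the nodemask address in a 64-bit integer. The sketch below is a userspace mirror of the proposed structure (not an installed header): it uses `<stdint.h>` types with `_Alignas(8)` standing in for `__aligned_u64`, checks at compile time that the layout has no implicit padding (40 bytes in total), and shows the pointer-to-u64 conversion. `mpol_args_set_nodes()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the proposed uapi struct mpol_args. */
struct mpol_args {
	uint16_t mode;
	uint16_t mode_flags;
	int32_t home_node;              /* mbind2: policy home node */
	_Alignas(8) uint64_t pol_nodes; /* nodemask pointer, stored as u64 */
	uint64_t pol_maxnodes;
	uint64_t addr;                  /* get_mempolicy2: policy address */
	int32_t policy_node;            /* get_mempolicy2: policy node info */
	int32_t addr_node;              /* get_mempolicy2: memory range policy */
};

/* Every field lands on its natural boundary, so no padding is inserted. */
_Static_assert(offsetof(struct mpol_args, home_node) == 4, "home_node offset");
_Static_assert(offsetof(struct mpol_args, pol_nodes) == 8, "pol_nodes offset");
_Static_assert(offsetof(struct mpol_args, addr) == 24, "addr offset");
_Static_assert(offsetof(struct mpol_args, addr_node) == 36, "addr_node offset");
_Static_assert(sizeof(struct mpol_args) == 40, "total size");

/* Hypothetical helper: store a nodemask pointer in the u64 field. */
static void mpol_args_set_nodes(struct mpol_args *args,
				const unsigned long *mask, uint64_t maxnodes)
{
	args->pol_nodes = (uint64_t)(uintptr_t)mask;
	args->pol_maxnodes = maxnodes;
}
```

Carrying pointers as 64-bit integers keeps the structure identical for 32-bit and 64-bit callers, which is a common convention for extensible kernel argument structures.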

Suggested-by: Frank van der Linden
Suggested-by: Vinicius Tavares Petrucci
Suggested-by: Hasan Al Maruf
Signed-off-by: Gregory Price
Co-developed-by: Vinicius Tavares Petrucci
Signed-off-by: Vinicius Tavares Petrucci
---
 .../admin-guide/mm/numa_memory_policy.rst | 20 +++++++++++++++++++
 include/uapi/linux/mempolicy.h            | 12 +++++++++++
 2 files changed, 32 insertions(+)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index d2c8e712785b..64c5804dc40f 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -482,6 +482,26 @@ closest to which page allocation will come from. Specifying the home node overri
 the default allocation policy to allocate memory close to the local node for an
 executing CPU.
 
+Extended Mempolicy Arguments::
+
+	struct mpol_args {
+		__u16 mode;
+		__u16 mode_flags;
+		__s32 home_node; /* mbind2: policy home node */
+		__aligned_u64 pol_nodes; /* nodemask pointer */
+		__u64 pol_maxnodes;
+		__u64 addr; /* get_mempolicy2: policy address */
+		__s32 policy_node; /* get_mempolicy2: policy node information */
+		__s32 addr_node; /* get_mempolicy2: memory range policy */
+	};
+
+The extended mempolicy argument structure is defined to allow the mempolicy
+interfaces future extensibility without the need for additional system calls.
+
+The core arguments (mode, mode_flags, pol_nodes, and pol_maxnodes) apply to
+all interfaces relative to their non-extended counterparts. Each additional
+field may only apply to specific extended interfaces. See the respective
+extended interface man page for more details.
 
 Memory Policy Command Line Interface
 ====================================
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 1f9bb10d1a47..00a673e30047 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -27,6 +27,18 @@ enum {
 	MPOL_MAX,	/* always last member of enum */
 };
 
+struct mpol_args {
+	/* Basic mempolicy settings */
+	__u16 mode;
+	__u16 mode_flags;
+	__s32 home_node;	/* mbind2: policy home node */
+	__aligned_u64 pol_nodes;
+	__u64 pol_maxnodes;
+	__u64 addr;
+	__s32 policy_node;	/* get_mempolicy2: policy node info */
+	__s32 addr_node;	/* get_mempolicy2: memory range policy */
+};
+
 /* Flags for set_mempolicy */
 #define MPOL_F_STATIC_NODES	(1 << 15)
 #define MPOL_F_RELATIVE_NODES	(1 << 14)

From patchwork Sat Dec 9 06:59:30 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485964
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com, Michal Hocko
Subject: [PATCH v2 09/11] mm/mempolicy: add get_mempolicy2 syscall
Date: Sat, 9 Dec 2023 01:59:30 -0500
Message-Id: <20231209065931.3458-10-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

get_mempolicy2 is an extensible get_mempolicy interface which allows a
user to retrieve the memory policy for a task or an address.

Defined as:

  get_mempolicy2(struct mpol_args *args, size_t size, unsigned long flags)

Input values include the following fields of mpol_args:
  pol_nodes:    if set, the nodemask of the policy is returned here
  pol_maxnodes: if pol_nodes is set, must describe the max number of
                nodes to be copied to pol_nodes
  addr:         if MPOL_F_ADDR is passed in `flags`, this address is
                used to return the mempolicy details of the vma the
                address belongs to

The `flags` argument:
  MPOL_F_MEMS_ALLOWED: returns mems_allowed in pol_nodes
  MPOL_F_ADDR:         returns mempolicy info for the vma containing addr
  otherwise:           returns per-task mempolicy information

Output values include the following fields of mpol_args:
  mode:         mempolicy mode
  mode_flags:   mempolicy mode flags
  pol_nodes:    if set, the nodemask for the mempolicy
  policy_node:  if the policy has extended node information, it is
                placed here. For example, MPOL_INTERLEAVE returns the
                next node that will be used for allocation
  addr_node:    if MPOL_F_ADDR is set, the NUMA node that the address
                is located on
  home_node:    the policy home node, or -1 if none is set

MPOL_F_NODE has been dropped from get_mempolicy2 (it is ignored) in
favor of returning explicit values in `policy_node` and `addr_node`.
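A caller querying the policy of the vma containing an address would fill the input fields described above and pass MPOL_F_ADDR in `flags`. The sketch below assumes a kernel carrying this series. `NR_GET_MEMPOLICY2_PROPOSED` (458, taken from the generic syscall tables in this patch) is a placeholder, not a stable ABI number, and must not be assumed on any released kernel. `MPOL_F_ADDR` is assumed to keep its existing get_mempolicy() value, and `prepare_addr_query()` and `get_mempolicy2_proposed()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Placeholder number from this patch's generic tables (hypothetical). */
#define NR_GET_MEMPOLICY2_PROPOSED 458

/* Assumed to match the existing get_mempolicy() flag value. */
#define MPOL_F_ADDR (1 << 1)

/* Userspace mirror of the proposed struct mpol_args. */
struct mpol_args {
	uint16_t mode;
	uint16_t mode_flags;
	int32_t home_node;
	_Alignas(8) uint64_t pol_nodes;
	uint64_t pol_maxnodes;
	uint64_t addr;
	int32_t policy_node;
	int32_t addr_node;
};

/* Fill the input fields for an address query; returns the flags to pass. */
static unsigned long prepare_addr_query(struct mpol_args *args, void *addr,
					unsigned long *mask, uint64_t maxnodes)
{
	memset(args, 0, sizeof(*args));
	args->pol_nodes = (uint64_t)(uintptr_t)mask;
	args->pol_maxnodes = maxnodes;
	args->addr = (uint64_t)(uintptr_t)addr;
	return MPOL_F_ADDR;
}

/* Thin wrapper; only meaningful on a kernel that carries this series. */
long get_mempolicy2_proposed(struct mpol_args *args, unsigned long flags)
{
	return syscall(NR_GET_MEMPOLICY2_PROPOSED, args, sizeof(*args), flags);
}
```

After a successful call, the caller would read mode, mode_flags, policy_node, addr_node and home_node back out of `args`, per the output list above. Passing `sizeof(*args)` lets a future kernel tell which version of the structure the caller was built against.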

Suggested-by: Michal Hocko
Signed-off-by: Gregory Price
---
 .../admin-guide/mm/numa_memory_policy.rst   |  8 +++-
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 include/linux/syscalls.h                    |  2 +
 include/uapi/asm-generic/unistd.h           |  4 +-
 mm/mempolicy.c                              | 47 +++++++++++++++++++
 18 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index aabc24db92d3..a52624ab659a 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -456,11 +456,17 @@ Get [Task] Memory Policy or Related Information::
 	long get_mempolicy(int *mode,
 			   const unsigned long *nmask, unsigned long maxnode,
 			   void *addr, int flags);
+	long get_mempolicy2(struct mpol_args *args, size_t size,
+			    unsigned long flags);
 
 Queries the "task/process memory policy" of the calling task, or the
 policy or location of a specified virtual address, depending on the
 'flags' argument.
 
+get_mempolicy2() is an extended version of get_mempolicy() capable of
+acquiring extended information about a mempolicy, including details
+that can only be set via set_mempolicy2() or mbind2().
+
 See the get_mempolicy(2) man page for more details
 
 
@@ -506,7 +512,7 @@ Extended Mempolicy Arguments::
 The extended mempolicy argument structure is defined to allow the mempolicy
 interfaces future extensibility without the need for additional system calls.
 
-Extended interfaces (set_mempolicy2) use this argument structure.
+Extended interfaces (set_mempolicy2 and get_mempolicy2) use this structure.
 
 The core arguments (mode, mode_flags, pol_nodes, and pol_maxnodes) apply to
 all interfaces relative to their non-extended counterparts. Each additional
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 0dc288a1118a..0301a8b0a262 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -497,3 +497,4 @@
 565	common	futex_wait	sys_futex_wait
 566	common	futex_requeue	sys_futex_requeue
 567	common	set_mempolicy2	sys_set_mempolicy2
+568	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 50172ec0e1f5..771a33446e8e 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -471,3 +471,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 839d90c535f2..048a409e684c 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 567c8b883735..327b01bd6793 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -463,3 +463,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index cc0640e16f2f..921d58e1da23 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -396,3 +396,4 @@
 455	n32	futex_wait	sys_futex_wait
 456	n32	futex_requeue	sys_futex_requeue
 457	n32	set_mempolicy2	sys_set_mempolicy2
+458	n32	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index f7262fde98d9..9271c83c9993 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -445,3 +445,4 @@
 455	o32	futex_wait	sys_futex_wait
 456	o32	futex_requeue	sys_futex_requeue
 457	o32	set_mempolicy2	sys_set_mempolicy2
+458	o32	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index e10f0e8bd064..0654f3f89fc7 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 4f03f5f42b78..ac11d2064e7a 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -544,3 +544,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index f98dadc2e9df..1cdcafe1ccca 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -460,3 +460,4 @@
 455	common	futex_wait	sys_futex_wait		sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index f47ba9f2d05d..f71742024c29 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -460,3 +460,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 53fb16616728..2fbf5dbe0620 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -503,3 +503,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4b4dc41b24ee..0af813b9a118 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -462,3 +462,4 @@
 455	i386	futex_wait	sys_futex_wait
 456	i386	futex_requeue	sys_futex_requeue
 457	i386	set_mempolicy2	sys_set_mempolicy2
+458	i386	get_mempolicy2	sys_get_mempolicy2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 1bc2190bec27..0b777876fc15 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -379,6 +379,7 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index e26dc89399eb..4536c9a4227d 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -428,3 +428,4 @@
 455	common	futex_wait	sys_futex_wait
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
+458	common	get_mempolicy2	sys_get_mempolicy2
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3244cd990858..774512b7934e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -820,6 +820,8 @@ asmlinkage long sys_get_mempolicy(int __user *policy,
 				unsigned long __user *nmask,
 				unsigned long maxnode,
 				unsigned long addr, unsigned long flags);
+asmlinkage long sys_get_mempolicy2(struct mpol_args __user *args, size_t size,
+				   unsigned long flags);
 asmlinkage long sys_set_mempolicy(int mode, const unsigned long __user *nmask,
 				unsigned long maxnode);
 asmlinkage long sys_set_mempolicy2(struct mpol_args *args, size_t size,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 55486aba099f..719accc731db 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -830,9 +830,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait)
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 #define __NR_set_mempolicy2 457
 __SYSCALL(__NR_set_mempolicy2, sys_set_mempolicy2)
+#define __NR_get_mempolicy2 458
+__SYSCALL(__NR_get_mempolicy2, sys_get_mempolicy2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 458
+#define __NR_syscalls 459
 
 /*
  * 32 bit systems traditionally used different
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a56ff02f780e..cfe22156ef13 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1859,6 +1859,53 @@ SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
 	return kernel_get_mempolicy(policy, nmask, maxnode, addr, flags);
 }
 
+SYSCALL_DEFINE3(get_mempolicy2, struct mpol_args __user *, uargs, size_t, usize,
+		unsigned long, flags)
+{
+	struct mpol_args kargs;
+	struct mempolicy_args margs;
+	int err;
+	nodemask_t policy_nodemask;
+	unsigned long __user *nodes_ptr;
+
+	err = copy_struct_from_user(&kargs, sizeof(kargs), uargs, usize);
+	if (err)
+		return -EINVAL;
+
+	if (flags & MPOL_F_MEMS_ALLOWED) {
+		if (!kargs.pol_nodes)
+			return -EINVAL;
+		err = do_get_mems_allowed(&policy_nodemask);
+		if (err)
+			return err;
+		nodes_ptr = u64_to_user_ptr(kargs.pol_nodes);
+		return copy_nodes_to_user(nodes_ptr, kargs.pol_maxnodes,
+					  &policy_nodemask);
+	}
+
+	margs.policy_nodes = kargs.pol_nodes ? &policy_nodemask : NULL;
+	if (flags & MPOL_F_ADDR) {
+		margs.addr = kargs.addr;
+		err = do_get_vma_mempolicy(&margs);
+	} else
+		err = do_get_task_mempolicy(&margs);
+
+	if (err)
+		return err;
+
+	kargs.mode = margs.mode;
+	kargs.mode_flags = margs.mode_flags;
+	kargs.policy_node = margs.policy_node;
+	kargs.addr_node = (flags & MPOL_F_ADDR) ? margs.addr_node : -1;
+	if (kargs.pol_nodes) {
+		nodes_ptr = u64_to_user_ptr(kargs.pol_nodes);
+		err = copy_nodes_to_user(nodes_ptr, kargs.pol_maxnodes,
+					 margs.policy_nodes);
+	}
+
+	return err ?: (copy_to_user(uargs, &kargs, min(usize, sizeof(kargs))) ? -EFAULT : 0);
+}
+
 bool vma_migratable(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & (VM_IO | VM_PFNMAP))

From patchwork Sat Dec 9 06:59:31 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13485965
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
  linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
  linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
  tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
  dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
  mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
  gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
  hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
  peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
  sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
  seungjun.ha@samsung.com, Michal Hocko, Frank van der Linden
Subject: [PATCH v2 10/11] mm/mempolicy: add the mbind2 syscall
Date: Sat, 9 Dec 2023 01:59:31 -0500
Message-Id: <20231209065931.3458-11-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

mbind2 is an extensible mbind interface which allows a user to set the
mempolicy for one or more address ranges.

Defined as:

  mbind2(struct iovec *vec, size_t vlen, struct mpol_args *args,
         size_t size, unsigned long flags)

The address ranges to operate on are passed as syscall arguments:

  vec:  the vector of (address, len) memory ranges to operate on
  vlen: the number of entries in vec

Input values include the following fields of mpol_args:

  mode:         the MPOL_* policy (DEFAULT, INTERLEAVE, etc.)
  mode_flags:   the MPOL_F_* flags that were previously OR'd into the
                mode. These are split out to give future extensions
                additional mode/flag space.
  pol_nodes:    the nodemask to apply for the memory policy
  pol_maxnodes: the max number of nodes described by pol_nodes
  home_node:    if MPOL_MF_HOME_NODE is set in `flags`, the home node
                of the policy is set to this value

The semantics are otherwise the same as mbind(), except that the
home_node can be set, and all address ranges defined by vec/vlen are
operated on.

Valid flags for mbind2 include the same flags as mbind, plus
MPOL_MF_HOME_NODE, which tells the syscall to use the value of
mpol_args->home_node to set the mempolicy home node.

Suggested-by: Michal Hocko
Suggested-by: Frank van der Linden
Suggested-by: Vinicius Tavares Petrucci
Suggested-by: Rakie Kim
Suggested-by: Hyeongtak Ji
Suggested-by: Honggyu Kim
Signed-off-by: Gregory Price
Co-developed-by: Vinicius Tavares Petrucci
---
 .../admin-guide/mm/numa_memory_policy.rst   | 12 +++-
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 include/linux/syscalls.h                    |  3 +
 include/uapi/asm-generic/unistd.h           |  4 +-
 include/uapi/linux/mempolicy.h              |  5 +-
 mm/mempolicy.c                              | 68 +++++++++++++++++++
 19 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index a52624ab659a..f1ba33de3a6e 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -475,12 +475,18 @@ Install VMA/Shared Policy for a Range of Task's Address Space::
 	long mbind(void *start, unsigned long len, int mode,
 		   const unsigned long *nmask, unsigned long maxnode,
 		   unsigned flags);
+	long mbind2(struct iovec *vec, size_t vlen, struct mpol_args *args,
+		    size_t size, unsigned long flags);
 
 mbind() installs the policy specified by (mode, nmask, maxnodes) as a
 VMA policy for the range of the calling task's address space specified
 by the 'start' and 'len' arguments.  Additional actions may be
 requested via the 'flags' argument.
 
+mbind2() is an extended version of mbind() capable of operating on multiple
+memory ranges in one syscall, and which is capable of setting the home node
+for the memory policy without an additional call to set_mempolicy_home_node().
+
 See the mbind(2) man page for more details.
 
 Set home node for a Range of Task's Address Spacec::
@@ -496,6 +502,9 @@ closest to which page allocation will come from. Specifying the home node overri
 the default allocation policy to allocate memory close to the local node for an
 executing CPU.
 
+mbind2() also provides a way for the home node to be set at the time the
+mempolicy is set. See the mbind(2) man page for more details.
+
 Extended Mempolicy Arguments::
 
 	struct mpol_args {
@@ -512,7 +521,8 @@ Extended Mempolicy Arguments::
 The extended mempolicy argument structure is defined to allow the mempolicy
 interfaces future extensibility without the need for additional system calls.
 
-Extended interfaces (set_mempolicy2 and get_mempolicy2) use this structure.
+Extended interfaces (set_mempolicy2, get_mempolicy2, and mbind2) use this
+argument structure.
 
 The core arguments (mode, mode_flags, pol_nodes, and pol_maxnodes) apply to
 all interfaces relative to their non-extended counterparts. Each additional
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 0301a8b0a262..e8239293c35a 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -498,3 +498,4 @@
 566	common	futex_requeue	sys_futex_requeue
 567	common	set_mempolicy2	sys_set_mempolicy2
 568	common	get_mempolicy2	sys_get_mempolicy2
+569	common	mbind2		sys_mbind2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 771a33446e8e..a3f39750257a 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -472,3 +472,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 048a409e684c..9a12dface18e 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -458,3 +458,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 327b01bd6793..6cb740123137 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -464,3 +464,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 921d58e1da23..52cf720f8ae2 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -397,3 +397,4 @@
 456	n32	futex_requeue	sys_futex_requeue
 457	n32	set_mempolicy2	sys_set_mempolicy2
 458	n32	get_mempolicy2	sys_get_mempolicy2
+459	n32	mbind2		sys_mbind2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 9271c83c9993..fd37c5301a48 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -446,3 +446,4 @@
 456	o32	futex_requeue	sys_futex_requeue
 457	o32	set_mempolicy2	sys_set_mempolicy2
 458	o32	get_mempolicy2	sys_get_mempolicy2
+459	o32	mbind2		sys_mbind2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 0654f3f89fc7..fcd67bc405b1 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index ac11d2064e7a..89715417014c 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -545,3 +545,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 1cdcafe1ccca..c8304e0d0aa7 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -461,3 +461,4 @@
 456	common	futex_requeue	sys_futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2		sys_mbind2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index f71742024c29..e5c51b6c367f 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -461,3 +461,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 2fbf5dbe0620..74527f585500 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -504,3 +504,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0af813b9a118..be2e2aa17dd8 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -463,3 +463,4 @@
 456	i386	futex_requeue	sys_futex_requeue
 457	i386	set_mempolicy2	sys_set_mempolicy2
 458	i386	get_mempolicy2	sys_get_mempolicy2
+459	i386	mbind2		sys_mbind2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 0b777876fc15..6e2347eb8773 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -380,6 +380,7 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 4536c9a4227d..f00a21317dc0 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -429,3 +429,4 @@
 456	common	futex_requeue	sys_futex_requeue
 457	common	set_mempolicy2	sys_set_mempolicy2
 458	common	get_mempolicy2	sys_get_mempolicy2
+459	common	mbind2		sys_mbind2
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 774512b7934e..487dd9155b25 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -816,6 +816,9 @@ asmlinkage long sys_mbind(unsigned long start, unsigned long len,
 				const unsigned long __user *nmask,
 				unsigned long maxnode,
 				unsigned flags);
+asmlinkage long sys_mbind2(const struct iovec __user *vec, size_t vlen,
+			   const struct mpol_args __user *uargs, size_t usize,
+			   unsigned long flags);
 asmlinkage long sys_get_mempolicy(int __user *policy,
 				unsigned long __user *nmask,
 				unsigned long maxnode,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 719accc731db..cd31599bb9cc 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,9 +832,11 @@ __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 __SYSCALL(__NR_set_mempolicy2, sys_set_mempolicy2)
 #define __NR_get_mempolicy2 458
 __SYSCALL(__NR_get_mempolicy2, sys_get_mempolicy2)
+#define __NR_mbind2 459
+__SYSCALL(__NR_mbind2, sys_mbind2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 459
+#define __NR_syscalls 460
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 00a673e30047..506ea0f8f34e 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -56,13 +56,14 @@ struct mpol_args {
 #define MPOL_F_ADDR		(1<<1)	/* look up vma using address */
 #define MPOL_F_MEMS_ALLOWED	(1<<2)	/* return allowed memories */
 
-/* Flags for mbind */
+/* Flags for mbind/mbind2 */
 #define MPOL_MF_STRICT	(1<<0)	/* Verify existing pages in the mapping */
 #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
 				   to policy */
 #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
 #define MPOL_MF_LAZY	 (1<<3)	/* UNSUPPORTED FLAG: Lazy migrate on fault */
-#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
+#define MPOL_MF_HOME_NODE (1<<4)	/* mbind2: set home node */
+#define MPOL_MF_INTERNAL (1<<5)	/* Internal flags start here */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   |	\
 			 MPOL_MF_MOVE     |	\
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cfe22156ef13..8f609204fbe7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1600,6 +1600,74 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
 	return kernel_mbind(start, len, mode, nmask, maxnode, flags);
 }
 
+SYSCALL_DEFINE5(mbind2, const struct iovec __user *, vec, size_t, vlen,
+		const struct mpol_args __user *, uargs, size_t, usize,
+		unsigned long, flags)
+{
+	struct mpol_args kargs;
+	struct mempolicy_args margs;
+	nodemask_t policy_nodes;
+	unsigned long __user *nodes_ptr;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
+	int err;
+
+	if (!vec || !vlen)
+		return -EINVAL;
+
+	err = copy_struct_from_user(&kargs, sizeof(kargs), uargs, usize);
+	if (err)
+		return -EINVAL;
+
+	err = validate_mpol_flags(kargs.mode, &kargs.mode_flags);
+	if (err)
+		return err;
+
+	margs.mode = kargs.mode;
+	margs.mode_flags = kargs.mode_flags;
+	margs.addr = kargs.addr;
+
+	/* if home node given, validate it is online */
+	if (flags & MPOL_MF_HOME_NODE) {
+		if ((kargs.home_node >= MAX_NUMNODES) ||
+			!node_online(kargs.home_node))
+			return -EINVAL;
+		margs.home_node = kargs.home_node;
+	} else
+		margs.home_node = NUMA_NO_NODE;
+	flags &= ~MPOL_MF_HOME_NODE;
+
+	if (kargs.pol_nodes) {
+		nodes_ptr = u64_to_user_ptr(kargs.pol_nodes);
+		err = get_nodes(&policy_nodes, nodes_ptr,
+				kargs.pol_maxnodes);
+		if (err)
+			return err;
+		margs.policy_nodes = &policy_nodes;
+	} else
+		margs.policy_nodes = NULL;
+
+	/* For each address range in vector, do_mbind */
+	err = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+			   &iter);
+	if (err < 0)
+		return err;
+	while (iov_iter_count(&iter)) {
+		unsigned long start, len;
+
+		start = untagged_addr((unsigned long)iter_iov_addr(&iter));
+		len = iter_iov_len(&iter);
+		err = do_mbind(start, len, &margs, flags);
+		if (err)
+			break;
+		iov_iter_advance(&iter, iter_iov_len(&iter));
+	}
+
+	kfree(iov);
+	return err;
+}
+
 /* Set the process memory policy */
 static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
 				 unsigned long maxnode)
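
Both get_mempolicy2 and mbind2 pass the caller-supplied structure size to copy_struct_from_user(). That is what lets struct mpol_args grow without new syscalls. The code as posted turns any error from that call into -EINVAL. The sketch below models the function's documented size rules in userspace so they can be exercised directly. copy_struct_model() is a hypothetical name, this is not the kernel implementation, and it does not model fault handling (-EFAULT).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model (hypothetical, not kernel code) of the size rules that
 * copy_struct_from_user(dst, ksize, src, usize) applies:
 *   usize <  ksize: copy usize bytes, zero the rest of dst
 *   usize == ksize: plain copy
 *   usize >  ksize: extra trailing bytes in src must be zero, else -E2BIG
 */
static int copy_struct_model(void *dst, size_t ksize,
                             const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;

    assert(dst != NULL && src != NULL);

    if (usize < ksize) {
        /* Older caller: fields it does not know about read as zero. */
        memset((unsigned char *)dst + size, 0, ksize - size);
    } else if (usize > ksize) {
        /* Newer caller: bytes beyond the kernel's struct must be zero. */
        const unsigned char *tail = (const unsigned char *)src + ksize;

        for (size_t i = 0; i < usize - ksize; i++)
            if (tail[i] != 0)
                return -E2BIG;
    }
    memcpy(dst, src, size);
    return 0;
}
```

The same reasoning applies in the other direction. When a syscall writes struct mpol_args back to userspace, copying at most the smaller of the two sizes avoids exposing bytes beyond the kernel's own structure.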