From patchwork Tue Jun 1 15:14:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12291617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 128A2C47080 for ; Tue, 1 Jun 2021 15:15:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9BB546138C for ; Tue, 1 Jun 2021 15:15:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9BB546138C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E68E66B0070; Tue, 1 Jun 2021 11:15:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E19166B0072; Tue, 1 Jun 2021 11:15:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CC6EF6B0073; Tue, 1 Jun 2021 11:15:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0183.hostedemail.com [216.40.44.183]) by kanga.kvack.org (Postfix) with ESMTP id 9B96E6B0070 for ; Tue, 1 Jun 2021 11:15:02 -0400 (EDT) Received: from smtpin34.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 20718A8C8 for ; Tue, 1 Jun 2021 15:15:02 +0000 (UTC) X-FDA: 78205502844.34.CB3B260 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf02.hostedemail.com (Postfix) with ESMTP id 97C914202A32 for ; Tue, 1 Jun 2021 15:14:54 +0000 (UTC) IronPort-SDR: ys2IsLYKj3UaaNaFIaCSy21zqsknus7k2SosADlhOluuM5IeuOTF5g3eRGk0MLxF+X6GZ+1QC+ TC2qg4SB7I7Q== X-IronPort-AV: E=McAfee;i="6200,9189,10002"; a="183926904" X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="183926904" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2021 08:15:00 -0700 IronPort-SDR: Obq35QqSmQQp3aaLkf3lAIxP7EArbyjT5WVgxwRFZDL5Od3K5lwsSihK2dzD0bAWmFuTvh+UdY iwznmudzGHZw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="482522142" Received: from shbuild999.sh.intel.com ([10.239.147.94]) by fmsmga002.fm.intel.com with ESMTP; 01 Jun 2021 08:14:56 -0700 From: Feng Tang To: linux-mm@kvack.org, Andrew Morton , Michal Hocko , David Rientjes , Dave Hansen , Ben Widawsky Cc: Andrea Arcangeli , Mel Gorman , Mike Kravetz , Randy Dunlap , Vlastimil Babka , Andi Kleen , Dan Williams , ying.huang@intel.com, Feng Tang Subject: [v4 PATCH 1/3] mm/mempolicy: cleanup nodemask intersection check for oom Date: Tue, 1 Jun 2021 23:14:50 +0800 Message-Id: <1622560492-1294-2-git-send-email-feng.tang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1622560492-1294-1-git-send-email-feng.tang@intel.com> References: <1622560492-1294-1-git-send-email-feng.tang@intel.com> Authentication-Results: imf02.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none); spf=none (imf02.hostedemail.com: domain of feng.tang@intel.com has no SPF policy when checking 192.55.52.151) smtp.mailfrom=feng.tang@intel.com X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 97C914202A32 X-Stat-Signature: ft16cendq4hxqmeinehpe6kechhbb1ut X-HE-Tag: 1622560494-270430 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: mempolicy_nodemask_intersects seems to be a general purpose mempolicy function. In fact it is partially tailored for the OOM purpose instead. The oom proper is the only existing user so rename the function to make that purpose explicit. While at it drop the MPOL_INTERLEAVE as those allocations never has a nodemask defined (see alloc_page_interleave) so this is a dead code and a confusing one because MPOL_INTERLEAVE is a hint rather than a hard requirement so it shouldn't be considered during the OOM. The final code can be reduced to a check for MPOL_BIND which is the only memory policy that is a hard requirement and thus relevant to a constrained OOM logic. Suggested-by: Michal Hocko Signed-off-by: Feng Tang Acked-by: Michal Hocko --- include/linux/mempolicy.h | 2 +- mm/mempolicy.c | 34 +++++++++------------------------- mm/oom_kill.c | 2 +- 3 files changed, 11 insertions(+), 27 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 5f1c74d..8773c55 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -150,7 +150,7 @@ extern int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol, nodemask_t **nodemask); extern bool init_nodemask_of_mempolicy(nodemask_t *mask); -extern bool mempolicy_nodemask_intersects(struct task_struct *tsk, +extern bool mempolicy_in_oom_domain(struct task_struct *tsk, const nodemask_t *mask); extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d79fa29..6795a6a 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2094,16 +2094,16 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) #endif /* - * mempolicy_nodemask_intersects + * mempolicy_in_oom_domain * - * If tsk's mempolicy is "default" [NULL], return 'true' to indicate default - * policy. Otherwise, check for intersection between mask and the policy - * nodemask for 'bind' or 'interleave' policy. For 'preferred' or 'local' - * policy, always return true since it may allocate elsewhere on fallback. + * If tsk's mempolicy is "bind", check for intersection between mask and + * the policy nodemask. Otherwise, return true for all other policies + * including "interleave", as a tsk with "interleave" policy may have + * memory allocated from all nodes in system. * * Takes task_lock(tsk) to prevent freeing of its mempolicy. */ -bool mempolicy_nodemask_intersects(struct task_struct *tsk, +bool mempolicy_in_oom_domain(struct task_struct *tsk, const nodemask_t *mask) { struct mempolicy *mempolicy; @@ -2111,29 +2111,13 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk, if (!mask) return ret; + task_lock(tsk); mempolicy = tsk->mempolicy; - if (!mempolicy) - goto out; - - switch (mempolicy->mode) { - case MPOL_PREFERRED: - /* - * MPOL_PREFERRED and MPOL_F_LOCAL are only preferred nodes to - * allocate from, they may fallback to other nodes when oom. - * Thus, it's possible for tsk to have allocated memory from - * nodes in mask. - */ - break; - case MPOL_BIND: - case MPOL_INTERLEAVE: + if (mempolicy && mempolicy->mode == MPOL_BIND) ret = nodes_intersects(mempolicy->v.nodes, *mask); - break; - default: - BUG(); - } -out: task_unlock(tsk); + return ret; } diff --git a/mm/oom_kill.c b/mm/oom_kill.c index eefd3f5..fcc29e9 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -104,7 +104,7 @@ static bool oom_cpuset_eligible(struct task_struct *start, * mempolicy intersects current, otherwise it may be * needlessly killed. */ - ret = mempolicy_nodemask_intersects(tsk, mask); + ret = mempolicy_in_oom_domain(tsk, mask); } else { /* * This is not a mempolicy constrained oom, so only From patchwork Tue Jun 1 15:14:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12291621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3755BC47092 for ; Tue, 1 Jun 2021 15:15:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BFC7160FF3 for ; Tue, 1 Jun 2021 15:15:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BFC7160FF3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 62BF86B0072; Tue, 1 Jun 2021 11:15:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 607796B0073; Tue, 1 Jun 2021 11:15:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 42FD26B0074; Tue, 1 Jun 2021 11:15:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 12AB66B0072 for ; Tue, 1 Jun 2021 11:15:07 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 95A46181AC9C6 for ; Tue, 1 Jun 2021 15:15:06 +0000 (UTC) X-FDA: 78205503012.14.D4BE402 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf30.hostedemail.com (Postfix) with ESMTP id A5070E000254 for ; Tue, 1 Jun 2021 15:14:56 +0000 (UTC) IronPort-SDR: iPV8nmDzQlwIUKGmhzNXUnBhG55PzJe6lnGPIqUpz5RMFoB+SHrqq8hEgTmgBqrrV/LLQwchdF wEqIwm2hNU4A== X-IronPort-AV: E=McAfee;i="6200,9189,10002"; a="183926918" X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="183926918" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2021 08:15:04 -0700 IronPort-SDR: YGo1q/ZocArEtywygOsaH9gGrA7SkJmRSwv/yGW/GWmNbiwgELG67IbNh3aY43LOyaIdhp6Yum 9Ryo606XhvJA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="482522187" Received: from shbuild999.sh.intel.com ([10.239.147.94]) by fmsmga002.fm.intel.com with ESMTP; 01 Jun 2021 08:15:00 -0700 From: Feng Tang To: linux-mm@kvack.org, Andrew Morton , Michal Hocko , David Rientjes , Dave Hansen , Ben Widawsky Cc: Andrea Arcangeli , Mel Gorman , Mike Kravetz , Randy Dunlap , Vlastimil Babka , Andi Kleen , Dan Williams , ying.huang@intel.com, Feng Tang Subject: [v4 PATCH 2/3] mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy Date: Tue, 1 Jun 2021 23:14:51 +0800 Message-Id: <1622560492-1294-3-git-send-email-feng.tang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1622560492-1294-1-git-send-email-feng.tang@intel.com> References: <1622560492-1294-1-git-send-email-feng.tang@intel.com> X-Rspamd-Queue-Id: A5070E000254 Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none); spf=none (imf30.hostedemail.com: domain of feng.tang@intel.com has no SPF policy when checking 192.55.52.151) smtp.mailfrom=feng.tang@intel.com X-Rspamd-Server: rspam04 X-Stat-Signature: fgnsewutdjr8x7ad71yckzr59gunbord X-HE-Tag: 1622560496-488347 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: MPOL_LOCAL policy has been setup as a real policy, but it is still handled like a faked POL_PREFERRED policy with one internal MPOL_F_LOCAL flag bit set, and there are many places having to judge the real 'prefer' or the 'local' policy, which are quite confusing. In current code, there are 4 cases that MPOL_LOCAL are used: 1. user specifies 'local' policy 2. user specifies 'prefer' policy, but with empty nodemask 3. system 'default' policy is used 4. 'prefer' policy + valid 'preferred' node with MPOL_F_STATIC_NODES flag set, and when it is 'rebind' to a nodemask which doesn't contains the 'preferred' node, it will perform as 'local' policy So make 'local' a real policy instead of a fake 'prefer' one, and kill MPOL_F_LOCAL bit, which can greatly reduce the confusion for code reading. For case 4, the logic of mpol_rebind_preferred() is confusing, as Michal Hocko pointed out: : I do believe that rebinding preferred policy is just bogus and it should : be dropped altogether on the ground that a preference is a mere hint from : userspace where to start the allocation. Unless I am missing something : cpusets will be always authoritative for the final placement. The : preferred node just acts as a starting point and it should be really : preserved when cpusets changes. Otherwise we have a very subtle behavior : corner cases. So dump all the tricky transformation between 'prefer' and 'local', and just record the new nodemask of rebinding. Suggested-by: Michal Hocko Signed-off-by: Feng Tang Acked-by: Michal Hocko Signed-off-by: Feng Tang Reported-by: kernel test robot --- include/uapi/linux/mempolicy.h | 1 - mm/mempolicy.c | 131 +++++++++++++++++------------------------ 2 files changed, 53 insertions(+), 79 deletions(-) diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h index 4832fd0..19a00bc 100644 --- a/include/uapi/linux/mempolicy.h +++ b/include/uapi/linux/mempolicy.h @@ -60,7 +60,6 @@ enum { * are never OR'ed into the mode in mempolicy API arguments. */ #define MPOL_F_SHARED (1 << 0) /* identify shared policies */ -#define MPOL_F_LOCAL (1 << 1) /* preferred local allocation */ #define MPOL_F_MOF (1 << 3) /* this policy wants migrate on fault */ #define MPOL_F_MORON (1 << 4) /* Migrate On protnone Reference On Node */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 6795a6a..9805291 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -121,8 +121,7 @@ enum zone_type policy_zone = 0; */ static struct mempolicy default_policy = { .refcnt = ATOMIC_INIT(1), /* never free it */ - .mode = MPOL_PREFERRED, - .flags = MPOL_F_LOCAL, + .mode = MPOL_LOCAL, }; static struct mempolicy preferred_node_policy[MAX_NUMNODES]; @@ -200,12 +199,9 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes) static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) { - if (!nodes) - pol->flags |= MPOL_F_LOCAL; /* local allocation */ - else if (nodes_empty(*nodes)) - return -EINVAL; /* no allowed nodes */ - else - pol->v.preferred_node = first_node(*nodes); + if (nodes_empty(*nodes)) + return -EINVAL; + pol->v.preferred_node = first_node(*nodes); return 0; } @@ -220,8 +216,7 @@ static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes) /* * mpol_set_nodemask is called after mpol_new() to set up the nodemask, if * any, for the new policy. mpol_new() has already validated the nodes - * parameter with respect to the policy mode and flags. But, we need to - * handle an empty nodemask with MPOL_PREFERRED here. + * parameter with respect to the policy mode and flags. * * Must be called holding task's alloc_lock to protect task's mems_allowed * and mempolicy. May also be called holding the mmap_lock for write. @@ -234,30 +229,27 @@ static int mpol_set_nodemask(struct mempolicy *pol, /* if mode is MPOL_DEFAULT, pol is NULL. This is right. */ if (pol == NULL) return 0; + + if (pol->mode == MPOL_LOCAL) + return 0; + /* Check N_MEMORY */ nodes_and(nsc->mask1, cpuset_current_mems_allowed, node_states[N_MEMORY]); VM_BUG_ON(!nodes); - if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes)) - nodes = NULL; /* explicit local allocation */ - else { - if (pol->flags & MPOL_F_RELATIVE_NODES) - mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1); - else - nodes_and(nsc->mask2, *nodes, nsc->mask1); - if (mpol_store_user_nodemask(pol)) - pol->w.user_nodemask = *nodes; - else - pol->w.cpuset_mems_allowed = - cpuset_current_mems_allowed; - } + if (pol->flags & MPOL_F_RELATIVE_NODES) + mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1); + else + nodes_and(nsc->mask2, *nodes, nsc->mask1); - if (nodes) - ret = mpol_ops[pol->mode].create(pol, &nsc->mask2); + if (mpol_store_user_nodemask(pol)) + pol->w.user_nodemask = *nodes; else - ret = mpol_ops[pol->mode].create(pol, NULL); + pol->w.cpuset_mems_allowed = cpuset_current_mems_allowed; + + ret = mpol_ops[pol->mode].create(pol, &nsc->mask2); return ret; } @@ -290,13 +282,14 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, if (((flags & MPOL_F_STATIC_NODES) || (flags & MPOL_F_RELATIVE_NODES))) return ERR_PTR(-EINVAL); + + mode = MPOL_LOCAL; } } else if (mode == MPOL_LOCAL) { if (!nodes_empty(*nodes) || (flags & MPOL_F_STATIC_NODES) || (flags & MPOL_F_RELATIVE_NODES)) return ERR_PTR(-EINVAL); - mode = MPOL_PREFERRED; } else if (nodes_empty(*nodes)) return ERR_PTR(-EINVAL); policy = kmem_cache_alloc(policy_cache, GFP_KERNEL); @@ -344,25 +337,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes) static void mpol_rebind_preferred(struct mempolicy *pol, const nodemask_t *nodes) { - nodemask_t tmp; - - if (pol->flags & MPOL_F_STATIC_NODES) { - int node = first_node(pol->w.user_nodemask); - - if (node_isset(node, *nodes)) { - pol->v.preferred_node = node; - pol->flags &= ~MPOL_F_LOCAL; - } else - pol->flags |= MPOL_F_LOCAL; - } else if (pol->flags & MPOL_F_RELATIVE_NODES) { - mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); - pol->v.preferred_node = first_node(tmp); - } else if (!(pol->flags & MPOL_F_LOCAL)) { - pol->v.preferred_node = node_remap(pol->v.preferred_node, - pol->w.cpuset_mems_allowed, - *nodes); - pol->w.cpuset_mems_allowed = *nodes; - } + pol->w.cpuset_mems_allowed = *nodes; } /* @@ -376,7 +351,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask) { if (!pol) return; - if (!mpol_store_user_nodemask(pol) && !(pol->flags & MPOL_F_LOCAL) && + if (!mpol_store_user_nodemask(pol) && nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) return; @@ -427,6 +402,9 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { .create = mpol_new_bind, .rebind = mpol_rebind_nodemask, }, + [MPOL_LOCAL] = { + .rebind = mpol_rebind_default, + }, }; static int migrate_page_add(struct page *page, struct list_head *pagelist, @@ -919,10 +897,12 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) case MPOL_INTERLEAVE: *nodes = p->v.nodes; break; + case MPOL_LOCAL: + /* return empty node mask for local allocation */ + break; + case MPOL_PREFERRED: - if (!(p->flags & MPOL_F_LOCAL)) - node_set(p->v.preferred_node, *nodes); - /* else return empty node mask for local allocation */ + node_set(p->v.preferred_node, *nodes); break; default: BUG(); @@ -1894,9 +1874,9 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) /* Return the node id preferred by the given mempolicy, or the given id */ static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) { - if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) + if (policy->mode == MPOL_PREFERRED) { nd = policy->v.preferred_node; - else { + } else { /* * __GFP_THISNODE shouldn't even be used with the bind policy * because we might easily break the expectation to stay on the @@ -1933,14 +1913,11 @@ unsigned int mempolicy_slab_node(void) return node; policy = current->mempolicy; - if (!policy || policy->flags & MPOL_F_LOCAL) + if (!policy) return node; switch (policy->mode) { case MPOL_PREFERRED: - /* - * handled MPOL_F_LOCAL above - */ return policy->v.preferred_node; case MPOL_INTERLEAVE: @@ -1960,6 +1937,8 @@ unsigned int mempolicy_slab_node(void) &policy->v.nodes); return z->zone ? zone_to_nid(z->zone) : node; } + case MPOL_LOCAL: + return node; default: BUG(); @@ -2072,16 +2051,18 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) mempolicy = current->mempolicy; switch (mempolicy->mode) { case MPOL_PREFERRED: - if (mempolicy->flags & MPOL_F_LOCAL) - nid = numa_node_id(); - else - nid = mempolicy->v.preferred_node; + nid = mempolicy->v.preferred_node; init_nodemask_of_node(mask, nid); break; case MPOL_BIND: case MPOL_INTERLEAVE: - *mask = mempolicy->v.nodes; + *mask = mempolicy->v.nodes; + break; + + case MPOL_LOCAL: + nid = numa_node_id(); + init_nodemask_of_node(mask, nid); break; default: @@ -2188,7 +2169,7 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, * If the policy is interleave, or does not allow the current * node in its nodemask, we allocate the standard way. */ - if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) + if (pol->mode == MPOL_PREFERRED) hpage_node = pol->v.preferred_node; nmask = policy_nodemask(gfp, pol); @@ -2324,10 +2305,9 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b) case MPOL_INTERLEAVE: return !!nodes_equal(a->v.nodes, b->v.nodes); case MPOL_PREFERRED: - /* a's ->flags is the same as b's */ - if (a->flags & MPOL_F_LOCAL) - return true; return a->v.preferred_node == b->v.preferred_node; + case MPOL_LOCAL: + return true; default: BUG(); return false; @@ -2465,10 +2445,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long break; case MPOL_PREFERRED: - if (pol->flags & MPOL_F_LOCAL) - polnid = numa_node_id(); - else - polnid = pol->v.preferred_node; + polnid = pol->v.preferred_node; + break; + + case MPOL_LOCAL: + polnid = numa_node_id(); break; case MPOL_BIND: @@ -2835,9 +2816,6 @@ void numa_default_policy(void) * Parse and format mempolicy from/to strings */ -/* - * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag. - */ static const char * const policy_modes[] = { [MPOL_DEFAULT] = "default", @@ -2915,7 +2893,6 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) */ if (nodelist) goto out; - mode = MPOL_PREFERRED; break; case MPOL_DEFAULT: /* @@ -2959,7 +2936,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) else if (nodelist) new->v.preferred_node = first_node(nodes); else - new->flags |= MPOL_F_LOCAL; + new->mode = MPOL_LOCAL; /* * Save nodes for contextualization: this will be used to "clone" @@ -3005,12 +2982,10 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) switch (mode) { case MPOL_DEFAULT: + case MPOL_LOCAL: break; case MPOL_PREFERRED: - if (flags & MPOL_F_LOCAL) - mode = MPOL_LOCAL; - else - node_set(pol->v.preferred_node, nodes); + node_set(pol->v.preferred_node, nodes); break; case MPOL_BIND: case MPOL_INTERLEAVE: From patchwork Tue Jun 1 15:14:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Tang X-Patchwork-Id: 12291623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3710C47093 for ; Tue, 1 Jun 2021 15:15:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9033861375 for ; Tue, 1 Jun 2021 15:15:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9033861375 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 283156B0073; Tue, 1 Jun 2021 11:15:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 259766B0074; Tue, 1 Jun 2021 11:15:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0FA7E6B0075; Tue, 1 Jun 2021 11:15:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0134.hostedemail.com [216.40.44.134]) by kanga.kvack.org (Postfix) with ESMTP id D01456B0073 for ; Tue, 1 Jun 2021 11:15:09 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 63E58180AD811 for ; Tue, 1 Jun 2021 15:15:09 +0000 (UTC) X-FDA: 78205503138.12.3ADAA0A Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf30.hostedemail.com (Postfix) with ESMTP id 70B7CE00026B for ; Tue, 1 Jun 2021 15:14:59 +0000 (UTC) IronPort-SDR: XuaueSrWUaI5mWXuRlBv40IM8wdcfqyoJNLAuU5SCmCtZREoVl7FYQdcbwIa8JdDzrSYebI1bJ RE0tunJlfVmQ== X-IronPort-AV: E=McAfee;i="6200,9189,10002"; a="183926966" X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="183926966" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2021 08:15:08 -0700 IronPort-SDR: POLDI6p1uxAdB0Urxy1YA0R7vC1Ud3T53dZx8PxoyXnLVQepT3KmU5iDtgO4FOtT4jNPKLKNtE l4J9bNaMP77w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,240,1616482800"; d="scan'208";a="482522223" Received: from shbuild999.sh.intel.com ([10.239.147.94]) by fmsmga002.fm.intel.com with ESMTP; 01 Jun 2021 08:15:05 -0700 From: Feng Tang To: linux-mm@kvack.org, Andrew Morton , Michal Hocko , David Rientjes , Dave Hansen , Ben Widawsky Cc: Andrea Arcangeli , Mel Gorman , Mike Kravetz , Randy Dunlap , Vlastimil Babka , Andi Kleen , Dan Williams , ying.huang@intel.com, Feng Tang Subject: [v4 PATCH 3/3] mm/mempolicy: unify the parameter sanity check for mbind and set_mempolicy Date: Tue, 1 Jun 2021 23:14:52 +0800 Message-Id: <1622560492-1294-4-git-send-email-feng.tang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1622560492-1294-1-git-send-email-feng.tang@intel.com> References: <1622560492-1294-1-git-send-email-feng.tang@intel.com> X-Rspamd-Queue-Id: 70B7CE00026B Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none); spf=none (imf30.hostedemail.com: domain of feng.tang@intel.com has no SPF policy when checking 192.55.52.151) smtp.mailfrom=feng.tang@intel.com X-Rspamd-Server: rspam04 X-Stat-Signature: znrtk33m3j9s4d7d3a76anysj83opw3p X-HE-Tag: 1622560499-650732 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently the kernel_mbind() and kernel_set_mempolicy() do almost the same operation for parameter sanity check. Add a helper function to unify the code to reduce the redundancy, and make it easier for changing the sanity check code in future. [thanks to David Rientjes for suggesting using helper function instead of macro] Signed-off-by: Feng Tang Acked-by: Michal Hocko --- mm/mempolicy.c | 48 ++++++++++++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 9805291..32ca8fc 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1440,26 +1440,38 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode, return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0; } +/* Basic parameter sanity check used by both mbind() and set_mempolicy() */ +static inline int sanitize_mpol_flags(int *mode, unsigned short *flags) +{ + *flags = *mode & MPOL_MODE_FLAGS; + *mode &= ~MPOL_MODE_FLAGS; + if ((unsigned int)(*mode) >= MPOL_MAX) + return -EINVAL; + if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES)) + return -EINVAL; + + return 0; +} + static long kernel_mbind(unsigned long start, unsigned long len, unsigned long mode, const unsigned long __user *nmask, unsigned long maxnode, unsigned int flags) { + unsigned short mode_flags; nodemask_t nodes; + int lmode = mode; int err; - unsigned short mode_flags; start = untagged_addr(start); - mode_flags = mode & MPOL_MODE_FLAGS; - mode &= ~MPOL_MODE_FLAGS; - if (mode >= MPOL_MAX) - return -EINVAL; - if ((mode_flags & MPOL_F_STATIC_NODES) && - (mode_flags & MPOL_F_RELATIVE_NODES)) - return -EINVAL; + err = sanitize_mpol_flags(&lmode, &mode_flags); + if (err) + return err; + err = get_nodes(&nodes, nmask, maxnode); if (err) return err; - return do_mbind(start, len, mode, mode_flags, &nodes, flags); + + return do_mbind(start, len, lmode, mode_flags, &nodes, flags); } SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len, @@ -1473,20 +1485,20 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len, static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask, unsigned long maxnode) { - int err; + unsigned short mode_flags; nodemask_t nodes; - unsigned short flags; + int lmode = mode; + int err; + + err = sanitize_mpol_flags(&lmode, &mode_flags); + if (err) + return err; - flags = mode & MPOL_MODE_FLAGS; - mode &= ~MPOL_MODE_FLAGS; - if ((unsigned int)mode >= MPOL_MAX) - return -EINVAL; - if ((flags & MPOL_F_STATIC_NODES) && (flags & MPOL_F_RELATIVE_NODES)) - return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; - return do_set_mempolicy(mode, flags, &nodes); + + return do_set_mempolicy(lmode, mode_flags, &nodes); } SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,