From patchwork Wed May 26 05:01:39 2021
From: Feng Tang <feng.tang@intel.com>
To: linux-mm@kvack.org, Andrew Morton, Michal Hocko, David Rientjes, Dave Hansen, Ben Widawsky
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, ying.huang@intel.com, Feng Tang
Subject: [PATCH v1 1/4] mm/mempolicy: skip nodemask intersect check for 'interleave' when oom
Date: Wed, 26 May 2021 13:01:39 +0800
Message-Id: <1622005302-23027-2-git-send-email-feng.tang@intel.com>
In-Reply-To: <1622005302-23027-1-git-send-email-feng.tang@intel.com>
References: <1622005302-23027-1-git-send-email-feng.tang@intel.com>
valid DKIM" header.from=intel.com (policy=none); spf=none (imf01.hostedemail.com: domain of feng.tang@intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=feng.tang@intel.com X-Stat-Signature: d83o71ni5dz9p9w94ryws8a9tcsgk6zt X-Rspamd-Queue-Id: A4ACA5001649 X-Rspamd-Server: rspam02 X-HE-Tag: 1622005304-26729 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: mempolicy_nodemask_intersects() is used in oom case to check if a task may have memory allocated on some memory nodes. Currently, the nodes_intersects() is run for both 'bind' and 'interleave' policies. But they are different regarding memory allocation, the nodemask is a forced requirement for 'bind', while just a hint for 'interleave'. Like in alloc_pages_vma(): nmask = policy_nodemask(gfp, pol); preferred_nid = policy_node(gfp, pol, node); page = __alloc_pages(gfp, order, preferred_nid, nmask); in plicy_nodemask(), only 'bind' policy may return its desired nodemask, while others return NULL. And this 'NULL' enables the 'interleave' policy can get memory from other nodes than its nodemask. So skip the nodemask intersect check for 'interleave' policy. Suggested-by: Michal Hocko Signed-off-by: Feng Tang --- mm/mempolicy.c | 24 ++++-------------------- 1 file changed, 4 insertions(+), 20 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d79fa29..1964cca 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2098,7 +2098,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) * * If tsk's mempolicy is "default" [NULL], return 'true' to indicate default * policy. Otherwise, check for intersection between mask and the policy - * nodemask for 'bind' or 'interleave' policy. For 'preferred' or 'local' + * nodemask for 'bind' policy. For 'interleave', 'preferred' or 'local' * policy, always return true since it may allocate elsewhere on fallback. * * Takes task_lock(tsk) to prevent freeing of its mempolicy. @@ -2111,29 +2111,13 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk, if (!mask) return ret; + task_lock(tsk); mempolicy = tsk->mempolicy; - if (!mempolicy) - goto out; - - switch (mempolicy->mode) { - case MPOL_PREFERRED: - /* - * MPOL_PREFERRED and MPOL_F_LOCAL are only preferred nodes to - * allocate from, they may fallback to other nodes when oom. - * Thus, it's possible for tsk to have allocated memory from - * nodes in mask. 

From patchwork Wed May 26 05:01:40 2021
From: Feng Tang
Subject: [PATCH v1 2/4] mm/mempolicy: unify the preprocessing for mbind and set_mempolicy
Date: Wed, 26 May 2021 13:01:40 +0800
Message-Id: <1622005302-23027-3-git-send-email-feng.tang@intel.com>
In-Reply-To: <1622005302-23027-1-git-send-email-feng.tang@intel.com>
References: <1622005302-23027-1-git-send-email-feng.tang@intel.com>

Currently kernel_mbind() and kernel_set_mempolicy() perform almost the
same parameter sanity checks and preprocessing. Add a helper function
to unify the code, which reduces the redundancy and makes it easier to
change the preprocessing code in the future.

[thanks to David Rientjes for suggesting a helper function instead of
a macro]

Signed-off-by: Feng Tang
---
 mm/mempolicy.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1964cca..2830bb8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1460,6 +1460,20 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
 	return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0;
 }
 
+static inline int mpol_pre_process(int *mode, const unsigned long __user *nmask, unsigned long maxnode, nodemask_t *nodes, unsigned short *flags)
+{
+	int ret;
+
+	*flags = *mode & MPOL_MODE_FLAGS;
+	*mode &= ~MPOL_MODE_FLAGS;
+	if ((unsigned int)(*mode) >= MPOL_MAX)
+		return -EINVAL;
+	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
+		return -EINVAL;
+	ret = get_nodes(nodes, nmask, maxnode);
+	return ret;
+}
+
 static long kernel_mbind(unsigned long start, unsigned long len,
 			 unsigned long mode, const unsigned long __user *nmask,
 			 unsigned long maxnode, unsigned int flags)
@@ -1467,19 +1481,14 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	nodemask_t nodes;
 	int err;
 	unsigned short mode_flags;
+	int lmode = mode;
 
-	start = untagged_addr(start);
-	mode_flags = mode & MPOL_MODE_FLAGS;
-	mode &= ~MPOL_MODE_FLAGS;
-	if (mode >= MPOL_MAX)
-		return -EINVAL;
-	if ((mode_flags & MPOL_F_STATIC_NODES) &&
-	    (mode_flags & MPOL_F_RELATIVE_NODES))
-		return -EINVAL;
-	err = get_nodes(&nodes, nmask, maxnode);
+	err = mpol_pre_process(&lmode, nmask, maxnode, &nodes, &mode_flags);
 	if (err)
 		return err;
-	return do_mbind(start, len, mode, mode_flags, &nodes, flags);
+
+	start = untagged_addr(start);
+	return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
 }
 
 SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
@@ -1495,18 +1504,14 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
 {
 	int err;
 	nodemask_t nodes;
-	unsigned short flags;
+	unsigned short mode_flags;
+	int lmode = mode;
 
-	flags = mode & MPOL_MODE_FLAGS;
-	mode &= ~MPOL_MODE_FLAGS;
-	if ((unsigned int)mode >= MPOL_MAX)
-		return -EINVAL;
-	if ((flags & MPOL_F_STATIC_NODES) && (flags & MPOL_F_RELATIVE_NODES))
-		return -EINVAL;
-	err = get_nodes(&nodes, nmask, maxnode);
+	err = mpol_pre_process(&lmode, nmask, maxnode, &nodes, &mode_flags);
 	if (err)
 		return err;
-	return do_set_mempolicy(mode, flags, &nodes);
+
+	return do_set_mempolicy(lmode, mode_flags, &nodes);
 }
 
 SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
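Both syscalls pack the policy mode and the optional MPOL_F_* mode flags into a single integer, which is exactly what mpol_pre_process() now splits apart in one place. A sketch of the userspace view (assumes libnuma's <numaif.h> and linking with -lnuma; error handling omitted):

	#include <numaif.h>	/* set_mempolicy(), mbind(), MPOL_* */

	static void bind_and_map(void *addr, unsigned long len)
	{
		unsigned long nodes = 1UL << 0;	/* nodemask holding node 0 */

		/* kernel_set_mempolicy() sees mode=MPOL_BIND, flags=MPOL_F_STATIC_NODES */
		set_mempolicy(MPOL_BIND | MPOL_F_STATIC_NODES, &nodes, 2);

		/* kernel_mbind() decodes the identical packing via the same helper */
		mbind(addr, len, MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES, &nodes, 2, 0);
	}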

From patchwork Wed May 26 05:01:41 2021
From: Feng Tang
Subject: [PATCH v1 3/4] mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy
Date: Wed, 26 May 2021 13:01:41 +0800
Message-Id: <1622005302-23027-4-git-send-email-feng.tang@intel.com>
In-Reply-To: <1622005302-23027-1-git-send-email-feng.tang@intel.com>
References: <1622005302-23027-1-git-send-email-feng.tang@intel.com>

MPOL_LOCAL has been set up as a real policy, but it is still handled
like a fake MPOL_PREFERRED policy with the internal MPOL_F_LOCAL flag
bit set, and many places have to distinguish the real 'prefer' policy
from the 'local' one, which is quite confusing.

In the current code, there are four cases in which MPOL_LOCAL is used:

* the user specifies the 'local' policy
* the user specifies the 'prefer' policy, but with an empty nodemask
* the system 'default' policy is used
* a 'prefer' policy with a valid 'preferred' node and the
  MPOL_F_STATIC_NODES flag set is rebound to a nodemask which doesn't
  contain the 'preferred' node; it then gets the MPOL_F_LOCAL bit and
  behaves as the 'local' policy. If it is later rebound again with a
  valid nodemask, the policy is restored back to 'prefer'.

For the first three cases, make 'local' a real policy instead of a fake
'prefer' one; this reduces confusion when reading the code. The next,
optional, patch will kill the MPOL_F_LOCAL bit.

Signed-off-by: Feng Tang
Acked-by: Michal Hocko
---
 mm/mempolicy.c | 66 ++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2830bb8..d97839d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -121,8 +121,7 @@ enum zone_type policy_zone = 0;
  */
 static struct mempolicy default_policy = {
 	.refcnt = ATOMIC_INIT(1), /* never free it */
-	.mode = MPOL_PREFERRED,
-	.flags = MPOL_F_LOCAL,
+	.mode = MPOL_LOCAL,
 };
 
 static struct mempolicy preferred_node_policy[MAX_NUMNODES];
@@ -200,12 +199,9 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
 
 static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
 {
-	if (!nodes)
-		pol->flags |= MPOL_F_LOCAL;	/* local allocation */
-	else if (nodes_empty(*nodes))
-		return -EINVAL;			/* no allowed nodes */
-	else
-		pol->v.preferred_node = first_node(*nodes);
+	if (nodes_empty(*nodes))
+		return -EINVAL;
+	pol->v.preferred_node = first_node(*nodes);
 	return 0;
 }
 
@@ -217,6 +213,11 @@ static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 	return 0;
 }
 
+static int mpol_new_local(struct mempolicy *pol, const nodemask_t *nodes)
+{
+	return 0;
+}
+
 /*
  * mpol_set_nodemask is called after mpol_new() to set up the nodemask, if
  * any, for the new policy.  mpol_new() has already validated the nodes
@@ -239,25 +240,19 @@ static int mpol_set_nodemask(struct mempolicy *pol,
 		cpuset_current_mems_allowed, node_states[N_MEMORY]);
 
 	VM_BUG_ON(!nodes);
-	if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
-		nodes = NULL;	/* explicit local allocation */
-	else {
-		if (pol->flags & MPOL_F_RELATIVE_NODES)
-			mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1);
-		else
-			nodes_and(nsc->mask2, *nodes, nsc->mask1);
-		if (mpol_store_user_nodemask(pol))
-			pol->w.user_nodemask = *nodes;
-		else
-			pol->w.cpuset_mems_allowed =
-				cpuset_current_mems_allowed;
-	}
+	if (pol->flags & MPOL_F_RELATIVE_NODES)
+		mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1);
+	else
+		nodes_and(nsc->mask2, *nodes, nsc->mask1);
 
-	if (nodes)
-		ret = mpol_ops[pol->mode].create(pol, &nsc->mask2);
+	if (mpol_store_user_nodemask(pol))
+		pol->w.user_nodemask = *nodes;
 	else
-		ret = mpol_ops[pol->mode].create(pol, NULL);
+		pol->w.cpuset_mems_allowed =
+			cpuset_current_mems_allowed;
+
+	ret = mpol_ops[pol->mode].create(pol, &nsc->mask2);
 	return ret;
 }
 
@@ -290,13 +285,14 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 			if (((flags & MPOL_F_STATIC_NODES) ||
 			     (flags & MPOL_F_RELATIVE_NODES)))
 				return ERR_PTR(-EINVAL);
+
+			mode = MPOL_LOCAL;
 		}
 	} else if (mode == MPOL_LOCAL) {
 		if (!nodes_empty(*nodes) ||
 		    (flags & MPOL_F_STATIC_NODES) ||
 		    (flags & MPOL_F_RELATIVE_NODES))
 			return ERR_PTR(-EINVAL);
-		mode = MPOL_PREFERRED;
 	} else if (nodes_empty(*nodes))
 		return ERR_PTR(-EINVAL);
 	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
@@ -427,6 +423,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.create = mpol_new_bind,
 		.rebind = mpol_rebind_nodemask,
 	},
+	[MPOL_LOCAL] = {
+		.create = mpol_new_local,
+		.rebind = mpol_rebind_default,
+	},
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
@@ -1965,6 +1965,8 @@ unsigned int mempolicy_slab_node(void)
 						&policy->v.nodes);
 		return z->zone ? zone_to_nid(z->zone) : node;
 	}
+	case MPOL_LOCAL:
+		return node;
 
 	default:
 		BUG();
@@ -2089,6 +2091,11 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 		*mask = mempolicy->v.nodes;
 		break;
 
+	case MPOL_LOCAL:
+		nid = numa_node_id();
+		init_nodemask_of_node(mask, nid);
+		break;
+
 	default:
 		BUG();
 	}
@@ -2333,6 +2340,8 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 		if (a->flags & MPOL_F_LOCAL)
 			return true;
 		return a->v.preferred_node == b->v.preferred_node;
+	case MPOL_LOCAL:
+		return true;
 	default:
 		BUG();
 		return false;
@@ -2476,6 +2485,10 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		polnid = pol->v.preferred_node;
 		break;
 
+	case MPOL_LOCAL:
+		polnid = numa_node_id();
+		break;
+
 	case MPOL_BIND:
 		/* Optimize placement among multiple nodes via NUMA balancing */
 		if (pol->flags & MPOL_F_MORON) {
@@ -2920,7 +2933,6 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 		 */
 		if (nodelist)
 			goto out;
-		mode = MPOL_PREFERRED;
 		break;
 	case MPOL_DEFAULT:
 		/*
@@ -2964,7 +2976,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	else if (nodelist)
 		new->v.preferred_node = first_node(nodes);
 	else
-		new->flags |= MPOL_F_LOCAL;
+		new->mode = MPOL_LOCAL;
 
 	/*
 	 * Save nodes for contextualization: this will be used to "clone"
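Seen from userspace, the first three 'local' cases listed in the log above correspond to the calls sketched below; after this patch all three land on a real MPOL_LOCAL internally instead of MPOL_PREFERRED plus MPOL_F_LOCAL (assumes libnuma's <numaif.h> new enough to define MPOL_LOCAL, linking with -lnuma; error handling omitted):

	#include <stddef.h>
	#include <numaif.h>	/* set_mempolicy(), MPOL_* */

	static void go_local(void)
	{
		/* case 1: an explicit 'local' policy */
		set_mempolicy(MPOL_LOCAL, NULL, 0);

		/* case 2: 'prefer' with an empty nodemask also means local allocation */
		set_mempolicy(MPOL_PREFERRED, NULL, 0);

		/* case 3: drop back to the system default policy, whose static
		 * default_policy this patch switches to MPOL_LOCAL */
		set_mempolicy(MPOL_DEFAULT, NULL, 0);
	}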
E=Sophos;i="5.82,330,1613462400"; d="scan'208";a="202418881" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2021 22:02:05 -0700 IronPort-SDR: DhbrCxNEq6gxkIKSq9OyZS9MbW0Q5HHyQY9MnRyocXoBhKT3wngJ0JnXf6oMiCRifgXF0ycl2V +cR+FD+x+aQA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,330,1613462400"; d="scan'208";a="479700325" Received: from shbuild999.sh.intel.com ([10.239.147.94]) by fmsmga002.fm.intel.com with ESMTP; 25 May 2021 22:02:02 -0700 From: Feng Tang To: linux-mm@kvack.org, Andrew Morton , Michal Hocko , David Rientjes , Dave Hansen , Ben Widawsky Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli , Mel Gorman , Mike Kravetz , Randy Dunlap , Vlastimil Babka , Andi Kleen , Dan Williams , ying.huang@intel.com, Feng Tang Subject: [PATCH v1 4/4] mm/mempolicy: kill MPOL_F_LOCAL bit Date: Wed, 26 May 2021 13:01:42 +0800 Message-Id: <1622005302-23027-5-git-send-email-feng.tang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1622005302-23027-1-git-send-email-feng.tang@intel.com> References: <1622005302-23027-1-git-send-email-feng.tang@intel.com> X-Rspamd-Queue-Id: E478840002F3 Authentication-Results: imf10.hostedemail.com; dkim=none; spf=none (imf10.hostedemail.com: domain of feng.tang@intel.com has no SPF policy when checking 134.134.136.65) smtp.mailfrom=feng.tang@intel.com; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none) X-Rspamd-Server: rspam03 X-Stat-Signature: 4o73es97ac9n1m9yczrkgxn7euwebtw3 X-HE-Tag: 1622005319-317171 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now the only remaining case of a real 'local' policy faked by 'prefer' policy plus MPOL_F_LOCAL bit is: A valid 'prefer' policy with a valid 'preferred' node is 'rebind' to a nodemask which doesn't contains the 'preferred' node, then it will handle allocation with 'local' policy. Add a new 'MPOL_F_LOCAL_TEMP' bit for this case, and kill the MPOL_F_LOCAL bit, which could simplify the code much. 
---
 include/uapi/linux/mempolicy.h |  1 +
 mm/mempolicy.c                 | 77 +++++++++++++++++++++++-------------------
 2 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 4832fd0..942844a 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -63,6 +63,7 @@ enum {
 #define MPOL_F_LOCAL	(1 << 1)	/* preferred local allocation */
 #define MPOL_F_MOF	(1 << 3)	/* this policy wants migrate on fault */
 #define MPOL_F_MORON	(1 << 4)	/* Migrate On protnone Reference On Node */
+#define MPOL_F_LOCAL_TEMP	(1 << 5)	/* a policy temporarily changed from 'prefer' to 'local' */
 
 /*
  * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d97839d..6046196 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -337,6 +337,22 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
 	pol->v.nodes = tmp;
 }
 
+static void mpol_rebind_local(struct mempolicy *pol,
+				const nodemask_t *nodes)
+{
+	if (unlikely(pol->flags & MPOL_F_STATIC_NODES)) {
+		int node = first_node(pol->w.user_nodemask);
+
+		BUG_ON(!(pol->flags & MPOL_F_LOCAL_TEMP));
+
+		if (node_isset(node, *nodes)) {
+			pol->v.preferred_node = node;
+			pol->mode = MPOL_PREFERRED;
+			pol->flags &= ~MPOL_F_LOCAL_TEMP;
+		}
+	}
+}
+
 static void mpol_rebind_preferred(struct mempolicy *pol,
 						const nodemask_t *nodes)
 {
@@ -347,13 +363,19 @@ static void mpol_rebind_preferred(struct mempolicy *pol,
 
 		if (node_isset(node, *nodes)) {
 			pol->v.preferred_node = node;
-			pol->flags &= ~MPOL_F_LOCAL;
-		} else
-			pol->flags |= MPOL_F_LOCAL;
+		} else {
+			/*
+			 * If there is no valid node, change the mode to
+			 * MPOL_LOCAL, which will be restored back when
+			 * next rebind() sees a valid node.
+			 */
+			pol->mode = MPOL_LOCAL;
+			pol->flags |= MPOL_F_LOCAL_TEMP;
+		}
 	} else if (pol->flags & MPOL_F_RELATIVE_NODES) {
 		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
 		pol->v.preferred_node = first_node(tmp);
-	} else if (!(pol->flags & MPOL_F_LOCAL)) {
+	} else {
 		pol->v.preferred_node = node_remap(pol->v.preferred_node,
 						pol->w.cpuset_mems_allowed,
 						*nodes);
@@ -372,7 +394,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask)
 {
 	if (!pol)
 		return;
-	if (!mpol_store_user_nodemask(pol) && !(pol->flags & MPOL_F_LOCAL) &&
+	if (!mpol_store_user_nodemask(pol) &&
 	    nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
 		return;
 
@@ -425,7 +447,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 	},
 	[MPOL_LOCAL] = {
 		.create = mpol_new_local,
-		.rebind = mpol_rebind_default,
+		.rebind = mpol_rebind_local,
 	},
 };
 
@@ -919,10 +941,12 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	case MPOL_INTERLEAVE:
 		*nodes = p->v.nodes;
 		break;
+	case MPOL_LOCAL:
+		/* return empty node mask for local allocation */
+		break;
+
 	case MPOL_PREFERRED:
-		if (!(p->flags & MPOL_F_LOCAL))
-			node_set(p->v.preferred_node, *nodes);
-		/* else return empty node mask for local allocation */
+		node_set(p->v.preferred_node, *nodes);
 		break;
 	default:
 		BUG();
@@ -1899,9 +1923,9 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 /* Return the node id preferred by the given mempolicy, or the given id */
 static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 {
-	if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
+	if (policy->mode == MPOL_PREFERRED) {
 		nd = policy->v.preferred_node;
-	else {
+	} else {
 		/*
 		 * __GFP_THISNODE shouldn't even be used with the bind policy
 		 * because we might easily break the expectation to stay on the
@@ -1938,14 +1962,11 @@ unsigned int mempolicy_slab_node(void)
 		return node;
 
 	policy = current->mempolicy;
-	if (!policy || policy->flags & MPOL_F_LOCAL)
+	if (!policy)
 		return node;
 
 	switch (policy->mode) {
 	case MPOL_PREFERRED:
-		/*
-		 * handled MPOL_F_LOCAL above
-		 */
 		return policy->v.preferred_node;
 
 	case MPOL_INTERLEAVE:
@@ -2079,16 +2100,13 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
 	case MPOL_PREFERRED:
-		if (mempolicy->flags & MPOL_F_LOCAL)
-			nid = numa_node_id();
-		else
-			nid = mempolicy->v.preferred_node;
+		nid = mempolicy->v.preferred_node;
 		init_nodemask_of_node(mask, nid);
 		break;
 
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		*mask =  mempolicy->v.nodes;
+		*mask = mempolicy->v.nodes;
 		break;
 
 	case MPOL_LOCAL:
@@ -2200,7 +2218,7 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		 * If the policy is interleave, or does not allow the current
 		 * node in its nodemask, we allocate the standard way.
 		 */
-		if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL))
+		if (pol->mode == MPOL_PREFERRED)
 			hpage_node = pol->v.preferred_node;
 
 		nmask = policy_nodemask(gfp, pol);
@@ -2336,9 +2354,6 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	case MPOL_INTERLEAVE:
 		return !!nodes_equal(a->v.nodes, b->v.nodes);
 	case MPOL_PREFERRED:
-		/* a's ->flags is the same as b's */
-		if (a->flags & MPOL_F_LOCAL)
-			return true;
 		return a->v.preferred_node == b->v.preferred_node;
 	case MPOL_LOCAL:
 		return true;
@@ -2479,10 +2494,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_PREFERRED:
-		if (pol->flags & MPOL_F_LOCAL)
-			polnid = numa_node_id();
-		else
-			polnid = pol->v.preferred_node;
+		polnid = pol->v.preferred_node;
 		break;
 
 	case MPOL_LOCAL:
@@ -2853,9 +2865,6 @@ void numa_default_policy(void)
  * Parse and format mempolicy from/to strings
  */
 
-/*
- * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
- */
 static const char * const policy_modes[] =
 {
 	[MPOL_DEFAULT] = "default",
@@ -3022,12 +3031,10 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 
 	switch (mode) {
 	case MPOL_DEFAULT:
+	case MPOL_LOCAL:
 		break;
 	case MPOL_PREFERRED:
-		if (flags & MPOL_F_LOCAL)
-			mode = MPOL_LOCAL;
-		else
-			node_set(pol->v.preferred_node, nodes);
+		node_set(pol->v.preferred_node, nodes);
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE: