From patchwork Wed Nov 11 06:37:17 2020
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 11896487
From: Huang Ying
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel, Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen, Michal Hocko, David Rientjes
Subject: [RFC -V4] autonuma: Migrate on fault among multiple bound nodes
Date: Wed, 11 Nov 2020 14:37:17 +0800
Message-Id: <20201111063717.186589-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.28.0
Now, AutoNUMA can only optimize page placement among the NUMA nodes if the default memory policy is used, because an explicitly specified memory policy should take precedence. But this is too strict in some situations. For example, on a system with 4 NUMA nodes, if the memory of an application is bound to nodes 0 and 1, AutoNUMA could still migrate pages between nodes 0 and 1 to reduce cross-node accesses without breaking the explicit memory binding policy.

So this patch adds MPOL_MF_AUTONUMA for mbind() and MPOL_F_AUTONUMA for set_mempolicy(). When one of these flags is specified, AutoNUMA is enabled for the memory area or thread to optimize page placement within the constraints of the specified memory binding policy. With the newly added flags, the NUMA balancing control mechanism becomes:

- The sysctl knob numa_balancing enables/disables NUMA balancing globally.

- Even if sysctl numa_balancing is enabled, NUMA balancing remains disabled by default for memory areas or applications with an explicit memory policy.

- MPOL_MF_AUTONUMA and MPOL_F_AUTONUMA enable NUMA balancing for memory areas or applications even when an explicit memory policy is specified.

Various page placement optimizations based on NUMA balancing can be built on these flags. As a first step, with this patch, if the memory of the application is bound to multiple nodes (MPOL_BIND), and in the hint page fault handler both the faulting page node and the accessing node are in the policy nodemask, migration of the page to the accessing node is attempted to reduce cross-node accesses.

A previous version of the patch tried to reuse MPOL_MF_LAZY for mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it is not a good API/ABI for the purpose of this patch.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
 include/uapi/linux/mempolicy.h | 10 +++++++---
 mm/mempolicy.c                 | 13 +++++++++++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774af61e..99afe7f4b61e 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -28,12 +28,14 @@ enum {
 /* Flags for set_mempolicy */
 #define MPOL_F_STATIC_NODES	(1 << 15)
 #define MPOL_F_RELATIVE_NODES	(1 << 14)
+#define MPOL_F_AUTONUMA	(1 << 13) /* Optimize with AutoNUMA if possible */
 
 /*
  * MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to
  * either set_mempolicy() or mbind().
  */
-#define MPOL_MODE_FLAGS	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)
+#define MPOL_MODE_FLAGS \
+	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_AUTONUMA)
 
 /* Flags for get_mempolicy */
 #define MPOL_F_NODE	(1<<0)	/* return next IL mode instead of node mask */
@@ -46,11 +48,13 @@ enum {
 				   to policy */
 #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
 #define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
-#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
+#define MPOL_MF_AUTONUMA (1<<4)	/* Optimize with AutoNUMA if possible */
+#define MPOL_MF_INTERNAL (1<<5)	/* Internal flags start here */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   |	\
 			 MPOL_MF_MOVE     |	\
-			 MPOL_MF_MOVE_ALL)
+			 MPOL_MF_MOVE_ALL |	\
+			 MPOL_MF_AUTONUMA)
 
 /*
  * Internal flags that share the struct mempolicy flags word with
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3ca4898f3f24..37e4e76ded62 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -875,6 +875,9 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
 		goto out;
 	}
 
+	if (new && new->mode == MPOL_BIND && (flags & MPOL_F_AUTONUMA))
+		new->flags |= (MPOL_F_MOF | MPOL_F_MORON);
+
 	ret = mpol_set_nodemask(new, nodes, scratch);
 	if (ret) {
 		mpol_put(new);
@@ -1278,6 +1281,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (flags & ~(unsigned long)MPOL_MF_VALID)
 		return -EINVAL;
+	if ((flags & MPOL_MF_LAZY) && (flags & MPOL_MF_AUTONUMA))
+		return -EINVAL;
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
 		return -EPERM;
@@ -1301,6 +1306,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (flags & MPOL_MF_LAZY)
 		new->flags |= MPOL_F_MOF;
+	if (new && new->mode == MPOL_BIND && (flags & MPOL_MF_AUTONUMA))
+		new->flags |= (MPOL_F_MOF | MPOL_F_MORON);
 
 	/*
 	 * If we are using the default policy then operation
@@ -2490,6 +2497,12 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
+		/* Optimize placement among multiple nodes via NUMA balancing */
+		if (pol->flags & MPOL_F_MORON) {
+			if (node_isset(thisnid, pol->v.nodes))
+				break;
+			goto out;
+		}
 		/*
 		 * allows binding to multiple nodes.