From patchwork Tue Sep 8 06:09:12 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 11762569
From: Huang Ying
To: Dave Hansen, Peter Zijlstra, Andy Lutomirski, Ingo Molnar
Cc: x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Huang Ying, Andrew Morton, Thomas Gleixner, Borislav Petkov, "H.
Peter Anvin", Dan Williams, David Rientjes, Dave Jiang
Subject: [PATCH] x86, fakenuma: Avoid too large emulated node
Date: Tue, 8 Sep 2020 14:09:12 +0800
Message-Id: <20200908060912.12200-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.28.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On a testing system with 2 physical NUMA nodes, 8GB of memory, a small
memory hole from 640KB to 1MB, and a large memory hole from 3GB to 4GB,
if "numa=fake=1G" is used on the kernel command line, the resulting fake
NUMA nodes are as follows,

  NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
  NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0x00000000-0x13fffffff]
  Faking node 0 at [mem 0x0000000000000000-0x0000000041ffffff] (1056MB)
  Faking node 1 at [mem 0x0000000140000000-0x000000017fffffff] (1024MB)
  Faking node 2 at [mem 0x0000000042000000-0x0000000081ffffff] (1024MB)
  Faking node 3 at [mem 0x0000000180000000-0x00000001bfffffff] (1024MB)
  Faking node 4 at [mem 0x0000000082000000-0x000000013fffffff] (3040MB)
  Faking node 5 at [mem 0x00000001c0000000-0x00000001ffffffff] (1024MB)
  Faking node 6 at [mem 0x0000000200000000-0x000000023fffffff] (1024MB)

Here 7 fake NUMA nodes are emulated. The range of fake node 4 includes
the 1024MB memory hole from 3GB to 4GB, so its usable size is
3040 - 1024 = 2016MB. This is nearly 2 times the size of the other fake
nodes (about 1024MB), which isn't a reasonable split. It is better to
keep the fake node size from becoming too large or too small. So in this
patch, the splitting algorithm is changed to keep the fake node size
between 1/2 and 3/2 of the specified node size.

After applying this patch, the resulting fake NUMA nodes become,

  Faking node 0 at [mem 0x0000000000000000-0x0000000041ffffff] (1056MB)
  Faking node 1 at [mem 0x0000000140000000-0x000000017fffffff] (1024MB)
  Faking node 2 at [mem 0x0000000042000000-0x0000000081ffffff] (1024MB)
  Faking node 3 at [mem 0x0000000180000000-0x00000001bfffffff] (1024MB)
  Faking node 4 at [mem 0x0000000082000000-0x0000000103ffffff] (2080MB)
  Faking node 5 at [mem 0x00000001c0000000-0x00000001ffffffff] (1024MB)
  Faking node 6 at [mem 0x0000000104000000-0x000000013fffffff] (960MB)
  Faking node 7 at [mem 0x0000000200000000-0x000000023fffffff] (1024MB)

Fake node 4 now has 2080 - 1024 = 1056MB of usable memory. The newly
added node 6 is a little smaller than the specified node size (960MB
vs. 1024MB), but the overall result looks more reasonable.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Dan Williams
Cc: David Rientjes
Cc: Dave Jiang
---
 arch/x86/mm/numa_emulation.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 683cd12f4793..231469e1de6a 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -300,9 +300,10 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
 			/*
 			 * If there won't be enough non-reserved memory for the
 			 * next node, this one must extend to the end of the
-			 * physical node.
+			 * physical node. The size of the emulated node should
+			 * be between size/2 and size*3/2.
 			 */
-			if ((limit - end - mem_hole_size(end, limit) < size)
+			if ((limit - end - mem_hole_size(end, limit) < size / 2)
 			    && !uniform)
 				end = limit;
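
For illustration only, and not part of the patch: a minimal userspace
sketch of the arithmetic behind the numbers above. It assumes the two
memory holes described in the commit message and hard-codes them in a
table. The names hole_bytes(), usable_bytes() and must_extend() are
hypothetical stand-ins (hole_bytes() for what mem_hole_size() reports,
must_extend() for the condition changed in the hunk); none of them
exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* One memory hole covering [start, end), in bytes. */
struct hole {
	uint64_t start;
	uint64_t end;
};

/* Assumption: the holes of the test system, 640KB-1MB and 3GB-4GB. */
static const struct hole holes[] = {
	{ 640ULL * 1024ULL, 1ULL * MB },
	{ 3072ULL * MB, 4096ULL * MB },
};

/* Bytes of [start, end) that fall inside a hole (hypothetical helper). */
uint64_t hole_bytes(uint64_t start, uint64_t end)
{
	uint64_t total = 0;
	unsigned int i;

	assert(start <= end);
	for (i = 0; i < sizeof(holes) / sizeof(holes[0]); i++) {
		uint64_t lo = start > holes[i].start ? start : holes[i].start;
		uint64_t hi = end < holes[i].end ? end : holes[i].end;

		if (lo < hi)
			total += hi - lo;
	}
	return total;
}

/* Usable (non-hole) bytes in [start, end). */
uint64_t usable_bytes(uint64_t start, uint64_t end)
{
	return end - start - hole_bytes(start, end);
}

/*
 * Returns 1 if a fake node ending at 'end' must absorb the rest of the
 * physical node, i.e. the usable memory left before 'limit' is below
 * 'threshold'. Pass size for the old rule and size / 2 for the new one.
 */
int must_extend(uint64_t end, uint64_t limit, uint64_t threshold)
{
	return usable_bytes(end, limit) < threshold;
}
```

With end = 0x104000000 and limit = 0x140000000, 960MB of usable memory
is left in the physical node. The old threshold (size = 1024MB) forces
fake node 4 to absorb it, while the new threshold (size / 2 = 512MB)
leaves it for a separate fake node.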