From: Srikar Dronamraju
To: Michael Ellerman
Cc: linuxppc-dev, Srikar Dronamraju, linux-mm@kvack.org, Michal Hocko, Mel Gorman, Vlastimil Babka, Christopher Lameter, Andrew Morton, Linus Torvalds, Gautham R Shenoy, Satheesh Rajendran, David Hildenbrand, Aneesh Kumar K V
Subject: [PATCH v6 0/3] Offline memoryless cpuless node 0
Date: Tue, 18 Aug 2020 13:41:01 +0530
Message-Id: <20200818081104.57888-1-srikar@linux.vnet.ibm.com>

Changelog v5:->v6:
 - The fix is now powerpc-specific.
   (David Hildenbrand, Michal Hocko, Christopher Lameter)
 - Rebased to v5.8
Link v5: https://lore.kernel.org/linuxppc-dev/20200624092846.9194-1-srikar@linux.vnet.ibm.com/t/#u

Changelog v4:->v5:
 - Rebased to v5.8
Link v4: http://lore.kernel.org/lkml/20200512132937.19295-1-srikar@linux.vnet.ibm.com/t/#u

Changelog v3:->v4:
 - Resolved comments from Christopher.
Link v3: http://lore.kernel.org/lkml/20200501031128.19584-1-srikar@linux.vnet.ibm.com/t/#u

Changelog v2:->v3:
 - Resolved comments from Gautham.
Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-srikar@linux.vnet.ibm.com/t/#u

Changelog v1:->v2:
 - Rebased to v5.7-rc3
 - Updated the changelog.
Link v1: https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u

A Linux kernel configured with CONFIG_NUMA on a system with multiple possible nodes marks node 0 as online at boot. However, in practice, there are systems on which node 0 is both memoryless and cpuless.

This can cause:
1. numa_balancing to be enabled on systems with only one online node.
2. A dummy (cpuless and memoryless) node to exist, which can confuse users and scripts looking at the output of lscpu / numactl.

This patchset corrects this anomaly.

This should only affect systems that have CONFIG_MEMORYLESS_NODES. Currently only 2 architectures, ia64 and powerpc, have this config.

Note: Patch 3 in this series depends on patches 1 and 2. Without patches 1 and 2, patch 3 might crash powerpc.

v5.8
available: 2 nodes (0,2)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus: 0 1 2 3 4 5 6 7
node 2 size: 32625 MB
node 2 free: 31490 MB
node distances:
node   0   2
  0:  10  20
  2:  20  10

proc and sys files
------------------
/sys/devices/system/node/online:            0,2
/proc/sys/kernel/numa_balancing:            1
/sys/devices/system/node/has_cpu:           2
/sys/devices/system/node/has_memory:        2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:          0-31

v5.8 + patches
------------------
available: 1 nodes (2)
node 2 cpus: 0 1 2 3 4 5 6 7
node 2 size: 32625 MB
node 2 free: 31487 MB
node distances:
node   2
  2:  10

proc and sys files
------------------
/sys/devices/system/node/online:            2
/proc/sys/kernel/numa_balancing:            0
/sys/devices/system/node/has_cpu:           2
/sys/devices/system/node/has_memory:        2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:          0-31

1. User space applications like numactl and lscpu, which parse sysfs, tend to believe there is an extra online node. This tends to confuse users and applications.
   Other user space applications start believing that the system was not able to use all of its resources (i.e. missing resources) or that the system was not set up correctly.

2. The existence of the dummy node also leads to inconsistent information: the number of online nodes does not match the information in the device-tree and resource-dump.

3. When the dummy node is present, single-node non-NUMA systems end up showing as NUMA systems and numa_balancing gets enabled. This means we take the hit from unnecessary NUMA hinting faults.

On a machine with just one node whose node number is not 0, the current setup ends up showing 2 online nodes. And when there is more than one online node, numa_balancing gets enabled.

Without patch
$ grep numa /proc/vmstat
numa_hit 3864714
numa_miss 0
numa_foreign 0
numa_interleave 2872
numa_local 3864714
numa_other 0
numa_pte_updates 13739278         <----------
numa_huge_pte_updates 0           <----------
numa_hint_faults 13717222         <----------
numa_hint_faults_local 13717222   <----------
numa_pages_migrated 0

With patch
$ grep numa /proc/vmstat
numa_hit 6633324
numa_miss 0
numa_foreign 0
numa_interleave 2864
numa_local 6633324
numa_other 0
numa_pte_updates 0                <----------
numa_huge_pte_updates 0           <----------
numa_hint_faults 0                <----------
numa_hint_faults_local 0          <----------
numa_pages_migrated 0

Here are 2 sample NUMA programs.

numa01.sh is a set of 2 processes, each running as many threads as there are CPUs; each thread does 50 loops of operations on 3GB of process-shared memory.

numa02.sh is a single process running as many threads as there are CPUs; each thread does 800 loops of operations on 32MB of thread-local memory.
Without patch
-------------
Testcase       Time:         Min         Max         Avg      StdDev
./numa01.sh    Real:      164.67      164.89      164.76        0.07
./numa01.sh    Sys:         2.88        3.38        3.05        0.17
./numa01.sh    User:     1297.85     1301.82     1300.86        1.51
./numa02.sh    Real:       27.44       27.46       27.45        0.01
./numa02.sh    Sys:         0.15        0.25        0.21        0.03
./numa02.sh    User:      216.65      216.93      216.80        0.09

With patch
-----------
Testcase       Time:         Min         Max         Avg      StdDev     %Change
./numa01.sh    Real:      164.20      164.38      164.28        0.08   0.292184%
./numa01.sh    Sys:         0.72        0.90        0.82        0.06    271.951%
./numa01.sh    User:     1300.39     1301.97     1300.94        0.56 -0.0061494%
./numa02.sh    Real:       27.41       27.51       27.45        0.03          0%
./numa02.sh    Sys:         0.09        0.16        0.13        0.03    61.5385%
./numa02.sh    User:      216.38      216.91      216.64        0.21  0.0738552%

numa01.sh
param                     no_patch  with_patch     %Change
-----                   ----------  ----------     -------
numa_hint_faults           2946055           0       -100%
numa_hint_faults_local     2946055           0       -100%
numa_hit                    700617      681234   -2.76656%
numa_local                  700617      681234   -2.76656%
numa_pte_updates           2947175           0       -100%
pgfault                    4125926     1120053   -72.8533%
pgmajfault                     269         181   -32.7138%

numa02.sh
param                     no_patch  with_patch     %Change
-----                   ----------  ----------     -------
numa_hint_faults            137623           0       -100%
numa_hint_faults_local      137623           0       -100%
numa_hit                     51332       54645    6.45406%
numa_local                   51332       54645    6.45406%
numa_pte_updates            138903           0       -100%
pgfault                     247058      116743   -52.7467%
pgmajfault                     154         157    1.94805%

Observations:
Real time and user time do not change much. However, system time drops noticeably (numa01.sh Sys averages 3.05 without the patch versus 0.82 with it). The reason is the NUMA hinting faults: with the patch, we no longer see any NUMA hinting faults.
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-mm@kvack.org
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Christopher Lameter
Cc: Michael Ellerman
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Gautham R Shenoy
Cc: Satheesh Rajendran
Cc: David Hildenbrand
Cc: Aneesh Kumar K V

Srikar Dronamraju (3):
  powerpc/numa: Set numa_node for all possible cpus
  powerpc/numa: Prefer node id queried from vphn
  powerpc/numa: Offline memoryless cpuless node 0

 arch/powerpc/mm/numa.c | 45 ++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)