From patchwork Thu Nov 7 03:56:35 2019
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 11231791
Subject: [PATCH 00/16] Memory Hierarchy: Enable target node lookups for
 reserved memory
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: Ira Weiny, David Hildenbrand, Peter Zijlstra, Borislav Petkov,
 Michael Ellerman, Vishal Verma, "H. Peter Anvin", x86@kernel.org,
 Thomas Gleixner, Dave Hansen, Andrew Morton, Ingo Molnar, Michal Hocko,
 "Aneesh Kumar K.V", "Rafael J. Wysocki", Andy Lutomirski,
 Oliver O'Halloran, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Wed, 06 Nov 2019 19:56:35 -0800
Message-ID: <157309899529.1582359.15358067933360719580.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

Yes, this patch series looks like a pile of boring libnvdimm cleanups,
but buried at the end are some small gems that testing with libnvdimm
uncovered. These gems will prove more valuable over time for Memory
Hierarchy management as more platforms, via the ACPI HMAT and EFI
Specific Purpose Memory, publish reserved or "soft-reserved" ranges to
Linux. Linux system administrators will expect to be able to interact
with those ranges with a unique NUMA node number when/if that memory is
onlined via the dax_kmem driver [1].

One configuration that currently fails to properly convey the target
node for the resulting memory hotplug operation is persistent memory
defined by the memmap=nn!ss parameter. For example, today if node1 is a
memory-only node, and all the memory from node1 is specified via
memmap=nn!ss and subsequently onlined, it will end up being onlined as
node0 memory. As it stands, memory_add_physaddr_to_nid() can only
identify online nodes, and since node1 in this example has no online
CPUs or memory, the target node is initialized to node0.

The fix is to preserve rather than discard the numa_meminfo entries that
are relevant for reserved memory ranges, and to up-level the node
distance helper for determining the "local" (closest) node relative to
an initiator node.

The first 12 patches are cleanups to make sure that all nvdimm devices
and their children properly export a numa_node attribute. The switch to
a device-type results in less code and is less error-prone.

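To illustrate the two lookups described above, here is a minimal
user-space sketch. It is not the kernel code from this series: struct
mem_range, range_to_target_node(), map_to_online_node(), and the demo
tables are hypothetical names and data chosen for illustration, and the
distance values are assumed. It models a preserved range table that
still knows about an offline node, plus a "closest online node" fallback
based on node distance.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE   (-1)
#define MAX_NODES 2

/* Hypothetical stand-in for a preserved address-range-to-node entry. */
struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;
};

/* Demo layout (assumed): node0 owns [0, 4G), node1 owns [4G, 8G). */
static const struct mem_range demo_ranges[] = {
	{ 0x000000000ULL, 0x100000000ULL, 0 },
	{ 0x100000000ULL, 0x200000000ULL, 1 },
};

/* node1 is a memory-only node whose memory is reserved, so it is offline. */
static const bool demo_online[MAX_NODES] = { true, false };

/* Assumed distances: 10 to self, 20 to the other node. */
static const int demo_dist[MAX_NODES][MAX_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Return the node that owns addr, whether or not that node is online. */
static int range_to_target_node(const struct mem_range *tbl, size_t n,
				uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		assert(tbl[i].start < tbl[i].end);
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].nid;
	}
	return NO_NODE;
}

/* Return nid if it is online, else the online node at smallest distance. */
static int map_to_online_node(int nid, const bool *online,
			      const int dist[MAX_NODES][MAX_NODES])
{
	int best = NO_NODE, best_dist = INT_MAX;

	if (nid < 0 || nid >= MAX_NODES)
		return NO_NODE;
	if (online[nid])
		return nid;
	for (int n = 0; n < MAX_NODES; n++) {
		if (online[n] && dist[nid][n] < best_dist) {
			best = n;
			best_dist = dist[nid][n];
		}
	}
	return best;
}
```

With this demo data, an address at 4G resolves to node1 through the
range table even though node1 is offline, while mapping node1 to an
online node falls back to node0. Keeping the two answers separate is the
point of the sketch: a lookup that only knows about online nodes can
only ever produce the second one.
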
Patches 13 and 14 are the core changes (gems) to allow NUMA node
information for offline memory to be tracked.

Patches 15 and 16 use this new capability to fix the conveyance of NUMA
node information for memmap=nn!ss assignments. See patch 16 for more
details.

[1]: https://pmem.io/ndctl/daxctl-reconfigure-device.html

---

Dan Williams (16):
      libnvdimm: Move attribute groups to device type
      libnvdimm: Move region attribute group definition
      libnvdimm: Move nd_device_attribute_group to device_type
      libnvdimm: Move nd_numa_attribute_group to device_type
      libnvdimm: Move nd_region_attribute_group to device_type
      libnvdimm: Move nd_mapping_attribute_group to device_type
      libnvdimm: Move nvdimm_attribute_group to device_type
      libnvdimm: Move nvdimm_bus_attribute_group to device_type
      dax: Create a dax device_type
      dax: Simplify root read-only definition for the 'resource' attribute
      libnvdimm: Simplify root read-only definition for the 'resource' attribute
      dax: Add numa_node to the default device-dax attributes
      acpi/mm: Up-level "map to online node" functionality
      x86/numa: Provide a range-to-target_node lookup facility
      libnvdimm/e820: Drop the wrapper around memory_add_physaddr_to_nid
      libnvdimm/e820: Retrieve and populate correct 'target_node' info


 arch/powerpc/platforms/pseries/papr_scm.c |   25 ---
 arch/x86/mm/numa.c                        |   72 ++++++++-
 drivers/acpi/nfit/core.c                  |    7 -
 drivers/acpi/numa.c                       |   41 -----
 drivers/dax/bus.c                         |   22 ++-
 drivers/nvdimm/btt_devs.c                 |   24 +--
 drivers/nvdimm/bus.c                      |   15 +-
 drivers/nvdimm/core.c                     |    8 +
 drivers/nvdimm/dax_devs.c                 |   27 +--
 drivers/nvdimm/dimm_devs.c                |   30 ++--
 drivers/nvdimm/e820.c                     |   30 ----
 drivers/nvdimm/namespace_devs.c           |   77 +++++-----
 drivers/nvdimm/nd.h                       |    5 -
 drivers/nvdimm/of_pmem.c                  |   13 --
 drivers/nvdimm/pfn_devs.c                 |   38 ++---
 drivers/nvdimm/region_devs.c              |  235 +++++++++++++++--------------
 include/linux/acpi.h                      |   23 +++
 include/linux/libnvdimm.h                 |    7 -
 include/linux/memory_hotplug.h            |    6 +
 include/linux/numa.h                      |    2
 mm/mempolicy.c                            |   30 ++++
 21 files changed, 382 insertions(+), 355 deletions(-)