From patchwork Mon Jun 24 18:19:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013941 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5C1B8924 for ; Mon, 24 Jun 2019 18:33:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E48C28BAC for ; Mon, 24 Jun 2019 18:33:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4283C28BC4; Mon, 24 Jun 2019 18:33:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D402928BAC for ; Mon, 24 Jun 2019 18:33:51 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id A5E462129DB81; Mon, 24 Jun 2019 11:33:51 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.31; helo=mga06.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 491EC212909E0 for ; Mon, 24 Jun 2019 11:33:50 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from 
fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:33:49 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="359634788" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:33:49 -0700 Subject: [PATCH v4 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:19:32 -0700 Message-ID: <156140037171.2951909.7432584124511649643.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, peterz@infradead.org, Dave Hansen , "Rafael J. Wysocki" , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-nvdimm@lists.01.org, tglx@linutronix.de, Len Brown Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP Currently hmat.c lives under an "hmat" directory which does not enhance the description of the file. The initial motivation for giving hmat.c its own directory was to delineate it as mm functionality in contrast to ACPI device driver functionality. As ACPI continues to play an increasing role in conveying memory location and performance topology information to the OS take the opportunity to co-locate these NUMA relevant tables in a combined directory. numa.c is renamed to srat.c and moved to drivers/acpi/numa/ along with hmat.c. Cc: Len Brown Cc: Keith Busch Cc: "Rafael J. 
Wysocki" Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Rafael J. Wysocki --- drivers/acpi/Kconfig | 9 +-------- drivers/acpi/Makefile | 3 +-- drivers/acpi/hmat/Makefile | 2 -- drivers/acpi/numa/Kconfig | 7 ++++++- drivers/acpi/numa/Makefile | 3 +++ drivers/acpi/numa/hmat.c | 0 drivers/acpi/numa/srat.c | 0 7 files changed, 11 insertions(+), 13 deletions(-) delete mode 100644 drivers/acpi/hmat/Makefile rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (72%) create mode 100644 drivers/acpi/numa/Makefile rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (100%) rename drivers/acpi/{numa.c => numa/srat.c} (100%) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 283ee94224c6..82c4a31c8701 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -321,12 +321,6 @@ config ACPI_THERMAL To compile this driver as a module, choose M here: the module will be called thermal. -config ACPI_NUMA - bool "NUMA support" - depends on NUMA - depends on (X86 || IA64 || ARM64) - default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64 - config ACPI_CUSTOM_DSDT_FILE string "Custom DSDT Table file to include" default "" @@ -475,8 +469,7 @@ config ACPI_REDUCED_HARDWARE_ONLY If you are unsure what to do, do not enable this option. 
source "drivers/acpi/nfit/Kconfig" -source "drivers/acpi/hmat/Kconfig" - +source "drivers/acpi/numa/Kconfig" source "drivers/acpi/apei/Kconfig" source "drivers/acpi/dptf/Kconfig" diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 5d361e4e3405..f08a661274e8 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -55,7 +55,6 @@ acpi-$(CONFIG_X86) += acpi_cmos_rtc.o acpi-$(CONFIG_X86) += x86/apple.o acpi-$(CONFIG_X86) += x86/utils.o acpi-$(CONFIG_DEBUG_FS) += debugfs.o -acpi-$(CONFIG_ACPI_NUMA) += numa.o acpi-$(CONFIG_ACPI_PROCFS_POWER) += cm_sbs.o acpi-y += acpi_lpat.o acpi-$(CONFIG_ACPI_LPIT) += acpi_lpit.o @@ -80,7 +79,7 @@ obj-$(CONFIG_ACPI_PROCESSOR) += processor.o obj-$(CONFIG_ACPI) += container.o obj-$(CONFIG_ACPI_THERMAL) += thermal.o obj-$(CONFIG_ACPI_NFIT) += nfit/ -obj-$(CONFIG_ACPI_HMAT) += hmat/ +obj-$(CONFIG_ACPI_NUMA) += numa/ obj-$(CONFIG_ACPI) += acpi_memhotplug.o obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o obj-$(CONFIG_ACPI_BATTERY) += battery.o diff --git a/drivers/acpi/hmat/Makefile b/drivers/acpi/hmat/Makefile deleted file mode 100644 index 1c20ef36a385..000000000000 --- a/drivers/acpi/hmat/Makefile +++ /dev/null @@ -1,2 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_ACPI_HMAT) := hmat.o diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/numa/Kconfig similarity index 72% rename from drivers/acpi/hmat/Kconfig rename to drivers/acpi/numa/Kconfig index 95a29964dbea..d14582387ed0 100644 --- a/drivers/acpi/hmat/Kconfig +++ b/drivers/acpi/numa/Kconfig @@ -1,4 +1,9 @@ -# SPDX-License-Identifier: GPL-2.0 +config ACPI_NUMA + bool "NUMA support" + depends on NUMA + depends on (X86 || IA64 || ARM64) + default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64 + config ACPI_HMAT bool "ACPI Heterogeneous Memory Attribute Table Support" depends on ACPI_NUMA diff --git a/drivers/acpi/numa/Makefile b/drivers/acpi/numa/Makefile new file mode 100644 index 000000000000..517a6c689a94 --- /dev/null +++ b/drivers/acpi/numa/Makefile 
@@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_ACPI_NUMA) += srat.o +obj-$(CONFIG_ACPI_HMAT) += hmat.o diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/numa/hmat.c similarity index 100% rename from drivers/acpi/hmat/hmat.c rename to drivers/acpi/numa/hmat.c diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa/srat.c similarity index 100% rename from drivers/acpi/numa.c rename to drivers/acpi/numa/srat.c From patchwork Mon Jun 24 18:19:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013945 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A912914B6 for ; Mon, 24 Jun 2019 18:33:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B72C28468 for ; Mon, 24 Jun 2019 18:33:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9082528BC0; Mon, 24 Jun 2019 18:33:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 48AD32888C for ; Mon, 24 Jun 2019 18:33:57 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 271082129DB9A; Mon, 24 Jun 2019 11:33:57 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.43; helo=mga05.intel.com; 
envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 06CBF212909E0 for ; Mon, 24 Jun 2019 11:33:55 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:33:55 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="184212692" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:33:54 -0700 Subject: [PATCH v4 02/10] acpi/numa/hmat: Skip publishing target info for nodes with no online memory From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:19:37 -0700 Message-ID: <156140037770.2951909.3387200938880485927.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, peterz@infradead.org, Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, tglx@linutronix.de, linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP There are multiple scenarios where the HMAT may contain information about proximity domains that are not currently online. 
Rather than fail to report any HMAT data, just elide those offline domains. If and when those domains are later onlined they can be added to the HMEM reporting at that point. This was found while testing EFI_MEMORY_SP support which reserves "specific purpose" memory from the general allocation pool. If that reservation results in an empty numa-node then the node is not marked online, leading to a spurious: "acpi/hmat: Ignoring HMAT: Invalid table" ...result for HMAT parsing. Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Rafael J. Wysocki --- drivers/acpi/numa/hmat.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 96b7d39a97c6..2c220cb7b620 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -96,9 +96,6 @@ static __init void alloc_memory_target(unsigned int mem_pxm) { struct memory_target *target; - if (pxm_to_node(mem_pxm) == NUMA_NO_NODE) - return; - target = find_mem_target(mem_pxm); if (target) return; @@ -588,6 +585,17 @@ static __init void hmat_register_targets(void) struct memory_target *target; list_for_each_entry(target, &targets, node) { + int nid = pxm_to_node(target->memory_pxm); + + /* + * Skip offline nodes. This can happen when memory + * marked EFI_MEMORY_SP, "specific purpose", is applied + * to all the memory in a proximity domain leading to + * the node being marked offline / unplugged, or if + * a memory-only "hotplug" node is offline. 
+ */ + if (nid == NUMA_NO_NODE || !node_online(nid)) + continue; hmat_register_target_initiators(target); hmat_register_target_perf(target); } From patchwork Mon Jun 24 18:19:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013949 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E77B214BB for ; Mon, 24 Jun 2019 18:34:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D860D28BBE for ; Mon, 24 Jun 2019 18:34:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CADA628BB6; Mon, 24 Jun 2019 18:34:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7918628BC2 for ; Mon, 24 Jun 2019 18:34:04 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 6B5A82129DB9D; Mon, 24 Jun 2019 11:34:04 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.20; helo=mga02.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 
79F6B212909E0 for ; Mon, 24 Jun 2019 11:34:02 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:02 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="169525088" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:00 -0700 Subject: [PATCH v4 03/10] efi: Enumerate EFI_MEMORY_SP From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:19:43 -0700 Message-ID: <156140038345.2951909.2744090763307487180.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , peterz@infradead.org, Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, tglx@linutronix.de, linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The intent of this bit is to allow the OS to identify precious or scarce memory resources and optionally manage them separately from EfiConventionalMemory. As defined, older OSes that do not know about this attribute are permitted to ignore it, and the memory will be handled according to the OS default policy for the given memory type. 
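As a rough illustration of the "attribute bit" idea described above, the following is a minimal userspace sketch, not kernel code. The bit value is the one this patch adds to include/linux/efi.h; the struct and the helper name are hypothetical stand-ins introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bit value as added to include/linux/efi.h by this patch. */
#define EFI_MEMORY_SP 0x0000000000040000ULL

/*
 * Hypothetical stand-in for the kernel's efi_memory_desc_t; only the
 * attribute field is modelled here.
 */
struct sketch_efi_md {
	uint64_t attribute;
};

/*
 * Hypothetical helper: returns true when firmware tagged the range as
 * "specific purpose". An OS that never tests this bit simply treats the
 * range according to its base memory type.
 */
static bool sketch_md_is_specific_purpose(const struct sketch_efi_md *md)
{
	return (md->attribute & EFI_MEMORY_SP) != 0;
}
```

Because the hint is carried in an attribute bit rather than in the memory type, a consumer that ignores the bit still sees an ordinary descriptor of the base type.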
In other words, this "specific purpose" hint is deliberately weaker than EfiReservedMemoryType in that the system continues to operate if the OS takes no action on the attribute. The risk of taking no action is potentially unwanted / unmovable kernel allocations from the designated resource that prevent the full realization of the "specific purpose". For example, consider a system with a high-bandwidth memory pool. Older kernels are permitted to boot and consume that memory as conventional "System-RAM"; newer kernels may arrange for that memory to be set aside by the system administrator for a dedicated high-bandwidth memory aware application to consume. Specifically, this mechanism allows for the elimination of scenarios where platform firmware tries to game OS policy by lying about ACPI SLIT values, i.e. claiming that a precious memory resource has a high distance to trigger the OS to avoid it by default. Implement simple detection of the bit for EFI memory table dumps and save the kernel policy for a follow-on change. Reviewed-by: Ard Biesheuvel Reviewed-by: Dave Hansen Signed-off-by: Dan Williams --- drivers/firmware/efi/efi.c | 5 +++-- include/linux/efi.h | 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 55b77c576c42..81db09485881 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -848,15 +848,16 @@ char * __init efi_md_typeattr_format(char *buf, size_t size, if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT | EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO | EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP | - EFI_MEMORY_NV | + EFI_MEMORY_NV | EFI_MEMORY_SP | EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE)) snprintf(pos, size, "|attr=0x%016llx]", (unsigned long long)attr); else snprintf(pos, size, - "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", + "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", attr & EFI_MEMORY_RUNTIME ? 
"RUN" : "", attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "", + attr & EFI_MEMORY_SP ? "SP" : "", attr & EFI_MEMORY_NV ? "NV" : "", attr & EFI_MEMORY_XP ? "XP" : "", attr & EFI_MEMORY_RP ? "RP" : "", diff --git a/include/linux/efi.h b/include/linux/efi.h index 6ebc2098cfe1..91368f5ce114 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -112,6 +112,7 @@ typedef struct { #define EFI_MEMORY_MORE_RELIABLE \ ((u64)0x0000000000010000ULL) /* higher reliability */ #define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */ +#define EFI_MEMORY_SP ((u64)0x0000000000040000ULL) /* special purpose */ #define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */ #define EFI_MEMORY_DESCRIPTOR_VERSION 1 From patchwork Mon Jun 24 18:19:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CF8F14B6 for ; Mon, 24 Jun 2019 18:34:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6E02528BC1 for ; Mon, 24 Jun 2019 18:34:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 61FF728BC2; Mon, 24 Jun 2019 18:34:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E0B2628BC1 for ; Mon, 24 Jun 2019 18:34:08 +0000 (UTC) Received: from [127.0.0.1] (localhost 
[IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C78462129DBBB; Mon, 24 Jun 2019 11:34:08 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.115; helo=mga14.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id A64EF21296704 for ; Mon, 24 Jun 2019 11:34:07 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:07 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="188004783" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:06 -0700 Subject: [PATCH v4 04/10] x86, efi: Push EFI_MEMMAP check into leaf routines From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:19:49 -0700 Message-ID: <156140038957.2951909.3922978209175458460.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, Peter Zijlstra , Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Ingo Molnar , Andy Lutomirski , "H. 
Peter Anvin" , tglx@linutronix.de, linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP In preparation for adding another EFI_MEMMAP dependent call that needs to occur before e820__memblock_setup() fixup the existing efi calls to check for EFI_MEMMAP internally. This is cleaner than checking EFI_MEMMAP multiple times in setup_arch(). Cc: Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Andy Lutomirski Cc: Thomas Gleixner Cc: Peter Zijlstra Reviewed-by: Dave Hansen Signed-off-by: Dan Williams --- arch/x86/include/asm/efi.h | 9 ++++++++- arch/x86/kernel/setup.c | 19 +++++++++---------- arch/x86/platform/efi/efi.c | 3 +++ arch/x86/platform/efi/quirks.c | 3 +++ drivers/firmware/efi/esrt.c | 3 +++ drivers/firmware/efi/fake_mem.c | 2 +- include/linux/efi.h | 2 -- 7 files changed, 27 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 606a4b6a9812..7d52378e376a 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -140,7 +140,6 @@ extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); extern void efi_recover_from_page_fault(unsigned long phys_addr); extern void efi_free_boot_services(void); -extern void efi_reserve_boot_services(void); struct efi_setup_data { u64 fw_vendor; @@ -243,12 +242,20 @@ static inline bool efi_is_64bit(void) extern bool efi_reboot_required(void); +extern void efi_find_mirror(void); +extern void efi_reserve_boot_services(void); #else static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {} static inline bool efi_reboot_required(void) { return false; } +static inline void efi_find_mirror(void) +{ +} +static inline void efi_reserve_boot_services(void) +{ +} #endif /* CONFIG_EFI */ #endif /* _ASM_X86_EFI_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 08a5f4a131f5..b68fd57a8d26 100644 --- a/arch/x86/kernel/setup.c +++ 
b/arch/x86/kernel/setup.c @@ -1103,21 +1103,20 @@ void __init setup_arch(char **cmdline_p) cleanup_highmap(); memblock_set_current_limit(ISA_END_ADDRESS); + e820__memblock_setup(); reserve_bios_regions(); - if (efi_enabled(EFI_MEMMAP)) { - efi_fake_memmap(); - efi_find_mirror(); - efi_esrt_init(); + efi_fake_memmap(); + efi_find_mirror(); + efi_esrt_init(); - /* - * The EFI specification says that boot service code won't be - * called after ExitBootServices(). This is, in fact, a lie. - */ - efi_reserve_boot_services(); - } + /* + * The EFI specification says that boot service code won't be + * called after ExitBootServices(). This is, in fact, a lie. + */ + efi_reserve_boot_services(); /* preallocate 4k for mptable mpc */ e820__memblock_alloc_reserved_mpc_new(); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index e1cb01a22fa8..4e8458b1ca30 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -103,6 +103,9 @@ void __init efi_find_mirror(void) efi_memory_desc_t *md; u64 mirror_size = 0, total_size = 0; + if (!efi_enabled(EFI_MEMMAP)) + return; + for_each_efi_memory_desc(md) { unsigned long long start = md->phys_addr; unsigned long long size = md->num_pages << EFI_PAGE_SHIFT; diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index feb77777c8b8..50f7303da7be 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -320,6 +320,9 @@ void __init efi_reserve_boot_services(void) { efi_memory_desc_t *md; + if (!efi_enabled(EFI_MEMMAP)) + return; + for_each_efi_memory_desc(md) { u64 start = md->phys_addr; u64 size = md->num_pages << EFI_PAGE_SHIFT; diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index d6dd5f503fa2..2762e0662bf4 100644 --- a/drivers/firmware/efi/esrt.c +++ b/drivers/firmware/efi/esrt.c @@ -246,6 +246,9 @@ void __init efi_esrt_init(void) int rc; phys_addr_t end; + if (!efi_enabled(EFI_MEMMAP)) + return; + pr_debug("esrt-init: 
loading.\n"); if (!esrt_table_exists()) return; diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 9501edc0fcfb..526b45331d96 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -44,7 +44,7 @@ void __init efi_fake_memmap(void) void *new_memmap; int i; - if (!nr_fake_mem) + if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem) return; /* count up the number of EFI memory descriptor */ diff --git a/include/linux/efi.h b/include/linux/efi.h index 91368f5ce114..ea6ce3ef71f5 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1042,9 +1042,7 @@ extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if pos extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, bool nonblocking); -extern void efi_find_mirror(void); #else - static inline efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, bool nonblocking) From patchwork Mon Jun 24 18:19:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013955 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 311EB924 for ; Mon, 24 Jun 2019 18:34:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 207772884E for ; Mon, 24 Jun 2019 18:34:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 143BC28BBB; Mon, 24 Jun 2019 18:34:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher 
DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 243CF28BAC for ; Mon, 24 Jun 2019 18:34:15 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 1506621296B18; Mon, 24 Jun 2019 11:34:15 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.93; helo=mga11.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 92D9D212909E0 for ; Mon, 24 Jun 2019 11:34:13 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:13 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="182689415" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:12 -0700 Subject: [PATCH v4 05/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:19:55 -0700 Message-ID: <156140039574.2951909.3007721710664432872.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kbuild test robot , Ard Biesheuvel , Peter Zijlstra , Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Ingo Molnar , Borislav Petkov , Andy Lutomirski , "H. Peter Anvin" , Darren Hart , tglx@linutronix.de, Andy Shevchenko , linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This patch introduces two new concepts at once given the entanglement between early boot enumeration relative to memory that can optionally be reserved from the kernel page allocator by default. The new concepts are: - E820_TYPE_APPLICATION_RESERVED: Upon detecting the EFI_MEMORY_SP attribute on EFI_CONVENTIONAL memory, update the E820 map with this new type. Only perform this classification if the CONFIG_EFI_APPLICATION_RESERVED=y policy is enabled, otherwise treat it as typical RAM. - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for a device driver to search iomem resources for application specific memory. Teach the iomem code to identify such ranges as "Application Reserved". A follow-on change integrates parsing of the ACPI HMAT to identify the node and sub-range boundaries of EFI_MEMORY_SP designated memory. For now, just identify and reserve memory of this type. 
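The classification rule described above can be sketched roughly as follows. This is a simplified userspace model under stated assumptions, not the code in this patch: the enum names and values are hypothetical placeholders (they are not the kernel's E820 or EFI type numbers), the policy flag stands in for the Kconfig option, and every non-conventional type is lumped into a single "reserved" bucket for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bit value from patch 03/10 of this series. */
#define EFI_MEMORY_SP 0x0000000000040000ULL

/* Hypothetical placeholder types; the values are NOT the kernel's. */
enum sketch_efi_type {
	SKETCH_EFI_CONVENTIONAL_MEMORY,
	SKETCH_EFI_OTHER,
};

enum sketch_e820_type {
	SKETCH_E820_RAM,
	SKETCH_E820_APPLICATION_RESERVED,
	SKETCH_E820_RESERVED,
};

/*
 * Hypothetical model of the translation: conventional memory carrying
 * EFI_MEMORY_SP becomes "application reserved" only when the policy is
 * enabled; otherwise it stays ordinary RAM. Other EFI types are mapped
 * to a single reserved bucket here purely to keep the sketch short.
 */
static enum sketch_e820_type
sketch_classify(enum sketch_efi_type type, uint64_t attr, bool policy_enabled)
{
	if (type != SKETCH_EFI_CONVENTIONAL_MEMORY)
		return SKETCH_E820_RESERVED;

	if (policy_enabled && (attr & EFI_MEMORY_SP))
		return SKETCH_E820_APPLICATION_RESERVED;

	return SKETCH_E820_RAM;
}
```

With the policy disabled, the attribute has no effect in this model and the range is treated as regular RAM, matching the "otherwise treat it as typical RAM" behavior described above.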
For now this translation of EFI_CONVENTIONAL + EFI_MEMORY_SP is x86/E820-only, but other archs could choose to publish IORES_DESC_APPLICATION_RESERVED resources from their platform-firmware memory map handlers. Cc: Cc: Borislav Petkov Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Darren Hart Cc: Andy Shevchenko Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ard Biesheuvel Reported-by: kbuild test robot Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- arch/x86/Kconfig | 23 +++++++++++++++++++++++ arch/x86/boot/compressed/eboot.c | 5 ++++- arch/x86/boot/compressed/kaslr.c | 3 ++- arch/x86/include/asm/e820/types.h | 9 +++++++++ arch/x86/include/asm/efi.h | 17 +++++++++++++++++ arch/x86/kernel/e820.c | 12 ++++++++++-- arch/x86/kernel/setup.c | 1 + arch/x86/platform/efi/efi.c | 37 +++++++++++++++++++++++++++++++++---- include/linux/ioport.h | 1 + 9 files changed, 100 insertions(+), 8 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2bbbd4d1ba31..d8d3b4e87ac1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1955,6 +1955,29 @@ config EFI_MIXED If unsure, say N. +config EFI_APPLICATION_RESERVED + bool "Reserve EFI Specific Purpose Memory" + depends on ACPI_HMAT + default ACPI_HMAT + depends on EFI + ---help--- + On systems that have mixed performance classes of memory EFI + may indicate specific purpose memory with an attribute (See + EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this + attribute may have unique performance characteristics compared + to the system's general purpose "System RAM" pool. On the + expectation that such memory has application specific usage, + and its base EFI memory type is "conventional" answer Y to + arrange for the kernel to reserve it as an "Application + Reserved" resource, and set aside for direct-access + (device-dax) by default. 
The memory range can later be + optionally assigned to the page allocator by system + administrator policy via the device-dax kmem facility. Say N + to have the kernel treat this memory as "System RAM" by + default. + + If unsure, say Y. + config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 544ac4fafd11..a6c96eb6e633 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -560,7 +560,10 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s case EFI_BOOT_SERVICES_CODE: case EFI_BOOT_SERVICES_DATA: case EFI_CONVENTIONAL_MEMORY: - e820_type = E820_TYPE_RAM; + if (is_efi_application_reserved(d)) + e820_type = E820_TYPE_APPLICATION_RESERVED; + else + e820_type = E820_TYPE_RAM; break; case EFI_ACPI_MEMORY_NVS: diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index 2e53c056ba20..e8306f452182 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -757,7 +757,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size) * * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free. */ - if (md->type != EFI_CONVENTIONAL_MEMORY) + if (md->type != EFI_CONVENTIONAL_MEMORY + || is_efi_application_reserved(md)) continue; if (efi_mirror_found && diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h index c3aa4b5e49e2..41193c116a1f 100644 --- a/arch/x86/include/asm/e820/types.h +++ b/arch/x86/include/asm/e820/types.h @@ -28,6 +28,15 @@ enum e820_type { */ E820_TYPE_PRAM = 12, + /* + * Special-purpose / application-specific memory is indicated to + * the system via the EFI_MEMORY_SP attribute. Define an e820 + * translation of this memory type for the purpose of + * reserving this range and marking it with the + * IORES_DESC_APPLICATION_RESERVED designation. 
+ */ + E820_TYPE_APPLICATION_RESERVED = 0xefffffff, + /* * Reserved RAM used by the kernel itself if * CONFIG_INTEL_TXT=y is enabled, memory of this type diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 7d52378e376a..4f80254e0541 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -141,6 +141,19 @@ extern void efi_switch_mm(struct mm_struct *mm); extern void efi_recover_from_page_fault(unsigned long phys_addr); extern void efi_free_boot_services(void); +#ifdef CONFIG_EFI_APPLICATION_RESERVED +static inline bool is_efi_application_reserved(efi_memory_desc_t *md) +{ + return md->type == EFI_CONVENTIONAL_MEMORY + && (md->attribute & EFI_MEMORY_SP); +} +#else +static inline bool is_efi_application_reserved(efi_memory_desc_t *md) +{ + return false; +} +#endif + struct efi_setup_data { u64 fw_vendor; u64 runtime; @@ -244,6 +257,7 @@ extern bool efi_reboot_required(void); extern void efi_find_mirror(void); extern void efi_reserve_boot_services(void); +extern void __init efi_find_application_reserved(void); #else static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {} static inline bool efi_reboot_required(void) @@ -256,6 +270,9 @@ static inline void efi_find_mirror(void) static inline void efi_reserve_boot_services(void) { } +static inline void __init efi_find_application_reserved(void) +{ +} #endif /* CONFIG_EFI */ #endif /* _ASM_X86_EFI_H */ diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 8f32e705a980..c5b91c2d0661 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -189,6 +189,7 @@ static void __init e820_print_type(enum e820_type type) switch (type) { case E820_TYPE_RAM: /* Fall through: */ case E820_TYPE_RESERVED_KERN: pr_cont("usable"); break; + case E820_TYPE_APPLICATION_RESERVED: pr_cont("application reserved"); break; case E820_TYPE_RESERVED: pr_cont("reserved"); break; case E820_TYPE_ACPI: pr_cont("ACPI data"); break; case E820_TYPE_NVS: pr_cont("ACPI NVS"); break; @@ 
-1036,6 +1037,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry) case E820_TYPE_UNUSABLE: return "Unusable memory"; case E820_TYPE_PRAM: return "Persistent Memory (legacy)"; case E820_TYPE_PMEM: return "Persistent Memory"; + case E820_TYPE_APPLICATION_RESERVED: return "Application Reserved"; case E820_TYPE_RESERVED: return "Reserved"; default: return "Unknown E820 type"; } @@ -1051,6 +1053,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry) case E820_TYPE_UNUSABLE: /* Fall-through: */ case E820_TYPE_PRAM: /* Fall-through: */ case E820_TYPE_PMEM: /* Fall-through: */ + case E820_TYPE_APPLICATION_RESERVED: /* Fall-through: */ case E820_TYPE_RESERVED: /* Fall-through: */ default: return IORESOURCE_MEM; } @@ -1063,6 +1066,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry) case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE; case E820_TYPE_PMEM: return IORES_DESC_PERSISTENT_MEMORY; case E820_TYPE_PRAM: return IORES_DESC_PERSISTENT_MEMORY_LEGACY; + case E820_TYPE_APPLICATION_RESERVED: return IORES_DESC_APPLICATION_RESERVED; case E820_TYPE_RESERVED_KERN: /* Fall-through: */ case E820_TYPE_RAM: /* Fall-through: */ case E820_TYPE_UNUSABLE: /* Fall-through: */ @@ -1078,13 +1082,14 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res) return true; /* - * Treat persistent memory like device memory, i.e. reserve it - * for exclusive use of a driver + * Treat persistent memory and other special memory ranges like + * device memory, i.e. 
reserve it for exclusive use of a driver */ switch (type) { case E820_TYPE_RESERVED: case E820_TYPE_PRAM: case E820_TYPE_PMEM: + case E820_TYPE_APPLICATION_RESERVED: return false; case E820_TYPE_RESERVED_KERN: case E820_TYPE_RAM: @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void) if (end != (resource_size_t)end) continue; + if (entry->type == E820_TYPE_APPLICATION_RESERVED) + memblock_reserve(entry->addr, entry->size); + if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN) continue; diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b68fd57a8d26..3b9001b7c951 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1104,6 +1104,7 @@ void __init setup_arch(char **cmdline_p) memblock_set_current_limit(ISA_END_ADDRESS); + efi_find_application_reserved(); e820__memblock_setup(); reserve_bios_regions(); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 4e8458b1ca30..4b4a9eb6d2c9 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -126,10 +126,18 @@ void __init efi_find_mirror(void) * more than the max 128 entries that can fit in the e820 legacy * (zeropage) memory map. 
*/ +enum add_efi_mode { + ADD_EFI_ALL, + ADD_EFI_APPLICATION_RESERVED, +}; -static void __init do_add_efi_memmap(void) +static void __init do_add_efi_memmap(enum add_efi_mode mode) { efi_memory_desc_t *md; + int add = 0; + + if (!efi_enabled(EFI_MEMMAP)) + return; for_each_efi_memory_desc(md) { unsigned long long start = md->phys_addr; @@ -142,7 +150,9 @@ static void __init do_add_efi_memmap(void) case EFI_BOOT_SERVICES_CODE: case EFI_BOOT_SERVICES_DATA: case EFI_CONVENTIONAL_MEMORY: - if (md->attribute & EFI_MEMORY_WB) + if (is_efi_application_reserved(md)) + e820_type = E820_TYPE_APPLICATION_RESERVED; + else if (md->attribute & EFI_MEMORY_WB) e820_type = E820_TYPE_RAM; else e820_type = E820_TYPE_RESERVED; @@ -168,9 +178,22 @@ static void __init do_add_efi_memmap(void) e820_type = E820_TYPE_RESERVED; break; } + + if (e820_type == E820_TYPE_APPLICATION_RESERVED) + /* always add E820_TYPE_APPLICATION_RESERVED */; + else if (mode != ADD_EFI_ALL) + continue; + + add++; e820__range_add(start, size, e820_type); } - e820__update_table(e820_table); + if (add) + e820__update_table(e820_table); +} + +void __init efi_find_application_reserved(void) +{ + do_add_efi_memmap(ADD_EFI_APPLICATION_RESERVED); } int __init efi_memblock_x86_reserve_range(void) @@ -203,7 +226,7 @@ int __init efi_memblock_x86_reserve_range(void) return rv; if (add_efi_memmap) - do_add_efi_memmap(); + do_add_efi_memmap(ADD_EFI_ALL); WARN(efi.memmap.desc_version != 1, "Unexpected EFI_MEMORY_DESCRIPTOR version %ld", @@ -756,6 +779,12 @@ static bool should_map_region(efi_memory_desc_t *md) if (IS_ENABLED(CONFIG_X86_32)) return false; + /* + * Specific purpose memory is reserved by default. + */ + if (is_efi_application_reserved(md)) + return false; + /* * Map all of RAM so that we can access arguments in the 1:1 * mapping when making EFI runtime calls.
diff --git a/include/linux/ioport.h b/include/linux/ioport.h index da0ebaec25f0..2d79841ee9b9 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -133,6 +133,7 @@ enum { IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_DEVICE_PUBLIC_MEMORY = 7, + IORES_DESC_APPLICATION_RESERVED = 8, }; /* helpers to define resources */ From patchwork Mon Jun 24 18:20:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013959 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE0AB14B6 for ; Mon, 24 Jun 2019 18:34:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CEBF828468 for ; Mon, 24 Jun 2019 18:34:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C2E8628BB6; Mon, 24 Jun 2019 18:34:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A725A28468 for ; Mon, 24 Jun 2019 18:34:20 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 91C1D2129DBAE; Mon, 24 Jun 2019 11:34:20 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=dan.j.williams@intel.com; 
receiver=linux-nvdimm@lists.01.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 0255521296704 for ; Mon, 24 Jun 2019 11:34:18 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:18 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="155241140" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:18 -0700 Subject: [PATCH v4 06/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:20:01 -0700 Message-ID: <156140040143.2951909.16322818756106417668.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , peterz@infradead.org, Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin", tglx@linutronix.de, linux-nvdimm@lists.01.org
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
X-Virus-Scanned: ClamAV using ClamSMTP

Given that EFI_MEMORY_SP is a platform BIOS policy decision for marking
memory ranges as "reserved for a specific purpose", there will
inevitably be scenarios where the BIOS omits the attribute in
situations where it is desired. Unlike other attributes, if the OS
wants to reserve this memory from the kernel, the reservation needs to
happen early in init. So early, in fact, that it needs to happen before
e820__memblock_setup(), which is a prerequisite for efi_fake_memmap()
that wants to allocate memory for the updated table.

Introduce an x86 specific efi_fake_memmap_early() that can search for
attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820
table accordingly.

Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Ard Biesheuvel
Reviewed-by: Dave Hansen
Signed-off-by: Dan Williams
---
 arch/x86/include/asm/efi.h          |    8 ++++
 arch/x86/kernel/setup.c             |    1 +
 drivers/firmware/efi/Makefile       |    5 ++-
 drivers/firmware/efi/fake_mem.c     |   24 ++++++------
 drivers/firmware/efi/fake_mem.h     |   10 +++++
 drivers/firmware/efi/x86-fake_mem.c |   69 +++++++++++++++++++++++++++++++++++
 6 files changed, 103 insertions(+), 14 deletions(-)
 create mode 100644 drivers/firmware/efi/fake_mem.h
 create mode 100644 drivers/firmware/efi/x86-fake_mem.c

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 4f80254e0541..d6b18cedf0a8 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -275,4 +275,12 @@ static inline void __init efi_find_application_reserved(void)
 }
 #endif /* CONFIG_EFI */
 
+#ifdef CONFIG_EFI_FAKE_MEMMAP
+extern void __init efi_fake_memmap_early(void);
+#else
+static inline void efi_fake_memmap_early(void)
+{
+}
+#endif
+
 #endif /* _ASM_X86_EFI_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3b9001b7c951..3a7de6d6f106
100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1105,6 +1105,7 @@ void __init setup_arch(char **cmdline_p) memblock_set_current_limit(ISA_END_ADDRESS); efi_find_application_reserved(); + efi_fake_memmap_early(); e820__memblock_setup(); reserve_bios_regions(); diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index d2d0d2030620..b06a9df6094c 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -20,12 +20,15 @@ obj-$(CONFIG_UEFI_CPER) += cper.o obj-$(CONFIG_EFI_RUNTIME_MAP) += runtime-map.o obj-$(CONFIG_EFI_RUNTIME_WRAPPERS) += runtime-wrappers.o obj-$(CONFIG_EFI_STUB) += libstub/ -obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o +obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_map.o obj-$(CONFIG_EFI_BOOTLOADER_CONTROL) += efibc.o obj-$(CONFIG_EFI_TEST) += test/ obj-$(CONFIG_EFI_DEV_PATH_PARSER) += dev-path-parser.o obj-$(CONFIG_APPLE_PROPERTIES) += apple-properties.o +fake_map-y += fake_mem.o +fake_map-$(CONFIG_X86) += x86-fake_mem.o + arm-obj-$(CONFIG_EFI) := arm-init.o arm-runtime.o obj-$(CONFIG_ARM) += $(arm-obj-y) obj-$(CONFIG_ARM64) += $(arm-obj-y) diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 526b45331d96..bb9fc70d0cfa 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -17,12 +17,10 @@ #include #include #include -#include +#include "fake_mem.h" -#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM - -static struct efi_mem_range fake_mems[EFI_MAX_FAKEMEM]; -static int nr_fake_mem; +struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM]; +int nr_fake_mem; static int __init cmp_fake_mem(const void *x1, const void *x2) { @@ -50,7 +48,7 @@ void __init efi_fake_memmap(void) /* count up the number of EFI memory descriptor */ for (i = 0; i < nr_fake_mem; i++) { for_each_efi_memory_desc(md) { - struct range *r = &fake_mems[i].range; + struct range *r = &efi_fake_mems[i].range; new_nr_map += efi_memmap_split_count(md, r); } @@ -70,7 +68,7 @@ 
void __init efi_fake_memmap(void) } for (i = 0; i < nr_fake_mem; i++) - efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]); + efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]); /* swap into new EFI memmap */ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); @@ -104,22 +102,22 @@ static int __init setup_fake_mem(char *p) if (nr_fake_mem >= EFI_MAX_FAKEMEM) break; - fake_mems[nr_fake_mem].range.start = start; - fake_mems[nr_fake_mem].range.end = start + mem_size - 1; - fake_mems[nr_fake_mem].attribute = attribute; + efi_fake_mems[nr_fake_mem].range.start = start; + efi_fake_mems[nr_fake_mem].range.end = start + mem_size - 1; + efi_fake_mems[nr_fake_mem].attribute = attribute; nr_fake_mem++; if (*p == ',') p++; } - sort(fake_mems, nr_fake_mem, sizeof(struct efi_mem_range), + sort(efi_fake_mems, nr_fake_mem, sizeof(struct efi_mem_range), cmp_fake_mem, NULL); for (i = 0; i < nr_fake_mem; i++) pr_info("efi_fake_mem: add attr=0x%016llx to [mem 0x%016llx-0x%016llx]", - fake_mems[i].attribute, fake_mems[i].range.start, - fake_mems[i].range.end); + efi_fake_mems[i].attribute, efi_fake_mems[i].range.start, + efi_fake_mems[i].range.end); return *p == '\0' ? 0 : -EINVAL; } diff --git a/drivers/firmware/efi/fake_mem.h b/drivers/firmware/efi/fake_mem.h new file mode 100644 index 000000000000..0390be13df96 --- /dev/null +++ b/drivers/firmware/efi/fake_mem.h @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef __EFI_FAKE_MEM_H__ +#define __EFI_FAKE_MEM_H__ +#include + +#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM + +extern struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM]; +extern int nr_fake_mem; +#endif /* __EFI_FAKE_MEM_H__ */ diff --git a/drivers/firmware/efi/x86-fake_mem.c b/drivers/firmware/efi/x86-fake_mem.c new file mode 100644 index 000000000000..3e9a80127562 --- /dev/null +++ b/drivers/firmware/efi/x86-fake_mem.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019 Intel Corporation. 
All rights reserved. */ +#include +#include +#include "fake_mem.h" + +void __init efi_fake_memmap_early(void) +{ + int i; + + /* + * efi_fake_mem() can handle all possibilities if EFI_MEMORY_SP + * is ignored. + */ + if (!IS_ENABLED(CONFIG_EFI_APPLICATION_RESERVED)) + return; + + if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem) + return; + + /* + * Given that efi_fake_memmap() needs to perform memblock + * allocations it needs to run after e820__memblock_setup(). + * However, if efi_fake_mem specifies EFI_MEMORY_SP for a given + * address range that potentially needs to mark the memory as + * reserved prior to e820__memblock_setup(). Update e820 + * directly if EFI_MEMORY_SP is specified for an + * EFI_CONVENTIONAL_MEMORY descriptor. + */ + for (i = 0; i < nr_fake_mem; i++) { + struct efi_mem_range *mem = &efi_fake_mems[i]; + efi_memory_desc_t *md; + u64 m_start, m_end; + + if ((mem->attribute & EFI_MEMORY_SP) == 0) + continue; + + m_start = mem->range.start; + m_end = mem->range.end; + for_each_efi_memory_desc(md) { + u64 start, end; + + if (md->type != EFI_CONVENTIONAL_MEMORY) + continue; + + start = md->phys_addr; + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; + + if (m_start <= end && m_end >= start) + /* fake range overlaps descriptor */; + else + continue; + + /* + * Trim the boundary of the e820 update to the + * descriptor in case the fake range overlaps + * !EFI_CONVENTIONAL_MEMORY + */ + start = max(start, m_start); + end = min(end, m_end); + + if (end <= start) + continue; + e820__range_update(start, end - start + 1, E820_TYPE_RAM, + E820_TYPE_APPLICATION_RESERVED); + e820__update_table(e820_table); + } + } +} From patchwork Mon Jun 24 18:20:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013965 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0957D14B6 for ; Mon, 24 Jun 2019 18:34:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F072228BB6 for ; Mon, 24 Jun 2019 18:34:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E4E1228BAC; Mon, 24 Jun 2019 18:34:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7D17A2884E for ; Mon, 24 Jun 2019 18:34:26 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 20D1E2129DBBC; Mon, 24 Jun 2019 11:34:26 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.93; helo=mga11.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DBCB32129EB96 for ; Mon, 24 Jun 2019 11:34:23 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:23 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="152030289" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by 
orsmga007-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:23 -0700 Subject: [PATCH v4 07/10] resource: Uplevel the pmem "region" ida to a global allocator From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:20:06 -0700 Message-ID: <156140040657.2951909.15384446634808002027.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, peterz@infradead.org, dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org, Matthew Wilcox , linux-acpi@vger.kernel.org, tglx@linutronix.de, linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP In preparation for handling platform differentiated memory types beyond persistent memory, uplevel the "region" identifier to a global number space. This enables a device-dax instance to be registered to any memory type with guaranteed unique names. Given this is a general identifier for persistent and performance-differentiated memory, and a standalone header / source file was NAK'd, house it with the rest of the general resource enumeration implementation. 
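To make the "global number space" idea concrete, here is a small user-space sketch of a single shared id pool. It is not the kernel IDA; the `sketch_` names, the fixed 64-entry bitmap, and the lowest-free-id policy are all assumptions made for illustration. The point it tries to show is that callers for different memory types draw from one pool, so two regions can never end up with the same number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pool size; the real allocator is not limited like this. */
#define SKETCH_MAX_REGIONS 64

/* One shared bitmap for every memory type: bit n set means id n is in use. */
static uint64_t sketch_region_ids;

/* Hand out the lowest free id, or -1 when the pool is exhausted. */
static int sketch_memregion_alloc(void)
{
	for (int id = 0; id < SKETCH_MAX_REGIONS; id++) {
		uint64_t bit = UINT64_C(1) << id;

		if (!(sketch_region_ids & bit)) {
			sketch_region_ids |= bit;
			return id;
		}
	}
	return -1;
}

/* Return an id to the shared pool so a later caller may reuse it. */
static void sketch_memregion_free(int id)
{
	assert(id >= 0 && id < SKETCH_MAX_REGIONS);
	sketch_region_ids &= ~(UINT64_C(1) << id);
}
```

A persistent-memory region and a performance-differentiated region that both call `sketch_memregion_alloc()` receive distinct ids, which is the property a name such as "region<id>" relies on.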
Cc: Keith Busch Suggested-by: Matthew Wilcox Signed-off-by: Dan Williams --- drivers/nvdimm/Kconfig | 1 + drivers/nvdimm/core.c | 1 - drivers/nvdimm/nd-core.h | 1 - drivers/nvdimm/region_devs.c | 12 +++--------- include/linux/ioport.h | 27 +++++++++++++++++++++++++++ kernel/resource.c | 6 ++++++ lib/Kconfig | 3 +++ 7 files changed, 40 insertions(+), 11 deletions(-) diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig index 54500798f23a..4b3e66fe61c1 100644 --- a/drivers/nvdimm/Kconfig +++ b/drivers/nvdimm/Kconfig @@ -4,6 +4,7 @@ menuconfig LIBNVDIMM depends on PHYS_ADDR_T_64BIT depends on HAS_IOMEM depends on BLK_DEV + select MEMREGION help Generic support for non-volatile memory devices including ACPI-6-NFIT defined resources. On platforms that define an diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c index acce050856a8..75fe651d327d 100644 --- a/drivers/nvdimm/core.c +++ b/drivers/nvdimm/core.c @@ -463,7 +463,6 @@ static __exit void libnvdimm_exit(void) nd_region_exit(); nvdimm_exit(); nvdimm_bus_exit(); - nd_region_devs_exit(); nvdimm_devs_exit(); } diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h index e5ffd5733540..17561302dfec 100644 --- a/drivers/nvdimm/nd-core.h +++ b/drivers/nvdimm/nd-core.h @@ -133,7 +133,6 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev); int __init nvdimm_bus_init(void); void nvdimm_bus_exit(void); void nvdimm_devs_exit(void); -void nd_region_devs_exit(void); void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev); struct nd_region; void nd_region_create_ns_seed(struct nd_region *nd_region); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b4ef7d9ff22e..576c390dabd6 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -27,7 +27,6 @@ */ #include -static DEFINE_IDA(region_ida); static DEFINE_PER_CPU(int, flush_idx); static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm, @@ -141,7 
+140,7 @@ static void nd_region_release(struct device *dev) put_device(&nvdimm->dev); } free_percpu(nd_region->lane); - ida_simple_remove(&region_ida, nd_region->id); + memregion_free(nd_region->id); if (is_nd_blk(dev)) kfree(to_nd_blk_region(dev)); else @@ -1036,7 +1035,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, if (!region_buf) return NULL; - nd_region->id = ida_simple_get(&region_ida, 0, 0, GFP_KERNEL); + nd_region->id = memregion_alloc(GFP_KERNEL); if (nd_region->id < 0) goto err_id; @@ -1090,7 +1089,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, return nd_region; err_percpu: - ida_simple_remove(&region_ida, nd_region->id); + memregion_free(nd_region->id); err_id: kfree(region_buf); return NULL; @@ -1237,8 +1236,3 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start, return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict); } - -void __exit nd_region_devs_exit(void) -{ - ida_destroy(&region_ida); -} diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 2d79841ee9b9..72ea690b35a4 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -12,6 +12,12 @@ #ifndef __ASSEMBLY__ #include #include +#ifdef CONFIG_MEMREGION +#include +#else +#include +#endif + /* * Resources are tree-like, allowing * nesting etc..
@@ -287,6 +293,27 @@ static inline bool resource_overlaps(struct resource *r1, struct resource *r2) return (r1->start <= r2->end && r1->end >= r2->start); } +#ifdef CONFIG_MEMREGION +extern struct ida memregion_ids; +static inline int memregion_alloc(gfp_t gfp) +{ + return ida_alloc(&memregion_ids, gfp); +} + +static inline void memregion_free(int id) +{ + ida_free(&memregion_ids, id); +} +#else /* CONFIG_MEMREGION */ +static inline int memregion_alloc(gfp_t gfp) +{ + return -ENOMEM; +} + +static inline void memregion_free(int id) +{ +} +#endif #endif /* __ASSEMBLY__ */ #endif /* _LINUX_IOPORT_H */ diff --git a/kernel/resource.c b/kernel/resource.c index 158f04ec1d4f..82dbd9f28e91 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1637,4 +1637,10 @@ static int __init strict_iomem(char *str) return 1; } +#ifdef CONFIG_MEMREGION +/* identifiers for device memory regions */ +DEFINE_IDA(memregion_ids); +EXPORT_SYMBOL(memregion_ids); +#endif + __setup("iomem=", strict_iomem); diff --git a/lib/Kconfig b/lib/Kconfig index 90623a0e1942..89f7e4523799 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -602,6 +602,9 @@ config ARCH_NO_SG_CHAIN config ARCH_HAS_PMEM_API bool +config MEMREGION + bool + # use memcpy to implement user copies for nommu architectures config UACCESS_MEMCPY bool From patchwork Mon Jun 24 18:20:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013967 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99976924 for ; Mon, 24 Jun 2019 18:34:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B8542817F for ; Mon, 24 Jun 2019 18:34:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F9DA28BB5; Mon, 24 Jun 2019 
18:34:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 09F272817F for ; Mon, 24 Jun 2019 18:34:35 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E95662129DB99; Mon, 24 Jun 2019 11:34:34 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.126; helo=mga18.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 55DCA21296B07 for ; Mon, 24 Jun 2019 11:34:33 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:32 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="166408497" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:32 -0700 Subject: [PATCH v4 08/10] device-dax: Add a driver for "hmem" devices From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:20:16 -0700 Message-ID: <156140041177.2951909.8582567579750505172.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: 
<156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kbuild test robot , ard.biesheuvel@linaro.org, peterz@infradead.org, Dave Hansen , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, tglx@linutronix.de, linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance-differentiated memory range, likely reserved for an application-specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default. However, if over-subscription and other kernel memory management are desired, the resulting dax device can be assigned to the core-mm via the kmem driver. This patch adds the consumer of "hmem" devices; the producer of "hmem" devices is saved for a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM symbol to gate the enumeration work.
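For illustration only (not part of this patch), a minimal userspace sketch of how an application might consume the device-dax mmap instance described above. The device path, the helper names dax_align_down() and dax_map(), and the 2 MiB alignment passed by the caller are assumptions made for the example; the sketch also assumes that device-dax mappings must be MAP_SHARED and sized to a multiple of the region alignment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round len down to a multiple of a power-of-two align. */
static size_t dax_align_down(size_t len, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return len & ~(align - 1);
}

/*
 * Hypothetical helper: map the start of a device-dax character device.
 * Returns MAP_FAILED on any error, like mmap() itself.
 */
static void *dax_map(const char *path, size_t len, size_t align)
{
	size_t map_len = dax_align_down(len, align);
	void *addr;
	int fd;

	if (!map_len) {
		errno = EINVAL;
		return MAP_FAILED;
	}

	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* Assumption: device-dax only accepts shared, alignment-sized mappings. */
	addr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* an established mapping stays valid after close() */
	return addr;
}
```

A caller might use dax_map("/dev/dax0.0", 1UL << 30, 2UL << 20), where the path is an assumed device node name, and release the mapping with munmap() when done.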
Cc: Vishal Verma Cc: Keith Busch Cc: Dave Jiang Reported-by: kbuild test robot Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/dax/Kconfig | 27 +++++++++++++++++++---- drivers/dax/Makefile | 2 ++ drivers/dax/hmem.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/ioport.h | 4 +++ 4 files changed, 85 insertions(+), 5 deletions(-) create mode 100644 drivers/dax/hmem.c diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index f33c73e4af41..1a59ef86f148 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -32,19 +32,36 @@ config DEV_DAX_PMEM Say M if unsure +config DEV_DAX_HMEM + tristate "HMEM DAX: direct access to 'specific purpose' memory" + depends on EFI_APPLICATION_RESERVED + default DEV_DAX + help + EFI 2.8 platforms, and others, may advertise 'specific purpose' + memory. For example, a high bandwidth memory pool. The + indication from platform firmware is meant to reserve the + memory from typical usage by default. This driver creates + device-dax instances for these memory ranges, and that also + enables the possibility to assign them to the DEV_DAX_KMEM + driver to override the reservation and add them to kernel + "System RAM" pool. + + Say M if unsure. + config DEV_DAX_KMEM tristate "KMEM DAX: volatile-use of persistent memory" default DEV_DAX depends on DEV_DAX depends on MEMORY_HOTPLUG # for add_memory() and friends help - Support access to persistent memory as if it were RAM. This - allows easier use of persistent memory by unmodified - applications. + Support access to persistent, or other performance + differentiated memory as if it were System RAM. This allows + easier use of persistent memory by unmodified applications, or + adds core kernel memory services to heterogeneous memory types + (HMEM) marked "reserved" by platform firmware. To use this feature, a DAX device must be unbound from the - device_dax driver (PMEM DAX) and bound to this kmem driver - on each boot. 
+ device_dax driver and bound to this kmem driver on each boot. Say N if unsure. diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile index 81f7d54dadfb..80065b38b3c4 100644 --- a/drivers/dax/Makefile +++ b/drivers/dax/Makefile @@ -2,9 +2,11 @@ obj-$(CONFIG_DAX) += dax.o obj-$(CONFIG_DEV_DAX) += device_dax.o obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o +obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o dax-y := super.o dax-y += bus.o device_dax-y := device.o +dax_hmem-y := hmem.o obj-y += pmem/ diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem.c new file mode 100644 index 000000000000..62f9e3c80e21 --- /dev/null +++ b/drivers/dax/hmem.c @@ -0,0 +1,57 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include "bus.h" + +static int dax_hmem_probe(struct platform_device *pdev) +{ + struct dev_pagemap pgmap = { NULL }; + struct device *dev = &pdev->dev; + struct dax_region *dax_region; + struct memregion_info *mri; + struct dev_dax *dev_dax; + struct resource *res; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) + return -ENOMEM; + + mri = dev->platform_data; + pgmap.dev = dev; + memcpy(&pgmap.res, res, sizeof(*res)); + + dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, + PMD_SIZE, PFN_DEV|PFN_MAP); + if (!dax_region) + return -ENOMEM; + + dev_dax = devm_create_dev_dax(dax_region, 0, &pgmap); + if (IS_ERR(dev_dax)) + return PTR_ERR(dev_dax); + + /* child dev_dax instances now own the lifetime of the dax_region */ + dax_region_put(dax_region); + return 0; +} + +static int dax_hmem_remove(struct platform_device *pdev) +{ + /* devm handles teardown */ + return 0; +} + +static struct platform_driver dax_hmem_driver = { + .probe = dax_hmem_probe, + .remove = dax_hmem_remove, + .driver = { + .name = "hmem", + }, +}; + +module_platform_driver(dax_hmem_driver); + +MODULE_ALIAS("platform:hmem*"); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Intel Corporation"); diff --git a/include/linux/ioport.h b/include/linux/ioport.h 
index 72ea690b35a4..0c529c8f8027 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -294,6 +294,10 @@ static inline bool resource_overlaps(struct resource *r1, struct resource *r2) } #ifdef CONFIG_MEMREGION +struct memregion_info { + int target_node; +}; + extern struct ida memregion_ids; static inline int memregion_alloc(gfp_t gfp) { From patchwork Mon Jun 24 18:20:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013971 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D130E14B6 for ; Mon, 24 Jun 2019 18:34:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C2AB52888C for ; Mon, 24 Jun 2019 18:34:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B6E5528BC2; Mon, 24 Jun 2019 18:34:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5ACB328BAC for ; Mon, 24 Jun 2019 18:34:40 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 34B6E2129EB81; Mon, 24 Jun 2019 11:34:40 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.31; helo=mga06.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from 
mga06.intel.com (mga06.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 1D42B21296B07 for ; Mon, 24 Jun 2019 11:34:38 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:37 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="336583347" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:37 -0700 Subject: [PATCH v4 09/10] acpi/numa/hmat: Register HMAT at device_initcall level From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:20:21 -0700 Message-ID: <156140042119.2951909.7727308817426477621.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, peterz@infradead.org, Dave Hansen , "Rafael J. Wysocki" , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-nvdimm@lists.01.org, Jonathan Cameron , tglx@linutronix.de, Len Brown Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP In preparation for registering device-dax instances for accessing EFI specific-purpose memory, arrange for the HMAT registration to occur later in the init process. 
Critically, HMAT initialization needs to occur after e820__reserve_resources_late(), which is the point at which the iomem resource tree is populated with "Application Reserved" (IORES_DESC_APPLICATION_RESERVED). e820__reserve_resources_late() happens at subsys_initcall time. Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Keith Busch Cc: Jonathan Cameron Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Rafael J. Wysocki --- drivers/acpi/numa/hmat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 2c220cb7b620..1d329c4af3bf 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -671,4 +671,4 @@ static __init int hmat_init(void) acpi_put_table(tbl); return 0; } -subsys_initcall(hmat_init); +device_initcall(hmat_init); From patchwork Mon Jun 24 18:20:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11013977 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 973F814B6 for ; Mon, 24 Jun 2019 18:34:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89B7227FB1 for ; Mon, 24 Jun 2019 18:34:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7DD7428BC1; Mon, 24 Jun 2019 18:34:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 
D355F28BBE for ; Mon, 24 Jun 2019 18:34:44 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C7DA421296704; Mon, 24 Jun 2019 11:34:44 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id C7CAE21296704 for ; Mon, 24 Jun 2019 11:34:43 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:43 -0700 X-IronPort-AV: E=Sophos;i="5.63,413,1557212400"; d="scan'208";a="312794570" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jun 2019 11:34:43 -0700 Subject: [PATCH v4 10/10] acpi/numa/hmat: Register "specific purpose" memory as an "hmem" device From: Dan Williams To: x86@kernel.org Date: Mon, 24 Jun 2019 11:20:26 -0700 Message-ID: <156140042634.2951909.15878153818360710942.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> References: <156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ard.biesheuvel@linaro.org, peterz@infradead.org, Dave Hansen , "Rafael J. Wysocki" , linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-nvdimm@lists.01.org, Jonathan Cameron , tglx@linutronix.de, Len Brown Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP Memory that has been tagged EFI_MEMORY_SP, and has performance properties described by the ACPI HMAT, is expected to have an application-specific consumer. Those consumers may want 100% of the memory capacity to be reserved from any usage by the kernel. By default, with this enabling, a platform device is created to represent this differentiated resource. The device-dax "hmem" driver claims these devices by default and provides an mmap interface for the target application. If the administrator prefers, the hmem resource range can be made available to the core-mm via the device-dax hotplug facility, kmem, to online the memory with its own NUMA node. This was tested with an emulated HMAT produced by qemu (with the pending HMAT enabling patches), and "efi_fake_mem=8G@9G:0x40000" on the kernel command line to mark the memory ranges associated with node2 and node3 as EFI_MEMORY_SP. 
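As a worked check of the command line above, here is a small userspace sketch that decodes a single "size@start:attr" entry and computes the inclusive physical range it covers. The names parse_size(), parse_fake_mem() and struct fake_mem are hypothetical and exist only for this example; parse_size() is a simplified stand-in for the kernel's own suffix parsing, and the sketch does not handle comma-separated lists. For "8G@9G:0x40000" the range works out to 0x240000000-0x43fffffff, which matches the "Application Reserved" entry in the /proc/iomem output shown in the results of this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL /* UEFI "specific purpose" attribute bit */

/* Hypothetical container for one decoded entry. */
struct fake_mem {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* last byte of the range, inclusive */
	uint64_t attr;	/* EFI memory attribute bits to apply */
};

/* Parse a number with an optional K/M/G suffix (simplified, illustrative). */
static uint64_t parse_size(const char *s, char **endp)
{
	uint64_t v = strtoull(s, endp, 0);

	switch (**endp) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		(*endp)++;
		break;
	default:
		break;
	}
	return v;
}

/* Decode "size@start:attr"; returns 0 on success, -1 on malformed input. */
static int parse_fake_mem(const char *arg, struct fake_mem *out)
{
	uint64_t size;
	char *p;

	assert(arg && out);

	size = parse_size(arg, &p);
	if (size == 0 || *p != '@')
		return -1;

	out->start = parse_size(p + 1, &p);
	if (*p != ':')
		return -1;

	out->attr = strtoull(p + 1, &p, 0);
	if (*p != '\0')
		return -1;

	out->end = out->start + size - 1;
	return 0;
}
```

The arithmetic behind the result: 9 GiB is 9 * 0x40000000 = 0x240000000, and adding 8 GiB gives 0x440000000, so the last byte of the range is 0x43fffffff.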
qemu numa configuration options: -numa node,mem=4G,cpus=0-19,nodeid=0 -numa node,mem=4G,cpus=20-39,nodeid=1 -numa node,mem=4G,nodeid=2 -numa node,mem=4G,nodeid=3 -numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=21 -numa dist,src=0,dst=3,val=21 -numa dist,src=1,dst=0,val=21 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=21 -numa dist,src=2,dst=0,val=21 -numa dist,src=2,dst=1,val=21 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=3,dst=0,val=21 -numa dist,src=3,dst=1,val=21 -numa dist,src=3,dst=2,val=21 -numa dist,src=3,dst=3,val=10 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15 -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20 -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15 -numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15 -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20 -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20 Result: # daxctl list -RDu [ { "path":"\/platform\/hmem.1", "id":1, "size":"4.00 GiB (4.29 GB)", "align":2097152, "devices":[ { "chardev":"dax1.0", "size":"4.00 GiB (4.29 GB)" } ] }, { "path":"\/platform\/hmem.0", "id":0, "size":"4.00 GiB (4.29 GB)", "align":2097152, "devices":[ { "chardev":"dax0.0", "size":"4.00 GiB (4.29 GB)" } ] } ] # cat /proc/iomem [..] 240000000-43fffffff : Application Reserved 240000000-33fffffff : hmem.0 240000000-33fffffff : dax0.0 340000000-43fffffff : hmem.1 340000000-43fffffff : dax1.0 Cc: Len Brown Cc: Keith Busch Cc: "Rafael J. Wysocki" Cc: Vishal Verma Cc: Jonathan Cameron Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Acked-by: Rafael J. Wysocki --- drivers/acpi/numa/Kconfig | 1 drivers/acpi/numa/hmat.c | 132 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 122 insertions(+), 11 deletions(-) diff --git a/drivers/acpi/numa/Kconfig b/drivers/acpi/numa/Kconfig index d14582387ed0..c1be746e111a 100644 --- a/drivers/acpi/numa/Kconfig +++ b/drivers/acpi/numa/Kconfig @@ -8,6 +8,7 @@ config ACPI_HMAT bool "ACPI Heterogeneous Memory Attribute Table Support" depends on ACPI_NUMA select HMEM_REPORTING + select MEMREGION help If set, this option has the kernel parse and report the platform's ACPI HMAT (Heterogeneous Memory Attributes Table), diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 1d329c4af3bf..622c5feb3aa0 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -8,11 +8,16 @@ * the applicable attributes with the node's interfaces. 
*/ +#define pr_fmt(fmt) "acpi/hmat: " fmt +#define dev_fmt(fmt) "acpi/hmat: " fmt + #include #include #include #include #include +#include +#include #include #include #include @@ -40,6 +45,7 @@ struct memory_target { struct list_head node; unsigned int memory_pxm; unsigned int processor_pxm; + struct resource memregions; struct node_hmem_attrs hmem_attrs; }; @@ -92,21 +98,35 @@ static __init void alloc_memory_initiator(unsigned int cpu_pxm) list_add_tail(&initiator->node, &initiators); } -static __init void alloc_memory_target(unsigned int mem_pxm) +static __init void alloc_memory_target(unsigned int mem_pxm, + resource_size_t start, resource_size_t len) { struct memory_target *target; target = find_mem_target(mem_pxm); - if (target) - return; - - target = kzalloc(sizeof(*target), GFP_KERNEL); - if (!target) - return; + if (!target) { + target = kzalloc(sizeof(*target), GFP_KERNEL); + if (!target) + return; + target->memory_pxm = mem_pxm; + target->processor_pxm = PXM_INVAL; + target->memregions = (struct resource) { + .name = "ACPI mem", + .start = 0, + .end = -1, + .flags = IORESOURCE_MEM, + }; + list_add_tail(&target->node, &targets); + } - target->memory_pxm = mem_pxm; - target->processor_pxm = PXM_INVAL; - list_add_tail(&target->node, &targets); + /* + * There are potentially multiple ranges per PXM, so record each + * in the per-target memregions resource tree. 
+ */ + if (!__request_region(&target->memregions, start, len, "memory target", + IORESOURCE_MEM)) + pr_warn("failed to reserve %#llx - %#llx in pxm: %d\n", + start, start + len, mem_pxm); } static __init const char *hmat_data_type(u8 type) @@ -428,7 +448,7 @@ static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header, return -EINVAL; if (!(ma->flags & ACPI_SRAT_MEM_ENABLED)) return 0; - alloc_memory_target(ma->proximity_domain); + alloc_memory_target(ma->proximity_domain, ma->base_address, ma->length); return 0; } @@ -580,6 +600,81 @@ static __init void hmat_register_target_perf(struct memory_target *target) node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0); } +static __init void hmat_register_target_device(struct memory_target *target, + struct resource *r) +{ + /* define a clean / non-busy resource for the platform device */ + struct resource res = { + .start = r->start, + .end = r->end, + .flags = IORESOURCE_MEM, + }; + struct platform_device *pdev; + struct memregion_info info; + int rc, id; + + rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM, + IORES_DESC_APPLICATION_RESERVED); + if (rc != REGION_INTERSECTS) + return; + + id = memregion_alloc(GFP_KERNEL); + if (id < 0) { + pr_err("memregion allocation failure for %pr\n", &res); + return; + } + + pdev = platform_device_alloc("hmem", id); + if (!pdev) { + pr_err("hmem device allocation failure for %pr\n", &res); + goto out_pdev; + } + + pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->memory_pxm); + info = (struct memregion_info) { + .target_node = acpi_map_pxm_to_node(target->memory_pxm), + }; + rc = platform_device_add_data(pdev, &info, sizeof(info)); + if (rc < 0) { + pr_err("hmem memregion_info allocation failure for %pr\n", &res); + goto out_pdev; + } + + rc = platform_device_add_resources(pdev, &res, 1); + if (rc < 0) { + pr_err("hmem resource allocation failure for %pr\n", &res); + goto out_resource; + } + + rc = platform_device_add(pdev); + if (rc < 0) 
{ + dev_err(&pdev->dev, "device add failed for %pr\n", &res); + goto out_resource; + } + + return; + +out_resource: + put_device(&pdev->dev); +out_pdev: + memregion_free(id); +} + +static __init void hmat_register_target_devices(struct memory_target *target) +{ + struct resource *res; + + /* + * Do not bother creating devices if no driver is available to + * consume them. + */ + if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM)) + return; + + for (res = target->memregions.child; res; res = res->sibling) + hmat_register_target_device(target, res); +} + static __init void hmat_register_targets(void) { struct memory_target *target; @@ -587,6 +682,12 @@ static __init void hmat_register_targets(void) list_for_each_entry(target, &targets, node) { int nid = pxm_to_node(target->memory_pxm); + /* + * Devices may belong to either an offline or online + * node, so unconditionally add them. + */ + hmat_register_target_devices(target); + /* * Skip offline nodes. This can happen when memory * marked EFI_MEMORY_SP, "specific purpose", is applied @@ -608,7 +709,16 @@ static __init void hmat_free_structures(void) struct memory_initiator *initiator, *inext; list_for_each_entry_safe(target, tnext, &targets, node) { + struct resource *res, *res_next; + list_del(&target->node); + res = target->memregions.child; + while (res) { + res_next = res->sibling; + __release_region(&target->memregions, res->start, + resource_size(res)); + res = res_next; + } kfree(target); }