From patchwork Thu Apr 4 19:08:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10886117 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 97BE315AC for ; Thu, 4 Apr 2019 19:21:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F2C928AA2 for ; Thu, 4 Apr 2019 19:21:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 734EA28AAC; Thu, 4 Apr 2019 19:21:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B51F328AA2 for ; Thu, 4 Apr 2019 19:21:14 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id A77F7211F933F; Thu, 4 Apr 2019 12:21:14 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.43; helo=mga05.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 56A22211F35EB for ; Thu, 4 Apr 2019 12:21:13 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Apr 2019 12:21:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,309,1549958400"; d="scan'208";a="288767543" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga004.jf.intel.com with ESMTP; 04 Apr 2019 12:21:12 -0700 Subject: [RFC PATCH 1/5] efi: Detect UEFI 2.8 Special Purpose Memory From: Dan Williams To: linux-kernel@vger.kernel.org Date: Thu, 04 Apr 2019 12:08:33 -0700 Message-ID: <155440491334.3190322.44013027330479237.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , x86@kernel.org, linux-mm@kvack.org, Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Darren Hart , Thomas Gleixner , Andy Shevchenko , linux-nvdimm@lists.01.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a special purpose". The proposed Linux behavior for special purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. A follow-on patch integrates parsing of the ACPI HMAT to identify the node and sub-range boundaries of EFI_MEMORY_SP designated memory. For now, arrange for EFI_MEMORY_SP memory to be reserved. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ard Biesheuvel Cc: Darren Hart Cc: Andy Shevchenko Signed-off-by: Dan Williams --- arch/x86/Kconfig | 18 ++++++++++++++++++ arch/x86/boot/compressed/eboot.c | 5 ++++- arch/x86/boot/compressed/kaslr.c | 2 +- arch/x86/include/asm/e820/types.h | 9 +++++++++ arch/x86/kernel/e820.c | 9 +++++++-- arch/x86/platform/efi/efi.c | 10 +++++++++- include/linux/efi.h | 14 ++++++++++++++ include/linux/ioport.h | 1 + 8 files changed, 63 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c1f9b3cf437c..cb9ca27de7a5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1961,6 +1961,24 @@ config EFI_MIXED If unsure, say N. +config EFI_SPECIAL_MEMORY + bool "EFI Special Purpose Memory Support" + depends on EFI + ---help--- + On systems that have mixed performance classes of memory EFI + may indicate special purpose memory with an attribute (See + EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this + attribute may have unique performance characteristics compared + to the system's general purpose "System RAM" pool. On the + expectation that such memory has application specific usage + answer Y to arrange for the kernel to reserve it for + direct-access (device-dax) by default. The memory range can + later be optionally assigned to the page allocator by system + administrator policy. Say N to have the kernel treat this + memory as general purpose by default. + + If unsure, say Y. + config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 544ac4fafd11..9b90fae21abe 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -560,7 +560,10 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s case EFI_BOOT_SERVICES_CODE: case EFI_BOOT_SERVICES_DATA: case EFI_CONVENTIONAL_MEMORY: - e820_type = E820_TYPE_RAM; + if (is_efi_special(d)) + e820_type = E820_TYPE_SPECIAL; + else + e820_type = E820_TYPE_RAM; break; case EFI_ACPI_MEMORY_NVS: diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index 2e53c056ba20..897e46eb9714 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -757,7 +757,7 @@ process_efi_entries(unsigned long minimum, unsigned long image_size) * * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free. */ - if (md->type != EFI_CONVENTIONAL_MEMORY) + if (md->type != EFI_CONVENTIONAL_MEMORY || is_efi_special(md)) continue; if (efi_mirror_found && diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h index c3aa4b5e49e2..0ab8abae2e8b 100644 --- a/arch/x86/include/asm/e820/types.h +++ b/arch/x86/include/asm/e820/types.h @@ -28,6 +28,15 @@ enum e820_type { */ E820_TYPE_PRAM = 12, + /* + * Special-purpose / application-specific memory is indicated to + * the system via the EFI_MEMORY_SP attribute. Define an e820 + * translation of this memory type for the purpose of + * reserving this range and marking it with the + * IORES_DESC_APPLICATION_RESERVED designation. + */ + E820_TYPE_SPECIAL = 0xefffffff, + /* * Reserved RAM used by the kernel itself if * CONFIG_INTEL_TXT=y is enabled, memory of this type diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 2879e234e193..9f50dd0bbb04 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -176,6 +176,7 @@ static void __init e820_print_type(enum e820_type type) switch (type) { case E820_TYPE_RAM: /* Fall through: */ case E820_TYPE_RESERVED_KERN: pr_cont("usable"); break; + case E820_TYPE_SPECIAL: /* Fall through: */ case E820_TYPE_RESERVED: pr_cont("reserved"); break; case E820_TYPE_ACPI: pr_cont("ACPI data"); break; case E820_TYPE_NVS: pr_cont("ACPI NVS"); break; @@ -1023,6 +1024,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry) case E820_TYPE_UNUSABLE: return "Unusable memory"; case E820_TYPE_PRAM: return "Persistent Memory (legacy)"; case E820_TYPE_PMEM: return "Persistent Memory"; + case E820_TYPE_SPECIAL: /* Fall-through: */ case E820_TYPE_RESERVED: return "Reserved"; default: return "Unknown E820 type"; } @@ -1038,6 +1040,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry) case E820_TYPE_UNUSABLE: /* Fall-through: */ case E820_TYPE_PRAM: /* Fall-through: */ case E820_TYPE_PMEM: /* Fall-through: */ + case E820_TYPE_SPECIAL: /* Fall-through: */ case E820_TYPE_RESERVED: /* Fall-through: */ default: return IORESOURCE_MEM; } @@ -1050,6 +1053,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry) case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE; case E820_TYPE_PMEM: return IORES_DESC_PERSISTENT_MEMORY; case E820_TYPE_PRAM: return IORES_DESC_PERSISTENT_MEMORY_LEGACY; + case E820_TYPE_SPECIAL: return IORES_DESC_APPLICATION_RESERVED; case E820_TYPE_RESERVED_KERN: /* Fall-through: */ case E820_TYPE_RAM: /* Fall-through: */ case E820_TYPE_UNUSABLE: /* Fall-through: */ @@ -1065,13 +1069,14 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res) return true; /* - * Treat persistent memory like device memory, i.e. reserve it - * for exclusive use of a driver + * Treat persistent memory and other special memory ranges like + * device memory, i.e. reserve it for exclusive use of a driver */ switch (type) { case E820_TYPE_RESERVED: case E820_TYPE_PRAM: case E820_TYPE_PMEM: + case E820_TYPE_SPECIAL: return false; case E820_TYPE_RESERVED_KERN: case E820_TYPE_RAM: diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index e1cb01a22fa8..d227751f331b 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -139,7 +139,9 @@ static void __init do_add_efi_memmap(void) case EFI_BOOT_SERVICES_CODE: case EFI_BOOT_SERVICES_DATA: case EFI_CONVENTIONAL_MEMORY: - if (md->attribute & EFI_MEMORY_WB) + if (is_efi_special(md)) + e820_type = E820_TYPE_SPECIAL; + else if (md->attribute & EFI_MEMORY_WB) e820_type = E820_TYPE_RAM; else e820_type = E820_TYPE_RESERVED; @@ -753,6 +755,12 @@ static bool should_map_region(efi_memory_desc_t *md) if (IS_ENABLED(CONFIG_X86_32)) return false; + /* + * Special purpose memory is not mapped by default. + */ + if (is_efi_special(md)) + return false; + /* * Map all of RAM so that we can access arguments in the 1:1 * mapping when making EFI runtime calls. diff --git a/include/linux/efi.h b/include/linux/efi.h index 54357a258b35..cecbc2bda1da 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -112,6 +112,7 @@ typedef struct { #define EFI_MEMORY_MORE_RELIABLE \ ((u64)0x0000000000010000ULL) /* higher reliability */ #define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */ +#define EFI_MEMORY_SP ((u64)0x0000000000040000ULL) /* special purpose */ #define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */ #define EFI_MEMORY_DESCRIPTOR_VERSION 1 @@ -128,6 +129,19 @@ typedef struct { u64 attribute; } efi_memory_desc_t; +#ifdef CONFIG_EFI_SPECIAL_MEMORY +static inline bool is_efi_special(efi_memory_desc_t *md) +{ + return md->type == EFI_CONVENTIONAL_MEMORY + && (md->attribute & EFI_MEMORY_SP); +} +#else +static inline bool is_efi_special(efi_memory_desc_t *md) +{ + return false; +} +#endif + typedef struct { efi_guid_t guid; u32 headersize; diff --git a/include/linux/ioport.h b/include/linux/ioport.h index da0ebaec25f0..2d79841ee9b9 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -133,6 +133,7 @@ enum { IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_DEVICE_PUBLIC_MEMORY = 7, + IORES_DESC_APPLICATION_RESERVED = 8, }; /* helpers to define resources */ From patchwork Thu Apr 4 19:08:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10886121 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84F981575 for ; Thu, 4 Apr 2019 19:21:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6BF3B28AA2 for ; Thu, 4 Apr 2019 19:21:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5EBFA28AAC; Thu, 4 Apr 2019 19:21:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DC59A28AA2 for ; Thu, 4 Apr 2019 19:21:19 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id BDCF6211F99A3; Thu, 4 Apr 2019 12:21:19 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.31; helo=mga06.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id F2E04211F2696 for ; Thu, 4 Apr 2019 12:21:18 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Apr 2019 12:21:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,309,1549958400"; d="scan'208";a="137653465" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga008.fm.intel.com with ESMTP; 04 Apr 2019 12:21:18 -0700 Subject: [RFC PATCH 2/5] lib/memregion: Uplevel the pmem "region" ida to a global allocator From: Dan Williams To: linux-kernel@vger.kernel.org Date: Thu, 04 Apr 2019 12:08:38 -0700 Message-ID: <155440491849.3190322.17551464505265122881.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-nvdimm@lists.01.org, x86@kernel.org, linux-mm@kvack.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP In preparation for handling platform differentiated memory types beyond persistent memory, uplevel the "region" identifier to a global number space. This enables a device-dax instance to be registered to any memory type with guaranteed unique names. Cc: Keith Busch Signed-off-by: Dan Williams --- drivers/nvdimm/Kconfig | 1 + drivers/nvdimm/core.c | 1 - drivers/nvdimm/nd-core.h | 1 - drivers/nvdimm/region_devs.c | 13 ++++--------- include/linux/memregion.h | 6 ++++++ lib/Kconfig | 6 ++++++ lib/Makefile | 1 + lib/memregion.c | 22 ++++++++++++++++++++++ 8 files changed, 40 insertions(+), 11 deletions(-) create mode 100644 include/linux/memregion.h create mode 100644 lib/memregion.c diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig index 5e27918e4624..bf26cc5f6d67 100644 --- a/drivers/nvdimm/Kconfig +++ b/drivers/nvdimm/Kconfig @@ -3,6 +3,7 @@ menuconfig LIBNVDIMM depends on PHYS_ADDR_T_64BIT depends on HAS_IOMEM depends on BLK_DEV + select MEMREGION help Generic support for non-volatile memory devices including ACPI-6-NFIT defined resources. On platforms that define an diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c index acce050856a8..75fe651d327d 100644 --- a/drivers/nvdimm/core.c +++ b/drivers/nvdimm/core.c @@ -463,7 +463,6 @@ static __exit void libnvdimm_exit(void) nd_region_exit(); nvdimm_exit(); nvdimm_bus_exit(); - nd_region_devs_exit(); nvdimm_devs_exit(); } diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h index e5ffd5733540..17561302dfec 100644 --- a/drivers/nvdimm/nd-core.h +++ b/drivers/nvdimm/nd-core.h @@ -133,7 +133,6 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev); int __init nvdimm_bus_init(void); void nvdimm_bus_exit(void); void nvdimm_devs_exit(void); -void nd_region_devs_exit(void); void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev); struct nd_region; void nd_region_create_ns_seed(struct nd_region *nd_region); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b4ef7d9ff22e..eefdfd2565dd 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -11,6 +11,7 @@ * General Public License for more details. */ #include +#include #include #include #include @@ -27,7 +28,6 @@ */ #include -static DEFINE_IDA(region_ida); static DEFINE_PER_CPU(int, flush_idx); static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm, @@ -141,7 +141,7 @@ static void nd_region_release(struct device *dev) put_device(&nvdimm->dev); } free_percpu(nd_region->lane); - ida_simple_remove(®ion_ida, nd_region->id); + memregion_free(nd_region->id); if (is_nd_blk(dev)) kfree(to_nd_blk_region(dev)); else @@ -1036,7 +1036,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, if (!region_buf) return NULL; - nd_region->id = ida_simple_get(®ion_ida, 0, 0, GFP_KERNEL); + nd_region->id = memregion_alloc(); if (nd_region->id < 0) goto err_id; @@ -1090,7 +1090,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, return nd_region; err_percpu: - ida_simple_remove(®ion_ida, nd_region->id); + memregion_free(nd_region->id); err_id: kfree(region_buf); return NULL; @@ -1237,8 +1237,3 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start, return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict); } - -void __exit nd_region_devs_exit(void) -{ - ida_destroy(®ion_ida); -} diff --git a/include/linux/memregion.h b/include/linux/memregion.h new file mode 100644 index 000000000000..99fa47793b49 --- /dev/null +++ b/include/linux/memregion.h @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _MEMREGION_H_ +#define _MEMREGION_H_ +int memregion_alloc(void); +void memregion_free(int id); +#endif /* _MEMREGION_H_ */ diff --git a/lib/Kconfig b/lib/Kconfig index a9e56539bd11..0b765d9a1145 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -318,6 +318,12 @@ config DECOMPRESS_LZ4 config GENERIC_ALLOCATOR bool +# +# Generic IDA for memory regions +# +config MEMREGION + bool + # # reed solomon support is select'ed if needed # diff --git a/lib/Makefile b/lib/Makefile index 3b08673e8881..6e237c4071af 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -122,6 +122,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o obj-$(CONFIG_CRC8) += crc8.o obj-$(CONFIG_XXHASH) += xxhash.o obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o +obj-$(CONFIG_MEMREGION) += memregion.o obj-$(CONFIG_842_COMPRESS) += 842/ obj-$(CONFIG_842_DECOMPRESS) += 842/ diff --git a/lib/memregion.c b/lib/memregion.c new file mode 100644 index 000000000000..21f5ff96c2eb --- /dev/null +++ b/lib/memregion.c @@ -0,0 +1,22 @@ +#include +#include + +static DEFINE_IDA(region_ida); + +int memregion_alloc(void) +{ + return ida_simple_get(®ion_ida, 0, 0, GFP_KERNEL); +} +EXPORT_SYMBOL(memregion_alloc); + +void memregion_free(int id) +{ + ida_simple_remove(®ion_ida, id); +} +EXPORT_SYMBOL(memregion_free); + +static void __exit memregion_exit(void) +{ + ida_destroy(®ion_ida); +} +module_exit(memregion_exit); From patchwork Thu Apr 4 19:08:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10886125 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D7E1915AC for ; Thu, 4 Apr 2019 19:21:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B338C28AA2 for ; Thu, 4 Apr 2019 19:21:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A6E7428AAA; Thu, 4 Apr 2019 19:21:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2627628AA2 for ; Thu, 4 Apr 2019 19:21:26 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id D41E3211F99A2; Thu, 4 Apr 2019 12:21:25 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.93; helo=mga11.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 6F665211F2696 for ; Thu, 4 Apr 2019 12:21:24 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Apr 2019 12:21:24 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,309,1549958400"; d="scan'208";a="140174118" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga007.fm.intel.com with ESMTP; 04 Apr 2019 12:21:23 -0700 Subject: [RFC PATCH 3/5] acpi/hmat: Track target address ranges From: Dan Williams To: linux-kernel@vger.kernel.org Date: Thu, 04 Apr 2019 12:08:44 -0700 Message-ID: <155440492414.3190322.12683374224345847860.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-nvdimm@lists.01.org, x86@kernel.org, "Rafael J. Wysocki" , linux-mm@kvack.org, Jonathan Cameron , Len Brown Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP As of ACPI 6.3 the HMAT no longer advertises the physical memory address range for its entries. Instead, the expectation is the corresponding entry in the SRAT is looked up by the target proximity domain. Given there may be multiple distinct address ranges that share the same performance profile (sparse address space), find_mem_target() is updated to also consider the start address of the memory range. Target property updates are also adjusted to loop over all possible 'struct target' instances that may share the same proximity domain identification. Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Keith Busch Cc: Jonathan Cameron Signed-off-by: Dan Williams --- drivers/acpi/hmat/hmat.c | 77 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 53 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c index b275016ff648..e7ae44c8d359 100644 --- a/drivers/acpi/hmat/hmat.c +++ b/drivers/acpi/hmat/hmat.c @@ -38,6 +38,7 @@ static struct memory_locality *localities_types[4]; struct memory_target { struct list_head node; + u64 start, size; unsigned int memory_pxm; unsigned int processor_pxm; struct node_hmem_attrs hmem_attrs; @@ -63,12 +64,13 @@ static __init struct memory_initiator *find_mem_initiator(unsigned int cpu_pxm) return NULL; } -static __init struct memory_target *find_mem_target(unsigned int mem_pxm) +static __init struct memory_target *find_mem_target(unsigned int mem_pxm, + u64 start) { struct memory_target *target; list_for_each_entry(target, &targets, node) - if (target->memory_pxm == mem_pxm) + if (target->memory_pxm == mem_pxm && target->start == start) return target; return NULL; } @@ -92,14 +94,15 @@ static __init void alloc_memory_initiator(unsigned int cpu_pxm) list_add_tail(&initiator->node, &initiators); } -static __init void alloc_memory_target(unsigned int mem_pxm) +static __init void alloc_memory_target(unsigned int mem_pxm, + u64 start, u64 size) { struct memory_target *target; if (pxm_to_node(mem_pxm) == NUMA_NO_NODE) return; - target = find_mem_target(mem_pxm); + target = find_mem_target(mem_pxm, start); if (target) return; @@ -109,6 +112,8 @@ static __init void alloc_memory_target(unsigned int mem_pxm) target->memory_pxm = mem_pxm; target->processor_pxm = PXM_INVAL; + target->start = start; + target->size = size; list_add_tail(&target->node, &targets); } @@ -183,8 +188,8 @@ static __init u32 hmat_normalize(u16 entry, u64 base, u8 type) return value; } -static __init void hmat_update_target_access(struct memory_target *target, - u8 type, u32 value) +static __init void __hmat_update_target_access(struct memory_target *target, + u8 type, u32 value) { switch (type) { case ACPI_HMAT_ACCESS_LATENCY: @@ -212,6 +217,20 @@ static __init void hmat_update_target_access(struct memory_target *target, } } +static __init void hmat_update_target_access(int memory_pxm, int processor_pxm, + u8 type, u32 value) +{ + struct memory_target *target; + + list_for_each_entry(target, &targets, node) { + if (target->processor_pxm != processor_pxm) + continue; + if (target->memory_pxm != memory_pxm) + continue; + __hmat_update_target_access(target, type, value); + } +} + static __init void hmat_add_locality(struct acpi_hmat_locality *hmat_loc) { struct memory_locality *loc; @@ -255,7 +274,6 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_hmat_locality *hmat_loc = (void *)header; - struct memory_target *target; unsigned int init, targ, total_size, ipds, tpds; u32 *inits, *targs, value; u16 *entries; @@ -296,11 +314,9 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, inits[init], targs[targ], value, hmat_data_type_suffix(type)); - if (mem_hier == ACPI_HMAT_MEMORY) { - target = find_mem_target(targs[targ]); - if (target && target->processor_pxm == inits[init]) - hmat_update_target_access(target, type, value); - } + if (mem_hier == ACPI_HMAT_MEMORY) + hmat_update_target_access(targs[targ], + inits[init], type, value); } } @@ -367,6 +383,7 @@ static int __init hmat_parse_proximity_domain(union acpi_subtable_headers *heade { struct acpi_hmat_proximity_domain *p = (void *)header; struct memory_target *target = NULL; + bool found = false; if (p->header.length != sizeof(*p)) { pr_notice("HMAT: Unexpected address range header length: %d\n", @@ -382,23 +399,34 @@ static int __init hmat_parse_proximity_domain(union acpi_subtable_headers *heade pr_info("HMAT: Memory Flags:%04x Processor Domain:%d Memory Domain:%d\n", p->flags, p->processor_PD, p->memory_PD); - if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) { - target = find_mem_target(p->memory_PD); - if (!target) { - pr_debug("HMAT: Memory Domain missing from SRAT\n"); - return -EINVAL; - } - } - if (target && p->flags & ACPI_HMAT_PROCESSOR_PD_VALID) { - int p_node = pxm_to_node(p->processor_PD); + if ((p->flags & ACPI_HMAT_MEMORY_PD_VALID) == 0) + return 0; + + list_for_each_entry(target, &targets, node) { + int p_node; + + if (target->memory_pxm != p->memory_PD) + continue; + found = true; + if ((p->flags & ACPI_HMAT_PROCESSOR_PD_VALID) == 0) + continue; + + p_node = pxm_to_node(p->processor_PD); if (p_node == NUMA_NO_NODE) { - pr_debug("HMAT: Invalid Processor Domain\n"); + pr_debug("HMAT: Invalid Processor Domain: %d\n", + p->processor_PD); return -EINVAL; } + target->processor_pxm = p_node; } + if (!found) { + pr_debug("HMAT: Memory Domain missing from SRAT for pxm: %d\n", + p->memory_PD); + return -EINVAL; + } return 0; } @@ -431,7 +459,7 @@ static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header, return -EINVAL; if (!(ma->flags & ACPI_SRAT_MEM_ENABLED)) return 0; - alloc_memory_target(ma->proximity_domain); + alloc_memory_target(ma->proximity_domain, ma->base_address, ma->length); return 0; } @@ -568,7 +596,8 @@ static __init void hmat_register_target_initiators(struct memory_target *target) clear_bit(initiator->processor_pxm, p_nodes); } if (best) - hmat_update_target_access(target, loc->hmat_loc->data_type, best); + __hmat_update_target_access(target, + loc->hmat_loc->data_type, best); } for_each_set_bit(i, p_nodes, MAX_NUMNODES) { From patchwork Thu Apr 4 19:08:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10886129 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 971AD15AC for ; Thu, 4 Apr 2019 19:21:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7B91E28AA2 for ; Thu, 4 Apr 2019 19:21:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6FE1D28AAA; Thu, 4 Apr 2019 19:21:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 074C328AA2 for ; Thu, 4 Apr 2019 19:21:31 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id F17AA211F99A7; Thu, 4 Apr 2019 12:21:30 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id E0F3B211F35EB for ; Thu, 4 Apr 2019 12:21:29 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Apr 2019 12:21:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,309,1549958400"; d="scan'208";a="146656209" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by FMSMGA003.fm.intel.com with ESMTP; 04 Apr 2019 12:21:29 -0700 Subject: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device From: Dan Williams To: linux-kernel@vger.kernel.org Date: Thu, 04 Apr 2019 12:08:49 -0700 Message-ID: <155440492988.3190322.4475460421334178449.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-nvdimm@lists.01.org, x86@kernel.org, "Rafael J. Wysocki" , linux-mm@kvack.org, Jonathan Cameron , Len Brown Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance properties described by the ACPI HMAT is expected to have an application specific consumer. Those consumers may want 100% of the memory capacity to be reserved from any usage by the kernel. By default, with this enabling, a platform device is created to represent this differentiated resource. A follow on change arranges for device-dax to claim these devices by default and provide an mmap interface for the target application. However, if the administrator prefers that some or all of the special purpose memory is made available to the core-mm the device-dax hotplug facility can be used to online the memory with its own numa node. Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Keith Busch Cc: Jonathan Cameron Signed-off-by: Dan Williams --- drivers/acpi/hmat/Kconfig | 1 + drivers/acpi/hmat/hmat.c | 63 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/memregion.h | 3 ++ 3 files changed, 67 insertions(+) diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig index 95a29964dbea..4fcf76e8aa1d 100644 --- a/drivers/acpi/hmat/Kconfig +++ b/drivers/acpi/hmat/Kconfig @@ -3,6 +3,7 @@ config ACPI_HMAT bool "ACPI Heterogeneous Memory Attribute Table Support" depends on ACPI_NUMA select HMEM_REPORTING + select MEMREGION help If set, this option has the kernel parse and report the platform's ACPI HMAT (Heterogeneous Memory Attributes Table), diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c index e7ae44c8d359..482360004ea0 100644 --- a/drivers/acpi/hmat/hmat.c +++ b/drivers/acpi/hmat/hmat.c @@ -13,6 +13,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -612,6 +615,65 @@ static __init void hmat_register_target_perf(struct memory_target *target) node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0); } +static __init void hmat_register_target_device(struct memory_target *target) +{ + struct memregion_info info; + struct resource res = { + .start = target->start, + .end = target->start + target->size - 1, + .flags = IORESOURCE_MEM, + .desc = IORES_DESC_APPLICATION_RESERVED, + }; + struct platform_device *pdev; + int rc, id; + + if (region_intersects(target->start, target->size, IORESOURCE_MEM, + IORES_DESC_APPLICATION_RESERVED) + != REGION_INTERSECTS) + return; + + id = memregion_alloc(); + if (id < 0) { + pr_err("acpi/hmat: memregion allocation failure for %pr\n", &res); + return; + } + + pdev = platform_device_alloc("hmem", id); + if (!pdev) { + pr_err("acpi/hmat: hmem device allocation failure for %pr\n", &res); + goto out_pdev; + } + + pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->processor_pxm); + info = (struct memregion_info) { + .target_node = acpi_map_pxm_to_node(target->memory_pxm), + }; + rc = platform_device_add_data(pdev, &info, sizeof(info)); + if (rc < 0) { + pr_err("acpi/hmat: hmem memregion_info allocation failure for %pr\n", &res); + goto out_pdev; + } + + rc = platform_device_add_resources(pdev, &res, 1); + if (rc < 0) { + pr_err("acpi/hmat: hmem resource allocation failure for %pr\n", &res); + goto out_resource; + } + + rc = platform_device_add(pdev); + if (rc < 0) { + dev_err(&pdev->dev, "acpi/hmat: device add failed for %pr\n", &res); + goto out_resource; + } + + return; + +out_resource: + put_device(&pdev->dev); +out_pdev: + memregion_free(id); +} + static __init void hmat_register_targets(void) { struct memory_target *target; @@ -619,6 +681,7 @@ static __init void hmat_register_targets(void) list_for_each_entry(target, &targets, node) { hmat_register_target_initiators(target); hmat_register_target_perf(target); + hmat_register_target_device(target); } } diff --git a/include/linux/memregion.h b/include/linux/memregion.h index 99fa47793b49..5de2ac7fcf5e 100644 --- a/include/linux/memregion.h +++ b/include/linux/memregion.h @@ -1,6 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 #ifndef _MEMREGION_H_ #define _MEMREGION_H_ +struct memregion_info { + int target_node; +}; int memregion_alloc(void); void memregion_free(int id); #endif /* _MEMREGION_H_ */ From patchwork Thu Apr 4 19:08:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10886133 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B31CD1575 for ; Thu, 4 Apr 2019 19:21:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C46628AA2 for ; Thu, 4 Apr 2019 19:21:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 905ED28AAA; Thu, 4 Apr 2019 19:21:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 228F528AA2 for ; Thu, 4 Apr 2019 19:21:36 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 15FAD211F99A6; Thu, 4 Apr 2019 12:21:36 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=dan.j.williams@intel.com; receiver=linux-nvdimm@lists.01.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id EC864211F99A6 for ; Thu, 4 Apr 2019 12:21:34 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Apr 2019 12:21:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,309,1549958400"; d="scan'208";a="335059446" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga005.fm.intel.com with ESMTP; 04 Apr 2019 12:21:34 -0700 Subject: [RFC PATCH 5/5] device-dax: Add a driver for "hmem" devices From: Dan Williams To: linux-kernel@vger.kernel.org Date: Thu, 04 Apr 2019 12:08:55 -0700 Message-ID: <155440493544.3190322.10165040361847405101.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155440490809.3190322.15060922240602775809.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-nvdimm@lists.01.org, x86@kernel.org, linux-mm@kvack.org Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Virus-Scanned: ClamAV using ClamSMTP Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an application specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default. However, if over-subscription and other kernel memory management is desired the resulting dax device can be assigned to the core-mm via the kmem driver. Cc: Vishal Verma Cc: Keith Busch Cc: Dave Jiang Signed-off-by: Dan Williams --- drivers/dax/Kconfig | 26 ++++++++++++++++++---- drivers/dax/Makefile | 2 ++ drivers/dax/hmem.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+), 5 deletions(-) create mode 100644 drivers/dax/hmem.c diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index 5ef624fe3934..b3886f43dd77 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -38,16 +38,32 @@ config DEV_DAX_KMEM depends on DEV_DAX depends on MEMORY_HOTPLUG # for add_memory() and friends help - Support access to persistent memory as if it were RAM. This - allows easier use of persistent memory by unmodified - applications. + Support access to persistent, or other performance + differentiated memory as if it were System RAM. This allows + easier use of persistent memory by unmodified applications, or + adds core kernel memory services to heterogeneous memory types + (HMEM) marked "reserved" by platform firmware. To use this feature, a DAX device must be unbound from the - device_dax driver (PMEM DAX) and bound to this kmem driver - on each boot. + device_dax driver and bound to this kmem driver on each boot. Say N if unsure. +config DEV_DAX_HMEM + tristate "HMEM DAX: generic support for 'special purpose' memory" + default DEV_DAX + help + EFI 2.8 platforms, and others, may advertise 'special purpose' + memory. For example, a high bandwidth memory pool. The + indication from platform firmware is meant to reserve the + memory from typical usage by default. This driver creates + device-dax instances for these memory ranges, and that also + enables the possibility to assign them to the DEV_DAX_KMEM + driver to override the reservation and add them to kernel + "System RAM" pool. + + Say Y if unsure. + config DEV_DAX_PMEM_COMPAT tristate "PMEM DAX: support the deprecated /sys/class/dax interface" depends on DEV_DAX_PMEM diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile index 81f7d54dadfb..80065b38b3c4 100644 --- a/drivers/dax/Makefile +++ b/drivers/dax/Makefile @@ -2,9 +2,11 @@ obj-$(CONFIG_DAX) += dax.o obj-$(CONFIG_DEV_DAX) += device_dax.o obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o +obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o dax-y := super.o dax-y += bus.o device_dax-y := device.o +dax_hmem-y := hmem.o obj-y += pmem/ diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem.c new file mode 100644 index 000000000000..b01cf9c65329 --- /dev/null +++ b/drivers/dax/hmem.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include "bus.h" + +static int dax_hmem_probe(struct platform_device *pdev) +{ + struct dev_pagemap pgmap = { 0 }; + struct device *dev = &pdev->dev; + struct dax_region *dax_region; + struct memregion_info *mri; + struct dev_dax *dev_dax; + struct resource *res; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) + return -ENOMEM; + + mri = dev->platform_data; + pgmap.dev = dev; + memcpy(&pgmap.res, res, sizeof(*res)); + + dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, + PMD_SIZE, PFN_DEV|PFN_MAP); + if (!dax_region) + return -ENOMEM; + + dev_dax = devm_create_dev_dax(dax_region, 0, &pgmap); + if (IS_ERR(dev_dax)) + return PTR_ERR(dev_dax); + + /* child dev_dax instances now own the lifetime of the dax_region */ + dax_region_put(dax_region); + return 0; +} + +static int dax_hmem_remove(struct platform_device *pdev) +{ + /* devm handles teardown */ + return 0; +} + +static struct platform_driver dax_hmem_driver = { + .probe = dax_hmem_probe, + .remove = dax_hmem_remove, + .driver = { + .name = "hmem", + }, +}; + +module_platform_driver(dax_hmem_driver); + +MODULE_ALIAS("platform:hmem*"); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Intel Corporation");