From patchwork Wed Mar 4 02:08:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11419121 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 69D7F924 for ; Wed, 4 Mar 2020 02:24:18 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51A1B208C3 for ; Wed, 4 Mar 2020 02:24:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51A1B208C3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 96E2510FC361A; Tue, 3 Mar 2020 18:25:09 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 3A03510FC3619 for ; Tue, 3 Mar 2020 18:25:07 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:14 -0800 X-IronPort-AV: E=Sophos;i="5.70,511,1574150400"; d="scan'208";a="319675283" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:13 -0800 Subject: [PATCH v4 1/5] mm/memremap_pages: Introduce memremap_compat_align() From: Dan Williams To: linux-nvdimm@lists.01.org Date: Tue, 03 Mar 2020 18:08:08 -0800 Message-ID: <158328768844.2223916.3587427265166021149.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: SITGKOIEMIJKUGJHBI3ZULEULYVRXEF2 X-Message-ID-Hash: SITGKOIEMIJKUGJHBI3ZULEULYVRXEF2 X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: "Aneesh Kumar K.V" , Benjamin Herrenschmidt , Paul Mackerras , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The "sub-section memory hotplug" facility allows memremap_pages() users like libnvdimm to compensate for hardware platforms like x86 that have a section size larger than their hardware memory mapping granularity. The compensation that sub-section support affords is being tolerant of physical memory resources shifting by units smaller (64MiB on x86) than the memory-hotplug section size (128 MiB). Where the platform physical-memory mapping granularity is limited by the number and capability of address-decode-registers in the memory controller. While the sub-section support allows memremap_pages() to operate on sub-section (2MiB) granularity, the Power architecture may still require 16MiB alignment on "!radix_enabled()" platforms. In order for libnvdimm to be able to detect and manage this per-arch limitation, introduce memremap_compat_align() as a common minimum alignment across all driver-facing memory-mapping interfaces, and let Power override it to 16MiB in the "!radix_enabled()" case. The assumption / requirement for 16MiB to be a viable memremap_compat_align() value is that Power does not have platforms where its equivalent of address-decode-registers never hardware remaps a persistent memory resource on smaller than 16MiB boundaries. Note that I tried my best to not add a new Kconfig symbol, but header include entanglements defeated the #ifndef memremap_compat_align design pattern and the need to export it defeats the __weak design pattern for arch overrides. Based on an initial patch by Aneesh. Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com Reported-by: Aneesh Kumar K.V Reported-by: Jeff Moyer Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Reviewed-by: Aneesh Kumar K.V Signed-off-by: Dan Williams Acked-by: Michael Ellerman (powerpc) --- arch/powerpc/Kconfig | 1 + arch/powerpc/mm/ioremap.c | 21 +++++++++++++++++++++ drivers/nvdimm/pfn_devs.c | 2 +- include/linux/memremap.h | 8 ++++++++ include/linux/mmzone.h | 1 + lib/Kconfig | 3 +++ mm/memremap.c | 23 +++++++++++++++++++++++ 7 files changed, 58 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 497b7d0b2d7e..e6ffe905e2b9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -122,6 +122,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV select ARCH_HAS_HUGEPD if HUGETLB_PAGE + select ARCH_HAS_MEMREMAP_COMPAT_ALIGN select ARCH_HAS_MMIOWB if PPC64 select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_PMEM_API diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c index fc669643ce6a..b1a0aebe8c48 100644 --- a/arch/powerpc/mm/ioremap.c +++ b/arch/powerpc/mm/ioremap.c @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -97,3 +98,23 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size, return NULL; } + +#ifdef CONFIG_ZONE_DEVICE +/* + * Override the generic version in mm/memremap.c. + * + * With hash translation, the direct-map range is mapped with just one + * page size selected by htab_init_page_sizes(). Consult + * mmu_psize_defs[] to determine the minimum page size alignment. +*/ +unsigned long memremap_compat_align(void) +{ + unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift; + + if (radix_enabled()) + return SUBSECTION_SIZE; + return max(SUBSECTION_SIZE, 1UL << shift); + +} +EXPORT_SYMBOL_GPL(memremap_compat_align); +#endif diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index b94f7a7e94b8..a5c25cb87116 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -750,7 +750,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = resource_size(&nsio->res); npfns = PHYS_PFN(size - SZ_8K); - align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT)); + align = max(nd_pfn->align, SUBSECTION_SIZE); end_trunc = start + size - ALIGN_DOWN(start + size, align); if (nd_pfn->mode == PFN_MODE_PMEM) { /* diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 6fefb09af7c3..8af1cbd8f293 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns); +unsigned long memremap_compat_align(void); #else static inline void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) @@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns) { } + +/* when memremap_pages() is disabled all archs can remap a single page */ +static inline unsigned long memremap_compat_align(void) +{ + return PAGE_SIZE; +} #endif /* CONFIG_ZONE_DEVICE */ static inline void put_dev_pagemap(struct dev_pagemap *pgmap) @@ -172,4 +179,5 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap) if (pgmap) percpu_ref_put(pgmap->ref); } + #endif /* _LINUX_MEMREMAP_H_ */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 462f6873905a..6b77f7239af5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1170,6 +1170,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK) #define SUBSECTION_SHIFT 21 +#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT) #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT) #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT) diff --git a/lib/Kconfig b/lib/Kconfig index bc7e56370129..5d53f9609c25 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -615,6 +615,9 @@ config ARCH_HAS_PMEM_API config MEMREGION bool +config ARCH_HAS_MEMREMAP_COMPAT_ALIGN + bool + # use memcpy to implement user copies for nommu architectures config UACCESS_MEMCPY bool diff --git a/mm/memremap.c b/mm/memremap.c index 09b5b7adc773..3e7afaf05639 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -14,6 +15,28 @@ static DEFINE_XARRAY(pgmap_array); +/* + * The memremap() and memremap_pages() interfaces are alternately used + * to map persistent memory namespaces. These interfaces place different + * constraints on the alignment and size of the mapping (namespace). + * memremap() can map individual PAGE_SIZE pages. memremap_pages() can + * only map subsections (2MB), and at least one architecture (PowerPC) + * the minimum mapping granularity of memremap_pages() is 16MB. + * + * The role of memremap_compat_align() is to communicate the minimum + * arch supported alignment of a namespace such that it can freely + * switch modes without violating the arch constraint. Namely, do not + * allow a namespace to be PAGE_SIZE aligned since that namespace may be + * reconfigured into a mode that requires SUBSECTION_SIZE alignment. + */ +#ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN +unsigned long memremap_compat_align(void) +{ + return SUBSECTION_SIZE; +} +EXPORT_SYMBOL_GPL(memremap_compat_align); +#endif + #ifdef CONFIG_DEV_PAGEMAP_OPS DEFINE_STATIC_KEY_FALSE(devmap_managed_key); EXPORT_SYMBOL(devmap_managed_key); From patchwork Wed Mar 4 02:08:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11419123 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 78EDD930 for ; Wed, 4 Mar 2020 02:24:22 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5F56D208C3 for ; Wed, 4 Mar 2020 02:24:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F56D208C3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id A91F610FC3620; Tue, 3 Mar 2020 18:25:13 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.120; helo=mga04.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 2C46010FC361D for ; Tue, 3 Mar 2020 18:25:11 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:19 -0800 X-IronPort-AV: E=Sophos;i="5.70,511,1574150400"; d="scan'208";a="232467854" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:19 -0800 Subject: [PATCH v4 2/5] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid From: Dan Williams To: linux-nvdimm@lists.01.org Date: Tue, 03 Mar 2020 18:08:13 -0800 Message-ID: <158328769377.2223916.18385460083366544789.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: 2P4HMGOHGMHLPJUDLXXQZDXVWK6M4LOR X-Message-ID-Hash: 2P4HMGOHGMHLPJUDLXXQZDXVWK6M4LOR X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: "Aneesh Kumar K.V" , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The EOPNOTSUPP return code from the pmem driver indicates that the namespace has a configuration that may be valid, but the current kernel does not support it. Expand this to all of the nd_pfn_validate() error conditions after the infoblock has been verified as self consistent. This prevents exposing the namespace to I/O when the infoblock needs to be corrected, or the system needs to be put into a different configuration (like changing the page size on PowerPC). Cc: Aneesh Kumar K.V Cc: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Signed-off-by: Dan Williams --- drivers/nvdimm/pfn_devs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index a5c25cb87116..79fe02d6f657 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -561,14 +561,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_dbg(&nd_pfn->dev, "align: %lx:%lx mode: %d:%d\n", nd_pfn->align, align, nd_pfn->mode, mode); - return -EINVAL; + return -EOPNOTSUPP; } } if (align > nvdimm_namespace_capacity(ndns)) { dev_err(&nd_pfn->dev, "alignment: %lx exceeds capacity %llx\n", align, nvdimm_namespace_capacity(ndns)); - return -EINVAL; + return -EOPNOTSUPP; } /* @@ -581,7 +581,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) if (offset >= resource_size(&nsio->res)) { dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(&ndns->dev)); - return -EBUSY; + return -EOPNOTSUPP; } if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) @@ -589,7 +589,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_err(&nd_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", offset, align); - return -ENXIO; + return -EOPNOTSUPP; } return 0; From patchwork Wed Mar 4 02:08:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11419125 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6830D930 for ; Wed, 4 Mar 2020 02:24:27 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 507E720866 for ; Wed, 4 Mar 2020 02:24:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 507E720866 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id BC4D410FC3621; Tue, 3 Mar 2020 18:25:18 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.126; helo=mga18.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id D880F10FC360A for ; Tue, 3 Mar 2020 18:25:16 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:24 -0800 X-IronPort-AV: E=Sophos;i="5.70,511,1574150400"; d="scan'208";a="438963156" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:24 -0800 Subject: [PATCH v4 3/5] libnvdimm/namespace: Enforce memremap_compat_align() From: Dan Williams To: linux-nvdimm@lists.01.org Date: Tue, 03 Mar 2020 18:08:19 -0800 Message-ID: <158328769923.2223916.7637626147977697918.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: P2X6APMFLHU7XHOXTMZTFPMG25QPX5WS X-Message-ID-Hash: P2X6APMFLHU7XHOXTMZTFPMG25QPX5WS X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: "Aneesh Kumar K.V" , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The pmem driver on PowerPC crashes with the following signature when instantiating misaligned namespaces that map their capacity via memremap_pages(). BUG: Unable to handle kernel data access at 0xc001000406000000 Faulting instruction address: 0xc000000000090790 NIP [c000000000090790] arch_add_memory+0xc0/0x130 LR [c000000000090744] arch_add_memory+0x74/0x130 Call Trace: arch_add_memory+0x74/0x130 (unreliable) memremap_pages+0x74c/0xa30 devm_memremap_pages+0x3c/0xa0 pmem_attach_disk+0x188/0x770 nvdimm_bus_probe+0xd8/0x470 With the assumption that only memremap_pages() has alignment constraints, enforce memremap_compat_align() for pmem_should_map_pages(), nd_pfn, and nd_dax cases. This includes preventing the creation of namespaces where the base address is misaligned and cases there infoblock padding parameters are invalid. Reported-by: Aneesh Kumar K.V Cc: Jeff Moyer Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment") Reviewed-by: Aneesh Kumar K.V Signed-off-by: Dan Williams --- drivers/nvdimm/namespace_devs.c | 17 +++++++++++++++++ drivers/nvdimm/pfn.h | 12 ++++++++++++ drivers/nvdimm/pfn_devs.c | 32 +++++++++++++++++++++++++++++--- 3 files changed, 58 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 032dc61725ff..77e211c7d94d 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -10,6 +10,7 @@ #include #include "nd-core.h" #include "pmem.h" +#include "pfn.h" #include "nd.h" static void namespace_io_release(struct device *dev) @@ -1739,6 +1740,22 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) return ERR_PTR(-ENODEV); } + /* + * Note, alignment validation for fsdax and devdax mode + * namespaces happens in nd_pfn_validate() where infoblock + * padding parameters can be applied. + */ + if (pmem_should_map_pages(dev)) { + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); + struct resource *res = &nsio->res; + + if (!IS_ALIGNED(res->start | (res->end + 1), + memremap_compat_align())) { + dev_err(&ndns->dev, "%pr misaligned, unable to map\n", res); + return ERR_PTR(-EOPNOTSUPP); + } + } + if (is_namespace_pmem(&ndns->dev)) { struct nd_namespace_pmem *nspm; diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index acb19517f678..37cb1b8a2a39 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -24,6 +24,18 @@ struct nd_pfn_sb { __le64 npfns; __le32 mode; /* minor-version-1 additions for section alignment */ + /** + * @start_pad: Deprecated attribute to pad start-misaligned namespaces + * + * start_pad is deprecated because the original definition did + * not comprehend that dataoff is relative to the base address + * of the namespace not the start_pad adjusted base. The result + * is that the dax path is broken, but the block-I/O path is + * not. The kernel will no longer create namespaces using start + * padding, but it still supports block-I/O for legacy + * configurations mainly to allow a backup, reconfigure the + * namespace, and restore flow to repair dax operation. + */ __le32 start_pad; __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 79fe02d6f657..34db557dbad1 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -446,6 +446,7 @@ static bool nd_supported_alignment(unsigned long align) int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; + struct resource *res; enum nd_pfn_mode mode; struct nd_namespace_io *nsio; unsigned long align, start_pad; @@ -578,13 +579,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) * established. */ nsio = to_nd_namespace_io(&ndns->dev); - if (offset >= resource_size(&nsio->res)) { + res = &nsio->res; + if (offset >= resource_size(res)) { dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(&ndns->dev)); return -EOPNOTSUPP; } - if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) + if ((align && !IS_ALIGNED(res->start + offset + start_pad, align)) || !IS_ALIGNED(offset, PAGE_SIZE)) { dev_err(&nd_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", @@ -592,6 +594,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EOPNOTSUPP; } + if (!IS_ALIGNED(res->start + le32_to_cpu(pfn_sb->start_pad), + memremap_compat_align())) { + dev_err(&nd_pfn->dev, "resource start misaligned\n"); + return -EOPNOTSUPP; + } + + if (!IS_ALIGNED(res->end + 1 - le32_to_cpu(pfn_sb->end_trunc), + memremap_compat_align())) { + dev_err(&nd_pfn->dev, "resource end misaligned\n"); + return -EOPNOTSUPP; + } + return 0; } EXPORT_SYMBOL(nd_pfn_validate); @@ -750,7 +764,19 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = resource_size(&nsio->res); npfns = PHYS_PFN(size - SZ_8K); - align = max(nd_pfn->align, SUBSECTION_SIZE); + align = max(nd_pfn->align, memremap_compat_align()); + + /* + * When @start is misaligned fail namespace creation. See + * the 'struct nd_pfn_sb' commentary on why ->start_pad is not + * an option. + */ + if (!IS_ALIGNED(start, memremap_compat_align())) { + dev_err(&nd_pfn->dev, "%s: start %pa misaligned to %#lx\n", + dev_name(&ndns->dev), &start, + memremap_compat_align()); + return -EINVAL; + } end_trunc = start + size - ALIGN_DOWN(start + size, align); if (nd_pfn->mode == PFN_MODE_PMEM) { /* From patchwork Wed Mar 4 02:08:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11419127 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C309924 for ; Wed, 4 Mar 2020 02:24:35 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 63A2920866 for ; Wed, 4 Mar 2020 02:24:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 63A2920866 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id CFD9F10FC3624; Tue, 3 Mar 2020 18:25:26 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.88; helo=mga01.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 66BB710FC3623 for ; Tue, 3 Mar 2020 18:25:22 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:30 -0800 X-IronPort-AV: E=Sophos;i="5.70,511,1574150400"; d="scan'208";a="263436259" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:30 -0800 Subject: [PATCH v4 4/5] libnvdimm/region: Introduce NDD_LABELING From: Dan Williams To: linux-nvdimm@lists.01.org Date: Tue, 03 Mar 2020 18:08:24 -0800 Message-ID: <158328770472.2223916.5413598505602504160.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: 5EJ77JEJIJWHDKZT7A5QQFRF3P3QB3EY X-Message-ID-Hash: 5EJ77JEJIJWHDKZT7A5QQFRF3P3QB3EY X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: "Aneesh Kumar K.V" , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The NDD_ALIASING flag is used to indicate where pmem capacity might alias with blk capacity and require labeling. It is also used to indicate whether the DIMM supports labeling. Separate this latter capability into its own flag so that the NDD_ALIASING flag is scoped to true aliased configurations. To my knowledge aliased configurations only exist in the ACPI spec, there are no known platforms that ship this support in production. This clarity allows namespace-capacity alignment constraints around interleave-ways to be relaxed. Cc: Vishal Verma Cc: Oliver O'Halloran Reviewed-by: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Link: https://lore.kernel.org/r/158041477856.3889308.4212605617834097674.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- arch/powerpc/platforms/pseries/papr_scm.c | 2 +- drivers/acpi/nfit/core.c | 4 +++- drivers/nvdimm/dimm.c | 2 +- drivers/nvdimm/dimm_devs.c | 9 +++++---- drivers/nvdimm/namespace_devs.c | 2 +- drivers/nvdimm/nd.h | 2 +- drivers/nvdimm/region_devs.c | 10 +++++----- include/linux/libnvdimm.h | 2 ++ 8 files changed, 19 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0b4467e378e5..589858cb3203 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -328,7 +328,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) } dimm_flags = 0; - set_bit(NDD_ALIASING, &dimm_flags); + set_bit(NDD_LABELING, &dimm_flags); p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL); diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index a3320f93616d..71d7f2aa1b12 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2026,8 +2026,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc) continue; } - if (nfit_mem->bdw && nfit_mem->memdev_pmem) + if (nfit_mem->bdw && nfit_mem->memdev_pmem) { set_bit(NDD_ALIASING, &flags); + set_bit(NDD_LABELING, &flags); + } /* collate flags across all memdevs for this dimm */ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) { diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c index 64776ed15bb3..7d4ddc4d9322 100644 --- a/drivers/nvdimm/dimm.c +++ b/drivers/nvdimm/dimm.c @@ -99,7 +99,7 @@ static int nvdimm_probe(struct device *dev) if (ndd->ns_current >= 0) { rc = nd_label_reserve_dpa(ndd); if (rc == 0) - nvdimm_set_aliasing(dev); + nvdimm_set_labeling(dev); } nvdimm_bus_unlock(dev); diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 94ea6dba6b4f..39a61a514746 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -32,7 +32,7 @@ int nvdimm_check_config_data(struct device *dev) if (!nvdimm->cmd_mask || !test_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm->cmd_mask)) { - if (test_bit(NDD_ALIASING, &nvdimm->flags)) + if (test_bit(NDD_LABELING, &nvdimm->flags)) return -ENXIO; else return -ENOTTY; @@ -173,11 +173,11 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset, return rc; } -void nvdimm_set_aliasing(struct device *dev) +void nvdimm_set_labeling(struct device *dev) { struct nvdimm *nvdimm = to_nvdimm(dev); - set_bit(NDD_ALIASING, &nvdimm->flags); + set_bit(NDD_LABELING, &nvdimm->flags); } void nvdimm_set_locked(struct device *dev) @@ -312,8 +312,9 @@ static ssize_t flags_show(struct device *dev, { struct nvdimm *nvdimm = to_nvdimm(dev); - return sprintf(buf, "%s%s\n", + return sprintf(buf, "%s%s%s\n", test_bit(NDD_ALIASING, &nvdimm->flags) ? "alias " : "", + test_bit(NDD_LABELING, &nvdimm->flags) ? "label " : "", test_bit(NDD_LOCKED, &nvdimm->flags) ? "lock " : ""); } static DEVICE_ATTR_RO(flags); diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 77e211c7d94d..01f6c22f0d1a 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -2538,7 +2538,7 @@ static int init_active_labels(struct nd_region *nd_region) if (!ndd) { if (test_bit(NDD_LOCKED, &nvdimm->flags)) /* fail, label data may be unreadable */; - else if (test_bit(NDD_ALIASING, &nvdimm->flags)) + else if (test_bit(NDD_LABELING, &nvdimm->flags)) /* fail, labels needed to disambiguate dpa */; else return 0; diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index c9f6a5b5253a..ca39abe29c7c 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -252,7 +252,7 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset, void *buf, size_t len); long nvdimm_clear_poison(struct device *dev, phys_addr_t phys, unsigned int len); -void nvdimm_set_aliasing(struct device *dev); +void nvdimm_set_labeling(struct device *dev); void nvdimm_set_locked(struct device *dev); void nvdimm_clear_locked(struct device *dev); int nvdimm_security_setup_events(struct device *dev); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index a19e535830d9..a5fc6e4c56ff 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -195,16 +195,16 @@ EXPORT_SYMBOL_GPL(nd_blk_region_set_provider_data); int nd_region_to_nstype(struct nd_region *nd_region) { if (is_memory(&nd_region->dev)) { - u16 i, alias; + u16 i, label; - for (i = 0, alias = 0; i < nd_region->ndr_mappings; i++) { + for (i = 0, label = 0; i < nd_region->ndr_mappings; i++) { struct nd_mapping *nd_mapping = &nd_region->mapping[i]; struct nvdimm *nvdimm = nd_mapping->nvdimm; - if (test_bit(NDD_ALIASING, &nvdimm->flags)) - alias++; + if (test_bit(NDD_LABELING, &nvdimm->flags)) + label++; } - if (alias) + if (label) return ND_DEVICE_NAMESPACE_PMEM; else return ND_DEVICE_NAMESPACE_IO; diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index 9df091bd30ba..18da4059be09 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -37,6 +37,8 @@ enum { NDD_WORK_PENDING = 4, /* ignore / filter NSLABEL_FLAG_LOCAL for this DIMM, i.e. no aliasing */ NDD_NOBLK = 5, + /* dimm supports namespace labels */ + NDD_LABELING = 6, /* need to set a limit somewhere, but yes, this is likely overkill */ ND_IOCTL_MAX_BUFLEN = SZ_4M, From patchwork Wed Mar 4 02:08:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11419129 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A2A7E930 for ; Wed, 4 Mar 2020 02:24:38 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8B274214D8 for ; Wed, 4 Mar 2020 02:24:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B274214D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id F081A10FC3629; Tue, 3 Mar 2020 18:25:29 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.20; helo=mga02.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 4F14F10FC3170 for ; Tue, 3 Mar 2020 18:25:28 -0800 (PST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:36 -0800 X-IronPort-AV: E=Sophos;i="5.70,511,1574150400"; d="scan'208";a="287202660" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Mar 2020 18:24:36 -0800 Subject: [PATCH v4 5/5] libnvdimm/region: Introduce an 'align' attribute From: Dan Williams To: linux-nvdimm@lists.01.org Date: Tue, 03 Mar 2020 18:08:30 -0800 Message-ID: <158328771060.2223916.5449156100834597166.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158328768294.2223916.16551505954326988623.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: AEW7KEXWIDNEZWT6HGJ7N4BKYS324GKH X-Message-ID-Hash: AEW7KEXWIDNEZWT6HGJ7N4BKYS324GKH X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: "Aneesh Kumar K.V" , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The align attribute applies an alignment constraint for namespace creation in a region. Whereas the 'align' attribute of a namespace applied alignment padding via an info block, the 'align' attribute applies alignment constraints to the free space allocation. The default for 'align' is the maximum known memremap_compat_align() across all archs (16MiB from PowerPC at time of writing) multiplied by the number of interleave ways if there is blk-aliasing. The minimum is PAGE_SIZE and allows for the creation of cross-arch incompatible namespaces, just as previous kernels allowed, but the expectation is cross-arch and mode-independent compatibility by default. The regression risk with this change is limited to cases that were dependent on the ability to create unaligned namespaces, *and* for some reason are unable to opt-out of aligned namespaces by writing to 'regionX/align'. If such a scenario arises the default can be flipped from opt-out to opt-in of compat-aligned namespace creation, but that is a last resort. The kernel will otherwise continue to support existing defined misaligned namespaces. Unfortunately this change needs to touch several parts of the implementation at once: - region/available_size: expand busy extents to current align - region/max_available_extent: expand busy extents to current align - namespace/size: trim free space to current align ...to keep the free space accounting conforming to the dynamic align setting. Reported-by: Aneesh Kumar K.V Reported-by: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Reviewed-by: Jeff Moyer Link: https://lore.kernel.org/r/158041478371.3889308.14542630147672668068.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 86 +++++++++++++++++++++++---- drivers/nvdimm/namespace_devs.c | 9 ++- drivers/nvdimm/nd.h | 1 drivers/nvdimm/region_devs.c | 122 ++++++++++++++++++++++++++++++++++++--- 4 files changed, 192 insertions(+), 26 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 39a61a514746..b7b77e8d9027 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -563,6 +563,21 @@ int nvdimm_security_freeze(struct nvdimm *nvdimm) return rc; } +static unsigned long dpa_align(struct nd_region *nd_region) +{ + struct device *dev = &nd_region->dev; + + if (dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), + "bus lock required for capacity provision\n")) + return 0; + if (dev_WARN_ONCE(dev, !nd_region->ndr_mappings || nd_region->align + % nd_region->ndr_mappings, + "invalid region align %#lx mappings: %d\n", + nd_region->align, nd_region->ndr_mappings)) + return 0; + return nd_region->align / nd_region->ndr_mappings; +} + int alias_dpa_busy(struct device *dev, void *data) { resource_size_t map_end, blk_start, new; @@ -571,6 +586,7 @@ int alias_dpa_busy(struct device *dev, void *data) struct nd_region *nd_region; struct nvdimm_drvdata *ndd; struct resource *res; + unsigned long align; int i; if (!is_memory(dev)) @@ -608,13 +624,21 @@ int alias_dpa_busy(struct device *dev, void *data) * Find the free dpa from the end of the last pmem allocation to * the end of the interleave-set mapping. */ + align = dpa_align(nd_region); + if (!align) + return 0; + for_each_dpa_resource(ndd, res) { + resource_size_t start, end; + if (strncmp(res->name, "pmem", 4) != 0) continue; - if ((res->start >= blk_start && res->start < map_end) - || (res->end >= blk_start - && res->end <= map_end)) { - new = max(blk_start, min(map_end + 1, res->end + 1)); + + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + if ((start >= blk_start && start < map_end) + || (end >= blk_start && end <= map_end)) { + new = max(blk_start, min(map_end, end) + 1); if (new != blk_start) { blk_start = new; goto retry; @@ -654,6 +678,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) .res = NULL, }; struct resource *res; + unsigned long align; if (!ndd) return 0; @@ -661,10 +686,20 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy); /* now account for busy blk allocations in unaliased dpa */ + align = dpa_align(nd_region); + if (!align) + return 0; for_each_dpa_resource(ndd, res) { + resource_size_t start, end, size; + if (strncmp(res->name, "blk", 3) != 0) continue; - info.available -= resource_size(res); + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + size = end - start + 1; + if (size >= info.available) + return 0; + info.available -= size; } return info.available; @@ -683,19 +718,31 @@ resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region, struct nvdimm_bus *nvdimm_bus; resource_size_t max = 0; struct resource *res; + unsigned long align; /* if a dimm is disabled the available capacity is zero */ if (!ndd) return 0; + align = dpa_align(nd_region); + if (!align) + return 0; + nvdimm_bus = walk_to_nvdimm_bus(ndd->dev); if (__reserve_free_pmem(&nd_region->dev, nd_mapping->nvdimm)) return 0; for_each_dpa_resource(ndd, res) { + resource_size_t start, end; + if (strcmp(res->name, "pmem-reserve") != 0) continue; - if (resource_size(res) > max) - max = resource_size(res); + /* trim free space relative to current alignment setting */ + start = ALIGN(res->start, align); + end = ALIGN_DOWN(res->end + 1, align) - 1; + if (end < start) + continue; + if (end - start + 1 > max) + max = end - start + 1; } release_free_pmem(nvdimm_bus, nd_mapping); return max; @@ -723,24 +770,33 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); struct resource *res; const char *reason; + unsigned long align; if (!ndd) return 0; + align = dpa_align(nd_region); + if (!align) + return 0; + map_start = nd_mapping->start; map_end = map_start + nd_mapping->size - 1; blk_start = max(map_start, map_end + 1 - *overlap); for_each_dpa_resource(ndd, res) { - if (res->start >= map_start && res->start < map_end) { + resource_size_t start, end; + + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + if (start >= map_start && start < map_end) { if (strncmp(res->name, "blk", 3) == 0) blk_start = min(blk_start, - max(map_start, res->start)); - else if (res->end > map_end) { + max(map_start, start)); + else if (end > map_end) { reason = "misaligned to iset"; goto err; } else - busy += resource_size(res); - } else if (res->end >= map_start && res->end <= map_end) { + busy += end - start + 1; + } else if (end >= map_start && end <= map_end) { if (strncmp(res->name, "blk", 3) == 0) { /* * If a BLK allocation overlaps the start of @@ -749,8 +805,8 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, */ blk_start = map_start; } else - busy += resource_size(res); - } else if (map_start > res->start && map_start < res->end) { + busy += end - start + 1; + } else if (map_start > start && map_start < end) { /* total eclipse of the mapping */ busy += nd_mapping->size; blk_start = map_start; @@ -760,7 +816,7 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, *overlap = map_end + 1 - blk_start; available = blk_start - map_start; if (busy < available) - return available - busy; + return ALIGN_DOWN(available - busy, align); return 0; err: diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 01f6c22f0d1a..ae155e860fdc 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -542,6 +542,11 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd, { bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0; bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0; + unsigned long align; + + align = nd_region->align / nd_region->ndr_mappings; + valid->start = ALIGN(valid->start, align); + valid->end = ALIGN_DOWN(valid->end + 1, align) - 1; if (valid->start >= valid->end) goto invalid; @@ -981,10 +986,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val) return -ENXIO; } - div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder); + div_u64_rem(val, nd_region->align, &remainder); if (remainder) { dev_dbg(dev, "%llu is not %ldK aligned\n", val, - (PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K); + nd_region->align / SZ_1K); return -EINVAL; } diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index ca39abe29c7c..c4d69c1cce55 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -146,6 +146,7 @@ struct nd_region { struct device *btt_seed; struct device *pfn_seed; struct device *dax_seed; + unsigned long align; u16 ndr_mappings; u64 ndr_size; u64 ndr_start; diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index a5fc6e4c56ff..bf239e783940 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -216,21 +216,25 @@ int nd_region_to_nstype(struct nd_region *nd_region) } EXPORT_SYMBOL(nd_region_to_nstype); -static ssize_t size_show(struct device *dev, - struct device_attribute *attr, char *buf) +static unsigned long long region_size(struct nd_region *nd_region) { - struct nd_region *nd_region = to_nd_region(dev); - unsigned long long size = 0; - - if (is_memory(dev)) { - size = nd_region->ndr_size; + if (is_memory(&nd_region->dev)) { + return nd_region->ndr_size; } else if (nd_region->ndr_mappings == 1) { struct nd_mapping *nd_mapping = &nd_region->mapping[0]; - size = nd_mapping->size; + return nd_mapping->size; } - return sprintf(buf, "%llu\n", size); + return 0; +} + +static ssize_t size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nd_region *nd_region = to_nd_region(dev); + + return sprintf(buf, "%llu\n", region_size(nd_region)); } static DEVICE_ATTR_RO(size); @@ -529,6 +533,55 @@ static ssize_t read_only_store(struct device *dev, } static DEVICE_ATTR_RW(read_only); +static ssize_t align_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nd_region *nd_region = to_nd_region(dev); + + return sprintf(buf, "%#lx\n", nd_region->align); +} + +static ssize_t align_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + struct nd_region *nd_region = to_nd_region(dev); + unsigned long val, dpa; + u32 remainder; + int rc; + + rc = kstrtoul(buf, 0, &val); + if (rc) + return rc; + + if (!nd_region->ndr_mappings) + return -ENXIO; + + /* + * Ensure space-align is evenly divisible by the region + * interleave-width because the kernel typically has no facility + * to determine which DIMM(s), dimm-physical-addresses, would + * contribute to the tail capacity in system-physical-address + * space for the namespace. + */ + dpa = val; + remainder = do_div(dpa, nd_region->ndr_mappings); + if (!is_power_of_2(dpa) || dpa < PAGE_SIZE + || val > region_size(nd_region) || remainder) + return -EINVAL; + + /* + * Given that space allocation consults this value multiple + * times ensure it does not change for the duration of the + * allocation. + */ + nvdimm_bus_lock(dev); + nd_region->align = val; + nvdimm_bus_unlock(dev); + + return len; +} +static DEVICE_ATTR_RW(align); + static ssize_t region_badblocks_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -571,6 +624,7 @@ static DEVICE_ATTR_RO(persistence_domain); static struct attribute *nd_region_attributes[] = { &dev_attr_size.attr, + &dev_attr_align.attr, &dev_attr_nstype.attr, &dev_attr_mappings.attr, &dev_attr_btt_seed.attr, @@ -626,6 +680,19 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n) return a->mode; } + if (a == &dev_attr_align.attr) { + int i; + + for (i = 0; i < nd_region->ndr_mappings; i++) { + struct nd_mapping *nd_mapping = &nd_region->mapping[i]; + struct nvdimm *nvdimm = nd_mapping->nvdimm; + + if (test_bit(NDD_LABELING, &nvdimm->flags)) + return a->mode; + } + return 0; + } + if (a != &dev_attr_set_cookie.attr && a != &dev_attr_available_size.attr) return a->mode; @@ -935,6 +1002,42 @@ void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane) } EXPORT_SYMBOL(nd_region_release_lane); +/* + * PowerPC requires this alignment for memremap_pages(). All other archs + * should be ok with SUBSECTION_SIZE (see memremap_compat_align()). + */ +#define MEMREMAP_COMPAT_ALIGN_MAX SZ_16M + +static unsigned long default_align(struct nd_region *nd_region) +{ + unsigned long align, per_mapping; + int i, mappings; + u32 remainder; + + if (is_nd_blk(&nd_region->dev)) + align = PAGE_SIZE; + else + align = MEMREMAP_COMPAT_ALIGN_MAX; + + for (i = 0; i < nd_region->ndr_mappings; i++) { + struct nd_mapping *nd_mapping = &nd_region->mapping[i]; + struct nvdimm *nvdimm = nd_mapping->nvdimm; + + if (test_bit(NDD_ALIASING, &nvdimm->flags)) { + align = MEMREMAP_COMPAT_ALIGN_MAX; + break; + } + } + + mappings = max_t(u16, 1, nd_region->ndr_mappings); + per_mapping = align; + remainder = do_div(per_mapping, mappings); + if (remainder) + align *= mappings; + + return align; +} + static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_region_desc *ndr_desc, const struct device_type *dev_type, const char *caller) @@ -1039,6 +1142,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, dev->of_node = ndr_desc->of_node; nd_region->ndr_size = resource_size(ndr_desc->res); nd_region->ndr_start = ndr_desc->res->start; + nd_region->align = default_align(nd_region); if (ndr_desc->flush) nd_region->flush = ndr_desc->flush; else