Message ID | 20150605211912.20751.35406.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
The code is mostly generic, so the x86 in the subject is a little misleading. What keeps the Kconfig symbol in x86 anyway? Is there a reason why it can't be made entirely generic? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btw, I don't think this actually is safe without refcounting your kmap structure. The driver model ->remove callback can be called at any time, which will ioremap the memory and remap the kmap structure. But at this point a user might still be using it. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote: > Btw, I don't think this actually is safe without refcounting your kmap > structure. > > The driver model ->remove callback can be called at any time, which > will ioremap the memory and remap the kmap structure. But at this > point a user might still be using it. Won't the device data structures be pinned by the refcount on the bdev? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote: > On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote: > > Btw, I don't think this actually is safe without refcounting your kmap > > structure. > > > > The driver model ->remove callback can be called at any time, which > > will ioremap the memory and remap the kmap structure. But at this > > point a user might still be using it. > > Won't the device data structures be pinned by the refcount on the bdev? An open filesystem only keeps a reference on the request_queue. The underlying driver model ->remove method will still be called on a surprise removal. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 10, 2015 at 8:11 AM, Christoph Hellwig <hch@lst.de> wrote: > On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote: >> On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote: >> > Btw, I don't think this actually is safe without refcounting your kmap >> > structure. >> > >> > The driver model ->remove callback can be called at any time, which >> > will ioremap the memory and remap the kmap structure. But at this >> > point a user might still be using it. >> >> Won't the device data structures be pinned by the refcount on the bdev? > > An open filesystem only keeps a reference on the request_queue. The > underlying driver model ->remove method will still be called on > a surprise removal. On surprise removal my expectation is that the driver keeps the the ioremap mapping alive for at least a synchronize_rcu() period. With that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep the mapping alive for the short duration, or otherwise prevent new mappings. However, I missed converting the driver to order its iounmap() with respect to the pfn range registration via devres, will fix. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote: > On surprise removal my expectation is that the driver keeps the the > ioremap mapping alive for at least a synchronize_rcu() period. With > that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep > the mapping alive for the short duration, or otherwise prevent new > mappings. However, I missed converting the driver to order its > iounmap() with respect to the pfn range registration via devres, will > fix. Right. I guess the best idea is to replace devm_register_kmap_pfn_range with an interface that does the ioremap, in which case the cleanup function can take care of the proper ordering behind the drivers back. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 10, 2015 at 9:11 AM, Christoph Hellwig <hch@lst.de> wrote: > On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote: >> On surprise removal my expectation is that the driver keeps the the >> ioremap mapping alive for at least a synchronize_rcu() period. With >> that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep >> the mapping alive for the short duration, or otherwise prevent new >> mappings. However, I missed converting the driver to order its >> iounmap() with respect to the pfn range registration via devres, will >> fix. > > Right. I guess the best idea is to replace devm_register_kmap_pfn_range > with an interface that does the ioremap, in which case the cleanup > function can take care of the proper ordering behind the drivers back. Even better than what I was thinking... will do. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/Kconfig b/arch/Kconfig index a65eafb24997..9b98732e0f3b 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -203,6 +203,9 @@ config HAVE_DMA_ATTRS config HAVE_DMA_CONTIGUOUS bool +config HAVE_KMAP_PFN + bool + config GENERIC_SMP_IDLE_THREAD bool diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 226d5696e1d1..130d1a4c2efc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1432,6 +1432,11 @@ config X86_PMEM_LEGACY Say Y if unsure. +config X86_PMEM_DMA + depends on !HIGHMEM + def_bool DEV_PFN + select HAVE_KMAP_PFN + config HIGHPTE bool "Allocate 3rd-level pagetables from highmem" depends on HIGHMEM diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 9286a46b7d69..85fd52d43a9a 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -83,6 +83,29 @@ static inline void __kunmap_atomic(void *addr) #endif /* CONFIG_HIGHMEM */ +#ifdef CONFIG_HAVE_KMAP_PFN +extern void *kmap_atomic_pfn_t(__pfn_t pfn); +extern void kunmap_atomic_pfn_t(void *addr); +extern int devm_register_kmap_pfn_range(struct device *dev, + struct resource *res, void *base); +#else +static inline void *kmap_atomic_pfn_t(__pfn_t pfn) +{ + return kmap_atomic(__pfn_t_to_page(pfn)); +} + +static inline void kunmap_atomic_pfn_t(void *addr) +{ + __kunmap_atomic(addr); +} + +static inline int devm_register_kmap_pfn_range(struct device *dev, + struct resource *res, void *base) +{ + return 0; +} +#endif /* CONFIG_HAVE_KMAP_PFN */ + #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32) DECLARE_PER_CPU(int, __kmap_atomic_idx); diff --git a/mm/Makefile b/mm/Makefile index 98c4eaeabdcb..66e30c2addfe 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -78,3 +78,4 @@ obj-$(CONFIG_CMA) += cma.o obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o +obj-$(CONFIG_HAVE_KMAP_PFN) += pfn.o diff --git a/mm/pfn.c b/mm/pfn.c new file mode 100644 index 000000000000..0e046b49aebf --- /dev/null +++ b/mm/pfn.c @@ -0,0 +1,98 @@ +/* + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#include <linux/rcupdate.h> +#include <linux/rculist.h> +#include <linux/highmem.h> +#include <linux/device.h> +#include <linux/slab.h> +#include <linux/mm.h> + +static LIST_HEAD(ranges); + +struct kmap { + struct list_head list; + struct resource *res; + struct device *dev; + void *base; +}; + +static void teardown_kmap(void *data) +{ + struct kmap *kmap = data; + + dev_dbg(kmap->dev, "kmap unregister %pr\n", kmap->res); + list_del_rcu(&kmap->list); + synchronize_rcu(); + kfree(kmap); +} + +int devm_register_kmap_pfn_range(struct device *dev, struct resource *res, + void *base) +{ + struct kmap *kmap = kzalloc(sizeof(*kmap), GFP_KERNEL); + int rc; + + if (!kmap) + return -ENOMEM; + + INIT_LIST_HEAD(&kmap->list); + kmap->res = res; + kmap->base = base; + kmap->dev = dev; + rc = devm_add_action(dev, teardown_kmap, kmap); + if (rc) { + kfree(kmap); + return rc; + } + dev_dbg(kmap->dev, "kmap register %pr\n", kmap->res); + list_add_rcu(&kmap->list, &ranges); + return 0; +} +EXPORT_SYMBOL_GPL(devm_register_kmap_pfn_range); + +void *kmap_atomic_pfn_t(__pfn_t pfn) +{ + struct page *page = __pfn_t_to_page(pfn); + resource_size_t addr; + struct kmap *kmap; + + if (page) + return kmap_atomic(page); + addr = __pfn_t_to_phys(pfn); + rcu_read_lock(); + list_for_each_entry_rcu(kmap, &ranges, list) + if (addr >= kmap->res->start && addr <= kmap->res->end) + return kmap->base + addr - kmap->res->start; + + /* only unlock in the error case */ + rcu_read_unlock(); + return NULL; +} +EXPORT_SYMBOL(kmap_atomic_pfn_t); + +void kunmap_atomic_pfn_t(void *addr) +{ + /* + * If the original __pfn_t had an entry in the memmap then + * 'addr' will be outside of vmalloc space i.e. it came from + * page_address() + */ + if (!is_vmalloc_addr(addr)) { + kunmap_atomic(addr); + return; + } + + /* signal that we are done with the range */ + rcu_read_unlock(); +} +EXPORT_SYMBOL(kunmap_atomic_pfn_t);
It would be unfortunate if the kmap infrastructure escaped its current 32-bit/HIGHMEM bonds and leaked into 64-bit code. Instead, if the user has enabled CONFIG_DEV_PFN we direct the kmap_atomic_pfn_t() implementation to scan a list of pre-mapped persistent memory address ranges inserted by the pmem driver. The __pfn_t to resource lookup is indeed inefficient walking of a linked list, but there are two mitigating factors: 1/ The number of persistent memory ranges is bounded by the number of DIMMs which is on the order of 10s of DIMMs, not hundreds. 2/ The lookup yields the entire range, if it becomes inefficient to do a kmap_atomic_pfn_t() a PAGE_SIZE at a time the caller can take advantage of the fact that the lookup can be amortized for all kmap operations it needs to perform in a given range. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/Kconfig | 3 + arch/x86/Kconfig | 5 ++ include/linux/highmem.h | 23 +++++++++++ mm/Makefile | 1 mm/pfn.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 130 insertions(+) create mode 100644 mm/pfn.c -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html