| Message ID | 20150605211912.20751.35406.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
|---|---|
| State | Superseded |
The code is mostly generic, so the x86 in the subject is a little misleading. What keeps the Kconfig symbol in x86 anyway? Is there a reason why it can't be made entirely generic?
Btw, I don't think this actually is safe without refcounting your kmap structure.

The driver model ->remove callback can be called at any time, which will iounmap the memory and free the kmap structure. But at this point a user might still be using it.
On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
> Btw, I don't think this actually is safe without refcounting your kmap
> structure.
>
> The driver model ->remove callback can be called at any time, which
> will iounmap the memory and free the kmap structure. But at this
> point a user might still be using it.

Won't the device data structures be pinned by the refcount on the bdev?
On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote:
> On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
> > Btw, I don't think this actually is safe without refcounting your kmap
> > structure.
> >
> > The driver model ->remove callback can be called at any time, which
> > will iounmap the memory and free the kmap structure. But at this
> > point a user might still be using it.
>
> Won't the device data structures be pinned by the refcount on the bdev?

An open filesystem only keeps a reference on the request_queue. The underlying driver model ->remove method will still be called on a surprise removal.
On Wed, Jun 10, 2015 at 8:11 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote:
>> On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
>> > Btw, I don't think this actually is safe without refcounting your kmap
>> > structure.
>> >
>> > The driver model ->remove callback can be called at any time, which
>> > will iounmap the memory and free the kmap structure. But at this
>> > point a user might still be using it.
>>
>> Won't the device data structures be pinned by the refcount on the bdev?
>
> An open filesystem only keeps a reference on the request_queue. The
> underlying driver model ->remove method will still be called on
> a surprise removal.

On surprise removal my expectation is that the driver keeps the ioremap mapping alive for at least a synchronize_rcu() period. With that in place, the rcu_read_lock() in kmap_atomic_pfn_t() should keep the mapping alive for the short duration, or otherwise prevent new mappings. However, I missed converting the driver to order its iounmap() with respect to the pfn range registration via devres; will fix.
On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote:
> On surprise removal my expectation is that the driver keeps the
> ioremap mapping alive for at least a synchronize_rcu() period. With
> that in place, the rcu_read_lock() in kmap_atomic_pfn_t() should keep
> the mapping alive for the short duration, or otherwise prevent new
> mappings. However, I missed converting the driver to order its
> iounmap() with respect to the pfn range registration via devres; will
> fix.

Right. I guess the best idea is to replace devm_register_kmap_pfn_range with an interface that does the ioremap, in which case the cleanup function can take care of the proper ordering behind the driver's back.
On Wed, Jun 10, 2015 at 9:11 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote:
>> On surprise removal my expectation is that the driver keeps the
>> ioremap mapping alive for at least a synchronize_rcu() period. With
>> that in place, the rcu_read_lock() in kmap_atomic_pfn_t() should keep
>> the mapping alive for the short duration, or otherwise prevent new
>> mappings. However, I missed converting the driver to order its
>> iounmap() with respect to the pfn range registration via devres; will
>> fix.
>
> Right. I guess the best idea is to replace devm_register_kmap_pfn_range
> with an interface that does the ioremap, in which case the cleanup
> function can take care of the proper ordering behind the driver's back.

Even better than what I was thinking... will do.
```diff
diff --git a/arch/Kconfig b/arch/Kconfig
index a65eafb24997..9b98732e0f3b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -203,6 +203,9 @@ config HAVE_DMA_ATTRS
 config HAVE_DMA_CONTIGUOUS
 	bool
 
+config HAVE_KMAP_PFN
+	bool
+
 config GENERIC_SMP_IDLE_THREAD
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d5696e1d1..130d1a4c2efc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1432,6 +1432,11 @@ config X86_PMEM_LEGACY
 
 	  Say Y if unsure.
 
+config X86_PMEM_DMA
+	depends on !HIGHMEM
+	def_bool DEV_PFN
+	select HAVE_KMAP_PFN
+
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on HIGHMEM
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 9286a46b7d69..85fd52d43a9a 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -83,6 +83,29 @@ static inline void __kunmap_atomic(void *addr)
 
 #endif /* CONFIG_HIGHMEM */
 
+#ifdef CONFIG_HAVE_KMAP_PFN
+extern void *kmap_atomic_pfn_t(__pfn_t pfn);
+extern void kunmap_atomic_pfn_t(void *addr);
+extern int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base);
+#else
+static inline void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	return kmap_atomic(__pfn_t_to_page(pfn));
+}
+
+static inline void kunmap_atomic_pfn_t(void *addr)
+{
+	__kunmap_atomic(addr);
+}
+
+static inline int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base)
+{
+	return 0;
+}
+#endif /* CONFIG_HAVE_KMAP_PFN */
+
 #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
 
 DECLARE_PER_CPU(int, __kmap_atomic_idx);
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..66e30c2addfe 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -78,3 +78,4 @@ obj-$(CONFIG_CMA) += cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
+obj-$(CONFIG_HAVE_KMAP_PFN) += pfn.o
diff --git a/mm/pfn.c b/mm/pfn.c
new file mode 100644
index 000000000000..0e046b49aebf
--- /dev/null
+++ b/mm/pfn.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/rcupdate.h>
+#include <linux/rculist.h>
+#include <linux/highmem.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+
+static LIST_HEAD(ranges);
+
+struct kmap {
+	struct list_head list;
+	struct resource *res;
+	struct device *dev;
+	void *base;
+};
+
+static void teardown_kmap(void *data)
+{
+	struct kmap *kmap = data;
+
+	dev_dbg(kmap->dev, "kmap unregister %pr\n", kmap->res);
+	list_del_rcu(&kmap->list);
+	synchronize_rcu();
+	kfree(kmap);
+}
+
+int devm_register_kmap_pfn_range(struct device *dev, struct resource *res,
+		void *base)
+{
+	struct kmap *kmap = kzalloc(sizeof(*kmap), GFP_KERNEL);
+	int rc;
+
+	if (!kmap)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&kmap->list);
+	kmap->res = res;
+	kmap->base = base;
+	kmap->dev = dev;
+	rc = devm_add_action(dev, teardown_kmap, kmap);
+	if (rc) {
+		kfree(kmap);
+		return rc;
+	}
+	dev_dbg(kmap->dev, "kmap register %pr\n", kmap->res);
+	list_add_rcu(&kmap->list, &ranges);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devm_register_kmap_pfn_range);
+
+void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	struct page *page = __pfn_t_to_page(pfn);
+	resource_size_t addr;
+	struct kmap *kmap;
+
+	if (page)
+		return kmap_atomic(page);
+	addr = __pfn_t_to_phys(pfn);
+	rcu_read_lock();
+	list_for_each_entry_rcu(kmap, &ranges, list)
+		if (addr >= kmap->res->start && addr <= kmap->res->end)
+			return kmap->base + addr - kmap->res->start;
+
+	/* only unlock in the error case */
+	rcu_read_unlock();
+	return NULL;
+}
+EXPORT_SYMBOL(kmap_atomic_pfn_t);
+
+void kunmap_atomic_pfn_t(void *addr)
+{
+	/*
+	 * If the original __pfn_t had an entry in the memmap then
+	 * 'addr' will be outside of vmalloc space i.e. it came from
+	 * page_address()
+	 */
+	if (!is_vmalloc_addr(addr)) {
+		kunmap_atomic(addr);
+		return;
+	}
+
+	/* signal that we are done with the range */
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(kunmap_atomic_pfn_t);
```
It would be unfortunate if the kmap infrastructure escaped its current 32-bit/HIGHMEM bonds and leaked into 64-bit code. Instead, if the user has enabled CONFIG_DEV_PFN we direct the kmap_atomic_pfn_t() implementation to scan a list of pre-mapped persistent memory address ranges inserted by the pmem driver.

The __pfn_t to resource lookup is indeed an inefficient walk of a linked list, but there are two mitigating factors:

1/ The number of persistent memory ranges is bounded by the number of DIMMs, which is on the order of 10s of DIMMs, not hundreds.

2/ The lookup yields the entire range; if it becomes inefficient to do a kmap_atomic_pfn_t() a PAGE_SIZE at a time, the caller can take advantage of the fact that the lookup can be amortized for all kmap operations it needs to perform in a given range.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/Kconfig            |  3 +
 arch/x86/Kconfig        |  5 ++
 include/linux/highmem.h | 23 +++++++++++
 mm/Makefile             |  1
 mm/pfn.c                | 98 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+)
 create mode 100644 mm/pfn.c