diff mbox

[v4,2/9] x86: support kmap_atomic_pfn_t() for persistent memory

Message ID 20150605211912.20751.35406.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams June 5, 2015, 9:19 p.m. UTC
It would be unfortunate if the kmap infrastructure escaped its current
32-bit/HIGHMEM bonds and leaked into 64-bit code.  Instead, if the user
has enabled CONFIG_DEV_PFN we direct the kmap_atomic_pfn_t()
implementation to scan a list of pre-mapped persistent memory address
ranges inserted by the pmem driver.

The __pfn_t to resource lookup is indeed inefficient walking of a linked list,
but there are two mitigating factors:

1/ The number of persistent memory ranges is bounded by the number of
   DIMMs which is on the order of 10s of DIMMs, not hundreds.

2/ The lookup yields the entire range, if it becomes inefficient to do a
   kmap_atomic_pfn_t() a PAGE_SIZE at a time the caller can take
   advantage of the fact that the lookup can be amortized for all kmap
   operations it needs to perform in a given range.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/Kconfig            |    3 +
 arch/x86/Kconfig        |    5 ++
 include/linux/highmem.h |   23 +++++++++++
 mm/Makefile             |    1 
 mm/pfn.c                |   98 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+)
 create mode 100644 mm/pfn.c


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig June 9, 2015, 6:50 a.m. UTC | #1
The code is mostly generic, so the x86 in the subject is a little misleading.
What keeps the Kconfig symbol in x86 anyway?  Is there a reason why
it can't be made entirely generic?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig June 10, 2015, 12:12 p.m. UTC | #2
Btw, I don't think this actually is safe without refcounting your kmap
structure.

The driver model ->remove callback can be called at any time, which
will ioremap the memory and remap the kmap structure.  But at this
point a user might still be using it.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Wilcox June 10, 2015, 3:03 p.m. UTC | #3
On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
> Btw, I don't think this actually is safe without refcounting your kmap
> structure.
> 
> The driver model ->remove callback can be called at any time, which
> will ioremap the memory and remap the kmap structure.  But at this
> point a user might still be using it.

Won't the device data structures be pinned by the refcount on the bdev?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig June 10, 2015, 3:11 p.m. UTC | #4
On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote:
> On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
> > Btw, I don't think this actually is safe without refcounting your kmap
> > structure.
> > 
> > The driver model ->remove callback can be called at any time, which
> > will ioremap the memory and remap the kmap structure.  But at this
> > point a user might still be using it.
> 
> Won't the device data structures be pinned by the refcount on the bdev?

An open filesystem only keeps a reference on the request_queue.  The
underlying driver model ->remove method will still be called on
a surprise removal.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams June 10, 2015, 3:36 p.m. UTC | #5
On Wed, Jun 10, 2015 at 8:11 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Jun 10, 2015 at 11:03:35AM -0400, Matthew Wilcox wrote:
>> On Wed, Jun 10, 2015 at 02:12:02PM +0200, Christoph Hellwig wrote:
>> > Btw, I don't think this actually is safe without refcounting your kmap
>> > structure.
>> >
>> > The driver model ->remove callback can be called at any time, which
>> > will ioremap the memory and remap the kmap structure.  But at this
>> > point a user might still be using it.
>>
>> Won't the device data structures be pinned by the refcount on the bdev?
>
> An open filesystem only keeps a reference on the request_queue.  The
> underlying driver model ->remove method will still be called on
> a surprise removal.

On surprise removal my expectation is that the driver keeps the the
ioremap mapping alive for at least a synchronize_rcu() period.  With
that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep
the mapping alive for the short duration, or otherwise prevent new
mappings.  However, I missed converting the driver to order its
iounmap() with respect to the pfn range registration via devres, will
fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig June 10, 2015, 4:11 p.m. UTC | #6
On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote:
> On surprise removal my expectation is that the driver keeps the the
> ioremap mapping alive for at least a synchronize_rcu() period.  With
> that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep
> the mapping alive for the short duration, or otherwise prevent new
> mappings.  However, I missed converting the driver to order its
> iounmap() with respect to the pfn range registration via devres, will
> fix.

Right.  I guess the best idea is to replace devm_register_kmap_pfn_range
with an interface that does the ioremap, in which case the cleanup
function can take care of the proper ordering behind the drivers back.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams June 10, 2015, 4:17 p.m. UTC | #7
On Wed, Jun 10, 2015 at 9:11 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Jun 10, 2015 at 08:36:48AM -0700, Dan Williams wrote:
>> On surprise removal my expectation is that the driver keeps the the
>> ioremap mapping alive for at least a synchronize_rcu() period.  With
>> that in place the rcu_read_lock() in kmap_atomic_pfn_t() should keep
>> the mapping alive for the short duration, or otherwise prevent new
>> mappings.  However, I missed converting the driver to order its
>> iounmap() with respect to the pfn range registration via devres, will
>> fix.
>
> Right.  I guess the best idea is to replace devm_register_kmap_pfn_range
> with an interface that does the ioremap, in which case the cleanup
> function can take care of the proper ordering behind the drivers back.

Even better than what I was thinking... will do.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index a65eafb24997..9b98732e0f3b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -203,6 +203,9 @@  config HAVE_DMA_ATTRS
 config HAVE_DMA_CONTIGUOUS
 	bool
 
+config HAVE_KMAP_PFN
+	bool
+
 config GENERIC_SMP_IDLE_THREAD
        bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d5696e1d1..130d1a4c2efc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1432,6 +1432,11 @@  config X86_PMEM_LEGACY
 
 	  Say Y if unsure.
 
+config X86_PMEM_DMA
+	depends on !HIGHMEM
+	def_bool DEV_PFN
+	select HAVE_KMAP_PFN
+
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on HIGHMEM
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 9286a46b7d69..85fd52d43a9a 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -83,6 +83,29 @@  static inline void __kunmap_atomic(void *addr)
 
 #endif /* CONFIG_HIGHMEM */
 
+#ifdef CONFIG_HAVE_KMAP_PFN
+extern void *kmap_atomic_pfn_t(__pfn_t pfn);
+extern void kunmap_atomic_pfn_t(void *addr);
+extern int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base);
+#else
+static inline void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	return kmap_atomic(__pfn_t_to_page(pfn));
+}
+
+static inline void kunmap_atomic_pfn_t(void *addr)
+{
+	__kunmap_atomic(addr);
+}
+
+static inline int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base)
+{
+	return 0;
+}
+#endif /* CONFIG_HAVE_KMAP_PFN */
+
 #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
 
 DECLARE_PER_CPU(int, __kmap_atomic_idx);
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..66e30c2addfe 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -78,3 +78,4 @@  obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
+obj-$(CONFIG_HAVE_KMAP_PFN) += pfn.o
diff --git a/mm/pfn.c b/mm/pfn.c
new file mode 100644
index 000000000000..0e046b49aebf
--- /dev/null
+++ b/mm/pfn.c
@@ -0,0 +1,98 @@ 
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/rcupdate.h>
+#include <linux/rculist.h>
+#include <linux/highmem.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+
+static LIST_HEAD(ranges);
+
+struct kmap {
+	struct list_head list;
+	struct resource *res;
+	struct device *dev;
+	void *base;
+};
+
+static void teardown_kmap(void *data)
+{
+	struct kmap *kmap = data;
+
+	dev_dbg(kmap->dev, "kmap unregister %pr\n", kmap->res);
+	list_del_rcu(&kmap->list);
+	synchronize_rcu();
+	kfree(kmap);
+}
+
+int devm_register_kmap_pfn_range(struct device *dev, struct resource *res,
+		void *base)
+{
+	struct kmap *kmap = kzalloc(sizeof(*kmap), GFP_KERNEL);
+	int rc;
+
+	if (!kmap)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&kmap->list);
+	kmap->res = res;
+	kmap->base = base;
+	kmap->dev = dev;
+	rc = devm_add_action(dev, teardown_kmap, kmap);
+	if (rc) {
+		kfree(kmap);
+		return rc;
+	}
+	dev_dbg(kmap->dev, "kmap register %pr\n", kmap->res);
+	list_add_rcu(&kmap->list, &ranges);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devm_register_kmap_pfn_range);
+
+void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	struct page *page = __pfn_t_to_page(pfn);
+	resource_size_t addr;
+	struct kmap *kmap;
+
+	if (page)
+		return kmap_atomic(page);
+	addr = __pfn_t_to_phys(pfn);
+	rcu_read_lock();
+	list_for_each_entry_rcu(kmap, &ranges, list)
+		if (addr >= kmap->res->start && addr <= kmap->res->end)
+			return kmap->base + addr - kmap->res->start;
+
+	/* only unlock in the error case */
+	rcu_read_unlock();
+	return NULL;
+}
+EXPORT_SYMBOL(kmap_atomic_pfn_t);
+
+void kunmap_atomic_pfn_t(void *addr)
+{
+	/*
+	 * If the original __pfn_t had an entry in the memmap then
+	 * 'addr' will be outside of vmalloc space i.e. it came from
+	 * page_address()
+	 */
+	if (!is_vmalloc_addr(addr)) {
+		kunmap_atomic(addr);
+		return;
+	}
+
+	/* signal that we are done with the range */
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(kunmap_atomic_pfn_t);