
[1/7] drivers/block/pmem: Add a driver for persistent memory

Message ID 1435058738-8917-2-git-send-email-david.nystrom@ericsson.com (mailing list archive)
State Superseded

Commit Message

David Nyström June 23, 2015, 11:25 a.m. UTC
From: Ross Zwisler <ross.zwisler@linux.intel.com>

This is a combination of 4 commits.

drivers/block/pmem: Add a driver for persistent memory

Commit-ID:  9e853f2313e5eb163cb1ea461b23c2332cf6438a
Gitweb:     http://git.kernel.org/tip/9e853f2313e5eb163cb1ea461b23c2332cf6438a
Author:     Ross Zwisler <ross.zwisler@linux.intel.com>
AuthorDate: Wed, 1 Apr 2015 09:12:19 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 1 Apr 2015 17:03:56 +0200

PMEM is a new driver that presents a reserved range of memory as
a block device.  This is useful for developing with NV-DIMMs,
and can be used with volatile memory as a development platform.

This patch contains the initial driver from Ross Zwisler, with
various changes: converted it to use a platform_device for
discovery, fixed partition support and merged various patches
from Boaz Harrosh.

Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@ml01.01.org
Link: http://lkml.kernel.org/r/1427872339-6688-3-git-send-email-hch@lst.de
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

pmem: Add prints at pmem_probe/remove

Add short log messages when pmem devices are created and
removed, so that the dmesg log shows when the pmem driver
was loaded or unloaded and which devices were created.

The prints will look like this:
Printed by e820 on load:
 [  +0.000000] user: [mem 0x0000000100000000-0x000000015fffffff] persistent (type 12)
 [  +0.000000] user: [mem 0x0000000160000000-0x00000001dfffffff] persistent (type 12)
 ...
Printed by modprobe pmem:
 [  +0.003065] pmem pmem.0.auto: probe [0x0000000100000000:0x60000000]
 [  +0.001816] pmem pmem.1.auto: probe [0x0000000160000000:0x80000000]
 ...
Printed by modprobe -r pmem:
 [ +16.299145] pmem pmem.1.auto: remove
 [  +0.011155] pmem pmem.0.auto: remove
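
The e820 lines above give each range as inclusive [start-end], while the
probe lines give [start:size]. A minimal userspace sketch of how the two
relate, assuming the same arithmetic as the kernel's resource_size()
(range_size is a hypothetical name used only for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size of an inclusive [start, end] physical range. This mirrors the
 * end - start + 1 arithmetic of the kernel's resource_size(); the
 * function name here is made up for the example.
 */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}
```

With the first e820 range above, range_size(0x100000000, 0x15fffffff)
gives 0x60000000, which matches the size printed by the pmem.0.auto probe.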

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>

pmem: Split out pmem_mapmem from pmem_alloc

This is preparation for supporting different mapping
schemes later.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>

pmem: Support map= module param

Introduce a new map= module param for the pmem driver.

The map= param is an alternative way to create pmem
devices. If map= is left empty (the default), the
platform devices are loaded just as before.

If map= is not empty, the platform devices are not
considered and only the ranges specified in map=
are created.

map= param is of the form:
		 map=mapS[,mapS...]

		 where mapS=nn[KMG]$ss[KMG],
		 or    mapS=nn[KMG]@ss[KMG],

		 nn=size, ss=offset

This is the same syntax as the kernel command line map and
memmap parameters, so whatever was used in the grub
configuration can be copied and pasted here.

The "@" form is exactly the same as the "$" form. At a bash
prompt the "$" has to be escaped as \$, so the '@' char is
also supported for convenience.
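
As an illustration of the mapS syntax, here is a minimal userspace sketch
of parsing one entry. It is not the driver code: parse_size is a
hypothetical stand-in for the kernel's memparse() that only handles the
K, M and G suffixes, and parse_map_one returns -1 where the driver
returns -EINVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for memparse(): parse a number with an
 * optional K, M or G suffix and advance *end past what was consumed.
 */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "nn[KMG]@ss[KMG]" or "nn[KMG]$ss[KMG]" (nn=size, ss=offset).
 * Returns 0 on success and -1 on a malformed entry.
 */
static int parse_map_one(const char *map, uint64_t *start, uint64_t *size)
{
	char *p;

	assert(map && start && size);

	*size = parse_size(map, &p);
	if (p == map || (*p != '$' && *p != '@'))
		return -1;	/* no size, or no separator after it */
	if (*++p == '\0')
		return -1;	/* separator with nothing after it */
	*start = parse_size(p, &p);
	return *p == '\0' ? 0 : -1;	/* reject trailing characters */
}
```

For example, "2G@4G" parses to a size of 2 GiB at offset 4 GiB, while
"2G", "2G@" and "@4G" are all rejected.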

For each specified mapS there will be a device created.

On driver unload, all successfully created devices
are removed.

NOTE: Loading stops at the first error. If at least one
mapS was created before that, modprobe still returns
success and the driver stays loaded. Error messages
might be displayed in dmesg.
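
The load semantics described in the NOTE can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the driver
code: create_fn and demo_create are hypothetical stand-ins for device
creation, and -1 stands in for the kernel error codes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback standing in for device creation: 0 on success. */
typedef int (*create_fn)(const char *entry);

/* Demo callback for the example: any entry starting with '!' fails. */
static int demo_create(const char *entry)
{
	return entry[0] == '!' ? -1 : 0;
}

/*
 * Walk a comma-separated map string and stop at the first failure.
 * Returns 0 if at least one entry was created; otherwise returns the
 * callback's error, or -1 if nothing was attempted.
 * *created receives the number of entries created.
 */
static int load_from_map(const char *map, create_fn create, int *created)
{
	char *dup, *cur, *comma;
	int err = -1;

	assert(map && create && created);
	*created = 0;

	dup = malloc(strlen(map) + 1);
	if (!dup)
		return -1;
	strcpy(dup, map);

	for (cur = dup; cur; cur = comma ? comma + 1 : NULL) {
		comma = strchr(cur, ',');
		if (comma)
			*comma = '\0';
		if (*cur == '\0')
			continue;	/* skip empty entries such as "a,,b" */
		err = create(cur);
		if (err)
			break;		/* the first error stops the loop */
		(*created)++;
	}

	free(dup);
	return *created > 0 ? 0 : err;
}
```

With demo_create, the string "1G@4G,!bad,2G@8G" creates one entry and
still reports success, and the entry after the failing one is never
attempted.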

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 MAINTAINERS            |   6 +
 drivers/block/Kconfig  |  11 ++
 drivers/block/Makefile |   1 +
 drivers/block/pmem.c   | 377 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 395 insertions(+)
 create mode 100644 drivers/block/pmem.c

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index d3b1571..d5bf0da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6711,6 +6711,12 @@  S:	Maintained
 F:	Documentation/blockdev/ramdisk.txt
 F:	drivers/block/brd.c
 
+PERSISTENT MEMORY DRIVER
+M:	Ross Zwisler <ross.zwisler@linux.intel.com>
+L:	linux-nvdimm@lists.01.org
+S:	Supported
+F:	drivers/block/pmem.c
+
 RANDOM NUMBER DRIVER
 M:	Theodore Ts'o" <tytso@mit.edu>
 S:	Maintained
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index b81ddfe..860e8d1 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -387,6 +387,17 @@  config BLK_DEV_XIP
 	  will prevent RAM block device backing store memory from being
 	  allocated from highmem (only a problem for highmem systems).
 
+config BLK_DEV_PMEM
+	tristate "Persistent memory block device support"
+	help
+	  Saying Y here will allow you to use a contiguous range of reserved
+	  memory as one or more persistent block devices.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called 'pmem'.
+
+	  If unsure, say N.
+
 config CDROM_PKTCDVD
 	tristate "Packet writing on CD/DVD media"
 	depends on !UML
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index ca07399..6256f6e 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -14,6 +14,7 @@  obj-$(CONFIG_PS3_VRAM)		+= ps3vram.o
 obj-$(CONFIG_ATARI_FLOPPY)	+= ataflop.o
 obj-$(CONFIG_AMIGA_Z2RAM)	+= z2ram.o
 obj-$(CONFIG_BLK_DEV_RAM)	+= brd.o
+obj-$(CONFIG_BLK_DEV_PMEM)	+= pmem.o
 obj-$(CONFIG_BLK_DEV_LOOP)	+= loop.o
 obj-$(CONFIG_BLK_CPQ_DA)	+= cpqarray.o
 obj-$(CONFIG_BLK_CPQ_CISS_DA)  += cciss.o
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
new file mode 100644
index 0000000..f756e5b
--- /dev/null
+++ b/drivers/block/pmem.c
@@ -0,0 +1,377 @@ 
+/*
+ * Persistent Memory Driver
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig <hch@lst.de>.
+ * Copyright (c) 2015, Boaz Harrosh <boaz@plexistor.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/cacheflush.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/slab.h>
+
+#define PMEM_MINORS		16
+
+struct pmem_device {
+	struct list_head	pmem_list;
+	struct request_queue	*pmem_queue;
+	struct gendisk		*pmem_disk;
+
+	/* One contiguous memory region per device */
+	phys_addr_t		phys_addr;
+	void			*virt_addr;
+	size_t			size;
+};
+
+static int pmem_major;
+static atomic_t pmem_index;
+
+static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+			unsigned int len, unsigned int off, int rw,
+			sector_t sector)
+{
+	void *mem = kmap_atomic(page);
+	size_t pmem_off = sector << 9;
+
+	if (rw == READ) {
+		memcpy(mem + off, pmem->virt_addr + pmem_off, len);
+		flush_dcache_page(page);
+	} else {
+		flush_dcache_page(page);
+		memcpy(pmem->virt_addr + pmem_off, mem + off, len);
+	}
+
+	kunmap_atomic(mem);
+}
+
+static void pmem_make_request(struct request_queue *q, struct bio *bio)
+{
+	struct block_device *bdev = bio->bi_bdev;
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+	int rw;
+	struct bio_vec bvec;
+	sector_t sector;
+	struct bvec_iter iter;
+	int err = 0;
+
+	if (bio_end_sector(bio) > get_capacity(bdev->bd_disk)) {
+		err = -EIO;
+		goto out;
+	}
+
+	BUG_ON(bio->bi_rw & REQ_DISCARD);
+
+	rw = bio_data_dir(bio);
+	sector = bio->bi_iter.bi_sector;
+	bio_for_each_segment(bvec, bio, iter) {
+		pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len, bvec.bv_offset,
+			     rw, sector);
+		sector += bvec.bv_len >> 9;
+	}
+
+out:
+	bio_endio(bio, err);
+}
+
+static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+		       struct page *page, int rw)
+{
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+
+	pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
+	page_endio(page, rw & WRITE, 0);
+
+	return 0;
+}
+
+static long pmem_direct_access(struct block_device *bdev, sector_t sector,
+			      void **kaddr, unsigned long *pfn, long size)
+{
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+	size_t offset = sector << 9;
+
+	if (!pmem)
+		return -ENODEV;
+
+	*kaddr = pmem->virt_addr + offset;
+	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
+
+	return pmem->size - offset;
+}
+
+static const struct block_device_operations pmem_fops = {
+	.owner =		THIS_MODULE,
+	.rw_page =		pmem_rw_page,
+	.direct_access =	pmem_direct_access,
+};
+
+/* pmem->phys_addr and pmem->size need to be set.
+ * Will then set virt_addr if successful.
+ */
+static int pmem_mapmem(struct pmem_device *pmem, struct device *dev)
+{
+	if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
+		dev_warn(dev, "could not reserve region [%pa:0x%zx]\n",
+			   &pmem->phys_addr, pmem->size);
+		return -EINVAL;
+	}
+
+	/*
+	 * Map the memory as non-cacheable, as we can't write back the contents
+	 * of the CPU caches in case of a crash.
+	 */
+	pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
+	if (!pmem->virt_addr) {
+		dev_warn(dev, "could not ioremap_nocache [%pa:0x%zx]\n",
+			 &pmem->phys_addr, pmem->size);
+		release_mem_region(pmem->phys_addr, pmem->size);
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+static void pmem_unmapmem(struct pmem_device *pmem)
+{
+	if (unlikely(!pmem->virt_addr))
+		return;
+
+	iounmap(pmem->virt_addr);
+	release_mem_region(pmem->phys_addr, pmem->size);
+	pmem->virt_addr = NULL;
+}
+
+static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
+{
+	struct pmem_device *pmem;
+	struct gendisk *disk;
+	int idx, err;
+
+	err = -ENOMEM;
+	pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
+	if (!pmem)
+		goto out;
+
+	pmem->phys_addr = res->start;
+	pmem->size = resource_size(res);
+
+	err = pmem_mapmem(pmem, dev);
+	if (err)
+		goto out_free_dev;
+	err = -ENOMEM;	/* err is 0 here; set it for the allocations below */
+	pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
+	if (!pmem->pmem_queue)
+		goto out_unmap;
+
+	blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
+	blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
+	blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+
+	disk = alloc_disk(PMEM_MINORS);
+	if (!disk)
+		goto out_free_queue;
+
+	idx = atomic_inc_return(&pmem_index) - 1;
+
+	disk->major		= pmem_major;
+	disk->first_minor	= PMEM_MINORS * idx;
+	disk->fops		= &pmem_fops;
+	disk->private_data	= pmem;
+	disk->queue		= pmem->pmem_queue;
+	disk->flags		= GENHD_FL_EXT_DEVT;
+	sprintf(disk->disk_name, "pmem%d", idx);
+	disk->driverfs_dev = dev;
+	set_capacity(disk, pmem->size >> 9);
+	pmem->pmem_disk = disk;
+
+	add_disk(disk);
+
+	return pmem;
+
+out_free_queue:
+	blk_cleanup_queue(pmem->pmem_queue);
+out_unmap:
+	pmem_unmapmem(pmem);
+out_free_dev:
+	kfree(pmem);
+out:
+	return ERR_PTR(err);
+}
+
+static void pmem_free(struct pmem_device *pmem)
+{
+	del_gendisk(pmem->pmem_disk);
+	put_disk(pmem->pmem_disk);
+	blk_cleanup_queue(pmem->pmem_queue);
+	pmem_unmapmem(pmem);
+	kfree(pmem);
+}
+
+static int pmem_probe(struct platform_device *pdev)
+{
+	struct pmem_device *pmem;
+	struct resource *res;
+
+	if (WARN_ON(pdev->num_resources > 1))
+		return -ENXIO;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -ENXIO;
+
+	pmem = pmem_alloc(&pdev->dev, res);
+	if (IS_ERR(pmem))
+		return PTR_ERR(pmem);
+
+	platform_set_drvdata(pdev, pmem);
+	dev_info(&pdev->dev, "probe [%pa:0x%zx]\n",
+		 &pmem->phys_addr, pmem->size);
+
+	return 0;
+}
+
+static int pmem_remove(struct platform_device *pdev)
+{
+	struct pmem_device *pmem = platform_get_drvdata(pdev);
+
+	dev_info(&pdev->dev, "remove\n");
+	pmem_free(pmem);
+	return 0;
+}
+
+static struct platform_driver pmem_driver = {
+	.probe		= pmem_probe,
+	.remove		= pmem_remove,
+	.driver		= {
+		.owner	= THIS_MODULE,
+		.name	= "pmem",
+	},
+};
+
+static char *map;
+module_param(map, charp, S_IRUGO);
+MODULE_PARM_DESC(map,
+	"pmem device mapping: map=mapS[,mapS...] where:\n"
+	"mapS=nn[KMG]$ss[KMG] or mapS=nn[KMG]@ss[KMG], nn=size, ss=offset.");
+
+static LIST_HEAD(pmem_devices);
+
+static int __init
+pmem_parse_map_one(char *map, phys_addr_t *start, size_t *size)
+{
+	char *p = map;
+
+	*size = (size_t)memparse(p, &p);
+	if ((p == map) || ((*p != '$') && (*p != '@')))
+		return -EINVAL;
+
+	if (!*(++p))
+		return -EINVAL;
+
+	*start = (phys_addr_t)memparse(p, &p);
+
+	return *p == '\0' ? 0 : -EINVAL;
+}
+
+static int __init _load_from_map(void)
+{
+	struct pmem_device *pmem;
+	char *p, *pmem_map, *map_dup;
+	int err = -ENODEV;
+
+	map_dup = pmem_map = kstrdup(map, GFP_KERNEL);
+	if (unlikely(!pmem_map)) {
+		pr_debug("pmem_init strdup(%s) failed\n", map);
+		return -ENOMEM;
+	}
+
+	while ((p = strsep(&pmem_map, ",")) != NULL) {
+		struct resource res = {.start = 0};
+		size_t disk_size;
+
+		if (!*p)
+			continue;
+		err = pmem_parse_map_one(p, &res.start, &disk_size);
+		if (err)
+			goto out;
+		/* TODO: check alignments */
+
+		res.end = res.start + disk_size - 1;
+		pmem = pmem_alloc(NULL, &res);
+		if (IS_ERR(pmem)) {
+			err = PTR_ERR(pmem);
+			goto out;
+		}
+		list_add_tail(&pmem->pmem_list, &pmem_devices);
+	}
+
+out:
+	/* If we have at least one device we stay loaded and rmmod can
+	 * clean those that were loaded.
+	 */
+	if (!list_empty(&pmem_devices))
+		err = 0;
+
+	pr_info("pmem: init map=%s successful(%d) => %d\n",
+		map, atomic_read(&pmem_index), err);
+	kfree(map_dup);
+	return err;
+}
+
+static void _unload_from_map(void)
+{
+	struct pmem_device *pmem, *next;
+
+	list_for_each_entry_safe(pmem, next, &pmem_devices, pmem_list) {
+		list_del(&pmem->pmem_list);
+		pmem_free(pmem);
+	}
+
+	pr_info("pmem: exit\n");
+}
+
+static int __init pmem_init(void)
+{
+	int error;
+
+	pmem_major = register_blkdev(0, "pmem");
+	if (pmem_major < 0)
+		return pmem_major;
+
+	if (map && *map)
+		error = _load_from_map();
+	else
+		error = platform_driver_register(&pmem_driver);
+	if (error)
+		unregister_blkdev(pmem_major, "pmem");
+	return error;
+}
+module_init(pmem_init);
+
+static void __exit pmem_exit(void)
+{
+	if (list_empty(&pmem_devices))
+		platform_driver_unregister(&pmem_driver);
+	else
+		_unload_from_map();
+
+	unregister_blkdev(pmem_major, "pmem");
+}
+module_exit(pmem_exit);
+
+MODULE_AUTHOR("Ross Zwisler <ross.zwisler@linux.intel.com>");
+MODULE_LICENSE("GPL v2");