From: David Nyström
Subject: [PATCH 1/7] drivers/block/pmem: Add a driver for persistent memory
Date: Tue, 23 Jun 2015 13:25:32 +0200
Message-ID: <1435058738-8917-2-git-send-email-david.nystrom@ericsson.com>
In-Reply-To: <1435058738-8917-1-git-send-email-david.nystrom@ericsson.com>
References: <1435058738-8917-1-git-send-email-david.nystrom@ericsson.com>
X-Mailer: git-send-email 2.1.1
X-Patchwork-Id: 6659721
Cc: Jens Axboe, Thomas Gleixner, Andrew Morton, Linus Torvalds,
 linux-nvdimm@ml01.01.org, Keith Busch, Andy Lutomirski, Jens Axboe,
 Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
 Ffredrik.markstrom@gmail.com, Christoph Hellwig
List-Id: "Linux-nvdimm developer list."
From: Ross Zwisler

This is a combination of 4 commits.

drivers/block/pmem: Add a driver for persistent memory

Commit-ID:  9e853f2313e5eb163cb1ea461b23c2332cf6438a
Gitweb:     http://git.kernel.org/tip/9e853f2313e5eb163cb1ea461b23c2332cf6438a
Author:     Ross Zwisler
AuthorDate: Wed, 1 Apr 2015 09:12:19 +0200
Committer:  Ingo Molnar
CommitDate: Wed, 1 Apr 2015 17:03:56 +0200

PMEM is a new driver that presents a reserved range of memory as a
block device. This is useful for developing with NV-DIMMs, and
can be used with volatile memory as a development platform.

This patch contains the initial driver from Ross Zwisler, with
various changes: converted it to use a platform_device for
discovery, fixed partition support and merged various patches
from Boaz Harrosh.

Tested-by: Ross Zwisler
Signed-off-by: Ross Zwisler
Signed-off-by: Christoph Hellwig
Acked-by: Dan Williams
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Boaz Harrosh
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Jens Axboe
Cc: Jens Axboe
Cc: Keith Busch
Cc: Linus Torvalds
Cc: Matthew Wilcox
Cc: Thomas Gleixner
Cc: linux-nvdimm@ml01.01.org
Link: http://lkml.kernel.org/r/1427872339-6688-3-git-send-email-hch@lst.de
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar

pmem: Add prints at pmem_probe/remove

Add small prints at creation and removal of pmem devices, so that
the dmesg log shows when users loaded or unloaded the pmem driver
and which devices were created.

The prints will look like this:

Printed by e820 on load:
[  +0.000000] user: [mem 0x0000000100000000-0x000000015fffffff] persistent (type 12)
[  +0.000000] user: [mem 0x0000000160000000-0x00000001dfffffff] persistent (type 12)
...

Printed by modprobe pmem:
[  +0.003065] pmem pmem.0.auto: probe [0x0000000100000000:0x60000000]
[  +0.001816] pmem pmem.1.auto: probe [0x0000000160000000:0x80000000]
...

Printed by modprobe -r pmem:
[ +16.299145] pmem pmem.1.auto: remove
[  +0.011155] pmem pmem.0.auto: remove

Signed-off-by: Boaz Harrosh

pmem: Split out pmem_mapmem from pmem_alloc

This is preparation for supporting different mapping schemes
later.

Signed-off-by: Boaz Harrosh

pmem: Support map= module param

Introduce a new map= module param for the pmem driver.

The map= param is an alternative way to create pmem devices.
If map= is left empty (the default), the platform devices are
loaded just as before. If map= is not empty, the platform devices
are not considered and only the ranges specified in map= are
created.

The map= param is of the form:
	map=mapS[,mapS...]

where mapS=nn[KMG]$ss[KMG],
or    mapS=nn[KMG]@ss[KMG],

nn=size, ss=offset

This is the same syntax as the kernel command line map and memmap
parameters, so anything you passed at the GRUB prompt can be copied
and pasted here.

The "@" form is exactly the same as the "$" form. It is supported
for convenience, because at a bash prompt the "$" has to be escaped
as \$.

One device is created for each mapS specified. On unload of the
driver, all successfully created devices are removed.

NOTE: If at least one mapS is created successfully, modprobe returns
      success and the driver stays loaded. However, loading stops at
      the first error. Some error messages might be displayed in
      dmesg.
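The mapS syntax described above can be tried out in userspace with a
minimal sketch along the following lines. It mirrors the structure of
pmem_parse_map_one() from this patch, but sketch_memparse() is only a
hypothetical stand-in for the kernel's memparse() that handles the K, M
and G suffixes named in the description; all sketch_* names are
illustrative assumptions and not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the kernel's memparse(): parse a
 * number (base auto-detected) with an optional K, M or G suffix and
 * leave *end pointing just past what was consumed.
 */
static uint64_t sketch_memparse(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse one "nn[KMG]$ss[KMG]" or "nn[KMG]@ss[KMG]" entry. */
static int sketch_parse_map_one(const char *map, uint64_t *start,
				uint64_t *size)
{
	char *p;

	*size = sketch_memparse(map, &p);
	/* A size must be present and followed by '$' or '@'. */
	if (p == map || (*p != '$' && *p != '@'))
		return -EINVAL;

	/* An offset must follow the separator. */
	if (!*(++p))
		return -EINVAL;

	*start = sketch_memparse(p, &p);

	/* Trailing characters after the offset are an error. */
	return *p == '\0' ? 0 : -EINVAL;
}
```

With this sketch, "2G@4G" yields size 0x80000000 at offset 0x100000000,
while an entry without a separator or without an offset is rejected
with -EINVAL.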
Signed-off-by: Boaz Harrosh
---
 MAINTAINERS            |   6 +
 drivers/block/Kconfig  |  11 ++
 drivers/block/Makefile |   1 +
 drivers/block/pmem.c   | 377 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 395 insertions(+)
 create mode 100644 drivers/block/pmem.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d3b1571..d5bf0da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6711,6 +6711,12 @@ S:	Maintained
 F:	Documentation/blockdev/ramdisk.txt
 F:	drivers/block/brd.c
 
+PERSISTENT MEMORY DRIVER
+M:	Ross Zwisler
+L:	linux-nvdimm@lists.01.org
+S:	Supported
+F:	drivers/block/pmem.c
+
 RANDOM NUMBER DRIVER
 M:	"Theodore Ts'o"
 S:	Maintained
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index b81ddfe..860e8d1 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -387,6 +387,17 @@ config BLK_DEV_XIP
 	  will prevent RAM block device backing store memory from being
 	  allocated from highmem (only a problem for highmem systems).
 
+config BLK_DEV_PMEM
+	tristate "Persistent memory block device support"
+	help
+	  Saying Y here will allow you to use a contiguous range of reserved
+	  memory as one or more persistent block devices.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called 'pmem'.
+
+	  If unsure, say N.
+
 config CDROM_PKTCDVD
 	tristate "Packet writing on CD/DVD media"
 	depends on !UML
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index ca07399..6256f6e 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PS3_VRAM)		+= ps3vram.o
 obj-$(CONFIG_ATARI_FLOPPY)	+= ataflop.o
 obj-$(CONFIG_AMIGA_Z2RAM)	+= z2ram.o
 obj-$(CONFIG_BLK_DEV_RAM)	+= brd.o
+obj-$(CONFIG_BLK_DEV_PMEM)	+= pmem.o
 obj-$(CONFIG_BLK_DEV_LOOP)	+= loop.o
 obj-$(CONFIG_BLK_CPQ_DA)	+= cpqarray.o
 obj-$(CONFIG_BLK_CPQ_CISS_DA)  += cciss.o
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
new file mode 100644
index 0000000..f756e5b
--- /dev/null
+++ b/drivers/block/pmem.c
@@ -0,0 +1,377 @@
+/*
+ * Persistent Memory Driver
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig.
+ * Copyright (c) 2015, Boaz Harrosh.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define PMEM_MINORS		16
+
+struct pmem_device {
+	struct list_head	pmem_list;
+	struct request_queue	*pmem_queue;
+	struct gendisk		*pmem_disk;
+
+	/* One contiguous memory region per device */
+	phys_addr_t		phys_addr;
+	void			*virt_addr;
+	size_t			size;
+};
+
+static int pmem_major;
+static atomic_t pmem_index;
+
+static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+			unsigned int len, unsigned int off, int rw,
+			sector_t sector)
+{
+	void *mem = kmap_atomic(page);
+	size_t pmem_off = sector << 9;
+
+	if (rw == READ) {
+		memcpy(mem + off, pmem->virt_addr + pmem_off, len);
+		flush_dcache_page(page);
+	} else {
+		flush_dcache_page(page);
+		memcpy(pmem->virt_addr + pmem_off, mem + off, len);
+	}
+
+	kunmap_atomic(mem);
+}
+
+static void pmem_make_request(struct request_queue *q, struct bio *bio)
+{
+	struct block_device *bdev = bio->bi_bdev;
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+	int rw;
+	struct bio_vec bvec;
+	sector_t sector;
+	struct bvec_iter iter;
+	int err = 0;
+
+	if (bio_end_sector(bio) > get_capacity(bdev->bd_disk)) {
+		err = -EIO;
+		goto out;
+	}
+
+	BUG_ON(bio->bi_rw & REQ_DISCARD);
+
+	rw = bio_data_dir(bio);
+	sector = bio->bi_iter.bi_sector;
+	bio_for_each_segment(bvec, bio, iter) {
+		pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len, bvec.bv_offset,
+			     rw, sector);
+		sector += bvec.bv_len >> 9;
+	}
+
+out:
+	bio_endio(bio, err);
+}
+
+static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+		       struct page *page, int rw)
+{
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+
+	pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
+	page_endio(page, rw & WRITE, 0);
+
+	return 0;
+}
+
+static long pmem_direct_access(struct block_device *bdev, sector_t sector,
+			      void **kaddr, unsigned long *pfn, long size)
+{
+	struct pmem_device *pmem = bdev->bd_disk->private_data;
+	size_t offset = sector << 9;
+
+	if (!pmem)
+		return -ENODEV;
+
+	*kaddr = pmem->virt_addr + offset;
+	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
+
+	return pmem->size - offset;
+}
+
+static const struct block_device_operations pmem_fops = {
+	.owner =		THIS_MODULE,
+	.rw_page =		pmem_rw_page,
+	.direct_access =	pmem_direct_access,
+};
+
+/* pmem->phys_addr and pmem->size need to be set.
+ * Will then set virt_addr if successful.
+ */
+static int pmem_mapmem(struct pmem_device *pmem, struct device *dev)
+{
+	if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
+		dev_warn(dev, "could not reserve region [0x%llx:0x%zx]\n",
+			 pmem->phys_addr, pmem->size);
+		return -EINVAL;
+	}
+
+	/*
+	 * Map the memory as non-cachable, as we can't write back the contents
+	 * of the CPU caches in case of a crash.
+	 */
+	pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
+	if (!pmem->virt_addr) {
+		dev_warn(dev, "could not ioremap_nocache [0x%llx:0x%zx]\n",
+			 pmem->phys_addr, pmem->size);
+		release_mem_region(pmem->phys_addr, pmem->size);
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+static void pmem_unmapmem(struct pmem_device *pmem)
+{
+	if (unlikely(!pmem->virt_addr))
+		return;
+
+	iounmap(pmem->virt_addr);
+	release_mem_region(pmem->phys_addr, pmem->size);
+	pmem->virt_addr = NULL;
+}
+
+static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
+{
+	struct pmem_device *pmem;
+	struct gendisk *disk;
+	int idx, err;
+
+	err = -ENOMEM;
+	pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
+	if (!pmem)
+		goto out;
+
+	pmem->phys_addr = res->start;
+	pmem->size = resource_size(res);
+
+	err = pmem_mapmem(pmem, dev);
+	if (err)
+		goto out_free_dev;
+
+	pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
+	if (!pmem->pmem_queue)
+		goto out_unmap;
+
+	blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
+	blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
+	blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+
+	disk = alloc_disk(PMEM_MINORS);
+	if (!disk)
+		goto out_free_queue;
+
+	idx = atomic_inc_return(&pmem_index) - 1;
+
+	disk->major		= pmem_major;
+	disk->first_minor	= PMEM_MINORS * idx;
+	disk->fops		= &pmem_fops;
+	disk->private_data	= pmem;
+	disk->queue		= pmem->pmem_queue;
+	disk->flags		= GENHD_FL_EXT_DEVT;
+	sprintf(disk->disk_name, "pmem%d", idx);
+	disk->driverfs_dev = dev;
+	set_capacity(disk, pmem->size >> 9);
+	pmem->pmem_disk = disk;
+
+	add_disk(disk);
+
+	return pmem;
+
+out_free_queue:
+	blk_cleanup_queue(pmem->pmem_queue);
+out_unmap:
+	pmem_unmapmem(pmem);
+out_free_dev:
+	kfree(pmem);
+out:
+	return ERR_PTR(err);
+}
+
+static void pmem_free(struct pmem_device *pmem)
+{
+	del_gendisk(pmem->pmem_disk);
+	put_disk(pmem->pmem_disk);
+	blk_cleanup_queue(pmem->pmem_queue);
+	pmem_unmapmem(pmem);
+	kfree(pmem);
+}
+
+static int pmem_probe(struct platform_device *pdev)
+{
+	struct pmem_device *pmem;
+	struct resource *res;
+
+	if (WARN_ON(pdev->num_resources > 1))
+		return -ENXIO;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -ENXIO;
+
+	pmem = pmem_alloc(&pdev->dev, res);
+	if (IS_ERR(pmem))
+		return PTR_ERR(pmem);
+
+	platform_set_drvdata(pdev, pmem);
+	dev_info(&pdev->dev, "probe [%pa:0x%zx]\n",
+		 &pmem->phys_addr, pmem->size);
+
+	return 0;
+}
+
+static int pmem_remove(struct platform_device *pdev)
+{
+	struct pmem_device *pmem = platform_get_drvdata(pdev);
+
+	dev_info(&pdev->dev, "remove\n");
+	pmem_free(pmem);
+	return 0;
+}
+
+static struct platform_driver pmem_driver = {
+	.probe		= pmem_probe,
+	.remove		= pmem_remove,
+	.driver		= {
+		.owner	= THIS_MODULE,
+		.name	= "pmem",
+	},
+};
+
+static char *map;
+module_param(map, charp, S_IRUGO);
+MODULE_PARM_DESC(map,
+	"pmem device mapping: map=mapS[,mapS...] where:\n"
+	"mapS=nn[KMG]$ss[KMG] or mapS=nn[KMG]@ss[KMG], nn=size, ss=offset.");
+
+static LIST_HEAD(pmem_devices);
+
+static int __init
+pmem_parse_map_one(char *map, phys_addr_t *start, size_t *size)
+{
+	char *p = map;
+
+	*size = (size_t)memparse(p, &p);
+	if ((p == map) || ((*p != '$') && (*p != '@')))
+		return -EINVAL;
+
+	if (!*(++p))
+		return -EINVAL;
+
+	*start = (phys_addr_t)memparse(p, &p);
+
+	return *p == '\0' ? 0 : -EINVAL;
+}
+
+static int __init _load_from_map(void)
+{
+	struct pmem_device *pmem;
+	char *p, *pmem_map, *map_dup;
+	int err = -ENODEV;
+
+	map_dup = pmem_map = kstrdup(map, GFP_KERNEL);
+	if (unlikely(!pmem_map)) {
+		pr_debug("pmem_init strdup(%s) failed\n", map);
+		return -ENOMEM;
+	}
+
+	while ((p = strsep(&pmem_map, ",")) != NULL) {
+		struct resource res = {.start = 0};
+		size_t disk_size;
+
+		if (!*p)
+			continue;
+		err = pmem_parse_map_one(p, &res.start, &disk_size);
+		if (err)
+			goto out;
+		/*TODO: check alignments */
+
+		res.end = res.start + disk_size - 1;
+		pmem = pmem_alloc(NULL, &res);
+		if (IS_ERR(pmem)) {
+			err = PTR_ERR(pmem);
+			goto out;
+		}
+		list_add_tail(&pmem->pmem_list, &pmem_devices);
+	}
+
+out:
+	/* If we have at least one device we stay loaded and rmmod can
+	 * clean those that were loaded.
+	 */
+	if (!list_empty(&pmem_devices))
+		err = 0;
+
+	pr_info("pmem: init map=%s successful(%d) => %d\n",
+		map, atomic_read(&pmem_index), err);
+	kfree(map_dup);
+	return err;
+}
+
+void _unload_from_map(void)
+{
+	struct pmem_device *pmem, *next;
+
+	list_for_each_entry_safe(pmem, next, &pmem_devices, pmem_list) {
+		list_del(&pmem->pmem_list);
+		pmem_free(pmem);
+	}
+
+	pr_info("pmem: exit\n");
+}
+
+static int __init pmem_init(void)
+{
+	int error;
+
+	pmem_major = register_blkdev(0, "pmem");
+	if (pmem_major < 0)
+		return pmem_major;
+
+	if (map && *map)
+		return _load_from_map();
+
+	error = platform_driver_register(&pmem_driver);
+	if (error)
+		unregister_blkdev(pmem_major, "pmem");
+	return error;
+}
+module_init(pmem_init);
+
+static void pmem_exit(void)
+{
+	if (list_empty(&pmem_devices))
+		platform_driver_unregister(&pmem_driver);
+	else
+		_unload_from_map();
+
+	unregister_blkdev(pmem_major, "pmem");
+}
+module_exit(pmem_exit);
+
+MODULE_AUTHOR("Ross Zwisler ");
+MODULE_LICENSE("GPL v2");
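
The driver turns 512-byte sector numbers into byte offsets with
"sector << 9" (in pmem_do_bvec() and pmem_direct_access()), and
pmem_direct_access() reports "pmem->size - offset" as the length
available from that sector. A minimal userspace sketch of that
arithmetic follows. The explicit range check and the sketch_* names are
illustrative assumptions added here; the patch itself only checks the
request range in pmem_make_request().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9	/* 512-byte sectors, as in "sector << 9" */

/*
 * Return the number of bytes between the start of `sector` and the end
 * of a region of `size` bytes, or -1 if the sector starts at or beyond
 * the end of the region (or the shift would overflow).
 */
static int64_t sketch_bytes_from_sector(uint64_t sector, uint64_t size)
{
	uint64_t offset;

	/* Reject sectors whose byte offset does not fit in 64 bits. */
	if (sector > (UINT64_MAX >> SKETCH_SECTOR_SHIFT))
		return -1;

	offset = sector << SKETCH_SECTOR_SHIFT;
	if (offset >= size)
		return -1;

	return (int64_t)(size - offset);
}
```

For a 4096-byte region, sector 1 leaves 3584 bytes, and sector 8 (byte
offset 4096) is already outside the region.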