
null_blk: Register as a LightNVM device

Message ID 1447236398-9421-1-git-send-email-m@bjorling.me (mailing list archive)
State Superseded, archived
Delegated to: Jens Axboe

Commit Message

Matias Bjørling Nov. 11, 2015, 10:06 a.m. UTC
Add support for registering as a LightNVM device. This allows us to
evaluate the performance of the LightNVM library.

In /drivers/Makefile, LightNVM is moved above block device drivers
to make sure that the LightNVM media managers have been initialized
before drivers under /drivers/block are initialized.

Signed-off-by: Matias Bjørling <m@bjorling.me>
---
 Documentation/block/null_blk.txt |   3 +
 drivers/Makefile                 |   2 +-
 drivers/block/null_blk.c         | 170 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 168 insertions(+), 7 deletions(-)
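
At its core, the patch registers null_blk's request queue with LightNVM
instead of allocating a gendisk when the new use_lightnvm parameter is set.
A condensed sketch of that path, distilled from the full patch at the bottom
of this page (error handling and the blk-mq submit/complete glue omitted):

static struct nvm_dev_ops null_lnvm_dev_ops = {
	.identity		= null_lnvm_id,		/* report a 1 ch / 1 lun geometry */
	.submit_io		= null_lnvm_submit_io,	/* map an nvm_rq onto a blk-mq request */
	.create_dma_pool	= null_lnvm_create_dma_pool,
	.destroy_dma_pool	= null_lnvm_destroy_dma_pool,
	.dev_dma_alloc		= null_lnvm_dev_dma_alloc,
	.dev_dma_free		= null_lnvm_dev_dma_free,
	.max_phys_sect		= 64,			/* simulate nvme protocol restriction */
};

	/* in null_add_dev(), instead of alloc_disk_node()/add_disk(): */
	if (use_lightnvm) {
		rv = nvm_register(nullb->q, nullb->disk_name,
							&null_lnvm_dev_ops);
		if (rv)
			goto out_cleanup_blk_queue;
		goto done;
	}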

Comments

Jens Axboe Nov. 11, 2015, 9:27 p.m. UTC | #1
On 11/11/2015 03:06 AM, Matias Bjørling wrote:
> Add support for registering as a LightNVM device. This allows us to
> evaluate the performance of the LightNVM library.
>
> In /drivers/Makefile, LightNVM is moved above block device drivers
> to make sure that the LightNVM media managers have been initialized
> before drivers under /drivers/block are initialized.

Generally looks ok. One question:

> +static void *null_lnvm_create_dma_pool(struct request_queue *q, char *name)
> +{
> +	mempool_t *virtmem_pool;
> +
> +	ppa_cache = kmem_cache_create(name, PAGE_SIZE, 0, 0, NULL);
> +	if (!ppa_cache) {
> +		pr_err("null_nvm: Unable to create kmem cache\n");
> +		return NULL;
> +	}
> +
> +	virtmem_pool = mempool_create_slab_pool(64, ppa_cache);
> +	if (!virtmem_pool) {
> +		pr_err("null_nvm: Unable to create virtual memory pool\n");
> +		return NULL;
> +	}
> +
> +	return virtmem_pool;
> +}

Why create a slab cache if it's pages? Why not just have the mempool 
alloc/free callbacks allocate and free single pages directly?
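
Something along these lines would drop the slab cache entirely and just hand
out single pages from the mempool (a sketch of the idea only; the helper
names are made up, not from the patch):

static void *null_lnvm_page_alloc(gfp_t gfp, void *pool_data)
{
	/* hand back a kernel virtual address, like the slab variant does */
	return (void *)__get_free_page(gfp);
}

static void null_lnvm_page_free(void *element, void *pool_data)
{
	free_page((unsigned long)element);
}

	/* in null_lnvm_create_dma_pool(), no kmem_cache needed: */
	virtmem_pool = mempool_create(64, null_lnvm_page_alloc,
					null_lnvm_page_free, NULL);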
Christoph Hellwig Nov. 12, 2015, 8:53 a.m. UTC | #2
On Wed, Nov 11, 2015 at 11:06:38AM +0100, Matias Bjørling wrote:
> Add support for registering as a LightNVM device. This allows us to
> evaluate the performance of the LightNVM library.
> 
> In /drivers/Makefile, LightNVM is moved above block device drivers
> to make sure that the LightNVM media managers have been initialized
> before drivers under /drivers/block are initialized.

I don't think mixing the lightnvm code into null_blk is a good idea.
Please just create a separate null_nvm device for lightnvm. It already
has a completely separate I/O path.
Jens Axboe Nov. 12, 2015, 3:49 p.m. UTC | #3
On 11/12/2015 01:53 AM, Christoph Hellwig wrote:
> On Wed, Nov 11, 2015 at 11:06:38AM +0100, Matias Bjørling wrote:
>> Add support for registering as a LightNVM device. This allows us to
>> evaluate the performance of the LightNVM library.
>>
>> In /drivers/Makefile, LightNVM is moved above block device drivers
>> to make sure that the LightNVM media managers have been initialized
>> before drivers under /drivers/block are initialized.
>
> I don't think mixing the lightnvm code into null_blk is a good idea.
> Please just create a separate null_nvm device for lightnvm. It already
> has a completely separate I/O path.

But it still avoids duplicating the generics of it. Patch for null_blk:

3 files changed, 168 insertions(+), 7 deletions(-)

vs a standalone of:

3 files changed, 459 insertions(+)

It doesn't screw up null_blk, so I'd prefer just adding it as an 
on-the-side mode for that.
Christoph Hellwig Nov. 12, 2015, 3:52 p.m. UTC | #4
On Thu, Nov 12, 2015 at 08:49:08AM -0700, Jens Axboe wrote:
> But it still avoids duplicating the generics of it. Patch for null_blk:
> 
> 3 files changed, 168 insertions(+), 7 deletions(-)
> 
> vs a standalone of:
> 
> 3 files changed, 459 insertions(+)
> 
> It doesn't screw up null_blk, so I'd prefer just adding it as an on-the-side
> mode for that.

300 lines of boilerplate for just setting up a few request_queues seem
wrong, can you show the actual patch you measured?
Jens Axboe Nov. 12, 2015, 3:54 p.m. UTC | #5
On 11/12/2015 08:52 AM, Christoph Hellwig wrote:
> On Thu, Nov 12, 2015 at 08:49:08AM -0700, Jens Axboe wrote:
>> But it still avoids duplicating the generics of it. Patch for null_blk:
>>
>> 3 files changed, 168 insertions(+), 7 deletions(-)
>>
>> vs a standalone of:
>>
>> 3 files changed, 459 insertions(+)
>>
>> It doesn't screw up null_blk, so I'd prefer just adding it as an on-the-side
>> mode for that.
>
> 300 lines of boilerplate for just setting up a few request_queues seem
> wrong, can you show the actual patch you measured?

I just took it from Matias' last posting:

http://marc.info/?l=linux-kernel&m=144605858228534&w=2
Christoph Hellwig Nov. 12, 2015, 3:58 p.m. UTC | #6
On Thu, Nov 12, 2015 at 08:54:48AM -0700, Jens Axboe wrote:
> >300 lines of boilerplate for just setting up a few request_queues seem
> >wrong, can you show the actual patch you measured?
> 
> I just took it from Matias' last posting:
> 
> http://marc.info/?l=linux-kernel&m=144605858228534&w=2

Well, that one has all these crazy completion methods which
are of no use for a blk-mq driver, so it should really be
compared without those.
Jens Axboe Nov. 12, 2015, 4 p.m. UTC | #7
On 11/12/2015 08:58 AM, Christoph Hellwig wrote:
> On Thu, Nov 12, 2015 at 08:54:48AM -0700, Jens Axboe wrote:
>>> 300 lines of boilerplate for just setting up a few request_queues seem
>>> wrong, can you show the actual patch you measured?
>>
>> I just took it from Matias' last posting:
>>
>> http://marc.info/?l=linux-kernel&m=144605858228534&w=2
>
> Well, that one has all these crazy completion methods which
> are of no use for a blk-mq driver, so it should really be
> compared without those.

So even if we can cut it down a bit, it's still going to be the same boilerplate 
code that null_blk has, even with just mq completions. If it ended up 
rewriting null_blk to be something else entirely or full of ifdef 
sprinkles, I'd agree. But for adding a hundred lines of code to be able 
to test lightnvm perf, I think it's a no-brainer to just add it to 
null_blk and not have a separate module.
Matias Bjørling Nov. 12, 2015, 6:29 p.m. UTC | #8
On 11/12/2015 05:00 PM, Jens Axboe wrote:
> On 11/12/2015 08:58 AM, Christoph Hellwig wrote:
>> On Thu, Nov 12, 2015 at 08:54:48AM -0700, Jens Axboe wrote:
>>>> 300 lines of boilerplate for just setting up a few request_queues seem
>>>> wrong, can you show the actual patch you measured?
>>>
>>> I just took it from Matias' last posting:
>>>
>>> http://marc.info/?l=linux-kernel&m=144605858228534&w=2
>>
>> Well, that one has all these crazy completion methods which
>> are of no use for a blk-mq driver, so it should really be
>> compared without those.
>
> So even if we can cut it down a bit, it's still going to be the same boilerplate
> code that null_blk has, even with just mq completions. If it ended up
> rewriting null_blk to be something else entirely or full of ifdef
> sprinkles, I'd agree. But for adding a hundred lines of code to be able
> to test lightnvm perf, I think it's a no-brainer to just add it to
> null_blk and not have a separate module.
>

As it is now, I prefer it as part of null_blk, as long as it basically 
copies the core queueing structure. With a separate null_nvm we would have 
to maintain it in two places. It'll be nice to keep it in one place.

The reason I would keep null_nvm would be to add appropriate waiting times 
to simulate flash. However, I've now seen three implementations that used 
lightnvm for simulations, and it still doesn't scale to the 1M+ IOPS we 
need to actually compare it against a real device.
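
For reference, such waiting times could simply reuse the values the identity
command already advertises (trdt/tprt/tbet in null_lnvm_id()). Something like
the helper below, which is entirely hypothetical and not part of this patch
(opcode constants from <linux/lightnvm.h>, delays from <linux/delay.h>):

static void null_lnvm_simulate_latency(int opcode)
{
	/* crude busy-wait; a real version would defer completion with an
	 * hrtimer, like null_blk's existing irqmode=2/completion_nsec path */
	switch (opcode) {
	case NVM_OP_PREAD:
		udelay(25);	/* trdt = 25000 ns */
		break;
	case NVM_OP_PWRITE:
		udelay(500);	/* tprt = 500000 ns */
		break;
	case NVM_OP_ERASE:
		mdelay(2);	/* tbet = 1500000 ns, rounded up */
		break;
	}
}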

Patch

diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt
index 2f6c6ff..d8880ca 100644
--- a/Documentation/block/null_blk.txt
+++ b/Documentation/block/null_blk.txt
@@ -70,3 +70,6 @@  use_per_node_hctx=[0/1]: Default: 0
      parameter.
   1: The multi-queue block layer is instantiated with a hardware dispatch
      queue for each CPU node in the system.
+
+use_lightnvm=[0/1]: Default: 0
+  Register device with LightNVM. Requires blk-mq to be used.
diff --git a/drivers/Makefile b/drivers/Makefile
index 73d0391..795d0ca 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -63,6 +63,7 @@  obj-$(CONFIG_FB_I810)           += video/fbdev/i810/
 obj-$(CONFIG_FB_INTEL)          += video/fbdev/intelfb/
 
 obj-$(CONFIG_PARPORT)		+= parport/
+obj-$(CONFIG_NVM)		+= lightnvm/
 obj-y				+= base/ block/ misc/ mfd/ nfc/
 obj-$(CONFIG_LIBNVDIMM)		+= nvdimm/
 obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
@@ -70,7 +71,6 @@  obj-$(CONFIG_NUBUS)		+= nubus/
 obj-y				+= macintosh/
 obj-$(CONFIG_IDE)		+= ide/
 obj-$(CONFIG_SCSI)		+= scsi/
-obj-$(CONFIG_NVM)		+= lightnvm/
 obj-y				+= nvme/
 obj-$(CONFIG_ATA)		+= ata/
 obj-$(CONFIG_TARGET_CORE)	+= target/
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 1c9e4fe..6550634 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -8,6 +8,7 @@ 
 #include <linux/slab.h>
 #include <linux/blk-mq.h>
 #include <linux/hrtimer.h>
+#include <linux/lightnvm.h>
 
 struct nullb_cmd {
 	struct list_head list;
@@ -39,6 +40,7 @@  struct nullb {
 
 	struct nullb_queue *queues;
 	unsigned int nr_queues;
+	char disk_name[DISK_NAME_LEN];
 };
 
 static LIST_HEAD(nullb_list);
@@ -119,6 +121,10 @@  static int nr_devices = 2;
 module_param(nr_devices, int, S_IRUGO);
 MODULE_PARM_DESC(nr_devices, "Number of devices to register");
 
+static bool use_lightnvm;
+module_param(use_lightnvm, bool, S_IRUGO);
+MODULE_PARM_DESC(use_lightnvm, "Register as a LightNVM device");
+
 static int irqmode = NULL_IRQ_SOFTIRQ;
 
 static int null_set_irqmode(const char *str, const struct kernel_param *kp)
@@ -426,6 +432,8 @@  static void null_del_dev(struct nullb *nullb)
 {
 	list_del_init(&nullb->list);
 
+	if (use_lightnvm)
+		nvm_unregister(nullb->disk->disk_name);
 	del_gendisk(nullb->disk);
 	blk_cleanup_queue(nullb->q);
 	if (queue_mode == NULL_Q_MQ)
@@ -435,6 +443,135 @@  static void null_del_dev(struct nullb *nullb)
 	kfree(nullb);
 }
 
+#ifdef CONFIG_NVM
+
+static struct kmem_cache *ppa_cache;
+
+static void null_lnvm_end_io(struct request *rq, int error)
+{
+	struct nvm_rq *rqd = rq->end_io_data;
+	struct nvm_dev *dev = rqd->dev;
+
+	dev->mt->end_io(rqd, error);
+
+	blk_put_request(rq);
+}
+
+static int null_lnvm_submit_io(struct request_queue *q, struct nvm_rq *rqd)
+{
+	struct request *rq;
+	struct bio *bio = rqd->bio;
+
+	rq = blk_mq_alloc_request(q, bio_rw(bio), GFP_KERNEL, 0);
+	if (IS_ERR(rq))
+		return -ENOMEM;
+
+	rq->cmd_type = REQ_TYPE_DRV_PRIV;
+	rq->__sector = bio->bi_iter.bi_sector;
+	rq->ioprio = bio_prio(bio);
+
+	if (bio_has_data(bio))
+		rq->nr_phys_segments = bio_phys_segments(q, bio);
+
+	rq->__data_len = bio->bi_iter.bi_size;
+	rq->bio = rq->biotail = bio;
+
+	rq->end_io_data = rqd;
+
+	blk_execute_rq_nowait(q, NULL, rq, 0, null_lnvm_end_io);
+
+	return 0;
+}
+
+static int null_lnvm_id(struct request_queue *q, struct nvm_id *id)
+{
+	sector_t size = gb * 1024 * 1024 * 1024ULL;
+	struct nvm_id_group *grp;
+
+	id->ver_id = 0x1;
+	id->vmnt = 0;
+	id->cgrps = 1;
+	id->cap = 0x3;
+	id->dom = 0x1;
+	id->ppat = NVM_ADDRMODE_LINEAR;
+
+	do_div(size, bs); /* convert size to pages */
+	grp = &id->groups[0];
+	grp->mtype = 0;
+	grp->fmtype = 1;
+	grp->num_ch = 1;
+	grp->num_lun = 1;
+	grp->num_pln = 1;
+	grp->num_blk = size / 256;
+	grp->num_pg = 256;
+	grp->fpg_sz = bs;
+	grp->csecs = bs;
+	grp->trdt = 25000;
+	grp->trdm = 25000;
+	grp->tprt = 500000;
+	grp->tprm = 500000;
+	grp->tbet = 1500000;
+	grp->tbem = 1500000;
+	grp->mpos = 0x010101; /* single plane rwe */
+	grp->cpar = hw_queue_depth;
+
+	return 0;
+}
+
+static void *null_lnvm_create_dma_pool(struct request_queue *q, char *name)
+{
+	mempool_t *virtmem_pool;
+
+	ppa_cache = kmem_cache_create(name, PAGE_SIZE, 0, 0, NULL);
+	if (!ppa_cache) {
+		pr_err("null_nvm: Unable to create kmem cache\n");
+		return NULL;
+	}
+
+	virtmem_pool = mempool_create_slab_pool(64, ppa_cache);
+	if (!virtmem_pool) {
+		pr_err("null_nvm: Unable to create virtual memory pool\n");
+		return NULL;
+	}
+
+	return virtmem_pool;
+}
+
+static void null_lnvm_destroy_dma_pool(void *pool)
+{
+	mempool_t *virtmem_pool = pool;
+
+	mempool_destroy(virtmem_pool);
+}
+
+static void *null_lnvm_dev_dma_alloc(struct request_queue *q, void *pool,
+				gfp_t mem_flags, dma_addr_t *dma_handler)
+{
+	return mempool_alloc(pool, mem_flags);
+}
+
+static void null_lnvm_dev_dma_free(void *pool, void *entry,
+							dma_addr_t dma_handler)
+{
+	mempool_free(entry, pool);
+}
+
+static struct nvm_dev_ops null_lnvm_dev_ops = {
+	.identity		= null_lnvm_id,
+	.submit_io		= null_lnvm_submit_io,
+
+	.create_dma_pool	= null_lnvm_create_dma_pool,
+	.destroy_dma_pool	= null_lnvm_destroy_dma_pool,
+	.dev_dma_alloc		= null_lnvm_dev_dma_alloc,
+	.dev_dma_free		= null_lnvm_dev_dma_free,
+
+	/* Simulate nvme protocol restriction */
+	.max_phys_sect		= 64,
+};
+#else
+static struct nvm_dev_ops null_lnvm_dev_ops;
+#endif /* CONFIG_NVM */
+
 static int null_open(struct block_device *bdev, fmode_t mode)
 {
 	return 0;
@@ -574,11 +711,6 @@  static int null_add_dev(void)
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, nullb->q);
 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, nullb->q);
 
-	disk = nullb->disk = alloc_disk_node(1, home_node);
-	if (!disk) {
-		rv = -ENOMEM;
-		goto out_cleanup_blk_queue;
-	}
 
 	mutex_lock(&lock);
 	list_add_tail(&nullb->list, &nullb_list);
@@ -588,6 +720,21 @@  static int null_add_dev(void)
 	blk_queue_logical_block_size(nullb->q, bs);
 	blk_queue_physical_block_size(nullb->q, bs);
 
+	sprintf(nullb->disk_name, "nullb%d", nullb->index);
+
+	if (use_lightnvm) {
+		rv = nvm_register(nullb->q, nullb->disk_name,
+							&null_lnvm_dev_ops);
+		if (rv)
+			goto out_cleanup_blk_queue;
+		goto done;
+	}
+
+	disk = nullb->disk = alloc_disk_node(1, home_node);
+	if (!disk) {
+		rv = -ENOMEM;
+		goto out_cleanup_lightnvm;
+	}
 	size = gb * 1024 * 1024 * 1024ULL;
 	set_capacity(disk, size >> 9);
 
@@ -597,10 +744,15 @@  static int null_add_dev(void)
 	disk->fops		= &null_fops;
 	disk->private_data	= nullb;
 	disk->queue		= nullb->q;
-	sprintf(disk->disk_name, "nullb%d", nullb->index);
+	strncpy(disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
+
 	add_disk(disk);
+done:
 	return 0;
 
+out_cleanup_lightnvm:
+	if (use_lightnvm)
+		nvm_unregister(nullb->disk_name);
 out_cleanup_blk_queue:
 	blk_cleanup_queue(nullb->q);
 out_cleanup_tags:
@@ -624,6 +776,12 @@  static int __init null_init(void)
 		bs = PAGE_SIZE;
 	}
 
+	if (use_lightnvm && queue_mode != NULL_Q_MQ) {
+		pr_warn("null_blk: LightNVM only supported for blk-mq\n");
+		pr_warn("null_blk: defaults queue mode to blk-mq\n");
+		queue_mode = NULL_Q_MQ;
+	}
+
 	if (queue_mode == NULL_Q_MQ && use_per_node_hctx) {
 		if (submit_queues < nr_online_nodes) {
 			pr_warn("null_blk: submit_queues param is set to %u.",