diff mbox

blk-mq: check if all HW queues are mapped to cpu

Message ID 1452794840-29767-1-git-send-email-mlin@kernel.org (mailing list archive)
State New, archived

Commit Message

Ming Lin Jan. 14, 2016, 6:07 p.m. UTC
From: Ming Lin <ming.l@ssi.samsung.com>

Suppose that a system has 8 logical CPUs (4 cores with hyperthreading)
and that 5 hardware queues are provided by a block driver.
With the current algorithm this will lead to the following assignment
of logical CPU to hardware queue:

HWQ 0: 0 1
HWQ 1: 2 3
HWQ 2: 4 5
HWQ 3: 6 7
HWQ 4: (none)

One way to fix it is to change the algorithm so the assignment may be:

HWQ 0: 0 1
HWQ 1: 2 3
HWQ 2: 4 5
HWQ 3: 6
HWQ 4: 7

This patch only checks whether all HW queues are mapped to a CPU.
blk_mq_init_queue() will fail if not all HW queues are mapped.
It also fails when there are more hardware queues than CPUs.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 block/blk-mq-cpumap.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Jens Axboe Jan. 14, 2016, 6:25 p.m. UTC | #1
On 01/14/2016 11:07 AM, Ming Lin wrote:
> From: Ming Lin <ming.l@ssi.samsung.com>
>
> Suppose that a system has 8 logical CPUs (4 cores with hyperthreading)
> and that 5 hardware queues are provided by a block driver.
> With the current algorithm this will lead to the following assignment
> of logical CPU to hardware queue:
>
> HWQ 0: 0 1
> HWQ 1: 2 3
> HWQ 2: 4 5
> HWQ 3: 6 7
> HWQ 4: (none)
>
> One way to fix it is to change the algorithm so the assignment may be:
>
> HWQ 0: 0 1
> HWQ 1: 2 3
> HWQ 2: 4 5
> HWQ 3: 6
> HWQ 4: 7

This has been suggested before, but the previous mapping is actually
what I originally intended, so that it's symmetric. So by design, not a
bug. It's not that I'm completely averse to changing it, but I've yet
to see a compelling reason to do so (and your patch doesn't have that
either; it just states that it changes the mapping from X to Y).
Ming Lin Jan. 14, 2016, 7:03 p.m. UTC | #2
On Thu, 2016-01-14 at 11:25 -0700, Jens Axboe wrote:
> On 01/14/2016 11:07 AM, Ming Lin wrote:
> > From: Ming Lin <ming.l@ssi.samsung.com>
> >
> > Suppose that a system has 8 logical CPUs (4 cores with hyperthreading)
> > and that 5 hardware queues are provided by a block driver.
> > With the current algorithm this will lead to the following assignment
> > of logical CPU to hardware queue:
> >
> > HWQ 0: 0 1
> > HWQ 1: 2 3
> > HWQ 2: 4 5
> > HWQ 3: 6 7
> > HWQ 4: (none)
> >
> > One way to fix it is to change the algorithm so the assignment may be:
> >
> > HWQ 0: 0 1
> > HWQ 1: 2 3
> > HWQ 2: 4 5
> > HWQ 3: 6
> > HWQ 4: 7
> 
> This has been suggested before, but the previous mapping is actually
> what I originally intended, so that it's symmetric. So by design, not a
> bug. It's not that I'm completely averse to changing it, but I've yet
> to see a compelling reason to do so (and your patch doesn't have that
> either; it just states that it changes the mapping from X to Y).

Yes, my patch doesn't do that; I only mention it as a possible fix.
I'll remove those words to avoid confusion.

My patch only checks if all HW queues are mapped.

While developing the NVMe over Fabrics (NVMeOF) driver, a new function,
blk_mq_alloc_request_hctx(), was added:

struct request *blk_mq_alloc_request_hctx(
	struct request_queue *q, int rw,
	unsigned int flags, unsigned int hctx_idx);

===
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 30 19:45:48 2015 +0100

    blk-mq: add blk_mq_alloc_request_hctx
    
    For some protocols like NVMe over Fabrics we need to be able to send
    initialization commands to a specific queue.
===

This function assumes all hctx are mapped; otherwise it will crash because hctx->tags is NULL.

During the tests, different hardware queue counts were passed into the NVMeOF driver.
On my setup (8 logical CPUs: 4 cores with hyperthreading), 5 HW queues make it crash,
because queue 4 is not mapped.

So we'd better check the mapping and fail blk_mq_init_queue() if not all
HW queues are mapped.

Agree?



--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Jan. 18, 2016, 10:25 a.m. UTC | #3
>> Suppose that a system has 8 logical CPUs (4 cores with hyperthreading)
>> and that 5 hardware queues are provided by a block driver.
>> With the current algorithm this will lead to the following assignment
>> of logical CPU to hardware queue:
>>
>> HWQ 0: 0 1
>> HWQ 1: 2 3
>> HWQ 2: 4 5
>> HWQ 3: 6 7
>> HWQ 4: (none)
>>
>> One way to fix it is to change the algorithm so the assignment may be:
>>
>> HWQ 0: 0 1
>> HWQ 1: 2 3
>> HWQ 2: 4 5
>> HWQ 3: 6
>> HWQ 4: 7
>
> This has been suggested before, but the previous mapping is actually
> what I originally intended, so that it's symmetric. So by design, not a
> bug. It's not that I'm completely averse to changing it, but I've yet
> to see a compelling reason to do so (and your patch doesn't have that
> either; it just states that it changes the mapping from X to Y).

I tend to agree that there is no good reason for this sort of artificial
mapping. I'd be happy if blk_mq_alloc_tag_set failed (with an
explanatory log message) when given a weird mapping, such as more
HW queues than CPUs or a non-symmetric number of HW queues...

Thoughts?

Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8764c24..5a88ac4 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -36,10 +36,17 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
 {
 	unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
 	cpumask_var_t cpus;
+	int *queue_map;
 
-	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC))
+	queue_map = kzalloc(sizeof(int) * nr_queues, GFP_KERNEL);
+	if (!queue_map)
 		return 1;
 
+	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
+		kfree(queue_map);
+		return 1;
+	}
+
 	cpumask_clear(cpus);
 	nr_cpus = nr_uniq_cpus = 0;
 	for_each_cpu(i, online_mask) {
@@ -54,7 +61,7 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
 	for_each_possible_cpu(i) {
 		if (!cpumask_test_cpu(i, online_mask)) {
 			map[i] = 0;
-			continue;
+			goto mapped;
 		}
 
 		/*
@@ -65,7 +72,7 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
 		if (nr_queues >= nr_cpus || nr_cpus == nr_uniq_cpus) {
 			map[i] = cpu_to_queue_index(nr_cpus, nr_queues, queue);
 			queue++;
-			continue;
+			goto mapped;
 		}
 
 		/*
@@ -80,10 +87,22 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
 			queue++;
 		} else
 			map[i] = map[first_sibling];
+
+mapped:
+		/* i is cpu, map[i] is queue index */
+		queue_map[map[i]] = 1;
 	}
 
 	free_cpumask_var(cpus);
-	return 0;
+
+	/* check if all queues are mapped to cpu */
+	for (i = 0; i < nr_queues; i++)
+		if (!queue_map[i])
+			break;
+
+	kfree(queue_map);
+
+	return i != nr_queues;
 }
 
 unsigned int *blk_mq_make_queue_map(struct blk_mq_tag_set *set)