
[0/2] check the number of hw queues mapped to sw queues

Message ID 20160608222557.GC1696@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Keith Busch June 8, 2016, 10:25 p.m. UTC
On Wed, Jun 08, 2016 at 03:48:10PM -0400, Ming Lin wrote:
> Back in Jan 2016, I sent a patch:
> [PATCH] blk-mq: check if all HW queues are mapped to cpu
> http://www.spinics.net/lists/linux-block/msg01038.html
> 
> It adds check code to blk_mq_update_queue_map().
> But it seems too aggressive because it's not an error that some hw queues
> were not mapped to sw queues.
> 
> So this series just adds a new function, blk_mq_hctx_mapped(), to check
> how many hw queues were mapped. The driver (for example, nvme-rdma)
> that cares about it will do the check.
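
For reference, a rough sketch of what such a helper could look like
(an illustration, not the actual patch from this series; it assumes the
helper simply counts the hctxs that received at least one sw ctx):

/*
 * Illustrative sketch only: count how many hardware contexts ended up
 * with at least one software context mapped to them, so a caller can
 * compare the result against the number of hw queues it created.
 */
static unsigned int blk_mq_hctx_mapped(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int mapped = 0;
	int i;

	queue_for_each_hw_ctx(q, hctx, i)
		if (hctx->nr_ctx)
			mapped++;

	return mapped;
}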

Wouldn't you prefer all 6 get assigned in this scenario instead of
utilizing fewer resources than your controller provides? I would like
blk-mq to use them all.

I've been trying to change blk_mq_update_queue_map to do this, but it's
not as easy as it sounds. The following is the simplest patch I came
up with that gets a better mapping *most* of the time.
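
For context, the mapping helper whose arguments the patch changes,
cpu_to_queue_index(), was roughly the following in block/blk-mq-cpumap.c
at the time (reproduced here as a sketch for readers following along,
not quoted verbatim):

static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
			      const int cpu)
{
	/* Spread nr_cpus CPUs evenly across nr_queues queue indices. */
	return cpu * nr_queues / nr_cpus;
}

When the first argument equals nr_queues, the result simply tracks the
queue counter; a smaller first argument (such as nr_uniq_cpus on a
hyperthreaded box) makes consecutive results jump and leaves hctx
indices unused, which is where the skipped indices in the "Before"
listing below come from.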

I have 31 queues and 32 CPUs, and these are the results:

  # for i in $(ls -1v /sys/block/nvme0n1/mq/); do
      printf "hctx_idx %2d: " $i
      cat /sys/block/nvme0n1/mq/$i/cpu_list
    done

Before:

hctx_idx  0: 0, 16
hctx_idx  1: 1, 17
hctx_idx  3: 2, 18
hctx_idx  5: 3, 19
hctx_idx  7: 4, 20
hctx_idx  9: 5, 21
hctx_idx 11: 6, 22
hctx_idx 13: 7, 23
hctx_idx 15: 8, 24
hctx_idx 17: 9, 25
hctx_idx 19: 10, 26
hctx_idx 21: 11, 27
hctx_idx 23: 12, 28
hctx_idx 25: 13, 29
hctx_idx 27: 14, 30
hctx_idx 29: 15, 31

After:

hctx_id  0: 0, 16
hctx_id  1: 1
hctx_id  2: 2
hctx_id  3: 3
hctx_id  4: 4
hctx_id  5: 5
hctx_id  6: 6
hctx_id  7: 7
hctx_id  8: 8
hctx_id  9: 9
hctx_id 10: 10
hctx_id 11: 11
hctx_id 12: 12
hctx_id 13: 13
hctx_id 14: 14
hctx_id 15: 15
hctx_id 16: 17
hctx_id 17: 18
hctx_id 18: 19
hctx_id 19: 20
hctx_id 20: 21
hctx_id 21: 22
hctx_id 22: 23
hctx_id 23: 24
hctx_id 24: 25
hctx_id 25: 26
hctx_id 26: 27
hctx_id 27: 28
hctx_id 28: 29
hctx_id 29: 30
hctx_id 30: 31


Comments

Ming Lin June 8, 2016, 10:47 p.m. UTC | #1
On Wed, Jun 8, 2016 at 3:25 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Wed, Jun 08, 2016 at 03:48:10PM -0400, Ming Lin wrote:
>> Back in Jan 2016, I sent a patch:
>> [PATCH] blk-mq: check if all HW queues are mapped to cpu
>> http://www.spinics.net/lists/linux-block/msg01038.html
>>
>> It adds check code to blk_mq_update_queue_map().
>> But it seems too aggressive because it's not an error that some hw queues
>> were not mapped to sw queues.
>>
>> So this series just adds a new function, blk_mq_hctx_mapped(), to check
>> how many hw queues were mapped. The driver (for example, nvme-rdma)
>> that cares about it will do the check.
>
> Wouldn't you prefer all 6 get assigned in this scenario instead of
> utilizing fewer resources than your controller provides? I would like
> blk-mq to use them all.

That's ideal.

But we'll always see corner cases where some hctx(s) are not mapped.
So I want to at least prevent the crash and return an error in the driver.

Here is another example where I create 64 queues on a server with 72 CPUs.

hctx index 0: 0, 36
hctx index 1: 1, 37
hctx index 3: 2, 38
hctx index 5: 3, 39
hctx index 7: 4, 40
hctx index 8: 5, 41
hctx index 10: 6, 42
hctx index 12: 7, 43
hctx index 14: 8, 44
hctx index 16: 9, 45
hctx index 17: 10, 46
hctx index 19: 11, 47
hctx index 21: 12, 48
hctx index 23: 13, 49
hctx index 24: 14, 50
hctx index 26: 15, 51
hctx index 28: 16, 52
hctx index 30: 17, 53
hctx index 32: 18, 54
hctx index 33: 19, 55
hctx index 35: 20, 56
hctx index 37: 21, 57
hctx index 39: 22, 58
hctx index 40: 23, 59
hctx index 42: 24, 60
hctx index 44: 25, 61
hctx index 46: 26, 62
hctx index 48: 27, 63
hctx index 49: 28, 64
hctx index 51: 29, 65
hctx index 53: 30, 66
hctx index 55: 31, 67
hctx index 56: 32, 68
hctx index 58: 33, 69
hctx index 60: 34, 70
hctx index 62: 35, 71

Other hctxs are not mapped.
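
A hedged sketch of the driver-side check being described (the function
name and identifiers here are placeholders, not the real nvme-rdma
code), building on the blk_mq_hctx_mapped() idea above:

/*
 * Hypothetical driver-side check: refuse to proceed when some of the
 * hw queues the driver created never got a sw queue mapped to them,
 * instead of crashing later on an unmapped hctx.
 */
static int verify_queue_mapping(struct request_queue *q,
				unsigned int nr_hw_queues)
{
	unsigned int mapped = blk_mq_hctx_mapped(q);

	if (mapped < nr_hw_queues) {
		pr_warn("%u hw queues created, but only %u were mapped to sw queues\n",
			nr_hw_queues, mapped);
		return -EINVAL;
	}

	return 0;
}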


>
> I've been trying to change blk_mq_update_queue_map to do this, but it's
> not as easy as it sounds. The following is the simplest patch I came
> up with that gets a better mapping *most* of the time.

Not working for my case with 6 hw queues (8 CPUs):

[  108.318247] nvme nvme0: 6 hw queues created, but only 5 were mapped to sw queues

hctx_idx 0: 0 1 4 5
hctx_idx 1: None
hctx_idx 2: 2
hctx_idx 3: 3
hctx_idx 4: 6
hctx_idx 5: 7

>
> ---
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index d0634bc..941c406 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -75,11 +75,12 @@ int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
>                 */
>                 first_sibling = get_first_sibling(i);
>                 if (first_sibling == i) {
> -                       map[i] = cpu_to_queue_index(nr_uniq_cpus, nr_queues,
> -                                                       queue);
> +                       map[i] = cpu_to_queue_index(max(nr_queues, (nr_cpus - queue)), nr_queues, queue);
>                         queue++;
> -               } else
> +               } else {
>                         map[i] = map[first_sibling];
> +                       --nr_cpus;
> +               }
>         }
>
>         free_cpumask_var(cpus);
Keith Busch June 8, 2016, 11:05 p.m. UTC | #2
On Wed, Jun 08, 2016 at 03:47:10PM -0700, Ming Lin wrote:
> On Wed, Jun 8, 2016 at 3:25 PM, Keith Busch <keith.busch@intel.com> wrote:
> > I've been trying to change blk_mq_update_queue_map to do this, but it's
> > not as easy as it sounds. The following is the simplest patch I came
> > up with that gets a better mapping *most* of the time.
> 
> Not working for my case with 6 hw queues (8 CPUs):
> 
> [  108.318247] nvme nvme0: 6 hw queues created, but only 5 were mapped to sw queues
> 
> hctx_idx 0: 0 1 4 5
> hctx_idx 1: None
> hctx_idx 2: 2
> hctx_idx 3: 3
> hctx_idx 4: 6
> hctx_idx 5: 7

Heh, not one of the good cases I see. I don't think there's a simple
change to use all contexts. Might need a larger rewrite.

Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index d0634bc..941c406 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -75,11 +75,12 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues,
 		*/
 		first_sibling = get_first_sibling(i);
 		if (first_sibling == i) {
-			map[i] = cpu_to_queue_index(nr_uniq_cpus, nr_queues,
-							queue);
+			map[i] = cpu_to_queue_index(max(nr_queues, (nr_cpus - queue)), nr_queues, queue);
 			queue++;
-		} else
+		} else {
 			map[i] = map[first_sibling];
+			--nr_cpus;
+		}
 	}

 	free_cpumask_var(cpus);