diff mbox series

[V2,2/4] blk-mq: fix shared queue mapping

Message ID 20181217104248.5828-3-ming.lei@redhat.com (mailing list archive)
State New, archived
Headers show
Series blk-mq: queue mapping fix & improvement | expand

Commit Message

Ming Lei Dec. 17, 2018, 10:42 a.m. UTC
From: Christoph Hellwig <hch@lst.de>

Even though poll_queues are zero, nvme's mapping for HCTX_TYPE_POLL still
may be setup via blk_mq_map_queues() which cause different mapping compared
with HCTX_TYPE_DEFAULT's mapping built from managed irq affinity.

This mapping will cause hctx->type to be over-written in blk_mq_map_swqueue(),
then the whole mapping may become broken, for example, one same ctx can be
mapped to different hctxs with same hctx type. This bad mapping has caused
IO hang in simple dd test, as reported by Mike.

This patch sets map->nr_queues as zero explictly if there is zero
queues for such queue type, also maps to correct hctx if .nr_queues of the
queue type is zero.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>

(don't handle zero .nr_queues map in blk_mq_map_swqueue())
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c          |  3 +++
 block/blk-mq.h          | 11 +++++++----
 drivers/nvme/host/pci.c |  6 +-----
 3 files changed, 11 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Dec. 17, 2018, 10:57 a.m. UTC | #1
On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Even though poll_queues are zero, nvme's mapping for HCTX_TYPE_POLL still
> may be setup via blk_mq_map_queues() which cause different mapping compared
> with HCTX_TYPE_DEFAULT's mapping built from managed irq affinity.
> 
> This mapping will cause hctx->type to be over-written in blk_mq_map_swqueue(),
> then the whole mapping may become broken, for example, one same ctx can be
> mapped to different hctxs with same hctx type. This bad mapping has caused
> IO hang in simple dd test, as reported by Mike.
> 
> This patch sets map->nr_queues as zero explictly if there is zero
> queues for such queue type, also maps to correct hctx if .nr_queues of the
> queue type is zero.
> 
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> 
> (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

I actually think we should split this into three patches:

 1) skip zero-queue maps in blk_mq_map_swqueue
 2) skip zero-queue maps in blk_mq_map_queue
 3) don't duplicate maps in nvme

sorry for misleading you with my quick WIP patch.

I can send send patches for my two with a proper changelogs, but
I'll leave blk_mq_map_swqueue to you as I don't full understand
the rationale, even if it looks sensible to me.
Ming Lei Dec. 17, 2018, 11:06 a.m. UTC | #2
On Mon, Dec 17, 2018 at 11:57:25AM +0100, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > Even though poll_queues are zero, nvme's mapping for HCTX_TYPE_POLL still
> > may be setup via blk_mq_map_queues() which cause different mapping compared
> > with HCTX_TYPE_DEFAULT's mapping built from managed irq affinity.
> > 
> > This mapping will cause hctx->type to be over-written in blk_mq_map_swqueue(),
> > then the whole mapping may become broken, for example, one same ctx can be
> > mapped to different hctxs with same hctx type. This bad mapping has caused
> > IO hang in simple dd test, as reported by Mike.
> > 
> > This patch sets map->nr_queues as zero explictly if there is zero
> > queues for such queue type, also maps to correct hctx if .nr_queues of the
> > queue type is zero.
> > 
> > Cc: Jeff Moyer <jmoyer@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > 
> > (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> I actually think we should split this into three patches:
> 
>  1) skip zero-queue maps in blk_mq_map_swqueue
>  2) skip zero-queue maps in blk_mq_map_queue
>  3) don't duplicate maps in nvme
> 
> sorry for misleading you with my quick WIP patch.
> 
> I can send send patches for my two with a proper changelogs, but

OK, I just want to fix the issue quick, :-)

> I'll leave blk_mq_map_swqueue to you as I don't full understand
> the rationale, even if it looks sensible to me.

If maps won't be duplicated, skip zero-queue maps in blk_mq_map_swqueue() may
not be necessary. However, it may make the code more robust/simple.

Thanks,
Ming
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 313f28b2d079..e843f23843c8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2431,6 +2431,9 @@  static void blk_mq_map_swqueue(struct request_queue *q)
 		for (j = 0; j < set->nr_maps; j++) {
 			hctx = blk_mq_map_queue_type(q, j, i);
 
+			if (!set->map[j].nr_queues)
+				continue;
+
 			/*
 			 * If the CPU is already set in the mask, then we've
 			 * mapped this one already. This can happen if
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b63a0de8a07a..f50c73d559d7 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -105,12 +105,15 @@  static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
 {
 	enum hctx_type type = HCTX_TYPE_DEFAULT;
 
-	if (q->tag_set->nr_maps > HCTX_TYPE_POLL &&
-	    ((flags & REQ_HIPRI) && test_bit(QUEUE_FLAG_POLL, &q->queue_flags)))
+	if ((flags & REQ_HIPRI) &&
+	    q->tag_set->nr_maps > HCTX_TYPE_POLL &&
+	    q->tag_set->map[HCTX_TYPE_POLL].nr_queues &&
+	    test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		type = HCTX_TYPE_POLL;
 
-	else if (q->tag_set->nr_maps > HCTX_TYPE_READ &&
-		 ((flags & REQ_OP_MASK) == REQ_OP_READ))
+	else if (((flags & REQ_OP_MASK) == REQ_OP_READ) &&
+	         q->tag_set->nr_maps > HCTX_TYPE_READ &&
+		 q->tag_set->map[HCTX_TYPE_READ].nr_queues)
 		type = HCTX_TYPE_READ;
 
 	return blk_mq_map_queue_type(q, type, cpu);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fb9d8270f32c..698b350b38cf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -496,11 +496,7 @@  static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 		map->nr_queues = dev->io_queues[i];
 		if (!map->nr_queues) {
 			BUG_ON(i == HCTX_TYPE_DEFAULT);
-
-			/* shared set, resuse read set parameters */
-			map->nr_queues = dev->io_queues[HCTX_TYPE_DEFAULT];
-			qoff = 0;
-			offset = queue_irq_offset(dev);
+			continue;
 		}
 
 		/*