Message ID | 20181217104248.5828-3-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | blk-mq: queue mapping fix & improvement |
On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> Even though poll_queues are zero, nvme's mapping for HCTX_TYPE_POLL may
> still be set up via blk_mq_map_queues(), which causes a different mapping
> compared with HCTX_TYPE_DEFAULT's mapping built from managed irq affinity.
>
> This mapping will cause hctx->type to be over-written in blk_mq_map_swqueue(),
> and then the whole mapping may become broken; for example, the same ctx can be
> mapped to different hctxs with the same hctx type. This bad mapping has caused
> an IO hang in a simple dd test, as reported by Mike.
>
> This patch sets map->nr_queues to zero explicitly if there are zero
> queues for that queue type, and also maps to the correct hctx if .nr_queues
> of the queue type is zero.
>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
>
> (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

I actually think we should split this into three patches:

 1) skip zero-queue maps in blk_mq_map_swqueue
 2) skip zero-queue maps in blk_mq_map_queue
 3) don't duplicate maps in nvme

Sorry for misleading you with my quick WIP patch.

I can send patches for my two with proper changelogs, but I'll leave
blk_mq_map_swqueue to you, as I don't fully understand the rationale,
even if it looks sensible to me.
On Mon, Dec 17, 2018 at 11:57:25AM +0100, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> > From: Christoph Hellwig <hch@lst.de>
> >
> > Even though poll_queues are zero, nvme's mapping for HCTX_TYPE_POLL may
> > still be set up via blk_mq_map_queues(), which causes a different mapping
> > compared with HCTX_TYPE_DEFAULT's mapping built from managed irq affinity.
> >
> > This mapping will cause hctx->type to be over-written in blk_mq_map_swqueue(),
> > and then the whole mapping may become broken; for example, the same ctx can be
> > mapped to different hctxs with the same hctx type. This bad mapping has caused
> > an IO hang in a simple dd test, as reported by Mike.
> >
> > This patch sets map->nr_queues to zero explicitly if there are zero
> > queues for that queue type, and also maps to the correct hctx if .nr_queues
> > of the queue type is zero.
> >
> > Cc: Jeff Moyer <jmoyer@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> >
> > (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
>
> I actually think we should split this into three patches:
>
>  1) skip zero-queue maps in blk_mq_map_swqueue
>  2) skip zero-queue maps in blk_mq_map_queue
>  3) don't duplicate maps in nvme
>
> Sorry for misleading you with my quick WIP patch.
>
> I can send patches for my two with proper changelogs, but

OK, I just want to fix the issue quickly, :-)

> I'll leave blk_mq_map_swqueue to you, as I don't fully understand
> the rationale, even if it looks sensible to me.

If maps won't be duplicated, skipping zero-queue maps in blk_mq_map_swqueue()
may not be necessary. However, it may make the code more robust/simple.

Thanks,
Ming
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 313f28b2d079..e843f23843c8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2431,6 +2431,9 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		for (j = 0; j < set->nr_maps; j++) {
 			hctx = blk_mq_map_queue_type(q, j, i);
 
+			if (!set->map[j].nr_queues)
+				continue;
+
 			/*
 			 * If the CPU is already set in the mask, then we've
 			 * mapped this one already. This can happen if
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b63a0de8a07a..f50c73d559d7 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -105,12 +105,15 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
 {
 	enum hctx_type type = HCTX_TYPE_DEFAULT;
 
-	if (q->tag_set->nr_maps > HCTX_TYPE_POLL &&
-	    ((flags & REQ_HIPRI) && test_bit(QUEUE_FLAG_POLL, &q->queue_flags)))
+	if ((flags & REQ_HIPRI) &&
+	    q->tag_set->nr_maps > HCTX_TYPE_POLL &&
+	    q->tag_set->map[HCTX_TYPE_POLL].nr_queues &&
+	    test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		type = HCTX_TYPE_POLL;
-	else if (q->tag_set->nr_maps > HCTX_TYPE_READ &&
-		 ((flags & REQ_OP_MASK) == REQ_OP_READ))
+	else if (((flags & REQ_OP_MASK) == REQ_OP_READ) &&
+		 q->tag_set->nr_maps > HCTX_TYPE_READ &&
+		 q->tag_set->map[HCTX_TYPE_READ].nr_queues)
 		type = HCTX_TYPE_READ;
 
 	return blk_mq_map_queue_type(q, type, cpu);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fb9d8270f32c..698b350b38cf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -496,11 +496,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 		map->nr_queues = dev->io_queues[i];
 		if (!map->nr_queues) {
 			BUG_ON(i == HCTX_TYPE_DEFAULT);
-
-			/* shared set, resuse read set parameters */
-			map->nr_queues = dev->io_queues[HCTX_TYPE_DEFAULT];
-			qoff = 0;
-			offset = queue_irq_offset(dev);
+			continue;
 		}
 
 		/*