Message ID | 1483569671-1462-6-git-send-email-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, 2017-01-04 at 17:41 -0500, Keith Busch wrote:
> +static void blk_mq_abandon_stopped_requests(struct request_queue *q)
> +{
> +	int i;
> +	struct request *rq, *next;
> +	struct blk_mq_hw_ctx *hctx;
> +	LIST_HEAD(rq_list);
> +
> +	blk_mq_sync_queue(q);
> +
> +	spin_lock(&q->requeue_lock);
> +	list_for_each_entry_safe(rq, next, &q->requeue_list, queuelist) {
> +		struct blk_mq_ctx *ctx;
> +
> +		ctx = rq->mq_ctx;
> +		hctx = blk_mq_map_queue(q, ctx->cpu);
> +		if (blk_mq_hctx_stopped(hctx)) {
> +			list_del_init(&rq->queuelist);
> +
> +			spin_lock(&hctx->lock);
> +			list_add_tail(&rq->queuelist, &rq_list);
> +			spin_unlock(&hctx->lock);
> +		}
> +	}
> +
> +	queue_for_each_hw_ctx(q, hctx, i) {
> +		if (!blk_mq_hctx_stopped(hctx))
> +			continue;
> +
> +		flush_busy_ctxs(hctx, &rq_list);
> +
> +		spin_lock(&hctx->lock);
> +		if (!list_empty(&hctx->dispatch))
> +			list_splice_init(&hctx->dispatch, &rq_list);
> +		spin_unlock(&hctx->lock);
> +	}
> +	spin_unlock(&q->requeue_lock);
> +
> +	while (!list_empty(&rq_list)) {
> +		rq = list_first_entry(&rq_list, struct request, queuelist);
> +		list_del_init(&rq->queuelist);
> +		rq->errors = -EAGAIN;
> +		blk_mq_end_request(rq, rq->errors);
> +	}
> +}

Hello Keith,

This patch adds a second code path to the blk-mq core for running queues
and hence will make the blk-mq core harder to maintain. Have you
considered implementing this functionality by introducing a new "fail
all requests" flag for hctx queues such that
blk_mq_abandon_stopped_requests() can reuse the existing mechanism for
running a queue?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> If hardware queues are stopped for some event, like the device has been
> suspended by power management, requests allocated on that hardware queue
> are indefinitely stuck causing a queue freeze to wait forever.

I have a problem with this patch. IMO, this is a general issue, so why
do we tie a fix to calling blk_mq_update_nr_hw_queues()? We might not
need to update nr_hw_queues at all. I'm fine with
blk_mq_abandon_stopped_requests() but not with its call-site.

Usually a driver knows when it wants to abandon all busy requests with
blk_mq_tagset_busy_iter(); maybe the right approach is to add a hook
for all allocated tags? Or have blk_mq_quiesce_queue() get a "fail all
requests" parameter from the callers?
On Thu, Jan 05, 2017 at 07:33:22AM +0000, Bart Van Assche wrote:
> This patch adds a second code path to the blk-mq core for running queues
> and hence will make the blk-mq core harder to maintain. Have you
> considered implementing this functionality by introducing a new "fail
> all requests" flag for hctx queues such that
> blk_mq_abandon_stopped_requests() can reuse the existing mechanism for
> running a queue?

Thanks for the suggestion. I'll look into that. I wanted to avoid adding
more flags to test for in the fast path, but I see that a special queue
run method is problematic for maintenance.
On Fri, Jan 13, 2017 at 11:05:19PM +0200, Sagi Grimberg wrote:
> > If hardware queues are stopped for some event, like the device has been
> > suspended by power management, requests allocated on that hardware queue
> > are indefinitely stuck causing a queue freeze to wait forever.
>
> I have a problem with this patch. IMO, this is a general issue, so why
> do we tie a fix to calling blk_mq_update_nr_hw_queues()? We might not
> need to update nr_hw_queues at all. I'm fine with
> blk_mq_abandon_stopped_requests() but not with its call-site.
>
> Usually a driver knows when it wants to abandon all busy requests with
> blk_mq_tagset_busy_iter(); maybe the right approach is to add a hook
> for all allocated tags? Or have blk_mq_quiesce_queue() get a "fail all
> requests" parameter from the callers?

This patch is overly aggressive in failing allocated requests. There
are scenarios where we wouldn't want to abandon them, like if the hw
context is about to be brought back online, but this patch assumes all
need to be abandoned. I'll see if there are some other tricks we can
have a driver do. Thanks for the suggestions.
>>> If hardware queues are stopped for some event, like the device has been
>>> suspended by power management, requests allocated on that hardware queue
>>> are indefinitely stuck causing a queue freeze to wait forever.
>>
>> I have a problem with this patch. IMO, this is a general issue, so why
>> do we tie a fix to calling blk_mq_update_nr_hw_queues()? We might not
>> need to update nr_hw_queues at all. I'm fine with
>> blk_mq_abandon_stopped_requests() but not with its call-site.
>>
>> Usually a driver knows when it wants to abandon all busy requests with
>> blk_mq_tagset_busy_iter(); maybe the right approach is to add a hook
>> for all allocated tags? Or have blk_mq_quiesce_queue() get a "fail all
>> requests" parameter from the callers?
>
> This patch is overly aggressive in failing allocated requests. There
> are scenarios where we wouldn't want to abandon them, like if the hw
> context is about to be brought back online, but this patch assumes all
> need to be abandoned. I'll see if there are some other tricks we can
> have a driver do. Thanks for the suggestions.

I agree. I do think, though, that this should be driven from the driver,
because for fabrics we might have some fabric error that triggers a
periodic reconnect. So "the hw context is about to be brought back" is
unknown from the driver's pov, and when we delete the controller
(because we give up) is exactly where we need to abandon the allocated
requests.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9b7ed03..0c9a2a3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -117,22 +117,12 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
-/**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
- * @q: request queue.
- *
- * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
- */
-void blk_mq_quiesce_queue(struct request_queue *q)
+static void blk_mq_sync_queue(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 	bool rcu = false;
 
-	blk_mq_stop_hw_queues(q);
-
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
 			synchronize_srcu(&hctx->queue_rq_srcu);
@@ -142,6 +132,20 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	if (rcu)
 		synchronize_rcu();
 }
+
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	blk_mq_stop_hw_queues(q);
+	blk_mq_sync_queue(q);
+}
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
 void blk_mq_wake_waiters(struct request_queue *q)
@@ -2228,6 +2232,51 @@ static void blk_mq_queue_reinit(struct request_queue *q,
 	blk_mq_sysfs_register(q);
 }
 
+static void blk_mq_abandon_stopped_requests(struct request_queue *q)
+{
+	int i;
+	struct request *rq, *next;
+	struct blk_mq_hw_ctx *hctx;
+	LIST_HEAD(rq_list);
+
+	blk_mq_sync_queue(q);
+
+	spin_lock(&q->requeue_lock);
+	list_for_each_entry_safe(rq, next, &q->requeue_list, queuelist) {
+		struct blk_mq_ctx *ctx;
+
+		ctx = rq->mq_ctx;
+		hctx = blk_mq_map_queue(q, ctx->cpu);
+		if (blk_mq_hctx_stopped(hctx)) {
+			list_del_init(&rq->queuelist);
+
+			spin_lock(&hctx->lock);
+			list_add_tail(&rq->queuelist, &rq_list);
+			spin_unlock(&hctx->lock);
+		}
+	}
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (!blk_mq_hctx_stopped(hctx))
+			continue;
+
+		flush_busy_ctxs(hctx, &rq_list);
+
+		spin_lock(&hctx->lock);
+		if (!list_empty(&hctx->dispatch))
+			list_splice_init(&hctx->dispatch, &rq_list);
+		spin_unlock(&hctx->lock);
+	}
+	spin_unlock(&q->requeue_lock);
+
+	while (!list_empty(&rq_list)) {
+		rq = list_first_entry(&rq_list, struct request, queuelist);
+		list_del_init(&rq->queuelist);
+		rq->errors = -EAGAIN;
+		blk_mq_end_request(rq, rq->errors);
+	}
+}
+
 /*
  * New online cpumask which is going to be set in this hotplug event.
  * Declare this cpumasks as global as cpu-hotplug operation is invoked
@@ -2250,6 +2299,8 @@ static void blk_mq_queue_reinit_work(void)
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		blk_mq_freeze_queue_start(q);
 	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_abandon_stopped_requests(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
 		blk_mq_freeze_queue_wait(q);
 
 	list_for_each_entry(q, &all_q_list, all_q_node)
@@ -2477,7 +2528,11 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 		return;
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_freeze_queue(q);
+		blk_mq_freeze_queue_start(q);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_abandon_stopped_requests(q);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_freeze_queue_wait(q);
 
 	set->nr_hw_queues = nr_hw_queues;
 	if (set->ops->map_queues)
If hardware queues are stopped for some event, like the device has been
suspended by power management, requests allocated on that hardware queue
are indefinitely stuck, causing a queue freeze to wait forever.

This patch abandons requests on stopped queues after syncing with all
queue_rq events when we need to rebalance the queues. While we would
prefer not to end the requests in error if it is possible to submit them
on a different context, there is no good way to unwind a request to
submit on a valid context once it enters a stopped context for removal.
Ending IO with EAGAIN is a better alternative than deadlocking.

Reported-by: Marc Merlin <marc@merlins.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 12 deletions(-)