From patchwork Tue Jun 2 09:14:57 2020
X-Patchwork-Id: 11583435
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig, Douglas Anderson, Johannes Thumshirn
Subject: [PATCH V4 1/6] blk-mq: pass request queue into get/put budget callback
Date: Tue, 2 Jun 2020 17:14:57 +0800
Message-Id: <20200602091502.1822499-2-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

The blk-mq budget is abstracted from SCSI's device queue depth, and it is
always per request queue rather than per hctx. It is quite absurd to get a
budget from one hctx, then dequeue a request from the scheduler queue, only
for that request to belong to a different hctx, at least for bfq and
deadline. So fix the mess and always pass the request queue to the get/put
budget callbacks.
Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Cc: Douglas Anderson
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Reviewed-by: Douglas Anderson
Reviewed-by: Sagi Grimberg
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq-sched.c    |  8 ++++----
 block/blk-mq.c          |  8 ++++----
 block/blk-mq.h          | 12 ++++--------
 drivers/scsi/scsi_lib.c |  8 +++-----
 include/linux/blk-mq.h  |  4 ++--
 5 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index fdcc2c1dd178..a31e281e9d31 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -108,12 +108,12 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 			break;
 		}
 
-		if (!blk_mq_get_dispatch_budget(hctx))
+		if (!blk_mq_get_dispatch_budget(q))
 			break;
 
 		rq = e->type->ops.dispatch_request(hctx);
 		if (!rq) {
-			blk_mq_put_dispatch_budget(hctx);
+			blk_mq_put_dispatch_budget(q);
 			/*
 			 * We're releasing without dispatching. Holding the
 			 * budget could have blocked any "hctx"s with the
@@ -173,12 +173,12 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
-		if (!blk_mq_get_dispatch_budget(hctx))
+		if (!blk_mq_get_dispatch_budget(q))
 			break;
 
 		rq = blk_mq_dequeue_from_ctx(hctx, ctx);
 		if (!rq) {
-			blk_mq_put_dispatch_budget(hctx);
+			blk_mq_put_dispatch_budget(q);
 			/*
 			 * We're releasing without dispatching. Holding the
 			 * budget could have blocked any "hctx"s with the
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9a36ac1c1fa1..bcbf49bd7ebe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1262,7 +1262,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		rq = list_first_entry(list, struct request, queuelist);
 
 		hctx = rq->mq_hctx;
-		if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
+		if (!got_budget && !blk_mq_get_dispatch_budget(q)) {
 			blk_mq_put_driver_tag(rq);
 			no_budget_avail = true;
 			break;
@@ -1277,7 +1277,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			 * we'll re-run it below.
 			 */
 			if (!blk_mq_mark_tag_wait(hctx, rq)) {
-				blk_mq_put_dispatch_budget(hctx);
+				blk_mq_put_dispatch_budget(q);
 				/*
 				 * For non-shared tags, the RESTART check
 				 * will suffice.
@@ -1925,11 +1925,11 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (q->elevator && !bypass_insert)
 		goto insert;
 
-	if (!blk_mq_get_dispatch_budget(hctx))
+	if (!blk_mq_get_dispatch_budget(q))
 		goto insert;
 
 	if (!blk_mq_get_driver_tag(rq)) {
-		blk_mq_put_dispatch_budget(hctx);
+		blk_mq_put_dispatch_budget(q);
 		goto insert;
 	}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a139b0631817..21d877105224 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -180,20 +180,16 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
 void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
 			 unsigned int inflight[2]);
 
-static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
+static inline void blk_mq_put_dispatch_budget(struct request_queue *q)
 {
-	struct request_queue *q = hctx->queue;
-
 	if (q->mq_ops->put_budget)
-		q->mq_ops->put_budget(hctx);
+		q->mq_ops->put_budget(q);
 }
 
-static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
+static inline bool blk_mq_get_dispatch_budget(struct request_queue *q)
 {
-	struct request_queue *q = hctx->queue;
-
 	if (q->mq_ops->get_budget)
-		return q->mq_ops->get_budget(hctx);
+		return q->mq_ops->get_budget(q);
 	return true;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 82ad0244b3d0..b9adee0a9266 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1624,17 +1624,15 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
 	clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
 }
 
-static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
+static void scsi_mq_put_budget(struct request_queue *q)
 {
-	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
 
 	atomic_dec(&sdev->device_busy);
 }
 
-static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
+static bool scsi_mq_get_budget(struct request_queue *q)
 {
-	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
 
 	return scsi_dev_queue_ready(q, sdev);
@@ -1701,7 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
-	scsi_mq_put_budget(hctx);
+	scsi_mq_put_budget(q);
 	switch (ret) {
 	case BLK_STS_OK:
 		break;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d6fcae17da5a..416d8609253b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -270,8 +270,8 @@ struct blk_mq_queue_data {
 typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
 		const struct blk_mq_queue_data *);
 typedef void (commit_rqs_fn)(struct blk_mq_hw_ctx *);
-typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
-typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
+typedef bool (get_budget_fn)(struct request_queue *);
+typedef void (put_budget_fn)(struct request_queue *);
 typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
 typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
 typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int);
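The interface change is easier to see outside the kernel. The following
stand-alone C sketch is an illustration only: the toy_* names are invented,
and the real scsi_dev_queue_ready() does considerably more than this. It
models what the SCSI budget amounts to, a counter tied to the request
queue's underlying device and bounded by its queue depth.

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the per-device state that backs a request queue's budget. */
struct toy_queue {
	int queue_depth;	/* like scsi_device->queue_depth */
	atomic_int busy;	/* like scsi_device->device_busy */
};

/* Same shape as scsi_mq_get_budget(struct request_queue *q). */
static bool toy_get_budget(struct toy_queue *q)
{
	int busy = atomic_fetch_add(&q->busy, 1);

	if (busy >= q->queue_depth) {
		atomic_fetch_sub(&q->busy, 1);	/* over depth: back off */
		return false;
	}
	return true;
}

/* Same shape as scsi_mq_put_budget(): releasing is a plain decrement. */
static void toy_put_budget(struct toy_queue *q)
{
	atomic_fetch_sub(&q->busy, 1);
}

int main(void)
{
	struct toy_queue q = { .queue_depth = 2 };

	assert(toy_get_budget(&q));	/* first budget */
	assert(toy_get_budget(&q));	/* second budget */
	assert(!toy_get_budget(&q));	/* depth reached, no budget */
	toy_put_budget(&q);		/* one request completes */
	assert(toy_get_budget(&q));	/* budget available again */
	printf("busy=%d of depth=%d\n", atomic_load(&q.busy), q.queue_depth);
	return 0;
}

Nothing in this accounting needs a hardware queue context, which is the
point of the signature change above.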
From patchwork Tue Jun 2 09:14:58 2020
X-Patchwork-Id: 11583437
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig, Johannes Thumshirn
Subject: [PATCH V4 2/6] blk-mq: pass hctx to blk_mq_dispatch_rq_list
Date: Tue, 2 Jun 2020 17:14:58 +0800
Message-Id: <20200602091502.1822499-3-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

All requests in the 'list' passed to blk_mq_dispatch_rq_list belong to the
same hctx, so it is better to pass the hctx instead of the request queue,
because blk-mq's dispatch target is the hctx rather than the request queue.

Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Johannes Thumshirn
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq-sched.c | 14 ++++++--------
 block/blk-mq.c       |  6 +++---
 block/blk-mq.h       |  2 +-
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a31e281e9d31..632c6f8b63f7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -96,10 +96,9 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
 	int ret = 0;
+	struct request *rq;
 
 	do {
-		struct request *rq;
-
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
@@ -131,7 +130,7 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, true));
 
 	return ret;
 }
@@ -161,10 +160,9 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(rq_list);
 	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
 	int ret = 0;
+	struct request *rq;
 
 	do {
-		struct request *rq;
-
 		if (!list_empty_careful(&hctx->dispatch)) {
 			ret = -EAGAIN;
 			break;
 		}
@@ -200,7 +198,7 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 		/* round robin for fair dispatch */
 		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
 
-	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
 	return ret;
@@ -240,7 +238,7 @@ static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
+		if (blk_mq_dispatch_rq_list(hctx, &rq_list, false)) {
 			if (has_sched_dispatch)
 				ret = blk_mq_do_dispatch_sched(hctx);
 			else
@@ -253,7 +251,7 @@ static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		ret = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list, false);
+		blk_mq_dispatch_rq_list(hctx, &rq_list, false);
 	}
 
 	return ret;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bcbf49bd7ebe..723bc39507fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1236,10 +1236,10 @@ static void blk_mq_handle_zone_resource(struct request *rq,
 /*
  * Returns true if we did some work AND can potentially do more.
  */
-bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
+bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 			     bool got_budget)
 {
-	struct blk_mq_hw_ctx *hctx;
+	struct request_queue *q = hctx->queue;
 	struct request *rq, *nxt;
 	bool no_tag = false;
 	int errors, queued;
@@ -1261,7 +1261,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		rq = list_first_entry(list, struct request, queuelist);
 
-		hctx = rq->mq_hctx;
+		WARN_ON_ONCE(hctx != rq->mq_hctx);
 		if (!got_budget && !blk_mq_get_dispatch_budget(q)) {
 			blk_mq_put_driver_tag(rq);
 			no_budget_avail = true;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 21d877105224..d2d737b16e0e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -40,7 +40,7 @@ struct blk_mq_ctx {
 void blk_mq_exit_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
-bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
+bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *, bool);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
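As a stand-alone illustration of the new calling convention (invented names,
not kernel code): the caller already knows which hardware queue every request
on the list belongs to, so the dispatcher can take that hctx as a parameter
and merely assert the invariant per request, which is what the WARN_ON_ONCE()
above corresponds to.

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_hctx { int idx; };

struct toy_rq {
	struct toy_hctx *mq_hctx;	/* like rq->mq_hctx */
	int id;
};

/* After the change: the target hctx is an argument of the dispatcher. */
static void dispatch_list(struct toy_hctx *hctx, struct toy_rq *rqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* mirrors WARN_ON_ONCE(hctx != rq->mq_hctx) */
		assert(rqs[i].mq_hctx == hctx);
		printf("dispatch rq %d on hctx %d\n", rqs[i].id, hctx->idx);
	}
}

int main(void)
{
	struct toy_hctx h = { .idx = 0 };
	struct toy_rq rqs[] = { { &h, 1 }, { &h, 2 } };

	dispatch_list(&h, rqs, 2);
	return 0;
}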
From patchwork Tue Jun 2 09:14:59 2020
X-Patchwork-Id: 11583439
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig, Johannes Thumshirn
Subject: [PATCH V4 3/6] blk-mq: move getting driver tag and budget into one helper
Date: Tue, 2 Jun 2020 17:14:59 +0800
Message-Id: <20200602091502.1822499-4-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

Move the code for getting the driver tag and the budget into one helper, so
that blk_mq_dispatch_rq_list becomes a bit simpler and easier to read.

Meanwhile, move the updating of 'no_tag' and 'no_budget_available' into the
branch that handles partial dispatch, because that branch is the only
consumer of the two local variables.

Also rename the 'got_budget' parameter to 'ask_budget'.

No functional change.
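To illustrate the shape of this refactoring outside the kernel, here is a
minimal stand-alone sketch of the same pattern: one 'prepare' helper acquires
budget and tag and reports, via an enum, why it stopped, so the caller only
inspects the reason on the partial-dispatch path. All names below are
invented for the example and are not the kernel's.

#include <stdbool.h>
#include <stdio.h>

enum prep_status {
	PREP_OK,
	PREP_NO_TAG,
	PREP_NO_BUDGET,
};

/* Toy resources standing in for the dispatch budget and the driver tag. */
static bool get_budget(int *budget)
{
	if (*budget > 0) { (*budget)--; return true; }
	return false;
}
static void put_budget(int *budget) { (*budget)++; }
static bool get_tag(int *tags)
{
	if (*tags > 0) { (*tags)--; return true; }
	return false;
}

/* One helper acquires both resources, like blk_mq_prep_dispatch_rq(). */
static enum prep_status prepare_item(int *budget, int *tags, bool ask_budget)
{
	if (ask_budget && !get_budget(budget))
		return PREP_NO_BUDGET;
	if (!get_tag(tags)) {
		if (ask_budget)
			put_budget(budget);	/* budget is taken before the tag */
		return PREP_NO_TAG;
	}
	return PREP_OK;
}

int main(void)
{
	int budget = 1, tags = 0;
	enum prep_status prep = prepare_item(&budget, &tags, true);

	/* Only the "could not dispatch everything" path cares why we stopped. */
	if (prep == PREP_NO_TAG)
		puts("stopped: no driver tag, wait for one to be freed");
	else if (prep == PREP_NO_BUDGET)
		puts("stopped: no budget, re-run the queue later");
	return 0;
}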
Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 75 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 723bc39507fe..ee9342aac7be 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1233,18 +1233,51 @@ static void blk_mq_handle_zone_resource(struct request *rq,
 	__blk_mq_requeue_request(rq);
 }
 
+enum prep_dispatch {
+	PREP_DISPATCH_OK,
+	PREP_DISPATCH_NO_TAG,
+	PREP_DISPATCH_NO_BUDGET,
+};
+
+static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq,
+						  bool ask_budget)
+{
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+
+	if (ask_budget && !blk_mq_get_dispatch_budget(rq->q)) {
+		blk_mq_put_driver_tag(rq);
+		return PREP_DISPATCH_NO_BUDGET;
+	}
+
+	if (!blk_mq_get_driver_tag(rq)) {
+		/*
+		 * The initial allocation attempt failed, so we need to
+		 * rerun the hardware queue when a tag is freed. The
+		 * waitqueue takes care of that. If the queue is run
+		 * before we add this entry back on the dispatch list,
+		 * we'll re-run it below.
+		 */
+		if (!blk_mq_mark_tag_wait(hctx, rq)) {
+			/* budget is always obtained before getting tag */
+			blk_mq_put_dispatch_budget(rq->q);
+			return PREP_DISPATCH_NO_TAG;
+		}
+	}
+
+	return PREP_DISPATCH_OK;
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 			     bool got_budget)
 {
+	enum prep_dispatch prep;
 	struct request_queue *q = hctx->queue;
 	struct request *rq, *nxt;
-	bool no_tag = false;
 	int errors, queued;
 	blk_status_t ret = BLK_STS_OK;
-	bool no_budget_avail = false;
 	LIST_HEAD(zone_list);
 
 	if (list_empty(list))
@@ -1262,31 +1295,9 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		rq = list_first_entry(list, struct request, queuelist);
 
 		WARN_ON_ONCE(hctx != rq->mq_hctx);
-		if (!got_budget && !blk_mq_get_dispatch_budget(q)) {
-			blk_mq_put_driver_tag(rq);
-			no_budget_avail = true;
+		prep = blk_mq_prep_dispatch_rq(rq, !got_budget);
+		if (prep != PREP_DISPATCH_OK)
 			break;
-		}
-
-		if (!blk_mq_get_driver_tag(rq)) {
-			/*
-			 * The initial allocation attempt failed, so we need to
-			 * rerun the hardware queue when a tag is freed. The
-			 * waitqueue takes care of that. If the queue is run
-			 * before we add this entry back on the dispatch list,
-			 * we'll re-run it below.
-			 */
-			if (!blk_mq_mark_tag_wait(hctx, rq)) {
-				blk_mq_put_dispatch_budget(q);
-				/*
-				 * For non-shared tags, the RESTART check
-				 * will suffice.
-				 */
-				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
-					no_tag = true;
-				break;
-			}
-		}
 
 		list_del_init(&rq->queuelist);
 
@@ -1339,6 +1350,18 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	 */
 	if (!list_empty(list)) {
 		bool needs_restart;
+		bool no_tag = false;
+		bool no_budget_avail = false;
+
+		/*
+		 * For non-shared tags, the RESTART check
+		 * will suffice.
+		 */
+		if (prep == PREP_DISPATCH_NO_TAG &&
+				(hctx->flags & BLK_MQ_F_TAG_SHARED))
+			no_tag = true;
+		if (prep == PREP_DISPATCH_NO_BUDGET)
+			no_budget_avail = true;
 
 		/*
 		 * If we didn't flush the entire list, we could have told

From patchwork Tue Jun 2 09:15:00 2020
X-Patchwork-Id: 11583441
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig
Subject: [PATCH V4 4/6] blk-mq: remove dead check from blk_mq_dispatch_rq_list
Date: Tue, 2 Jun 2020 17:15:00 +0800
Message-Id: <20200602091502.1822499-5-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

When BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE is returned from .queue_rq,
the 'list' variable still holds the request that was not queued to the LLD
successfully, so blk_mq_dispatch_rq_list() already returns false from the
'!list_empty(list)' branch and the check removed here can never trigger.

No functional change.
Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ee9342aac7be..0e3aab91e6c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1413,13 +1413,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	} else
 		blk_mq_update_dispatch_busy(hctx, false);
 
-	/*
-	 * If the host/device is unable to accept more work, inform the
-	 * caller of that.
-	 */
-	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
-		return false;
-
 	return (queued + errors) != 0;
 }

From patchwork Tue Jun 2 09:15:01 2020
X-Patchwork-Id: 11583443
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig
Subject: [PATCH V4 5/6] blk-mq: pass obtained budget count to blk_mq_dispatch_rq_list
Date: Tue, 2 Jun 2020 17:15:01 +0800
Message-Id: <20200602091502.1822499-6-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

Pass the obtained budget count to blk_mq_dispatch_rq_list(), preparing for
support of fully batched submission. With the obtained budget count it is
easier to put back extra budgets in case of .queue_rq failure.

Meanwhile, remove the old 'got_budget' parameter.

Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq-sched.c |  8 ++++----
 block/blk-mq.c       | 27 +++++++++++++++++++++++----
 block/blk-mq.h       |  3 ++-
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 632c6f8b63f7..4c72073830f3 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -130,7 +130,7 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, true));
+	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, 1));
 
 	return ret;
 }
@@ -198,7 +198,7 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 		/* round robin for fair dispatch */
 		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
 
-	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, true));
+	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, 1));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
 	return ret;
@@ -238,7 +238,7 @@ static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		if (blk_mq_dispatch_rq_list(hctx, &rq_list, false)) {
+		if (blk_mq_dispatch_rq_list(hctx, &rq_list, 0)) {
 			if (has_sched_dispatch)
 				ret = blk_mq_do_dispatch_sched(hctx);
 			else
@@ -251,7 +251,7 @@ static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		ret = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(hctx, &rq_list, false);
+		blk_mq_dispatch_rq_list(hctx, &rq_list, 0);
 	}
 
 	return ret;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0e3aab91e6c0..901ef0264e44 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1259,7 +1259,8 @@ static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq,
 		 */
 		if (!blk_mq_mark_tag_wait(hctx, rq)) {
 			/* budget is always obtained before getting tag */
-			blk_mq_put_dispatch_budget(rq->q);
+			if (ask_budget)
+				blk_mq_put_dispatch_budget(rq->q);
 			return PREP_DISPATCH_NO_TAG;
 		}
 	}
@@ -1267,11 +1268,21 @@ static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq,
 	return PREP_DISPATCH_OK;
 }
 
+static void blk_mq_release_budgets(struct request_queue *q,
+		unsigned int nr_budgets)
+{
+	int i = 0;
+
+	/* release got budgets */
+	while (i++ < nr_budgets)
+		blk_mq_put_dispatch_budget(q);
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
-			     bool got_budget)
+			     unsigned int nr_budgets)
 {
 	enum prep_dispatch prep;
 	struct request_queue *q = hctx->queue;
@@ -1283,7 +1294,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (list_empty(list))
 		return false;
 
-	WARN_ON(!list_is_singular(list) && got_budget);
+	WARN_ON(!list_is_singular(list) && nr_budgets);
 
 	/*
 	 * Now process all the entries, sending them to the driver.
@@ -1295,7 +1306,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		rq = list_first_entry(list, struct request, queuelist);
 
 		WARN_ON_ONCE(hctx != rq->mq_hctx);
-		prep = blk_mq_prep_dispatch_rq(rq, !got_budget);
+		prep = blk_mq_prep_dispatch_rq(rq, !nr_budgets);
 		if (prep != PREP_DISPATCH_OK)
 			break;
@@ -1314,6 +1325,12 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 			bd.last = !blk_mq_get_driver_tag(nxt);
 		}
 
+		/*
+		 * once the request is queued to lld, no need to cover the
+		 * budget any more
+		 */
+		if (nr_budgets)
+			nr_budgets--;
 		ret = q->mq_ops->queue_rq(hctx, &bd);
 		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			blk_mq_handle_dev_resource(rq, list);
@@ -1353,6 +1370,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		bool no_tag = false;
 		bool no_budget_avail = false;
 
+		blk_mq_release_budgets(q, nr_budgets);
+
 		/*
 		 * For non-shared tags, the RESTART check
 		 * will suffice.
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d2d737b16e0e..f3a93acfad03 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -40,7 +40,8 @@ struct blk_mq_ctx {
 void blk_mq_exit_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
-bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *, bool);
+bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
+			     unsigned int);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
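A small stand-alone model of the new bookkeeping may help (invented toy_*
names; this is not the kernel code): the dispatcher is handed a count of
pre-acquired budgets, each request passed to the driver consumes one, and
whatever is left over when dispatch stops early is released in a single
place, mirroring blk_mq_release_budgets().

#include <stdbool.h>
#include <stdio.h>

/* Toy driver: accepts 'capacity' requests, then reports "busy". */
static bool toy_queue_rq(unsigned int *capacity)
{
	if (*capacity == 0)
		return false;		/* like BLK_STS_RESOURCE */
	(*capacity)--;
	return true;
}

/* Release budgets that were obtained up front but never used. */
static void toy_release_budgets(unsigned int nr_budgets)
{
	printf("returning %u unused budget(s)\n", nr_budgets);
}

static void toy_dispatch(unsigned int nr_requests, unsigned int nr_budgets,
			 unsigned int driver_capacity)
{
	for (unsigned int i = 0; i < nr_requests; i++) {
		/*
		 * A budget handed to the driver together with a request is
		 * no longer ours to return, so drop it from the count before
		 * the queue_rq call, as the patch does.
		 */
		if (nr_budgets)
			nr_budgets--;
		if (!toy_queue_rq(&driver_capacity)) {
			printf("request %u refused, stopping dispatch\n", i);
			break;
		}
		printf("request %u queued to driver\n", i);
	}
	/* Whatever was never consumed is put back in one place. */
	toy_release_budgets(nr_budgets);
}

int main(void)
{
	/* Pretend 4 budgets were obtained before dispatch started. */
	toy_dispatch(4, 4, 2);
	return 0;
}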
From patchwork Tue Jun 2 09:15:02 2020
X-Patchwork-Id: 11583445
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Sagi Grimberg, Baolin Wang,
    Christoph Hellwig
Subject: [PATCH V4 6/6] blk-mq: support batching dispatch in case of io scheduler
Date: Tue, 2 Jun 2020 17:15:02 +0800
Message-Id: <20200602091502.1822499-7-ming.lei@redhat.com>
In-Reply-To: <20200602091502.1822499-1-ming.lei@redhat.com>
References: <20200602091502.1822499-1-ming.lei@redhat.com>

More and more drivers want to get batches of requests queued from the block
layer, such as mmc and tcp-based storage drivers. Current in-tree users
include virtio-scsi, virtio-blk and nvme.

For none, batching dispatch is already supported. But with an io scheduler,
we only take one request from the scheduler at a time and pass that single
request to blk_mq_dispatch_rq_list(), which makes batching dispatch
impossible when an io scheduler is applied. One reason is that we don't want
to hurt sequential IO performance, because the chance of IO merging is
reduced when more requests are dequeued from the scheduler queue.

Try to support batching dispatch for io schedulers, starting with the
following simple approach:

1) still make sure we can get a budget before dequeueing a request;

2) use hctx->dispatch_busy to evaluate whether the queue is busy; if it is
busy, fall back to non-batching dispatch, otherwise dequeue as many requests
as possible from the scheduler and pass them to blk_mq_dispatch_rq_list().

Wrt. 2), a similar policy is already used for none, and it turns out that
SCSI SSD performance improves a lot. In the future we may develop a more
intelligent algorithm for batching dispatch.

Baolin has tested this patch and found that MMC performance is improved[3].

[1] https://lore.kernel.org/linux-block/20200512075501.GF1531898@T590/#r
[2] https://lore.kernel.org/linux-block/fe6bd8b9-6ed9-b225-f80c-314746133722@grimberg.me/
[3] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/

Cc: Sagi Grimberg
Cc: Baolin Wang
Cc: Christoph Hellwig
Tested-by: Baolin Wang
Signed-off-by: Ming Lei
---
 block/blk-mq-sched.c | 96 ++++++++++++++++++++++++++++++++++++++++++--
 block/blk-mq.c       |  2 -
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4c72073830f3..02ba7e86cce3 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -80,6 +81,74 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 	blk_mq_run_hw_queue(hctx, true);
 }
 
+/*
+ * We know bfq and deadline apply single scheduler queue instead of multi
+ * queue. However, the two are often used on single queue devices, also
+ * the current @hctx should affect the real device status most of times
+ * because of locality principle.
+ *
+ * So use current hctx->dispatch_busy directly for figuring out batching
+ * dispatch count.
+ */
+static unsigned int blk_mq_sched_get_batching_nr(struct blk_mq_hw_ctx *hctx)
+{
+	if (hctx->dispatch_busy)
+		return 1;
+	return hctx->queue->nr_requests;
+}
+
+static int sched_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct request *rqa = container_of(a, struct request, queuelist);
+	struct request *rqb = container_of(b, struct request, queuelist);
+
+	return rqa->mq_hctx > rqb->mq_hctx;
+}
+
+static inline bool blk_mq_do_dispatch_rq_lists(struct blk_mq_hw_ctx *hctx,
+		struct list_head *lists, bool multi_hctxs, unsigned count)
+{
+	bool ret;
+
+	if (!count)
+		return false;
+
+	if (likely(!multi_hctxs))
+		return blk_mq_dispatch_rq_list(hctx, lists, count);
+
+	/*
+	 * Requests from different hctx may be dequeued from some scheduler,
+	 * such as bfq and deadline.
+	 *
+	 * Sort the requests in the list according to their hctx, dispatch
+	 * batching requests from same hctx
+	 */
+	list_sort(NULL, lists, sched_rq_cmp);
+
+	ret = false;
+	while (!list_empty(lists)) {
+		LIST_HEAD(list);
+		struct request *new, *rq = list_first_entry(lists,
+				struct request, queuelist);
+		unsigned cnt = 0;
+
+		list_for_each_entry(new, lists, queuelist) {
+			if (new->mq_hctx != rq->mq_hctx)
+				break;
+			cnt++;
+		}
+
+		if (new->mq_hctx == rq->mq_hctx)
+			list_splice_tail_init(lists, &list);
+		else
+			list_cut_before(&list, lists, &new->queuelist);
+
+		ret = blk_mq_dispatch_rq_list(rq->mq_hctx, &list, cnt);
+	}
+
+	return ret;
+}
+
 #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
 
 /*
@@ -97,7 +166,15 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(rq_list);
 	int ret = 0;
 	struct request *rq;
-
+	int cnt;
+	unsigned int max_dispatch;
+	bool multi_hctxs, run_queue;
+
+ again:
+	/* prepare one batch for dispatch */
+	cnt = 0;
+	max_dispatch = blk_mq_sched_get_batching_nr(hctx);
+	multi_hctxs = run_queue = false;
 	do {
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
@@ -120,7 +197,7 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 			 * no guarantee anyone will kick the queue. Kick it
 			 * ourselves.
 			 */
-			blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
+			run_queue = true;
 			break;
 		}
@@ -130,7 +207,20 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, 1));
+		cnt++;
+
+		if (rq->mq_hctx != hctx && !multi_hctxs)
+			multi_hctxs = true;
+	} while (cnt < max_dispatch);
+
+	/* dispatch more if we may do more */
+	if (blk_mq_do_dispatch_rq_lists(hctx, &rq_list, multi_hctxs, cnt) &&
+			!ret)
+		goto again;
+
+	/* in-flight request's completion can rerun queue */
+	if (!cnt && run_queue)
+		blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
 
 	return ret;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 901ef0264e44..00ad7d19e823 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1294,8 +1294,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (list_empty(list))
 		return false;
 
-	WARN_ON(!list_is_singular(list) && nr_budgets);
-
 	/*
 	 * Now process all the entries, sending them to the driver.
 	 */
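The heart of the batching path is: when the hctx is not busy, dequeue up to
nr_requests requests, sort them by hctx, and hand each contiguous run to
blk_mq_dispatch_rq_list() as one batch. The stand-alone sketch below models
that grouping step with an array and qsort() instead of list_sort() and
list_cut_before(); the data and names are invented, and the comparator is
written as a full three-way compare rather than the 0/1 form used in
sched_rq_cmp().

#include <stdio.h>
#include <stdlib.h>

struct toy_rq {
	int hctx;	/* stand-in for rq->mq_hctx */
	int id;
};

/* Same idea as sched_rq_cmp(): order requests by their hctx. */
static int rq_cmp(const void *a, const void *b)
{
	const struct toy_rq *ra = a, *rb = b;

	return (ra->hctx > rb->hctx) - (ra->hctx < rb->hctx);
}

/* Stand-in for blk_mq_dispatch_rq_list(): receives one per-hctx batch. */
static void dispatch_batch(const struct toy_rq *rqs, size_t cnt)
{
	printf("hctx %d: batch of %zu request(s)\n", rqs[0].hctx, cnt);
}

int main(void)
{
	/* Requests dequeued from a single-queue scheduler (bfq/deadline style). */
	struct toy_rq rqs[] = {
		{ .hctx = 1, .id = 0 }, { .hctx = 0, .id = 1 },
		{ .hctx = 1, .id = 2 }, { .hctx = 0, .id = 3 },
		{ .hctx = 2, .id = 4 },
	};
	size_t n = sizeof(rqs) / sizeof(rqs[0]);

	qsort(rqs, n, sizeof(rqs[0]), rq_cmp);

	/* Walk the sorted array and dispatch each contiguous hctx run. */
	for (size_t start = 0, i = 1; i <= n; i++) {
		if (i == n || rqs[i].hctx != rqs[start].hctx) {
			dispatch_batch(&rqs[start], i - start);
			start = i;
		}
	}
	return 0;
}

Each dispatch_batch() call corresponds to one blk_mq_dispatch_rq_list()
invocation with a matching budget count, which is what lets the driver see
batched submissions.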