From patchwork Mon Nov 27 05:07:21 2017
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10075763
X-Patchwork-Delegate: snitzer@redhat.com
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, linux-block@vger.kernel.org, Mike Snitzer, dm-devel@redhat.com
Cc: Hannes Reinecke, linux-kernel@vger.kernel.org, Ming Lei, Christoph Hellwig, Bart Van Assche, Omar Sandoval
Date: Mon, 27 Nov 2017 13:07:21 +0800
Message-Id: <20171127050721.5884-6-ming.lei@redhat.com>
In-Reply-To: <20171127050721.5884-1-ming.lei@redhat.com>
References: <20171127050721.5884-1-ming.lei@redhat.com>
Subject: [dm-devel] [PATCH V2 5/5] blk-mq: issue request directly for blk_insert_cloned_request
List-Id: device-mapper development <dm-devel@redhat.com>
blk_insert_cloned_request() is called in the fast path of the dm-rq driver,
and in this function we append the request directly to the underlying
queue's hctx->dispatch_list.

1) This isn't efficient enough because the hctx lock is always required.

2) With blk_insert_cloned_request() we bypass the underlying queue's I/O
scheduler completely and depend on the dm-rq driver to do all I/O
scheduling. But the dm-rq driver can't get any dispatch feedback from the
underlying queue, and this information is extremely useful for I/O merging.
Without it, blk-mq basically can't merge I/O, which causes very bad
sequential I/O performance.

This patch makes use of blk_mq_try_issue_directly() to dispatch the request
to the underlying queue and provides the dispatch result to dm-rq and
blk-mq, which improves both situations above considerably. With this patch,
sequential I/O is improved by 3X ~ 5X in my test over mpath/virtio-scsi.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c   |  3 +--
 block/blk-mq.c     | 32 +++++++++++++++++++++++++++++---
 block/blk-mq.h     |  3 +++
 drivers/md/dm-rq.c | 19 ++++++++++++++++---
 4 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b8881750a3ac..e5a623b45a1d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2488,8 +2488,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		blk_mq_request_bypass_insert(rq, true);
-		return BLK_STS_OK;
+		return blk_mq_request_direct_issue(rq);
 	}
 
 	spin_lock_irqsave(q->queue_lock, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fd4fb6316ea1..c94a8d225b63 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1629,6 +1629,12 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	blk_qc_t new_cookie;
 	blk_status_t ret = BLK_STS_OK;
 	bool run_queue = true;
+	/*
+	 * This function is called from blk_insert_cloned_request() if
+	 * 'cookie' is NULL, and for dispatching this request only.
+	 */
+	bool dispatch_only = !cookie;
+	bool need_insert;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) {
@@ -1636,10 +1642,19 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (q->elevator)
+	if (q->elevator && !dispatch_only)
 		goto insert;
 
-	if (__blk_mq_issue_req(hctx, rq, &new_cookie, &ret))
+	need_insert = __blk_mq_issue_req(hctx, rq, &new_cookie, &ret);
+	if (dispatch_only) {
+		if (need_insert)
+			return BLK_STS_RESOURCE;
+		if (ret == BLK_STS_RESOURCE)
+			__blk_mq_requeue_request(rq);
+		return ret;
+	}
+
+	if (need_insert)
 		goto insert;
 
 	/*
@@ -1661,7 +1676,10 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	if (!dispatch_only)
+		blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	else
+		blk_mq_request_bypass_insert(rq, run_queue);
 
 	return ret;
 }
@@ -1688,6 +1706,14 @@ static blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+blk_status_t blk_mq_request_direct_issue(struct request *rq)
+{
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
+
+	return blk_mq_try_issue_directly(hctx, rq, NULL);
+}
+
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
 	const int is_sync = op_is_sync(bio->bi_opf);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..81df35fbce77 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -60,6 +60,9 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
+/* Used by DM for issuing req directly */
+blk_status_t blk_mq_request_direct_issue(struct request *rq);
+
 /*
  * CPU -> queue mappings
  */
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index cbe8a06ef8b0..b96aa208e5cc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -395,7 +395,7 @@ static void end_clone_request(struct request *clone, blk_status_t error)
 	dm_complete_request(tio->orig, error);
 }
 
-static void dm_dispatch_clone_request(struct request *clone, struct request *rq)
+static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq)
 {
 	blk_status_t r;
 
@@ -404,9 +404,10 @@ static void dm_dispatch_clone_request(struct request *clone, struct request *rq)
 		clone->start_time = jiffies;
 
 	r = blk_insert_cloned_request(clone->q, clone);
-	if (r)
+	if (r != BLK_STS_OK && r != BLK_STS_RESOURCE)
 		/* must complete clone in terms of original request */
 		dm_complete_request(rq, r);
+	return r;
 }
 
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
@@ -476,8 +477,10 @@ static int map_request(struct dm_rq_target_io *tio)
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 	struct request *clone = NULL;
+	blk_status_t ret;
 
 	r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone);
+ check_again:
 	switch (r) {
 	case DM_MAPIO_SUBMITTED:
 		/* The target has taken the I/O to submit by itself later */
@@ -492,7 +495,17 @@ static int map_request(struct dm_rq_target_io *tio)
 		/* The target has remapped the I/O so dispatch it */
 		trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)),
 				     blk_rq_pos(rq));
-		dm_dispatch_clone_request(clone, rq);
+		ret = dm_dispatch_clone_request(clone, rq);
+		if (ret == BLK_STS_RESOURCE) {
+			blk_rq_unprep_clone(clone);
+			tio->ti->type->release_clone_rq(clone);
+			tio->clone = NULL;
+			if (!rq->q->mq_ops)
+				r = DM_MAPIO_DELAY_REQUEUE;
+			else
+				r = DM_MAPIO_REQUEUE;
+			goto check_again;
+		}
 		break;
 	case DM_MAPIO_REQUEUE:
 		/* The target wants to requeue the I/O */
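
Not part of the patch: below is a small, self-contained user-space C model of
the feedback loop this change introduces, for readers who want the control
flow without the kernel context. The enum values, the issue_directly() stub
and the underlying_is_mq flag are illustrative stand-ins (assumptions), not
kernel APIs; only the flow mirrors the patch: the dispatch status from the
underlying queue is returned to map_request(), which releases the clone and
requeues on a resource shortage instead of blindly appending to
hctx->dispatch_list.

/* dispatch_feedback_model.c -- illustrative only, not kernel code */
#include <stdio.h>
#include <stdbool.h>

/* Stand-ins (assumptions) for the BLK_STS_* and DM_MAPIO_* codes used above. */
enum blk_status { STS_OK, STS_RESOURCE, STS_IOERR };
enum map_result { MAPIO_REMAPPED, MAPIO_REQUEUE, MAPIO_DELAY_REQUEUE };

/*
 * Stub standing in for blk_mq_request_direct_issue(): pretend the underlying
 * queue is out of resources for the first two attempts, then accepts the
 * request.
 */
static enum blk_status issue_directly(void)
{
	static int busy = 2;

	return busy-- > 0 ? STS_RESOURCE : STS_OK;
}

/*
 * Models the reworked map_request() flow: issue the clone directly and let
 * the dispatch result decide between "dispatched", "release the clone and
 * requeue", and "hard error".
 */
static enum map_result model_map_request(bool underlying_is_mq)
{
	enum blk_status ret = issue_directly();	/* dm_dispatch_clone_request() */

	if (ret == STS_RESOURCE) {
		/* the real code releases the clone (release_clone_rq) here */
		return underlying_is_mq ? MAPIO_REQUEUE : MAPIO_DELAY_REQUEUE;
	}
	if (ret != STS_OK) {
		/* hard error: the original request is completed with 'ret' */
	}
	return MAPIO_REMAPPED;
}

int main(void)
{
	int requeues = 0;

	/* dm-rq's requeue machinery would normally re-drive map_request() */
	while (model_map_request(true) != MAPIO_REMAPPED)
		requeues++;

	printf("dispatched after %d requeue(s)\n", requeues);
	return 0;
}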