From patchwork Sat Aug 26 16:33:32 2017
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 9923587
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig
Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman,
    Ming Lei
Subject: [PATCH V3 14/14] blk-mq: improve bio merge from blk-mq sw queue
Date: Sun, 27 Aug 2017 00:33:32 +0800
Message-Id: <20170826163332.28971-15-ming.lei@redhat.com>
In-Reply-To: <20170826163332.28971-1-ming.lei@redhat.com>
References: <20170826163332.28971-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

This patch uses a hash table to do bio merging against the sw queue, so
that bio merging is aligned with the way the blk-mq schedulers and the
legacy block layer do it. Merging via a hash table turns out to be more
efficient than the simple merge attempt against the last 8 requests in
the sw queue: on SCSI SRP, a ~10% IOPS increase is observed in a
sequential I/O test with this patch. It is also one step towards a real
'none' scheduler, which would allow the blk-mq scheduler framework to
become cleaner.
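For illustration only, here is a minimal userspace sketch of the idea,
not the kernel implementation: requests are hashed by the sector at
which they end, so a back-merge candidate for an incoming bio is found
with a single lookup keyed by the bio's start sector instead of
scanning the tail of the sw queue. The types and the rqhash_*() /
rq_hash_key() helpers below are simplified stand-ins for the ones added
earlier in this series.

/*
 * Illustrative userspace model (not kernel code): a request hash keyed
 * by end sector, so a back-merge candidate is one lookup away.
 */
#include <stdio.h>

#define HASH_BITS	6
#define HASH_SIZE	(1 << HASH_BITS)

struct rq {				/* simplified stand-in for struct request */
	unsigned long long start;	/* first sector of the request */
	unsigned int nr_sectors;	/* length in sectors */
	struct rq *hash_next;		/* hash chain link */
};

/* requests are keyed by the sector immediately after their last one */
static unsigned long long rq_hash_key(const struct rq *rq)
{
	return rq->start + rq->nr_sectors;
}

static unsigned int hash_bucket(unsigned long long key)
{
	return (unsigned int)(key & (HASH_SIZE - 1));
}

static void rqhash_add(struct rq **hash, struct rq *rq)
{
	unsigned int b = hash_bucket(rq_hash_key(rq));

	rq->hash_next = hash[b];
	hash[b] = rq;
}

/* find a request that ends exactly where the new bio starts (back merge) */
static struct rq *rqhash_find(struct rq **hash, unsigned long long bio_start)
{
	struct rq *rq;

	for (rq = hash[hash_bucket(bio_start)]; rq; rq = rq->hash_next)
		if (rq_hash_key(rq) == bio_start)
			return rq;
	return NULL;
}

int main(void)
{
	struct rq *hash[HASH_SIZE] = { NULL };
	struct rq a = { .start = 0, .nr_sectors = 8 };
	struct rq b = { .start = 64, .nr_sectors = 8 };

	rqhash_add(hash, &a);
	rqhash_add(hash, &b);

	/* a bio starting at sector 8 can be back-merged into request 'a' */
	printf("candidate for sector 8:   %p (a is %p)\n",
	       (void *)rqhash_find(hash, 8), (void *)&a);
	/* a bio starting at sector 100 has no merge candidate */
	printf("candidate for sector 100: %p\n",
	       (void *)rqhash_find(hash, 100));
	return 0;
}
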
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 49 ++++++++++++-------------------------------------
 block/blk-mq.c       | 28 +++++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 5af0ff71730c..b958caa8bccb 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -268,50 +268,25 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
 
-/*
- * Reverse check our software queue for entries that we could potentially
- * merge with. Currently includes a hand-wavy stop count of 8, to not spend
- * too much time checking for merges.
- */
-static bool blk_mq_attempt_merge(struct request_queue *q,
+static bool blk_mq_ctx_try_merge(struct request_queue *q,
 				 struct blk_mq_ctx *ctx, struct bio *bio)
 {
-	struct request *rq;
-	int checked = 8;
+	struct request *rq, *free = NULL;
+	enum elv_merge type;
+	bool merged;
 
 	lockdep_assert_held(&ctx->lock);
 
-	list_for_each_entry_reverse(rq, &ctx->rq_list, queuelist) {
-		bool merged = false;
-
-		if (!checked--)
-			break;
-
-		if (!blk_rq_merge_ok(rq, bio))
-			continue;
+	type = elv_merge_ctx(q, &rq, bio, ctx);
+	merged = __blk_mq_try_merge(q, bio, &free, rq, type);
 
-		switch (blk_try_merge(rq, bio)) {
-		case ELEVATOR_BACK_MERGE:
-			if (blk_mq_sched_allow_merge(q, rq, bio))
-				merged = bio_attempt_back_merge(q, rq, bio);
-			break;
-		case ELEVATOR_FRONT_MERGE:
-			if (blk_mq_sched_allow_merge(q, rq, bio))
-				merged = bio_attempt_front_merge(q, rq, bio);
-			break;
-		case ELEVATOR_DISCARD_MERGE:
-			merged = bio_attempt_discard_merge(q, rq, bio);
-			break;
-		default:
-			continue;
-		}
+	if (free)
+		blk_mq_free_request(free);
 
-		if (merged)
-			ctx->rq_merged++;
-		return merged;
-	}
+	if (merged)
+		ctx->rq_merged++;
 
-	return false;
+	return merged;
 }
 
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
@@ -329,7 +304,7 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 	if (hctx->flags & BLK_MQ_F_SHOULD_MERGE) {
 		/* default per sw-queue merge */
 		spin_lock(&ctx->lock);
-		ret = blk_mq_attempt_merge(q, ctx, bio);
+		ret = blk_mq_ctx_try_merge(q, ctx, bio);
 		spin_unlock(&ctx->lock);
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fc3d26bbfc1a..d935f15c54da 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -847,6 +847,18 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	blk_queue_exit(q);
 }
 
+static void blk_mq_ctx_remove_rq_list(struct blk_mq_ctx *ctx,
+		struct list_head *head)
+{
+	struct request *rq;
+
+	lockdep_assert_held(&ctx->lock);
+
+	list_for_each_entry(rq, head, queuelist)
+		rqhash_del(rq);
+	ctx->last_merge = NULL;
+}
+
 struct flush_busy_ctx_data {
 	struct blk_mq_hw_ctx *hctx;
 	struct list_head *list;
@@ -861,6 +873,7 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 	sbitmap_clear_bit(sb, bitnr);
 	spin_lock(&ctx->lock);
 	list_splice_tail_init(&ctx->rq_list, flush_data->list);
+	blk_mq_ctx_remove_rq_list(ctx, flush_data->list);
 	spin_unlock(&ctx->lock);
 	return true;
 }
@@ -890,17 +903,23 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *d
 	struct dispatch_rq_data *dispatch_data = data;
 	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
 	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+	struct request *rq = NULL;
 
 	spin_lock(&ctx->lock);
 	if (unlikely(!list_empty(&ctx->rq_list))) {
-		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
-		list_del_init(&dispatch_data->rq->queuelist);
+		rq = list_entry_rq(ctx->rq_list.next);
+		list_del_init(&rq->queuelist);
+		rqhash_del(rq);
 		if (list_empty(&ctx->rq_list))
 			sbitmap_clear_bit(sb, bitnr);
 	}
+	if (ctx->last_merge == rq)
+		ctx->last_merge = NULL;
 	spin_unlock(&ctx->lock);
 
-	return !dispatch_data->rq;
+	dispatch_data->rq = rq;
+
+	return !rq;
 }
 
 struct request *blk_mq_dispatch_rq_from_ctx(struct blk_mq_hw_ctx *hctx,
@@ -1431,6 +1450,8 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
 		list_add(&rq->queuelist, &ctx->rq_list);
 	else
 		list_add_tail(&rq->queuelist, &ctx->rq_list);
+
+	rqhash_add(ctx->hash, rq);
 }
 
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -1923,6 +1944,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	spin_lock(&ctx->lock);
 	if (!list_empty(&ctx->rq_list)) {
 		list_splice_init(&ctx->rq_list, &tmp);
+		blk_mq_ctx_remove_rq_list(ctx, &tmp);
 		blk_mq_hctx_clear_pending(hctx, ctx);
 	}
 	spin_unlock(&ctx->lock);