From patchwork Thu Nov 9 23:00:09 2017
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10052067
To: "linux-block@vger.kernel.org"
Cc: Omar Sandoval
From: Jens Axboe
Subject: [PATCH] blk-mq: improve tag waiting setup for non-shared tags
Message-ID: <3649d84b-978d-ce1b-a8bc-5735b07296a7@kernel.dk>
Date: Thu, 9 Nov 2017 16:00:09 -0700
X-Mailing-List: linux-block@vger.kernel.org

If we run out of driver tags, we currently treat shared and non-shared tags the same - both cases hook
into the tag waitqueue. This is a bit more costly than it needs to be on
unshared tags, since we have to both grab the hctx lock and the waitqueue
lock (and disable interrupts). For the non-shared case, we can simply mark
the queue as needing a restart.

Split blk_mq_dispatch_wait_add() to account for both cases, and rename it
to blk_mq_mark_tag_wait() to better reflect what it does now.

Without this patch, shared and non-shared performance is about the same
with 4 fio threads hammering on a single null_blk device (~410K IOPS, at
75% sys). With the patch, the shared case is the same, but the non-shared
tags case runs at 431K IOPS at 71% sys.

Signed-off-by: Jens Axboe
Reviewed-by: Omar Sandoval

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bfe24a5b62a3..965317ea45f9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1006,44 +1006,68 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 	return 1;
 }
 
-static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx **hctx,
-				     struct request *rq)
+/*
+ * Mark us waiting for a tag. For shared tags, this involves hooking us into
+ * the tag wakeups. For non-shared tags, we can simply mark us needing a
+ * restart. For both cases, take care to check the condition again after
+ * marking us as waiting.
+ */
+static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
+				 struct request *rq)
 {
 	struct blk_mq_hw_ctx *this_hctx = *hctx;
-	wait_queue_entry_t *wait = &this_hctx->dispatch_wait;
+	bool shared_tags = (this_hctx->flags & BLK_MQ_F_TAG_SHARED) != 0;
 	struct sbq_wait_state *ws;
+	wait_queue_entry_t *wait;
+	bool ret;
 
-	if (!list_empty_careful(&wait->entry))
-		return false;
+	if (!shared_tags) {
+		if (!test_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state))
+			set_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state);
+	} else {
+		wait = &this_hctx->dispatch_wait;
+		if (!list_empty_careful(&wait->entry))
+			return false;
 
-	spin_lock(&this_hctx->lock);
-	if (!list_empty(&wait->entry)) {
-		spin_unlock(&this_hctx->lock);
-		return false;
-	}
+		spin_lock(&this_hctx->lock);
+		if (!list_empty(&wait->entry)) {
+			spin_unlock(&this_hctx->lock);
+			return false;
+		}
 
-	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
-	add_wait_queue(&ws->wait, wait);
+		ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
+		add_wait_queue(&ws->wait, wait);
+	}
 
 	/*
 	 * It's possible that a tag was freed in the window between the
 	 * allocation failure and adding the hardware queue to the wait
 	 * queue.
 	 */
-	if (!blk_mq_get_driver_tag(rq, hctx, false)) {
+	ret = blk_mq_get_driver_tag(rq, hctx, false);
+
+	if (!shared_tags) {
+		/*
+		 * Don't clear RESTART here, someone else could have set it.
+		 * At most this will cost an extra queue run.
+		 */
+		return ret;
+	} else {
+		if (!ret) {
+			spin_unlock(&this_hctx->lock);
+			return false;
+		}
+
+		/*
+		 * We got a tag, remove ourselves from the wait queue to ensure
+		 * someone else gets the wakeup.
+		 */
+		spin_lock_irq(&ws->wait.lock);
+		list_del_init(&wait->entry);
+		spin_unlock_irq(&ws->wait.lock);
 		spin_unlock(&this_hctx->lock);
-		return false;
+		return true;
 	}
-
-	/*
-	 * We got a tag, remove ourselves from the wait queue to ensure
-	 * someone else gets the wakeup.
-	 */
-	spin_lock_irq(&ws->wait.lock);
-	list_del_init(&wait->entry);
-	spin_unlock_irq(&ws->wait.lock);
-	spin_unlock(&this_hctx->lock);
-	return true;
 }
 
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
@@ -1076,10 +1100,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			 * before we add this entry back on the dispatch list,
 			 * we'll re-run it below.
 			 */
-			if (!blk_mq_dispatch_wait_add(&hctx, rq)) {
+			if (!blk_mq_mark_tag_wait(&hctx, rq)) {
 				if (got_budget)
 					blk_mq_put_dispatch_budget(hctx);
-				no_tag = true;
+				/*
+				 * For non-shared tags, the RESTART check
+				 * will suffice.
+				 */
+				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+					no_tag = true;
 				break;
 			}
 		}