From patchwork Thu Nov 9 23:00:09 2017
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10052067
To: "linux-block@vger.kernel.org"
Cc: Omar Sandoval
From: Jens Axboe
Subject: [PATCH] blk-mq: improve tag waiting setup for non-shared tags
Message-ID: <3649d84b-978d-ce1b-a8bc-5735b07296a7@kernel.dk>
Date: Thu, 9 Nov 2017 16:00:09 -0700
X-Mailing-List: linux-block@vger.kernel.org

If we run out of driver tags, we currently treat shared and non-shared tags the same - both cases hook
into the tag waitqueue. This is a bit more costly than it needs to be on
unshared tags, since we have to both grab the hctx lock and the waitqueue
lock (and disable interrupts). For the non-shared case, we can simply mark
the queue as needing a restart.

Split blk_mq_dispatch_wait_add() to account for both cases, and rename it
to blk_mq_mark_tag_wait() to better reflect what it does now.

Without this patch, shared and non-shared performance is about the same
with 4 fio threads hammering on a single null_blk device (~410K IOPS, at
75% sys). With the patch, the shared case is the same, but the non-shared
tags case runs at 431K IOPS at 71% sys.

Signed-off-by: Jens Axboe
Reviewed-by: Omar Sandoval

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bfe24a5b62a3..965317ea45f9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1006,44 +1006,68 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 	return 1;
 }
 
-static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx **hctx,
-				     struct request *rq)
+/*
+ * Mark us waiting for a tag. For shared tags, this involves hooking us into
+ * the tag wakeups. For non-shared tags, we can simply mark us needing a
+ * restart. For both cases, take care to check the condition again after
+ * marking us as waiting.
+ */
+static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
+				 struct request *rq)
 {
 	struct blk_mq_hw_ctx *this_hctx = *hctx;
-	wait_queue_entry_t *wait = &this_hctx->dispatch_wait;
+	bool shared_tags = (this_hctx->flags & BLK_MQ_F_TAG_SHARED) != 0;
 	struct sbq_wait_state *ws;
+	wait_queue_entry_t *wait;
+	bool ret;
 
-	if (!list_empty_careful(&wait->entry))
-		return false;
+	if (!shared_tags) {
+		if (!test_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state))
+			set_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state);
+	} else {
+		wait = &this_hctx->dispatch_wait;
+		if (!list_empty_careful(&wait->entry))
+			return false;
 
-	spin_lock(&this_hctx->lock);
-	if (!list_empty(&wait->entry)) {
-		spin_unlock(&this_hctx->lock);
-		return false;
-	}
+		spin_lock(&this_hctx->lock);
+		if (!list_empty(&wait->entry)) {
+			spin_unlock(&this_hctx->lock);
+			return false;
+		}
 
-	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
-	add_wait_queue(&ws->wait, wait);
+		ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
+		add_wait_queue(&ws->wait, wait);
+	}
 
 	/*
 	 * It's possible that a tag was freed in the window between the
 	 * allocation failure and adding the hardware queue to the wait
 	 * queue.
 	 */
-	if (!blk_mq_get_driver_tag(rq, hctx, false)) {
+	ret = blk_mq_get_driver_tag(rq, hctx, false);
+
+	if (!shared_tags) {
+		/*
+		 * Don't clear RESTART here, someone else could have set it.
+		 * At most this will cost an extra queue run.
+		 */
+		return ret;
+	} else {
+		if (!ret) {
+			spin_unlock(&this_hctx->lock);
+			return false;
+		}
+
+		/*
+		 * We got a tag, remove ourselves from the wait queue to ensure
+		 * someone else gets the wakeup.
+		 */
+		spin_lock_irq(&ws->wait.lock);
+		list_del_init(&wait->entry);
+		spin_unlock_irq(&ws->wait.lock);
 		spin_unlock(&this_hctx->lock);
-		return false;
+		return true;
 	}
-
-	/*
-	 * We got a tag, remove ourselves from the wait queue to ensure
-	 * someone else gets the wakeup.
-	 */
-	spin_lock_irq(&ws->wait.lock);
-	list_del_init(&wait->entry);
-	spin_unlock_irq(&ws->wait.lock);
-	spin_unlock(&this_hctx->lock);
-	return true;
 }
 
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
@@ -1076,10 +1100,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			 * before we add this entry back on the dispatch list,
 			 * we'll re-run it below.
 			 */
-			if (!blk_mq_dispatch_wait_add(&hctx, rq)) {
+			if (!blk_mq_mark_tag_wait(&hctx, rq)) {
 				if (got_budget)
 					blk_mq_put_dispatch_budget(hctx);
-				no_tag = true;
+				/*
+				 * For non-shared tags, the RESTART check
+				 * will suffice.
+				 */
+				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+					no_tag = true;
 				break;
 			}
 		}