From patchwork Wed Jun 27 20:02:12 2018
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10492569
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
    Ming Lei, Omar Sandoval, Hannes Reinecke, Johannes Thumshirn
Subject: [PATCH v2] blk-mq: Fix a race condition in blk_mq_mark_tag_wait()
Date: Wed, 27 Jun 2018 13:02:12 -0700
Message-Id: <20180627200212.14435-1-bart.vanassche@wdc.com>
X-Mailer: git-send-email 2.17.1

Because the hctx lock is not held around the only blk_mq_tag_wakeup_all()
call in the block layer, the wait queue entry removal in
blk_mq_dispatch_wake() is protected only by the wait queue lock. Since the
hctx->dispatch_wait entry can occur on any of the SBQ_WAIT_QUEUES wait
queues, the wait queue presence check, the addition of .dispatch_wait to a
wait queue and the removal of the wait queue entry must all be protected
by both the hctx lock and the wait queue lock.
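The locking scheme is easiest to see in a miniature userspace model. In
the sketch below, pthread mutexes stand in for the two spinlocks and a
plain flag stands in for wait queue membership; all of the names
(wq_lock, hctx_lock, on_wait_queue) and the thread bodies are invented
for illustration only and are not kernel APIs. The property being modeled
is that the wake path and the add path acquire both locks in the same
order, so the presence check cannot race with the entry removal:

/*
 * Hypothetical userspace model of the locking scheme; pthread mutexes
 * stand in for ws->wait.lock and this_hctx->lock.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;   /* models ws->wait.lock */
static pthread_mutex_t hctx_lock = PTHREAD_MUTEX_INITIALIZER; /* models this_hctx->lock */
static bool on_wait_queue; /* models !list_empty(&wait->entry) */

/*
 * Models blk_mq_dispatch_wake(): the waker already holds the wait queue
 * lock when it invokes the callback, so only hctx_lock is taken here.
 */
static void dispatch_wake_locked(void)
{
	pthread_mutex_lock(&hctx_lock);
	on_wait_queue = false; /* models list_del_init(&wait->entry) */
	pthread_mutex_unlock(&hctx_lock);
}

static void *waker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq_lock); /* models __wake_up() taking ws->wait.lock */
	/* Reading the flag under wq_lock alone is safe because all writers
	 * hold both locks. */
	if (on_wait_queue)
		dispatch_wake_locked();
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

/*
 * Models blk_mq_mark_tag_wait(): the presence check and the addition
 * happen with both locks held, in the same order as the wake path.
 */
static void *mark_tag_wait(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq_lock);
	pthread_mutex_lock(&hctx_lock);
	if (!on_wait_queue)
		on_wait_queue = true; /* models __add_wait_queue(&ws->wait, wait) */
	pthread_mutex_unlock(&hctx_lock);
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, mark_tag_wait, NULL);
	pthread_create(&b, NULL, waker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("on_wait_queue = %d\n", on_wait_queue);
	return 0;
}

Note that blk_mq_dispatch_wake() itself does not take the wait queue
lock: __wake_up() already holds ws->wait.lock when it invokes the
callback. That is why the patch only adds hctx->lock inside the callback,
and also why every other hctx->lock site is converted to spin_lock_irq():
the wakeup can arrive from interrupt context when a request completion
frees a tag, so hctx->lock now nests inside an irq-disabled section and
must itself be irq-safe everywhere.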
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Omar Sandoval
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
---
Changes compared to v1: made sure that the hctx lock is also held around
the list_del() call in blk_mq_dispatch_wake().

 block/blk-mq-debugfs.c |  4 ++--
 block/blk-mq-sched.c   |  8 +++----
 block/blk-mq.c         | 48 ++++++++++++++++++++++++------------------
 3 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 1c4532e92938..05fd98fad820 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -399,7 +399,7 @@ static void *hctx_dispatch_start(struct seq_file *m, loff_t *pos)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
 
-	spin_lock(&hctx->lock);
+	spin_lock_irq(&hctx->lock);
 	return seq_list_start(&hctx->dispatch, *pos);
 }
 
@@ -415,7 +415,7 @@ static void hctx_dispatch_stop(struct seq_file *m, void *v)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
 
-	spin_unlock(&hctx->lock);
+	spin_unlock_irq(&hctx->lock);
 }
 
 static const struct seq_operations hctx_dispatch_seq_ops = {
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 56c493c6cd90..07dd6fe9f134 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -190,10 +190,10 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * more fair dispatch.
 	 */
 	if (!list_empty_careful(&hctx->dispatch)) {
-		spin_lock(&hctx->lock);
+		spin_lock_irq(&hctx->lock);
 		if (!list_empty(&hctx->dispatch))
 			list_splice_init(&hctx->dispatch, &rq_list);
-		spin_unlock(&hctx->lock);
+		spin_unlock_irq(&hctx->lock);
 	}
 
 	/*
@@ -368,9 +368,9 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 {
 	/* dispatch flush rq directly */
 	if (rq->rq_flags & RQF_FLUSH_SEQ) {
-		spin_lock(&hctx->lock);
+		spin_lock_irq(&hctx->lock);
 		list_add(&rq->queuelist, &hctx->dispatch);
-		spin_unlock(&hctx->lock);
+		spin_unlock_irq(&hctx->lock);
 		return true;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b6888ff556cf..e04321e91116 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1003,7 +1003,10 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 
 	hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
 
+	spin_lock_irq(&hctx->lock);
 	list_del_init(&wait->entry);
+	spin_unlock_irq(&hctx->lock);
+
 	blk_mq_run_hw_queue(hctx, true);
 	return 1;
 }
@@ -1020,7 +1023,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 	struct blk_mq_hw_ctx *this_hctx = *hctx;
 	struct sbq_wait_state *ws;
 	wait_queue_entry_t *wait;
-	bool ret;
+	bool ret = false;
 
 	if (!(this_hctx->flags & BLK_MQ_F_TAG_SHARED)) {
 		if (!test_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state))
@@ -1041,14 +1044,20 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 	if (!list_empty_careful(&wait->entry))
 		return false;
 
+	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
+
+	/*
+	 * Since hctx->dispatch_wait can already be on any of the
+	 * SBQ_WAIT_QUEUES number of wait queues, serialize the check and
+	 * add_wait_queue() calls below with this_hctx->lock.
+	 */
+	spin_lock_irq(&ws->wait.lock);
 	spin_lock(&this_hctx->lock);
-	if (!list_empty(&wait->entry)) {
-		spin_unlock(&this_hctx->lock);
-		return false;
-	}
+	if (!list_empty(&wait->entry))
+		goto unlock;
 
-	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
-	add_wait_queue(&ws->wait, wait);
+	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+	__add_wait_queue(&ws->wait, wait);
 
 	/*
 	 * It's possible that a tag was freed in the window between the
@@ -1056,21 +1065,20 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 	 * queue.
 	 */
 	ret = blk_mq_get_driver_tag(rq, hctx, false);
-	if (!ret) {
-		spin_unlock(&this_hctx->lock);
-		return false;
-	}
+	if (!ret)
+		goto unlock;
 
 	/*
 	 * We got a tag, remove ourselves from the wait queue to ensure
 	 * someone else gets the wakeup.
 	 */
-	spin_lock_irq(&ws->wait.lock);
 	list_del_init(&wait->entry);
-	spin_unlock_irq(&ws->wait.lock);
+
+unlock:
 	spin_unlock(&this_hctx->lock);
+	spin_unlock_irq(&ws->wait.lock);
 
-	return true;
+	return ret;
 }
 
 #define BLK_MQ_RESOURCE_DELAY	3	/* ms units */
@@ -1171,9 +1179,9 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	if (!list_empty(list)) {
 		bool needs_restart;
 
-		spin_lock(&hctx->lock);
+		spin_lock_irq(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
-		spin_unlock(&hctx->lock);
+		spin_unlock_irq(&hctx->lock);
 
 		/*
 		 * If SCHED_RESTART was set by the caller of this function and
@@ -1520,9 +1528,9 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue)
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
 
-	spin_lock(&hctx->lock);
+	spin_lock_irq(&hctx->lock);
 	list_add_tail(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	spin_unlock_irq(&hctx->lock);
 
 	if (run_queue)
 		blk_mq_run_hw_queue(hctx, false);
@@ -2048,9 +2056,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	if (list_empty(&tmp))
 		return 0;
 
-	spin_lock(&hctx->lock);
+	spin_lock_irq(&hctx->lock);
 	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	spin_unlock_irq(&hctx->lock);
 
 	blk_mq_run_hw_queue(hctx, true);
 	return 0;
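For review convenience, this is the shared-tag slow path of
blk_mq_mark_tag_wait() with the above hunks applied, assembled into one
place. Everything here comes from the hunks except the lines marked as
assumed, which are reproduced from surrounding context and may differ in
detail from the tree:

static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
				 struct request *rq)
{
	struct blk_mq_hw_ctx *this_hctx = *hctx;
	struct sbq_wait_state *ws;
	wait_queue_entry_t *wait;
	bool ret = false;

	/* ... non-shared-tag fast path elided (not touched by this patch) ... */

	wait = &this_hctx->dispatch_wait;	/* assumed from context */
	if (!list_empty_careful(&wait->entry))
		return false;

	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);

	/*
	 * Take both locks, wait queue lock first and with irqs off,
	 * matching the order in which the wake path holds them.
	 */
	spin_lock_irq(&ws->wait.lock);
	spin_lock(&this_hctx->lock);
	if (!list_empty(&wait->entry))
		goto unlock;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	__add_wait_queue(&ws->wait, wait);

	/*
	 * A tag may have been freed between the failed allocation and
	 * queueing ourselves; retry while still holding both locks.
	 */
	ret = blk_mq_get_driver_tag(rq, hctx, false);
	if (!ret)
		goto unlock;

	/*
	 * Got a tag after all: leave the wait queue so that another
	 * hctx receives the next wakeup.
	 */
	list_del_init(&wait->entry);

unlock:
	spin_unlock(&this_hctx->lock);
	spin_unlock_irq(&ws->wait.lock);

	return ret;
}

The switch from add_wait_queue() to __add_wait_queue() plus the explicit
WQ_FLAG_EXCLUSIVE clear follows from the new locking: add_wait_queue()
clears that flag and takes ws->wait.lock internally, which would
self-deadlock now that the lock is already held at this point.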