From patchwork Fri Sep  1 18:49:57 2017
X-Patchwork-Submitter: Ming Lei <ming.lei@redhat.com>
X-Patchwork-Id: 9935103
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig,
 Bart Van Assche, linux-scsi@vger.kernel.org, "Martin K. Petersen",
 "James E. J. Bottomley"
Cc: Oleksandr Natalenko, Johannes Thumshirn, Tejun Heo, Ming Lei
Subject: [PATCH V2 7/8] block: introduce preempt version of
 blk_[freeze|unfreeze]_queue
Date: Sat, 2 Sep 2017 02:49:57 +0800
Message-Id: <20170901184958.19452-9-ming.lei@redhat.com>
In-Reply-To: <20170901184958.19452-1-ming.lei@redhat.com>
References: <20170901184958.19452-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

The two APIs are required to allow request allocation for RQF_PREEMPT
while a queue is frozen. The following two points have to be
guaranteed for one queue:

1) a preempt freeze can be started only after all pending normal and
   preempt freezes have completed

2) a normal freeze can't be started while there is a pending preempt
   freeze, because once blk_mq_freeze_queue_wait() returns for a
   normal freeze, we have to make sure no I/O is pending
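For illustration only (not part of this patch), the two exports are
meant to be used in pairs. A minimal sketch, assuming a hypothetical
caller example_quiesce():

	/* Hypothetical caller, for illustration only */
	static void example_quiesce(struct request_queue *q)
	{
		/*
		 * Returns with the queue frozen for normal request
		 * allocation; only preempt requests (such as
		 * RQF_PREEMPT ones) can still be allocated.
		 */
		blk_freeze_queue_preempt(q);

		/* ... issue RQF_PREEMPT requests here ... */

		/*
		 * The caller has to make sure no new request can be
		 * allocated before unfreezing.
		 */
		blk_unfreeze_queue_preempt(q);
	}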
An rwsem would have been a perfect fit for this kind of
synchronization, but lockdep complains in the case of nested normal
freezes. So a spinlock together with the freezing status flags is
used for the synchronization instead.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       |  2 ++
 block/blk-mq.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/blk-mq.h |  2 ++
 include/linux/blkdev.h |  3 +++
 4 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c199910d4fe1..bbcea07f17da 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -899,6 +899,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	spin_lock_init(&q->freeze_lock);
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 695d2eeaf41a..bf8c057aa50f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -118,16 +118,48 @@ void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
 }
 
-void blk_freeze_queue_start(struct request_queue *q)
+static void __blk_freeze_queue_start(struct request_queue *q, bool preempt)
 {
 	int freeze_depth;
 
+	/*
+	 * Wait for completion of another kind of freezing.
+	 *
+	 * We have to sync between normal freeze and preempt
+	 * freeze. preempt freeze can only be started iff all
+	 * pending normal & preempt freezing are completed,
+	 * meantime normal freeze can be started only if there
+	 * isn't pending preempt freezing.
+	 *
+	 * rwsem should have been perfect for this kind of sync,
+	 * but lockdep will complain in case of nested normal freeze.
+	 *
+	 * So we have to use lock to do that manually.
+	 */
+	spin_lock(&q->freeze_lock);
+	wait_event_cmd(q->mq_freeze_wq,
+		       preempt ? !(q->normal_freezing + q->preempt_freezing) : !q->preempt_freezing,
+		       spin_unlock(&q->freeze_lock),
+		       spin_lock(&q->freeze_lock));
+
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
+		if (preempt)
+			q->preempt_freezing = 1;
+		else
+			q->normal_freezing = 1;
+		spin_unlock(&q->freeze_lock);
+
 		percpu_ref_kill(&q->q_usage_counter);
 		if (q->mq_ops)
 			blk_mq_run_hw_queues(q, false);
-	}
+	} else
+		spin_unlock(&q->freeze_lock);
+}
+
+void blk_freeze_queue_start(struct request_queue *q)
+{
+	__blk_freeze_queue_start(q, false);
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
@@ -166,20 +198,54 @@ void blk_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
-void blk_unfreeze_queue(struct request_queue *q)
+static void __blk_unfreeze_queue(struct request_queue *q, bool preempt)
 {
 	int freeze_depth;
 
 	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
+		spin_lock(&q->freeze_lock);
+		if (preempt)
+			q->preempt_freezing = 0;
+		else
+			q->normal_freezing = 0;
+		spin_unlock(&q->freeze_lock);
 		percpu_ref_reinit(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
 }
+
+void blk_unfreeze_queue(struct request_queue *q)
+{
+	__blk_unfreeze_queue(q, false);
+}
 EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
 
 /*
+ * Once this function is returned, only allow to get request
+ * for preempt purpose, such as RQF_PREEMPT.
+ *
+ */
+void blk_freeze_queue_preempt(struct request_queue *q)
+{
+	__blk_freeze_queue_start(q, true);
+	blk_freeze_queue_wait(q);
+}
+EXPORT_SYMBOL_GPL(blk_freeze_queue_preempt);
+
+/*
+ * It is the caller's responsibility to make sure no new
+ * request can be allocated before calling this function.
+ */
+void blk_unfreeze_queue_preempt(struct request_queue *q)
+{
+	blk_freeze_queue_wait(q);
+	__blk_unfreeze_queue(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue_preempt);
+
+/*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
  * mpt3sas driver such that this function can be removed.
  */
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0ba5cb043172..596f433eb54c 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -259,6 +259,8 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
 void blk_freeze_queue(struct request_queue *q);
 void blk_unfreeze_queue(struct request_queue *q);
+void blk_freeze_queue_preempt(struct request_queue *q);
+void blk_unfreeze_queue_preempt(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
 void blk_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a43422f5379a..2d62965e91eb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -565,6 +565,9 @@ struct request_queue {
 	int			bypass_depth;
 	atomic_t		mq_freeze_depth;
+	spinlock_t		freeze_lock;
+	unsigned		normal_freezing:1;
+	unsigned		preempt_freezing:1;
 
 #if defined(CONFIG_BLK_DEV_BSG)
 	bsg_job_fn		*bsg_job_fn;
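
As a reading aid (again not part of the patch), the condition passed
to wait_event_cmd() in __blk_freeze_queue_start() encodes exactly the
two guarantees from the commit message. Written out as a hypothetical
helper that would have to be evaluated under q->freeze_lock:

	/*
	 * Hypothetical helper, equivalent to the wait_event_cmd()
	 * condition in __blk_freeze_queue_start(); caller must hold
	 * q->freeze_lock.
	 */
	static bool freeze_may_start(struct request_queue *q, bool preempt)
	{
		if (preempt)
			/*
			 * guarantee 1): a preempt freeze waits for all
			 * pending normal and preempt freezes
			 */
			return !q->normal_freezing && !q->preempt_freezing;
		/*
		 * guarantee 2): a normal freeze waits only for pending
		 * preempt freezes; normal freezes can still nest via
		 * mq_freeze_depth
		 */
		return !q->preempt_freezing;
	}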