From patchwork Mon Sep 14 02:08:24 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11772609
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch
Cc: Ming Lei, Sagi Grimberg, Bart Van Assche, Johannes Thumshirn, Chao Leng, Hannes Reinecke
Subject: [PATCH V6 1/4] block: use test_and_{set|clear}_bit to set/clear QUEUE_FLAG_QUIESCED
Date: Mon, 14 Sep 2020 10:08:24 +0800
Message-Id: <20200914020827.337615-2-ming.lei@redhat.com>
In-Reply-To: <20200914020827.337615-1-ming.lei@redhat.com>
References: <20200914020827.337615-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

Prepare for replacing SRCU with a percpu-refcount for implementing queue
quiesce. The next patch needs to avoid a duplicated quiesce action for
BLK_MQ_F_BLOCKING, so use test_and_{set|clear}_bit to set and clear
QUEUE_FLAG_QUIESCED and report the previous flag value to the caller.
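As a quick illustration of why the returned previous value matters (a sketch
only, not part of the patch; example_quiesce()/example_unquiesce() are
hypothetical callers, while the blk_queue_flag_* helpers are the ones this
patch touches and the one-time work is what the following patches add):

	#include <linux/blkdev.h>
	#include <linux/blk-mq.h>
	#include "blk.h"	/* block/ internal: declares blk_queue_flag_test_and_clear() */

	/* Only the caller that actually flips the flag performs the one-time work. */
	static void example_quiesce(struct request_queue *q)
	{
		bool was_quiesced = blk_queue_flag_test_and_set(QUEUE_FLAG_QUIESCED, q);

		if (!was_quiesced) {
			/* first quiescer: one-time work such as percpu_ref_kill()
			 * in the next patch; waiting for readers still follows */
		}
	}

	static void example_unquiesce(struct request_queue *q)
	{
		bool was_quiesced = blk_queue_flag_test_and_clear(QUEUE_FLAG_QUIESCED, q);

		if (was_quiesced) {
			/* one-time un-quiesce work, e.g. percpu_ref_resurrect()
			 * in the next patch for BLK_MQ_F_BLOCKING */
		}

		blk_mq_run_hw_queues(q, true);
	}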
Signed-off-by: Ming Lei
Cc: Sagi Grimberg
Cc: Bart Van Assche
Cc: Johannes Thumshirn
Cc: Chao Leng
Reviewed-by: Hannes Reinecke
Tested-by: Sagi Grimberg
Reviewed-by: Keith Busch
---
 block/blk-core.c | 13 +++++++++++++
 block/blk-mq.c   | 11 ++++++++---
 block/blk.h      |  2 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ca3f0f00c943..34fae3986e79 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -107,6 +107,19 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_queue_flag_test_and_set);
 
+/**
+ * blk_queue_flag_test_and_clear - atomically test and clear a queue flag
+ * @flag: flag to be clear
+ * @q: request queue
+ *
+ * Returns the previous value of @flag - 0 if the flag was not set and 1 if
+ * the flag was set.
+ */
+bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q)
+{
+	return test_and_clear_bit(flag, &q->queue_flags);
+}
+
 void blk_rq_init(struct request_queue *q, struct request *rq)
 {
 	memset(rq, 0, sizeof(*rq));
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e04b759add75..c7d424e76781 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -199,13 +199,18 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+static bool __blk_mq_quiesce_queue_nowait(struct request_queue *q)
+{
+	return blk_queue_flag_test_and_set(QUEUE_FLAG_QUIESCED, q);
+}
+
 /*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
  * mpt3sas driver such that this function can be removed.
  */
 void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	__blk_mq_quiesce_queue_nowait(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
@@ -224,7 +229,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	unsigned int i;
 	bool rcu = false;
 
-	blk_mq_quiesce_queue_nowait(q);
+	__blk_mq_quiesce_queue_nowait(q);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
@@ -246,7 +251,7 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	blk_queue_flag_test_and_clear(QUEUE_FLAG_QUIESCED, q);
 
 	/* dispatch requests which are inserted during quiescing */
 	blk_mq_run_hw_queues(q, true);
diff --git a/block/blk.h b/block/blk.h
index c08762e10b04..312a060ea2a2 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -448,4 +448,6 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
+
 #endif /* BLK_INTERNAL_H */

From patchwork Mon Sep 14 02:08:25 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11772611
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch
Cc: Ming Lei, Sagi Grimberg, Bart Van Assche, Johannes Thumshirn, Chao Leng, Hannes Reinecke
Subject: [PATCH V6 2/4] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
Date: Mon, 14 Sep 2020 10:08:25 +0800
Message-Id: <20200914020827.337615-3-ming.lei@redhat.com>
In-Reply-To: <20200914020827.337615-1-ming.lei@redhat.com>
References: <20200914020827.337615-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

In the BLK_MQ_F_BLOCKING case, blk-mq uses SRCU to mark the read critical
section around request dispatch, so request queue quiesce is built on SRCU.
What we want is to add as little cost as possible to the fast path. A
percpu-ref is cleaner and simpler, and is sufficient for implementing queue
quiesce: the main requirement is that every read section observes
QUEUE_FLAG_QUIESCED once blk_mq_quiesce_queue() returns. It also makes it
much easier to add an asynchronous queue-quiesce interface later, and the
per-request-queue percpu-ref reduces the memory footprint.

From an implementation viewpoint, there is no sign that percpu_ref is slower
than SRCU in the fast path; tree SRCU (the default option in most
distributions) can even be slower, since it requires a full memory barrier in
both lock and unlock, and rcu_read_lock()/rcu_read_unlock() is much cheaper
than smp_mb().

1) percpu_ref just holds rcu_read_lock, then runs a check and an
increase/decrease on the percpu variable:

	rcu_read_lock()
	if (__ref_is_percpu(ref, &percpu_count))
		this_cpu_inc(*percpu_count);
	rcu_read_unlock()

2) tree SRCU:

	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
	smp_mb(); /* B */  /* Avoid leaking the critical section.
*/ Also from my test on null_blk(blocking), not observe percpu-ref performs worse than srcu, see the following test: 1) test steps: rmmod null_blk > /dev/null 2>&1 modprobe null_blk nr_devices=1 submit_queues=1 blocking=1 fio --bs=4k --size=512G --rw=randread --norandommap --direct=1 --ioengine=libaio \ --iodepth=64 --runtime=60 --group_reporting=1 --name=nullb0 \ --filename=/dev/nullb0 --numjobs=32 test machine: HP DL380, 16 cpu cores, 2 threads per core, dual sockets/numa, Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz 2) test result: - srcu quiesce: 6063K IOPS - percpu-ref quiesce: 6113K IOPS Signed-off-by: Ming Lei Cc: Sagi Grimberg Cc: Bart Van Assche Cc: Johannes Thumshirn Cc: Chao Leng Reviewed-by: Hannes Reinecke Tested-by: Sagi Grimberg Reviewed-by: Keith Busch --- block/blk-mq-sysfs.c | 2 - block/blk-mq.c | 128 ++++++++++++++++++++++------------------- block/blk-sysfs.c | 6 +- include/linux/blk-mq.h | 8 --- include/linux/blkdev.h | 4 ++ 5 files changed, 79 insertions(+), 69 deletions(-) diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 062229395a50..799db7937105 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -38,8 +38,6 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj) cancel_delayed_work_sync(&hctx->run_work); - if (hctx->flags & BLK_MQ_F_BLOCKING) - cleanup_srcu_struct(hctx->srcu); blk_free_flush_queue(hctx->fq); sbitmap_free(&hctx->ctx_map); free_cpumask_var(hctx->cpumask); diff --git a/block/blk-mq.c b/block/blk-mq.c index c7d424e76781..65b39a498479 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -225,19 +225,23 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait); */ void blk_mq_quiesce_queue(struct request_queue *q) { - struct blk_mq_hw_ctx *hctx; - unsigned int i; - bool rcu = false; + bool blocking = !!(q->tag_set->flags & BLK_MQ_F_BLOCKING); + bool was_quiesced =__blk_mq_quiesce_queue_nowait(q); - __blk_mq_quiesce_queue_nowait(q); + if (!was_quiesced && blocking) + percpu_ref_kill(&q->dispatch_counter); - queue_for_each_hw_ctx(q, hctx, i) { - if (hctx->flags & BLK_MQ_F_BLOCKING) - synchronize_srcu(hctx->srcu); - else - rcu = true; - } - if (rcu) + /* + * In case of F_BLOCKING, if driver unquiesces its queue being + * quiesced, it can cause bigger trouble, and we simply return & + * warn once for avoiding hang here. 
+ */ + if (blocking) + wait_event(q->mq_quiesce_wq, + percpu_ref_is_zero(&q->dispatch_counter) || + WARN_ON_ONCE(!percpu_ref_is_dying( + &q->dispatch_counter))); + else synchronize_rcu(); } EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue); @@ -251,7 +255,10 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue); */ void blk_mq_unquiesce_queue(struct request_queue *q) { - blk_queue_flag_test_and_clear(QUEUE_FLAG_QUIESCED, q); + if (blk_queue_flag_test_and_clear(QUEUE_FLAG_QUIESCED, q)) { + if (q->tag_set->flags & BLK_MQ_F_BLOCKING) + percpu_ref_resurrect(&q->dispatch_counter); + } /* dispatch requests which are inserted during quiescing */ blk_mq_run_hw_queues(q, true); @@ -704,24 +711,21 @@ void blk_mq_complete_request(struct request *rq) } EXPORT_SYMBOL(blk_mq_complete_request); -static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx) - __releases(hctx->srcu) +static void hctx_unlock(struct blk_mq_hw_ctx *hctx) { - if (!(hctx->flags & BLK_MQ_F_BLOCKING)) - rcu_read_unlock(); + if (hctx->flags & BLK_MQ_F_BLOCKING) + percpu_ref_put(&hctx->queue->dispatch_counter); else - srcu_read_unlock(hctx->srcu, srcu_idx); + rcu_read_unlock(); } -static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx) - __acquires(hctx->srcu) +/* Returning false means that queue is being quiesced */ +static inline bool hctx_lock(struct blk_mq_hw_ctx *hctx) { - if (!(hctx->flags & BLK_MQ_F_BLOCKING)) { - /* shut up gcc false positive */ - *srcu_idx = 0; - rcu_read_lock(); - } else - *srcu_idx = srcu_read_lock(hctx->srcu); + if (hctx->flags & BLK_MQ_F_BLOCKING) + return percpu_ref_tryget_live(&hctx->queue->dispatch_counter); + rcu_read_lock(); + return true; } /** @@ -1501,8 +1505,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list, */ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) { - int srcu_idx; - /* * We should be running this queue from one of the CPUs that * are mapped to it. @@ -1536,9 +1538,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING); - hctx_lock(hctx, &srcu_idx); - blk_mq_sched_dispatch_requests(hctx); - hctx_unlock(hctx, srcu_idx); + if (hctx_lock(hctx)) { + blk_mq_sched_dispatch_requests(hctx); + hctx_unlock(hctx); + } } static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx) @@ -1650,7 +1653,6 @@ EXPORT_SYMBOL(blk_mq_delay_run_hw_queue); */ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async) { - int srcu_idx; bool need_run; /* @@ -1661,10 +1663,12 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async) * And queue will be rerun in blk_mq_unquiesce_queue() if it is * quiesced. */ - hctx_lock(hctx, &srcu_idx); + if (!hctx_lock(hctx)) + return; + need_run = !blk_queue_quiesced(hctx->queue) && blk_mq_hctx_has_pending(hctx); - hctx_unlock(hctx, srcu_idx); + hctx_unlock(hctx); if (need_run) __blk_mq_delay_run_hw_queue(hctx, async, 0); @@ -2004,7 +2008,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, bool run_queue = true; /* - * RCU or SRCU read lock is needed before checking quiesced flag. + * hctx_lock() is needed before checking quiesced flag. 
* * When queue is stopped or quiesced, ignore 'bypass_insert' from * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller, @@ -2052,11 +2056,14 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_qc_t *cookie) { blk_status_t ret; - int srcu_idx; might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING); - hctx_lock(hctx, &srcu_idx); + /* Insert request to queue in case of being quiesced */ + if (!hctx_lock(hctx)) { + blk_mq_sched_insert_request(rq, false, false, false); + return; + } ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) @@ -2064,19 +2071,22 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); - hctx_unlock(hctx, srcu_idx); + hctx_unlock(hctx); } blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last) { blk_status_t ret; - int srcu_idx; blk_qc_t unused_cookie; struct blk_mq_hw_ctx *hctx = rq->mq_hctx; - hctx_lock(hctx, &srcu_idx); + /* Insert request to queue in case of being quiesced */ + if (!hctx_lock(hctx)) { + blk_mq_sched_insert_request(rq, false, false, false); + return BLK_STS_OK; + } ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true, last); - hctx_unlock(hctx, srcu_idx); + hctx_unlock(hctx); return ret; } @@ -2607,20 +2617,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q, } } -static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set) -{ - int hw_ctx_size = sizeof(struct blk_mq_hw_ctx); - - BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu), - __alignof__(struct blk_mq_hw_ctx)) != - sizeof(struct blk_mq_hw_ctx)); - - if (tag_set->flags & BLK_MQ_F_BLOCKING) - hw_ctx_size += sizeof(struct srcu_struct); - - return hw_ctx_size; -} - static int blk_mq_init_hctx(struct request_queue *q, struct blk_mq_tag_set *set, struct blk_mq_hw_ctx *hctx, unsigned hctx_idx) @@ -2658,7 +2654,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, struct blk_mq_hw_ctx *hctx; gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY; - hctx = kzalloc_node(blk_mq_hw_ctx_size(set), gfp, node); + hctx = kzalloc_node(sizeof(struct blk_mq_hw_ctx), gfp, node); if (!hctx) goto fail_alloc_hctx; @@ -2701,8 +2697,6 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, if (!hctx->fq) goto free_bitmap; - if (hctx->flags & BLK_MQ_F_BLOCKING) - init_srcu_struct(hctx->srcu); blk_mq_hctx_kobj_init(hctx); return hctx; @@ -3182,6 +3176,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, mutex_unlock(&q->sysfs_lock); } +static void blk_mq_dispatch_counter_release(struct percpu_ref *ref) +{ + struct request_queue *q = container_of(ref, struct request_queue, + dispatch_counter); + wake_up_all(&q->mq_quiesce_wq); +} + struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, struct request_queue *q, bool elevator_init) @@ -3198,6 +3199,14 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, if (blk_mq_alloc_ctxs(q)) goto err_poll; + if (set->flags & BLK_MQ_F_BLOCKING) { + init_waitqueue_head(&q->mq_quiesce_wq); + if (percpu_ref_init(&q->dispatch_counter, + blk_mq_dispatch_counter_release, + PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) + goto err_hctxs; + } + /* init q->mq_kobj and sw queues' kobjects */ blk_mq_sysfs_init(q); @@ -3206,7 +3215,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, blk_mq_realloc_hw_ctxs(set, q); if (!q->nr_hw_queues) - 
goto err_hctxs; + goto err_dispatch_counter; INIT_WORK(&q->timeout_work, blk_mq_timeout_work); blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ); @@ -3240,6 +3249,9 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, return q; +err_dispatch_counter: + if (set->flags & BLK_MQ_F_BLOCKING) + percpu_ref_exit(&q->dispatch_counter); err_hctxs: kfree(q->queue_hw_ctx); q->nr_hw_queues = 0; diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 81722cdcf0cb..c3868bd10255 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -795,9 +795,13 @@ static void blk_release_queue(struct kobject *kobj) blk_queue_free_zone_bitmaps(q); - if (queue_is_mq(q)) + if (queue_is_mq(q)) { blk_mq_release(q); + if (q->tag_set->flags & BLK_MQ_F_BLOCKING) + percpu_ref_exit(&q->dispatch_counter); + } + blk_trace_shutdown(q); mutex_lock(&q->debugfs_mutex); debugfs_remove_recursive(q->debugfs_dir); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index b23eeca4d677..df642055f02c 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -4,7 +4,6 @@ #include #include -#include struct blk_mq_tags; struct blk_flush_queue; @@ -173,13 +172,6 @@ struct blk_mq_hw_ctx { * q->unused_hctx_list. */ struct list_head hctx_list; - - /** - * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is - * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also - * blk_mq_hw_ctx_size(). - */ - struct srcu_struct srcu[]; }; /** diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 5bd96fbab9b4..cd9b040232c1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -571,6 +571,10 @@ struct request_queue { struct mutex mq_freeze_lock; struct percpu_ref q_usage_counter; + /* only used for BLK_MQ_F_BLOCKING */ + struct percpu_ref dispatch_counter; + wait_queue_head_t mq_quiesce_wq; + struct blk_mq_tag_set *tag_set; struct list_head tag_set_list; struct bio_set bio_split; From patchwork Mon Sep 14 02:08:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11772613 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6A8E4746 for ; Mon, 14 Sep 2020 02:09:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4799521655 for ; Mon, 14 Sep 2020 02:09:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="O+jct0+N" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725983AbgINCJG (ORCPT ); Sun, 13 Sep 2020 22:09:06 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:22252 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725973AbgINCJF (ORCPT ); Sun, 13 Sep 2020 22:09:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600049343; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=d4KrjG+hpr/NvnFCCl/oxIpUDq3do9AAwekXW/8myO0=; b=O+jct0+NDu4Q44sGG6p0zj73IPVJTz00LR9lejxPk9MPB4myxo2Hj2hZ0SozK4yuHjJskn 4TUoHYa8TD5aemqUfoE9U+XkLOyPYm4/LDl5yLU6hnLYNz1nPO8c2fTf9rVUgWMdnAq2GM wcECdvgMeA8QpmfoqrIvyZJlbPBkxek= Received: 
from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-380-AluOSSMaOraaQbfd78bX1w-1; Sun, 13 Sep 2020 22:09:00 -0400 X-MC-Unique: AluOSSMaOraaQbfd78bX1w-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1779F1074651; Mon, 14 Sep 2020 02:08:59 +0000 (UTC) Received: from localhost (ovpn-12-38.pek2.redhat.com [10.72.12.38]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5D47F5DA7B; Mon, 14 Sep 2020 02:08:54 +0000 (UTC) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch Cc: Ming Lei , Sagi Grimberg , Bart Van Assche , Johannes Thumshirn , Chao Leng , Hannes Reinecke Subject: [PATCH V6 3/4] blk-mq: add tagset quiesce interface Date: Mon, 14 Sep 2020 10:08:26 +0800 Message-Id: <20200914020827.337615-4-ming.lei@redhat.com> In-Reply-To: <20200914020827.337615-1-ming.lei@redhat.com> References: <20200914020827.337615-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org drivers that have shared tagsets may need to quiesce potentially a lot of request queues that all share a single tagset (e.g. nvme). Add an interface to quiesce all the queues on a given tagset. This interface is useful because it can speedup the quiesce by doing it in parallel. For tagsets that have BLK_MQ_F_BLOCKING set, we kill request queue's dispatch percpu-refcount such that all of them wait for the counter becoming zero. For tagsets that don't have BLK_MQ_F_BLOCKING set, we simply call a single synchronize_rcu as this is sufficient. This patch is against Sagi's original post. Signed-off-by: Ming Lei Cc: Sagi Grimberg Cc: Bart Van Assche Cc: Johannes Thumshirn Cc: Chao Leng Reviewed-by: Hannes Reinecke Tested-by: Sagi Grimberg Reviewed-by: Keith Busch --- block/blk-mq.c | 59 +++++++++++++++++++++++++++++++++++------- include/linux/blk-mq.h | 2 ++ 2 files changed, 51 insertions(+), 10 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 65b39a498479..fb609fc38cf5 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -214,16 +214,7 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait); -/** - * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished - * @q: request queue. - * - * Note: this function does not prevent that the struct request end_io() - * callback function is invoked. Once this function is returned, we make - * sure no dispatch can happen until the queue is unquiesced via - * blk_mq_unquiesce_queue(). 
- */ -void blk_mq_quiesce_queue(struct request_queue *q) +static void __blk_mq_quiesce_queue(struct request_queue *q, bool wait) { bool blocking = !!(q->tag_set->flags & BLK_MQ_F_BLOCKING); bool was_quiesced =__blk_mq_quiesce_queue_nowait(q); @@ -231,6 +222,9 @@ void blk_mq_quiesce_queue(struct request_queue *q) if (!was_quiesced && blocking) percpu_ref_kill(&q->dispatch_counter); + if (!wait) + return; + /* * In case of F_BLOCKING, if driver unquiesces its queue being * quiesced, it can cause bigger trouble, and we simply return & @@ -244,6 +238,20 @@ void blk_mq_quiesce_queue(struct request_queue *q) else synchronize_rcu(); } + +/* + * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished + * @q: request queue. + * + * Note: this function does not prevent that the struct request end_io() + * callback function is invoked. Once this function is returned, we make + * sure no dispatch can happen until the queue is unquiesced via + * blk_mq_unquiesce_queue(). + */ +void blk_mq_quiesce_queue(struct request_queue *q) +{ + __blk_mq_quiesce_queue(q, true); +} EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue); /* @@ -265,6 +273,37 @@ void blk_mq_unquiesce_queue(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue); +void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set) +{ + struct request_queue *q; + + mutex_lock(&set->tag_list_lock); + list_for_each_entry(q, &set->tag_list, tag_set_list) + __blk_mq_quiesce_queue(q, false); + + /* wait until all queues' quiesce is done */ + if (set->flags & BLK_MQ_F_BLOCKING) { + list_for_each_entry(q, &set->tag_list, tag_set_list) + wait_event(q->mq_quiesce_wq, + percpu_ref_is_zero(&q->dispatch_counter)); + } else { + synchronize_rcu(); + } + mutex_unlock(&set->tag_list_lock); +} +EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset); + +void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set) +{ + struct request_queue *q; + + mutex_lock(&set->tag_list_lock); + list_for_each_entry(q, &set->tag_list, tag_set_list) + blk_mq_unquiesce_queue(q); + mutex_unlock(&set->tag_list_lock); +} +EXPORT_SYMBOL_GPL(blk_mq_unquiesce_tagset); + void blk_mq_wake_waiters(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index df642055f02c..90da3582b91d 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -519,6 +519,8 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap); void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues); void blk_mq_quiesce_queue_nowait(struct request_queue *q); +void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set); +void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set); unsigned int blk_mq_rq_cpu(struct request *rq); From patchwork Mon Sep 14 02:08:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11772615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 750BF746 for ; Mon, 14 Sep 2020 02:09:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58B0221974 for ; Mon, 14 Sep 2020 02:09:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QWAcPOTc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725986AbgINCJM (ORCPT ); Sun, 13 Sep 2020 22:09:12 -0400 
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch
Cc: Sagi Grimberg, Hannes Reinecke, Bart Van Assche, Johannes Thumshirn, Chao Leng, Ming Lei
Subject: [PATCH V6 4/4] nvme: use blk_mq_[un]quiesce_tagset
Date: Mon, 14 Sep 2020 10:08:27 +0800
Message-Id: <20200914020827.337615-5-ming.lei@redhat.com>
In-Reply-To: <20200914020827.337615-1-ming.lei@redhat.com>
References: <20200914020827.337615-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Sagi Grimberg

All controller namespaces share the same tagset, so we can use this
interface, which performs the optimal operation for parallel quiesce based
on the tagset type (blocking or non-blocking tagsets).

Reviewed-by: Hannes Reinecke
Tested-by: Sagi Grimberg
Reviewed-by: Keith Busch
Cc: Sagi Grimberg
Cc: Bart Van Assche
Cc: Johannes Thumshirn
Cc: Chao Leng

Add code to unquiesce ctrl->connect_q in nvme_stop_queues(), and avoid
calling blk_mq_quiesce_tagset()/blk_mq_unquiesce_tagset() if the tagset
isn't initialized yet.
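For context, a minimal sketch of the call pattern a driver with a shared
tagset might use around controller teardown and re-setup (illustration only;
example_ctrl_reset() is a hypothetical function, while
blk_mq_[un]quiesce_tagset() are the interfaces added in patch 3):

	#include <linux/blk-mq.h>

	/* hypothetical error-handling path for a driver whose queues all
	 * share one tag_set, as nvme's namespaces do */
	static void example_ctrl_reset(struct blk_mq_tag_set *set)
	{
		/* stop dispatch on every queue of the tagset in one go; per
		 * patch 3 this is a single synchronize_rcu() for non-blocking
		 * tagsets, or a parallel percpu_ref drain for BLK_MQ_F_BLOCKING */
		blk_mq_quiesce_tagset(set);

		/* ... tear down and re-establish the controller/transport ... */

		/* resume dispatch; requests inserted while quiesced are re-run */
		blk_mq_unquiesce_tagset(set);
	}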
Signed-off-by: Ming Lei
Signed-off-by: Sagi Grimberg
---
 drivers/nvme/host/core.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ea1fa41fbba8..a6af8978a3ba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4623,23 +4623,22 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
+	if (list_empty_careful(&ctrl->namespaces))
+		return;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
+	blk_mq_quiesce_tagset(ctrl->tagset);
+
+	if (ctrl->connect_q)
+		blk_mq_unquiesce_queue(ctrl->connect_q);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
+	if (list_empty_careful(&ctrl->namespaces))
+		return;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unquiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
+	blk_mq_unquiesce_tagset(ctrl->tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
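To recap the reader side that the whole series relies on, here is a sketch of
how dispatch code uses the hctx_lock()/hctx_unlock() pair from patch 2 (it
mirrors __blk_mq_run_hw_queue() in that patch; example_dispatch() is a
hypothetical name and the helpers are blk-mq.c internals):

	/* Sketch: dispatch enters a "read section" that quiesce waits on.
	 * For BLK_MQ_F_BLOCKING the section is a live reference on
	 * q->dispatch_counter; otherwise it is a plain RCU read section. */
	static void example_dispatch(struct blk_mq_hw_ctx *hctx)
	{
		/* hctx_lock() returns false once the queue is being quiesced,
		 * because percpu_ref_tryget_live() fails after percpu_ref_kill() */
		if (!hctx_lock(hctx))
			return;		/* callers insert the request instead of dispatching */

		blk_mq_sched_dispatch_requests(hctx);

		hctx_unlock(hctx);	/* percpu_ref_put() or rcu_read_unlock() */
	}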