From patchwork Tue Nov 30 07:37:48 2021
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, "Martin K. Petersen"
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH V2 1/5] blk-mq: remove hctx_lock and hctx_unlock
Date: Tue, 30 Nov 2021 15:37:48 +0800
Message-Id: <20211130073752.3005936-2-ming.lei@redhat.com>
In-Reply-To: <20211130073752.3005936-1-ming.lei@redhat.com>
References: <20211130073752.3005936-1-ming.lei@redhat.com>

Remove hctx_lock() and hctx_unlock(), and add one helper, blk_mq_run_dispatch_ops(), which runs the code block passed in dispatch_ops with the rcu/srcu read lock held.

Compared with hctx_lock()/hctx_unlock():

1) the two branches become one, so (hctx->flags & BLK_MQ_F_BLOCKING) only
   needs to be checked once per dispatch_ops

2) srcu_idx doesn't need to be touched in the non-blocking case

3) might_sleep_if() can be moved into the blocking branch

t/io_uring shows a ~4% IOPS boost on null_blk.
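
To make the change concrete, the caller-side pattern before and after is roughly the following (taken from the __blk_mq_run_hw_queue() hunk below, shown here only for illustration):

	/* before: explicit lock/unlock pair plus a separate might_sleep_if() */
	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
	hctx_lock(hctx, &srcu_idx);
	blk_mq_sched_dispatch_requests(hctx);
	hctx_unlock(hctx, srcu_idx);

	/* after: the helper checks BLK_MQ_F_BLOCKING once and picks rcu vs. srcu */
	blk_mq_run_dispatch_ops(hctx,
			blk_mq_sched_dispatch_requests(hctx));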
Signed-off-by: Ming Lei
Reported-by: kernel test robot
---
 block/blk-mq.c | 79 ++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 48 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2deb99cf185e..c5dc716b8167 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1079,25 +1079,22 @@ void blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
-	__releases(hctx->srcu)
-{
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
-		rcu_read_unlock();
-	else
-		srcu_read_unlock(hctx->srcu, srcu_idx);
-}
-
-static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
-	__acquires(hctx->srcu)
-{
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		/* shut up gcc false positive */
-		*srcu_idx = 0;
-		rcu_read_lock();
-	} else
-		*srcu_idx = srcu_read_lock(hctx->srcu);
-}
+/* run the code block in @dispatch_ops with rcu/srcu read lock held */
+#define blk_mq_run_dispatch_ops(hctx, dispatch_ops)		\
+do {								\
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {		\
+		rcu_read_lock();				\
+		(dispatch_ops);					\
+		rcu_read_unlock();				\
+	} else {						\
+		int srcu_idx;					\
+								\
+		might_sleep();					\
+		srcu_idx = srcu_read_lock(hctx->srcu);		\
+		(dispatch_ops);					\
+		srcu_read_unlock(hctx->srcu, srcu_idx);		\
+	}							\
+} while (0)
 
 /**
  * blk_mq_start_request - Start processing a request
@@ -1960,19 +1957,13 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
  */
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	int srcu_idx;
-
 	/*
 	 * We can't run the queue inline with ints disabled. Ensure that
 	 * we catch bad users of this early.
 	 */
 	WARN_ON_ONCE(in_interrupt());
 
-	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
-
-	hctx_lock(hctx, &srcu_idx);
-	blk_mq_sched_dispatch_requests(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	blk_mq_run_dispatch_ops(hctx, blk_mq_sched_dispatch_requests(hctx));
 }
 
 static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
@@ -2084,7 +2075,6 @@ EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
  */
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	int srcu_idx;
 	bool need_run;
 
 	/*
@@ -2095,10 +2085,9 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
 	 * quiesced.
 	 */
-	hctx_lock(hctx, &srcu_idx);
-	need_run = !blk_queue_quiesced(hctx->queue) &&
-		blk_mq_hctx_has_pending(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	blk_mq_run_dispatch_ops(hctx,
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+		blk_mq_hctx_has_pending(hctx));
 
 	if (need_run)
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
@@ -2502,31 +2491,25 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq)
 {
 	blk_status_t ret;
-	int srcu_idx;
-
-	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
-
-	hctx_lock(hctx, &srcu_idx);
-
-	ret = __blk_mq_try_issue_directly(hctx, rq, false, true);
-	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
-		blk_mq_request_bypass_insert(rq, false, true);
-	else if (ret != BLK_STS_OK)
-		blk_mq_end_request(rq, ret);
-
-	hctx_unlock(hctx, srcu_idx);
+	blk_mq_run_dispatch_ops(hctx,
+		{
+			ret = __blk_mq_try_issue_directly(hctx, rq, false, true);
+			if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
+				blk_mq_request_bypass_insert(rq, false, true);
+			else if (ret != BLK_STS_OK)
+				blk_mq_end_request(rq, ret);
+		}
+	);
 }
 
 static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 {
 	blk_status_t ret;
-	int srcu_idx;
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
-	hctx_lock(hctx, &srcu_idx);
-	ret = __blk_mq_try_issue_directly(hctx, rq, true, last);
-	hctx_unlock(hctx, srcu_idx);
-
+	blk_mq_run_dispatch_ops(hctx,
+		ret = __blk_mq_try_issue_directly(hctx, rq, true, last));
 	return ret;
 }

From patchwork Tue Nov 30 07:37:49 2021
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, "Martin K. Petersen"
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH V2 2/5] blk-mq: move srcu from blk_mq_hw_ctx to request_queue
Date: Tue, 30 Nov 2021 15:37:49 +0800
Message-Id: <20211130073752.3005936-3-ming.lei@redhat.com>
In-Reply-To: <20211130073752.3005936-1-ming.lei@redhat.com>
References: <20211130073752.3005936-1-ming.lei@redhat.com>

In case of BLK_MQ_F_BLOCKING, a per-hctx srcu instance is used to protect the dispatch critical section. However, this instance sits at the end of the hctx and usually occupies a standalone, often cold, cacheline.

Inside srcu_read_lock() and srcu_read_unlock(), writes always go to the indirect percpu variable, which is allocated from the heap rather than embedded, and srcu->srcu_idx is only read in srcu_read_lock(). So it does not matter whether the srcu structure lives in the hctx or in the request queue.

Switch to a per-request-queue srcu for protecting dispatch. This also simplifies quiesce a lot, not to mention that quiesce is always done queue-wide anyway.
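
For illustration only (a sketch of the allocation path in the diff below, not an extra change): the srcu instance becomes a flexible array member at the end of struct request_queue, so only BLK_MQ_F_BLOCKING queues pay for it, selected at allocation time via a second slab cache:

	/* two slab caches: with and without a trailing srcu_struct */
	blk_requestq_cachep = kmem_cache_create("request_queue",
			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
			sizeof(struct request_queue) +
			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);

	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
			GFP_KERNEL | __GFP_ZERO, node_id);
	if (alloc_srcu) {
		/* remembered in the queue flags so freeing picks the right cache */
		blk_queue_flag_set(QUEUE_FLAG_HAS_SRCU, q);
		init_srcu_struct(q->srcu);	/* q->srcu lives past the struct end */
	}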
alloc_workqueue("kblockd", @@ -1311,6 +1324,10 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); + blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu", + sizeof(struct request_queue) + + sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL); + blk_debugfs_root = debugfs_create_dir("block", NULL); return 0; diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 253c857cba47..674786574075 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -36,8 +36,6 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj) struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj); - if (hctx->flags & BLK_MQ_F_BLOCKING) - cleanup_srcu_struct(hctx->srcu); blk_free_flush_queue(hctx->fq); sbitmap_free(&hctx->ctx_map); free_cpumask_var(hctx->cpumask); diff --git a/block/blk-mq.c b/block/blk-mq.c index c5dc716b8167..a3ff671ca20e 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -260,17 +260,9 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait); */ void blk_mq_wait_quiesce_done(struct request_queue *q) { - struct blk_mq_hw_ctx *hctx; - unsigned int i; - bool rcu = false; - - queue_for_each_hw_ctx(q, hctx, i) { - if (hctx->flags & BLK_MQ_F_BLOCKING) - synchronize_srcu(hctx->srcu); - else - rcu = true; - } - if (rcu) + if (blk_queue_has_srcu(q)) + synchronize_srcu(q->srcu); + else synchronize_rcu(); } EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done); @@ -1090,9 +1082,9 @@ do { \ int srcu_idx; \ \ might_sleep(); \ - srcu_idx = srcu_read_lock(hctx->srcu); \ + srcu_idx = srcu_read_lock(hctx->queue->srcu); \ (dispatch_ops); \ - srcu_read_unlock(hctx->srcu, srcu_idx); \ + srcu_read_unlock(hctx->queue->srcu, srcu_idx); \ } \ } while (0) @@ -3431,20 +3423,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q, } } -static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set) -{ - int hw_ctx_size = sizeof(struct blk_mq_hw_ctx); - - BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu), - __alignof__(struct blk_mq_hw_ctx)) != - sizeof(struct blk_mq_hw_ctx)); - - if (tag_set->flags & BLK_MQ_F_BLOCKING) - hw_ctx_size += sizeof(struct srcu_struct); - - return hw_ctx_size; -} - static int blk_mq_init_hctx(struct request_queue *q, struct blk_mq_tag_set *set, struct blk_mq_hw_ctx *hctx, unsigned hctx_idx) @@ -3482,7 +3460,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, struct blk_mq_hw_ctx *hctx; gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY; - hctx = kzalloc_node(blk_mq_hw_ctx_size(set), gfp, node); + hctx = kzalloc_node(sizeof(struct blk_mq_hw_ctx), gfp, node); if (!hctx) goto fail_alloc_hctx; @@ -3524,8 +3502,6 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, if (!hctx->fq) goto free_bitmap; - if (hctx->flags & BLK_MQ_F_BLOCKING) - init_srcu_struct(hctx->srcu); blk_mq_hctx_kobj_init(hctx); return hctx; @@ -3861,7 +3837,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set, struct request_queue *q; int ret; - q = blk_alloc_queue(set->numa_node); + q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING); if (!q) return ERR_PTR(-ENOMEM); q->queuedata = queuedata; @@ -4010,6 +3986,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, struct request_queue *q) { + WARN_ON_ONCE(blk_queue_has_srcu(q) != + !!(set->flags & BLK_MQ_F_BLOCKING)); + /* mark the queue as mq asap */ q->mq_ops = set->ops; diff --git 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4622da4bb992..3e6357321225 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -735,7 +735,8 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 {
 	struct request_queue *q = container_of(rcu_head, struct request_queue,
 					       rcu_head);
-	kmem_cache_free(blk_requestq_cachep, q);
+
+	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
 }
 
 /* Unconfigure the I/O scheduler and dissociate from the cgroup controller. */
diff --git a/block/blk.h b/block/blk.h
index a57c84654d0a..911f9f8db646 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -27,6 +27,7 @@ struct blk_flush_queue {
 };
 
 extern struct kmem_cache *blk_requestq_cachep;
+extern struct kmem_cache *blk_requestq_srcu_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
@@ -428,7 +429,14 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
-struct request_queue *blk_alloc_queue(int node_id);
+static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
+{
+	if (srcu)
+		return blk_requestq_srcu_cachep;
+	return blk_requestq_cachep;
+}
+struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
+
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
 
 int disk_alloc_events(struct gendisk *disk);
diff --git a/block/genhd.c b/block/genhd.c
index 5179a4f00fba..3c139a1b6f04 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1338,7 +1338,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_alloc_queue(node);
+	q = blk_alloc_queue(node, false);
 	if (!q)
 		return NULL;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d952c3442261..6f3ccd604d72 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -4,7 +4,6 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
-#include <linux/srcu.h>
 #include <linux/lockdep.h>
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
@@ -376,13 +375,6 @@ struct blk_mq_hw_ctx {
 	 * q->unused_hctx_list.
 	 */
 	struct list_head	hctx_list;
-
-	/**
-	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
-	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
-	 * blk_mq_hw_ctx_size().
-	 */
-	struct srcu_struct	srcu[];
 };
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0a4416ef4fbf..c80cfaefc0a8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -16,6 +16,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/blkzoned.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct module;
 struct request_queue;
@@ -373,11 +374,18 @@ struct request_queue {
 	 * devices that do not have multiple independent access ranges.
 	 */
 	struct blk_independent_access_ranges *ia_ranges;
+
+	/**
+	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
Must be the last member + */ + struct srcu_struct srcu[]; }; /* Keep blk_queue_flag_name[] in sync with the definitions below */ #define QUEUE_FLAG_STOPPED 0 /* queue is stopped */ #define QUEUE_FLAG_DYING 1 /* queue being torn down */ +#define QUEUE_FLAG_HAS_SRCU 2 /* SRCU is allocated */ #define QUEUE_FLAG_NOMERGES 3 /* disable merge attempts */ #define QUEUE_FLAG_SAME_COMP 4 /* complete on same CPU-group */ #define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */ @@ -415,6 +423,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) #define blk_queue_dying(q) test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags) +#define blk_queue_has_srcu(q) test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags) #define blk_queue_dead(q) test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags) #define blk_queue_init_done(q) test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) From patchwork Tue Nov 30 07:37:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 12646439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79DF2C433EF for ; Tue, 30 Nov 2021 07:39:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239069AbhK3Hme (ORCPT ); Tue, 30 Nov 2021 02:42:34 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:25539 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239086AbhK3Hly (ORCPT ); Tue, 30 Nov 2021 02:41:54 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638257915; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+bkZ+oTSXTZDY84lJmh+NUQelyukHuT6Jz7jbXrpA0M=; b=Iwa2H7k14Jl8/qYg3oyMBL97dEkd8CWLEnSoNPq56sLklSIo7IbZEw/OOA6f5MTBi9XlH4 CvTm74t2jCo4OUn/Q4tN8Jp1KjK3/oPL0r63zN2vYuQgsNrpwUq7Ln5mB4Dsy/FlLNRPAj Y1mc1BilfhyfPRudmtdTcexUnK79Nmo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-399-xiKcwCS7O-OLh_irlKFMeg-1; Tue, 30 Nov 2021 02:38:30 -0500 X-MC-Unique: xiKcwCS7O-OLh_irlKFMeg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C51291006AA7; Tue, 30 Nov 2021 07:38:28 +0000 (UTC) Received: from localhost (ovpn-8-25.pek2.redhat.com [10.72.8.25]) by smtp.corp.redhat.com (Postfix) with ESMTP id 431A719724; Tue, 30 Nov 2021 07:38:24 +0000 (UTC) From: Ming Lei To: Christoph Hellwig , Jens Axboe , "Martin K . 
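
With srcu hanging off the request queue, the quiesce wait no longer walks the hctxs; after the blk-mq.c hunk above, blk_mq_wait_quiesce_done() boils down to the following (sketch of the resulting function, not additional code):

	void blk_mq_wait_quiesce_done(struct request_queue *q)
	{
		if (blk_queue_has_srcu(q))	/* BLK_MQ_F_BLOCKING queue */
			synchronize_srcu(q->srcu);
		else				/* non-blocking queues share plain RCU */
			synchronize_rcu();
	}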
Petersen" Cc: Sagi Grimberg , linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Keith Busch , Ming Lei Subject: [PATCH V2 3/5] blk-mq: add helper of blk_mq_shared_quiesce_wait() Date: Tue, 30 Nov 2021 15:37:50 +0800 Message-Id: <20211130073752.3005936-4-ming.lei@redhat.com> In-Reply-To: <20211130073752.3005936-1-ming.lei@redhat.com> References: <20211130073752.3005936-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add helper of blk_mq_shared_quiesce_wait() for supporting to quiesce queues in parallel, then we can just wait once if global quiesce wait is allowed. Signed-off-by: Ming Lei --- include/linux/blk-mq.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 42fe97adb807..6f3ccd604d72 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -788,6 +788,19 @@ static inline bool blk_mq_add_to_batch(struct request *req, return true; } +/* + * If the queue has allocated & used srcu to quiesce queue, quiesce wait is + * done via the synchronize_srcu(q->rcu), otherwise it can be done via + * shared synchronize_rcu() from other request queues in same host wide. + * + * This helper can help us to support quiescing queue in parallel, so just + * one quiesce wait is enough if shared quiesce wait is allowed. + */ +static inline bool blk_mq_shared_quiesce_wait(struct request_queue *q) +{ + return !blk_queue_has_srcu(q); +} + void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list); void blk_mq_kick_requeue_list(struct request_queue *q); void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs); From patchwork Tue Nov 30 07:37:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 12646441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1E80C4332F for ; Tue, 30 Nov 2021 07:39:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239079AbhK3Hmf (ORCPT ); Tue, 30 Nov 2021 02:42:35 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:28041 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239095AbhK3HmD (ORCPT ); Tue, 30 Nov 2021 02:42:03 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638257924; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7xMXqhptIWrpNXAqWAhz5g7mlgjr3NfEv27Xmhs7v4E=; b=UIFQ97P8A7fPkG+Kcdb8xvuEgh8Mv6N0WP+AGxCqDt4JnwPx1dnUfzyykPvbxLOzTaDja/ YbiOoPK2is4zDJA3rsb7UAJDT25LPTn37spFPnn1bV3EvwnJZHw0g8z+nDCDPwvJQEkSN9 QYurmvN4PR9TqlWDpOBYQPiAA1TqcCw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-303-SQ0-HJPHMh2yTZUriJMdlQ-1; Tue, 30 Nov 2021 02:38:42 -0500 X-MC-Unique: SQ0-HJPHMh2yTZUriJMdlQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using 

From patchwork Tue Nov 30 07:37:51 2021
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, "Martin K. Petersen"
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Keith Busch, Ming Lei, Chao Leng
Subject: [PATCH V2 4/5] nvme: quiesce namespace queue in parallel
Date: Tue, 30 Nov 2021 15:37:51 +0800
Message-Id: <20211130073752.3005936-5-ming.lei@redhat.com>
In-Reply-To: <20211130073752.3005936-1-ming.lei@redhat.com>
References: <20211130073752.3005936-1-ming.lei@redhat.com>

Chao Leng reported that, with many namespaces, it can take quite a while for nvme_stop_queues() to quiesce all namespace queues.

Improve nvme_stop_queues() by starting the quiesce on all queues in parallel, and waiting only once if a shared (global) quiesce wait is allowed.

Link: https://lore.kernel.org/linux-block/cc732195-c053-9ce4-e1a7-e7f6dcf762ac@huawei.com/
Reported-by: Chao Leng
Signed-off-by: Ming Lei
---
 drivers/nvme/host/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4c63564adeaa..20827a360099 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4540,9 +4540,7 @@ static void nvme_start_ns_queue(struct nvme_ns *ns)
 static void nvme_stop_ns_queue(struct nvme_ns *ns)
 {
 	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_quiesce_queue(ns->queue);
-	else
-		blk_mq_wait_quiesce_done(ns->queue);
+		blk_mq_quiesce_queue_nowait(ns->queue);
 }
 
 /*
@@ -4643,6 +4641,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_stop_ns_queue(ns);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		blk_mq_wait_quiesce_done(ns->queue);
+		if (blk_mq_shared_quiesce_wait(ns->queue))
+			break;
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
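
Putting the hunks together, the resulting nvme_stop_queues() reads roughly as below (a sketch assembled from the diff above, not additional code):

	void nvme_stop_queues(struct nvme_ctrl *ctrl)
	{
		struct nvme_ns *ns;

		down_read(&ctrl->namespaces_rwsem);
		/* pass 1: mark every namespace queue quiesced, no waiting yet */
		list_for_each_entry(ns, &ctrl->namespaces, list)
			nvme_stop_ns_queue(ns);
		/* pass 2: one grace-period wait covers all non-SRCU queues */
		list_for_each_entry(ns, &ctrl->namespaces, list) {
			blk_mq_wait_quiesce_done(ns->queue);
			if (blk_mq_shared_quiesce_wait(ns->queue))
				break;
		}
		up_read(&ctrl->namespaces_rwsem);
	}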

From patchwork Tue Nov 30 07:37:52 2021
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, "Martin K. Petersen"
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH V2 5/5] scsi: use blk-mq quiesce APIs to implement scsi_host_block
Date: Tue, 30 Nov 2021 15:37:52 +0800
Message-Id: <20211130073752.3005936-6-ming.lei@redhat.com>
In-Reply-To: <20211130073752.3005936-1-ming.lei@redhat.com>
References: <20211130073752.3005936-1-ming.lei@redhat.com>

scsi_host_block() calls synchronize_rcu() directly to wait for quiesce to complete, which is ugly because it exposes blk-mq quiesce's implementation details. Use blk_mq_wait_quiesce_done() and blk_mq_shared_quiesce_wait() in scsi_host_block() instead.

Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5e8b5ecb3245..d93bfc08bc1a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2952,15 +2952,15 @@ scsi_host_block(struct Scsi_Host *shost)
 		}
 	}
 
-	/*
-	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
-	 * calling synchronize_rcu() once is enough.
-	 */
-	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
-
-	if (!ret)
-		synchronize_rcu();
+	if (!ret) {
+		shost_for_each_device(sdev, shost) {
+			struct request_queue *q = sdev->request_queue;
+			blk_mq_wait_quiesce_done(q);
+			if (blk_mq_shared_quiesce_wait(q))
+				break;
+		}
+	}
 
 	return ret;
 }
 EXPORT_SYMBOL_GPL(scsi_host_block);
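
Since SCSI never sets BLK_MQ_F_BLOCKING, blk_queue_has_srcu() is false for every scsi_device queue, so the loop added above degenerates to a single shared RCU wait and preserves the old "one synchronize_rcu() is enough" behaviour. The same code as the hunk, annotated for clarity only:

	if (!ret) {
		shost_for_each_device(sdev, shost) {
			struct request_queue *q = sdev->request_queue;
			/* non-blocking queue: this is synchronize_rcu() ... */
			blk_mq_wait_quiesce_done(q);
			/* ... and that grace period is shared, so stop here */
			if (blk_mq_shared_quiesce_wait(q))
				break;	/* taken on the first device */
		}
	}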