From patchwork Fri Nov 19 02:18:45 2021
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12628081
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, Martin K. Petersen
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH 1/5] blk-mq: move srcu from blk_mq_hw_ctx to request_queue
Date: Fri, 19 Nov 2021 10:18:45 +0800
Message-Id: <20211119021849.2259254-2-ming.lei@redhat.com>
In-Reply-To: <20211119021849.2259254-1-ming.lei@redhat.com>
References: <20211119021849.2259254-1-ming.lei@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

In case of BLK_MQ_F_BLOCKING, a per-hctx srcu instance is used to protect
the dispatch critical section. However, this srcu instance sits at the end
of the hctx structure, where it usually occupies a standalone, often cold,
cacheline. Inside srcu_read_lock() and srcu_read_unlock(), writes always go
to the indirect percpu counters, which are allocated from the heap rather
than embedded in the structure, and srcu->srcu_idx is only read, in
srcu_read_lock(). So for the fast path it does not matter whether the srcu
structure lives in the hctx or in the request queue.

Switch to a per-request-queue srcu instance for protecting dispatch. This
simplifies quiesce a lot, not to mention that quiesce is always done at
request-queue granularity anyway.

An io_uring randread IO test on null_blk (g_blocking, MQ) shows that IOPS
is basically unaffected by this change.
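For readers following the layout trick: the srcu instance becomes a flexible
array member at the very end of struct request_queue, and blocking queues are
simply allocated from a larger kmem cache while non-blocking ones keep the
smaller cache. Below is a minimal user-space analog of the same pattern; every
name in it (queue, lock_state, queue_alloc) is hypothetical and only
illustrates the idea, it is not part of the patch.

#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

struct lock_state { long idx[2]; };	/* stand-in for struct srcu_struct */

struct queue {				/* stand-in for struct request_queue */
	int id;
	_Bool has_lock_state;
	struct lock_state state[];	/* must stay the last member */
};

/*
 * Mirrors the BUILD_BUG_ON added to blk_dev_init(): state[] has to begin
 * exactly at the aligned end of the struct, so an allocation of
 * sizeof(struct queue) + sizeof(struct lock_state) really covers it.
 */
_Static_assert((offsetof(struct queue, state) + alignof(struct queue) - 1) /
		alignof(struct queue) * alignof(struct queue) ==
		sizeof(struct queue),
	       "state[] must land at the aligned end of struct queue");

static struct queue *queue_alloc(_Bool with_state)
{
	/* the patch keeps two kmem caches; here it is just two sizes */
	size_t sz = sizeof(struct queue) +
		    (with_state ? sizeof(struct lock_state) : 0);
	struct queue *q = calloc(1, sz);

	if (q)
		q->has_lock_state = with_state;
	return q;
}

int main(void)
{
	struct queue *q = queue_alloc(1);

	free(q);
	return 0;
}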
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-core.c       | 23 +++++++++++++++++----
 block/blk-mq-sysfs.c   |  2 --
 block/blk-mq.c         | 46 ++++++++++++------------------------------
 block/blk-sysfs.c      |  3 ++-
 block/blk.h            | 10 ++++++++-
 block/genhd.c          |  2 +-
 include/linux/blk-mq.h |  8 --------
 include/linux/blkdev.h |  8 ++++++++
 8 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ee54b34d5e99..aed14485a932 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -64,6 +64,7 @@ DEFINE_IDA(blk_queue_ida);
  * For queue allocation
  */
 struct kmem_cache *blk_requestq_cachep;
+struct kmem_cache *blk_requestq_srcu_cachep;
 
 /*
  * Controlling structure to kblockd
@@ -433,21 +434,25 @@ static void blk_timeout_work(struct work_struct *work)
 {
 }
 
-struct request_queue *blk_alloc_queue(int node_id)
+struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
 {
 	struct request_queue *q;
 	int ret;
 
-	q = kmem_cache_alloc_node(blk_requestq_cachep,
-				GFP_KERNEL | __GFP_ZERO, node_id);
+	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
+				GFP_KERNEL | __GFP_ZERO, node_id);
 	if (!q)
 		return NULL;
 
+	q->alloc_srcu = alloc_srcu;
+	if (alloc_srcu && init_srcu_struct(q->srcu) != 0)
+		goto fail_q;
+
 	q->last_merge = NULL;
 
 	q->id = ida_simple_get(&blk_queue_ida, 0, 0, GFP_KERNEL);
 	if (q->id < 0)
-		goto fail_q;
+		goto fail_srcu;
 
 	ret = bioset_init(&q->bio_split, BIO_POOL_SIZE, 0, 0);
 	if (ret)
@@ -504,6 +509,9 @@ struct request_queue *blk_alloc_queue(int node_id)
 	bioset_exit(&q->bio_split);
 fail_id:
 	ida_simple_remove(&blk_queue_ida, q->id);
+fail_srcu:
+	if (q->alloc_srcu)
+		cleanup_srcu_struct(q->srcu);
 fail_q:
-	kmem_cache_free(blk_requestq_cachep, q);
+	kmem_cache_free(blk_get_queue_kmem_cache(alloc_srcu), q);
 	return NULL;
@@ -1305,6 +1313,9 @@ int __init blk_dev_init(void)
 			sizeof_field(struct request, cmd_flags));
 	BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 *
 			sizeof_field(struct bio, bi_opf));
+	BUILD_BUG_ON(ALIGN(offsetof(struct request_queue, srcu),
+			__alignof__(struct request_queue)) !=
+			sizeof(struct request_queue));
 
 	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
 	kblockd_workqueue = alloc_workqueue("kblockd",
@@ -1315,6 +1326,10 @@ int __init blk_dev_init(void)
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
 
+	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
+			sizeof(struct request_queue) +
+			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);
+
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 
 	return 0;
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 253c857cba47..674786574075 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -36,8 +36,6 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
 	struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
 						  kobj);
 
-	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		cleanup_srcu_struct(hctx->srcu);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
 	free_cpumask_var(hctx->cpumask);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1feb9ab65f28..9728a571b009 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -260,17 +260,11 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
  */
 void blk_mq_wait_quiesce_done(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned int i;
-	bool rcu = false;
-
-	queue_for_each_hw_ctx(q, hctx, i) {
-		if (hctx->flags & BLK_MQ_F_BLOCKING)
-			synchronize_srcu(hctx->srcu);
-		else
-			rcu = true;
-	}
-	if (rcu)
+	WARN_ON_ONCE(q->alloc_srcu != !!(q->tag_set->flags &
+				BLK_MQ_F_BLOCKING));
+	if (q->alloc_srcu)
+		synchronize_srcu(q->srcu);
+	else
 		synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
@@ -1082,16 +1076,16 @@ void blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
+static inline void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
 	__releases(hctx->srcu)
 {
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
 		rcu_read_unlock();
 	else
-		srcu_read_unlock(hctx->srcu, srcu_idx);
+		srcu_read_unlock(hctx->queue->srcu, srcu_idx);
 }
 
-static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
+static inline void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 	__acquires(hctx->srcu)
 {
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
@@ -1099,7 +1093,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 		/* shut up gcc false positive */
 		*srcu_idx = 0;
 		rcu_read_lock();
 	} else
-		*srcu_idx = srcu_read_lock(hctx->srcu);
+		*srcu_idx = srcu_read_lock(hctx->queue->srcu);
 }
 
 /**
@@ -3515,20 +3509,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	}
 }
 
-static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set)
-{
-	int hw_ctx_size = sizeof(struct blk_mq_hw_ctx);
-
-	BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu),
-			__alignof__(struct blk_mq_hw_ctx)) !=
-			sizeof(struct blk_mq_hw_ctx));
-
-	if (tag_set->flags & BLK_MQ_F_BLOCKING)
-		hw_ctx_size += sizeof(struct srcu_struct);
-
-	return hw_ctx_size;
-}
-
 static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
@@ -3566,7 +3546,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	struct blk_mq_hw_ctx *hctx;
 	gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
 
-	hctx = kzalloc_node(blk_mq_hw_ctx_size(set), gfp, node);
+	hctx = kzalloc_node(sizeof(struct blk_mq_hw_ctx), gfp, node);
 	if (!hctx)
 		goto fail_alloc_hctx;
 
@@ -3608,8 +3588,6 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	if (!hctx->fq)
 		goto free_bitmap;
 
-	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		init_srcu_struct(hctx->srcu);
 	blk_mq_hctx_kobj_init(hctx);
 
 	return hctx;
@@ -3945,7 +3923,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
 	struct request_queue *q;
 	int ret;
 
-	q = blk_alloc_queue(set->numa_node);
+	q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING);
 	if (!q)
 		return ERR_PTR(-ENOMEM);
 	q->queuedata = queuedata;
@@ -4094,6 +4072,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		struct request_queue *q)
 {
+	WARN_ON_ONCE(q->alloc_srcu != !!(set->flags & BLK_MQ_F_BLOCKING));
+
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cef1f713370b..2b79845a581d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -734,7 +734,8 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 {
 	struct request_queue *q = container_of(rcu_head, struct request_queue,
 					       rcu_head);
-	kmem_cache_free(blk_requestq_cachep, q);
+
+	kmem_cache_free(blk_get_queue_kmem_cache(q->alloc_srcu), q);
 }
 
 /* Unconfigure the I/O scheduler and dissociate from the cgroup controller.
  */
diff --git a/block/blk.h b/block/blk.h
index 296e3010f8d6..c14bca80aba9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -32,6 +32,7 @@ struct blk_flush_queue {
 };
 
 extern struct kmem_cache *blk_requestq_cachep;
+extern struct kmem_cache *blk_requestq_srcu_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
@@ -448,7 +449,14 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
-struct request_queue *blk_alloc_queue(int node_id);
+static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
+{
+	if (srcu)
+		return blk_requestq_srcu_cachep;
+	return blk_requestq_cachep;
+}
+
+struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
 
 int disk_alloc_events(struct gendisk *disk);
 void disk_add_events(struct gendisk *disk);
diff --git a/block/genhd.c b/block/genhd.c
index c5392cc24d37..e624fe9371f2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1341,7 +1341,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_alloc_queue(node);
+	q = blk_alloc_queue(node, false);
 	if (!q)
 		return NULL;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 308edc2a4925..5cc7fc1ea863 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -4,7 +4,6 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
-#include <linux/srcu.h>
 #include <linux/lockdep.h>
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
@@ -376,13 +375,6 @@ struct blk_mq_hw_ctx {
 	 * q->unused_hctx_list.
 	 */
 	struct list_head	hctx_list;
-
-	/**
-	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
-	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
-	 * blk_mq_hw_ctx_size().
-	 */
-	struct srcu_struct	srcu[];
 };
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bd4370baccca..5741b46bca6c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -16,6 +16,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/blkzoned.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct module;
 struct request_queue;
@@ -364,6 +365,7 @@ struct request_queue {
 #endif
 
 	bool			mq_sysfs_init_done;
+	bool			alloc_srcu;
 
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
@@ -373,6 +375,12 @@ struct request_queue {
 	 * devices that do not have multiple independent access ranges.
 	 */
 	struct blk_independent_access_ranges *ia_ranges;
+
+	/**
+	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
+	 * is blocking (BLK_MQ_F_BLOCKING). Must be the last member.
+	 */
+	struct srcu_struct	srcu[];
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
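To make the resulting quiesce model concrete, here is a hedged sketch of how
a driver quiesces a single queue once srcu lives in the request queue;
driver_stop_queue_example() is a hypothetical name, while the three blk-mq
calls are the existing API. blk_mq_wait_quiesce_done() now picks between
synchronize_srcu(q->srcu) and synchronize_rcu() based on q->alloc_srcu,
instead of iterating over all hctxs:

/*
 * Sketch only, not part of the patch: stop and restart a single queue.
 */
static void driver_stop_queue_example(struct request_queue *q)
{
	blk_mq_quiesce_queue_nowait(q);	/* mark the queue quiesced */
	blk_mq_wait_quiesce_done(q);	/* wait for one grace period */

	/* no dispatch can be in flight here; safe to reconfigure */

	blk_mq_unquiesce_queue(q);
}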
From patchwork Fri Nov 19 02:18:46 2021
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12628083
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, Martin K. Petersen
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH 2/5] blk-mq: rename hctx_lock & hctx_unlock
Date: Fri, 19 Nov 2021 10:18:46 +0800
Message-Id: <20211119021849.2259254-3-ming.lei@redhat.com>
In-Reply-To: <20211119021849.2259254-1-ming.lei@redhat.com>
References: <20211119021849.2259254-1-ming.lei@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

Now that srcu has moved from 'struct blk_mq_hw_ctx' into 'struct
request_queue', both hctx_lock() and hctx_unlock() operate at request-queue
level, so rename them to queue_lock() and queue_unlock(). They could also be
used to support Jens's ->queue_rqs(), as suggested by Keith, and be extended
for driver use in the future.
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9728a571b009..ba0d0e411b65 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1076,24 +1076,26 @@ void blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-static inline void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
-	__releases(hctx->srcu)
+static inline void queue_unlock(struct request_queue *q, bool blocking,
+		int srcu_idx)
+	__releases(q->srcu)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+	if (!blocking)
 		rcu_read_unlock();
 	else
-		srcu_read_unlock(hctx->queue->srcu, srcu_idx);
+		srcu_read_unlock(q->srcu, srcu_idx);
 }
 
-static inline void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
-	__acquires(hctx->srcu)
+static inline void queue_lock(struct request_queue *q, bool blocking,
+		int *srcu_idx)
+	__acquires(q->srcu)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+	if (!blocking) {
 		/* shut up gcc false positive */
 		*srcu_idx = 0;
 		rcu_read_lock();
 	} else
-		*srcu_idx = srcu_read_lock(hctx->queue->srcu);
+		*srcu_idx = srcu_read_lock(q->srcu);
 }
 
 /**
@@ -1958,6 +1960,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
+	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
 
 	/*
 	 * We can't run the queue inline with ints disabled. Ensure that
@@ -1965,11 +1968,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	WARN_ON_ONCE(in_interrupt());
 
-	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
+	might_sleep_if(blocking);
 
-	hctx_lock(hctx, &srcu_idx);
+	queue_lock(hctx->queue, blocking, &srcu_idx);
 	blk_mq_sched_dispatch_requests(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	queue_unlock(hctx->queue, blocking, srcu_idx);
 }
 
 static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
@@ -2083,6 +2086,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
 	int srcu_idx;
 	bool need_run;
+	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
 
 	/*
 	 * When queue is quiesced, we may be switching io scheduler, or
@@ -2092,10 +2096,10 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
 	 * quiesced.
 	 */
-	hctx_lock(hctx, &srcu_idx);
+	queue_lock(hctx->queue, blocking, &srcu_idx);
 	need_run = !blk_queue_quiesced(hctx->queue) &&
 		blk_mq_hctx_has_pending(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	queue_unlock(hctx->queue, blocking, srcu_idx);
 
 	if (need_run)
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
@@ -2500,10 +2504,11 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 {
 	blk_status_t ret;
 	int srcu_idx;
+	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
 
-	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
+	might_sleep_if(blocking);
 
-	hctx_lock(hctx, &srcu_idx);
+	queue_lock(hctx->queue, blocking, &srcu_idx);
 
 	ret = __blk_mq_try_issue_directly(hctx, rq, false, true);
 	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
@@ -2511,7 +2516,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	else if (ret != BLK_STS_OK)
 		blk_mq_end_request(rq, ret);
 
-	hctx_unlock(hctx, srcu_idx);
+	queue_unlock(hctx->queue, blocking, srcu_idx);
 }
 
 static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
@@ -2519,10 +2524,11 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 	blk_status_t ret;
 	int srcu_idx;
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
 
-	hctx_lock(hctx, &srcu_idx);
+	queue_lock(hctx->queue, blocking, &srcu_idx);
 	ret = __blk_mq_try_issue_directly(hctx, rq, true, last);
-	hctx_unlock(hctx, srcu_idx);
+	queue_unlock(hctx->queue, blocking, srcu_idx);
 
 	return ret;
 }
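For illustration, a sketch of a caller of the renamed helpers, mirroring
__blk_mq_run_hw_queue() in the diff above; dispatch_example() is a
hypothetical name. The point of the calling convention is that 'blocking' is
computed once and passed to both helpers, so lock and unlock are guaranteed
to take the same branch:

static void dispatch_example(struct blk_mq_hw_ctx *hctx)
{
	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
	int srcu_idx;

	might_sleep_if(blocking);

	queue_lock(hctx->queue, blocking, &srcu_idx);
	blk_mq_sched_dispatch_requests(hctx);	/* may sleep iff blocking */
	queue_unlock(hctx->queue, blocking, srcu_idx);
}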
From patchwork Fri Nov 19 02:18:47 2021
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12628085
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, Martin K. Petersen
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH 3/5] blk-mq: add helper of blk_mq_global_quiesce_wait()
Date: Fri, 19 Nov 2021 10:18:47 +0800
Message-Id: <20211119021849.2259254-4-ming.lei@redhat.com>
In-Reply-To: <20211119021849.2259254-1-ming.lei@redhat.com>
References: <20211119021849.2259254-1-ming.lei@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

Add the helper blk_mq_global_quiesce_wait() to support quiescing queues in
parallel: when a global quiesce wait is allowed, a single wait covers all
queues.

Signed-off-by: Ming Lei
---
 include/linux/blk-mq.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5cc7fc1ea863..a9fecda2507e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -777,6 +777,19 @@ static inline bool blk_mq_add_to_batch(struct request *req,
 	return true;
 }
 
+/*
+ * If the queue has allocated and uses srcu for quiescing, the quiesce
+ * wait is done via synchronize_srcu(q->srcu); otherwise it is done via
+ * the global synchronize_rcu().
+ *
+ * This helper supports quiescing queues in parallel: a single quiesce
+ * wait is enough when a global quiesce wait is allowed.
+ */
+static inline bool blk_mq_global_quiesce_wait(struct request_queue *q)
+{
+	return !q->alloc_srcu;
+}
+
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
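A hedged sketch of the usage pattern this helper enables across many queues;
patches 4 and 5 apply exactly this shape to NVMe namespaces and SCSI devices.
'qs', 'nr_qs', and quiesce_all_example() are hypothetical:

/*
 * Quiesce a batch of queues in parallel: mark them all quiesced first,
 * then wait.  For non-blocking queues a single synchronize_rcu() grace
 * period covers every queue, so the wait loop may stop after the first.
 */
static void quiesce_all_example(struct request_queue **qs, int nr_qs)
{
	int i;

	for (i = 0; i < nr_qs; i++)
		blk_mq_quiesce_queue_nowait(qs[i]);

	for (i = 0; i < nr_qs; i++) {
		blk_mq_wait_quiesce_done(qs[i]);
		if (blk_mq_global_quiesce_wait(qs[i]))
			break;
	}
}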
From patchwork Fri Nov 19 02:18:48 2021
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12628087
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, Martin K. Petersen
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-scsi@vger.kernel.org, Keith Busch, Ming Lei, Chao Leng
Subject: [PATCH 4/5] nvme: quiesce namespace queue in parallel
Date: Fri, 19 Nov 2021 10:18:48 +0800
Message-Id: <20211119021849.2259254-5-ming.lei@redhat.com>
In-Reply-To: <20211119021849.2259254-1-ming.lei@redhat.com>
References: <20211119021849.2259254-1-ming.lei@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

Chao Leng reported that with a large number of namespaces it may take quite
a while for nvme_stop_queues() to quiesce all of them. Improve
nvme_stop_queues() by quiescing the queues in parallel:
nvme_stop_ns_queue() now only marks a queue quiesced without waiting, and
the wait is done afterwards for all queues in one loop; if a global quiesce
wait is allowed, a single wait is enough.

Link: https://lore.kernel.org/linux-block/cc732195-c053-9ce4-e1a7-e7f6dcf762ac@huawei.com/
Reported-by: Chao Leng
Signed-off-by: Ming Lei
---
 drivers/nvme/host/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4b5de8f5435a..06741d3ed72b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4517,9 +4517,7 @@ static void nvme_start_ns_queue(struct nvme_ns *ns)
 static void nvme_stop_ns_queue(struct nvme_ns *ns)
 {
 	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_quiesce_queue(ns->queue);
-	else
-		blk_mq_wait_quiesce_done(ns->queue);
+		blk_mq_quiesce_queue_nowait(ns->queue);
 }
 
 /*
@@ -4620,6 +4618,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_stop_ns_queue(ns);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		blk_mq_wait_quiesce_done(ns->queue);
+		if (blk_mq_global_quiesce_wait(ns->queue))
+			break;
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
From patchwork Fri Nov 19 02:18:49 2021
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 12628089
From: Ming Lei
To: Christoph Hellwig, Jens Axboe, Martin K. Petersen
Cc: Sagi Grimberg, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-scsi@vger.kernel.org, Keith Busch, Ming Lei
Subject: [PATCH 5/5] scsi: use blk-mq quiesce APIs to implement scsi_host_block
Date: Fri, 19 Nov 2021 10:18:49 +0800
Message-Id: <20211119021849.2259254-6-ming.lei@redhat.com>
In-Reply-To: <20211119021849.2259254-1-ming.lei@redhat.com>
References: <20211119021849.2259254-1-ming.lei@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

scsi_host_block() calls synchronize_rcu() directly to wait for quiesce to
complete. This is ugly, since it exposes blk-mq quiesce's implementation
details. Use blk_mq_wait_quiesce_done() and blk_mq_global_quiesce_wait() in
scsi_host_block() instead.

Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5e8b5ecb3245..b0da6a4a1784 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2952,15 +2952,15 @@ scsi_host_block(struct Scsi_Host *shost)
 		}
 	}
 
-	/*
-	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
-	 * calling synchronize_rcu() once is enough.
-	 */
-	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
-
-	if (!ret)
-		synchronize_rcu();
+	if (!ret) {
+		shost_for_each_device(sdev, shost) {
+			struct request_queue *q = sdev->request_queue;
+			blk_mq_wait_quiesce_done(q);
+			if (blk_mq_global_quiesce_wait(q))
+				break;
+		}
+	}
 
 	return ret;
 }
 EXPORT_SYMBOL_GPL(scsi_host_block);