From patchwork Tue Mar 29 09:40:43 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794612
From: Yu Kuai
Subject: [PATCH -next RFC 1/6] blk-mq: add a new flag 'BLK_MQ_F_NO_TAG_PREEMPTION'
Date: Tue, 29 Mar 2022 17:40:43 +0800
Message-ID: <20220329094048.2107094-2-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

Tag preemption is the default behaviour: blk_mq_get_tag() tries to get a tag
unconditionally, which means a new io can preempt a tag even if many other
ios are already waiting for tags.

This patch introduces a new flag in preparation for disabling that behaviour,
in order to optimize io performance for large random io on HDDs.
Signed-off-by: Yu Kuai
---
 block/blk-mq-debugfs.c | 1 +
 block/blk-mq.h         | 5 +++++
 include/linux/blk-mq.h | 7 ++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index aa0349e9f083..f4228532ee3d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -226,6 +226,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(NO_SCHED),
 	HCTX_FLAG_NAME(STACKING),
 	HCTX_FLAG_NAME(TAG_HCTX_SHARED),
+	HCTX_FLAG_NAME(NO_TAG_PREEMPTION),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 2615bd58bad3..1a084b3b6097 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -168,6 +168,11 @@ static inline bool blk_mq_is_shared_tags(unsigned int flags)
 	return flags & BLK_MQ_F_TAG_HCTX_SHARED;
 }
 
+static inline bool blk_mq_is_tag_preemptive(unsigned int flags)
+{
+	return !(flags & BLK_MQ_F_NO_TAG_PREEMPTION);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (!(data->rq_flags & RQF_ELV))
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7aa5c54901a9..c9434162acc5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -656,7 +656,12 @@ enum {
 	 * or shared hwqs instead of 'mq-deadline'.
 	 */
 	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 7,
-	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
+	/*
+	 * If the disk is under high io pressure, new io will wait directly
+	 * without trying to preempt tag.
+	 */
+	BLK_MQ_F_NO_TAG_PREEMPTION	= 1 << 8,
+	BLK_MQ_F_ALLOC_POLICY_START_BIT = 9,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
 	BLK_MQ_S_STOPPED	= 0,
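For illustration, a driver that wants the new behaviour would set the flag when
allocating its tag set. A minimal sketch, assuming a hypothetical driver; the
BLK_MQ_F_NO_TAG_PREEMPTION flag only exists with this RFC series applied, the
other fields and helpers are the usual blk-mq tag-set API:

/*
 * Hedged sketch: how a (hypothetical) driver could opt out of tag
 * preemption once this series is applied.
 */
#include <linux/blk-mq.h>

static int example_init_tag_set(struct blk_mq_tag_set *set,
				const struct blk_mq_ops *ops)
{
	set->ops = ops;
	set->nr_hw_queues = 1;
	set->queue_depth = 64;
	set->numa_node = NUMA_NO_NODE;
	/* merge as usual, but let new io wait instead of preempting tags */
	set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_TAG_PREEMPTION;

	return blk_mq_alloc_tag_set(set);
}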
From patchwork Tue Mar 29 09:40:44 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794614
From: Yu Kuai
Subject: [PATCH -next RFC 2/6] block: refactor to split bio thoroughly
Date: Tue, 29 Mar 2022 17:40:44 +0800
Message-ID: <20220329094048.2107094-3-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

Currently, the split bio is handled first, and the original bio is only split
again later. This patch splits the original bio thoroughly up front, so that
the number of tags that will be needed is known in advance.

Signed-off-by: Yu Kuai
---
 block/bio.c               |  2 +
 block/blk-merge.c         | 90 ++++++++++++++++++++++++++++-----------
 block/blk-mq.c            |  7 ++-
 block/blk.h               |  3 +-
 include/linux/blk_types.h |  4 ++
 5 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index cdd7b2915c53..ac7ce8b4ba42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -258,6 +258,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
 	bio->bi_status = 0;
+	bio->bi_nr_segs = 0;
+	bio->bi_nr_split = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
 	bio->bi_iter.bi_idx = 0;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..340860746cac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,44 +309,85 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	return bio_split(bio, sectors, GFP_NOIO, bs);
 }
 
-/**
- * __blk_queue_split - split a bio and submit the second half
- * @q:       [in] request_queue new bio is being queued at
- * @bio:     [in, out] bio to be split
- * @nr_segs: [out] number of segments in the first bio
- *
- * Split a bio into two bios, chain the two bios, submit the second half and
- * store a pointer to the first half in *@bio. If the second bio is still too
- * big it will be split by a recursive call to this function. Since this
- * function may allocate a new bio from q->bio_split, it is the responsibility
- * of the caller to ensure that q->bio_split is only released after processing
- * of the split bio has finished.
- */
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-		unsigned int *nr_segs)
+static struct bio *blk_queue_split_one(struct request_queue *q, struct bio *bio)
 {
 	struct bio *split = NULL;
+	unsigned int nr_segs = 1;
 
-	switch (bio_op(*bio)) {
+	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
-		split = blk_bio_discard_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_discard_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
-				nr_segs);
+		split = blk_bio_write_zeroes_split(q, bio, &q->bio_split,
+						   &nr_segs);
 		break;
 	default:
-		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_segment_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	}
 
 	if (split) {
 		/* there isn't chance to merge the splitted bio */
-		split->bi_opf |= REQ_NOMERGE;
+		split->bi_opf |= (REQ_NOMERGE | REQ_SPLIT);
+		split->bi_nr_segs = nr_segs;
+
+		bio_chain(split, bio);
+		trace_block_split(split, bio->bi_iter.bi_sector);
+	} else {
+		bio->bi_nr_segs = nr_segs;
+	}
+
+	return split;
+}
+
+static unsigned short blk_queue_split_all(struct request_queue *q,
+					  struct bio *bio)
+{
+	struct bio *split = NULL;
+	struct bio *first = NULL;
+	unsigned short nr_split = 1;
+	unsigned short total;
 
-		bio_chain(split, *bio);
-		trace_block_split(split, (*bio)->bi_iter.bi_sector);
+	if (!current->bio_list)
+		return 1;
+
+	while ((split = blk_queue_split_one(q, bio))) {
+		if (!first)
+			first = split;
+
+		nr_split++;
+		submit_bio_noacct(split);
+	}
+
+	total = nr_split;
+	while (first) {
+		first->bi_nr_split = --total;
+		first = first->bi_next;
+	}
+
+	return nr_split;
+}
+
+/**
+ * __blk_queue_split - split a bio, store the first and submit others
+ * @q:   [in] request_queue new bio is being queued at
+ * @bio: [in, out] bio to be split
+ *
+ * Split a bio into several bios, chain all the bios, store a pointer to the
+ * first in *@bio, and submit others. Since this function may allocate a new
+ * bio from q->bio_split, it is the responsibility of the caller to ensure
+ * that q->bio_split is only released after processing of the split bio has
+ * finished.
+ */
+void __blk_queue_split(struct request_queue *q, struct bio **bio)
+{
+	struct bio *split = blk_queue_split_one(q, *bio);
+
+	if (split) {
+		split->bi_nr_split = blk_queue_split_all(q, *bio);
+		(*bio)->bi_opf |= REQ_SPLIT;
 		submit_bio_noacct(*bio);
 		*bio = split;
 	}
@@ -365,10 +406,9 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 void blk_queue_split(struct bio **bio)
 {
 	struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
-	unsigned int nr_segs;
 
 	if (blk_may_split(q, *bio))
-		__blk_queue_split(q, bio, &nr_segs);
+		__blk_queue_split(q, bio);
 }
 EXPORT_SYMBOL(blk_queue_split);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e6f24fa4a4c2..cad207d2079e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2817,8 +2817,11 @@ void blk_mq_submit_bio(struct bio *bio)
 	blk_status_t ret;
 
 	blk_queue_bounce(q, &bio);
-	if (blk_may_split(q, bio))
-		__blk_queue_split(q, &bio, &nr_segs);
+	if (blk_may_split(q, bio)) {
+		if (!(bio->bi_opf & REQ_SPLIT))
+			__blk_queue_split(q, &bio);
+		nr_segs = bio->bi_nr_segs;
+	}
 
 	if (!bio_integrity_prep(bio))
 		return;
diff --git a/block/blk.h b/block/blk.h
index 8ccbc6e07636..cd478187b525 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -303,8 +303,7 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
 }
 
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-			unsigned int *nr_segs);
+void __blk_queue_split(struct request_queue *q, struct bio **bio);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
 bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dd0763a1c674..702f6b83dc88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -250,6 +250,8 @@ struct bio {
 	 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	unsigned int		bi_nr_segs;
+	unsigned int		bi_nr_split;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 
@@ -416,6 +418,7 @@ enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_SPLIT,		/* io is split */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -440,6 +443,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_SPLIT		(1ULL << __REQ_SPLIT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
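The commit message above says the whole bio is now split up front so that the
number of pieces, and hence the number of tags, is known in advance. A small
userspace model of that accounting (illustrative names and numbers only, not
kernel code; it only mirrors the idea behind bi_nr_split, where each submitted
piece knows how many further pieces follow it):

/* Userspace model of up-front splitting and the per-piece countdown. */
#include <stdio.h>

int main(void)
{
	unsigned int total = 2048;	/* e.g. a 1 MiB bio, in sectors */
	unsigned int max = 512;		/* split limit, e.g. max_sectors */
	unsigned int nr_split = (total + max - 1) / max;

	for (unsigned int i = 0; i < nr_split; i++) {
		unsigned int len = (total < max) ? total : max;

		/* piece i still has (nr_split - 1 - i) pieces after it */
		printf("piece %u: %u sectors, pieces_after=%u\n",
		       i, len, nr_split - 1 - i);
		total -= len;
	}
	return 0;
}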
From patchwork Tue Mar 29 09:40:45 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794613
From: Yu Kuai
Subject: [PATCH -next RFC 3/6] blk-mq: record how many tags are needed for split bio
Date: Tue, 29 Mar 2022 17:40:45 +0800
Message-ID: <20220329094048.2107094-4-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

Prepare to wake up a number of threads based on the number of tags required.
Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c      | 1 +
 block/blk-mq.c          | 1 +
 block/blk-mq.h          | 1 +
 include/linux/sbitmap.h | 2 ++
 4 files changed, 5 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 68ac23d0b640..83dfbe2f1cfc 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -155,6 +155,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	if (data->flags & BLK_MQ_REQ_NOWAIT)
 		return BLK_MQ_NO_TAG;
 
+	wait.nr_tags += data->nr_split;
 	ws = bt_wait_ptr(bt, data->hctx);
 	do {
 		struct sbitmap_queue *bt_prev;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cad207d2079e..9bace9e2c5ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2737,6 +2737,7 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
 		.q		= q,
 		.nr_tags	= 1,
 		.cmd_flags	= bio->bi_opf,
+		.nr_split	= bio->bi_nr_split,
 	};
 	struct request *rq;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1a084b3b6097..3eabe394a5a9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -156,6 +156,7 @@ struct blk_mq_alloc_data {
 
 	/* allocate multiple requests/tags in one go */
 	unsigned int nr_tags;
+	unsigned int nr_split;
 	struct request **cached_rq;
 
 	/* input & output parameter */
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 8f5a86e210b9..9c8c6da3d820 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -591,11 +591,13 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m);
 struct sbq_wait {
 	struct sbitmap_queue *sbq;	/* if set, sbq_wait is accounted */
 	struct wait_queue_entry wait;
+	unsigned int nr_tags;
 };
 
 #define DEFINE_SBQ_WAIT(name)						\
 	struct sbq_wait name = {					\
 		.sbq = NULL,						\
+		.nr_tags = 1,						\
 		.wait = {						\
 			.private	= current,			\
 			.func		= autoremove_wake_function,	\
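What the waiter-side accounting in this patch amounts to: before sleeping for a
tag, the task declares how many tags the io behind it will eventually need, one
for itself plus one per split bio still to come. A minimal sketch, assuming this
series is applied (the nr_tags member of struct sbq_wait is added by this very
patch, and the example function is not part of the patch):

/* Hedged sketch of the waiter-side accounting introduced here. */
#include <linux/sbitmap.h>
#include <linux/sched.h>

static void example_prepare_tag_wait(struct sbitmap_queue *bt,
				     struct sbq_wait_state *ws,
				     struct sbq_wait *wait,
				     unsigned int nr_split)
{
	/* DEFINE_SBQ_WAIT() starts nr_tags at 1 (the tag for this bio) */
	wait->nr_tags += nr_split;
	sbitmap_prepare_to_wait(bt, ws, wait, TASK_UNINTERRUPTIBLE);
}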
From patchwork Tue Mar 29 09:40:46 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794617
From: Yu Kuai
Subject: [PATCH -next RFC 4/6] sbitmap: wake up the number of threads based on required tags
Date: Tue, 29 Mar 2022 17:40:46 +0800
Message-ID: <20220329094048.2107094-5-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

Currently, __sbq_wake_up() wakes up 'wake_batch' threads unconditionally; for
split io this intensifies competition, and the split pieces are not issued
sequentially. This modification improves how often large split io is issued
sequentially. However, to get the optimal result tag preemption still needs to
be disabled, which is done in later patches.

Signed-off-by: Yu Kuai
---
 lib/sbitmap.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index ae4fd4de9ebe..9d04c0ecc8f7 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -597,6 +597,26 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 	return NULL;
 }
 
+static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
+{
+	struct sbq_wait *wait;
+	struct wait_queue_entry *entry;
+	unsigned int nr = 1;
+
+	spin_lock_irq(&ws->wait.lock);
+	list_for_each_entry(entry, &ws->wait.head, entry) {
+		wait = container_of(entry, struct sbq_wait, wait);
+		if (nr_tags <= wait->nr_tags)
+			break;
+
+		nr++;
+		nr_tags -= wait->nr_tags;
+	}
+	spin_unlock_irq(&ws->wait.lock);
+
+	return nr;
+}
+
 static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 {
 	struct sbq_wait_state *ws;
@@ -628,7 +648,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 	ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
 	if (ret == wait_cnt) {
 		sbq_index_atomic_inc(&sbq->wake_index);
-		wake_up_nr(&ws->wait, wake_batch);
+		wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
 		return false;
 	}
 
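To see what get_wake_nr() computes, here is a small userspace model (plain
arrays instead of the kernel waitqueue, no locking; the names are illustrative
only): given the per-waiter tag requirements in queue order, it counts how many
waiters a freed batch can cover, always waking at least one, instead of blindly
waking 'wake_batch' threads as before.

/* Userspace model of get_wake_nr(): how many waiters to wake for a batch. */
#include <stdio.h>

static unsigned int model_get_wake_nr(const unsigned int *waiter_tags,
				      unsigned int nr_waiters,
				      unsigned int batch)
{
	unsigned int nr = 1;	/* always wake at least one waiter */

	for (unsigned int i = 0; i < nr_waiters; i++) {
		if (batch <= waiter_tags[i])
			break;
		nr++;
		batch -= waiter_tags[i];
	}
	return nr;
}

int main(void)
{
	/* two waiters each standing for a bio split into 4 pieces, then two plain ones */
	unsigned int waiters[] = { 4, 4, 1, 1 };

	/* a wake_batch of 8 covers the two 4-tag waiters: wake 2, not 8 */
	printf("wake %u waiters\n", model_get_wake_nr(waiters, 4, 8));
	return 0;
}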
From patchwork Tue Mar 29 09:40:47 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794615
From: Yu Kuai
Subject: [PATCH -next RFC 5/6] blk-mq: don't preempt tag except for split bios
Date: Tue, 29 Mar 2022 17:40:47 +0800
Message-ID: <20220329094048.2107094-6-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

In order to improve the sequentiality of split io, this patch disables tag
preemption for the first split bio and for non-split bios while the device is
under high io pressure.

Note that this solution relies on the sbitmap waitqueues being balanced;
otherwise it may happen that 'wake_batch' tags are freed but the woken waiters
do not obtain 'wake_batch' new tags, and io concurrency drops. The next patch
works around that problem, although fixing the unfairness of the waitqueues
might be the better approach.

Signed-off-by: Yu Kuai
---
 block/blk-merge.c         |  7 ++++++-
 block/blk-mq-tag.c        | 37 ++++++++++++++++++++++++++-----------
 block/blk-mq.c            |  6 ++++++
 block/blk-mq.h            |  1 +
 include/linux/blk_types.h |  2 ++
 lib/sbitmap.c             | 14 ++++++++++----
 6 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 340860746cac..fd4bbf773b45 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -357,6 +357,11 @@ static unsigned short blk_queue_split_all(struct request_queue *q,
 		if (!first)
 			first = split;
 
+		/*
+		 * Except the first split bio, others will always preempt
+		 * tag, so that they can be sequential.
+		 */
+		split->bi_opf |= REQ_PREEMPTIVE;
 		nr_split++;
 		submit_bio_noacct(split);
 	}
@@ -387,7 +392,7 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio)
 
 	if (split) {
 		split->bi_nr_split = blk_queue_split_all(q, *bio);
-		(*bio)->bi_opf |= REQ_SPLIT;
+		(*bio)->bi_opf |= (REQ_SPLIT | REQ_PREEMPTIVE);
 		submit_bio_noacct(*bio);
 		*bio = split;
 	}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 83dfbe2f1cfc..4e485bcc5820 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -127,6 +127,13 @@ unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
 	return ret;
 }
 
+static inline bool preempt_tag(struct blk_mq_alloc_data *data,
+			       struct sbitmap_queue *bt)
+{
+	return data->preemption ||
+	       atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES;
+}
+
 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 {
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
@@ -148,12 +155,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		tag_offset = tags->nr_reserved_tags;
 	}
 
-	tag = __blk_mq_get_tag(data, bt);
-	if (tag != BLK_MQ_NO_TAG)
-		goto found_tag;
+	if (data->flags & BLK_MQ_REQ_NOWAIT || preempt_tag(data, bt)) {
+		tag = __blk_mq_get_tag(data, bt);
+		if (tag != BLK_MQ_NO_TAG)
+			goto found_tag;
 
-	if (data->flags & BLK_MQ_REQ_NOWAIT)
-		return BLK_MQ_NO_TAG;
+		if (data->flags & BLK_MQ_REQ_NOWAIT)
+			return BLK_MQ_NO_TAG;
+	}
 
 	wait.nr_tags += data->nr_split;
 	ws = bt_wait_ptr(bt, data->hctx);
@@ -171,20 +180,26 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		 * Retry tag allocation after running the hardware queue,
 		 * as running the queue may also have found completions.
		 */
-		tag = __blk_mq_get_tag(data, bt);
-		if (tag != BLK_MQ_NO_TAG)
-			break;
+		if (preempt_tag(data, bt)) {
+			tag = __blk_mq_get_tag(data, bt);
+			if (tag != BLK_MQ_NO_TAG)
+				break;
+		}
 
 		sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
 
-		tag = __blk_mq_get_tag(data, bt);
-		if (tag != BLK_MQ_NO_TAG)
-			break;
+		if (preempt_tag(data, bt)) {
+			tag = __blk_mq_get_tag(data, bt);
+			if (tag != BLK_MQ_NO_TAG)
+				break;
+		}
 
 		bt_prev = bt;
 		io_schedule();
 
 		sbitmap_finish_wait(bt, ws, &wait);
+		if (!blk_mq_is_tag_preemptive(data->hctx->flags))
+			data->preemption = true;
 
 		data->ctx = blk_mq_get_ctx(data->q);
 		data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9bace9e2c5ca..06ba6fa9ec1a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -470,6 +470,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 retry:
 	data->ctx = blk_mq_get_ctx(q);
 	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
+	if (blk_mq_is_tag_preemptive(data->hctx->flags))
+		data->preemption = true;
+
 	if (!(data->rq_flags & RQF_ELV))
 		blk_mq_tag_busy(data->hctx);
 
@@ -577,6 +580,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = xa_load(&q->hctx_table, hctx_idx);
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
+	if (blk_mq_is_tag_preemptive(data.hctx->flags))
+		data.preemption = true;
 
 	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
 	data.ctx = __blk_mq_get_ctx(q, cpu);
@@ -2738,6 +2743,7 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
 		.nr_tags	= 1,
 		.cmd_flags	= bio->bi_opf,
 		.nr_split	= bio->bi_nr_split,
+		.preemption	= (bio->bi_opf & REQ_PREEMPTIVE),
 	};
 	struct request *rq;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3eabe394a5a9..915bb710dd6f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -157,6 +157,7 @@ struct blk_mq_alloc_data {
 	/* allocate multiple requests/tags in one go */
 	unsigned int nr_tags;
 	unsigned int nr_split;
+	bool preemption;
 	struct request **cached_rq;
 
 	/* input & output parameter */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 702f6b83dc88..8fd9756f0a06 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -419,6 +419,7 @@ enum req_flag_bits {
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
 	__REQ_SPLIT,		/* io is split */
+	__REQ_PREEMPTIVE,	/* io can preempt tag */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -444,6 +445,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
 #define REQ_SPLIT		(1ULL << __REQ_SPLIT)
+#define REQ_PREEMPTIVE		(1ULL << __REQ_PREEMPTIVE)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 9d04c0ecc8f7..1655c15ee11d 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -597,7 +597,8 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 	return NULL;
 }
 
-static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
+static unsigned int get_wake_nr(struct sbq_wait_state *ws,
+				unsigned int *nr_tags)
 {
 	struct sbq_wait *wait;
 	struct wait_queue_entry *entry;
@@ -606,11 +607,13 @@ static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
 	spin_lock_irq(&ws->wait.lock);
 	list_for_each_entry(entry, &ws->wait.head, entry) {
 		wait = container_of(entry, struct sbq_wait, wait);
-		if (nr_tags <= wait->nr_tags)
+		if (*nr_tags <= wait->nr_tags) {
+			*nr_tags = 0;
 			break;
+		}
 
 		nr++;
-		nr_tags -= wait->nr_tags;
+		*nr_tags -= wait->nr_tags;
 	}
 	spin_unlock_irq(&ws->wait.lock);
 
@@ -648,7 +651,10 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 	ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
 	if (ret == wait_cnt) {
 		sbq_index_atomic_inc(&sbq->wake_index);
-		wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));
+		wake_up_nr(&ws->wait, get_wake_nr(ws, &wake_batch));
+		if (wake_batch)
+			sbitmap_queue_wake_all(sbq);
+
 		return false;
 	}
 
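To summarize when an allocation may still grab a tag directly under this patch,
here is a hedged userspace model of the preempt_tag() decision (SBQ_WAIT_QUEUES
is 8 in the kernel; the boolean input stands for data->preemption, the integer
for bt->ws_active; names and values are illustrative only):

/* Userspace model of the preempt_tag() decision added by this patch. */
#include <stdbool.h>
#include <stdio.h>

#define MODEL_SBQ_WAIT_QUEUES 8

static bool model_preempt_tag(bool preemptive, int ws_active)
{
	/*
	 * Preemptive io (REQ_PREEMPTIVE, or a hctx without
	 * BLK_MQ_F_NO_TAG_PREEMPTION) always may preempt; otherwise only
	 * while the wait queues are not all in use (low io pressure).
	 */
	return preemptive || ws_active <= MODEL_SBQ_WAIT_QUEUES;
}

int main(void)
{
	/* non-split io under high pressure: must wait for a wakeup */
	printf("%d\n", model_preempt_tag(false, 32));	/* 0 */
	/* later split pieces carry REQ_PREEMPTIVE: keep them sequential */
	printf("%d\n", model_preempt_tag(true, 32));	/* 1 */
	/* light load: everyone may preempt, as before the series */
	printf("%d\n", model_preempt_tag(false, 3));	/* 1 */
	return 0;
}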
From patchwork Tue Mar 29 09:40:48 2022
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12794616
From: Yu Kuai
Subject: [PATCH -next RFC 6/6] sbitmap: force tag preemption if free tags are sufficient
Date: Tue, 29 Mar 2022 17:40:48 +0800
Message-ID: <20220329094048.2107094-7-yukuai3@huawei.com>
In-Reply-To: <20220329094048.2107094-1-yukuai3@huawei.com>
References: <20220329094048.2107094-1-yukuai3@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

If tag preemption is disabled and the system is under high io pressure,
inflight io should use up all the tags. Since new io then waits directly, this
relies on the woken-up threads obtaining at least 'wake_batch' tags. However,
that can be broken if the 8 waitqueues are unbalanced.

This patch calculates the number of free tags each time a 'ws' is woken up,
and forces tag preemption if free tags are sufficient.

Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c      | 3 ++-
 include/linux/sbitmap.h | 6 ++++++
 lib/sbitmap.c           | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 4e485bcc5820..55139a011e75 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -131,7 +131,8 @@ static inline bool preempt_tag(struct blk_mq_alloc_data *data,
 			       struct sbitmap_queue *bt)
 {
 	return data->preemption ||
-	       atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES;
+	       atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES ||
+	       READ_ONCE(bt->force_tag_preemption);
 }
 
 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 9c8c6da3d820..7a0ea8c0692b 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -118,6 +118,12 @@ struct sbitmap_queue {
 	 */
 	unsigned int wake_batch;
 
+	/**
+	 * @force_tag_preemption: preempt tag even if tag preemption is
+	 * disabled.
+	 */
+	bool force_tag_preemption;
+
 	/**
 	 * @wake_index: Next wait queue in @ws to wake up.
 	 */
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 1655c15ee11d..49241b44f163 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -432,6 +432,7 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
 	sbq->min_shallow_depth = UINT_MAX;
 	sbq->wake_batch = sbq_calc_wake_batch(sbq, depth);
+	sbq->force_tag_preemption = 0;
 	atomic_set(&sbq->wake_index, 0);
 	atomic_set(&sbq->ws_active, 0);
 
@@ -650,6 +651,10 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 	 */
 	ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
 	if (ret == wait_cnt) {
+		bool force = (sbq->sb.depth - sbitmap_weight(&sbq->sb) >
+			      READ_ONCE(sbq->wake_batch) * 2);
+
+		WRITE_ONCE(sbq->force_tag_preemption, force);
 		sbq_index_atomic_inc(&sbq->wake_index);
 		wake_up_nr(&ws->wait, get_wake_nr(ws, &wake_batch));
 		if (wake_batch)
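As a back-of-the-envelope check of the threshold computed in the last hunk
(the numbers below are only an example, not taken from the patch): with a
queue depth of 64 the sbitmap wake_batch is typically 8, so preemption is
forced again once more than 16 tags are free.

/* Userspace model of the force_tag_preemption threshold from this patch. */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	unsigned int depth = 64;	/* sbq->sb.depth              */
	unsigned int in_use = 45;	/* sbitmap_weight(&sbq->sb)   */
	unsigned int wake_batch = 8;	/* READ_ONCE(sbq->wake_batch) */
	bool force = (depth - in_use) > wake_batch * 2;

	/* 19 free tags > 16, so waiters may preempt tags again */
	printf("force_tag_preemption = %d\n", force);
	return 0;
}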