From patchwork Fri Apr 8 07:39:09 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 1/8] sbitmap: record the number of waiters for each waitqueue
Date: Fri, 8 Apr 2022 15:39:09 +0800
Message-ID: <20220408073916.1428590-2-yukuai3@huawei.com>

Add a counter in struct sbq_wait_state to record how many threads are
waiting on the waitqueue; this will be used in later patches to make
sure the 8 waitqueues are balanced. The counter is also shown in
debugfs so that users can see whether the waitqueues are balanced.

Signed-off-by: Yu Kuai
---
 include/linux/sbitmap.h | 5 +++++
 lib/sbitmap.c           | 7 +++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 8f5a86e210b9..8a64271d0696 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -91,6 +91,11 @@ struct sbq_wait_state {
         */
        atomic_t wait_cnt;

+       /**
+        * @waiters_cnt: Number of active waiters
+        */
+       atomic_t waiters_cnt;
+
        /**
         * @wait: Wait queue.
         */
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index ae4fd4de9ebe..393f2b71647a 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -444,6 +444,7 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
        for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                init_waitqueue_head(&sbq->ws[i].wait);
                atomic_set(&sbq->ws[i].wait_cnt, sbq->wake_batch);
+               atomic_set(&sbq->ws[i].waiters_cnt, 0);
        }

        return 0;
@@ -759,9 +760,9 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m)
        for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                struct sbq_wait_state *ws = &sbq->ws[i];

-               seq_printf(m, "\t{.wait_cnt=%d, .wait=%s},\n",
+               seq_printf(m, "\t{.wait_cnt=%d, .waiters_cnt=%d},\n",
                           atomic_read(&ws->wait_cnt),
-                          waitqueue_active(&ws->wait) ? "active" : "inactive");
+                          atomic_read(&ws->waiters_cnt));
        }

        seq_puts(m, "}\n");
@@ -798,6 +799,7 @@ void sbitmap_prepare_to_wait(struct sbitmap_queue *sbq,
 {
        if (!sbq_wait->sbq) {
                atomic_inc(&sbq->ws_active);
+               atomic_inc(&ws->waiters_cnt);
                sbq_wait->sbq = sbq;
        }
        prepare_to_wait_exclusive(&ws->wait, &sbq_wait->wait, state);
@@ -810,6 +812,7 @@ void sbitmap_finish_wait(struct sbitmap_queue *sbq, struct sbq_wait_state *ws,
        finish_wait(&ws->wait, &sbq_wait->wait);
        if (sbq_wait->sbq) {
                atomic_dec(&sbq->ws_active);
+               atomic_dec(&ws->waiters_cnt);
                sbq_wait->sbq = NULL;
        }
 }

From patchwork Fri Apr 8 07:39:10 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 2/8] blk-mq: call 'bt_wait_ptr()' later in blk_mq_get_tag()
Date: Fri, 8 Apr 2022 15:39:10 +0800
Message-ID: <20220408073916.1428590-3-yukuai3@huawei.com>
bt_wait_ptr() increases 'wait_index'. However, if blk_mq_get_tag() gets
a tag successfully after bt_wait_ptr() is called and before
sbitmap_prepare_to_wait() is called, then the 'ws' is skipped. This
behavior might cause the 8 waitqueues to be unbalanced. Calling
bt_wait_ptr() later should reduce the problem when the disk is under
high io pressure.

Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 68ac23d0b640..228a0001694f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -155,7 +155,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
        if (data->flags & BLK_MQ_REQ_NOWAIT)
                return BLK_MQ_NO_TAG;

-       ws = bt_wait_ptr(bt, data->hctx);
        do {
                struct sbitmap_queue *bt_prev;

@@ -174,6 +173,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
                if (tag != BLK_MQ_NO_TAG)
                        break;

+               ws = bt_wait_ptr(bt, data->hctx);
                sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);

                tag = __blk_mq_get_tag(data, bt);
@@ -201,8 +201,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
                 */
                if (bt != bt_prev)
                        sbitmap_queue_wake_up(bt_prev);
-
-               ws = bt_wait_ptr(bt, data->hctx);
        } while (1);

        sbitmap_finish_wait(bt, ws, &wait);

From patchwork Fri Apr 8 07:39:11 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 3/8] sbitmap: make sure waitqueues are balanced
Date: Fri, 8 Apr 2022 15:39:11 +0800
Message-ID: <20220408073916.1428590-4-yukuai3@huawei.com>
Currently, the same waitqueue might be woken up continuously:

__sbq_wake_up                           __sbq_wake_up
 sbq_wake_ptr -> assume 0                sbq_wake_ptr -> 0
 atomic_dec_return                       atomic_dec_return
 atomic_cmpxchg -> succeed               atomic_cmpxchg -> failed
                                          return true
                                         __sbq_wake_up
                                          sbq_wake_ptr
                                           atomic_read(&sbq->wake_index) -> still 0
 sbq_index_atomic_inc -> inc to 1
                                           if (waitqueue_active(&ws->wait))
                                            if (wake_index != atomic_read(&sbq->wake_index))
                                             atomic_set -> reset from 1 to 0
 wake_up_nr -> wake up first waitqueue
                                          // continue to wake up in first waitqueue

What's worse, io hang is possible in theory because a wakeup might be
missed. For example, 2 * wake_batch tags are put while only wake_batch
threads are woken:

__sbq_wake_up
 atomic_cmpxchg -> reset wait_cnt
                                        __sbq_wake_up -> decrease wait_cnt
                                        ...
                                        __sbq_wake_up -> wait_cnt is decreased to 0 again
                                         atomic_cmpxchg
                                         sbq_index_atomic_inc -> increase wake_index
                                         wake_up_nr -> wake up and waitqueue might be empty
 sbq_index_atomic_inc -> increase again, one waitqueue is skipped
 wake_up_nr -> invalid wake up because old waitqueue might be empty

To fix the problem, refactor so that waitqueues are woken up one by
one, and choose the next waitqueue by the number of threads that are
waiting, to keep the waitqueues balanced.

Test cmd (nr_requests is 64, and queue_depth is 32):

[global]
filename=/dev/sda
ioengine=libaio
direct=1
allow_mounted_write=0
group_reporting

[test]
rw=randwrite
bs=4k
numjobs=512
iodepth=2

Before this patch, waitqueues can be extremely unbalanced, for example:
ws_active=484
ws={
        {.wait_cnt=8, .waiters_cnt=117},
        {.wait_cnt=8, .waiters_cnt=59},
        {.wait_cnt=8, .waiters_cnt=76},
        {.wait_cnt=8, .waiters_cnt=0},
        {.wait_cnt=5, .waiters_cnt=24},
        {.wait_cnt=8, .waiters_cnt=12},
        {.wait_cnt=8, .waiters_cnt=21},
        {.wait_cnt=8, .waiters_cnt=175},
}

With this patch, waitqueues are always balanced, for example:
ws_active=477
ws={
        {.wait_cnt=8, .waiters_cnt=59},
        {.wait_cnt=6, .waiters_cnt=62},
        {.wait_cnt=8, .waiters_cnt=61},
        {.wait_cnt=8, .waiters_cnt=60},
        {.wait_cnt=8, .waiters_cnt=63},
        {.wait_cnt=8, .waiters_cnt=56},
        {.wait_cnt=8, .waiters_cnt=59},
        {.wait_cnt=8, .waiters_cnt=57},
}

Signed-off-by: Yu Kuai
---
 lib/sbitmap.c | 81 ++++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 39 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 393f2b71647a..176fba0252d7 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -575,68 +575,71 @@ void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq,
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);

-static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
+/* always choose the 'ws' with the max waiters */
+static void sbq_update_wake_index(struct sbitmap_queue *sbq,
+                                 int old_wake_index)
 {
-       int i, wake_index;
+       int index, wake_index;
+       int max_waiters = 0;

-       if (!atomic_read(&sbq->ws_active))
-               return NULL;
+       if (old_wake_index != atomic_read(&sbq->wake_index))
+               return;

-       wake_index = atomic_read(&sbq->wake_index);
-       for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
-               struct sbq_wait_state *ws = &sbq->ws[wake_index];
+       for (wake_index = 0; wake_index < SBQ_WAIT_QUEUES; wake_index++) {
+               struct sbq_wait_state *ws;
+               int waiters;

-               if (waitqueue_active(&ws->wait)) {
-                       if (wake_index != atomic_read(&sbq->wake_index))
-                               atomic_set(&sbq->wake_index, wake_index);
-                       return ws;
-               }
+               if (wake_index == old_wake_index)
+                       continue;

-               wake_index = sbq_index_inc(wake_index);
+               ws = &sbq->ws[wake_index];
+               waiters = atomic_read(&ws->waiters_cnt);
+               if (waiters > max_waiters) {
+                       max_waiters = waiters;
+                       index = wake_index;
+               }
        }

-       return NULL;
+       if (max_waiters)
+               atomic_cmpxchg(&sbq->wake_index, old_wake_index, index);
 }

 static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 {
        struct sbq_wait_state *ws;
        unsigned int wake_batch;
-       int wait_cnt;
+       int wait_cnt, wake_index;

-       ws = sbq_wake_ptr(sbq);
-       if (!ws)
+       if (!atomic_read(&sbq->ws_active))
                return false;

+       wake_index = atomic_read(&sbq->wake_index);
+       ws = &sbq->ws[wake_index];
        wait_cnt = atomic_dec_return(&ws->wait_cnt);
-       if (wait_cnt <= 0) {
-               int ret;
-
-               wake_batch = READ_ONCE(sbq->wake_batch);
-
-               /*
-                * Pairs with the memory barrier in sbitmap_queue_resize() to
-                * ensure that we see the batch size update before the wait
-                * count is reset.
-                */
-               smp_mb__before_atomic();
-
+       if (wait_cnt > 0) {
+               return false;
+       } else if (wait_cnt < 0) {
                /*
-                * For concurrent callers of this, the one that failed the
-                * atomic_cmpxhcg() race should call this function again
+                * Concurrent callers should call this function again
                 * to wakeup a new batch on a different 'ws'.
                 */
-               ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
-               if (ret == wait_cnt) {
-                       sbq_index_atomic_inc(&sbq->wake_index);
-                       wake_up_nr(&ws->wait, wake_batch);
-                       return false;
-               }
-
+               sbq_update_wake_index(sbq, wake_index);
                return true;
        }

-       return false;
+       sbq_update_wake_index(sbq, wake_index);
+       wake_batch = READ_ONCE(sbq->wake_batch);
+
+       /*
+        * Pairs with the memory barrier in sbitmap_queue_resize() to
+        * ensure that we see the batch size update before the wait
+        * count is reset.
+        */
+       smp_mb__before_atomic();
+       atomic_set(&ws->wait_cnt, wake_batch);
+       wake_up_nr(&ws->wait, wake_batch);
+
+       return true;
 }

 void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
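The selection rule introduced by this patch can be pictured with a small
standalone sketch. This is not kernel code: pick_next_queue() and the fixed
array are made-up stand-ins, and all locking and atomics are omitted. It only
shows the policy of waking the waitqueue with the most waiters while skipping
the one that was just woken, using the waiter counts from the "before"
debugfs dump in the commit message.

#include <stdio.h>

#define NR_QUEUES 8

/* toy version of the "pick the busiest waitqueue" rule in sbq_update_wake_index() */
static int pick_next_queue(const int waiters[NR_QUEUES], int old_index)
{
        int i, index = old_index, max_waiters = 0;

        for (i = 0; i < NR_QUEUES; i++) {
                if (i == old_index)
                        continue;               /* never re-pick the queue we just woke */
                if (waiters[i] > max_waiters) {
                        max_waiters = waiters[i];
                        index = i;
                }
        }
        return index;                           /* unchanged if every other queue is empty */
}

int main(void)
{
        /* waiter counts taken from the unbalanced example above */
        int waiters[NR_QUEUES] = { 117, 59, 76, 0, 24, 12, 21, 175 };

        printf("next wake index: %d\n", pick_next_queue(waiters, 0));   /* prints 7 */
        return 0;
}

With the sample counts the queue holding 175 waiters is chosen next, which is
how repeated wakeups drive the per-queue counts back toward balance.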
From patchwork Fri Apr 8 07:39:12 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 4/8] blk-mq: don't preempt tag under heavy load
Date: Fri, 8 Apr 2022 15:39:12 +0800
Message-ID: <20220408073916.1428590-5-yukuai3@huawei.com>

Tag preemption is the default behaviour: blk_mq_get_tag() will try to
get a tag unconditionally, which means a new io can preempt a tag even
if there are lots of ios already waiting for tags. Such behaviour
doesn't make sense when the disk is under heavy load, because it
intensifies competition without improving performance, especially for
huge ios, as split ios are unlikely to be issued continuously.

The ideal way to disable tag preemption would be to track how many tags
are available and wait directly in blk_mq_get_tag() if few free tags
remain. However, that is impractical because it would affect the fast
path. Since 'ws_active' is only updated in the slow path, this patch
disables tag preemption if 'ws_active' is greater than 8, which means
there are already many threads waiting for tags.

Once tag preemption is disabled, there is a situation that can cause
performance degradation (or io hang in extreme scenarios): if the
waitqueue doesn't have 'wake_batch' threads, a wake up on this
waitqueue might decrease the concurrency of ios. The next patch will
fix this problem.

Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c | 36 +++++++++++++++++++++++++-----------
 block/blk-mq.h     |  1 +
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 228a0001694f..be2d49e6d69e 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -127,6 +127,13 @@ unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
        return ret;
 }

+static inline bool preempt_tag(struct blk_mq_alloc_data *data,
+                              struct sbitmap_queue *bt)
+{
+       return data->preemption ||
+              atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES;
+}
+
 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 {
        struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
@@ -148,12 +155,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
                tag_offset = tags->nr_reserved_tags;
        }

-       tag = __blk_mq_get_tag(data, bt);
-       if (tag != BLK_MQ_NO_TAG)
-               goto found_tag;
+       if (data->flags & BLK_MQ_REQ_NOWAIT || preempt_tag(data, bt)) {
+               tag = __blk_mq_get_tag(data, bt);
+               if (tag != BLK_MQ_NO_TAG)
+                       goto found_tag;

-       if (data->flags & BLK_MQ_REQ_NOWAIT)
-               return BLK_MQ_NO_TAG;
+               if (data->flags & BLK_MQ_REQ_NOWAIT)
+                       return BLK_MQ_NO_TAG;
+       }

        do {
                struct sbitmap_queue *bt_prev;
@@ -169,20 +178,25 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
                 * Retry tag allocation after running the hardware queue,
                 * as running the queue may also have found completions.
                 */
-               tag = __blk_mq_get_tag(data, bt);
-               if (tag != BLK_MQ_NO_TAG)
-                       break;
+               if (preempt_tag(data, bt)) {
+                       tag = __blk_mq_get_tag(data, bt);
+                       if (tag != BLK_MQ_NO_TAG)
+                               break;
+               }

                ws = bt_wait_ptr(bt, data->hctx);
                sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);

-               tag = __blk_mq_get_tag(data, bt);
-               if (tag != BLK_MQ_NO_TAG)
-                       break;
+               if (preempt_tag(data, bt)) {
+                       tag = __blk_mq_get_tag(data, bt);
+                       if (tag != BLK_MQ_NO_TAG)
+                               break;
+               }

                bt_prev = bt;
                io_schedule();

+               data->preemption = true;
                sbitmap_finish_wait(bt, ws, &wait);

                data->ctx = blk_mq_get_ctx(data->q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 2615bd58bad3..b49b20e11350 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -156,6 +156,7 @@ struct blk_mq_alloc_data {

        /* allocate multiple requests/tags in one go */
        unsigned int nr_tags;
+       bool preemption;
        struct request **cached_rq;

        /* input & output parameter */

From patchwork Fri Apr 8 07:39:13 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 5/8] sbitmap: force tag preemption if free tags are sufficient
Date: Fri, 8 Apr 2022 15:39:13 +0800
Message-ID: <20220408073916.1428590-6-yukuai3@huawei.com>

Now that tag preemption can be disabled, if the woken threads don't use
up 'wake_batch' tags while preemption is still disabled, io concurrency
will decline. To fix the problem, add a check before waking up: force
tag preemption if free tags are sufficient, so that the extra tags can
be used by new io.
Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c      |  3 ++-
 include/linux/sbitmap.h |  2 ++
 lib/sbitmap.c           | 11 +++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index be2d49e6d69e..dfbb06edfbc3 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -131,7 +131,8 @@ static inline bool preempt_tag(struct blk_mq_alloc_data *data,
                               struct sbitmap_queue *bt)
 {
        return data->preemption ||
-              atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES;
+              atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES ||
+              bt->force_tag_preemption;
 }

 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 8a64271d0696..ca00ccb6af48 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -143,6 +143,8 @@ struct sbitmap_queue {
         * sbitmap_queue_get_shallow()
         */
        unsigned int min_shallow_depth;
+
+       bool force_tag_preemption;
 };

 /**
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 176fba0252d7..8d01e02ea4b1 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -434,6 +434,7 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
        sbq->wake_batch = sbq_calc_wake_batch(sbq, depth);
        atomic_set(&sbq->wake_index, 0);
        atomic_set(&sbq->ws_active, 0);
+       sbq->force_tag_preemption = true;

        sbq->ws = kzalloc_node(SBQ_WAIT_QUEUES * sizeof(*sbq->ws), flags, node);
        if (!sbq->ws) {
@@ -604,6 +605,15 @@ static void sbq_update_wake_index(struct sbitmap_queue *sbq,
                atomic_cmpxchg(&sbq->wake_index, old_wake_index, index);
 }

+static inline void sbq_update_preemption(struct sbitmap_queue *sbq,
+                                        unsigned int wake_batch)
+{
+       bool force = (sbq->sb.depth - sbitmap_weight(&sbq->sb)) >=
+                    wake_batch << 1;
+
+       WRITE_ONCE(sbq->force_tag_preemption, force);
+}
+
 static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 {
        struct sbq_wait_state *ws;
@@ -637,6 +647,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
         */
        smp_mb__before_atomic();
        atomic_set(&ws->wait_cnt, wake_batch);
+       sbq_update_preemption(sbq, wake_batch);
        wake_up_nr(&ws->wait, wake_batch);

        return true;
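Patches 4 and 5 together gate direct tag allocation on two conditions. The
standalone sketch below is an illustration only: may_preempt_tag(), its
parameters, and the queue count are local stand-ins rather than blk-mq API,
and it ignores the per-request flag that lets a thread preempt after it has
already slept. It mirrors the combined preempt_tag() / sbq_update_preemption()
decision: a new io may grab a tag directly when few threads are waiting, or
when the number of free tags is at least twice the wake batch.

#include <stdbool.h>
#include <stdio.h>

#define NR_WAIT_QUEUES 8

static bool may_preempt_tag(unsigned int waiting_threads,
                            unsigned int free_tags,
                            unsigned int wake_batch)
{
        if (waiting_threads <= NR_WAIT_QUEUES)  /* light load: preempt freely */
                return true;
        return free_tags >= 2 * wake_batch;     /* spare tags beyond a full batch */
}

int main(void)
{
        printf("%d\n", may_preempt_tag(4, 1, 8));       /* 1: few waiters */
        printf("%d\n", may_preempt_tag(500, 20, 8));    /* 1: plenty of free tags */
        printf("%d\n", may_preempt_tag(500, 3, 8));     /* 0: wait in line */
        return 0;
}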
From patchwork Fri Apr 8 07:39:14 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 6/8] blk-mq: force tag preemption for split bios
Date: Fri, 8 Apr 2022 15:39:14 +0800
Message-ID: <20220408073916.1428590-7-yukuai3@huawei.com>

For HDDs, sequential io is much faster than random io, thus it's better
to issue split ios continuously. However, this is broken when tag
preemption is disabled, because the woken threads can only get one tag
each time. Thus tag preemption should not be disabled for split bios:
the first bio won't preempt a tag, while the following split bios will.

Signed-off-by: Yu Kuai
---
 block/blk-merge.c         | 9 ++++++++-
 block/blk-mq.c            | 1 +
 include/linux/blk_types.h | 4 ++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..cab6ca681513 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -343,12 +343,19 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,

        if (split) {
                /* there isn't chance to merge the splitted bio */
-               split->bi_opf |= REQ_NOMERGE;
+               split->bi_opf |= (REQ_NOMERGE | REQ_SPLIT);
+               if ((*bio)->bi_opf & REQ_SPLIT)
+                       split->bi_opf |= REQ_PREEMPT;
+               else
+                       (*bio)->bi_opf |= REQ_SPLIT;

                bio_chain(split, *bio);
                trace_block_split(split, (*bio)->bi_iter.bi_sector);
                submit_bio_noacct(*bio);
                *bio = split;
+       } else {
+               if ((*bio)->bi_opf & REQ_SPLIT)
+                       (*bio)->bi_opf |= REQ_PREEMPT;
        }
 }

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed3ed86f7dd2..909420c5186c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2737,6 +2737,7 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
                .q              = q,
                .nr_tags        = 1,
                .cmd_flags      = bio->bi_opf,
+               .preemption     = (bio->bi_opf & REQ_PREEMPT),
        };
        struct request *rq;

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index c62274466e72..6b56e271f926 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -418,6 +418,8 @@ enum req_flag_bits {
        /* for driver use */
        __REQ_DRV,
        __REQ_SWAP,             /* swapping request. */
+       __REQ_SPLIT,
+       __REQ_PREEMPT,
        __REQ_NR_BITS,          /* stops here */
 };

@@ -443,6 +445,8 @@ enum req_flag_bits {
 #define REQ_DRV                        (1ULL << __REQ_DRV)
 #define REQ_SWAP               (1ULL << __REQ_SWAP)
+#define REQ_SPLIT              (1ULL << __REQ_SPLIT)
+#define REQ_PREEMPT            (1ULL << __REQ_PREEMPT)

 #define REQ_FAILFAST_MASK \
        (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
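The flag dance in __blk_queue_split() can be modelled outside the kernel as
follows. The types and flag names here are toy stand-ins (not the real bio or
REQ_* definitions): the point is only that the first fragment of a split bio
carries no preempt flag, while every later fragment, including the final
remainder, does, so that it can reuse the tags its siblings were woken for.

#include <stdio.h>

#define F_SPLIT   (1u << 0)
#define F_PREEMPT (1u << 1)

struct toy_bio { unsigned int flags; };

/* mark one fragment split off from *parent */
static void mark_split(struct toy_bio *fragment, struct toy_bio *parent)
{
        fragment->flags |= F_SPLIT;
        if (parent->flags & F_SPLIT)
                fragment->flags |= F_PREEMPT;   /* not the first fragment */
        else
                parent->flags |= F_SPLIT;       /* remember that a split happened */
}

/* the last, unsplit remainder also preempts */
static void mark_remainder(struct toy_bio *bio)
{
        if (bio->flags & F_SPLIT)
                bio->flags |= F_PREEMPT;
}

int main(void)
{
        struct toy_bio parent = { 0 }, f1 = { 0 }, f2 = { 0 };

        mark_split(&f1, &parent);       /* first fragment: SPLIT only */
        mark_split(&f2, &parent);       /* second fragment: SPLIT | PREEMPT */
        mark_remainder(&parent);        /* remainder: SPLIT | PREEMPT */
        printf("%x %x %x\n", f1.flags, f2.flags, parent.flags); /* prints: 1 3 3 */
        return 0;
}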
From patchwork Fri Apr 8 07:39:15 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 7/8] blk-mq: record how many tags are needed for split bio
Date: Fri, 8 Apr 2022 15:39:15 +0800
Message-ID: <20220408073916.1428590-8-yukuai3@huawei.com>

Currently, each time 8 (or wake_batch) requests are done, 8 waiters
will be woken up. This is not necessary, because we only need to make
sure the woken threads will use up 8 tags. For example, if we know in
advance that a thread needs 8 tags, then waking up one thread is
enough, and this also avoids unnecessary context switches.

This patch provides such information, namely how many tags will be
needed for a huge io; it will be used in the next patch.
Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c      |  1 +
 block/blk-mq.c          | 24 +++++++++++++++++++++---
 block/blk-mq.h          |  1 +
 include/linux/sbitmap.h |  2 ++
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index dfbb06edfbc3..f91879772dc8 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -165,6 +165,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
                        return BLK_MQ_NO_TAG;
        }

+       wait.nr_tags += data->nr_split;
        do {
                struct sbitmap_queue *bt_prev;

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 909420c5186c..65a3b11d5c9f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2731,12 +2731,14 @@ static bool blk_mq_attempt_bio_merge(struct request_queue *q,
 static struct request *blk_mq_get_new_requests(struct request_queue *q,
                                               struct blk_plug *plug,
                                               struct bio *bio,
-                                              unsigned int nsegs)
+                                              unsigned int nsegs,
+                                              unsigned int nr_split)
 {
        struct blk_mq_alloc_data data = {
                .q              = q,
                .nr_tags        = 1,
                .cmd_flags      = bio->bi_opf,
+               .nr_split       = nr_split,
                .preemption     = (bio->bi_opf & REQ_PREEMPT),
        };
        struct request *rq;
@@ -2795,6 +2797,19 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
        return rq;
 }

+static inline unsigned int caculate_sectors_split(struct bio *bio)
+{
+       switch (bio_op(bio)) {
+       case REQ_OP_DISCARD:
+       case REQ_OP_SECURE_ERASE:
+       case REQ_OP_WRITE_ZEROES:
+               return 0;
+       default:
+               return (bio_sectors(bio) - 1) /
+                      queue_max_sectors(bio->bi_bdev->bd_queue);
+       }
+}
+
 /**
  * blk_mq_submit_bio - Create and send a request to block device.
  * @bio: Bio pointer.
@@ -2815,11 +2830,14 @@ void blk_mq_submit_bio(struct bio *bio)
        const int is_sync = op_is_sync(bio->bi_opf);
        struct request *rq;
        unsigned int nr_segs = 1;
+       unsigned int nr_split = 0;
        blk_status_t ret;

        blk_queue_bounce(q, &bio);
-       if (blk_may_split(q, bio))
+       if (blk_may_split(q, bio)) {
+               nr_split = caculate_sectors_split(bio);
                __blk_queue_split(q, &bio, &nr_segs);
+       }

        if (!bio_integrity_prep(bio))
                return;
@@ -2828,7 +2846,7 @@ void blk_mq_submit_bio(struct bio *bio)
        if (!rq) {
                if (!bio)
                        return;
-               rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
+               rq = blk_mq_get_new_requests(q, plug, bio, nr_segs, nr_split);
                if (unlikely(!rq))
                        return;
        }
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b49b20e11350..dfb2f1b9bf06 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -156,6 +156,7 @@ struct blk_mq_alloc_data {

        /* allocate multiple requests/tags in one go */
        unsigned int nr_tags;
+       unsigned int nr_split;
        bool preemption;
        struct request **cached_rq;

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index ca00ccb6af48..1abd8ed5d406 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -596,12 +596,14 @@ void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
 void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m);

 struct sbq_wait {
+       unsigned int nr_tags;
        struct sbitmap_queue *sbq;      /* if set, sbq_wait is accounted */
        struct wait_queue_entry wait;
 };

 #define DEFINE_SBQ_WAIT(name)                                          \
        struct sbq_wait name = {                                        \
+               .nr_tags = 1,                                           \
                .sbq = NULL,                                            \
                .wait = {                                               \
                        .private        = current,                      \
From patchwork Fri Apr 8 07:39:16 2022
From: Yu Kuai
Subject: [PATCH -next RFC v2 8/8] sbitmap: wake up the number of threads based on required tags
Date: Fri, 8 Apr 2022 15:39:16 +0800
Message-ID: <20220408073916.1428590-9-yukuai3@huawei.com>

Always waking up 'wake_batch' threads will intensify competition, and
split ios won't be issued continuously. Now that the number of required
tags is recorded for huge ios, it's safe to wake up threads based on
the required tags.

Signed-off-by: Yu Kuai
---
 lib/sbitmap.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 8d01e02ea4b1..eac9fa5c2b4d 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -614,6 +614,26 @@ static inline void sbq_update_preemption(struct sbitmap_queue *sbq,
        WRITE_ONCE(sbq->force_tag_preemption, force);
 }

+static unsigned int get_wake_nr(struct sbq_wait_state *ws, unsigned int nr_tags)
+{
+       struct sbq_wait *wait;
+       struct wait_queue_entry *entry;
+       unsigned int nr = 1;
+
+       spin_lock_irq(&ws->wait.lock);
+       list_for_each_entry(entry, &ws->wait.head, entry) {
+               wait = container_of(entry, struct sbq_wait, wait);
+               if (nr_tags <= wait->nr_tags)
+                       break;
+
+               nr++;
+               nr_tags -= wait->nr_tags;
+       }
+       spin_unlock_irq(&ws->wait.lock);
+
+       return nr;
+}
+
 static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 {
        struct sbq_wait_state *ws;
@@ -648,7 +668,7 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
        smp_mb__before_atomic();
        atomic_set(&ws->wait_cnt, wake_batch);
        sbq_update_preemption(sbq, wake_batch);
-       wake_up_nr(&ws->wait, wake_batch);
+       wake_up_nr(&ws->wait, get_wake_nr(ws, wake_batch));

        return true;
 }
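A standalone sketch of the wake accounting above (illustrative only: wake_nr()
and its plain array mirror get_wake_nr(), but none of this is kernel API, and
the locking and waitqueue walk are omitted): walk the waiters in queue order
and stop counting once the freed batch of tags is spoken for, instead of
always waking wake_batch threads.

#include <stdio.h>

static unsigned int wake_nr(const unsigned int waiter_tags[],
                            unsigned int nr_waiters, unsigned int batch)
{
        unsigned int i, nr = 1;

        for (i = 0; i < nr_waiters; i++) {
                if (batch <= waiter_tags[i])
                        break;                  /* this waiter absorbs the rest */
                nr++;
                batch -= waiter_tags[i];
        }
        return nr;
}

int main(void)
{
        /* one split io needing 6 tags at the head, then single-tag waiters */
        unsigned int waiters[] = { 6, 1, 1, 1, 1, 1, 1, 1 };

        /* a batch of 8 tags only needs to wake 3 waiters: 6 + 1 + 1 >= 8 */
        printf("wake %u waiters\n", wake_nr(waiters, 8, 8));
        return 0;
}

With a 6-tag waiter at the head of the queue, a freed batch of 8 tags wakes 3
threads instead of 8, which avoids the extra context switches and keeps the
split io's tags together.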