From patchwork Sat Apr 6 08:00:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619693 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A49422EF3; Sat, 6 Apr 2024 08:09:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390957; cv=none; b=M3zes1VSXXOdHZgDfC80/TTSD+lUctWMhQqpsChDd0KOI6ZJrbpB5fobCBV9u0t/2pfbqSmFEvsFQ3ywxxS0kKSBMEVI548UskYTDFIBCb56bReqbkM7sptVsZzzQ168HDPTVxNALKUJe9ziulYrTN8+GI3vJqbPPnfMUiOtCWY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390957; c=relaxed/simple; bh=xxcx8OkpTohRM9azF4Zm7hMXmOs2NXhfcFEvsEQmFKI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=I3BYrzXYco9UxoQvRo11jdux8691jgeWoNsszvcJzgrAFpb8x9nXxytAn3qpZu9BDU397TCnriIX0/n3swlYWqZnIUnbOdEC9QOpTtNFKmcQN10vzrDtUM3+R/Uum7N/m57Xb24XhYbTpOV7u6GaP29XN11CTMXO5X6ZTYTzUHY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg16JLRz4f3lVb; Sat, 6 Apr 2024 16:09:01 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 59FE31A058D; Sat, 6 Apr 2024 16:09:10 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S5; Sat, 06 Apr 2024 16:09:09 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 1/6] blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW Date: Sat, 6 Apr 2024 16:00:54 +0800 Message-Id: <20240406080059.2248314-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S5 X-Coremail-Antispam: 1UD129KBjvAXoWDJFykZF4DuFy8Jry3tr4Utwb_yoWrAFWrXo WfXrs3Jwn7Gr18Kw4jgwnrtF43ua1kArs8ZrsYv3WxXF1xJ3yDu3y2ya43Ja4fCa1akFZ8 CFy0gr1Yqa1xAr1fn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUYR7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r18M28IrcIa0x kI8VCY1x0267AKxVW8JVW5JwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l84AC jcxK6xIIjxv20xvE14v26F1j6w1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr 
From: Yu Kuai On the one hand, it has been marked EXPERIMENTAL since 2017, and it looks like there have been no users, no testers and no developers since then; it's just not active at all. On the other hand, even if the config is disabled, there are still many fields in throtl_grp and throtl_data, and many functions, that are only used for throtl low. Finally, blk-throtl is currently initialized during disk initialization and destroyed during disk removal, and it exposes many functions to be called directly from the block layer. Hence I'm planning to optimize blk-throtl and finally support building it as a kernel module. Removing throtl low will make that work much easier. Signed-off-by: Yu Kuai Acked-by: Tejun Heo --- Documentation/ABI/stable/sysfs-block | 12 - arch/loongarch/configs/loongson3_defconfig | 1 - block/Kconfig | 11 - block/bio.c | 1 - block/blk-stat.c | 3 - block/blk-sysfs.c | 7 - block/blk-throttle.c | 901 +-------------------- block/blk-throttle.h | 26 +- block/blk.h | 11 - 9 files changed, 45 insertions(+), 928 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block index 1fe9a553c37b..31bbc3b20089 100644 --- a/Documentation/ABI/stable/sysfs-block +++ b/Documentation/ABI/stable/sysfs-block @@ -584,18 +584,6 @@ Description: the data. If no such restriction exists, this file will contain '0'. This file is writable for testing purposes. - -What: /sys/block//queue/throttle_sample_time -Date: March 2017 -Contact: linux-block@vger.kernel.org -Description: - [RW] This is the time window that blk-throttle samples data, in - millisecond. blk-throttle makes decision based on the - samplings. Lower time means cgroups have more smooth throughput, - but higher CPU overhead. This exists only when - CONFIG_BLK_DEV_THROTTLING_LOW is enabled. - - What: /sys/block//queue/virt_boundary_mask Date: April 2021 Contact: linux-block@vger.kernel.org diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig index f18c2ba871ef..fc0d89d4c1c5 100644 --- a/arch/loongarch/configs/loongson3_defconfig +++ b/arch/loongarch/configs/loongson3_defconfig @@ -76,7 +76,6 @@ CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_BLK_DEV_ZONED=y CONFIG_BLK_DEV_THROTTLING=y -CONFIG_BLK_DEV_THROTTLING_LOW=y CONFIG_BLK_WBT=y CONFIG_BLK_CGROUP_IOLATENCY=y CONFIG_BLK_CGROUP_FC_APPID=y diff --git a/block/Kconfig b/block/Kconfig index 1de4682d48cc..72814e485f82 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -120,17 +120,6 @@ config BLK_DEV_THROTTLING See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information. -config BLK_DEV_THROTTLING_LOW - bool "Block throttling .low limit interface support (EXPERIMENTAL)" - depends on BLK_DEV_THROTTLING - help - Add .low limit interface for block throttling.
The low limit is a best - effort limit to prioritize cgroups. Depending on the setting, the limit - can be used to protect cgroups in terms of bandwidth/iops and better - utilize disk resource. - - Note, this is an experimental interface and could be changed someday. - config BLK_WBT bool "Enable support for block device writeback throttling" help diff --git a/block/bio.c b/block/bio.c index d24420ed1c4c..4404cd0a2690 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1596,7 +1596,6 @@ void bio_endio(struct bio *bio) goto again; } - blk_throtl_bio_endio(bio); /* release cgroup info */ bio_uninit(bio); if (bio->bi_end_io) diff --git a/block/blk-stat.c b/block/blk-stat.c index e42c263e53fb..eaf60097bbe1 100644 --- a/block/blk-stat.c +++ b/block/blk-stat.c @@ -57,9 +57,6 @@ void blk_stat_add(struct request *rq, u64 now) value = (now >= rq->io_start_time_ns) ? now - rq->io_start_time_ns : 0; - if (req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE) - blk_throtl_stat_add(rq, value); - rcu_read_lock(); cpu = get_cpu(); list_for_each_entry_rcu(cb, &q->stats->callbacks, list) { diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 8c8f69d8ba48..674c5c602364 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -516,10 +516,6 @@ QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout"); QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask"); QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment"); -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time"); -#endif - /* legacy alias for logical_block_size: */ static struct queue_sysfs_entry queue_hw_sector_size_entry = { .attr = {.name = "hw_sector_size", .mode = 0444 }, @@ -640,9 +636,6 @@ static struct attribute *queue_attrs[] = { &queue_fua_entry.attr, &queue_dax_entry.attr, &queue_poll_delay_entry.attr, -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW - &blk_throtl_sample_time_entry.attr, -#endif &queue_virt_boundary_mask_entry.attr, &queue_dma_alignment_entry.attr, NULL, diff --git a/block/blk-throttle.c b/block/blk-throttle.c index f4850a6f860b..1e45e48564f4 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -25,18 +25,6 @@ #define DFL_THROTL_SLICE_HD (HZ / 10) #define DFL_THROTL_SLICE_SSD (HZ / 50) #define MAX_THROTL_SLICE (HZ) -#define MAX_IDLE_TIME (5L * 1000 * 1000) /* 5 s */ -#define MIN_THROTL_BPS (320 * 1024) -#define MIN_THROTL_IOPS (10) -#define DFL_LATENCY_TARGET (-1L) -#define DFL_IDLE_THRESHOLD (0) -#define DFL_HD_BASELINE_LATENCY (4000L) /* 4ms */ -#define LATENCY_FILTERED_SSD (0) -/* - * For HD, very small latency comes from sequential IO. 
Such IO is helpless to - * help determine if its IO is impacted by others, hence we ignore the IO - */ -#define LATENCY_FILTERED_HD (1000L) /* 1ms */ /* A workqueue to queue throttle related work */ static struct workqueue_struct *kthrotld_workqueue; @@ -70,19 +58,6 @@ struct throtl_data /* Work for dispatching throttled bios */ struct work_struct dispatch_work; - unsigned int limit_index; - bool limit_valid[LIMIT_CNT]; - - unsigned long low_upgrade_time; - unsigned long low_downgrade_time; - - unsigned int scale; - - struct latency_bucket tmp_buckets[2][LATENCY_BUCKET_SIZE]; - struct avg_latency_bucket avg_buckets[2][LATENCY_BUCKET_SIZE]; - struct latency_bucket __percpu *latency_buckets[2]; - unsigned long last_calculate_time; - unsigned long filtered_latency; bool track_bio_latency; }; @@ -126,84 +101,24 @@ static struct throtl_data *sq_to_td(struct throtl_service_queue *sq) return container_of(sq, struct throtl_data, service_queue); } -/* - * cgroup's limit in LIMIT_MAX is scaled if low limit is set. This scale is to - * make the IO dispatch more smooth. - * Scale up: linearly scale up according to elapsed time since upgrade. For - * every throtl_slice, the limit scales up 1/2 .low limit till the - * limit hits .max limit - * Scale down: exponentially scale down if a cgroup doesn't hit its .low limit - */ -static uint64_t throtl_adjusted_limit(uint64_t low, struct throtl_data *td) -{ - /* arbitrary value to avoid too big scale */ - if (td->scale < 4096 && time_after_eq(jiffies, - td->low_upgrade_time + td->scale * td->throtl_slice)) - td->scale = (jiffies - td->low_upgrade_time) / td->throtl_slice; - - return low + (low >> 1) * td->scale; -} - static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw) { struct blkcg_gq *blkg = tg_to_blkg(tg); - struct throtl_data *td; - uint64_t ret; if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent) return U64_MAX; - td = tg->td; - ret = tg->bps[rw][td->limit_index]; - if (ret == 0 && td->limit_index == LIMIT_LOW) { - /* intermediate node or iops isn't 0 */ - if (!list_empty(&blkg->blkcg->css.children) || - tg->iops[rw][td->limit_index]) - return U64_MAX; - else - return MIN_THROTL_BPS; - } - - if (td->limit_index == LIMIT_MAX && tg->bps[rw][LIMIT_LOW] && - tg->bps[rw][LIMIT_LOW] != tg->bps[rw][LIMIT_MAX]) { - uint64_t adjusted; - - adjusted = throtl_adjusted_limit(tg->bps[rw][LIMIT_LOW], td); - ret = min(tg->bps[rw][LIMIT_MAX], adjusted); - } - return ret; + return tg->bps[rw]; } static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw) { struct blkcg_gq *blkg = tg_to_blkg(tg); - struct throtl_data *td; - unsigned int ret; if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent) return UINT_MAX; - td = tg->td; - ret = tg->iops[rw][td->limit_index]; - if (ret == 0 && tg->td->limit_index == LIMIT_LOW) { - /* intermediate node or bps isn't 0 */ - if (!list_empty(&blkg->blkcg->css.children) || - tg->bps[rw][td->limit_index]) - return UINT_MAX; - else - return MIN_THROTL_IOPS; - } - - if (td->limit_index == LIMIT_MAX && tg->iops[rw][LIMIT_LOW] && - tg->iops[rw][LIMIT_LOW] != tg->iops[rw][LIMIT_MAX]) { - uint64_t adjusted; - - adjusted = throtl_adjusted_limit(tg->iops[rw][LIMIT_LOW], td); - if (adjusted > UINT_MAX) - adjusted = UINT_MAX; - ret = min_t(unsigned int, tg->iops[rw][LIMIT_MAX], adjusted); - } - return ret; + return tg->iops[rw]; } #define request_bucket_index(sectors) \ @@ -359,20 +274,10 @@ static struct blkg_policy_data *throtl_pd_alloc(struct gendisk *disk, } RB_CLEAR_NODE(&tg->rb_node); - tg->bps[READ][LIMIT_MAX] = 
U64_MAX; - tg->bps[WRITE][LIMIT_MAX] = U64_MAX; - tg->iops[READ][LIMIT_MAX] = UINT_MAX; - tg->iops[WRITE][LIMIT_MAX] = UINT_MAX; - tg->bps_conf[READ][LIMIT_MAX] = U64_MAX; - tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX; - tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX; - tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX; - /* LIMIT_LOW will have default value 0 */ - - tg->latency_target = DFL_LATENCY_TARGET; - tg->latency_target_conf = DFL_LATENCY_TARGET; - tg->idletime_threshold = DFL_IDLE_THRESHOLD; - tg->idletime_threshold_conf = DFL_IDLE_THRESHOLD; + tg->bps[READ] = U64_MAX; + tg->bps[WRITE] = U64_MAX; + tg->iops[READ] = UINT_MAX; + tg->iops[WRITE] = UINT_MAX; return &tg->pd; @@ -418,18 +323,15 @@ static void throtl_pd_init(struct blkg_policy_data *pd) static void tg_update_has_rules(struct throtl_grp *tg) { struct throtl_grp *parent_tg = sq_to_tg(tg->service_queue.parent_sq); - struct throtl_data *td = tg->td; int rw; for (rw = READ; rw <= WRITE; rw++) { tg->has_rules_iops[rw] = (parent_tg && parent_tg->has_rules_iops[rw]) || - (td->limit_valid[td->limit_index] && - tg_iops_limit(tg, rw) != UINT_MAX); + tg_iops_limit(tg, rw) != UINT_MAX; tg->has_rules_bps[rw] = (parent_tg && parent_tg->has_rules_bps[rw]) || - (td->limit_valid[td->limit_index] && - (tg_bps_limit(tg, rw) != U64_MAX)); + tg_bps_limit(tg, rw) != U64_MAX; } } @@ -443,49 +345,6 @@ static void throtl_pd_online(struct blkg_policy_data *pd) tg_update_has_rules(tg); } -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -static void blk_throtl_update_limit_valid(struct throtl_data *td) -{ - struct cgroup_subsys_state *pos_css; - struct blkcg_gq *blkg; - bool low_valid = false; - - rcu_read_lock(); - blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) { - struct throtl_grp *tg = blkg_to_tg(blkg); - - if (tg->bps[READ][LIMIT_LOW] || tg->bps[WRITE][LIMIT_LOW] || - tg->iops[READ][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW]) { - low_valid = true; - break; - } - } - rcu_read_unlock(); - - td->limit_valid[LIMIT_LOW] = low_valid; -} -#else -static inline void blk_throtl_update_limit_valid(struct throtl_data *td) -{ -} -#endif - -static void throtl_upgrade_state(struct throtl_data *td); -static void throtl_pd_offline(struct blkg_policy_data *pd) -{ - struct throtl_grp *tg = pd_to_tg(pd); - - tg->bps[READ][LIMIT_LOW] = 0; - tg->bps[WRITE][LIMIT_LOW] = 0; - tg->iops[READ][LIMIT_LOW] = 0; - tg->iops[WRITE][LIMIT_LOW] = 0; - - blk_throtl_update_limit_valid(tg->td); - - if (!tg->td->limit_valid[tg->td->limit_index]) - throtl_upgrade_state(tg->td); -} - static void throtl_pd_free(struct blkg_policy_data *pd) { struct throtl_grp *tg = pd_to_tg(pd); @@ -1151,8 +1010,6 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq) return nr_disp; } -static bool throtl_can_upgrade(struct throtl_data *td, - struct throtl_grp *this_tg); /** * throtl_pending_timer_fn - timer function for service_queue->pending_timer * @t: the pending_timer member of the throtl_service_queue being serviced @@ -1189,9 +1046,6 @@ static void throtl_pending_timer_fn(struct timer_list *t) if (!q->root_blkg) goto out_unlock; - if (throtl_can_upgrade(td, NULL)) - throtl_upgrade_state(td); - again: parent_sq = sq->parent_sq; dispatched = false; @@ -1339,14 +1193,6 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) !blkg->parent->parent) continue; parent_tg = blkg_to_tg(blkg->parent); - /* - * make sure all children has lower idle time threshold and - * higher latency target - */ - this_tg->idletime_threshold = min(this_tg->idletime_threshold, - 
parent_tg->idletime_threshold); - this_tg->latency_target = max(this_tg->latency_target, - parent_tg->latency_target); } rcu_read_unlock(); @@ -1444,25 +1290,25 @@ static int tg_print_rwstat_recursive(struct seq_file *sf, void *v) static struct cftype throtl_legacy_files[] = { { .name = "throttle.read_bps_device", - .private = offsetof(struct throtl_grp, bps[READ][LIMIT_MAX]), + .private = offsetof(struct throtl_grp, bps[READ]), .seq_show = tg_print_conf_u64, .write = tg_set_conf_u64, }, { .name = "throttle.write_bps_device", - .private = offsetof(struct throtl_grp, bps[WRITE][LIMIT_MAX]), + .private = offsetof(struct throtl_grp, bps[WRITE]), .seq_show = tg_print_conf_u64, .write = tg_set_conf_u64, }, { .name = "throttle.read_iops_device", - .private = offsetof(struct throtl_grp, iops[READ][LIMIT_MAX]), + .private = offsetof(struct throtl_grp, iops[READ]), .seq_show = tg_print_conf_uint, .write = tg_set_conf_uint, }, { .name = "throttle.write_iops_device", - .private = offsetof(struct throtl_grp, iops[WRITE][LIMIT_MAX]), + .private = offsetof(struct throtl_grp, iops[WRITE]), .seq_show = tg_print_conf_uint, .write = tg_set_conf_uint, }, @@ -1497,58 +1343,30 @@ static u64 tg_prfill_limit(struct seq_file *sf, struct blkg_policy_data *pd, char bufs[4][21] = { "max", "max", "max", "max" }; u64 bps_dft; unsigned int iops_dft; - char idle_time[26] = ""; - char latency_time[26] = ""; if (!dname) return 0; - if (off == LIMIT_LOW) { - bps_dft = 0; - iops_dft = 0; - } else { - bps_dft = U64_MAX; - iops_dft = UINT_MAX; - } + bps_dft = U64_MAX; + iops_dft = UINT_MAX; - if (tg->bps_conf[READ][off] == bps_dft && - tg->bps_conf[WRITE][off] == bps_dft && - tg->iops_conf[READ][off] == iops_dft && - tg->iops_conf[WRITE][off] == iops_dft && - (off != LIMIT_LOW || - (tg->idletime_threshold_conf == DFL_IDLE_THRESHOLD && - tg->latency_target_conf == DFL_LATENCY_TARGET))) + if (tg->bps[READ] == bps_dft && + tg->bps[WRITE] == bps_dft && + tg->iops[READ] == iops_dft && + tg->iops[WRITE] == iops_dft) return 0; - if (tg->bps_conf[READ][off] != U64_MAX) - snprintf(bufs[0], sizeof(bufs[0]), "%llu", - tg->bps_conf[READ][off]); - if (tg->bps_conf[WRITE][off] != U64_MAX) - snprintf(bufs[1], sizeof(bufs[1]), "%llu", - tg->bps_conf[WRITE][off]); - if (tg->iops_conf[READ][off] != UINT_MAX) - snprintf(bufs[2], sizeof(bufs[2]), "%u", - tg->iops_conf[READ][off]); - if (tg->iops_conf[WRITE][off] != UINT_MAX) - snprintf(bufs[3], sizeof(bufs[3]), "%u", - tg->iops_conf[WRITE][off]); - if (off == LIMIT_LOW) { - if (tg->idletime_threshold_conf == ULONG_MAX) - strcpy(idle_time, " idle=max"); - else - snprintf(idle_time, sizeof(idle_time), " idle=%lu", - tg->idletime_threshold_conf); - - if (tg->latency_target_conf == ULONG_MAX) - strcpy(latency_time, " latency=max"); - else - snprintf(latency_time, sizeof(latency_time), - " latency=%lu", tg->latency_target_conf); - } - - seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s%s%s\n", - dname, bufs[0], bufs[1], bufs[2], bufs[3], idle_time, - latency_time); + if (tg->bps[READ] != U64_MAX) + snprintf(bufs[0], sizeof(bufs[0]), "%llu", tg->bps[READ]); + if (tg->bps[WRITE] != U64_MAX) + snprintf(bufs[1], sizeof(bufs[1]), "%llu", tg->bps[WRITE]); + if (tg->iops[READ] != UINT_MAX) + snprintf(bufs[2], sizeof(bufs[2]), "%u", tg->iops[READ]); + if (tg->iops[WRITE] != UINT_MAX) + snprintf(bufs[3], sizeof(bufs[3]), "%u", tg->iops[WRITE]); + + seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s\n", + dname, bufs[0], bufs[1], bufs[2], bufs[3]); return 0; } @@ -1566,10 +1384,7 @@ static ssize_t 
tg_set_limit(struct kernfs_open_file *of, struct blkg_conf_ctx ctx; struct throtl_grp *tg; u64 v[4]; - unsigned long idle_time; - unsigned long latency_time; int ret; - int index = of_cft(of)->private; blkg_conf_init(&ctx, buf); @@ -1580,13 +1395,11 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, tg = blkg_to_tg(ctx.blkg); tg_update_carryover(tg); - v[0] = tg->bps_conf[READ][index]; - v[1] = tg->bps_conf[WRITE][index]; - v[2] = tg->iops_conf[READ][index]; - v[3] = tg->iops_conf[WRITE][index]; + v[0] = tg->bps[READ]; + v[1] = tg->bps[WRITE]; + v[2] = tg->iops[READ]; + v[3] = tg->iops[WRITE]; - idle_time = tg->idletime_threshold_conf; - latency_time = tg->latency_target_conf; while (true) { char tok[27]; /* wiops=18446744073709551616 */ char *p; @@ -1618,60 +1431,16 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, v[2] = min_t(u64, val, UINT_MAX); else if (!strcmp(tok, "wiops") && val > 1) v[3] = min_t(u64, val, UINT_MAX); - else if (off == LIMIT_LOW && !strcmp(tok, "idle")) - idle_time = val; - else if (off == LIMIT_LOW && !strcmp(tok, "latency")) - latency_time = val; else goto out_finish; } - tg->bps_conf[READ][index] = v[0]; - tg->bps_conf[WRITE][index] = v[1]; - tg->iops_conf[READ][index] = v[2]; - tg->iops_conf[WRITE][index] = v[3]; + tg->bps[READ] = v[0]; + tg->bps[WRITE] = v[1]; + tg->iops[READ] = v[2]; + tg->iops[WRITE] = v[3]; - if (index == LIMIT_MAX) { - tg->bps[READ][index] = v[0]; - tg->bps[WRITE][index] = v[1]; - tg->iops[READ][index] = v[2]; - tg->iops[WRITE][index] = v[3]; - } - tg->bps[READ][LIMIT_LOW] = min(tg->bps_conf[READ][LIMIT_LOW], - tg->bps_conf[READ][LIMIT_MAX]); - tg->bps[WRITE][LIMIT_LOW] = min(tg->bps_conf[WRITE][LIMIT_LOW], - tg->bps_conf[WRITE][LIMIT_MAX]); - tg->iops[READ][LIMIT_LOW] = min(tg->iops_conf[READ][LIMIT_LOW], - tg->iops_conf[READ][LIMIT_MAX]); - tg->iops[WRITE][LIMIT_LOW] = min(tg->iops_conf[WRITE][LIMIT_LOW], - tg->iops_conf[WRITE][LIMIT_MAX]); - tg->idletime_threshold_conf = idle_time; - tg->latency_target_conf = latency_time; - - /* force user to configure all settings for low limit */ - if (!(tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW] || - tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW]) || - tg->idletime_threshold_conf == DFL_IDLE_THRESHOLD || - tg->latency_target_conf == DFL_LATENCY_TARGET) { - tg->bps[READ][LIMIT_LOW] = 0; - tg->bps[WRITE][LIMIT_LOW] = 0; - tg->iops[READ][LIMIT_LOW] = 0; - tg->iops[WRITE][LIMIT_LOW] = 0; - tg->idletime_threshold = DFL_IDLE_THRESHOLD; - tg->latency_target = DFL_LATENCY_TARGET; - } else if (index == LIMIT_LOW) { - tg->idletime_threshold = tg->idletime_threshold_conf; - tg->latency_target = tg->latency_target_conf; - } - - blk_throtl_update_limit_valid(tg->td); - if (tg->td->limit_valid[LIMIT_LOW]) { - if (index == LIMIT_LOW) - tg->td->limit_index = LIMIT_LOW; - } else - tg->td->limit_index = LIMIT_MAX; - tg_conf_updated(tg, index == LIMIT_LOW && - tg->td->limit_valid[LIMIT_LOW]); + tg_conf_updated(tg, false); ret = 0; out_finish: blkg_conf_exit(&ctx); @@ -1679,21 +1448,11 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, } static struct cftype throtl_files[] = { -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW - { - .name = "low", - .flags = CFTYPE_NOT_ON_ROOT, - .seq_show = tg_print_limit, - .write = tg_set_limit, - .private = LIMIT_LOW, - }, -#endif { .name = "max", .flags = CFTYPE_NOT_ON_ROOT, .seq_show = tg_print_limit, .write = tg_set_limit, - .private = LIMIT_MAX, }, { } /* terminate */ }; @@ -1712,7 +1471,6 @@ struct blkcg_policy blkcg_policy_throtl = { 
.pd_alloc_fn = throtl_pd_alloc, .pd_init_fn = throtl_pd_init, .pd_online_fn = throtl_pd_online, - .pd_offline_fn = throtl_pd_offline, .pd_free_fn = throtl_pd_free, }; @@ -1761,418 +1519,6 @@ void blk_throtl_cancel_bios(struct gendisk *disk) spin_unlock_irq(&q->queue_lock); } -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -static unsigned long __tg_last_low_overflow_time(struct throtl_grp *tg) -{ - unsigned long rtime = jiffies, wtime = jiffies; - - if (tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW]) - rtime = tg->last_low_overflow_time[READ]; - if (tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW]) - wtime = tg->last_low_overflow_time[WRITE]; - return min(rtime, wtime); -} - -static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg) -{ - struct throtl_service_queue *parent_sq; - struct throtl_grp *parent = tg; - unsigned long ret = __tg_last_low_overflow_time(tg); - - while (true) { - parent_sq = parent->service_queue.parent_sq; - parent = sq_to_tg(parent_sq); - if (!parent) - break; - - /* - * The parent doesn't have low limit, it always reaches low - * limit. Its overflow time is useless for children - */ - if (!parent->bps[READ][LIMIT_LOW] && - !parent->iops[READ][LIMIT_LOW] && - !parent->bps[WRITE][LIMIT_LOW] && - !parent->iops[WRITE][LIMIT_LOW]) - continue; - if (time_after(__tg_last_low_overflow_time(parent), ret)) - ret = __tg_last_low_overflow_time(parent); - } - return ret; -} - -static bool throtl_tg_is_idle(struct throtl_grp *tg) -{ - /* - * cgroup is idle if: - * - single idle is too long, longer than a fixed value (in case user - * configure a too big threshold) or 4 times of idletime threshold - * - average think time is more than threshold - * - IO latency is largely below threshold - */ - unsigned long time; - bool ret; - - time = min_t(unsigned long, MAX_IDLE_TIME, 4 * tg->idletime_threshold); - ret = tg->latency_target == DFL_LATENCY_TARGET || - tg->idletime_threshold == DFL_IDLE_THRESHOLD || - (blk_time_get_ns() >> 10) - tg->last_finish_time > time || - tg->avg_idletime > tg->idletime_threshold || - (tg->latency_target && tg->bio_cnt && - tg->bad_bio_cnt * 5 < tg->bio_cnt); - throtl_log(&tg->service_queue, - "avg_idle=%ld, idle_threshold=%ld, bad_bio=%d, total_bio=%d, is_idle=%d, scale=%d", - tg->avg_idletime, tg->idletime_threshold, tg->bad_bio_cnt, - tg->bio_cnt, ret, tg->td->scale); - return ret; -} - -static bool throtl_low_limit_reached(struct throtl_grp *tg, int rw) -{ - struct throtl_service_queue *sq = &tg->service_queue; - bool limit = tg->bps[rw][LIMIT_LOW] || tg->iops[rw][LIMIT_LOW]; - - /* - * if low limit is zero, low limit is always reached. - * if low limit is non-zero, we can check if there is any request - * is queued to determine if low limit is reached as we throttle - * request according to limit. 
- */ - return !limit || sq->nr_queued[rw]; -} - -static bool throtl_tg_can_upgrade(struct throtl_grp *tg) -{ - /* - * cgroup reaches low limit when low limit of READ and WRITE are - * both reached, it's ok to upgrade to next limit if cgroup reaches - * low limit - */ - if (throtl_low_limit_reached(tg, READ) && - throtl_low_limit_reached(tg, WRITE)) - return true; - - if (time_after_eq(jiffies, - tg_last_low_overflow_time(tg) + tg->td->throtl_slice) && - throtl_tg_is_idle(tg)) - return true; - return false; -} - -static bool throtl_hierarchy_can_upgrade(struct throtl_grp *tg) -{ - while (true) { - if (throtl_tg_can_upgrade(tg)) - return true; - tg = sq_to_tg(tg->service_queue.parent_sq); - if (!tg || !tg_to_blkg(tg)->parent) - return false; - } - return false; -} - -static bool throtl_can_upgrade(struct throtl_data *td, - struct throtl_grp *this_tg) -{ - struct cgroup_subsys_state *pos_css; - struct blkcg_gq *blkg; - - if (td->limit_index != LIMIT_LOW) - return false; - - if (time_before(jiffies, td->low_downgrade_time + td->throtl_slice)) - return false; - - rcu_read_lock(); - blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) { - struct throtl_grp *tg = blkg_to_tg(blkg); - - if (tg == this_tg) - continue; - if (!list_empty(&tg_to_blkg(tg)->blkcg->css.children)) - continue; - if (!throtl_hierarchy_can_upgrade(tg)) { - rcu_read_unlock(); - return false; - } - } - rcu_read_unlock(); - return true; -} - -static void throtl_upgrade_check(struct throtl_grp *tg) -{ - unsigned long now = jiffies; - - if (tg->td->limit_index != LIMIT_LOW) - return; - - if (time_after(tg->last_check_time + tg->td->throtl_slice, now)) - return; - - tg->last_check_time = now; - - if (!time_after_eq(now, - __tg_last_low_overflow_time(tg) + tg->td->throtl_slice)) - return; - - if (throtl_can_upgrade(tg->td, NULL)) - throtl_upgrade_state(tg->td); -} - -static void throtl_upgrade_state(struct throtl_data *td) -{ - struct cgroup_subsys_state *pos_css; - struct blkcg_gq *blkg; - - throtl_log(&td->service_queue, "upgrade to max"); - td->limit_index = LIMIT_MAX; - td->low_upgrade_time = jiffies; - td->scale = 0; - rcu_read_lock(); - blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) { - struct throtl_grp *tg = blkg_to_tg(blkg); - struct throtl_service_queue *sq = &tg->service_queue; - - tg->disptime = jiffies - 1; - throtl_select_dispatch(sq); - throtl_schedule_next_dispatch(sq, true); - } - rcu_read_unlock(); - throtl_select_dispatch(&td->service_queue); - throtl_schedule_next_dispatch(&td->service_queue, true); - queue_work(kthrotld_workqueue, &td->dispatch_work); -} - -static void throtl_downgrade_state(struct throtl_data *td) -{ - td->scale /= 2; - - throtl_log(&td->service_queue, "downgrade, scale %d", td->scale); - if (td->scale) { - td->low_upgrade_time = jiffies - td->scale * td->throtl_slice; - return; - } - - td->limit_index = LIMIT_LOW; - td->low_downgrade_time = jiffies; -} - -static bool throtl_tg_can_downgrade(struct throtl_grp *tg) -{ - struct throtl_data *td = tg->td; - unsigned long now = jiffies; - - /* - * If cgroup is below low limit, consider downgrade and throttle other - * cgroups - */ - if (time_after_eq(now, tg_last_low_overflow_time(tg) + - td->throtl_slice) && - (!throtl_tg_is_idle(tg) || - !list_empty(&tg_to_blkg(tg)->blkcg->css.children))) - return true; - return false; -} - -static bool throtl_hierarchy_can_downgrade(struct throtl_grp *tg) -{ - struct throtl_data *td = tg->td; - - if (time_before(jiffies, td->low_upgrade_time + td->throtl_slice)) - return false; - 
- while (true) { - if (!throtl_tg_can_downgrade(tg)) - return false; - tg = sq_to_tg(tg->service_queue.parent_sq); - if (!tg || !tg_to_blkg(tg)->parent) - break; - } - return true; -} - -static void throtl_downgrade_check(struct throtl_grp *tg) -{ - uint64_t bps; - unsigned int iops; - unsigned long elapsed_time; - unsigned long now = jiffies; - - if (tg->td->limit_index != LIMIT_MAX || - !tg->td->limit_valid[LIMIT_LOW]) - return; - if (!list_empty(&tg_to_blkg(tg)->blkcg->css.children)) - return; - if (time_after(tg->last_check_time + tg->td->throtl_slice, now)) - return; - - elapsed_time = now - tg->last_check_time; - tg->last_check_time = now; - - if (time_before(now, tg_last_low_overflow_time(tg) + - tg->td->throtl_slice)) - return; - - if (tg->bps[READ][LIMIT_LOW]) { - bps = tg->last_bytes_disp[READ] * HZ; - do_div(bps, elapsed_time); - if (bps >= tg->bps[READ][LIMIT_LOW]) - tg->last_low_overflow_time[READ] = now; - } - - if (tg->bps[WRITE][LIMIT_LOW]) { - bps = tg->last_bytes_disp[WRITE] * HZ; - do_div(bps, elapsed_time); - if (bps >= tg->bps[WRITE][LIMIT_LOW]) - tg->last_low_overflow_time[WRITE] = now; - } - - if (tg->iops[READ][LIMIT_LOW]) { - iops = tg->last_io_disp[READ] * HZ / elapsed_time; - if (iops >= tg->iops[READ][LIMIT_LOW]) - tg->last_low_overflow_time[READ] = now; - } - - if (tg->iops[WRITE][LIMIT_LOW]) { - iops = tg->last_io_disp[WRITE] * HZ / elapsed_time; - if (iops >= tg->iops[WRITE][LIMIT_LOW]) - tg->last_low_overflow_time[WRITE] = now; - } - - /* - * If cgroup is below low limit, consider downgrade and throttle other - * cgroups - */ - if (throtl_hierarchy_can_downgrade(tg)) - throtl_downgrade_state(tg->td); - - tg->last_bytes_disp[READ] = 0; - tg->last_bytes_disp[WRITE] = 0; - tg->last_io_disp[READ] = 0; - tg->last_io_disp[WRITE] = 0; -} - -static void blk_throtl_update_idletime(struct throtl_grp *tg) -{ - unsigned long now; - unsigned long last_finish_time = tg->last_finish_time; - - if (last_finish_time == 0) - return; - - now = blk_time_get_ns() >> 10; - if (now <= last_finish_time || - last_finish_time == tg->checked_last_finish_time) - return; - - tg->avg_idletime = (tg->avg_idletime * 7 + now - last_finish_time) >> 3; - tg->checked_last_finish_time = last_finish_time; -} - -static void throtl_update_latency_buckets(struct throtl_data *td) -{ - struct avg_latency_bucket avg_latency[2][LATENCY_BUCKET_SIZE]; - int i, cpu, rw; - unsigned long last_latency[2] = { 0 }; - unsigned long latency[2]; - - if (!blk_queue_nonrot(td->queue) || !td->limit_valid[LIMIT_LOW]) - return; - if (time_before(jiffies, td->last_calculate_time + HZ)) - return; - td->last_calculate_time = jiffies; - - memset(avg_latency, 0, sizeof(avg_latency)); - for (rw = READ; rw <= WRITE; rw++) { - for (i = 0; i < LATENCY_BUCKET_SIZE; i++) { - struct latency_bucket *tmp = &td->tmp_buckets[rw][i]; - - for_each_possible_cpu(cpu) { - struct latency_bucket *bucket; - - /* this isn't race free, but ok in practice */ - bucket = per_cpu_ptr(td->latency_buckets[rw], - cpu); - tmp->total_latency += bucket[i].total_latency; - tmp->samples += bucket[i].samples; - bucket[i].total_latency = 0; - bucket[i].samples = 0; - } - - if (tmp->samples >= 32) { - int samples = tmp->samples; - - latency[rw] = tmp->total_latency; - - tmp->total_latency = 0; - tmp->samples = 0; - latency[rw] /= samples; - if (latency[rw] == 0) - continue; - avg_latency[rw][i].latency = latency[rw]; - } - } - } - - for (rw = READ; rw <= WRITE; rw++) { - for (i = 0; i < LATENCY_BUCKET_SIZE; i++) { - if (!avg_latency[rw][i].latency) { - if 
(td->avg_buckets[rw][i].latency < last_latency[rw]) - td->avg_buckets[rw][i].latency = - last_latency[rw]; - continue; - } - - if (!td->avg_buckets[rw][i].valid) - latency[rw] = avg_latency[rw][i].latency; - else - latency[rw] = (td->avg_buckets[rw][i].latency * 7 + - avg_latency[rw][i].latency) >> 3; - - td->avg_buckets[rw][i].latency = max(latency[rw], - last_latency[rw]); - td->avg_buckets[rw][i].valid = true; - last_latency[rw] = td->avg_buckets[rw][i].latency; - } - } - - for (i = 0; i < LATENCY_BUCKET_SIZE; i++) - throtl_log(&td->service_queue, - "Latency bucket %d: read latency=%ld, read valid=%d, " - "write latency=%ld, write valid=%d", i, - td->avg_buckets[READ][i].latency, - td->avg_buckets[READ][i].valid, - td->avg_buckets[WRITE][i].latency, - td->avg_buckets[WRITE][i].valid); -} -#else -static inline void throtl_update_latency_buckets(struct throtl_data *td) -{ -} - -static void blk_throtl_update_idletime(struct throtl_grp *tg) -{ -} - -static void throtl_downgrade_check(struct throtl_grp *tg) -{ -} - -static void throtl_upgrade_check(struct throtl_grp *tg) -{ -} - -static bool throtl_can_upgrade(struct throtl_data *td, - struct throtl_grp *this_tg) -{ - return false; -} - -static void throtl_upgrade_state(struct throtl_data *td) -{ -} -#endif - bool __blk_throtl_bio(struct bio *bio) { struct request_queue *q = bdev_get_queue(bio->bi_bdev); @@ -2185,21 +1531,12 @@ bool __blk_throtl_bio(struct bio *bio) struct throtl_data *td = tg->td; rcu_read_lock(); - spin_lock_irq(&q->queue_lock); - - throtl_update_latency_buckets(td); - - blk_throtl_update_idletime(tg); - sq = &tg->service_queue; -again: while (true) { if (tg->last_low_overflow_time[rw] == 0) tg->last_low_overflow_time[rw] = jiffies; - throtl_downgrade_check(tg); - throtl_upgrade_check(tg); /* throtl is FIFO - if bios are already queued, should queue */ if (sq->nr_queued[rw]) break; @@ -2207,10 +1544,6 @@ bool __blk_throtl_bio(struct bio *bio) /* if above limits, break to queue */ if (!tg_may_dispatch(tg, bio, NULL)) { tg->last_low_overflow_time[rw] = jiffies; - if (throtl_can_upgrade(td, tg)) { - throtl_upgrade_state(td); - goto again; - } break; } @@ -2270,101 +1603,12 @@ bool __blk_throtl_bio(struct bio *bio) } out_unlock: -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW - if (throttled || !td->track_bio_latency) - bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY; -#endif spin_unlock_irq(&q->queue_lock); rcu_read_unlock(); return throttled; } -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -static void throtl_track_latency(struct throtl_data *td, sector_t size, - enum req_op op, unsigned long time) -{ - const bool rw = op_is_write(op); - struct latency_bucket *latency; - int index; - - if (!td || td->limit_index != LIMIT_LOW || - !(op == REQ_OP_READ || op == REQ_OP_WRITE) || - !blk_queue_nonrot(td->queue)) - return; - - index = request_bucket_index(size); - - latency = get_cpu_ptr(td->latency_buckets[rw]); - latency[index].total_latency += time; - latency[index].samples++; - put_cpu_ptr(td->latency_buckets[rw]); -} - -void blk_throtl_stat_add(struct request *rq, u64 time_ns) -{ - struct request_queue *q = rq->q; - struct throtl_data *td = q->td; - - throtl_track_latency(td, blk_rq_stats_sectors(rq), req_op(rq), - time_ns >> 10); -} - -void blk_throtl_bio_endio(struct bio *bio) -{ - struct blkcg_gq *blkg; - struct throtl_grp *tg; - u64 finish_time_ns; - unsigned long finish_time; - unsigned long start_time; - unsigned long lat; - int rw = bio_data_dir(bio); - - blkg = bio->bi_blkg; - if (!blkg) - return; - tg = blkg_to_tg(blkg); - if 
(!tg->td->limit_valid[LIMIT_LOW]) - return; - - finish_time_ns = blk_time_get_ns(); - tg->last_finish_time = finish_time_ns >> 10; - - start_time = bio_issue_time(&bio->bi_issue) >> 10; - finish_time = __bio_issue_time(finish_time_ns) >> 10; - if (!start_time || finish_time <= start_time) - return; - - lat = finish_time - start_time; - /* this is only for bio based driver */ - if (!(bio->bi_issue.value & BIO_ISSUE_THROTL_SKIP_LATENCY)) - throtl_track_latency(tg->td, bio_issue_size(&bio->bi_issue), - bio_op(bio), lat); - - if (tg->latency_target && lat >= tg->td->filtered_latency) { - int bucket; - unsigned int threshold; - - bucket = request_bucket_index(bio_issue_size(&bio->bi_issue)); - threshold = tg->td->avg_buckets[rw][bucket].latency + - tg->latency_target; - if (lat > threshold) - tg->bad_bio_cnt++; - /* - * Not race free, could get wrong count, which means cgroups - * will be throttled - */ - tg->bio_cnt++; - } - - if (time_after(jiffies, tg->bio_cnt_reset_time) || tg->bio_cnt > 1024) { - tg->bio_cnt_reset_time = tg->td->throtl_slice + jiffies; - tg->bio_cnt /= 2; - tg->bad_bio_cnt /= 2; - } -} -#endif - int blk_throtl_init(struct gendisk *disk) { struct request_queue *q = disk->queue; @@ -2374,19 +1618,6 @@ int blk_throtl_init(struct gendisk *disk) td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node); if (!td) return -ENOMEM; - td->latency_buckets[READ] = __alloc_percpu(sizeof(struct latency_bucket) * - LATENCY_BUCKET_SIZE, __alignof__(u64)); - if (!td->latency_buckets[READ]) { - kfree(td); - return -ENOMEM; - } - td->latency_buckets[WRITE] = __alloc_percpu(sizeof(struct latency_bucket) * - LATENCY_BUCKET_SIZE, __alignof__(u64)); - if (!td->latency_buckets[WRITE]) { - free_percpu(td->latency_buckets[READ]); - kfree(td); - return -ENOMEM; - } INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn); throtl_service_queue_init(&td->service_queue); @@ -2394,18 +1625,10 @@ int blk_throtl_init(struct gendisk *disk) q->td = td; td->queue = q; - td->limit_valid[LIMIT_MAX] = true; - td->limit_index = LIMIT_MAX; - td->low_upgrade_time = jiffies; - td->low_downgrade_time = jiffies; - /* activate policy */ ret = blkcg_activate_policy(disk, &blkcg_policy_throtl); - if (ret) { - free_percpu(td->latency_buckets[READ]); - free_percpu(td->latency_buckets[WRITE]); + if (ret) kfree(td); - } return ret; } @@ -2417,8 +1640,6 @@ void blk_throtl_exit(struct gendisk *disk) del_timer_sync(&q->td->service_queue.pending_timer); throtl_shutdown_wq(q); blkcg_deactivate_policy(disk, &blkcg_policy_throtl); - free_percpu(q->td->latency_buckets[READ]); - free_percpu(q->td->latency_buckets[WRITE]); kfree(q->td); } @@ -2426,58 +1647,18 @@ void blk_throtl_register(struct gendisk *disk) { struct request_queue *q = disk->queue; struct throtl_data *td; - int i; td = q->td; BUG_ON(!td); - if (blk_queue_nonrot(q)) { + if (blk_queue_nonrot(q)) td->throtl_slice = DFL_THROTL_SLICE_SSD; - td->filtered_latency = LATENCY_FILTERED_SSD; - } else { + else td->throtl_slice = DFL_THROTL_SLICE_HD; - td->filtered_latency = LATENCY_FILTERED_HD; - for (i = 0; i < LATENCY_BUCKET_SIZE; i++) { - td->avg_buckets[READ][i].latency = DFL_HD_BASELINE_LATENCY; - td->avg_buckets[WRITE][i].latency = DFL_HD_BASELINE_LATENCY; - } - } -#ifndef CONFIG_BLK_DEV_THROTTLING_LOW - /* if no low limit, use previous default */ - td->throtl_slice = DFL_THROTL_SLICE_HD; - -#else td->track_bio_latency = !queue_is_mq(q); if (!td->track_bio_latency) blk_stat_enable_accounting(q); -#endif -} - -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -ssize_t 
blk_throtl_sample_time_show(struct request_queue *q, char *page) -{ - if (!q->td) - return -EINVAL; - return sprintf(page, "%u\n", jiffies_to_msecs(q->td->throtl_slice)); -} - -ssize_t blk_throtl_sample_time_store(struct request_queue *q, - const char *page, size_t count) -{ - unsigned long v; - unsigned long t; - - if (!q->td) - return -EINVAL; - if (kstrtoul(page, 10, &v)) - return -EINVAL; - t = msecs_to_jiffies(v); - if (t == 0 || t > MAX_THROTL_SLICE) - return -EINVAL; - q->td->throtl_slice = t; - return count; } -#endif static int __init throtl_init(void) { diff --git a/block/blk-throttle.h b/block/blk-throttle.h index bffbc9cfc8ab..32503fd83a84 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -58,12 +58,6 @@ enum tg_state_flags { THROTL_TG_CANCELING = 1 << 2, /* starts to cancel bio */ }; -enum { - LIMIT_LOW, - LIMIT_MAX, - LIMIT_CNT, -}; - struct throtl_grp { /* must be the first member */ struct blkg_policy_data pd; @@ -102,14 +96,14 @@ struct throtl_grp { bool has_rules_iops[2]; /* internally used bytes per second rate limits */ - uint64_t bps[2][LIMIT_CNT]; + uint64_t bps[2]; /* user configured bps limits */ - uint64_t bps_conf[2][LIMIT_CNT]; + uint64_t bps_conf[2]; /* internally used IOPS limits */ - unsigned int iops[2][LIMIT_CNT]; + unsigned int iops[2]; /* user configured IOPS limits */ - unsigned int iops_conf[2][LIMIT_CNT]; + unsigned int iops_conf[2]; /* Number of bytes dispatched in current slice */ uint64_t bytes_disp[2]; @@ -132,22 +126,10 @@ struct throtl_grp { unsigned long last_check_time; - unsigned long latency_target; /* us */ - unsigned long latency_target_conf; /* us */ /* When did we start a new slice */ unsigned long slice_start[2]; unsigned long slice_end[2]; - unsigned long last_finish_time; /* ns / 1024 */ - unsigned long checked_last_finish_time; /* ns / 1024 */ - unsigned long avg_idletime; /* ns / 1024 */ - unsigned long idletime_threshold; /* us */ - unsigned long idletime_threshold_conf; /* us */ - - unsigned int bio_cnt; /* total bios */ - unsigned int bad_bio_cnt; /* bios exceeding latency threshold */ - unsigned long bio_cnt_reset_time; - struct blkg_rwstat stat_bytes; struct blkg_rwstat stat_ios; }; diff --git a/block/blk.h b/block/blk.h index 5cac4e29ae17..47960bf31543 100644 --- a/block/blk.h +++ b/block/blk.h @@ -379,17 +379,6 @@ static inline void ioc_clear_queue(struct request_queue *q) } #endif /* CONFIG_BLK_ICQ */ -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW -extern ssize_t blk_throtl_sample_time_show(struct request_queue *q, char *page); -extern ssize_t blk_throtl_sample_time_store(struct request_queue *q, - const char *page, size_t count); -extern void blk_throtl_bio_endio(struct bio *bio); -extern void blk_throtl_stat_add(struct request *rq, u64 time); -#else -static inline void blk_throtl_bio_endio(struct bio *bio) { } -static inline void blk_throtl_stat_add(struct request *rq, u64 time) { } -#endif - struct bio *__blk_queue_bounce(struct bio *bio, struct request_queue *q); static inline bool blk_queue_may_bounce(struct request_queue *q) From patchwork Sat Apr 6 08:00:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619689 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B833F22EF8; Sat, 6 Apr 2024 08:09:13 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390955; cv=none; b=SIC6imFsN3jBdC97M4X70HKAlLHvSnXgdSS8GD9Mmzv0u2gYip4DhZYUW11apoS1n2iqO00FFuipFIX0sxYFMliWxHyml0UBZdKmSEWwRJ3rMhsN5dSZc6boifmGE3OL3CLyR4JPz5P0wKlcETGfmslFxrGG4fxXlklp6PKup00= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390955; c=relaxed/simple; bh=wTUt1zlSK0M746ZO+xcUeqwViG+nxeL9MF8pN5h8K7g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=b0xTqTnoUFmPyUWFoVykcMYtupZcwowPuLZ3VxhPH3tWMl3kVch1jo8Z7yIJkrBKrV7rXKn1stLKstuXgn98uICZI+LNuoJzXftU6NOcQ/iVuWFGxr3xZBGz/QlmPw5q4r6NX3/0nfS4dWuE9SdlmJFT+iqgmwP/R6CgSeDcWk8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg64v9fz4f3jqt; Sat, 6 Apr 2024 16:09:06 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id F3D171A016E; Sat, 6 Apr 2024 16:09:10 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S6; Sat, 06 Apr 2024 16:09:10 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 2/6] blk-throttle: delay initialization until configuration Date: Sat, 6 Apr 2024 16:00:55 +0800 Message-Id: <20240406080059.2248314-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S6 X-Coremail-Antispam: 1UD129KBjvJXoWxtF18Zr4kCF13GrWfXr4Utwb_yoWxuw43pF W3ur45t3y0qrsF9F45tr45JFW3Kws7K34fC393GrWayr42vwn0qF1qvr18ZFZ5AFs7uF43 Ar4DJrs8KF18u3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jryl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ 
From: Yu Kuai Other cgroup policies like bfq and iocost are lazily initialized when they are first configured for the device, but blk-throttle is initialized unconditionally from blkcg_init_disk(). Delay initialization of blk-throttle as well, to save some CPU and memory overhead if it's not configured. Note that once it's initialized, it can't be destroyed until disk removal, even if it's disabled. Following patches will support destroying blk-throttle after it's disabled. Signed-off-by: Yu Kuai --- block/blk-cgroup.c | 6 --- block/blk-sysfs.c | 1 - block/blk-throttle.c | 114 +++++++++++++++++++++++++++---------------- block/blk-throttle.h | 10 ++-- 4 files changed, 78 insertions(+), 53 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bdbb557feb5a..b5e626a16758 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1441,14 +1441,8 @@ int blkcg_init_disk(struct gendisk *disk) if (ret) goto err_destroy_all; - ret = blk_throtl_init(disk); - if (ret) - goto err_ioprio_exit; - return 0; -err_ioprio_exit: - blk_ioprio_exit(disk); err_destroy_all: blkg_destroy_all(disk); return ret; diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 674c5c602364..1e2a0b18360c 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -807,7 +807,6 @@ int blk_register_queue(struct gendisk *disk) blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); wbt_enable_default(disk); - blk_throtl_register(disk); /* Now everything is ready and send out KOBJ_ADD uevent */ kobject_uevent(&disk->queue_kobj, KOBJ_ADD); diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 1e45e48564f4..acb2758f45ef 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -1213,6 +1213,53 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) } } +static int blk_throtl_init(struct gendisk *disk) +{ + struct request_queue *q = disk->queue; + struct throtl_data *td; + int ret; + + td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node); + if (!td) + return -ENOMEM; + + INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn); + throtl_service_queue_init(&td->service_queue); + + /* + * freeze queue before setting q->td, so that IO path won't see + * q->td while policy is not activated yet.
+ */ + blk_mq_freeze_queue(disk->queue); + blk_mq_quiesce_queue(disk->queue); + + q->td = td; + td->queue = q; + + /* activate policy */ + ret = blkcg_activate_policy(disk, &blkcg_policy_throtl); + if (ret) { + q->td = NULL; + kfree(td); + goto out; + } + + if (blk_queue_nonrot(q)) + td->throtl_slice = DFL_THROTL_SLICE_SSD; + else + td->throtl_slice = DFL_THROTL_SLICE_HD; + td->track_bio_latency = !queue_is_mq(q); + if (!td->track_bio_latency) + blk_stat_enable_accounting(q); + +out: + blk_mq_unquiesce_queue(disk->queue); + blk_mq_unfreeze_queue(disk->queue); + + return ret; +} + + static ssize_t tg_set_conf(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off, bool is_u64) { @@ -1224,6 +1271,16 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of, blkg_conf_init(&ctx, buf); + ret = blkg_conf_open_bdev(&ctx); + if (ret) + goto out_finish; + + if (!ctx.bdev->bd_queue->td) { + ret = blk_throtl_init(ctx.bdev->bd_disk); + if (ret) + goto out_finish; + } + ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, &ctx); if (ret) goto out_finish; @@ -1388,6 +1445,16 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, blkg_conf_init(&ctx, buf); + ret = blkg_conf_open_bdev(&ctx); + if (ret) + goto out_finish; + + if (!ctx.bdev->bd_queue->td) { + ret = blk_throtl_init(ctx.bdev->bd_disk); + if (ret) + goto out_finish; + } + ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, &ctx); if (ret) goto out_finish; @@ -1480,6 +1547,9 @@ void blk_throtl_cancel_bios(struct gendisk *disk) struct cgroup_subsys_state *pos_css; struct blkcg_gq *blkg; + if (!q->td) + return; + spin_lock_irq(&q->queue_lock); /* * queue_lock is held, rcu lock is not needed here technically. @@ -1609,57 +1679,19 @@ bool __blk_throtl_bio(struct bio *bio) return throttled; } -int blk_throtl_init(struct gendisk *disk) -{ - struct request_queue *q = disk->queue; - struct throtl_data *td; - int ret; - - td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node); - if (!td) - return -ENOMEM; - - INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn); - throtl_service_queue_init(&td->service_queue); - - q->td = td; - td->queue = q; - - /* activate policy */ - ret = blkcg_activate_policy(disk, &blkcg_policy_throtl); - if (ret) - kfree(td); - return ret; -} - void blk_throtl_exit(struct gendisk *disk) { struct request_queue *q = disk->queue; - BUG_ON(!q->td); + if (!q->td) + return; + del_timer_sync(&q->td->service_queue.pending_timer); throtl_shutdown_wq(q); blkcg_deactivate_policy(disk, &blkcg_policy_throtl); kfree(q->td); } -void blk_throtl_register(struct gendisk *disk) -{ - struct request_queue *q = disk->queue; - struct throtl_data *td; - - td = q->td; - BUG_ON(!td); - - if (blk_queue_nonrot(q)) - td->throtl_slice = DFL_THROTL_SLICE_SSD; - else - td->throtl_slice = DFL_THROTL_SLICE_HD; - td->track_bio_latency = !queue_is_mq(q); - if (!td->track_bio_latency) - blk_stat_enable_accounting(q); -} - static int __init throtl_init(void) { kthrotld_workqueue = alloc_workqueue("kthrotld", WQ_MEM_RECLAIM, 0); diff --git a/block/blk-throttle.h b/block/blk-throttle.h index 32503fd83a84..e7f5562a32e9 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -150,23 +150,23 @@ static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg) * Internal throttling interface */ #ifndef CONFIG_BLK_DEV_THROTTLING -static inline int blk_throtl_init(struct gendisk *disk) { return 0; } static inline void blk_throtl_exit(struct gendisk *disk) { } -static inline void blk_throtl_register(struct gendisk *disk) { } static inline bool 
blk_throtl_bio(struct bio *bio) { return false; } static inline void blk_throtl_cancel_bios(struct gendisk *disk) { } #else /* CONFIG_BLK_DEV_THROTTLING */ -int blk_throtl_init(struct gendisk *disk); void blk_throtl_exit(struct gendisk *disk); -void blk_throtl_register(struct gendisk *disk); bool __blk_throtl_bio(struct bio *bio); void blk_throtl_cancel_bios(struct gendisk *disk); static inline bool blk_should_throtl(struct bio *bio) { - struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg); + struct throtl_grp *tg; int rw = bio_data_dir(bio); + if (!bio->bi_bdev->bd_queue->td) + return false; + + tg = blkg_to_tg(bio->bi_blkg); if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) { if (!bio_flagged(bio, BIO_CGROUP_ACCT)) { bio_set_flag(bio, BIO_CGROUP_ACCT); From patchwork Sat Apr 6 08:00:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619691 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98A5022F02; Sat, 6 Apr 2024 08:09:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390956; cv=none; b=EPkhY8v6rzbdp4pwrtBWwKRj3xLvd0wOSB1r72qs96kfpPfEJGgrobqQbRsYvBireicl1UwwK22DDQfEsDDZekxUmnBeYbbF5WJlMU+TF9NpM5lSvN0V5//sjm8/fGl8XCK3RQmlOSl5z5WdEUdFuc2LHa/muYBWJqfYFlnoUpM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390956; c=relaxed/simple; bh=RGeLpQq1tNqBVU02itqj6kS9o7uyCx/gUkBGHq49FmQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=E2bmnZnw6XL5Gp8Or9SnYmybqvFAK7I/ttnQNnXliUq2OZNt6VRZ2A9fNZo6fVQjRQMtMfAaQ4DJic/WZB2McHQ1qEbCFDf1dI4/wpLjHi2Mo61x7HPyadWOipqnzMxDQys6N1nbTFgRkxY6bNow+rwhZGiABHNZgwLZXqNz9v4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg316Nlz4f3lVx; Sat, 6 Apr 2024 16:09:03 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 9C4B91A016E; Sat, 6 Apr 2024 16:09:11 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S7; Sat, 06 Apr 2024 16:09:11 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 3/6] blk-throttle: expand tg_conf_updated() to return if any tg has rules Date: Sat, 6 Apr 2024 16:00:56 +0800 Message-Id: <20240406080059.2248314-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: 
<20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S7 X-Coremail-Antispam: 1UD129KBjvJXoW7Kr47Kr4xtry3ur17GryDJrb_yoW5Jr48pF yUAa1UJw4DZFs09rsxJr1qgFWFqan7Zryaq393Ww4rtr4aqwnYgFn0vryruayrAF93uF4x ZrsrtrWYkF109FDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JrWl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8Jw CI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjfUojjgUUUU U X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai There are no functional changes yet, prepare to destroy blk-throttle when it's disabled. Signed-off-by: Yu Kuai --- block/blk-throttle.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index acb2758f45ef..b371442131fe 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -320,9 +320,10 @@ static void throtl_pd_init(struct blkg_policy_data *pd) * This doesn't require walking up to the top of the hierarchy as the * parent's has_rules[] is guaranteed to be correct. 
*/ -static void tg_update_has_rules(struct throtl_grp *tg) +static bool tg_update_has_rules(struct throtl_grp *tg) { struct throtl_grp *parent_tg = sq_to_tg(tg->service_queue.parent_sq); + bool has_rules = false; int rw; for (rw = READ; rw <= WRITE; rw++) { @@ -332,7 +333,11 @@ static void tg_update_has_rules(struct throtl_grp *tg) tg->has_rules_bps[rw] = (parent_tg && parent_tg->has_rules_bps[rw]) || tg_bps_limit(tg, rw) != U64_MAX; + if (tg->has_rules_iops[rw] || tg->has_rules_bps[rw]) + has_rules = true; } + + return has_rules; } static void throtl_pd_online(struct blkg_policy_data *pd) @@ -1163,11 +1168,12 @@ static int tg_print_conf_uint(struct seq_file *sf, void *v) return 0; } -static void tg_conf_updated(struct throtl_grp *tg, bool global) +static bool tg_conf_updated(struct throtl_grp *tg, bool global) { struct throtl_service_queue *sq = &tg->service_queue; struct cgroup_subsys_state *pos_css; struct blkcg_gq *blkg; + bool has_rules = false; throtl_log(&tg->service_queue, "limit change rbps=%llu wbps=%llu riops=%u wiops=%u", @@ -1187,7 +1193,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) struct throtl_grp *this_tg = blkg_to_tg(blkg); struct throtl_grp *parent_tg; - tg_update_has_rules(this_tg); + if (tg_update_has_rules(this_tg)) + has_rules = true; /* ignore root/second level */ if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent || !blkg->parent->parent) @@ -1211,6 +1218,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) tg_update_disptime(tg); throtl_schedule_next_dispatch(sq->parent_sq, true); } + + return has_rules; } static int blk_throtl_init(struct gendisk *disk) From patchwork Sat Apr 6 08:00:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619692 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E51622F0F; Sat, 6 Apr 2024 08:09:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390957; cv=none; b=Xzxws0X6anV3kXNIYi6dWujH4Lfy8kAI54ZCP3WCOGxcYAMvB7LQPyPdB+xdYe4D31JaUHdYUsd2nvpKcFbZoRlvlF9SDVzKfRuHto80gp8lrjDZLlBt0WrezOc413jmIUPZp9uSm9PDV45TLtDt0y/1oRREUBYVClDNK4wPBU8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390957; c=relaxed/simple; bh=iCfeyhGX9RMMXq590xrJBnszC/V6SW+D0j93ZZWz8N4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=AdauP58/t3PIL9QLw3nIab9zLPmZOhL+tOvUH/6UqhGTSO3OO8J27PStcvXe7pb+CKk/B6ataRTprW/c3/oWBXS2zQOwks/dxe2+dtQ9sjF+LatYx/54HaAywMZnYhdE3OWNTaZ6H5jwHXRdBW4jsK0NF+RZU94M9UMtr3fquU0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg76qSmz4f3k6B; Sat, 6 Apr 2024 16:09:07 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 
429DB1A0172; Sat, 6 Apr 2024 16:09:12 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S8; Sat, 06 Apr 2024 16:09:11 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 4/6] blk-cgroup: add a new helper blkg_conf_exit_blkg() Date: Sat, 6 Apr 2024 16:00:57 +0800 Message-Id: <20240406080059.2248314-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S8 X-Coremail-Antispam: 1UD129KBjvJXoW7Cr43GF1ktFyDCr1kWw4kJFb_yoW8Cr45p3 y2qr45K340gFnxZay3t3W7uryft3ykKry3J3yfGrWakr13try7uFn8AF40kFWrZFs7Ar45 XrWFqrW0kr18CrUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr 0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai The new helper just release the spin_lock 'queue_lock' and keep the mutex 'rq_qos_mutex' held, to allow blk_throttle and other cgroup policies to be destroyed when they are disabled by user. Signed-off-by: Yu Kuai --- block/blk-cgroup.c | 17 +++++++++++++++++ block/blk-cgroup.h | 1 + 2 files changed, 18 insertions(+) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index b5e626a16758..ada9258f78bc 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -949,6 +949,23 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx) } EXPORT_SYMBOL_GPL(blkg_conf_exit); +/* + * blkg_conf_exit_blkg - like blkg_conf_exit() but only release queue_lock. + * @ctx: blkg_conf_ctx initialized with blkg_conf_init() + * + * This function must be called on all blkg_conf_ctx's initialized with + * blkg_conf_init(). 
+ */ +void blkg_conf_exit_blkg(struct blkg_conf_ctx *ctx) + __releases(&ctx->bdev->bd_queue->queue_lock) +{ + if (ctx->blkg) { + spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock); + ctx->blkg = NULL; + } +} +EXPORT_SYMBOL_GPL(blkg_conf_exit_blkg); + static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src) { int i; diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index 78b74106bf10..009f96191e71 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -219,6 +219,7 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx); int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, struct blkg_conf_ctx *ctx); void blkg_conf_exit(struct blkg_conf_ctx *ctx); +void blkg_conf_exit_blkg(struct blkg_conf_ctx *ctx); /** * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg From patchwork Sat Apr 6 08:00:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619694 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA6001EB40; Sat, 6 Apr 2024 08:09:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390958; cv=none; b=Phn7jkf0/H2VoAvg0dG9wiOOd1uEoZGKiMus81JjZw8P4YJboedNOoiicSOA1eqiCzjEkd0lPeVIGk6lvKbVfVdnfF/K6C7Byjd0Ta6rZbr0g1GifiyXMXUV4n0LkMIr04vAqUys9YwTdILjC1p83N+v5lvgj6STyTsOr72Oxks= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390958; c=relaxed/simple; bh=y/WrU/OEcl5D+L2Pl2z0VOhvYq4DtNWAPdMONi5KRaM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=PXsZfTX4kr/ZsuNSpVry3Ta24enQpzCYfqNyc+BqIftev1QOAytjwvO+zDX+cIF+1ZIJ+KJVx6AgI56UZ4W9Dp5udj0Tw+ayTxbx9Dd/KGB9aWl0jeHu25uRpauVlNN9iH6oJvLtth8mVZWm54sLULBrVagZ571/u7VhnE66beo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg433H9z4f3lWD; Sat, 6 Apr 2024 16:09:04 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id E070B1A0568; Sat, 6 Apr 2024 16:09:12 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S9; Sat, 06 Apr 2024 16:09:12 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 5/6] blk-throttle: support to destroy throtl_data when blk-throttle is disabled Date: Sat, 6 Apr 2024 16:00:58 +0800 Message-Id: <20240406080059.2248314-6-yukuai1@huaweicloud.com> 
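Before the next patch, it may help to see how the helpers introduced so far are expected to compose on the configuration-write path: the lazy blk_throtl_init() from patch 2, the has-rules return value of tg_conf_updated() from patch 3, the blkg_conf_exit_blkg() helper added just above, and the blk_throtl_destroy() introduced next. The following is only a rough composite sketch, not patch text: the handler name is invented, limit parsing and most error handling are trimmed, and everything else mirrors the hunks in this series.

    static ssize_t tg_write_limit_sketch(struct kernfs_open_file *of,
                                         char *buf, size_t nbytes, loff_t off)
    {
            struct blkcg *blkcg = css_to_blkcg(of_css(of));
            struct blkg_conf_ctx ctx;
            struct throtl_grp *tg;
            int ret;

            blkg_conf_init(&ctx, buf);

            ret = blkg_conf_open_bdev(&ctx);
            if (ret)
                    goto out;

            /* first configuration for this disk: allocate throtl_data lazily */
            if (!ctx.bdev->bd_queue->td) {
                    ret = blk_throtl_init(ctx.bdev->bd_disk);
                    if (ret)
                            goto out;
            }

            ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, &ctx);
            if (ret)
                    goto out;

            tg = blkg_to_tg(ctx.blkg);
            /* ... parse buf and update tg->bps[] / tg->iops[] here ... */

            /* drop queue_lock early, keep rq_qos_mutex held */
            blkg_conf_exit_blkg(&ctx);

            /* no group has rules left: tear throtl_data down on the spot */
            if (!tg_conf_updated(tg, false))
                    blk_throtl_destroy(ctx.bdev->bd_disk);
            ret = 0;
    out:
            blkg_conf_exit(&ctx);
            return ret ?: nbytes;
    }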
X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S9 X-Coremail-Antispam: 1UD129KBjvJXoWxZr1DKFyfury3tF1rCF4rXwb_yoWrGrWUpF W5Aw45t3y0qrsFgry3tr1UJFWakws7Jry7J393G3yftF42v34DWF1kZFy0vFWfAF97WF43 CryktrW7GF18GrJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr 0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQ SdkUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Currently once blk-throttle is enabled, it can't be destroyed until disk removal, even it's disabled. Also prepare to support building it as kernel module. Signed-off-by: Yu Kuai --- block/blk-throttle.c | 65 +++++++++++++++++++++++++++++--------------- 1 file changed, 43 insertions(+), 22 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index b371442131fe..5c16be07a594 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -28,6 +28,7 @@ /* A workqueue to queue throttle related work */ static struct workqueue_struct *kthrotld_workqueue; +static DECLARE_WAIT_QUEUE_HEAD(destroy_wait); #define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node) @@ -906,6 +907,11 @@ static void start_parent_slice_with_credit(struct throtl_grp *child_tg, } +static bool td_has_io(struct throtl_data *td) +{ + return td->nr_queued[READ] + td->nr_queued[WRITE] != 0; +} + static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw) { struct throtl_service_queue *sq = &tg->service_queue; @@ -941,6 +947,8 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw) &parent_sq->queued[rw]); BUG_ON(tg->td->nr_queued[rw] <= 0); tg->td->nr_queued[rw]--; + if (!td_has_io(tg->td)) + wake_up(&destroy_wait); } throtl_trim_slice(tg, rw); @@ -1268,6 +1276,31 @@ static int blk_throtl_init(struct gendisk *disk) return ret; } +void blk_throtl_exit(struct gendisk *disk) +{ + struct request_queue *q = disk->queue; + + if (!q->td) + return; + + del_timer_sync(&q->td->service_queue.pending_timer); + cancel_work_sync(&q->td->dispatch_work); + blkcg_deactivate_policy(disk, &blkcg_policy_throtl); + kfree(q->td); + q->td = NULL; +} + +static void blk_throtl_destroy(struct gendisk *disk) +{ + struct throtl_data *td = disk->queue->td; + + /* + * There are no rules, all throttled BIO should be dispatched + * immediately. 
+ */ + wait_event(destroy_wait, !td_has_io(td)); + blk_throtl_exit(disk); +} static ssize_t tg_set_conf(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off, bool is_u64) @@ -1308,7 +1341,11 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of, else *(unsigned int *)((void *)tg + of_cft(of)->private) = v; - tg_conf_updated(tg, false); + blkg_conf_exit_blkg(&ctx); + + if (!tg_conf_updated(tg, false)) + blk_throtl_destroy(ctx.bdev->bd_disk); + ret = 0; out_finish: blkg_conf_exit(&ctx); @@ -1516,7 +1553,11 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, tg->iops[READ] = v[2]; tg->iops[WRITE] = v[3]; - tg_conf_updated(tg, false); + blkg_conf_exit_blkg(&ctx); + + if (!tg_conf_updated(tg, false)) + blk_throtl_destroy(ctx.bdev->bd_disk); + ret = 0; out_finish: blkg_conf_exit(&ctx); @@ -1533,13 +1574,6 @@ static struct cftype throtl_files[] = { { } /* terminate */ }; -static void throtl_shutdown_wq(struct request_queue *q) -{ - struct throtl_data *td = q->td; - - cancel_work_sync(&td->dispatch_work); -} - struct blkcg_policy blkcg_policy_throtl = { .dfl_cftypes = throtl_files, .legacy_cftypes = throtl_legacy_files, @@ -1688,19 +1722,6 @@ bool __blk_throtl_bio(struct bio *bio) return throttled; } -void blk_throtl_exit(struct gendisk *disk) -{ - struct request_queue *q = disk->queue; - - if (!q->td) - return; - - del_timer_sync(&q->td->service_queue.pending_timer); - throtl_shutdown_wq(q); - blkcg_deactivate_policy(disk, &blkcg_policy_throtl); - kfree(q->td); -} - static int __init throtl_init(void) { kthrotld_workqueue = alloc_workqueue("kthrotld", WQ_MEM_RECLAIM, 0); From patchwork Sat Apr 6 08:00:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 13619695 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 86A1A2837D; Sat, 6 Apr 2024 08:09:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390959; cv=none; b=t6wrR18X+osJ3AFM1DdF3otSpMKzdN1lx35HobiiedoNVN/JaPA9Qu21tlIIUHima5BDl9cwaYKgsPe4DEWG0nRakp7Sm7KONFo1uqvN28rRhceg1jiqOHsbAYnuqbrYOGsocKFAfBQ/M5ASJ2mlzYVXi+8YrbXlsWscNa3S9qs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712390959; c=relaxed/simple; bh=WJeIAOXxuVPXb/26+piie2EBBu53zikCe5YpY+pHvKY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hLhOXy2e4nhFxUS8tcQOc9ByG6NbRnqOgIvh9Tapf7QQRIeceVtfJD0o64uhkLr8RQDebF2WOuiS8eKpG+GXVJLrXgWUrtevjkWGFVVHPfNmJhoSrGoGst11/aZ4qbrKZpYyE0/mmeFc/4zmLvJymky+kFwoN3fIL/BsI3unGP0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VBSg91x42z4f3jkj; Sat, 6 Apr 2024 16:09:09 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 8F53C1A0BD3; Sat, 6 Apr 2024 16:09:13 +0800 
(CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgDHlxAjAxFmnZTkJA--.2125S10; Sat, 06 Apr 2024 16:09:13 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, chenhuacai@kernel.org, tj@kernel.org, josef@toxicpanda.com, jhs@mojatatu.com, svenjoac@gmx.de, raven@themaw.net, pctammela@mojatatu.com, yukuai3@huawei.com, qde@naccy.de, zhaotianrui@loongson.cn Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, cgroups@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 6/6] blk-throtl: switch to use rq_qos Date: Sat, 6 Apr 2024 16:00:59 +0800 Message-Id: <20240406080059.2248314-7-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240406080059.2248314-1-yukuai1@huaweicloud.com> References: <20240406080059.2248314-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDHlxAjAxFmnZTkJA--.2125S10 X-Coremail-Antispam: 1UD129KBjvAXoWfuF4xWr45WF1kKr4kJFyUAwb_yoW8urW3Co WfJFZ7XFn5K348trWYq3yxAr1fuw4vqr15ZrnYgrn5Z3W8Aa4jq347KFy5GFyfWF1IyF4x AF1jqw4FvF4xGrZ5n29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUO07AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2 ka0xkIwI1l42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG 67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MI IYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E 14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r 1j6r4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjfU OBTYUUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai To avoid exposing blk-throttle internal implementation to general block layer. 
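Seen from the rq_qos side, the conversion below boils down to embedding a struct rq_qos in throtl_data, registering a ->throttle_bio() hook under the new RQ_QOS_THROTTLE id, and letting submit_bio_noacct() reach blk-throttle only through the generic rq_qos_throttle_bio() helper. A condensed sketch of that shape follows; it assumes the usual blk-rq-qos.h / blk-cgroup.h context, and the hook bodies are reduced to placeholders, whereas the real versions in the diff charge or queue the bio and free throtl_data.

    /* throtl_data now carries the rq_qos handle instead of hanging off q->td */
    struct throtl_data {
            struct rq_qos rqos;             /* registered as RQ_QOS_THROTTLE */
            /* service queue, nr_queued[], limits, ... (unchanged) */
    };

    static struct throtl_data *rqos_to_td(struct rq_qos *rqos)
    {
            return container_of(rqos, struct throtl_data, rqos);
    }

    /* placeholder bodies; see the diff for the real implementations */
    static bool blk_throtl_bio(struct rq_qos *rqos, struct bio *bio)
    {
            return false;                   /* true means "bio was queued" */
    }

    static void blk_throtl_exit(struct rq_qos *rqos)
    {
    }

    static const struct rq_qos_ops throtl_rqos_ops = {
            .throttle_bio   = blk_throtl_bio,
            .exit           = blk_throtl_exit,
    };

    /*
     * blk_throtl_init() registers the policy with
     *         rq_qos_add(&td->rqos, disk, RQ_QOS_THROTTLE, &throtl_rqos_ops);
     * and submit_bio_noacct() reaches it only through
     *         if (rq_qos_throttle_bio(q, bio))
     *                 return;
     * so the block layer no longer needs blk-throttle.h or q->td.
     */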
Signed-off-by: Yu Kuai --- block/blk-cgroup.c | 2 - block/blk-core.c | 4 +- block/blk-merge.c | 2 +- block/blk-mq-debugfs.c | 2 + block/blk-rq-qos.c | 13 ++ block/blk-rq-qos.h | 11 ++ block/blk-sysfs.c | 1 - block/blk-throttle.c | 387 +++++++++++++++++++++++------------------ block/blk-throttle.h | 50 ------ block/genhd.c | 3 - include/linux/blkdev.h | 4 - 11 files changed, 246 insertions(+), 233 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index ada9258f78bc..0eede352ece1 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -32,7 +32,6 @@ #include "blk.h" #include "blk-cgroup.h" #include "blk-ioprio.h" -#include "blk-throttle.h" static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu); @@ -1473,7 +1472,6 @@ int blkcg_init_disk(struct gendisk *disk) void blkcg_exit_disk(struct gendisk *disk) { blkg_destroy_all(disk); - blk_throtl_exit(disk); } static void blkcg_exit(struct task_struct *tsk) diff --git a/block/blk-core.c b/block/blk-core.c index a16b5abdbbf5..27ccc55bac06 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -48,7 +48,7 @@ #include "blk-mq-sched.h" #include "blk-pm.h" #include "blk-cgroup.h" -#include "blk-throttle.h" +#include "blk-rq-qos.h" #include "blk-ioprio.h" struct dentry *blk_debugfs_root; @@ -832,7 +832,7 @@ void submit_bio_noacct(struct bio *bio) goto not_supported; } - if (blk_throtl_bio(bio)) + if (rq_qos_throttle_bio(q, bio)) return; submit_bio_noacct_nocheck(bio); return; diff --git a/block/blk-merge.c b/block/blk-merge.c index 2a06fd33039d..eb08b2d91ec5 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -14,9 +14,9 @@ #include #include "blk.h" +#include "blk-cgroup.h" #include "blk-mq-sched.h" #include "blk-rq-qos.h" -#include "blk-throttle.h" static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv) { diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 94668e72ab09..b7613b811d86 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -756,6 +756,8 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q) static const char *rq_qos_id_to_name(enum rq_qos_id id) { switch (id) { + case RQ_QOS_THROTTLE: + return "throttle"; case RQ_QOS_WBT: return "wbt"; case RQ_QOS_LATENCY: diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index dd7310c94713..fc2fc4052708 100644 --- a/block/blk-rq-qos.c +++ b/block/blk-rq-qos.c @@ -59,6 +59,19 @@ void __rq_qos_requeue(struct rq_qos *rqos, struct request *rq) } while (rqos); } +bool __rq_qos_throttle_bio(struct rq_qos *rqos, struct bio *bio) +{ + do { + if (rqos->ops->throttle_bio && + rqos->ops->throttle_bio(rqos, bio)) + return true; + + rqos = rqos->next; + } while (rqos); + + return false; +} + void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio) { do { diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h index 37245c97ee61..03d8daab8ed6 100644 --- a/block/blk-rq-qos.h +++ b/block/blk-rq-qos.h @@ -14,6 +14,7 @@ struct blk_mq_debugfs_attr; enum rq_qos_id { + RQ_QOS_THROTTLE, RQ_QOS_WBT, RQ_QOS_LATENCY, RQ_QOS_COST, @@ -35,6 +36,7 @@ struct rq_qos { }; struct rq_qos_ops { + bool (*throttle_bio)(struct rq_qos *, struct bio *); void (*throttle)(struct rq_qos *, struct bio *); void (*track)(struct rq_qos *, struct request *, struct bio *); void (*merge)(struct rq_qos *, struct request *, struct bio *); @@ -104,6 +106,7 @@ void __rq_qos_cleanup(struct rq_qos *rqos, struct bio *bio); void __rq_qos_done(struct rq_qos *rqos, struct request *rq); void __rq_qos_issue(struct rq_qos *rqos, struct request *rq); void 
__rq_qos_requeue(struct rq_qos *rqos, struct request *rq); +bool __rq_qos_throttle_bio(struct rq_qos *rqos, struct bio *bio); void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio); void __rq_qos_track(struct rq_qos *rqos, struct request *rq, struct bio *bio); void __rq_qos_merge(struct rq_qos *rqos, struct request *rq, struct bio *bio); @@ -144,6 +147,14 @@ static inline void rq_qos_done_bio(struct bio *bio) } } +static inline bool rq_qos_throttle_bio(struct request_queue *q, struct bio *bio) +{ + if (q->rq_qos) + return __rq_qos_throttle_bio(q->rq_qos, bio); + + return false; +} + static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio) { if (q->rq_qos) { diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 1e2a0b18360c..7f5ece4b8f9e 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -18,7 +18,6 @@ #include "blk-rq-qos.h" #include "blk-wbt.h" #include "blk-cgroup.h" -#include "blk-throttle.h" struct queue_sysfs_entry { struct attribute attr; diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 5c16be07a594..b1ff0640a39a 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -11,6 +11,7 @@ #include #include #include "blk.h" +#include "blk-rq-qos.h" #include "blk-cgroup-rwstat.h" #include "blk-stat.h" #include "blk-throttle.h" @@ -45,8 +46,8 @@ struct avg_latency_bucket { bool valid; }; -struct throtl_data -{ +struct throtl_data { + struct rq_qos rqos; /* service tree for active throtl groups */ struct throtl_service_queue service_queue; @@ -65,6 +66,19 @@ struct throtl_data static void throtl_pending_timer_fn(struct timer_list *t); +static struct throtl_data *rqos_to_td(struct rq_qos *rqos) +{ + if (!rqos) + return NULL; + + return container_of(rqos, struct throtl_data, rqos); +} + +static struct throtl_data *q_to_td(struct request_queue *q) +{ + return rqos_to_td(rq_qos_id(q, RQ_QOS_THROTTLE)); +} + static inline struct blkcg_gq *tg_to_blkg(struct throtl_grp *tg) { return pd_to_blkg(&tg->pd); @@ -293,7 +307,7 @@ static void throtl_pd_init(struct blkg_policy_data *pd) { struct throtl_grp *tg = pd_to_tg(pd); struct blkcg_gq *blkg = tg_to_blkg(tg); - struct throtl_data *td = blkg->q->td; + struct throtl_data *td = q_to_td(blkg->q); struct throtl_service_queue *sq = &tg->service_queue; /* @@ -1230,6 +1244,192 @@ static bool tg_conf_updated(struct throtl_grp *tg, bool global) return has_rules; } +static void blk_throtl_cancel_bios(struct request_queue *q) +{ + struct cgroup_subsys_state *pos_css; + struct blkcg_gq *blkg; + + spin_lock_irq(&q->queue_lock); + /* + * queue_lock is held, rcu lock is not needed here technically. + * However, rcu lock is still held to emphasize that following + * path need RCU protection and to prevent warning from lockdep. + */ + rcu_read_lock(); + blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) { + struct throtl_grp *tg = blkg_to_tg(blkg); + struct throtl_service_queue *sq = &tg->service_queue; + + /* + * Set the flag to make sure throtl_pending_timer_fn() won't + * stop until all throttled bios are dispatched. + */ + tg->flags |= THROTL_TG_CANCELING; + + /* + * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup + * will be inserted to service queue without THROTL_TG_PENDING + * set in tg_update_disptime below. Then IO dispatched from + * child in tg_dispatch_one_bio will trigger double insertion + * and corrupt the tree. 
+ */ + if (!(tg->flags & THROTL_TG_PENDING)) + continue; + + /* + * Update disptime after setting the above flag to make sure + * throtl_select_dispatch() won't exit without dispatching. + */ + tg_update_disptime(tg); + + throtl_schedule_pending_timer(sq, jiffies + 1); + } + rcu_read_unlock(); + spin_unlock_irq(&q->queue_lock); +} + +static void blk_throtl_exit(struct rq_qos *rqos) +{ + struct throtl_data *td = rqos_to_td(rqos); + + blk_throtl_cancel_bios(td->queue); + /* + * There are no rules, all throttled BIO should be dispatched + * immediately. + */ + wait_event(destroy_wait, !td_has_io(td)); + + del_timer_sync(&td->service_queue.pending_timer); + cancel_work_sync(&td->dispatch_work); + blkcg_deactivate_policy(rqos->disk, &blkcg_policy_throtl); + kfree(td); +} + +static bool blk_should_throtl(struct throtl_grp *tg, struct bio *bio) +{ + int rw = bio_data_dir(bio); + + if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) { + if (!bio_flagged(bio, BIO_CGROUP_ACCT)) { + bio_set_flag(bio, BIO_CGROUP_ACCT); + blkg_rwstat_add(&tg->stat_bytes, bio->bi_opf, + bio->bi_iter.bi_size); + } + blkg_rwstat_add(&tg->stat_ios, bio->bi_opf, 1); + } + + /* iops limit is always counted */ + if (tg->has_rules_iops[rw]) + return true; + + if (tg->has_rules_bps[rw] && !bio_flagged(bio, BIO_BPS_THROTTLED)) + return true; + + return false; +} + +static bool __blk_throtl_bio(struct throtl_grp *tg, struct bio *bio) +{ + struct request_queue *q = bdev_get_queue(bio->bi_bdev); + struct throtl_qnode *qn = NULL; + struct throtl_service_queue *sq; + bool rw = bio_data_dir(bio); + bool throttled = false; + struct throtl_data *td = tg->td; + + rcu_read_lock(); + spin_lock_irq(&q->queue_lock); + sq = &tg->service_queue; + + while (true) { + if (tg->last_low_overflow_time[rw] == 0) + tg->last_low_overflow_time[rw] = jiffies; + /* throtl is FIFO - if bios are already queued, should queue */ + if (sq->nr_queued[rw]) + break; + + /* if above limits, break to queue */ + if (!tg_may_dispatch(tg, bio, NULL)) { + tg->last_low_overflow_time[rw] = jiffies; + break; + } + + /* within limits, let's charge and dispatch directly */ + throtl_charge_bio(tg, bio); + + /* + * We need to trim slice even when bios are not being queued + * otherwise it might happen that a bio is not queued for + * a long time and slice keeps on extending and trim is not + * called for a long time. Now if limits are reduced suddenly + * we take into account all the IO dispatched so far at new + * low rate and * newly queued IO gets a really long dispatch + * time. + * + * So keep on trimming slice even if bio is not queued. + */ + throtl_trim_slice(tg, rw); + + /* + * @bio passed through this layer without being throttled. + * Climb up the ladder. If we're already at the top, it + * can be executed directly. + */ + qn = &tg->qnode_on_parent[rw]; + sq = sq->parent_sq; + tg = sq_to_tg(sq); + if (!tg) { + bio_set_flag(bio, BIO_BPS_THROTTLED); + goto out_unlock; + } + } + + /* out-of-limit, queue to @tg */ + throtl_log(sq, "[%c] bio. bdisp=%llu sz=%u bps=%llu iodisp=%u iops=%u queued=%d/%d", + rw == READ ? 'R' : 'W', + tg->bytes_disp[rw], bio->bi_iter.bi_size, + tg_bps_limit(tg, rw), + tg->io_disp[rw], tg_iops_limit(tg, rw), + sq->nr_queued[READ], sq->nr_queued[WRITE]); + + tg->last_low_overflow_time[rw] = jiffies; + + td->nr_queued[rw]++; + throtl_add_bio_tg(bio, qn, tg); + throttled = true; + + /* + * Update @tg's dispatch time and force schedule dispatch if @tg + * was empty before @bio. 
The forced scheduling isn't likely to + * cause undue delay as @bio is likely to be dispatched directly if + * its @tg's disptime is not in the future. + */ + if (tg->flags & THROTL_TG_WAS_EMPTY) { + tg_update_disptime(tg); + throtl_schedule_next_dispatch(tg->service_queue.parent_sq, true); + } + +out_unlock: + spin_unlock_irq(&q->queue_lock); + + rcu_read_unlock(); + return throttled; +} + +static bool blk_throtl_bio(struct rq_qos *rqos, struct bio *bio) +{ + struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg); + + if (!blk_should_throtl(tg, bio)) + return false; + + return __blk_throtl_bio(tg, bio); +} + +static const struct rq_qos_ops throtl_rqos_ops = { + .throttle_bio = blk_throtl_bio, + .exit = blk_throtl_exit, +}; static int blk_throtl_init(struct gendisk *disk) { struct request_queue *q = disk->queue; @@ -1243,20 +1443,21 @@ static int blk_throtl_init(struct gendisk *disk) INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn); throtl_service_queue_init(&td->service_queue); - /* - * freeze queue before setting q->td, so that IO path won't see - * q->td while policy is not activated yet. - */ blk_mq_freeze_queue(disk->queue); blk_mq_quiesce_queue(disk->queue); - q->td = td; td->queue = q; + ret = rq_qos_add(&td->rqos, disk, RQ_QOS_THROTTLE, &throtl_rqos_ops); + if (ret) { + kfree(td); + goto out; + } + /* activate policy */ ret = blkcg_activate_policy(disk, &blkcg_policy_throtl); if (ret) { - q->td = NULL; + rq_qos_del(&td->rqos); kfree(td); goto out; } @@ -1276,30 +1477,14 @@ static int blk_throtl_init(struct gendisk *disk) return ret; } -void blk_throtl_exit(struct gendisk *disk) -{ - struct request_queue *q = disk->queue; - - if (!q->td) - return; - - del_timer_sync(&q->td->service_queue.pending_timer); - cancel_work_sync(&q->td->dispatch_work); - blkcg_deactivate_policy(disk, &blkcg_policy_throtl); - kfree(q->td); - q->td = NULL; -} - static void blk_throtl_destroy(struct gendisk *disk) { - struct throtl_data *td = disk->queue->td; + struct throtl_data *td = q_to_td(disk->queue); - /* - * There are no rules, all throttled BIO should be dispatched - * immediately. - */ - wait_event(destroy_wait, !td_has_io(td)); - blk_throtl_exit(disk); + lockdep_assert_held(&td->queue->rq_qos_mutex); + + rq_qos_del(&td->rqos); + blk_throtl_exit(&td->rqos); } static ssize_t tg_set_conf(struct kernfs_open_file *of, @@ -1317,7 +1502,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of, if (ret) goto out_finish; - if (!ctx.bdev->bd_queue->td) { + if (!q_to_td(ctx.bdev->bd_queue)) { ret = blk_throtl_init(ctx.bdev->bd_disk); if (ret) goto out_finish; @@ -1495,7 +1680,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, if (ret) goto out_finish; - if (!ctx.bdev->bd_queue->td) { + if (!q_to_td(ctx.bdev->bd_queue)) { ret = blk_throtl_init(ctx.bdev->bd_disk); if (ret) goto out_finish; @@ -1584,144 +1769,6 @@ struct blkcg_policy blkcg_policy_throtl = { .pd_free_fn = throtl_pd_free, }; -void blk_throtl_cancel_bios(struct gendisk *disk) -{ - struct request_queue *q = disk->queue; - struct cgroup_subsys_state *pos_css; - struct blkcg_gq *blkg; - - if (!q->td) - return; - - spin_lock_irq(&q->queue_lock); - /* - * queue_lock is held, rcu lock is not needed here technically. - * However, rcu lock is still held to emphasize that following - * path need RCU protection and to prevent warning from lockdep. 
- */ - rcu_read_lock(); - blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) { - struct throtl_grp *tg = blkg_to_tg(blkg); - struct throtl_service_queue *sq = &tg->service_queue; - - /* - * Set the flag to make sure throtl_pending_timer_fn() won't - * stop until all throttled bios are dispatched. - */ - tg->flags |= THROTL_TG_CANCELING; - - /* - * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup - * will be inserted to service queue without THROTL_TG_PENDING - * set in tg_update_disptime below. Then IO dispatched from - * child in tg_dispatch_one_bio will trigger double insertion - * and corrupt the tree. - */ - if (!(tg->flags & THROTL_TG_PENDING)) - continue; - - /* - * Update disptime after setting the above flag to make sure - * throtl_select_dispatch() won't exit without dispatching. - */ - tg_update_disptime(tg); - - throtl_schedule_pending_timer(sq, jiffies + 1); - } - rcu_read_unlock(); - spin_unlock_irq(&q->queue_lock); -} - -bool __blk_throtl_bio(struct bio *bio) -{ - struct request_queue *q = bdev_get_queue(bio->bi_bdev); - struct blkcg_gq *blkg = bio->bi_blkg; - struct throtl_qnode *qn = NULL; - struct throtl_grp *tg = blkg_to_tg(blkg); - struct throtl_service_queue *sq; - bool rw = bio_data_dir(bio); - bool throttled = false; - struct throtl_data *td = tg->td; - - rcu_read_lock(); - spin_lock_irq(&q->queue_lock); - sq = &tg->service_queue; - - while (true) { - if (tg->last_low_overflow_time[rw] == 0) - tg->last_low_overflow_time[rw] = jiffies; - /* throtl is FIFO - if bios are already queued, should queue */ - if (sq->nr_queued[rw]) - break; - - /* if above limits, break to queue */ - if (!tg_may_dispatch(tg, bio, NULL)) { - tg->last_low_overflow_time[rw] = jiffies; - break; - } - - /* within limits, let's charge and dispatch directly */ - throtl_charge_bio(tg, bio); - - /* - * We need to trim slice even when bios are not being queued - * otherwise it might happen that a bio is not queued for - * a long time and slice keeps on extending and trim is not - * called for a long time. Now if limits are reduced suddenly - * we take into account all the IO dispatched so far at new - * low rate and * newly queued IO gets a really long dispatch - * time. - * - * So keep on trimming slice even if bio is not queued. - */ - throtl_trim_slice(tg, rw); - - /* - * @bio passed through this layer without being throttled. - * Climb up the ladder. If we're already at the top, it - * can be executed directly. - */ - qn = &tg->qnode_on_parent[rw]; - sq = sq->parent_sq; - tg = sq_to_tg(sq); - if (!tg) { - bio_set_flag(bio, BIO_BPS_THROTTLED); - goto out_unlock; - } - } - - /* out-of-limit, queue to @tg */ - throtl_log(sq, "[%c] bio. bdisp=%llu sz=%u bps=%llu iodisp=%u iops=%u queued=%d/%d", - rw == READ ? 'R' : 'W', - tg->bytes_disp[rw], bio->bi_iter.bi_size, - tg_bps_limit(tg, rw), - tg->io_disp[rw], tg_iops_limit(tg, rw), - sq->nr_queued[READ], sq->nr_queued[WRITE]); - - tg->last_low_overflow_time[rw] = jiffies; - - td->nr_queued[rw]++; - throtl_add_bio_tg(bio, qn, tg); - throttled = true; - - /* - * Update @tg's dispatch time and force schedule dispatch if @tg - * was empty before @bio. The forced scheduling isn't likely to - * cause undue delay as @bio is likely to be dispatched directly if - * its @tg's disptime is not in the future. 
- */ - if (tg->flags & THROTL_TG_WAS_EMPTY) { - tg_update_disptime(tg); - throtl_schedule_next_dispatch(tg->service_queue.parent_sq, true); - } - -out_unlock: - spin_unlock_irq(&q->queue_lock); - - rcu_read_unlock(); - return throttled; -} - static int __init throtl_init(void) { kthrotld_workqueue = alloc_workqueue("kthrotld", WQ_MEM_RECLAIM, 0); diff --git a/block/blk-throttle.h b/block/blk-throttle.h index e7f5562a32e9..147940c4a7ca 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -146,54 +146,4 @@ static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg) return pd_to_tg(blkg_to_pd(blkg, &blkcg_policy_throtl)); } -/* - * Internal throttling interface - */ -#ifndef CONFIG_BLK_DEV_THROTTLING -static inline void blk_throtl_exit(struct gendisk *disk) { } -static inline bool blk_throtl_bio(struct bio *bio) { return false; } -static inline void blk_throtl_cancel_bios(struct gendisk *disk) { } -#else /* CONFIG_BLK_DEV_THROTTLING */ -void blk_throtl_exit(struct gendisk *disk); -bool __blk_throtl_bio(struct bio *bio); -void blk_throtl_cancel_bios(struct gendisk *disk); - -static inline bool blk_should_throtl(struct bio *bio) -{ - struct throtl_grp *tg; - int rw = bio_data_dir(bio); - - if (!bio->bi_bdev->bd_queue->td) - return false; - - tg = blkg_to_tg(bio->bi_blkg); - if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) { - if (!bio_flagged(bio, BIO_CGROUP_ACCT)) { - bio_set_flag(bio, BIO_CGROUP_ACCT); - blkg_rwstat_add(&tg->stat_bytes, bio->bi_opf, - bio->bi_iter.bi_size); - } - blkg_rwstat_add(&tg->stat_ios, bio->bi_opf, 1); - } - - /* iops limit is always counted */ - if (tg->has_rules_iops[rw]) - return true; - - if (tg->has_rules_bps[rw] && !bio_flagged(bio, BIO_BPS_THROTTLED)) - return true; - - return false; -} - -static inline bool blk_throtl_bio(struct bio *bio) -{ - - if (!blk_should_throtl(bio)) - return false; - - return __blk_throtl_bio(bio); -} #endif /* CONFIG_BLK_DEV_THROTTLING */ - -#endif diff --git a/block/genhd.c b/block/genhd.c index bb29a68e1d67..9bc55321e606 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -27,7 +27,6 @@ #include #include -#include "blk-throttle.h" #include "blk.h" #include "blk-mq-sched.h" #include "blk-rq-qos.h" @@ -699,8 +698,6 @@ void del_gendisk(struct gendisk *disk) blk_mq_freeze_queue_wait(q); - blk_throtl_cancel_bios(disk); - blk_sync_queue(q); blk_flush_integrity(); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c3e8f7cf96be..368f450f74f2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -487,10 +487,6 @@ struct request_queue { int mq_freeze_depth; -#ifdef CONFIG_BLK_DEV_THROTTLING - /* Throttle data */ - struct throtl_data *td; -#endif struct rcu_head rcu_head; wait_queue_head_t mq_freeze_wq; /*