From patchwork Tue Aug 23 03:31:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 12951669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF45EC32774 for ; Tue, 23 Aug 2022 03:20:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237331AbiHWDUJ (ORCPT ); Mon, 22 Aug 2022 23:20:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46508 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239941AbiHWDUH (ORCPT ); Mon, 22 Aug 2022 23:20:07 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31E674DB66; Mon, 22 Aug 2022 20:20:05 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.169]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MBZFH2NQHzlkb5; Tue, 23 Aug 2022 11:18:43 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.127.227]) by APP1 (Coremail) with SMTP id cCh0CgDXkO1fRwRj66O+Ag--.4888S5; Tue, 23 Aug 2022 11:20:03 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, tj@kernel.org, ming.lei@redhat.com, mkoutny@suse.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com Subject: [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Date: Tue, 23 Aug 2022 11:31:27 +0800 Message-Id: <20220823033130.874230-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220823033130.874230-1-yukuai1@huaweicloud.com> References: <20220823033130.874230-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDXkO1fRwRj66O+Ag--.4888S5 X-Coremail-Antispam: 1UD129KBjvJXoWxtFW8Xry3tFW8Zr4ktF15CFg_yoW3Zw4UpF yxuFWkWr4kGrsFgr1DJF1I9Fy8JF48Ary5ArZxGw4SvF1akrnrtFn7Zr10yFyY9FZxCayD ZF1vk3srCa1UGrJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBE14x267AKxVW5JVWrJwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r1I6r4UM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMI IF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUqAp5UUUUU = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Yu Kuai Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, two flags BIO_IOPS/BPS_THROTTLED is used, thus iops and bps can be treated separately for split bio: 1) for bps limit, original bio already flagged, nothing need to be done. 2) for iops limit, original bio already flagged, and the flag should be cleared before resubmitting the original bio. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit 111be8839817 ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") Cc: Signed-off-by: Yu Kuai --- block/bio.c | 2 -- block/blk-merge.c | 7 +++++++ block/blk-throttle.c | 23 +++++++++++++---------- block/blk-throttle.h | 6 +++--- include/linux/bio.h | 6 ++++-- include/linux/blk_types.h | 6 ++++-- 6 files changed, 31 insertions(+), 19 deletions(-) diff --git a/block/bio.c b/block/bio.c index 3d3a2678fea2..77e3b764a078 100644 --- a/block/bio.c +++ b/block/bio.c @@ -760,8 +760,6 @@ EXPORT_SYMBOL(bio_put); static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp) { bio_set_flag(bio, BIO_CLONED); - if (bio_flagged(bio_src, BIO_THROTTLED)) - bio_set_flag(bio, BIO_THROTTLED); bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_iter = bio_src->bi_iter; diff --git a/block/blk-merge.c b/block/blk-merge.c index ff04e9290715..10330bb038ca 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -358,6 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim, blkcg_bio_issue_init(split); bio_chain(split, bio); trace_block_split(split, bio->bi_iter.bi_sector); + + /* + * original bio will be resubmited and throttled again, clear + * the iops flag so that it can be count again for iops limit. + */ + if (bio_flagged(bio, BIO_IOPS_THROTTLED)) + bio_clear_flag(bio, BIO_IOPS_THROTTLED); submit_bio_noacct(bio); return split; } diff --git a/block/blk-throttle.c b/block/blk-throttle.c index e47506a8ef47..f3c9dcab63e2 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -762,7 +762,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; u64 tmp; - if (iops_limit == UINT_MAX) { + if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) { if (wait) *wait = 0; return true; @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, unsigned int bio_size = throtl_bio_data_size(bio); /* no need to throttle if this bio's bytes have been accounted */ - if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) { + if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) { if (wait) *wait = 0; return true; @@ -921,13 +921,11 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio) unsigned int bio_size = throtl_bio_data_size(bio); /* Charge the bio to the group */ - if (!bio_flagged(bio, BIO_THROTTLED)) { + if (!bio_flagged(bio, BIO_BPS_THROTTLED)) { tg->bytes_disp[rw] += bio_size; tg->last_bytes_disp[rw] += bio_size; } - tg->io_disp[rw]++; - tg->last_io_disp[rw]++; /* * BIO_THROTTLED is used to prevent the same bio to be throttled @@ -935,8 +933,10 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio) * second time when it eventually gets issued. Set it when a bio * is being charged to a tg. */ - if (!bio_flagged(bio, BIO_THROTTLED)) - bio_set_flag(bio, BIO_THROTTLED); + if (!bio_flagged(bio, BIO_IOPS_THROTTLED)) { + tg->io_disp[rw]++; + tg->last_io_disp[rw]++; + } } /** @@ -1026,6 +1026,8 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw) sq->nr_queued[rw]--; throtl_charge_bio(tg, bio); + bio_set_flag(bio, BIO_BPS_THROTTLED); + bio_set_flag(bio, BIO_IOPS_THROTTLED); /* * If our parent is another tg, we just need to transfer @bio to @@ -2159,8 +2161,11 @@ bool __blk_throtl_bio(struct bio *bio) qn = &tg->qnode_on_parent[rw]; sq = sq->parent_sq; tg = sq_to_tg(sq); - if (!tg) + if (!tg) { + bio_set_flag(bio, BIO_BPS_THROTTLED); + bio_set_flag(bio, BIO_IOPS_THROTTLED); goto out_unlock; + } } /* out-of-limit, queue to @tg */ @@ -2189,8 +2194,6 @@ bool __blk_throtl_bio(struct bio *bio) } out_unlock: - bio_set_flag(bio, BIO_THROTTLED); - #ifdef CONFIG_BLK_DEV_THROTTLING_LOW if (throttled || !td->track_bio_latency) bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY; diff --git a/block/blk-throttle.h b/block/blk-throttle.h index c1b602996127..45c6f3c1dfe0 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -174,9 +174,9 @@ static inline bool blk_throtl_bio(struct bio *bio) { struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg); - /* no need to throttle bps any more if the bio has been throttled */ - if (bio_flagged(bio, BIO_THROTTLED) && - !(tg->flags & THROTL_TG_HAS_IOPS_LIMIT)) + /* no need to throttle any more if the bio has been throttled */ + if (bio_flagged(bio, BIO_BPS_THROTTLED) && + bio_flagged(bio, BIO_IOPS_THROTTLED)) return false; if (!tg->has_rules[bio_data_dir(bio)]) diff --git a/include/linux/bio.h b/include/linux/bio.h index ca22b06700a9..8ab014af163d 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -508,8 +508,10 @@ static inline void bio_clone_blkg_association(struct bio *dst, static inline void bio_set_dev(struct bio *bio, struct block_device *bdev) { bio_clear_flag(bio, BIO_REMAPPED); - if (bio->bi_bdev != bdev) - bio_clear_flag(bio, BIO_THROTTLED); + if (bio->bi_bdev != bdev) { + bio_clear_flag(bio, BIO_BPS_THROTTLED); + bio_clear_flag(bio, BIO_IOPS_THROTTLED); + } bio->bi_bdev = bdev; bio_associate_blkg(bio); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 1ef99790f6ed..d37cdafa2a07 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -325,8 +325,10 @@ enum { BIO_QUIET, /* Make BIO Quiet */ BIO_CHAIN, /* chained bio, ->bi_remaining in effect */ BIO_REFFED, /* bio has elevated ->bi_cnt */ - BIO_THROTTLED, /* This bio has already been subjected to - * throttling rules. Don't do it again. */ + BIO_BPS_THROTTLED, /* This bio has already been subjected to + * bps throttling rules. Don't do it again. */ + BIO_IOPS_THROTTLED, /* This bio has already been subjected to + * iops throttling rules. Don't do it again. */ BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion * of this bio. */ BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ From patchwork Tue Aug 23 03:31:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 12951672 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD6F0C28D13 for ; Tue, 23 Aug 2022 03:20:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240018AbiHWDUM (ORCPT ); Mon, 22 Aug 2022 23:20:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239904AbiHWDUH (ORCPT ); Mon, 22 Aug 2022 23:20:07 -0400 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 929434E603; Mon, 22 Aug 2022 20:20:05 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.169]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4MBZF23YN0z6V2sG; Tue, 23 Aug 2022 11:18:30 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.127.227]) by APP1 (Coremail) with SMTP id cCh0CgDXkO1fRwRj66O+Ag--.4888S6; Tue, 23 Aug 2022 11:20:03 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, tj@kernel.org, ming.lei@redhat.com, mkoutny@suse.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com Subject: [PATCH v8 2/4] blk-throttle: prevent overflow while calculating wait time Date: Tue, 23 Aug 2022 11:31:28 +0800 Message-Id: <20220823033130.874230-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220823033130.874230-1-yukuai1@huaweicloud.com> References: <20220823033130.874230-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDXkO1fRwRj66O+Ag--.4888S6 X-Coremail-Antispam: 1UD129KBjvJXoW7ur48GFy3JFW5uw4ftr4ruFg_yoW8GFW8pr WfGr40gF4rXF1fKF43Jan0va48Jws8JFW5G3y3Wan2yF90kr97XF9YvrWFva18AFZ29a1f Aw1Dtry7Ar1DA3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBE14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jryl82xGYIkIc2 x26xkF7I0E14v26r4j6ryUM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMI IF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUc6pPUUUUU = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Yu Kuai There is a problem found by code review in tg_with_in_bps_limit() that 'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by calling mul_u64_u64_div_u64() instead. Signed-off-by: Yu Kuai Acked-by: Tejun Heo --- block/blk-throttle.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index f3c9dcab63e2..773a355d3838 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -806,7 +806,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, u64 bps_limit, unsigned long *wait) { bool rw = bio_data_dir(bio); - u64 bytes_allowed, extra_bytes, tmp; + u64 bytes_allowed, extra_bytes; unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; unsigned int bio_size = throtl_bio_data_size(bio); @@ -824,10 +824,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, jiffy_elapsed_rnd = tg->td->throtl_slice; jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice); - - tmp = bps_limit * jiffy_elapsed_rnd; - do_div(tmp, HZ); - bytes_allowed = tmp; + bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd, + (u64)HZ); if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) { if (wait) From patchwork Tue Aug 23 03:31:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 12951671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4389FC32796 for ; Tue, 23 Aug 2022 03:20:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240009AbiHWDUK (ORCPT ); Mon, 22 Aug 2022 23:20:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239952AbiHWDUH (ORCPT ); Mon, 22 Aug 2022 23:20:07 -0400 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E22184F6B7; Mon, 22 Aug 2022 20:20:05 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.169]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4MBZF26vjlz6V2tg; Tue, 23 Aug 2022 11:18:30 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.127.227]) by APP1 (Coremail) with SMTP id cCh0CgDXkO1fRwRj66O+Ag--.4888S7; Tue, 23 Aug 2022 11:20:04 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, tj@kernel.org, ming.lei@redhat.com, mkoutny@suse.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com Subject: [PATCH v8 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Date: Tue, 23 Aug 2022 11:31:29 +0800 Message-Id: <20220823033130.874230-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220823033130.874230-1-yukuai1@huaweicloud.com> References: <20220823033130.874230-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDXkO1fRwRj66O+Ag--.4888S7 X-Coremail-Antispam: 1UD129KBjvJXoWxAw4DWw13XFy7uFy5CrW7Arb_yoWrCry5pr WfGr1jqF4rXF4xtF13JwsIkayrAw1kCFyUJ3y3Wa4vyF15uFykJFn3ZrW0yay7ZF97ua1x Zw1qq3ZrCF4DAFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBE14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JrWl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMI IF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUd8n5UUUUU = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Yu Kuai No functional changes, new apis will be used in later patches to calculate wait time for throttled bios when new configuration is submitted. Noted this patch also rename tg_with_in_iops/bps_limit() to tg_within_iops/bps_limit(). Signed-off-by: Yu Kuai Acked-by: Tejun Heo --- block/blk-throttle.c | 59 ++++++++++++++++++++++++++------------------ 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 773a355d3838..757b620f0f2d 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -754,33 +754,20 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) tg->slice_start[rw], tg->slice_end[rw], jiffies); } -static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, - u32 iops_limit, unsigned long *wait) +static unsigned int calculate_io_allowed(u32 iops_limit, + unsigned long jiffy_elapsed) { - bool rw = bio_data_dir(bio); unsigned int io_allowed; - unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; u64 tmp; - if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) { - if (wait) - *wait = 0; - return true; - } - - jiffy_elapsed = jiffies - tg->slice_start[rw]; - - /* Round up to the next throttle slice, wait time must be nonzero */ - jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice); - /* - * jiffy_elapsed_rnd should not be a big value as minimum iops can be + * jiffy_elapsed should not be a big value as minimum iops can be * 1 then at max jiffy elapsed should be equivalent of 1 second as we * will allow dispatch after 1 second and after that slice should * have been trimmed. */ - tmp = (u64)iops_limit * jiffy_elapsed_rnd; + tmp = (u64)iops_limit * jiffy_elapsed; do_div(tmp, HZ); if (tmp > UINT_MAX) @@ -788,6 +775,32 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, else io_allowed = tmp; + return io_allowed; +} + +static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) +{ + return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); +} + +static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio, + u32 iops_limit, unsigned long *wait) +{ + bool rw = bio_data_dir(bio); + unsigned int io_allowed; + unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; + + if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) { + if (wait) + *wait = 0; + return true; + } + + jiffy_elapsed = jiffies - tg->slice_start[rw]; + + /* Round up to the next throttle slice, wait time must be nonzero */ + jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice); + io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd); if (tg->io_disp[rw] + 1 <= io_allowed) { if (wait) *wait = 0; @@ -802,8 +815,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, return false; } -static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, - u64 bps_limit, unsigned long *wait) +static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio, + u64 bps_limit, unsigned long *wait) { bool rw = bio_data_dir(bio); u64 bytes_allowed, extra_bytes; @@ -824,9 +837,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, jiffy_elapsed_rnd = tg->td->throtl_slice; jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice); - bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd, - (u64)HZ); - + bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd); if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) { if (wait) *wait = 0; @@ -895,8 +906,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio, jiffies + tg->td->throtl_slice); } - if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) && - tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) { + if (tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && + tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)) { if (wait) *wait = 0; return true; From patchwork Tue Aug 23 03:31:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 12951673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CCCFC32789 for ; Tue, 23 Aug 2022 03:20:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240021AbiHWDUM (ORCPT ); Mon, 22 Aug 2022 23:20:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239954AbiHWDUH (ORCPT ); Mon, 22 Aug 2022 23:20:07 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6380B4CA0B; Mon, 22 Aug 2022 20:20:06 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.169]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MBZF26MxDzKwQj; Tue, 23 Aug 2022 11:18:30 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.127.227]) by APP1 (Coremail) with SMTP id cCh0CgDXkO1fRwRj66O+Ag--.4888S8; Tue, 23 Aug 2022 11:20:04 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, tj@kernel.org, ming.lei@redhat.com, mkoutny@suse.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com Subject: [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Date: Tue, 23 Aug 2022 11:31:30 +0800 Message-Id: <20220823033130.874230-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220823033130.874230-1-yukuai1@huaweicloud.com> References: <20220823033130.874230-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: cCh0CgDXkO1fRwRj66O+Ag--.4888S8 X-Coremail-Antispam: 1UD129KBjvJXoW3GFyDGr1DGFyDWr48GrWktFb_yoWxtry8pF WFkrs0qw45Xan7KFZxC3Z0yay0qws7Jry3J3y7Gr15AF1YkryktFn8ZrWYyay8AF97uFWI vw1qgF9xCF42vrJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPY14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r 4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUQSdkU UUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Yu Kuai If new configuration is submitted while a bio is throttled, then new waiting time is recalculated regardless that the bio might already wait for some time: tg_conf_updated throtl_start_new_slice tg_update_disptime throtl_schedule_next_dispatch Then io hung can be triggered by always submmiting new configuration before the throttled bio is dispatched. Fix the problem by respecting the time that throttled bio already waited. In order to do that, add new fields to record how many bytes/io are waited, and use it to calculate wait time for throttled bio under new configuration. Some simple test: 1) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 2048" > blkio.throttle.write_bps_device { sleep 2 echo "8:0 1024" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct 2) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 1024" > blkio.throttle.write_bps_device { sleep 4 echo "8:0 2048" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct test results: io finish time before this patch with this patch 1) 10s 6s 2) 8s 6s Signed-off-by: Yu Kuai Reviewed-by: Michal Koutný --- block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++----- block/blk-throttle.h | 9 +++++++ 2 files changed, 61 insertions(+), 6 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 757b620f0f2d..679e08c0714c 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -639,6 +639,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg, { tg->bytes_disp[rw] = 0; tg->io_disp[rw] = 0; + tg->carryover_bytes[rw] = 0; + tg->carryover_ios[rw] = 0; /* * Previous slice has expired. We must have trimmed it after last @@ -656,12 +658,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg, tg->slice_end[rw], jiffies); } -static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw) +static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw, + bool clear_carryover) { tg->bytes_disp[rw] = 0; tg->io_disp[rw] = 0; tg->slice_start[rw] = jiffies; tg->slice_end[rw] = jiffies + tg->td->throtl_slice; + if (clear_carryover) { + tg->carryover_bytes[rw] = 0; + tg->carryover_ios[rw] = 0; + } throtl_log(&tg->service_queue, "[%c] new slice start=%lu end=%lu jiffies=%lu", @@ -783,6 +790,41 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); } +static void __tg_update_carryover(struct throtl_grp *tg, bool rw) +{ + unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw]; + u64 bps_limit = tg_bps_limit(tg, rw); + u32 iops_limit = tg_iops_limit(tg, rw); + + /* + * If config is updated while bios are still throttled, calculate and + * accumulate how many bytes/ios are waited across changes. And + * carryover_bytes/ios will be used to calculate new wait time under new + * configuration. + */ + if (bps_limit != U64_MAX) + tg->carryover_bytes[rw] += + calculate_bytes_allowed(bps_limit, jiffy_elapsed) - + tg->bytes_disp[rw]; + if (iops_limit != UINT_MAX) + tg->carryover_ios[rw] += + calculate_io_allowed(iops_limit, jiffy_elapsed) - + tg->io_disp[rw]; +} + +static void tg_update_carryover(struct throtl_grp *tg) +{ + if (tg->service_queue.nr_queued[READ]) + __tg_update_carryover(tg, READ); + if (tg->service_queue.nr_queued[WRITE]) + __tg_update_carryover(tg, WRITE); + + /* see comments in struct throtl_grp for meaning of these fields. */ + throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__, + tg->carryover_bytes[READ], tg->carryover_bytes[WRITE], + tg->carryover_ios[READ], tg->carryover_ios[WRITE]); +} + static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio, u32 iops_limit, unsigned long *wait) { @@ -800,7 +842,8 @@ static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio, /* Round up to the next throttle slice, wait time must be nonzero */ jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice); - io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd); + io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) + + tg->carryover_ios[rw]; if (tg->io_disp[rw] + 1 <= io_allowed) { if (wait) *wait = 0; @@ -837,7 +880,8 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio, jiffy_elapsed_rnd = tg->td->throtl_slice; jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice); - bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd); + bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) + + tg->carryover_bytes[rw]; if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) { if (wait) *wait = 0; @@ -898,7 +942,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio, * slice and it should be extended instead. */ if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw])) - throtl_start_new_slice(tg, rw); + throtl_start_new_slice(tg, rw, true); else { if (time_before(tg->slice_end[rw], jiffies + tg->td->throtl_slice)) @@ -1332,8 +1376,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) * that a group's limit are dropped suddenly and we don't want to * account recently dispatched IO with new low rate. */ - throtl_start_new_slice(tg, READ); - throtl_start_new_slice(tg, WRITE); + throtl_start_new_slice(tg, READ, false); + throtl_start_new_slice(tg, WRITE, false); if (tg->flags & THROTL_TG_PENDING) { tg_update_disptime(tg); @@ -1361,6 +1405,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of, v = U64_MAX; tg = blkg_to_tg(ctx.blkg); + tg_update_carryover(tg); if (is_u64) *(u64 *)((void *)tg + of_cft(of)->private) = v; @@ -1547,6 +1592,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, return ret; tg = blkg_to_tg(ctx.blkg); + tg_update_carryover(tg); v[0] = tg->bps_conf[READ][index]; v[1] = tg->bps_conf[WRITE][index]; diff --git a/block/blk-throttle.h b/block/blk-throttle.h index 45c6f3c1dfe0..a25e9e356d17 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -121,6 +121,15 @@ struct throtl_grp { uint64_t last_bytes_disp[2]; unsigned int last_io_disp[2]; + /* + * The following two fields are updated when new configuration is + * submitted while some bios are still throttled, they record how many + * bytes/ios are waited already in previous configuration, and they will + * be used to calculate wait time under new configuration. + */ + uint64_t carryover_bytes[2]; + unsigned int carryover_ios[2]; + unsigned long last_check_time; unsigned long latency_target; /* us */