Message ID | 20240624130940.3751791-1-lilingfeng@huaweicloud.com (mailing list archive) |
---|---|
State | New, archived |
Series | block: cancel all throttled bios when deleting the cgroup |

Hello.

On Mon, Jun 24, 2024 at 09:09:40PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
>
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.

When pd_offline_fn is called because of disk going away, it makes sense
to cancel the bios. However, when pd_offline_fn is called due to cgroup
removal (with possibly surviving originating process), wouldn't bio
cancelling lead to loss of data?
Aha, it wouldn't -- the purpose of the function is to "flush" throttled
bios (in the original patch they'd immediately fail, here the IO
operation may succeed).
Is that correct? (Wouldn't there be a more descriptive name than
tg_cancel_bios then?)

And if a user is allowed to remove cgroup and use this to bypass the
throttling, they also must have permissions to migrate away from the
cgroup (and consistent config would thus allow them to change the limit
too), therefore this doesn't allow bypassing the throttling limit. If
you agree, could you please add the explanation to commit message too?

Thanks,
Michal

On 2024/6/25 18:34, Michal Koutný wrote:
> Hello.
>
> On Mon, Jun 24, 2024 at 09:09:40PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
>> From: Li Lingfeng <lilingfeng3@huawei.com>
>>
>> When a process migrates to another cgroup and the original cgroup is deleted,
>> the restrictions of throttled bios cannot be removed. If the restrictions
>> are set too low, it will take a long time to complete these bios.
> When pd_offline_fn is called because of disk going away, it makes sense
> to cancel the bios. However, when pd_offline_fn is called due to cgroup
> removal (with possibly surviving originating process), wouldn't bio
> cancelling lead to loss of data?
> Aha, it wouldn't -- the purpose of the function is to "flush" throttled
> bios (in the original patch they'd immediately fail, here the IO
> operation may succeed).
> Is that correct? (Wouldn't there be a more descriptive name than
> tg_cancel_bios then?)

Thanks for your advice. It's indeed more appropriate to use "flush" instead
of "cancel" here, I will change it soon.

>
> And if a user is allowed to remove cgroup and use this to bypass the
> throttling, they also must have permissions to migrate away from the
> cgroup (and consistent config would thus allow them to change the limit
> too), therefore this doesn't allow bypassing the throttling limit. If
> you agree, could you please add the explanation to commit message too?

I didn't quite get what you mean. Do you mean this patch will cause a change
in mechanics, and it is necessary to add an explanation?

(After deleting the original cgroup,
Before: the limit of the throttled bios can't be changed and the bios will
complete under this limit;
Now: the limit will be canceled and the throttled bios will be flushed
immediately.)

> Thanks,
> Michal

On Tue, Jun 25, 2024 at 07:38:34PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
> Thanks for your advice. It's indeed more appropriate to use "flush" instead
> of "cancel" here, I will change it soon.

I saw your v2. Didn't you forget to change also the function name?

> I didn't quite get what you mean. Do you mean this patch will cause a change
> in mechanics, and it is necessary to add an explanation?
>
> (After deleting the original cgroup,
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.)

I mean -- can the new mechanics be exploited to bypass throttling by
sending IO from a process, migrate it between cgroups and rmdir them?
That should be covered in the commit log.

Thanks,
Michal

On 2024/6/27 22:48, Michal Koutný wrote:
> On Tue, Jun 25, 2024 at 07:38:34PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
>> Thanks for your advice. It's indeed more appropriate to use "flush" instead
>> of "cancel" here, I will change it soon.
> I saw your v2. Didn't you forget to change also the function name?

Yes, sorry for missing it.

>
>> I didn't quite get what you mean. Do you mean this patch will cause a change
>> in mechanics, and it is necessary to add an explanation?
>>
>> (After deleting the original cgroup,
>> Before: the limit of the throttled bios can't be changed and the bios will
>> complete under this limit;
>> Now: the limit will be canceled and the throttled bios will be flushed
>> immediately.)
> I mean -- can the new mechanics be exploited to bypass throttling by
> sending IO from a process, migrate it between cgroups and rmdir them?
> That should be covered in the commit log.

Yes. Migrating a process to a new cgroup means we want subsequent bios to be
throttled by the new limit. We can flush the throttled bios by deleting
the old cgroup, or keep it so that the previous bios complete slowly
under the original limit.

Thanks.

>
> Thanks,
> Michal

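The "Before / Now" distinction discussed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct drain_model`, `drain_seconds()` and `model_rmdir()` are hypothetical names invented for this illustration, not kernel code, and the model ignores the short timer delay before the flush actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one throttled group; not a kernel structure. */
struct drain_model {
	uint64_t queued_bytes; /* bytes held back in throttled bios */
	uint64_t bps_limit;    /* configured bytes-per-second limit, must be > 0 */
	bool flushing;         /* set once the owning cgroup was removed (patched) */
};

/* Whole seconds needed to drain the queue, rounded up. */
static uint64_t drain_seconds(const struct drain_model *tg)
{
	assert(tg->bps_limit > 0);
	if (tg->flushing)
		return 0; /* "Now": queued bios are dispatched without waiting */
	return (tg->queued_bytes + tg->bps_limit - 1) / tg->bps_limit;
}

/* Models rmdir of the old cgroup; with_patch selects the new behaviour. */
static void model_rmdir(struct drain_model *tg, bool with_patch)
{
	if (with_patch)
		tg->flushing = true;
	/* "Before": nothing changes, the old limit keeps applying. */
}
```

With 100 MiB queued under a 1 MiB/s limit, the model reports 100 seconds before the patch regardless of rmdir, and 0 seconds once rmdir happens with the patch applied.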
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c1bf73f8c75d..a0e5b28951ca 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1534,6 +1534,42 @@ static void throtl_shutdown_wq(struct request_queue *q)
 	cancel_work_sync(&td->dispatch_work);
 }
 
+static void tg_cancel_bios(struct throtl_grp *tg)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+
+	if (tg->flags & THROTL_TG_CANCELING)
+		return;
+	/*
+	 * Set the flag to make sure throtl_pending_timer_fn() won't
+	 * stop until all throttled bios are dispatched.
+	 */
+	tg->flags |= THROTL_TG_CANCELING;
+
+	/*
+	 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
+	 * will be inserted to service queue without THROTL_TG_PENDING
+	 * set in tg_update_disptime below. Then IO dispatched from
+	 * child in tg_dispatch_one_bio will trigger double insertion
+	 * and corrupt the tree.
+	 */
+	if (!(tg->flags & THROTL_TG_PENDING))
+		return;
+
+	/*
+	 * Update disptime after setting the above flag to make sure
+	 * throtl_select_dispatch() won't exit without dispatching.
+	 */
+	tg_update_disptime(tg);
+
+	throtl_schedule_pending_timer(sq, jiffies + 1);
+}
+
+static void throtl_pd_offline(struct blkg_policy_data *pd)
+{
+	tg_cancel_bios(pd_to_tg(pd));
+}
+
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes = throtl_files,
 	.legacy_cftypes = throtl_legacy_files,
@@ -1541,6 +1577,7 @@ struct blkcg_policy blkcg_policy_throtl = {
 	.pd_alloc_fn = throtl_pd_alloc,
 	.pd_init_fn = throtl_pd_init,
 	.pd_online_fn = throtl_pd_online,
+	.pd_offline_fn = throtl_pd_offline,
 	.pd_free_fn = throtl_pd_free,
 };
 
@@ -1561,32 +1598,15 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 	 */
 	rcu_read_lock();
 	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
-		struct throtl_grp *tg = blkg_to_tg(blkg);
-		struct throtl_service_queue *sq = &tg->service_queue;
-
-		/*
-		 * Set the flag to make sure throtl_pending_timer_fn() won't
-		 * stop until all throttled bios are dispatched.
-		 */
-		tg->flags |= THROTL_TG_CANCELING;
-
 		/*
-		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
-		 * will be inserted to service queue without THROTL_TG_PENDING
-		 * set in tg_update_disptime below. Then IO dispatched from
-		 * child in tg_dispatch_one_bio will trigger double insertion
-		 * and corrupt the tree.
+		 * disk_release will call pd_offline_fn to cancel bios.
+		 * However, disk_release can't be called if someone get
+		 * the refcount of device and issued bios which are
+		 * inflight after del_gendisk.
+		 * Cancel bios here to ensure no bios are inflight after
+		 * del_gendisk.
 		 */
-		if (!(tg->flags & THROTL_TG_PENDING))
-			continue;
-
-		/*
-		 * Update disptime after setting the above flag to make sure
-		 * throtl_select_dispatch() won't exit without dispatching.
-		 */
-		tg_update_disptime(tg);
-
-		throtl_schedule_pending_timer(sq, jiffies + 1);
+		tg_cancel_bios(blkg_to_tg(blkg));
 	}
 	rcu_read_unlock();
 	spin_unlock_irq(&q->queue_lock);
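
The control flow of the new tg_cancel_bios() helper can be exercised outside the kernel with a reduced model. The sketch below is an assumption-laden illustration, not the kernel implementation: `struct flush_model`, the `MODEL_TG_*` bit values and `model_cancel_bios()` are hypothetical, and a counter stands in for throtl_schedule_pending_timer(). It shows why the early THROTL_TG_CANCELING check matters once the helper is reachable from both blk_throtl_cancel_bios() and pd_offline_fn.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions, not copied from the kernel. */
enum {
	MODEL_TG_PENDING   = 1 << 0, /* group is queued on its parent's pending tree */
	MODEL_TG_CANCELING = 1 << 1, /* a flush has already been requested */
};

/* Hypothetical stand-in for the parts of struct throtl_grp used by the helper. */
struct flush_model {
	unsigned int flags;
	int timer_kicks; /* counts calls that would schedule the dispatch timer */
};

/* Mirrors the branch structure of tg_cancel_bios(); true if a dispatch was scheduled. */
static bool model_cancel_bios(struct flush_model *tg)
{
	assert(tg);

	if (tg->flags & MODEL_TG_CANCELING)
		return false; /* second caller: already flushing, nothing to do */
	tg->flags |= MODEL_TG_CANCELING;

	if (!(tg->flags & MODEL_TG_PENDING))
		return false; /* nothing queued: flag is set, but no timer is armed */

	tg->timer_kicks++; /* stands in for tg_update_disptime() + timer scheduling */
	return true;
}
```

In this model, a group with nothing pending only gets the canceling flag, a pending group arms the timer exactly once, and a repeated call (for example disk removal followed by cgroup offline) is a no-op.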