diff mbox series

block: cancel all throttled bios when deleting the cgroup

Message ID 20240624130940.3751791-1-lilingfeng@huaweicloud.com (mailing list archive)
State New

Commit Message

Li Lingfeng June 24, 2024, 1:09 p.m. UTC
From: Li Lingfeng <lilingfeng3@huawei.com>

When a process migrates to another cgroup and the original cgroup is deleted,
the restrictions of throttled bios cannot be removed. If the restrictions
are set too low, it will take a long time to complete these bios.

Refer to the process of deleting a disk to remove the restrictions and
issue bios when deleting the cgroup.

References:
https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
 block/blk-throttle.c | 68 ++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 24 deletions(-)
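
To put a rough number on "a long time" in the commit message, the following is a minimal back-of-the-envelope sketch, not kernel code. It assumes a pure bytes-per-second limit and ignores iops limits and blk-throttle's time-slice accounting; `model_drain_seconds` is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: seconds needed to drain queued_bytes of throttled
 * IO at bps_limit bytes per second, rounded up. This is an assumption
 * for illustration and does not mirror the kernel's slice accounting.
 */
static uint64_t model_drain_seconds(uint64_t queued_bytes, uint64_t bps_limit)
{
	if (bps_limit == 0)
		return UINT64_MAX;	/* treated as "never drains" in this model */

	/* Round up without risking overflow in the addition. */
	return queued_bytes / bps_limit + (queued_bytes % bps_limit != 0);
}
```

Under these assumptions, 64 MiB of queued writes behind a 1 KiB/s limit would need 65536 seconds (about 18 hours) to complete after the process has already left the cgroup, which is the delay that flushing on cgroup removal avoids.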

Comments

Michal Koutný June 25, 2024, 10:34 a.m. UTC | #1
Hello.

On Mon, Jun 24, 2024 at 09:09:40PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
> 
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.

When pd_offline_fn is called because of disk going away, it makes sense
to cancel the bios. However, when pd_offline_fn is called due to cgroup
removal (with possibly surviving originating process), wouldn't bio
cancelling lead to loss of data?
Aha, it wouldn't -- the purpose of the function is to "flush" throttled
bios (in the original patch they'd immediately fail, here the IO
operation may succeed).
Is that correct? (Wouldn't there be a more descriptive name than
tg_cancel_bios then?)

And if a user is allowed to remove cgroup and use this to bypass the
throttling, they also must have permissions to migrate away from the
cgroup (and consistent config would thus allow them to change the limit
too), therefore this doesn't allow bypassing the throttling limit. If
you agree, could you please add the explanation to commit message too?

Thanks,
Michal
Li Lingfeng June 25, 2024, 11:38 a.m. UTC | #2
On 2024/6/25 18:34, Michal Koutný wrote:
> Hello.
>
> On Mon, Jun 24, 2024 at 09:09:40PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
>> From: Li Lingfeng <lilingfeng3@huawei.com>
>>
>> When a process migrates to another cgroup and the original cgroup is deleted,
>> the restrictions of throttled bios cannot be removed. If the restrictions
>> are set too low, it will take a long time to complete these bios.
> When pd_offline_fn is called because of disk going away, it makes sense
> to cancel the bios. However, when pd_offline_fn is called due to cgroup
> removal (with possibly surviving originating process), wouldn't bio
> cancelling lead to loss of data?
> Aha, it wouldn't -- the purpose of the function is to "flush" throttled
> bios (in the original patch they'd immediately fail, here the IO
> operation may succeed).
> Is that correct? (Wouldn't there be a more descriptive name than
> tg_cancel_bios then?)
Thanks for your advice. It's indeed more appropriate to use "flush" 
instead of "cancel" here, I will change it soon.
>
> And if a user is allowed to remove cgroup and use this to bypass the
> throttling, they also must have permissions to migrate away from the
> cgroup (and consistent config would thus allow them to change the limit
> too), therefore this doesn't allow bypassing the throttling limit. If
> you agree, could you please add the explanation to commit message too?

I didn't quite get what you mean. Do you mean this patch will cause a 
change in mechanics, and it is necessary to add an explanation?

(After deleting the original cgroup,
  Before: the limit of the throttled bios can't be changed and the bios 
will complete under this limit;
  Now: the limit will be canceled and the throttled bios will be flushed 
immediately.)

> Thanks,
> Michal
Michal Koutný June 27, 2024, 2:48 p.m. UTC | #3
On Tue, Jun 25, 2024 at 07:38:34PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
> Thanks for your advice. It's indeed more appropriate to use "flush" instead
> of "cancel" here, I will change it soon.

I saw your v2. Didn't you forget to also change the function name?

> I didn't quite get what you mean. Do you mean this patch will cause a change
> in mechanics, and it is necessary to add an explanation?
> 
> (After deleting the original cgroup,
>  Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
>  Now: the limit will be canceled and the throttled bios will be flushed
> immediately.)

I mean -- can the new mechanics be exploited to bypass throttling by
sending IO from a process, migrate it between cgroups and rmdir them?
That should be covered in the commit log.

Thanks,
Michal
Li Lingfeng June 28, 2024, 2:04 a.m. UTC | #4
On 2024/6/27 22:48, Michal Koutný wrote:
> On Tue, Jun 25, 2024 at 07:38:34PM GMT, Li Lingfeng <lilingfeng@huaweicloud.com> wrote:
>> Thanks for your advice. It's indeed more appropriate to use "flush" instead
>> of "cancel" here, I will change it soon.
> I saw your v2. Didn't you forget to also change the function name?
Yes, sorry for missing it.
>
>> I didn't quite get what you mean. Do you mean this patch will cause a change
>> in mechanics, and it is necessary to add an explanation?
>>
>> (After deleting the original cgroup,
>>   Before: the limit of the throttled bios can't be changed and the bios will
>> complete under this limit;
>>   Now: the limit will be canceled and the throttled bios will be flushed
>> immediately.)
> I mean -- can the new mechanics be exploited to bypass throttling by
> sending IO from a process, migrate it between cgroups and rmdir them?
> That should be covered in the commit log.
Yes.
Migrating a process to a new cgroup means we want subsequent bios to be
throttled by the new limit.
We can flush the throttled bios by deleting the old cgroup, or keep the
old cgroup so that the previous bios complete slowly under the original limit.


Thanks.

>
> Thanks,
> Michal

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c1bf73f8c75d..a0e5b28951ca 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1534,6 +1534,42 @@  static void throtl_shutdown_wq(struct request_queue *q)
 	cancel_work_sync(&td->dispatch_work);
 }
 
+static void tg_cancel_bios(struct throtl_grp *tg)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+
+	if (tg->flags & THROTL_TG_CANCELING)
+		return;
+	/*
+	 * Set the flag to make sure throtl_pending_timer_fn() won't
+	 * stop until all throttled bios are dispatched.
+	 */
+	tg->flags |= THROTL_TG_CANCELING;
+
+	/*
+	 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
+	 * will be inserted to service queue without THROTL_TG_PENDING
+	 * set in tg_update_disptime below. Then IO dispatched from
+	 * child in tg_dispatch_one_bio will trigger double insertion
+	 * and corrupt the tree.
+	 */
+	if (!(tg->flags & THROTL_TG_PENDING))
+		return;
+
+	/*
+	 * Update disptime after setting the above flag to make sure
+	 * throtl_select_dispatch() won't exit without dispatching.
+	 */
+	tg_update_disptime(tg);
+
+	throtl_schedule_pending_timer(sq, jiffies + 1);
+}
+
+static void throtl_pd_offline(struct blkg_policy_data *pd)
+{
+	tg_cancel_bios(pd_to_tg(pd));
+}
+
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
@@ -1541,6 +1577,7 @@  struct blkcg_policy blkcg_policy_throtl = {
 	.pd_alloc_fn		= throtl_pd_alloc,
 	.pd_init_fn		= throtl_pd_init,
 	.pd_online_fn		= throtl_pd_online,
+	.pd_offline_fn		= throtl_pd_offline,
 	.pd_free_fn		= throtl_pd_free,
 };
 
@@ -1561,32 +1598,15 @@  void blk_throtl_cancel_bios(struct gendisk *disk)
 	 */
 	rcu_read_lock();
 	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
-		struct throtl_grp *tg = blkg_to_tg(blkg);
-		struct throtl_service_queue *sq = &tg->service_queue;
-
-		/*
-		 * Set the flag to make sure throtl_pending_timer_fn() won't
-		 * stop until all throttled bios are dispatched.
-		 */
-		tg->flags |= THROTL_TG_CANCELING;
-
 		/*
-		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
-		 * will be inserted to service queue without THROTL_TG_PENDING
-		 * set in tg_update_disptime below. Then IO dispatched from
-		 * child in tg_dispatch_one_bio will trigger double insertion
-		 * and corrupt the tree.
+		 * disk_release will call pd_offline_fn to cancel bios.
+		 * However, disk_release can't be called if someone get
+		 * the refcount of device and issued bios which are
+		 * inflight after del_gendisk.
+		 * Cancel bios here to ensure no bios are inflight after
+		 * del_gendisk.
 		 */
-		if (!(tg->flags & THROTL_TG_PENDING))
-			continue;
-
-		/*
-		 * Update disptime after setting the above flag to make sure
-		 * throtl_select_dispatch() won't exit without dispatching.
-		 */
-		tg_update_disptime(tg);
-
-		throtl_schedule_pending_timer(sq, jiffies + 1);
+		tg_cancel_bios(blkg_to_tg(blkg));
 	}
 	rcu_read_unlock();
 	spin_unlock_irq(&q->queue_lock);
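
The control flow of the tg_cancel_bios() helper in the patch above can be exercised outside the kernel with a small model. This is only a sketch under stated assumptions: `model_tg`, `model_tg_cancel_bios`, the `MODEL_TG_*` flag values and the `timer_kicks` counter are all hypothetical stand-ins, and the real tg_update_disptime() and throtl_schedule_pending_timer() calls are reduced to a counter increment. It illustrates two properties: a second call is a no-op because the canceling flag is already set, and a group with no pending bios gets the flag but no dispatch is scheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
enum {
	MODEL_TG_PENDING   = 1 << 0,	/* group is queued on a service queue */
	MODEL_TG_CANCELING = 1 << 1,	/* group is flushing its throttled bios */
};

struct model_tg {
	unsigned int flags;
	int timer_kicks;	/* stands in for scheduling the pending timer */
};

/* Returns true only when this call scheduled a dispatch. */
static bool model_tg_cancel_bios(struct model_tg *tg)
{
	/* Already flushing: nothing more to do (the early return in the patch). */
	if (tg->flags & MODEL_TG_CANCELING)
		return false;

	tg->flags |= MODEL_TG_CANCELING;

	/* Nothing queued for this group, so do not touch the service queue. */
	if (!(tg->flags & MODEL_TG_PENDING))
		return false;

	tg->timer_kicks++;
	return true;
}
```

In this model, calling the helper once from a disk-removal path and once from a cgroup-offline path schedules the dispatch only a single time, which appears to be why the patch adds the early return when sharing the helper between the two callers.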