diff mbox series

[for-5.14/block] blkcg: drop CLONE_IO check in blkcg_can_attach()

Message ID YJrTvHbrRDbJjw+S@slm.duckdns.org (mailing list archive)
State New, archived
Headers show
Series [for-5.14/block] blkcg: drop CLONE_IO check in blkcg_can_attach() | expand

Commit Message

Tejun Heo May 11, 2021, 6:58 p.m. UTC
blkcg has always rejected to attach if any of the member tasks has shared
io_context. The rationale was that io_contexts can be shared across
different cgroups making it impossible to define what the appropriate
control behavior should be. However, this check causes more problems than it
solves:

* The check prevents controller enable and migrations but not CLONE_IO
  itself, which can lead to surprises as the outcome changes depending on
  the order of operations.

* Sharing within a cgroup is fine but the check can't distinguish that. This
  leads to unnecessary conflicts with the recent CLONE_IO usage in io_uring.

io_context sharing doesn't make any difference for rq_qos based controllers
and the way it's used is safe as long as tasks aren't migrated dynamically
which is the vast majority of use cases. While we can try to make the check
more precise to avoid false positives, the added complexity doesn't seem
worthwhile. Let's just drop blkcg_can_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |   27 ---------------------------
 1 file changed, 27 deletions(-)

Comments

Jens Axboe May 11, 2021, 7:39 p.m. UTC | #1
On 5/11/21 12:58 PM, Tejun Heo wrote:
> blkcg has always rejected to attach if any of the member tasks has shared
> io_context. The rationale was that io_contexts can be shared across
> different cgroups making it impossible to define what the appropriate
> control behavior should be. However, this check causes more problems than it
> solves:
> 
> * The check prevents controller enable and migrations but not CLONE_IO
>   itself, which can lead to surprises as the outcome changes depending on
>   the order of operations.
> 
> * Sharing within a cgroup is fine but the check can't distinguish that. This
>   leads to unnecessary conflicts with the recent CLONE_IO usage in io_uring.
> 
> io_context sharing doesn't make any difference for rq_qos based controllers
> and the way it's used is safe as long as tasks aren't migrated dynamically
> which is the vast majority of use cases. While we can try to make the check
> more precise to avoid false positives, the added complexity doesn't seem
> worthwhile. Let's just drop blkcg_can_attach().

Applied, thanks.
diff mbox series

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 582d2f18717ee..d169e20551588 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1217,32 +1217,6 @@  void blkcg_exit_queue(struct request_queue *q)
 	blk_throtl_exit(q);
 }
 
-/*
- * We cannot support shared io contexts, as we have no mean to support
- * two tasks with the same ioc in two different groups without major rework
- * of the main cic data structures.  For now we allow a task to change
- * its cgroup only if it's the only owner of its ioc.
- */
-static int blkcg_can_attach(struct cgroup_taskset *tset)
-{
-	struct task_struct *task;
-	struct cgroup_subsys_state *dst_css;
-	struct io_context *ioc;
-	int ret = 0;
-
-	/* task_lock() is needed to avoid races with exit_io_context() */
-	cgroup_taskset_for_each(task, dst_css, tset) {
-		task_lock(task);
-		ioc = task->io_context;
-		if (ioc && atomic_read(&ioc->nr_tasks) > 1)
-			ret = -EINVAL;
-		task_unlock(task);
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
 static void blkcg_bind(struct cgroup_subsys_state *root_css)
 {
 	int i;
@@ -1275,7 +1249,6 @@  struct cgroup_subsys io_cgrp_subsys = {
 	.css_online = blkcg_css_online,
 	.css_offline = blkcg_css_offline,
 	.css_free = blkcg_css_free,
-	.can_attach = blkcg_can_attach,
 	.css_rstat_flush = blkcg_rstat_flush,
 	.bind = blkcg_bind,
 	.dfl_cftypes = blkcg_files,