
[1/2] io_uring/io-wq: do not allow pinning outside of cpuset

Message ID 20240910143320.123234-2-felix.moessbauer@siemens.com
State New
Series io_uring/io-wq: respect cgroup cpusets

Commit Message

Felix Moessbauer Sept. 10, 2024, 2:33 p.m. UTC
The io work queue polling threads are userland threads that never exit
to userland. As such, they are also assigned to a cgroup (the cgroup of
the creating task).

When changing the affinity of the io_wq thread via syscall, we must only
allow cpumasks within the ambient limits. These are defined by the cpuset
controller of the cgroup (if enabled).

Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
---
 io_uring/io-wq.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

Comments

Jens Axboe Sept. 10, 2024, 2:55 p.m. UTC | #1
On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> The io work queue polling threads are userland threads that never exit
> to userland. As such, they are also assigned to a cgroup (the cgroup of
> the creating task).

They are not polling threads, they are just worker threads.

> When changing the affinity of the io_wq thread via syscall, we must only
> allow cpumasks within the ambient limits. These are defined by the cpuset
> controller of the cgroup (if enabled).

ambient limits? Not quite sure that's the correct term to use here.

Outside of commit message oddities, change looks good.
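To make the userspace-visible effect concrete, below is a minimal sketch
using liburing's io_uring_register_iowq_aff(). It assumes liburing is
installed and that the calling task's cpuset is restricted to CPUs 0-1;
the CPU numbers are illustrative and not part of the patch. With this
patch, a mask outside the cpuset is rejected with -EINVAL instead of
being applied.

/* Minimal sketch, assuming liburing and a cpuset restricted to CPUs 0-1
 * (the CPU numbers are illustrative, not part of the patch). */
#include <sched.h>
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	cpu_set_t mask;
	int ret;

	if (io_uring_queue_init(8, &ring, 0))
		return 1;

	CPU_ZERO(&mask);
	CPU_SET(2, &mask);	/* outside the assumed cpuset of {0, 1} */

	/* With this patch, a mask that is not a subset of the cpuset's
	 * allowed CPUs fails with -EINVAL instead of being applied. */
	ret = io_uring_register_iowq_aff(&ring, sizeof(mask), &mask);
	printf("io_uring_register_iowq_aff: %d\n", ret);

	io_uring_queue_exit(&ring);
	return 0;
}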

Patch

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index f1e7c670add8..c7055a8895d7 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -13,6 +13,7 @@ 
 #include <linux/slab.h>
 #include <linux/rculist_nulls.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <linux/mmu_context.h>
@@ -1322,17 +1323,29 @@  static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 
 int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
+	cpumask_var_t allowed_mask;
+	int ret = 0;
+
 	if (!tctx || !tctx->io_wq)
 		return -EINVAL;
 
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	rcu_read_lock();
-	if (mask)
-		cpumask_copy(tctx->io_wq->cpu_mask, mask);
-	else
-		cpumask_copy(tctx->io_wq->cpu_mask, cpu_possible_mask);
+	cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask);
+	if (mask) {
+		if (cpumask_subset(mask, allowed_mask))
+			cpumask_copy(tctx->io_wq->cpu_mask, mask);
+		else
+			ret = -EINVAL;
+	} else {
+		cpumask_copy(tctx->io_wq->cpu_mask, allowed_mask);
+	}
 	rcu_read_unlock();
 
-	return 0;
+	free_cpumask_var(allowed_mask);
+	return ret;
 }
 
 /*
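A note on the NULL-mask branch above: clearing the affinity previously
reset the workers to cpu_possible_mask; with this patch it resets to the
cpuset-allowed set instead. A minimal sketch, assuming liburing's
io_uring_unregister_iowq_aff(), which ends up passing a NULL mask down
to io_wq_cpu_affinity():

/* Minimal sketch, assuming liburing: unregistering io-wq affinity hits
 * the NULL-mask branch of io_wq_cpu_affinity(), which after this patch
 * restores the cpuset-allowed CPUs rather than cpu_possible_mask. */
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	int ret;

	if (io_uring_queue_init(8, &ring, 0))
		return 1;

	ret = io_uring_unregister_iowq_aff(&ring);
	printf("io_uring_unregister_iowq_aff: %d\n", ret);

	io_uring_queue_exit(&ring);
	return 0;
}

Also worth noting as a design choice in the patch: alloc_cpumask_var()
is called with GFP_KERNEL before rcu_read_lock(), since a GFP_KERNEL
allocation may sleep and sleeping is not permitted inside an RCU
read-side critical section.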