
[v14,5/6] sched/core: uclamp: Update CPU's refcount on TG's clamp changes

Message ID: 20190822132811.31294-6-patrick.bellasi@arm.com
State: Not Applicable, archived
Series: Add utilization clamping support (CGroups API)

Commit Message

Patrick Bellasi Aug. 22, 2019, 1:28 p.m. UTC
On updates of task group (TG) clamp values, ensure that these new values
are enforced on all RUNNABLE tasks of the task group, i.e. all RUNNABLE
tasks are immediately boosted and/or capped as requested.

Do that each time we update effective clamps from cpu_util_update_eff().
Use the cgroup_subsys_state (css) of each affected TG to walk its list
of tasks and refresh the RUNNABLE ones.
Update each task with the same mechanism used for CPU affinity mask
updates, i.e. by taking the rq lock.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)
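
For orientation, the update path this patch adds can be summarized as
follows (a sketch using the function names from the hunks below; the
cgroup-side write handler that eventually calls cpu_util_update_eff()
is introduced by earlier patches in this series):

	/*
	 * cpu_util_update_eff(css)          walk the affected TG subtree
	 *   uclamp_update_active_tasks()    iterate the tasks of each css
	 *     uclamp_update_active(p)       per task, under task_rq_lock()
	 *       uclamp_rq_dec_id()          release the old clamp bucket
	 *       uclamp_rq_inc_id()          refcount the new clamp bucket
	 */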

Comments

Peter Zijlstra Aug. 30, 2019, 9:48 a.m. UTC | #1
On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 04fc161e4dbe..fc2dc86a2abe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>  		uclamp_rq_dec_id(rq, p, clamp_id);
>  }
>  
> +static inline void
> +uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
> +{
> +	struct rq_flags rf;
> +	struct rq *rq;
> +
> +	/*
> +	 * Lock the task and the rq where the task is (or was) queued.
> +	 *
> +	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
> +	 * price to pay to safely serialize util_{min,max} updates with
> +	 * enqueues, dequeues and migration operations.
> +	 * This is the same locking schema used by __set_cpus_allowed_ptr().
> +	 */
> +	rq = task_rq_lock(p, &rf);

Since modifying cgroup parameters is priv only, this should be OK I
suppose. Priv can already DoS the system anyway.

> +	/*
> +	 * Setting the clamp bucket is serialized by task_rq_lock().
> +	 * If the task is not yet RUNNABLE and its task_struct is not
> +	 * affecting a valid clamp bucket, the next time it's enqueued,
> +	 * it will already see the updated clamp bucket value.
> +	 */
> +	if (!p->uclamp[clamp_id].active)
> +		goto done;
> +
> +	uclamp_rq_dec_id(rq, p, clamp_id);
> +	uclamp_rq_inc_id(rq, p, clamp_id);
> +
> +done:

I'm thinking that:

	if (p->uclamp[clamp_id].active) {
		uclamp_rq_dec_id(rq, p, clamp_id);
		uclamp_rq_inc_id(rq, p, clamp_id);
	}

was too obvious? ;-)

> +
> +	task_rq_unlock(rq, p, &rf);
> +}
Patrick Bellasi Sept. 2, 2019, 6:44 a.m. UTC | #2
On Fri, Aug 30, 2019 at 09:48:34 +0000, Peter Zijlstra wrote...

> On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 04fc161e4dbe..fc2dc86a2abe 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>>  		uclamp_rq_dec_id(rq, p, clamp_id);
>>  }
>>  
>> +static inline void
>> +uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
>> +{
>> +	struct rq_flags rf;
>> +	struct rq *rq;
>> +
>> +	/*
>> +	 * Lock the task and the rq where the task is (or was) queued.
>> +	 *
>> +	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
>> +	 * price to pay to safely serialize util_{min,max} updates with
>> +	 * enqueues, dequeues and migration operations.
>> +	 * This is the same locking schema used by __set_cpus_allowed_ptr().
>> +	 */
>> +	rq = task_rq_lock(p, &rf);
>
> Since modifying cgroup parameters is priv only, this should be OK I
> suppose. Priv can already DoS the system anyway.

Are you referring to the possibility of DoSing the scheduler by
repeatedly writing cgroup attributes?

Because, in that case, I think cgroup attributes could also be written
by non-privileged users. It all depends on how they are mounted and how
permissions are set, doesn't it?

Anyway, I'm not sure we can fix that here... and in principle the same
DoS could be achieved by setting CPU affinities, which is user exposed,
couldn't it?

>> +	/*
>> +	 * Setting the clamp bucket is serialized by task_rq_lock().
>> +	 * If the task is not yet RUNNABLE and its task_struct is not
>> +	 * affecting a valid clamp bucket, the next time it's enqueued,
>> +	 * it will already see the updated clamp bucket value.
>> +	 */
>> +	if (!p->uclamp[clamp_id].active)
>> +		goto done;
>> +
>> +	uclamp_rq_dec_id(rq, p, clamp_id);
>> +	uclamp_rq_inc_id(rq, p, clamp_id);
>> +
>> +done:
>
> I'm thinking that:
>
> 	if (p->uclamp[clamp_id].active) {
> 		uclamp_rq_dec_id(rq, p, clamp_id);
> 		uclamp_rq_inc_id(rq, p, clamp_id);
> 	}
>
> was too obvious? ;-)

Yep, right... I think there was some more code in previous versions,
but I forgot to get rid of that "goto" pattern after some change.

>> +
>> +	task_rq_unlock(rq, p, &rf);
>> +}

Cheers,
Patrick
Peter Zijlstra Sept. 2, 2019, 7:38 a.m. UTC | #3
On Mon, Sep 02, 2019 at 07:44:40AM +0100, Patrick Bellasi wrote:
> On Fri, Aug 30, 2019 at 09:48:34 +0000, Peter Zijlstra wrote...
> > On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:

> >> +	rq = task_rq_lock(p, &rf);
> >
> > Since modifying cgroup parameters is priv only, this should be OK I
> > suppose. Priv can already DoS the system anyway.
> 
> Are you referring to the possibility of DoSing the scheduler by
> repeatedly writing cgroup attributes?

Yep.

> Because, in that case, I think cgroup attributes could also be written
> by non-privileged users. It all depends on how they are mounted and how
> permissions are set, doesn't it?
> 
> Anyway, I'm not sure we can fix that here... and in principle the same
> DoS could be achieved by setting CPU affinities, which is user exposed,
> couldn't it?

Only for a single task; by using the cgroup thing we have that in-kernel
iteration of tasks.

The thing I worry about is bouncing rq->lock around the system; but
yeah, I suppose a normal user could achieve something similar with
enough tasks.

> >> +	/*
> >> +	 * Setting the clamp bucket is serialized by task_rq_lock().
> >> +	 * If the task is not yet RUNNABLE and its task_struct is not
> >> +	 * affecting a valid clamp bucket, the next time it's enqueued,
> >> +	 * it will already see the updated clamp bucket value.
> >> +	 */
> >> +	if (!p->uclamp[clamp_id].active)
> >> +		goto done;
> >> +
> >> +	uclamp_rq_dec_id(rq, p, clamp_id);
> >> +	uclamp_rq_inc_id(rq, p, clamp_id);
> >> +
> >> +done:
> >
> > I'm thinking that:
> >
> > 	if (p->uclamp[clamp_id].active) {
> > 		uclamp_rq_dec_id(rq, p, clamp_id);
> > 		uclamp_rq_inc_id(rq, p, clamp_id);
> > 	}
> >
> > was too obvious? ;-)
> 
> Yep, right... I think there was some more code in previous versions,
> but I forgot to get rid of that "goto" pattern after some change.

OK, already fixed that.
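
For reference, with the goto removed as suggested, the helper would read
roughly as follows (a sketch assembled from the hunks quoted in this
thread; the version that eventually landed may differ):

	static inline void
	uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
	{
		struct rq_flags rf;
		struct rq *rq;

		/* Same serialization against enqueue/dequeue/migration as above. */
		rq = task_rq_lock(p, &rf);

		/*
		 * Only a task with an active (refcounted) clamp needs the
		 * rq-side update; a !active task will see the new value on
		 * its next enqueue.
		 */
		if (p->uclamp[clamp_id].active) {
			uclamp_rq_dec_id(rq, p, clamp_id);
			uclamp_rq_inc_id(rq, p, clamp_id);
		}

		task_rq_unlock(rq, p, &rf);
	}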

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 04fc161e4dbe..fc2dc86a2abe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1043,6 +1043,57 @@  static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
 
+static inline void
+uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	/*
+	 * Lock the task and the rq where the task is (or was) queued.
+	 *
+	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
+	 * price to pay to safely serialize util_{min,max} updates with
+	 * enqueues, dequeues and migration operations.
+	 * This is the same locking schema used by __set_cpus_allowed_ptr().
+	 */
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * Setting the clamp bucket is serialized by task_rq_lock().
+	 * If the task is not yet RUNNABLE and its task_struct is not
+	 * affecting a valid clamp bucket, the next time it's enqueued,
+	 * it will already see the updated clamp bucket value.
+	 */
+	if (!p->uclamp[clamp_id].active)
+		goto done;
+
+	uclamp_rq_dec_id(rq, p, clamp_id);
+	uclamp_rq_inc_id(rq, p, clamp_id);
+
+done:
+
+	task_rq_unlock(rq, p, &rf);
+}
+
+static inline void
+uclamp_update_active_tasks(struct cgroup_subsys_state *css,
+			   unsigned int clamps)
+{
+	struct css_task_iter it;
+	struct task_struct *p;
+	unsigned int clamp_id;
+
+	css_task_iter_start(css, 0, &it);
+	while ((p = css_task_iter_next(&it))) {
+		for_each_clamp_id(clamp_id) {
+			if ((0x1 << clamp_id) & clamps)
+				uclamp_update_active(p, clamp_id);
+		}
+	}
+	css_task_iter_end(&it);
+}
+
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 static void cpu_util_update_eff(struct cgroup_subsys_state *css);
 static void uclamp_update_root_tg(void)
@@ -7160,8 +7211,13 @@  static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 			uc_se[clamp_id].bucket_id = uclamp_bucket_id(eff[clamp_id]);
 			clamps |= (0x1 << clamp_id);
 		}
-		if (!clamps)
+		if (!clamps) {
 			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Immediately update descendants RUNNABLE tasks */
+		uclamp_update_active_tasks(css, clamps);
 	}
 }