Message ID | 20190822132811.31294-6-patrick.bellasi@arm.com (mailing list archive)
---|---
State | Not Applicable, archived |
Series | Add utilization clamping support (CGroups API)
On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 04fc161e4dbe..fc2dc86a2abe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>  		uclamp_rq_dec_id(rq, p, clamp_id);
>  }
>
> +static inline void
> +uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
> +{
> +	struct rq_flags rf;
> +	struct rq *rq;
> +
> +	/*
> +	 * Lock the task and the rq where the task is (or was) queued.
> +	 *
> +	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
> +	 * price to pay to safely serialize util_{min,max} updates with
> +	 * enqueues, dequeues and migration operations.
> +	 * This is the same locking schema used by __set_cpus_allowed_ptr().
> +	 */
> +	rq = task_rq_lock(p, &rf);

Since modifying cgroup parameters is priv only, this should be OK I
suppose. Priv can already DoS the system anyway.

> +	/*
> +	 * Setting the clamp bucket is serialized by task_rq_lock().
> +	 * If the task is not yet RUNNABLE and its task_struct is not
> +	 * affecting a valid clamp bucket, the next time it's enqueued,
> +	 * it will already see the updated clamp bucket value.
> +	 */
> +	if (!p->uclamp[clamp_id].active)
> +		goto done;
> +
> +	uclamp_rq_dec_id(rq, p, clamp_id);
> +	uclamp_rq_inc_id(rq, p, clamp_id);
> +
> +done:

I'm thinking that:

	if (p->uclamp[clamp_id].active) {
		uclamp_rq_dec_id(rq, p, clamp_id);
		uclamp_rq_inc_id(rq, p, clamp_id);
	}

was too obvious? ;-)

> +
> +	task_rq_unlock(rq, p, &rf);
> +}
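For reference, a sketch of what uclamp_update_active() looks like with that simplification folded in; it assumes the same task_rq_lock() scheme and the uclamp_rq_{dec,inc}_id() helpers quoted above, and is not the code as posted:

	static inline void
	uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
	{
		struct rq_flags rf;
		struct rq *rq;

		/* Serialize against enqueue, dequeue and migration, as in the patch. */
		rq = task_rq_lock(p, &rf);

		/*
		 * Only move the refcount to the new bucket if the task is
		 * currently refcounted (RUNNABLE); otherwise the next enqueue
		 * picks up the updated clamp value anyway.
		 */
		if (p->uclamp[clamp_id].active) {
			uclamp_rq_dec_id(rq, p, clamp_id);
			uclamp_rq_inc_id(rq, p, clamp_id);
		}

		task_rq_unlock(rq, p, &rf);
	}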
On Fri, Aug 30, 2019 at 09:48:34 +0000, Peter Zijlstra wrote...

> On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 04fc161e4dbe..fc2dc86a2abe 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>>  		uclamp_rq_dec_id(rq, p, clamp_id);
>>  }
>>
>> +static inline void
>> +uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
>> +{
>> +	struct rq_flags rf;
>> +	struct rq *rq;
>> +
>> +	/*
>> +	 * Lock the task and the rq where the task is (or was) queued.
>> +	 *
>> +	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
>> +	 * price to pay to safely serialize util_{min,max} updates with
>> +	 * enqueues, dequeues and migration operations.
>> +	 * This is the same locking schema used by __set_cpus_allowed_ptr().
>> +	 */
>> +	rq = task_rq_lock(p, &rf);
>
> Since modifying cgroup parameters is priv only, this should be OK I
> suppose. Priv can already DoS the system anyway.

Are you referring to the possibility of DoSing the scheduler by
repeatedly writing cgroup attributes?

In that case, I think cgroup attributes can also be written by
non-privileged users: it all depends on how the hierarchy is mounted and
how permissions are set, doesn't it?

Anyway, I'm not sure we can fix that here... and in principle the same
DoS is possible by setting CPU affinities, which is user exposed.
Isn't it?

>> +	/*
>> +	 * Setting the clamp bucket is serialized by task_rq_lock().
>> +	 * If the task is not yet RUNNABLE and its task_struct is not
>> +	 * affecting a valid clamp bucket, the next time it's enqueued,
>> +	 * it will already see the updated clamp bucket value.
>> +	 */
>> +	if (!p->uclamp[clamp_id].active)
>> +		goto done;
>> +
>> +	uclamp_rq_dec_id(rq, p, clamp_id);
>> +	uclamp_rq_inc_id(rq, p, clamp_id);
>> +
>> +done:
>
> I'm thinking that:
>
> 	if (p->uclamp[clamp_id].active) {
> 		uclamp_rq_dec_id(rq, p, clamp_id);
> 		uclamp_rq_inc_id(rq, p, clamp_id);
> 	}
>
> was too obvious? ;-)

Yep, right... there was some more code around this in previous versions,
and I forgot to get rid of the "goto" pattern after a later change.

>> +
>> +	task_rq_unlock(rq, p, &rf);
>> +}

Cheers,
Patrick
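To illustrate the affinity comparison above: sched_setaffinity(2) ends up in __set_cpus_allowed_ptr(), which uses the same task_rq_lock() scheme referenced in the patch comment, and it is available to unprivileged callers for their own tasks. A minimal, hypothetical userspace snippet (loop bound and structure chosen purely for illustration):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <unistd.h>

	int main(void)
	{
		cpu_set_t set;
		long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

		/* Each call reaches __set_cpus_allowed_ptr() and takes the caller's rq lock. */
		for (unsigned int i = 0; i < 1000000; i++) {
			CPU_ZERO(&set);
			CPU_SET(i % ncpus, &set);
			sched_setaffinity(0, sizeof(set), &set);	/* pid 0 == calling thread */
		}
		return 0;
	}

The difference discussed in the next reply is scope: this only touches the calling task's runqueues, whereas a cgroup attribute write triggers an in-kernel walk over every task in the subtree.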
On Mon, Sep 02, 2019 at 07:44:40AM +0100, Patrick Bellasi wrote:
> On Fri, Aug 30, 2019 at 09:48:34 +0000, Peter Zijlstra wrote...
> > On Thu, Aug 22, 2019 at 02:28:10PM +0100, Patrick Bellasi wrote:
> >> +	rq = task_rq_lock(p, &rf);
> >
> > Since modifying cgroup parameters is priv only, this should be OK I
> > suppose. Priv can already DoS the system anyway.
>
> Are you referring to the possibility of DoSing the scheduler by
> repeatedly writing cgroup attributes?

Yep.

> In that case, I think cgroup attributes can also be written by
> non-privileged users: it all depends on how the hierarchy is mounted and
> how permissions are set, doesn't it?
>
> Anyway, I'm not sure we can fix that here... and in principle the same
> DoS is possible by setting CPU affinities, which is user exposed.
> Isn't it?

Only for a single task; by using the cgroup interface we get that
in-kernel iteration over tasks. The thing I worry about is bouncing
rq->lock around the system; but yeah, I suppose a normal user could
achieve something similar with enough tasks.

> >> +	/*
> >> +	 * Setting the clamp bucket is serialized by task_rq_lock().
> >> +	 * If the task is not yet RUNNABLE and its task_struct is not
> >> +	 * affecting a valid clamp bucket, the next time it's enqueued,
> >> +	 * it will already see the updated clamp bucket value.
> >> +	 */
> >> +	if (!p->uclamp[clamp_id].active)
> >> +		goto done;
> >> +
> >> +	uclamp_rq_dec_id(rq, p, clamp_id);
> >> +	uclamp_rq_inc_id(rq, p, clamp_id);
> >> +
> >> +done:
> >
> > I'm thinking that:
> >
> > 	if (p->uclamp[clamp_id].active) {
> > 		uclamp_rq_dec_id(rq, p, clamp_id);
> > 		uclamp_rq_inc_id(rq, p, clamp_id);
> > 	}
> >
> > was too obvious? ;-)
>
> Yep, right... there was some more code around this in previous versions,
> and I forgot to get rid of the "goto" pattern after a later change.

OK, already fixed that.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 04fc161e4dbe..fc2dc86a2abe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
 
+static inline void
+uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	/*
+	 * Lock the task and the rq where the task is (or was) queued.
+	 *
+	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
+	 * price to pay to safely serialize util_{min,max} updates with
+	 * enqueues, dequeues and migration operations.
+	 * This is the same locking schema used by __set_cpus_allowed_ptr().
+	 */
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * Setting the clamp bucket is serialized by task_rq_lock().
+	 * If the task is not yet RUNNABLE and its task_struct is not
+	 * affecting a valid clamp bucket, the next time it's enqueued,
+	 * it will already see the updated clamp bucket value.
+	 */
+	if (!p->uclamp[clamp_id].active)
+		goto done;
+
+	uclamp_rq_dec_id(rq, p, clamp_id);
+	uclamp_rq_inc_id(rq, p, clamp_id);
+
+done:
+
+	task_rq_unlock(rq, p, &rf);
+}
+
+static inline void
+uclamp_update_active_tasks(struct cgroup_subsys_state *css,
+			   unsigned int clamps)
+{
+	struct css_task_iter it;
+	struct task_struct *p;
+	unsigned int clamp_id;
+
+	css_task_iter_start(css, 0, &it);
+	while ((p = css_task_iter_next(&it))) {
+		for_each_clamp_id(clamp_id) {
+			if ((0x1 << clamp_id) & clamps)
+				uclamp_update_active(p, clamp_id);
+		}
+	}
+	css_task_iter_end(&it);
+}
+
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 static void cpu_util_update_eff(struct cgroup_subsys_state *css);
 static void uclamp_update_root_tg(void)
@@ -7160,8 +7211,13 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 			uc_se[clamp_id].value = eff[clamp_id];
 			uc_se[clamp_id].bucket_id = uclamp_bucket_id(eff[clamp_id]);
 			clamps |= (0x1 << clamp_id);
 		}
-		if (!clamps)
+		if (!clamps) {
 			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Immediately update descendants RUNNABLE tasks */
+		uclamp_update_active_tasks(css, clamps);
 	}
 }
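A quick note on the clamps bitmask threaded through the hunks above: cpu_util_update_eff() sets bit clamp_id for each clamp value that actually changed, and uclamp_update_active_tasks() only refreshes those clamp indexes on the RUNNABLE tasks it iterates. A standalone illustration, assuming the kernel's enum uclamp_id ordering (UCLAMP_MIN = 0, UCLAMP_MAX = 1):

	#include <stdio.h>

	/* Mirrors the in-kernel enum; reproduced here only for the illustration. */
	enum uclamp_id { UCLAMP_MIN = 0, UCLAMP_MAX, UCLAMP_CNT };

	int main(void)
	{
		unsigned int clamps = 0;

		clamps |= (0x1 << UCLAMP_MIN);	/* util_min changed: clamps == 0x1 */
		clamps |= (0x1 << UCLAMP_MAX);	/* util_max changed too: clamps == 0x3 */

		/* uclamp_update_active_tasks() tests each bit the same way. */
		for (unsigned int id = 0; id < UCLAMP_CNT; id++) {
			if ((0x1 << id) & clamps)
				printf("clamp_id %u needs a refresh\n", id);
		}
		return 0;
	}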