[3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()

Message ID	1314138000-2049-4-git-send-email-tj@kernel.org (mailing list archive)
State	Not Applicable, archived
Headers	show Received: from smtp1.linux-foundation.org (smtp1.linux-foundation.org [140.211.169.13]) by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p7NMOtnQ006874 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for <patchwork-linux-pm@patchwork.kernel.org>; Tue, 23 Aug 2011 22:25:20 GMT From: Tejun Heo <tj@kernel.org> To: rjw@sisk.pl, paul@paulmenage.org, lizf@cn.fujitsu.com Date: Wed, 24 Aug 2011 00:19:57 +0200 Message-Id: <1314138000-2049-4-git-send-email-tj@kernel.org> In-Reply-To: <1314138000-2049-1-git-send-email-tj@kernel.org> References: <1314138000-2049-1-git-send-email-tj@kernel.org> Received-SPF: pass (localhost is always allowed.) Cc: containers@lists.linux-foundation.org, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>, linux-kernel@vger.kernel.org, James Morris <jmorris@namei.org>, Tejun Heo <tj@kernel.org>, linux-pm@lists.linux-foundation.org, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Subject: [linux-pm] [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: linux-pm-bounces@lists.linux-foundation.org Errors-To: linux-pm-bounces@lists.linux-foundation.org

On Wed, 24 Aug 2011 00:19:57 +0200 Tejun Heo <tj@kernel.org> wrote: > Currently, there's no way to pass multiple tasks to cgroup_subsys > methods necessitating the need for separate per-process and per-task > methods. This patch introduces cgroup_taskset which can be used to > pass multiple tasks and their associated cgroups to cgroup_subsys > methods. > > Three methods - can_attach(), cancel_attach() and attach() - are > converted to use cgroup_taskset. This unifies passed parameters so > that all methods have access to all information. Conversions in this > patchset are identical and don't introduce any behavior change. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Cc: Paul Menage <paul@paulmenage.org> > Cc: Li Zefan <lizf@cn.fujitsu.com> > Cc: Balbir Singh <bsingharora@gmail.com> > Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: James Morris <jmorris@namei.org> Thank you for your work. I welcome this ! Some comments around memcg. > --- > Documentation/cgroups/cgroups.txt | 26 ++++++---- > include/linux/cgroup.h | 28 +++++++++- > kernel/cgroup.c | 99 +++++++++++++++++++++++++++++++++---- > kernel/cgroup_freezer.c | 2 +- > kernel/cpuset.c | 18 ++++--- > mm/memcontrol.c | 16 +++--- > security/device_cgroup.c | 7 ++- > 7 files changed, 153 insertions(+), 43 deletions(-) > > diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt > index cd67e90..2eee7cf 100644 > --- a/Documentation/cgroups/cgroups.txt > +++ b/Documentation/cgroups/cgroups.txt > @@ -594,16 +594,21 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be > called multiple times against a cgroup. > > int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct task_struct *task) > + struct cgroup_taskset *tset) > (cgroup_mutex held by caller) > > -Called prior to moving a task into a cgroup; if the subsystem > -returns an error, this will abort the attach operation. If a NULL > -task is passed, then a successful result indicates that *any* > -unspecified task can be moved into the cgroup. Note that this isn't > +Called prior to moving one or more tasks into a cgroup; if the > +subsystem returns an error, this will abort the attach operation. > +@tset contains the tasks to be attached and is guaranteed to have at > +least one task in it. If there are multiple, it's guaranteed that all > +are from the same thread group, Do this, "If there are multiple, it's guaranteed that all are from the same thread group ", means the 'tset' contains only one mm_struct ? And is it guaranteed that any task in tset will not be freed while subsystem routine runs ? > @tset contains all tasks from the > +group whether they're actually switching cgroup or not, and the first > +task is the leader. Each @tset entry also contains the task's old > +cgroup and tasks which aren't switching cgroup can be skipped easily > +using the cgroup_taskset_for_each() iterator. Note that this isn't > called on a fork. If this method returns 0 (success) then this should > -remain valid while the caller holds cgroup_mutex and it is ensured that either > -attach() or cancel_attach() will be called in future. > +remain valid while the caller holds cgroup_mutex and it is ensured > +that either attach() or cancel_attach() will be called in future. > > int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); > (cgroup_mutex held by caller) > @@ -613,14 +618,14 @@ attached (possibly many when using cgroup_attach_proc). Called after > can_attach. > > void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct task_struct *task, bool threadgroup) > + struct cgroup_taskset *tset) > (cgroup_mutex held by caller) > > Called when a task attach operation has failed after can_attach() has succeeded. > A subsystem whose can_attach() has some side-effects should provide this > function, so that the subsystem can implement a rollback. If not, not necessary. > This will be called only about subsystems whose can_attach() operation have > -succeeded. > +succeeded. The parameters are identical to can_attach(). > > void pre_attach(struct cgroup *cgrp); > (cgroup_mutex held by caller) > @@ -629,11 +634,12 @@ For any non-per-thread attachment work that needs to happen before > attach_task. Needed by cpuset. > > void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct cgroup *old_cgrp, struct task_struct *task) > + struct cgroup_taskset *tset) > (cgroup_mutex held by caller) > > Called after the task has been attached to the cgroup, to allow any > post-attachment activity that requires memory allocations or blocking. > +The parameters are identical to can_attach(). > > void attach_task(struct cgroup *cgrp, struct task_struct *tsk); > (cgroup_mutex held by caller) > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index da7e4bc..2470c8e 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css); > void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css); > > /* > + * Control Group taskset, used to pass around set of tasks to cgroup_subsys > + * methods. > + */ > +struct cgroup_taskset; > +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset); > +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset); > +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset); > +int cgroup_taskset_size(struct cgroup_taskset *tset); > + > +/** > + * cgroup_taskset_for_each - iterate cgroup_taskset > + * @task: the loop cursor > + * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all > + * @tset: taskset to iterate > + */ > +#define cgroup_taskset_for_each(task, skip_cgrp, tset) \ > + for ((task) = cgroup_taskset_first((tset)); (task); \ > + (task) = cgroup_taskset_next((tset))) \ > + if (!(skip_cgrp) || \ > + cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp)) > + > +/* > * Control Group subsystem type. > * See Documentation/cgroups/cgroups.txt for details > */ > @@ -467,14 +489,14 @@ struct cgroup_subsys { > int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); > void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); > int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct task_struct *tsk); > + struct cgroup_taskset *tset); > int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk); > void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct task_struct *tsk); > + struct cgroup_taskset *tset); > void (*pre_attach)(struct cgroup *cgrp); > void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk); > void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, > - struct cgroup *old_cgrp, struct task_struct *tsk); > + struct cgroup_taskset *tset); > void (*fork)(struct cgroup_subsys *ss, struct task_struct *task); > void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp, > struct cgroup *old_cgrp, struct task_struct *task); > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index cf5f3e3..474674b 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1739,11 +1739,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen) > } > EXPORT_SYMBOL_GPL(cgroup_path); > > +/* > + * Control Group taskset > + */ > struct task_and_cgroup { > struct task_struct *task; > struct cgroup *cgrp; > }; > > +struct cgroup_taskset { > + struct task_and_cgroup single; > + struct flex_array *tc_array; > + int tc_array_len; > + int idx; > + struct cgroup *cur_cgrp; > +}; > + > +/** > + * cgroup_taskset_first - reset taskset and return the first task > + * @tset: taskset of interest > + * > + * @tset iteration is initialized and the first task is returned. > + */ > +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset) > +{ > + if (tset->tc_array) { > + tset->idx = 0; > + return cgroup_taskset_next(tset); > + } else { > + tset->cur_cgrp = tset->single.cgrp; > + return tset->single.task; > + } > +} > +EXPORT_SYMBOL_GPL(cgroup_taskset_first); > + > +/** > + * cgroup_taskset_next - iterate to the next task in taskset > + * @tset: taskset of interest > + * > + * Return the next task in @tset. Iteration must have been initialized > + * with cgroup_taskset_first(). > + */ > +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset) > +{ > + struct task_and_cgroup *tc; > + > + if (!tset->tc_array || tset->idx >= tset->tc_array_len) > + return NULL; > + > + tc = flex_array_get(tset->tc_array, tset->idx++); > + tset->cur_cgrp = tc->cgrp; > + return tc->task; > +} > +EXPORT_SYMBOL_GPL(cgroup_taskset_next); > + > +/** > + * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task > + * @tset: taskset of interest > + * > + * Return the cgroup for the current (last returned) task of @tset. This > + * function must be preceded by either cgroup_taskset_first() or > + * cgroup_taskset_next(). > + */ > +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset) > +{ > + return tset->cur_cgrp; > +} > +EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup); > + > +/** > + * cgroup_taskset_size - return the number of tasks in taskset > + * @tset: taskset of interest > + */ > +int cgroup_taskset_size(struct cgroup_taskset *tset) > +{ > + return tset->tc_array ? tset->tc_array_len : 1; > +} > +EXPORT_SYMBOL_GPL(cgroup_taskset_size); > + > + > /* > * cgroup_task_migrate - move a task from one cgroup to another. > * > @@ -1828,15 +1902,19 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) > struct cgroup_subsys *ss, *failed_ss = NULL; > struct cgroup *oldcgrp; > struct cgroupfs_root *root = cgrp->root; > + struct cgroup_taskset tset = { }; > > /* Nothing to do if the task is already in that cgroup */ > oldcgrp = task_cgroup_from_root(tsk, root); > if (cgrp == oldcgrp) > return 0; > > + tset.single.task = tsk; > + tset.single.cgrp = oldcgrp; > + > for_each_subsys(root, ss) { > if (ss->can_attach) { > - retval = ss->can_attach(ss, cgrp, tsk); > + retval = ss->can_attach(ss, cgrp, &tset); > if (retval) { > /* > * Remember on which subsystem the can_attach() > @@ -1867,7 +1945,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) > if (ss->attach_task) > ss->attach_task(cgrp, tsk); > if (ss->attach) > - ss->attach(ss, cgrp, oldcgrp, tsk); > + ss->attach(ss, cgrp, &tset); > } > > synchronize_rcu(); > @@ -1889,7 +1967,7 @@ out: > */ > break; > if (ss->cancel_attach) > - ss->cancel_attach(ss, cgrp, tsk); > + ss->cancel_attach(ss, cgrp, &tset); > } > } > return retval; > @@ -2005,6 +2083,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) > struct task_struct *tsk; > struct task_and_cgroup *tc; > struct flex_array *group; > + struct cgroup_taskset tset = { }; > /* > * we need to make sure we have css_sets for all the tasks we're > * going to move -before- we actually start moving them, so that in > @@ -2067,6 +2146,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) > } while_each_thread(leader, tsk); > /* remember the number of threads in the array for later. */ > group_size = i; > + tset.tc_array = group; > + tset.tc_array_len = group_size; > rcu_read_unlock(); > > /* methods shouldn't be called if no task is actually migrating */ > @@ -2079,7 +2160,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) > */ > for_each_subsys(root, ss) { > if (ss->can_attach) { > - retval = ss->can_attach(ss, cgrp, leader); > + retval = ss->can_attach(ss, cgrp, &tset); > if (retval) { > failed_ss = ss; > goto out_cancel_attach; > @@ -2169,10 +2250,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) > * being moved, this call will need to be reworked to communicate that. > */ > for_each_subsys(root, ss) { > - if (ss->attach) { > - tc = flex_array_get(group, 0); > - ss->attach(ss, cgrp, tc->cgrp, tc->task); > - } > + if (ss->attach) > + ss->attach(ss, cgrp, &tset); > } > > /* > @@ -2194,11 +2273,11 @@ out_cancel_attach: > for_each_subsys(root, ss) { > if (ss == failed_ss) { > if (cancel_failed_ss && ss->cancel_attach) > - ss->cancel_attach(ss, cgrp, leader); > + ss->cancel_attach(ss, cgrp, &tset); > break; > } > if (ss->cancel_attach) > - ss->cancel_attach(ss, cgrp, leader); > + ss->cancel_attach(ss, cgrp, &tset); > } > } > out_put_tasks: > diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c > index 4e82525..a2b0082 100644 > --- a/kernel/cgroup_freezer.c > +++ b/kernel/cgroup_freezer.c > @@ -159,7 +159,7 @@ static void freezer_destroy(struct cgroup_subsys *ss, > */ > static int freezer_can_attach(struct cgroup_subsys *ss, > struct cgroup *new_cgroup, > - struct task_struct *task) > + struct cgroup_taskset *tset) > { > struct freezer *freezer; > > diff --git a/kernel/cpuset.c b/kernel/cpuset.c > index 10131fd..2e5825b 100644 > --- a/kernel/cpuset.c > +++ b/kernel/cpuset.c > @@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp) > } > > /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */ > -static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont, > - struct task_struct *tsk) > +static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, > + struct cgroup_taskset *tset) > { > - struct cpuset *cs = cgroup_cs(cont); > + struct cpuset *cs = cgroup_cs(cgrp); > > if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)) > return -ENOSPC; > @@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont, > * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may > * be changed. > */ > - if (tsk->flags & PF_THREAD_BOUND) > + if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND) > return -EINVAL; > > return 0; > @@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk) > cpuset_update_task_spread_flag(cs, tsk); > } > > -static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont, > - struct cgroup *oldcont, struct task_struct *tsk) > +static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, > + struct cgroup_taskset *tset) > { > struct mm_struct *mm; > - struct cpuset *cs = cgroup_cs(cont); > - struct cpuset *oldcs = cgroup_cs(oldcont); > + struct task_struct *tsk = cgroup_taskset_first(tset); > + struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset); > + struct cpuset *cs = cgroup_cs(cgrp); > + struct cpuset *oldcs = cgroup_cs(oldcgrp); > > /* > * Change mm, possibly for multiple threads in a threadgroup. This is > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 930de94..b2802cc 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void) > > static int mem_cgroup_can_attach(struct cgroup_subsys *ss, > struct cgroup *cgroup, > - struct task_struct *p) > + struct cgroup_taskset *tset) > { > + struct task_struct *p = cgroup_taskset_first(tset); > int ret = 0; > struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup); > Ah..hmm. I think this doesn't work as expected for memcg. Maybe code like this will be required. { struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup); struct mem_cgroup *from = NULL; struct task_struct *p; int ret = 0; /* * memcg just works against mm-owner. Check mm-owner is in this cgroup. * Because tset contains only one thread-group, we'll find a task of * mm->owner, at most. */ for_cgroup_taskset_for_each(task, NULL, tset) { struct mm_struct *mm; mm = get_task_mm(task); if (!mm) continue; if (mm->owner == task) { p = task; break; } mmput(mm); } if (!p) return ret; from = mem_cgroup_from_task(p); mem_cgroup_start_move(from); spin_lock(&mc.lock); mc.from = from; mc.to = mem; spin_unlock(&mc.lock); /* We set mc.moving_task later */ ret = mem_cgroup_precharge_mc(mm); if (ret) mem_cgroup_clear_mc(); mm_put(mm); return ret; } > @@ -5499,7 +5500,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss, > > static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss, > struct cgroup *cgroup, > - struct task_struct *p) > + struct cgroup_taskset *tset) > { > mem_cgroup_clear_mc(); > } > @@ -5616,9 +5617,9 @@ retry: > > static void mem_cgroup_move_task(struct cgroup_subsys *ss, > struct cgroup *cont, > - struct cgroup *old_cont, > - struct task_struct *p) > + struct cgroup_taskset *tset) > { > + struct task_struct *p = cgroup_taskset_first(tset); > struct mm_struct *mm = get_task_mm(p); > Similar code with can_attach() will be required. Thanks, -Kame

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index cd67e90..2eee7cf 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -594,16 +594,21 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) -Called prior to moving a task into a cgroup; if the subsystem -returns an error, this will abort the attach operation. If a NULL -task is passed, then a successful result indicates that *any* -unspecified task can be moved into the cgroup. Note that this isn't +Called prior to moving one or more tasks into a cgroup; if the +subsystem returns an error, this will abort the attach operation. +@tset contains the tasks to be attached and is guaranteed to have at +least one task in it. If there are multiple, it's guaranteed that all +are from the same thread group, @tset contains all tasks from the +group whether they're actually switching cgroup or not, and the first +task is the leader. Each @tset entry also contains the task's old +cgroup and tasks which aren't switching cgroup can be skipped easily +using the cgroup_taskset_for_each() iterator. Note that this isn't called on a fork. If this method returns 0 (success) then this should -remain valid while the caller holds cgroup_mutex and it is ensured that either -attach() or cancel_attach() will be called in future. +remain valid while the caller holds cgroup_mutex and it is ensured +that either attach() or cancel_attach() will be called in future. int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); (cgroup_mutex held by caller) @@ -613,14 +618,14 @@ attached (possibly many when using cgroup_attach_proc). Called after can_attach. void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task, bool threadgroup) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called when a task attach operation has failed after can_attach() has succeeded. A subsystem whose can_attach() has some side-effects should provide this function, so that the subsystem can implement a rollback. If not, not necessary. This will be called only about subsystems whose can_attach() operation have -succeeded. +succeeded. The parameters are identical to can_attach(). void pre_attach(struct cgroup *cgrp); (cgroup_mutex held by caller) @@ -629,11 +634,12 @@ For any non-per-thread attachment work that needs to happen before attach_task. Needed by cpuset. void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task) + struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. +The parameters are identical to can_attach(). void attach_task(struct cgroup *cgrp, struct task_struct *tsk); (cgroup_mutex held by caller) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index da7e4bc..2470c8e 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css); void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css); /* + * Control Group taskset, used to pass around set of tasks to cgroup_subsys + * methods. + */ +struct cgroup_taskset; +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset); +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset); +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset); +int cgroup_taskset_size(struct cgroup_taskset *tset); + +/** + * cgroup_taskset_for_each - iterate cgroup_taskset + * @task: the loop cursor + * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all + * @tset: taskset to iterate + */ +#define cgroup_taskset_for_each(task, skip_cgrp, tset) \ + for ((task) = cgroup_taskset_first((tset)); (task); \ + (task) = cgroup_taskset_next((tset))) \ + if (!(skip_cgrp) || \ + cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp)) + +/* * Control Group subsystem type. * See Documentation/cgroups/cgroups.txt for details */ @@ -467,14 +489,14 @@ struct cgroup_subsys { int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp); int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *tsk); + struct cgroup_taskset *tset); int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk); void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *tsk); + struct cgroup_taskset *tset); void (*pre_attach)(struct cgroup *cgrp); void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk); void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *tsk); + struct cgroup_taskset *tset); void (*fork)(struct cgroup_subsys *ss, struct task_struct *task); void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp, struct cgroup *old_cgrp, struct task_struct *task); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index cf5f3e3..474674b 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1739,11 +1739,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen) } EXPORT_SYMBOL_GPL(cgroup_path); +/* + * Control Group taskset + */ struct task_and_cgroup { struct task_struct *task; struct cgroup *cgrp; }; +struct cgroup_taskset { + struct task_and_cgroup single; + struct flex_array *tc_array; + int tc_array_len; + int idx; + struct cgroup *cur_cgrp; +}; + +/** + * cgroup_taskset_first - reset taskset and return the first task + * @tset: taskset of interest + * + * @tset iteration is initialized and the first task is returned. + */ +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset) +{ + if (tset->tc_array) { + tset->idx = 0; + return cgroup_taskset_next(tset); + } else { + tset->cur_cgrp = tset->single.cgrp; + return tset->single.task; + } +} +EXPORT_SYMBOL_GPL(cgroup_taskset_first); + +/** + * cgroup_taskset_next - iterate to the next task in taskset + * @tset: taskset of interest + * + * Return the next task in @tset. Iteration must have been initialized + * with cgroup_taskset_first(). + */ +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset) +{ + struct task_and_cgroup *tc; + + if (!tset->tc_array || tset->idx >= tset->tc_array_len) + return NULL; + + tc = flex_array_get(tset->tc_array, tset->idx++); + tset->cur_cgrp = tc->cgrp; + return tc->task; +} +EXPORT_SYMBOL_GPL(cgroup_taskset_next); + +/** + * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task + * @tset: taskset of interest + * + * Return the cgroup for the current (last returned) task of @tset. This + * function must be preceded by either cgroup_taskset_first() or + * cgroup_taskset_next(). + */ +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset) +{ + return tset->cur_cgrp; +} +EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup); + +/** + * cgroup_taskset_size - return the number of tasks in taskset + * @tset: taskset of interest + */ +int cgroup_taskset_size(struct cgroup_taskset *tset) +{ + return tset->tc_array ? tset->tc_array_len : 1; +} +EXPORT_SYMBOL_GPL(cgroup_taskset_size); + + /* * cgroup_task_migrate - move a task from one cgroup to another. * @@ -1828,15 +1902,19 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) struct cgroup_subsys *ss, *failed_ss = NULL; struct cgroup *oldcgrp; struct cgroupfs_root *root = cgrp->root; + struct cgroup_taskset tset = { }; /* Nothing to do if the task is already in that cgroup */ oldcgrp = task_cgroup_from_root(tsk, root); if (cgrp == oldcgrp) return 0; + tset.single.task = tsk; + tset.single.cgrp = oldcgrp; + for_each_subsys(root, ss) { if (ss->can_attach) { - retval = ss->can_attach(ss, cgrp, tsk); + retval = ss->can_attach(ss, cgrp, &tset); if (retval) { /* * Remember on which subsystem the can_attach() @@ -1867,7 +1945,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk) if (ss->attach_task) ss->attach_task(cgrp, tsk); if (ss->attach) - ss->attach(ss, cgrp, oldcgrp, tsk); + ss->attach(ss, cgrp, &tset); } synchronize_rcu(); @@ -1889,7 +1967,7 @@ out: */ break; if (ss->cancel_attach) - ss->cancel_attach(ss, cgrp, tsk); + ss->cancel_attach(ss, cgrp, &tset); } } return retval; @@ -2005,6 +2083,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) struct task_struct *tsk; struct task_and_cgroup *tc; struct flex_array *group; + struct cgroup_taskset tset = { }; /* * we need to make sure we have css_sets for all the tasks we're * going to move -before- we actually start moving them, so that in @@ -2067,6 +2146,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) } while_each_thread(leader, tsk); /* remember the number of threads in the array for later. */ group_size = i; + tset.tc_array = group; + tset.tc_array_len = group_size; rcu_read_unlock(); /* methods shouldn't be called if no task is actually migrating */ @@ -2079,7 +2160,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) */ for_each_subsys(root, ss) { if (ss->can_attach) { - retval = ss->can_attach(ss, cgrp, leader); + retval = ss->can_attach(ss, cgrp, &tset); if (retval) { failed_ss = ss; goto out_cancel_attach; @@ -2169,10 +2250,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) * being moved, this call will need to be reworked to communicate that. */ for_each_subsys(root, ss) { - if (ss->attach) { - tc = flex_array_get(group, 0); - ss->attach(ss, cgrp, tc->cgrp, tc->task); - } + if (ss->attach) + ss->attach(ss, cgrp, &tset); } /* @@ -2194,11 +2273,11 @@ out_cancel_attach: for_each_subsys(root, ss) { if (ss == failed_ss) { if (cancel_failed_ss && ss->cancel_attach) - ss->cancel_attach(ss, cgrp, leader); + ss->cancel_attach(ss, cgrp, &tset); break; } if (ss->cancel_attach) - ss->cancel_attach(ss, cgrp, leader); + ss->cancel_attach(ss, cgrp, &tset); } } out_put_tasks: diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c index 4e82525..a2b0082 100644 --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -159,7 +159,7 @@ static void freezer_destroy(struct cgroup_subsys *ss, */ static int freezer_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup, - struct task_struct *task) + struct cgroup_taskset *tset) { struct freezer *freezer; diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 10131fd..2e5825b 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp) } /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */ -static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont, - struct task_struct *tsk) +static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup_taskset *tset) { - struct cpuset *cs = cgroup_cs(cont); + struct cpuset *cs = cgroup_cs(cgrp); if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)) return -ENOSPC; @@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont, * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may * be changed. */ - if (tsk->flags & PF_THREAD_BOUND) + if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND) return -EINVAL; return 0; @@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk) cpuset_update_task_spread_flag(cs, tsk); } -static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont, - struct cgroup *oldcont, struct task_struct *tsk) +static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup_taskset *tset) { struct mm_struct *mm; - struct cpuset *cs = cgroup_cs(cont); - struct cpuset *oldcs = cgroup_cs(oldcont); + struct task_struct *tsk = cgroup_taskset_first(tset); + struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset); + struct cpuset *cs = cgroup_cs(cgrp); + struct cpuset *oldcs = cgroup_cs(oldcgrp); /* * Change mm, possibly for multiple threads in a threadgroup. This is diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 930de94..b2802cc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void) static int mem_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgroup, - struct task_struct *p) + struct cgroup_taskset *tset) { + struct task_struct *p = cgroup_taskset_first(tset); int ret = 0; struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup); @@ -5499,7 +5500,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss, static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgroup, - struct task_struct *p) + struct cgroup_taskset *tset) { mem_cgroup_clear_mc(); } @@ -5616,9 +5617,9 @@ retry: static void mem_cgroup_move_task(struct cgroup_subsys *ss, struct cgroup *cont, - struct cgroup *old_cont, - struct task_struct *p) + struct cgroup_taskset *tset) { + struct task_struct *p = cgroup_taskset_first(tset); struct mm_struct *mm = get_task_mm(p); if (mm) { @@ -5633,19 +5634,18 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss, #else /* !CONFIG_MMU */ static int mem_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgroup, - struct task_struct *p) + struct cgroup_taskset *tset) { return 0; } static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgroup, - struct task_struct *p) + struct cgroup_taskset *tset) { } static void mem_cgroup_move_task(struct cgroup_subsys *ss, struct cgroup *cont, - struct cgroup *old_cont, - struct task_struct *p) + struct cgroup_taskset *tset) { } #endif diff --git a/security/device_cgroup.c b/security/device_cgroup.c index 4450fbe..8b5b5d8 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -62,11 +62,12 @@ static inline struct dev_cgroup *task_devcgroup(struct task_struct *task) struct cgroup_subsys devices_subsys; static int devcgroup_can_attach(struct cgroup_subsys *ss, - struct cgroup *new_cgroup, struct task_struct *task) + struct cgroup *new_cgrp, struct cgroup_taskset *set) { - if (current != task && !capable(CAP_SYS_ADMIN)) - return -EPERM; + struct task_struct *task = cgroup_taskset_first(set); + if (current != task && !capable(CAP_SYS_ADMIN)) + return -EPERM; return 0; }

[3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()

Commit Message

Comments

Patch