Message ID | 20220819220927.3409575-2-kuifeng@fb.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | Parameterize task iterators. | expand |
On 8/19/22 3:09 PM, Kui-Feng Lee wrote: > Allow creating an iterator that loops through resources of one task/thread. one thread/process. > > People could only create iterators to loop through all resources of > files, vma, and tasks in the system, even though they were interested > in only the resources of a specific task or process. Passing the > additional parameters, people can now create an iterator to go > through all resources or only the resources of a task. > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Acked-by: Yonghong Song <yhs@fb.com>
On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > Allow creating an iterator that loops through resources of one task/thread. > > People could only create iterators to loop through all resources of > files, vma, and tasks in the system, even though they were interested > in only the resources of a specific task or process. Passing the > additional parameters, people can now create an iterator to go > through all resources or only the resources of a task. > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> > --- > include/linux/bpf.h | 25 +++++++ > include/uapi/linux/bpf.h | 6 ++ > kernel/bpf/task_iter.c | 116 ++++++++++++++++++++++++++------- > tools/include/uapi/linux/bpf.h | 6 ++ > 4 files changed, 129 insertions(+), 24 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 39bd36359c1e..59712dd917d8 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user *pathname, int flags); > extern int bpf_iter_ ## target(args); \ > int __init bpf_iter_ ## target(args) { return 0; } > > +/* > + * The task type of iterators. > + * > + * For BPF task iterators, they can be parameterized with various > + * parameters to visit only some of tasks. > + * > + * BPF_TASK_ITER_ALL (default) > + * Iterate over resources of every task. > + * > + * BPF_TASK_ITER_TID > + * Iterate over resources of a task/tid. > + * > + * BPF_TASK_ITER_TGID > + * Iterate over reosurces of evevry task of a process / task group. typos: resources, every > + */ > +enum bpf_iter_task_type { > + BPF_TASK_ITER_ALL = 0, > + BPF_TASK_ITER_TID, > + BPF_TASK_ITER_TGID, > +}; > + [...] > rcu_read_lock(); > retry: > - pid = find_ge_pid(*tid, ns); > + pid = find_ge_pid(*tid, common->ns); > if (pid) { > - *tid = pid_nr_ns(pid, ns); > + *tid = pid_nr_ns(pid, common->ns); > task = get_pid_task(pid, PIDTYPE_PID); > if (!task) { > ++*tid; > goto retry; > - } else if (skip_if_dup_files && !thread_group_leader(task) && > - task->files == task->group_leader->files) { > + } else if ((skip_if_dup_files && !thread_group_leader(task) && > + task->files == task->group_leader->files) || > + (common->type == BPF_TASK_ITER_TGID && > + __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->pid)) { it gets super hard to follow this logic, would a simple helper function to calculate this condition (and maybe some comments to explain the logic behind these checks?) make it a bit more readable? > put_task_struct(task); > task = NULL; > ++*tid; > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos) > struct bpf_iter_seq_task_info *info = seq->private; > struct task_struct *task; > > - task = task_seq_get_next(info->common.ns, &info->tid, false); > + task = task_seq_get_next(&info->common, &info->tid, false); > if (!task) > return NULL; > > @@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos) > ++*pos; > ++info->tid; > put_task_struct((struct task_struct *)v); > - task = task_seq_get_next(info->common.ns, &info->tid, false); > + task = task_seq_get_next(&info->common, &info->tid, false); > if (!task) > return NULL; > > @@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file *seq, void *v) > put_task_struct((struct task_struct *)v); > } > > +static int bpf_iter_attach_task(struct bpf_prog *prog, > + union bpf_iter_link_info *linfo, > + struct bpf_iter_aux_info *aux) > +{ > + unsigned int flags; > + struct pid_namespace *ns; > + struct pid *pid; > + pid_t tgid; it seems it would be simpler to first check that at most one of tid/pid/pid_fd is set instead of making sure that aux->task.type wasn't already set. How about if (!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd > 1) return -EINVAL; ? > + > + aux->task.type = BPF_TASK_ITER_ALL; > + if (linfo->task.tid != 0) { > + aux->task.type = BPF_TASK_ITER_TID; > + aux->task.pid = linfo->task.tid; > + } > + if (linfo->task.pid != 0) { > + if (aux->task.type != BPF_TASK_ITER_ALL) > + return -EINVAL; > + > + aux->task.type = BPF_TASK_ITER_TGID; > + aux->task.pid = linfo->task.pid; > + } > + if (linfo->task.pid_fd != 0) { > + if (aux->task.type != BPF_TASK_ITER_ALL) > + return -EINVAL; > + > + aux->task.type = BPF_TASK_ITER_TGID; > + ns = task_active_pid_ns(current); > + if (IS_ERR(ns)) > + return PTR_ERR(ns); > + > + pid = pidfd_get_pid(linfo->task.pid_fd, &flags); > + if (IS_ERR(pid)) > + return PTR_ERR(pid); > + > + tgid = pid_nr_ns(pid, ns); > + aux->task.pid = tgid; > + put_pid(pid); > + } > + > + return 0; > +} > + [...]
On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote: > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > > > Allow creating an iterator that loops through resources of one > > task/thread. > > > > People could only create iterators to loop through all resources of > > files, vma, and tasks in the system, even though they were > > interested > > in only the resources of a specific task or process. Passing the > > additional parameters, people can now create an iterator to go > > through all resources or only the resources of a task. > > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> > > --- > > include/linux/bpf.h | 25 +++++++ > > include/uapi/linux/bpf.h | 6 ++ > > kernel/bpf/task_iter.c | 116 ++++++++++++++++++++++++++--- > > ---- > > tools/include/uapi/linux/bpf.h | 6 ++ > > 4 files changed, 129 insertions(+), 24 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 39bd36359c1e..59712dd917d8 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user > > *pathname, int flags); > > extern int bpf_iter_ ## target(args); \ > > int __init bpf_iter_ ## target(args) { return 0; } > > > > +/* > > + * The task type of iterators. > > + * > > + * For BPF task iterators, they can be parameterized with various > > + * parameters to visit only some of tasks. > > + * > > + * BPF_TASK_ITER_ALL (default) > > + * Iterate over resources of every task. > > + * > > + * BPF_TASK_ITER_TID > > + * Iterate over resources of a task/tid. > > + * > > + * BPF_TASK_ITER_TGID > > + * Iterate over reosurces of evevry task of a process / task > > group. > > typos: resources, every > > > + */ > > +enum bpf_iter_task_type { > > + BPF_TASK_ITER_ALL = 0, > > + BPF_TASK_ITER_TID, > > + BPF_TASK_ITER_TGID, > > +}; > > + > > [...] > > > rcu_read_lock(); > > retry: > > - pid = find_ge_pid(*tid, ns); > > + pid = find_ge_pid(*tid, common->ns); > > if (pid) { > > - *tid = pid_nr_ns(pid, ns); > > + *tid = pid_nr_ns(pid, common->ns); > > task = get_pid_task(pid, PIDTYPE_PID); > > if (!task) { > > ++*tid; > > goto retry; > > - } else if (skip_if_dup_files && > > !thread_group_leader(task) && > > - task->files == task->group_leader- > > >files) { > > + } else if ((skip_if_dup_files && > > !thread_group_leader(task) && > > + task->files == task->group_leader- > > >files) || > > + (common->type == BPF_TASK_ITER_TGID && > > + __task_pid_nr_ns(task, PIDTYPE_TGID, > > common->ns) != common->pid)) { > > it gets super hard to follow this logic, would a simple helper > function to calculate this condition (and maybe some comments to > explain the logic behind these checks?) make it a bit more readable? !matched_task(task, common, skip_if_dup_file)? bool matched_task(struct task_struct *task, struct bpf_iter_seq_task_common *common, bool skip_if_dup_file) { /* Should not have the same 'files' if skip_if_dup_file is true */ bool diff_files_if = !skip_if_dup_file || (thread_group_leader(task) && task->file != task->gorup_leader->fies); /* Should have the given tgid if the type is BPF_TASK_ITER_TGI */ bool have_tgid_if = common->type != BPF_TASK_ITER_TGID || __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) == common->pid; return diff_files_if && have_tgid_if; } How about this? > > > put_task_struct(task); > > task = NULL; > > ++*tid; > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, > > loff_t *pos) > > struct bpf_iter_seq_task_info *info = seq->private; > > struct task_struct *task; > > > > - task = task_seq_get_next(info->common.ns, &info->tid, > > false); > > + task = task_seq_get_next(&info->common, &info->tid, false); > > if (!task) > > return NULL; > > > > @@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq, > > void *v, loff_t *pos) > > ++*pos; > > ++info->tid; > > put_task_struct((struct task_struct *)v); > > - task = task_seq_get_next(info->common.ns, &info->tid, > > false); > > + task = task_seq_get_next(&info->common, &info->tid, false); > > if (!task) > > return NULL; > > > > @@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file > > *seq, void *v) > > put_task_struct((struct task_struct *)v); > > } > > > > +static int bpf_iter_attach_task(struct bpf_prog *prog, > > + union bpf_iter_link_info *linfo, > > + struct bpf_iter_aux_info *aux) > > +{ > > + unsigned int flags; > > + struct pid_namespace *ns; > > + struct pid *pid; > > + pid_t tgid; > > it seems it would be simpler to first check that at most one of > tid/pid/pid_fd is set instead of making sure that aux->task.type > wasn't already set. > > How about > > if (!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd > 1) > return -EINVAL; > > ? Agree > > > + > > + aux->task.type = BPF_TASK_ITER_ALL; > > + if (linfo->task.tid != 0) { > > + aux->task.type = BPF_TASK_ITER_TID; > > + aux->task.pid = linfo->task.tid; > > + } > > + if (linfo->task.pid != 0) { > > + if (aux->task.type != BPF_TASK_ITER_ALL) > > + return -EINVAL; > > + > > + aux->task.type = BPF_TASK_ITER_TGID; > > + aux->task.pid = linfo->task.pid; > > + } > > + if (linfo->task.pid_fd != 0) { > > + if (aux->task.type != BPF_TASK_ITER_ALL) > > + return -EINVAL; > > + > > + aux->task.type = BPF_TASK_ITER_TGID; > > + ns = task_active_pid_ns(current); > > + if (IS_ERR(ns)) > > + return PTR_ERR(ns); > > + > > + pid = pidfd_get_pid(linfo->task.pid_fd, &flags); > > + if (IS_ERR(pid)) > > + return PTR_ERR(pid); > > + > > + tgid = pid_nr_ns(pid, ns); > > + aux->task.pid = tgid; > > + put_pid(pid); > > + } > > + > > + return 0; > > +} > > + > > [...]
On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote: > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > > > > > Allow creating an iterator that loops through resources of one > > > task/thread. > > > > > > People could only create iterators to loop through all resources of > > > files, vma, and tasks in the system, even though they were > > > interested > > > in only the resources of a specific task or process. Passing the > > > additional parameters, people can now create an iterator to go > > > through all resources or only the resources of a task. > > > > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> > > > --- > > > include/linux/bpf.h | 25 +++++++ > > > include/uapi/linux/bpf.h | 6 ++ > > > kernel/bpf/task_iter.c | 116 ++++++++++++++++++++++++++--- > > > ---- > > > tools/include/uapi/linux/bpf.h | 6 ++ > > > 4 files changed, 129 insertions(+), 24 deletions(-) > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > index 39bd36359c1e..59712dd917d8 100644 > > > --- a/include/linux/bpf.h > > > +++ b/include/linux/bpf.h > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user > > > *pathname, int flags); > > > extern int bpf_iter_ ## target(args); \ > > > int __init bpf_iter_ ## target(args) { return 0; } > > > > > > +/* > > > + * The task type of iterators. > > > + * > > > + * For BPF task iterators, they can be parameterized with various > > > + * parameters to visit only some of tasks. > > > + * > > > + * BPF_TASK_ITER_ALL (default) > > > + * Iterate over resources of every task. > > > + * > > > + * BPF_TASK_ITER_TID > > > + * Iterate over resources of a task/tid. > > > + * > > > + * BPF_TASK_ITER_TGID > > > + * Iterate over reosurces of evevry task of a process / task > > > group. > > > > typos: resources, every > > > > > + */ > > > +enum bpf_iter_task_type { > > > + BPF_TASK_ITER_ALL = 0, > > > + BPF_TASK_ITER_TID, > > > + BPF_TASK_ITER_TGID, > > > +}; > > > + > > > > [...] > > > > > rcu_read_lock(); > > > retry: > > > - pid = find_ge_pid(*tid, ns); > > > + pid = find_ge_pid(*tid, common->ns); > > > if (pid) { > > > - *tid = pid_nr_ns(pid, ns); > > > + *tid = pid_nr_ns(pid, common->ns); > > > task = get_pid_task(pid, PIDTYPE_PID); > > > if (!task) { > > > ++*tid; > > > goto retry; > > > - } else if (skip_if_dup_files && > > > !thread_group_leader(task) && > > > - task->files == task->group_leader- > > > >files) { > > > + } else if ((skip_if_dup_files && > > > !thread_group_leader(task) && > > > + task->files == task->group_leader- > > > >files) || > > > + (common->type == BPF_TASK_ITER_TGID && > > > + __task_pid_nr_ns(task, PIDTYPE_TGID, > > > common->ns) != common->pid)) { > > > > it gets super hard to follow this logic, would a simple helper > > function to calculate this condition (and maybe some comments to > > explain the logic behind these checks?) make it a bit more readable? > > !matched_task(task, common, skip_if_dup_file)? > > bool matched_task(struct task_struct *task, > struct bpf_iter_seq_task_common *common, > bool skip_if_dup_file) { > /* Should not have the same 'files' if skip_if_dup_file is true > */ > bool diff_files_if = > !skip_if_dup_file || > (thread_group_leader(task) && > task->file != task->gorup_leader->fies); > /* Should have the given tgid if the type is BPF_TASK_ITER_TGI > */ > bool have_tgid_if = > common->type != BPF_TASK_ITER_TGID || > __task_pid_nr_ns(task, PIDTYPE_TGID, > common->ns) == common->pid; > return diff_files_if && have_tgid_if; > } > > How about this? > Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted to suggest having a separate helper just for your TGID check and call it something more meaningful like "task_belongs_to_tgid". Can't come up with a good name for existing dup_file check, so I'd probably keep it as is. But also seems like there is same_thread_group() helper in include/linux/sched/signal.h, so let's look if we can use it, it seems like it's just comparing signal pointers (probably quite faster than what you are doing right now). But looking at this some more made me realize that even if we specify pid (tgid in kernel terms) we are still going to iterate through all the tasks, essentially. Is that right? So TGID mode isn't great for speeding up, we should point out to users that if they want to iterate files of the process, they probably want to use TID mode and set tid to pid to use the early termination condition in TID. It wasn't obvious to me until I re-read this patch like 3 times and wrote three different replies here :) But then I also went looking at what procfs doing for /proc/<pid/task/* dirs. It does seem like there are faster ways to iterate all threads of a process. See next_tid() which uses next_thread(), etc. Can you please check those and see if we can have faster in-process iteration? > > > > > put_task_struct(task); > > > task = NULL; > > > ++*tid; > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, > > > loff_t *pos) > > > struct bpf_iter_seq_task_info *info = seq->private; > > > struct task_struct *task; > > > > > > - task = task_seq_get_next(info->common.ns, &info->tid, > > > false); > > > + task = task_seq_get_next(&info->common, &info->tid, false); > > > if (!task) > > > return NULL; > > > [...]
On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote: > On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > > > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote: > > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> > > > wrote: > > > > > > > > Allow creating an iterator that loops through resources of one > > > > task/thread. > > > > > > > > People could only create iterators to loop through all > > > > resources of > > > > files, vma, and tasks in the system, even though they were > > > > interested > > > > in only the resources of a specific task or process. Passing > > > > the > > > > additional parameters, people can now create an iterator to go > > > > through all resources or only the resources of a task. > > > > > > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> > > > > --- > > > > include/linux/bpf.h | 25 +++++++ > > > > include/uapi/linux/bpf.h | 6 ++ > > > > kernel/bpf/task_iter.c | 116 > > > > ++++++++++++++++++++++++++--- > > > > ---- > > > > tools/include/uapi/linux/bpf.h | 6 ++ > > > > 4 files changed, 129 insertions(+), 24 deletions(-) > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > index 39bd36359c1e..59712dd917d8 100644 > > > > --- a/include/linux/bpf.h > > > > +++ b/include/linux/bpf.h > > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user > > > > *pathname, int flags); > > > > extern int bpf_iter_ ## target(args); > > > > \ > > > > int __init bpf_iter_ ## target(args) { return 0; } > > > > > > > > +/* > > > > + * The task type of iterators. > > > > + * > > > > + * For BPF task iterators, they can be parameterized with > > > > various > > > > + * parameters to visit only some of tasks. > > > > + * > > > > + * BPF_TASK_ITER_ALL (default) > > > > + * Iterate over resources of every task. > > > > + * > > > > + * BPF_TASK_ITER_TID > > > > + * Iterate over resources of a task/tid. > > > > + * > > > > + * BPF_TASK_ITER_TGID > > > > + * Iterate over reosurces of evevry task of a process / > > > > task > > > > group. > > > > > > typos: resources, every > > > > > > > + */ > > > > +enum bpf_iter_task_type { > > > > + BPF_TASK_ITER_ALL = 0, > > > > + BPF_TASK_ITER_TID, > > > > + BPF_TASK_ITER_TGID, > > > > +}; > > > > + > > > > > > [...] > > > > > > > rcu_read_lock(); > > > > retry: > > > > - pid = find_ge_pid(*tid, ns); > > > > + pid = find_ge_pid(*tid, common->ns); > > > > if (pid) { > > > > - *tid = pid_nr_ns(pid, ns); > > > > + *tid = pid_nr_ns(pid, common->ns); > > > > task = get_pid_task(pid, PIDTYPE_PID); > > > > if (!task) { > > > > ++*tid; > > > > goto retry; > > > > - } else if (skip_if_dup_files && > > > > !thread_group_leader(task) && > > > > - task->files == task->group_leader- > > > > > files) { > > > > + } else if ((skip_if_dup_files && > > > > !thread_group_leader(task) && > > > > + task->files == task->group_leader- > > > > > files) || > > > > + (common->type == BPF_TASK_ITER_TGID > > > > && > > > > + __task_pid_nr_ns(task, > > > > PIDTYPE_TGID, > > > > common->ns) != common->pid)) { > > > > > > it gets super hard to follow this logic, would a simple helper > > > function to calculate this condition (and maybe some comments to > > > explain the logic behind these checks?) make it a bit more > > > readable? > > > > !matched_task(task, common, skip_if_dup_file)? > > > > bool matched_task(struct task_struct *task, > > struct bpf_iter_seq_task_common *common, > > bool skip_if_dup_file) { > > /* Should not have the same 'files' if skip_if_dup_file is > > true > > */ > > bool diff_files_if = > > !skip_if_dup_file || > > (thread_group_leader(task) && > > task->file != task->gorup_leader->fies); > > /* Should have the given tgid if the type is > > BPF_TASK_ITER_TGI > > */ > > bool have_tgid_if = > > common->type != BPF_TASK_ITER_TGID || > > __task_pid_nr_ns(task, PIDTYPE_TGID, > > common->ns) == common->pid; > > return diff_files_if && have_tgid_if; > > } > > > > How about this? > > > > Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted > to suggest having a separate helper just for your TGID check and call > it something more meaningful like "task_belongs_to_tgid". Can't come > up with a good name for existing dup_file check, so I'd probably keep > it as is. But also seems like there is same_thread_group() helper in > include/linux/sched/signal.h, so let's look if we can use it, it > seems > like it's just comparing signal pointers (probably quite faster than > what you are doing right now). > > But looking at this some more made me realize that even if we specify > pid (tgid in kernel terms) we are still going to iterate through all > the tasks, essentially. Is that right? So TGID mode isn't great for > speeding up, we should point out to users that if they want to > iterate > files of the process, they probably want to use TID mode and set tid > to pid to use the early termination condition in TID. > > It wasn't obvious to me until I re-read this patch like 3 times and > wrote three different replies here :) > > But then I also went looking at what procfs doing for > /proc/<pid/task/* dirs. It does seem like there are faster ways to > iterate all threads of a process. See next_tid() which uses > next_thread(), etc. Can you please check those and see if we can have > faster in-process iteration? > I haven't notice this message until now. It looks like very promise. I will add it to next version. > > > > > > > > put_task_struct(task); > > > > task = NULL; > > > > ++*tid; > > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file > > > > *seq, > > > > loff_t *pos) > > > > struct bpf_iter_seq_task_info *info = seq->private; > > > > struct task_struct *task; > > > > > > > > - task = task_seq_get_next(info->common.ns, &info->tid, > > > > false); > > > > + task = task_seq_get_next(&info->common, &info->tid, > > > > false); > > > > if (!task) > > > > return NULL; > > > > > > [...]
On Fri, Aug 26, 2022 at 7:49 AM Kui-Feng Lee <kuifeng@fb.com> wrote: > > On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote: > > On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote: > > > > > > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote: > > > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> > > > > wrote: > > > > > > > > > > Allow creating an iterator that loops through resources of one > > > > > task/thread. > > > > > > > > > > People could only create iterators to loop through all > > > > > resources of > > > > > files, vma, and tasks in the system, even though they were > > > > > interested > > > > > in only the resources of a specific task or process. Passing > > > > > the > > > > > additional parameters, people can now create an iterator to go > > > > > through all resources or only the resources of a task. > > > > > > > > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> > > > > > --- > > > > > include/linux/bpf.h | 25 +++++++ > > > > > include/uapi/linux/bpf.h | 6 ++ > > > > > kernel/bpf/task_iter.c | 116 > > > > > ++++++++++++++++++++++++++--- > > > > > ---- > > > > > tools/include/uapi/linux/bpf.h | 6 ++ > > > > > 4 files changed, 129 insertions(+), 24 deletions(-) > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > index 39bd36359c1e..59712dd917d8 100644 > > > > > --- a/include/linux/bpf.h > > > > > +++ b/include/linux/bpf.h > > > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user > > > > > *pathname, int flags); > > > > > extern int bpf_iter_ ## target(args); > > > > > \ > > > > > int __init bpf_iter_ ## target(args) { return 0; } > > > > > > > > > > +/* > > > > > + * The task type of iterators. > > > > > + * > > > > > + * For BPF task iterators, they can be parameterized with > > > > > various > > > > > + * parameters to visit only some of tasks. > > > > > + * > > > > > + * BPF_TASK_ITER_ALL (default) > > > > > + * Iterate over resources of every task. > > > > > + * > > > > > + * BPF_TASK_ITER_TID > > > > > + * Iterate over resources of a task/tid. > > > > > + * > > > > > + * BPF_TASK_ITER_TGID > > > > > + * Iterate over reosurces of evevry task of a process / > > > > > task > > > > > group. > > > > > > > > typos: resources, every > > > > > > > > > + */ > > > > > +enum bpf_iter_task_type { > > > > > + BPF_TASK_ITER_ALL = 0, > > > > > + BPF_TASK_ITER_TID, > > > > > + BPF_TASK_ITER_TGID, > > > > > +}; > > > > > + > > > > > > > > [...] > > > > > > > > > rcu_read_lock(); > > > > > retry: > > > > > - pid = find_ge_pid(*tid, ns); > > > > > + pid = find_ge_pid(*tid, common->ns); > > > > > if (pid) { > > > > > - *tid = pid_nr_ns(pid, ns); > > > > > + *tid = pid_nr_ns(pid, common->ns); > > > > > task = get_pid_task(pid, PIDTYPE_PID); > > > > > if (!task) { > > > > > ++*tid; > > > > > goto retry; > > > > > - } else if (skip_if_dup_files && > > > > > !thread_group_leader(task) && > > > > > - task->files == task->group_leader- > > > > > > files) { > > > > > + } else if ((skip_if_dup_files && > > > > > !thread_group_leader(task) && > > > > > + task->files == task->group_leader- > > > > > > files) || > > > > > + (common->type == BPF_TASK_ITER_TGID > > > > > && > > > > > + __task_pid_nr_ns(task, > > > > > PIDTYPE_TGID, > > > > > common->ns) != common->pid)) { > > > > > > > > it gets super hard to follow this logic, would a simple helper > > > > function to calculate this condition (and maybe some comments to > > > > explain the logic behind these checks?) make it a bit more > > > > readable? > > > > > > !matched_task(task, common, skip_if_dup_file)? > > > > > > bool matched_task(struct task_struct *task, > > > struct bpf_iter_seq_task_common *common, > > > bool skip_if_dup_file) { > > > /* Should not have the same 'files' if skip_if_dup_file is > > > true > > > */ > > > bool diff_files_if = > > > !skip_if_dup_file || > > > (thread_group_leader(task) && > > > task->file != task->gorup_leader->fies); > > > /* Should have the given tgid if the type is > > > BPF_TASK_ITER_TGI > > > */ > > > bool have_tgid_if = > > > common->type != BPF_TASK_ITER_TGID || > > > __task_pid_nr_ns(task, PIDTYPE_TGID, > > > common->ns) == common->pid; > > > return diff_files_if && have_tgid_if; > > > } > > > > > > How about this? > > > > > > > Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted > > to suggest having a separate helper just for your TGID check and call > > it something more meaningful like "task_belongs_to_tgid". Can't come > > up with a good name for existing dup_file check, so I'd probably keep > > it as is. But also seems like there is same_thread_group() helper in > > include/linux/sched/signal.h, so let's look if we can use it, it > > seems > > like it's just comparing signal pointers (probably quite faster than > > what you are doing right now). > > > > But looking at this some more made me realize that even if we specify > > pid (tgid in kernel terms) we are still going to iterate through all > > the tasks, essentially. Is that right? So TGID mode isn't great for > > speeding up, we should point out to users that if they want to > > iterate > > files of the process, they probably want to use TID mode and set tid > > to pid to use the early termination condition in TID. > > > > It wasn't obvious to me until I re-read this patch like 3 times and > > wrote three different replies here :) > > > > But then I also went looking at what procfs doing for > > /proc/<pid/task/* dirs. It does seem like there are faster ways to > > iterate all threads of a process. See next_tid() which uses > > next_thread(), etc. Can you please check those and see if we can have > > faster in-process iteration? > > > > I haven't notice this message until now. > It looks like very promise. I will add it to next version. No worries. Thanks! Hopefully there is a way to make efficient per-process file and vma iteration. > > > > > > > > > > > > put_task_struct(task); > > > > > task = NULL; > > > > > ++*tid; > > > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file > > > > > *seq, > > > > > loff_t *pos) > > > > > struct bpf_iter_seq_task_info *info = seq->private; > > > > > struct task_struct *task; > > > > > > > > > > - task = task_seq_get_next(info->common.ns, &info->tid, > > > > > false); > > > > > + task = task_seq_get_next(&info->common, &info->tid, > > > > > false); > > > > > if (!task) > > > > > return NULL; > > > > > > > > > [...] >
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 39bd36359c1e..59712dd917d8 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user *pathname, int flags); extern int bpf_iter_ ## target(args); \ int __init bpf_iter_ ## target(args) { return 0; } +/* + * The task type of iterators. + * + * For BPF task iterators, they can be parameterized with various + * parameters to visit only some of tasks. + * + * BPF_TASK_ITER_ALL (default) + * Iterate over resources of every task. + * + * BPF_TASK_ITER_TID + * Iterate over resources of a task/tid. + * + * BPF_TASK_ITER_TGID + * Iterate over reosurces of evevry task of a process / task group. + */ +enum bpf_iter_task_type { + BPF_TASK_ITER_ALL = 0, + BPF_TASK_ITER_TID, + BPF_TASK_ITER_TGID, +}; + struct bpf_iter_aux_info { struct bpf_map *map; + struct { + enum bpf_iter_task_type type; + u32 pid; + } task; }; typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 934a2a8beb87..778fbf11aa00 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -91,6 +91,12 @@ union bpf_iter_link_info { struct { __u32 map_fd; } map; + /* Parameters of task iterators. */ + struct { + __u32 tid; + __u32 pid; + __u32 pid_fd; + } task; }; /* BPF syscall commands, see bpf(2) man-page for more details. */ diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 8c921799def4..2f5fc6602917 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -12,6 +12,8 @@ struct bpf_iter_seq_task_common { struct pid_namespace *ns; + enum bpf_iter_task_type type; + u32 pid; }; struct bpf_iter_seq_task_info { @@ -22,24 +24,39 @@ struct bpf_iter_seq_task_info { u32 tid; }; -static struct task_struct *task_seq_get_next(struct pid_namespace *ns, +static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common, u32 *tid, bool skip_if_dup_files) { struct task_struct *task = NULL; struct pid *pid; + if (common->type == BPF_TASK_ITER_TID) { + if (*tid && *tid != common->pid) + return NULL; + rcu_read_lock(); + pid = find_pid_ns(common->pid, common->ns); + if (pid) { + task = get_pid_task(pid, PIDTYPE_PID); + *tid = common->pid; + } + rcu_read_unlock(); + return task; + } + rcu_read_lock(); retry: - pid = find_ge_pid(*tid, ns); + pid = find_ge_pid(*tid, common->ns); if (pid) { - *tid = pid_nr_ns(pid, ns); + *tid = pid_nr_ns(pid, common->ns); task = get_pid_task(pid, PIDTYPE_PID); if (!task) { ++*tid; goto retry; - } else if (skip_if_dup_files && !thread_group_leader(task) && - task->files == task->group_leader->files) { + } else if ((skip_if_dup_files && !thread_group_leader(task) && + task->files == task->group_leader->files) || + (common->type == BPF_TASK_ITER_TGID && + __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->pid)) { put_task_struct(task); task = NULL; ++*tid; @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos) struct bpf_iter_seq_task_info *info = seq->private; struct task_struct *task; - task = task_seq_get_next(info->common.ns, &info->tid, false); + task = task_seq_get_next(&info->common, &info->tid, false); if (!task) return NULL; @@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos) ++*pos; ++info->tid; put_task_struct((struct task_struct *)v); - task = task_seq_get_next(info->common.ns, &info->tid, false); + task = task_seq_get_next(&info->common, &info->tid, false); if (!task) return NULL; @@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file *seq, void *v) put_task_struct((struct task_struct *)v); } +static int bpf_iter_attach_task(struct bpf_prog *prog, + union bpf_iter_link_info *linfo, + struct bpf_iter_aux_info *aux) +{ + unsigned int flags; + struct pid_namespace *ns; + struct pid *pid; + pid_t tgid; + + aux->task.type = BPF_TASK_ITER_ALL; + if (linfo->task.tid != 0) { + aux->task.type = BPF_TASK_ITER_TID; + aux->task.pid = linfo->task.tid; + } + if (linfo->task.pid != 0) { + if (aux->task.type != BPF_TASK_ITER_ALL) + return -EINVAL; + + aux->task.type = BPF_TASK_ITER_TGID; + aux->task.pid = linfo->task.pid; + } + if (linfo->task.pid_fd != 0) { + if (aux->task.type != BPF_TASK_ITER_ALL) + return -EINVAL; + + aux->task.type = BPF_TASK_ITER_TGID; + ns = task_active_pid_ns(current); + if (IS_ERR(ns)) + return PTR_ERR(ns); + + pid = pidfd_get_pid(linfo->task.pid_fd, &flags); + if (IS_ERR(pid)) + return PTR_ERR(pid); + + tgid = pid_nr_ns(pid, ns); + aux->task.pid = tgid; + put_pid(pid); + } + + return 0; +} + static const struct seq_operations task_seq_ops = { .start = task_seq_start, .next = task_seq_next, @@ -137,8 +196,7 @@ struct bpf_iter_seq_task_file_info { static struct file * task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) { - struct pid_namespace *ns = info->common.ns; - u32 curr_tid = info->tid; + u32 saved_tid = info->tid; struct task_struct *curr_task; unsigned int curr_fd = info->fd; @@ -151,21 +209,18 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) curr_task = info->task; curr_fd = info->fd; } else { - curr_task = task_seq_get_next(ns, &curr_tid, true); + curr_task = task_seq_get_next(&info->common, &info->tid, true); if (!curr_task) { info->task = NULL; - info->tid = curr_tid; return NULL; } - /* set info->task and info->tid */ + /* set info->task */ info->task = curr_task; - if (curr_tid == info->tid) { + if (saved_tid == info->tid) curr_fd = info->fd; - } else { - info->tid = curr_tid; + else curr_fd = 0; - } } rcu_read_lock(); @@ -186,9 +241,15 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) /* the current task is done, go to the next task */ rcu_read_unlock(); put_task_struct(curr_task); + + if (info->common.type == BPF_TASK_ITER_TID) { + info->task = NULL; + return NULL; + } + info->task = NULL; info->fd = 0; - curr_tid = ++(info->tid); + saved_tid = ++(info->tid); goto again; } @@ -269,6 +330,9 @@ static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux) struct bpf_iter_seq_task_common *common = priv_data; common->ns = get_pid_ns(task_active_pid_ns(current)); + common->type = aux->task.type; + common->pid = aux->task.pid; + return 0; } @@ -307,11 +371,10 @@ enum bpf_task_vma_iter_find_op { static struct vm_area_struct * task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info) { - struct pid_namespace *ns = info->common.ns; enum bpf_task_vma_iter_find_op op; struct vm_area_struct *curr_vma; struct task_struct *curr_task; - u32 curr_tid = info->tid; + u32 saved_tid = info->tid; /* If this function returns a non-NULL vma, it holds a reference to * the task_struct, and holds read lock on vma->mm->mmap_lock. @@ -371,14 +434,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info) } } else { again: - curr_task = task_seq_get_next(ns, &curr_tid, true); + curr_task = task_seq_get_next(&info->common, &info->tid, true); if (!curr_task) { - info->tid = curr_tid + 1; + info->tid++; goto finish; } - if (curr_tid != info->tid) { - info->tid = curr_tid; + if (saved_tid != info->tid) { /* new task, process the first vma */ op = task_vma_iter_first_vma; } else { @@ -430,9 +492,12 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info) return curr_vma; next_task: + if (info->common.type == BPF_TASK_ITER_TID) + goto finish; + put_task_struct(curr_task); info->task = NULL; - curr_tid++; + info->tid++; goto again; finish: @@ -533,6 +598,7 @@ static const struct bpf_iter_seq_info task_seq_info = { static struct bpf_iter_reg task_reg_info = { .target = "task", + .attach_target = bpf_iter_attach_task, .feature = BPF_ITER_RESCHED, .ctx_arg_info_size = 1, .ctx_arg_info = { @@ -551,6 +617,7 @@ static const struct bpf_iter_seq_info task_file_seq_info = { static struct bpf_iter_reg task_file_reg_info = { .target = "task_file", + .attach_target = bpf_iter_attach_task, .feature = BPF_ITER_RESCHED, .ctx_arg_info_size = 2, .ctx_arg_info = { @@ -571,6 +638,7 @@ static const struct bpf_iter_seq_info task_vma_seq_info = { static struct bpf_iter_reg task_vma_reg_info = { .target = "task_vma", + .attach_target = bpf_iter_attach_task, .feature = BPF_ITER_RESCHED, .ctx_arg_info_size = 2, .ctx_arg_info = { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1d6085e15fc8..7a0268749a48 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -91,6 +91,12 @@ union bpf_iter_link_info { struct { __u32 map_fd; } map; + /* Parameters of task iterators. */ + struct { + __u32 tid; + __u32 pid; + __u32 pid_fd; + } task; }; /* BPF syscall commands, see bpf(2) man-page for more details. */
Allow creating an iterator that loops through resources of one task/thread. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> --- include/linux/bpf.h | 25 +++++++ include/uapi/linux/bpf.h | 6 ++ kernel/bpf/task_iter.c | 116 ++++++++++++++++++++++++++------- tools/include/uapi/linux/bpf.h | 6 ++ 4 files changed, 129 insertions(+), 24 deletions(-)