Message ID | 20220721111430.416305-1-lee@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid() | expand |
On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: > The documentation for find_pid() clearly states: > > "Must be called with the tasklist_lock or rcu_read_lock() held." > > Presently we do neither. > > In an ideal world we would wrap the in-lined call to find_vpid() along > with get_pid_task() in the suggested rcu_read_lock() and have done. > However, looking at get_pid_task()'s internals, it already does that > independently, so this would lead to deadlock. hm, we can have nested rcu_read_lock calls, right? jirka > > Instead, we'll use find_get_pid() which searches for the vpid, then > takes a reference to it preventing early free, all within the safety > of rcu_read_lock(). Once we have our reference we can safely make use > of it up until the point it is put. > > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: Andrii Nakryiko <andrii@kernel.org> > Cc: Martin KaFai Lau <martin.lau@linux.dev> > Cc: Song Liu <song@kernel.org> > Cc: Yonghong Song <yhs@fb.com> > Cc: KP Singh <kpsingh@kernel.org> > Cc: Stanislav Fomichev <sdf@google.com> > Cc: Hao Luo <haoluo@google.com> > Cc: Jiri Olsa <jolsa@kernel.org> > Cc: bpf@vger.kernel.org > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") > Signed-off-by: Lee Jones <lee@kernel.org> > --- > kernel/bpf/syscall.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 83c7136c5788d..c20cff30581c4 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > const struct perf_event *event; > struct task_struct *task; > struct file *file; > + struct pid *ppid; > int err; > > if (CHECK_ATTR(BPF_TASK_FD_QUERY)) > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > if (attr->task_fd_query.flags != 0) > return -EINVAL; > > - task = get_pid_task(find_vpid(pid), PIDTYPE_PID); > + ppid = find_get_pid(pid); > + task = get_pid_task(ppid, PIDTYPE_PID); > + put_pid(ppid); > if (!task) > return -ENOENT; > > -- > 2.37.0.170.g444d1eabd0-goog >
On Thu, 21 Jul 2022, Jiri Olsa wrote: > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: > > The documentation for find_pid() clearly states: > > > > "Must be called with the tasklist_lock or rcu_read_lock() held." > > > > Presently we do neither. > > > > In an ideal world we would wrap the in-lined call to find_vpid() along > > with get_pid_task() in the suggested rcu_read_lock() and have done. > > However, looking at get_pid_task()'s internals, it already does that > > independently, so this would lead to deadlock. > > hm, we can have nested rcu_read_lock calls, right? I assumed not, but that might be an oversight on my part. Would that be your preference? > > Instead, we'll use find_get_pid() which searches for the vpid, then > > takes a reference to it preventing early free, all within the safety > > of rcu_read_lock(). Once we have our reference we can safely make use > > of it up until the point it is put. > > > > Cc: Alexei Starovoitov <ast@kernel.org> > > Cc: Daniel Borkmann <daniel@iogearbox.net> > > Cc: John Fastabend <john.fastabend@gmail.com> > > Cc: Andrii Nakryiko <andrii@kernel.org> > > Cc: Martin KaFai Lau <martin.lau@linux.dev> > > Cc: Song Liu <song@kernel.org> > > Cc: Yonghong Song <yhs@fb.com> > > Cc: KP Singh <kpsingh@kernel.org> > > Cc: Stanislav Fomichev <sdf@google.com> > > Cc: Hao Luo <haoluo@google.com> > > Cc: Jiri Olsa <jolsa@kernel.org> > > Cc: bpf@vger.kernel.org > > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") > > Signed-off-by: Lee Jones <lee@kernel.org> > > --- > > kernel/bpf/syscall.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index 83c7136c5788d..c20cff30581c4 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > > const struct perf_event *event; > > struct task_struct *task; > > struct file *file; > > + struct pid *ppid; > > int err; > > > > if (CHECK_ATTR(BPF_TASK_FD_QUERY)) > > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > > if (attr->task_fd_query.flags != 0) > > return -EINVAL; > > > > - task = get_pid_task(find_vpid(pid), PIDTYPE_PID); > > + ppid = find_get_pid(pid); > > + task = get_pid_task(ppid, PIDTYPE_PID); > > + put_pid(ppid); > > if (!task) > > return -ENOENT; > >
On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote: > On Thu, 21 Jul 2022, Jiri Olsa wrote: > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: > > > The documentation for find_pid() clearly states: typo find_vpid > > > > > > "Must be called with the tasklist_lock or rcu_read_lock() held." > > > > > > Presently we do neither. just curious, did you see crash related to this or you just spot that > > > > > > In an ideal world we would wrap the in-lined call to find_vpid() along > > > with get_pid_task() in the suggested rcu_read_lock() and have done. > > > However, looking at get_pid_task()'s internals, it already does that > > > independently, so this would lead to deadlock. > > > > hm, we can have nested rcu_read_lock calls, right? > > I assumed not, but that might be an oversight on my part. > > Would that be your preference? seems simpler than calling get/put for ppid jirka > > > > Instead, we'll use find_get_pid() which searches for the vpid, then > > > takes a reference to it preventing early free, all within the safety > > > of rcu_read_lock(). Once we have our reference we can safely make use > > > of it up until the point it is put. > > > > > > Cc: Alexei Starovoitov <ast@kernel.org> > > > Cc: Daniel Borkmann <daniel@iogearbox.net> > > > Cc: John Fastabend <john.fastabend@gmail.com> > > > Cc: Andrii Nakryiko <andrii@kernel.org> > > > Cc: Martin KaFai Lau <martin.lau@linux.dev> > > > Cc: Song Liu <song@kernel.org> > > > Cc: Yonghong Song <yhs@fb.com> > > > Cc: KP Singh <kpsingh@kernel.org> > > > Cc: Stanislav Fomichev <sdf@google.com> > > > Cc: Hao Luo <haoluo@google.com> > > > Cc: Jiri Olsa <jolsa@kernel.org> > > > Cc: bpf@vger.kernel.org > > > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") > > > Signed-off-by: Lee Jones <lee@kernel.org> > > > --- > > > kernel/bpf/syscall.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > index 83c7136c5788d..c20cff30581c4 100644 > > > --- a/kernel/bpf/syscall.c > > > +++ b/kernel/bpf/syscall.c > > > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > > > const struct perf_event *event; > > > struct task_struct *task; > > > struct file *file; > > > + struct pid *ppid; > > > int err; > > > > > > if (CHECK_ATTR(BPF_TASK_FD_QUERY)) > > > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr, > > > if (attr->task_fd_query.flags != 0) > > > return -EINVAL; > > > > > > - task = get_pid_task(find_vpid(pid), PIDTYPE_PID); > > > + ppid = find_get_pid(pid); > > > + task = get_pid_task(ppid, PIDTYPE_PID); > > > + put_pid(ppid); > > > if (!task) > > > return -ENOENT; > > > > > -- > Lee Jones [李琼斯]
On 7/21/22 5:14 AM, Jiri Olsa wrote: > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote: >> On Thu, 21 Jul 2022, Jiri Olsa wrote: >> >>> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: >>>> The documentation for find_pid() clearly states: > > typo find_vpid > >>>> >>>> "Must be called with the tasklist_lock or rcu_read_lock() held." >>>> >>>> Presently we do neither. > > just curious, did you see crash related to this or you just spot that > >>>> >>>> In an ideal world we would wrap the in-lined call to find_vpid() along >>>> with get_pid_task() in the suggested rcu_read_lock() and have done. >>>> However, looking at get_pid_task()'s internals, it already does that >>>> independently, so this would lead to deadlock. >>> >>> hm, we can have nested rcu_read_lock calls, right? >> >> I assumed not, but that might be an oversight on my part. From kernel documentation, nested rcu_read_lock is allowed. https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html RCU's grace-period guarantee allows updaters to wait for the completion of all pre-existing RCU read-side critical sections. An RCU read-side critical section begins with the marker rcu_read_lock() and ends with the marker rcu_read_unlock(). These markers may be nested, and RCU treats a nested set as one big RCU read-side critical section. Production-quality implementations of rcu_read_lock() and rcu_read_unlock() are extremely lightweight, and in fact have exactly zero overhead in Linux kernels built for production use with CONFIG_PREEMPT=n. >> >> Would that be your preference? > > seems simpler than calling get/put for ppid The current implementation seems okay since we can hide rcu_read_lock() inside find_get_pid(). We can also avoid nested rcu_read_lock(), which is although allowed but not pretty. > > jirka > >> >>>> Instead, we'll use find_get_pid() which searches for the vpid, then >>>> takes a reference to it preventing early free, all within the safety >>>> of rcu_read_lock(). Once we have our reference we can safely make use >>>> of it up until the point it is put. >>>> >>>> Cc: Alexei Starovoitov <ast@kernel.org> >>>> Cc: Daniel Borkmann <daniel@iogearbox.net> >>>> Cc: John Fastabend <john.fastabend@gmail.com> >>>> Cc: Andrii Nakryiko <andrii@kernel.org> >>>> Cc: Martin KaFai Lau <martin.lau@linux.dev> >>>> Cc: Song Liu <song@kernel.org> >>>> Cc: Yonghong Song <yhs@fb.com> >>>> Cc: KP Singh <kpsingh@kernel.org> >>>> Cc: Stanislav Fomichev <sdf@google.com> >>>> Cc: Hao Luo <haoluo@google.com> >>>> Cc: Jiri Olsa <jolsa@kernel.org> >>>> Cc: bpf@vger.kernel.org >>>> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") >>>> Signed-off-by: Lee Jones <lee@kernel.org> >>>> --- >>>> kernel/bpf/syscall.c | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >>>> index 83c7136c5788d..c20cff30581c4 100644 >>>> --- a/kernel/bpf/syscall.c >>>> +++ b/kernel/bpf/syscall.c >>>> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, >>>> const struct perf_event *event; >>>> struct task_struct *task; >>>> struct file *file; >>>> + struct pid *ppid; >>>> int err; >>>> >>>> if (CHECK_ATTR(BPF_TASK_FD_QUERY)) >>>> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr, >>>> if (attr->task_fd_query.flags != 0) >>>> return -EINVAL; >>>> >>>> - task = get_pid_task(find_vpid(pid), PIDTYPE_PID); >>>> + ppid = find_get_pid(pid); >>>> + task = get_pid_task(ppid, PIDTYPE_PID); >>>> + put_pid(ppid); >>>> if (!task) >>>> return -ENOENT; >>>> >> >> -- >> Lee Jones [李琼斯]
On Thu, 21 Jul 2022, Yonghong Song wrote: > > > On 7/21/22 5:14 AM, Jiri Olsa wrote: > > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote: > > > On Thu, 21 Jul 2022, Jiri Olsa wrote: > > > > > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: > > > > > The documentation for find_pid() clearly states: > > > > typo find_vpid > > > > > > > > > > > > "Must be called with the tasklist_lock or rcu_read_lock() held." > > > > > > > > > > Presently we do neither. > > > > just curious, did you see crash related to this or you just spot that > > > > > > > > > > > > In an ideal world we would wrap the in-lined call to find_vpid() along > > > > > with get_pid_task() in the suggested rcu_read_lock() and have done. > > > > > However, looking at get_pid_task()'s internals, it already does that > > > > > independently, so this would lead to deadlock. > > > > > > > > hm, we can have nested rcu_read_lock calls, right? > > > > > > I assumed not, but that might be an oversight on my part. > > From kernel documentation, nested rcu_read_lock is allowed. > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html > > RCU's grace-period guarantee allows updaters to wait for the completion of > all pre-existing RCU read-side critical sections. An RCU read-side critical > section begins with the marker rcu_read_lock() and ends with the marker > rcu_read_unlock(). These markers may be nested, and RCU treats a nested set > as one big RCU read-side critical section. Production-quality > implementations of rcu_read_lock() and rcu_read_unlock() are extremely > lightweight, and in fact have exactly zero overhead in Linux kernels built > for production use with CONFIG_PREEMPT=n. > > > > > > > Would that be your preference? > > > > seems simpler than calling get/put for ppid > > The current implementation seems okay since we can hide > rcu_read_lock() inside find_get_pid(). We can also avoid > nested rcu_read_lock(), which is although allowed but > not pretty. Right, this was my thinking. Happy to go with whatever you guys decide though. Make the call and I'll rework, or not.
On Thu, Jul 21, 2022 at 09:58:00PM +0100, Lee Jones wrote: > On Thu, 21 Jul 2022, Yonghong Song wrote: > > > > > > > On 7/21/22 5:14 AM, Jiri Olsa wrote: > > > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote: > > > > On Thu, 21 Jul 2022, Jiri Olsa wrote: > > > > > > > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote: > > > > > > The documentation for find_pid() clearly states: > > > > > > typo find_vpid > > > > > > > > > > > > > > > "Must be called with the tasklist_lock or rcu_read_lock() held." > > > > > > > > > > > > Presently we do neither. > > > > > > just curious, did you see crash related to this or you just spot that > > > > > > > > > > > > > > > In an ideal world we would wrap the in-lined call to find_vpid() along > > > > > > with get_pid_task() in the suggested rcu_read_lock() and have done. > > > > > > However, looking at get_pid_task()'s internals, it already does that > > > > > > independently, so this would lead to deadlock. > > > > > > > > > > hm, we can have nested rcu_read_lock calls, right? > > > > > > > > I assumed not, but that might be an oversight on my part. > > > > From kernel documentation, nested rcu_read_lock is allowed. > > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html > > > > RCU's grace-period guarantee allows updaters to wait for the completion of > > all pre-existing RCU read-side critical sections. An RCU read-side critical > > section begins with the marker rcu_read_lock() and ends with the marker > > rcu_read_unlock(). These markers may be nested, and RCU treats a nested set > > as one big RCU read-side critical section. Production-quality > > implementations of rcu_read_lock() and rcu_read_unlock() are extremely > > lightweight, and in fact have exactly zero overhead in Linux kernels built > > for production use with CONFIG_PREEMPT=n. > > > > > > > > > > Would that be your preference? > > > > > > seems simpler than calling get/put for ppid > > > > The current implementation seems okay since we can hide > > rcu_read_lock() inside find_get_pid(). We can also avoid > > nested rcu_read_lock(), which is although allowed but > > not pretty. > > Right, this was my thinking. > > Happy to go with whatever you guys decide though. > > Make the call and I'll rework, or not. ok, I can live with the current version ;-) could you please resend with fixed changelog? thanks, jirka
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 83c7136c5788d..c20cff30581c4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, const struct perf_event *event; struct task_struct *task; struct file *file; + struct pid *ppid; int err; if (CHECK_ATTR(BPF_TASK_FD_QUERY)) @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr, if (attr->task_fd_query.flags != 0) return -EINVAL; - task = get_pid_task(find_vpid(pid), PIDTYPE_PID); + ppid = find_get_pid(pid); + task = get_pid_task(ppid, PIDTYPE_PID); + put_pid(ppid); if (!task) return -ENOENT;
The documentation for find_pid() clearly states: "Must be called with the tasklist_lock or rcu_read_lock() held." Presently we do neither. In an ideal world we would wrap the in-lined call to find_vpid() along with get_pid_task() in the suggested rcu_read_lock() and have done. However, looking at get_pid_task()'s internals, it already does that independently, so this would lead to deadlock. Instead, we'll use find_get_pid() which searches for the vpid, then takes a reference to it preventing early free, all within the safety of rcu_read_lock(). Once we have our reference we can safely make use of it up until the point it is put. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: bpf@vger.kernel.org Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") Signed-off-by: Lee Jones <lee@kernel.org> --- kernel/bpf/syscall.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)