diff mbox series

[1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

Message ID 20220721111430.416305-1-lee@kernel.org (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid() | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-15
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Kernel LATEST on z15 with gcc
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lee Jones July 21, 2022, 11:14 a.m. UTC
The documentation for find_pid() clearly states:

  "Must be called with the tasklist_lock or rcu_read_lock() held."

Presently we do neither.

In an ideal world we would wrap the in-lined call to find_vpid() along
with get_pid_task() in the suggested rcu_read_lock() and have done.
However, looking at get_pid_task()'s internals, it already does that
independently, so this would lead to deadlock.

Instead, we'll use find_get_pid() which searches for the vpid, then
takes a reference to it preventing early free, all within the safety
of rcu_read_lock().  Once we have our reference we can safely make use
of it up until the point it is put.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: bpf@vger.kernel.org
Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Lee Jones <lee@kernel.org>
---
 kernel/bpf/syscall.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Jiri Olsa July 21, 2022, 11:56 a.m. UTC | #1
On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> The documentation for find_pid() clearly states:
> 
>   "Must be called with the tasklist_lock or rcu_read_lock() held."
> 
> Presently we do neither.
> 
> In an ideal world we would wrap the in-lined call to find_vpid() along
> with get_pid_task() in the suggested rcu_read_lock() and have done.
> However, looking at get_pid_task()'s internals, it already does that
> independently, so this would lead to deadlock.

hm, we can have nested rcu_read_lock calls, right?

jirka

> 
> Instead, we'll use find_get_pid() which searches for the vpid, then
> takes a reference to it preventing early free, all within the safety
> of rcu_read_lock().  Once we have our reference we can safely make use
> of it up until the point it is put.
> 
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Song Liu <song@kernel.org>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: bpf@vger.kernel.org
> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> Signed-off-by: Lee Jones <lee@kernel.org>
> ---
>  kernel/bpf/syscall.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 83c7136c5788d..c20cff30581c4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>  	const struct perf_event *event;
>  	struct task_struct *task;
>  	struct file *file;
> +	struct pid *ppid;
>  	int err;
>  
>  	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>  	if (attr->task_fd_query.flags != 0)
>  		return -EINVAL;
>  
> -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> +	ppid = find_get_pid(pid);
> +	task = get_pid_task(ppid, PIDTYPE_PID);
> +	put_pid(ppid);
>  	if (!task)
>  		return -ENOENT;
>  
> -- 
> 2.37.0.170.g444d1eabd0-goog
>
Lee Jones July 21, 2022, 11:59 a.m. UTC | #2
On Thu, 21 Jul 2022, Jiri Olsa wrote:

> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > The documentation for find_pid() clearly states:
> > 
> >   "Must be called with the tasklist_lock or rcu_read_lock() held."
> > 
> > Presently we do neither.
> > 
> > In an ideal world we would wrap the in-lined call to find_vpid() along
> > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > However, looking at get_pid_task()'s internals, it already does that
> > independently, so this would lead to deadlock.
> 
> hm, we can have nested rcu_read_lock calls, right?

I assumed not, but that might be an oversight on my part.

Would that be your preference?

> > Instead, we'll use find_get_pid() which searches for the vpid, then
> > takes a reference to it preventing early free, all within the safety
> > of rcu_read_lock().  Once we have our reference we can safely make use
> > of it up until the point it is put.
> > 
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Song Liu <song@kernel.org>
> > Cc: Yonghong Song <yhs@fb.com>
> > Cc: KP Singh <kpsingh@kernel.org>
> > Cc: Stanislav Fomichev <sdf@google.com>
> > Cc: Hao Luo <haoluo@google.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Cc: bpf@vger.kernel.org
> > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> > Signed-off-by: Lee Jones <lee@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 83c7136c5788d..c20cff30581c4 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> >  	const struct perf_event *event;
> >  	struct task_struct *task;
> >  	struct file *file;
> > +	struct pid *ppid;
> >  	int err;
> >  
> >  	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> >  	if (attr->task_fd_query.flags != 0)
> >  		return -EINVAL;
> >  
> > -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> > +	ppid = find_get_pid(pid);
> > +	task = get_pid_task(ppid, PIDTYPE_PID);
> > +	put_pid(ppid);
> >  	if (!task)
> >  		return -ENOENT;
> >
Jiri Olsa July 21, 2022, 12:14 p.m. UTC | #3
On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> On Thu, 21 Jul 2022, Jiri Olsa wrote:
> 
> > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > The documentation for find_pid() clearly states:

typo find_vpid

> > > 
> > >   "Must be called with the tasklist_lock or rcu_read_lock() held."
> > > 
> > > Presently we do neither.

just curious, did you see crash related to this or you just spot that

> > > 
> > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > However, looking at get_pid_task()'s internals, it already does that
> > > independently, so this would lead to deadlock.
> > 
> > hm, we can have nested rcu_read_lock calls, right?
> 
> I assumed not, but that might be an oversight on my part.
> 
> Would that be your preference?

seems simpler than calling get/put for ppid

jirka

> 
> > > Instead, we'll use find_get_pid() which searches for the vpid, then
> > > takes a reference to it preventing early free, all within the safety
> > > of rcu_read_lock().  Once we have our reference we can safely make use
> > > of it up until the point it is put.
> > > 
> > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > > Cc: Song Liu <song@kernel.org>
> > > Cc: Yonghong Song <yhs@fb.com>
> > > Cc: KP Singh <kpsingh@kernel.org>
> > > Cc: Stanislav Fomichev <sdf@google.com>
> > > Cc: Hao Luo <haoluo@google.com>
> > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > Cc: bpf@vger.kernel.org
> > > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > ---
> > >  kernel/bpf/syscall.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 83c7136c5788d..c20cff30581c4 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > >  	const struct perf_event *event;
> > >  	struct task_struct *task;
> > >  	struct file *file;
> > > +	struct pid *ppid;
> > >  	int err;
> > >  
> > >  	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> > > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > >  	if (attr->task_fd_query.flags != 0)
> > >  		return -EINVAL;
> > >  
> > > -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> > > +	ppid = find_get_pid(pid);
> > > +	task = get_pid_task(ppid, PIDTYPE_PID);
> > > +	put_pid(ppid);
> > >  	if (!task)
> > >  		return -ENOENT;
> > >  
> 
> -- 
> Lee Jones [李琼斯]
Yonghong Song July 21, 2022, 3:53 p.m. UTC | #4
On 7/21/22 5:14 AM, Jiri Olsa wrote:
> On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
>> On Thu, 21 Jul 2022, Jiri Olsa wrote:
>>
>>> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
>>>> The documentation for find_pid() clearly states:
> 
> typo find_vpid
> 
>>>>
>>>>    "Must be called with the tasklist_lock or rcu_read_lock() held."
>>>>
>>>> Presently we do neither.
> 
> just curious, did you see crash related to this or you just spot that
> 
>>>>
>>>> In an ideal world we would wrap the in-lined call to find_vpid() along
>>>> with get_pid_task() in the suggested rcu_read_lock() and have done.
>>>> However, looking at get_pid_task()'s internals, it already does that
>>>> independently, so this would lead to deadlock.
>>>
>>> hm, we can have nested rcu_read_lock calls, right?
>>
>> I assumed not, but that might be an oversight on my part.

 From kernel documentation, nested rcu_read_lock is allowed.
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

RCU's grace-period guarantee allows updaters to wait for the completion 
of all pre-existing RCU read-side critical sections. An RCU read-side 
critical section begins with the marker rcu_read_lock() and ends with 
the marker rcu_read_unlock(). These markers may be nested, and RCU 
treats a nested set as one big RCU read-side critical section. 
Production-quality implementations of rcu_read_lock() and 
rcu_read_unlock() are extremely lightweight, and in fact have exactly 
zero overhead in Linux kernels built for production use with 
CONFIG_PREEMPT=n.

>>
>> Would that be your preference?
> 
> seems simpler than calling get/put for ppid

The current implementation seems okay since we can hide
rcu_read_lock() inside find_get_pid(). We can also avoid
nested rcu_read_lock(), which is although allowed but
not pretty.

> 
> jirka
> 
>>
>>>> Instead, we'll use find_get_pid() which searches for the vpid, then
>>>> takes a reference to it preventing early free, all within the safety
>>>> of rcu_read_lock().  Once we have our reference we can safely make use
>>>> of it up until the point it is put.
>>>>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: Andrii Nakryiko <andrii@kernel.org>
>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>> Cc: Song Liu <song@kernel.org>
>>>> Cc: Yonghong Song <yhs@fb.com>
>>>> Cc: KP Singh <kpsingh@kernel.org>
>>>> Cc: Stanislav Fomichev <sdf@google.com>
>>>> Cc: Hao Luo <haoluo@google.com>
>>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>>> Cc: bpf@vger.kernel.org
>>>> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
>>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>>> ---
>>>>   kernel/bpf/syscall.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 83c7136c5788d..c20cff30581c4 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>   	const struct perf_event *event;
>>>>   	struct task_struct *task;
>>>>   	struct file *file;
>>>> +	struct pid *ppid;
>>>>   	int err;
>>>>   
>>>>   	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>>>> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>   	if (attr->task_fd_query.flags != 0)
>>>>   		return -EINVAL;
>>>>   
>>>> -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>>>> +	ppid = find_get_pid(pid);
>>>> +	task = get_pid_task(ppid, PIDTYPE_PID);
>>>> +	put_pid(ppid);
>>>>   	if (!task)
>>>>   		return -ENOENT;
>>>>   
>>
>> -- 
>> Lee Jones [李琼斯]
Lee Jones July 21, 2022, 8:58 p.m. UTC | #5
On Thu, 21 Jul 2022, Yonghong Song wrote:

> 
> 
> On 7/21/22 5:14 AM, Jiri Olsa wrote:
> > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> > > On Thu, 21 Jul 2022, Jiri Olsa wrote:
> > > 
> > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > > > The documentation for find_pid() clearly states:
> > 
> > typo find_vpid
> > 
> > > > > 
> > > > >    "Must be called with the tasklist_lock or rcu_read_lock() held."
> > > > > 
> > > > > Presently we do neither.
> > 
> > just curious, did you see crash related to this or you just spot that
> > 
> > > > > 
> > > > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > > > However, looking at get_pid_task()'s internals, it already does that
> > > > > independently, so this would lead to deadlock.
> > > > 
> > > > hm, we can have nested rcu_read_lock calls, right?
> > > 
> > > I assumed not, but that might be an oversight on my part.
> 
> From kernel documentation, nested rcu_read_lock is allowed.
> https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
> 
> RCU's grace-period guarantee allows updaters to wait for the completion of
> all pre-existing RCU read-side critical sections. An RCU read-side critical
> section begins with the marker rcu_read_lock() and ends with the marker
> rcu_read_unlock(). These markers may be nested, and RCU treats a nested set
> as one big RCU read-side critical section. Production-quality
> implementations of rcu_read_lock() and rcu_read_unlock() are extremely
> lightweight, and in fact have exactly zero overhead in Linux kernels built
> for production use with CONFIG_PREEMPT=n.
> 
> > > 
> > > Would that be your preference?
> > 
> > seems simpler than calling get/put for ppid
> 
> The current implementation seems okay since we can hide
> rcu_read_lock() inside find_get_pid(). We can also avoid
> nested rcu_read_lock(), which is although allowed but
> not pretty.

Right, this was my thinking.

Happy to go with whatever you guys decide though.

Make the call and I'll rework, or not.
Jiri Olsa July 22, 2022, 8:15 p.m. UTC | #6
On Thu, Jul 21, 2022 at 09:58:00PM +0100, Lee Jones wrote:
> On Thu, 21 Jul 2022, Yonghong Song wrote:
> 
> > 
> > 
> > On 7/21/22 5:14 AM, Jiri Olsa wrote:
> > > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> > > > On Thu, 21 Jul 2022, Jiri Olsa wrote:
> > > > 
> > > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > > > > The documentation for find_pid() clearly states:
> > > 
> > > typo find_vpid
> > > 
> > > > > > 
> > > > > >    "Must be called with the tasklist_lock or rcu_read_lock() held."
> > > > > > 
> > > > > > Presently we do neither.
> > > 
> > > just curious, did you see crash related to this or you just spot that
> > > 
> > > > > > 
> > > > > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > > > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > > > > However, looking at get_pid_task()'s internals, it already does that
> > > > > > independently, so this would lead to deadlock.
> > > > > 
> > > > > hm, we can have nested rcu_read_lock calls, right?
> > > > 
> > > > I assumed not, but that might be an oversight on my part.
> > 
> > From kernel documentation, nested rcu_read_lock is allowed.
> > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
> > 
> > RCU's grace-period guarantee allows updaters to wait for the completion of
> > all pre-existing RCU read-side critical sections. An RCU read-side critical
> > section begins with the marker rcu_read_lock() and ends with the marker
> > rcu_read_unlock(). These markers may be nested, and RCU treats a nested set
> > as one big RCU read-side critical section. Production-quality
> > implementations of rcu_read_lock() and rcu_read_unlock() are extremely
> > lightweight, and in fact have exactly zero overhead in Linux kernels built
> > for production use with CONFIG_PREEMPT=n.
> > 
> > > > 
> > > > Would that be your preference?
> > > 
> > > seems simpler than calling get/put for ppid
> > 
> > The current implementation seems okay since we can hide
> > rcu_read_lock() inside find_get_pid(). We can also avoid
> > nested rcu_read_lock(), which is although allowed but
> > not pretty.
> 
> Right, this was my thinking.
> 
> Happy to go with whatever you guys decide though.
> 
> Make the call and I'll rework, or not.

ok, I can live with the current version ;-) could you please resend
with fixed changelog?

thanks,
jirka
diff mbox series

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788d..c20cff30581c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4385,6 +4385,7 @@  static int bpf_task_fd_query(const union bpf_attr *attr,
 	const struct perf_event *event;
 	struct task_struct *task;
 	struct file *file;
+	struct pid *ppid;
 	int err;
 
 	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
@@ -4396,7 +4397,9 @@  static int bpf_task_fd_query(const union bpf_attr *attr,
 	if (attr->task_fd_query.flags != 0)
 		return -EINVAL;
 
-	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+	ppid = find_get_pid(pid);
+	task = get_pid_task(ppid, PIDTYPE_PID);
+	put_pid(ppid);
 	if (!task)
 		return -ENOENT;