
[1/3] bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()

Message ID 20231114163234.GA890@redhat.com (mailing list archive)
State Accepted
Commit 2d1618054f25e11c44d189dbff4a60342a4cfb4b
Delegated to: BPF
Series bpf: kernel/bpf/task_iter.c: don't abuse next_thread()

Commit Message

Oleg Nesterov Nov. 14, 2023, 4:32 p.m. UTC
Lockless use of next_thread() should be avoided, kernel/bpf/task_iter.c
is the last user and the usage is wrong.

task_group_seq_get_next() can return the group leader twice if it races
with mt-thread exec which changes the group->leader's pid.

Change the main loop to use __next_thread(), kill "next_tid == common->pid"
check.

__next_thread() can't loop forever, we can also change this code to retry
if next_tid == 0.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
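
To illustrate the difference the commit message describes, here is a
minimal userspace sketch. It is not kernel code: struct model_thread and
all model_*/old_walk_*/new_walk_* helpers are hypothetical names made up
for this illustration, and the tid change is simulated by a plain
assignment rather than by a real exec. It assumes next_thread() wraps
around to the group leader while __next_thread() returns NULL after the
last thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: a tid plus list links. */
struct model_thread {
	int tid;
	struct model_thread *next;	/* NULL for the last thread */
	struct model_thread *leader;	/* group leader */
};

/* Wraps around to the leader (assumed next_thread() behaviour). */
static struct model_thread *model_next_thread(struct model_thread *t)
{
	return t->next ? t->next : t->leader;
}

/* Returns NULL after the last thread (assumed __next_thread() behaviour). */
static struct model_thread *model_next_thread_or_null(struct model_thread *t)
{
	return t->next;
}

/*
 * Old-style walk: stop once the tid matches the tid remembered at the
 * start. A nonzero new_leader_tid simulates the leader's tid changing
 * after the walk began. max_steps bounds the walk. Returns how many
 * times the walk handed back the leader again.
 */
static int old_walk_leader_visits(struct model_thread *leader,
				  int new_leader_tid, int max_steps)
{
	struct model_thread *t = leader;
	int start_tid, visits = 0;

	assert(leader != NULL);
	start_tid = leader->tid;
	if (new_leader_tid)
		leader->tid = new_leader_tid;

	for (int i = 0; i < max_steps; i++) {
		t = model_next_thread(t);
		if (t->tid == start_tid)
			break;		/* "run out of tasks" check */
		if (t == leader)
			visits++;	/* wrap-around went unnoticed */
	}
	return visits;
}

/* New-style walk: NULL ends the walk, no tid comparison is needed. */
static int new_walk_count(struct model_thread *leader)
{
	int n = 0;

	assert(leader != NULL);
	for (struct model_thread *t = model_next_thread_or_null(leader); t;
	     t = model_next_thread_or_null(t))
		n++;
	return n;
}
```

With three threads (tids 1, 2, 3, the first being the leader) and
max_steps of 3, the old-style walk in this model reports 0 repeated
leader visits when the tid is stable and 1 when the leader's tid is
changed to 99 mid-walk. The NULL-terminated walk counts the 2 non-leader
threads in both cases.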

Comments

Yonghong Song Nov. 16, 2023, 3:31 a.m. UTC | #1
On 11/14/23 11:32 AM, Oleg Nesterov wrote:
> Lockless use of next_thread() should be avoided, kernel/bpf/task_iter.c
> is the last user and the usage is wrong.
>
> task_group_seq_get_next() can return the group leader twice if it races
> with mt-thread exec which changes the group->leader's pid.
>
> Change the main loop to use __next_thread(), kill "next_tid == common->pid"
> check.
>
> __next_thread() can't loop forever, we can also change this code to retry
> if next_tid == 0.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>   kernel/bpf/task_iter.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 26082b97894d..51ae15e2b290 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>   		return NULL;
>   
>   retry:
> -	task = next_thread(task);
> +	task = __next_thread(task);
> +	if (!task)
> +		return NULL;
>   
>   	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
> -	if (!next_tid || next_tid == common->pid) {
> -		/* Run out of tasks of a process.  The tasks of a
> -		 * thread_group are linked as circular linked list.
> -		 */
> -		return NULL;
> -	}
> +	if (!next_tid)
> +		goto retry;

Looking at the code, next_tid should never be 0 unless some task
migrates to another pid namespace, which I think is not possible.

common->ns is assigned as below:
   common->ns = get_pid_ns(task_active_pid_ns(current))
so we are searching tasks in the *current* namespace.

Look at:
pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
{
         struct upid *upid;
         pid_t nr = 0;

         if (pid && ns->level <= pid->level) {
                 upid = &pid->numbers[ns->level];
                 if (upid->ns == ns)
                         nr = upid->nr;
         }
         return nr;
}

pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
                         struct pid_namespace *ns)
{
         pid_t nr = 0;

         rcu_read_lock();
         if (!ns)
                 ns = task_active_pid_ns(current);
         nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
         rcu_read_unlock();
         
         return nr;
}

In pid_nr_ns(), if the pid belongs to the input namespace 'ns', then
ns->level <= pid->level holds and upid->ns == ns, so the return value
'nr' should be nonzero.

If this is the case, could you remove
	if (!next_tid)
		goto retry;

Other than above, the change looks good to me.
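
As a rough check of the lookup logic quoted above, here is a userspace
model of pid_nr_ns(). The struct model_ns, struct model_upid and
struct model_pid types, the MODEL_MAX_LEVEL bound and the function name
are hypothetical simplifications for this sketch; only the
level/namespace comparison mirrors the quoted kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 2	/* arbitrary bound for this sketch */

/* Hypothetical stand-ins for pid_namespace, upid and pid. */
struct model_ns {
	unsigned int level;
};

struct model_upid {
	int nr;
	const struct model_ns *ns;
};

struct model_pid {
	unsigned int level;	/* deepest namespace the pid lives in */
	struct model_upid numbers[MODEL_MAX_LEVEL + 1];
};

/*
 * Same shape as the quoted pid_nr_ns(): returns 0 when pid is NULL,
 * when ns is deeper than the pid, or when the namespace at that level
 * is a different one.
 */
static int model_pid_nr_ns(const struct model_pid *pid,
			   const struct model_ns *ns)
{
	int nr = 0;

	assert(ns != NULL);
	assert(ns->level <= MODEL_MAX_LEVEL);

	if (pid && ns->level <= pid->level) {
		const struct model_upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;
}
```

In this model a pid created in a level-1 namespace yields a nonzero
number when viewed from that namespace or from its parent, and 0 when
viewed from an unrelated level-1 namespace. A NULL pid also yields 0,
whatever the namespace.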

>   
>   	if (skip_if_dup_files && task->files == task->group_leader->files)
>   		goto retry;
Oleg Nesterov Nov. 16, 2023, 9:34 a.m. UTC | #2
On 11/15, Yonghong Song wrote:
>
> On 11/14/23 11:32 AM, Oleg Nesterov wrote:
> >@@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
> >  		return NULL;
> >  retry:
> >-	task = next_thread(task);
> >+	task = __next_thread(task);
> >+	if (!task)
> >+		return NULL;
> >  	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
> >-	if (!next_tid || next_tid == common->pid) {
> >-		/* Run out of tasks of a process.  The tasks of a
> >-		 * thread_group are linked as circular linked list.
> >-		 */
> >-		return NULL;
> >-	}
> >+	if (!next_tid)
> >+		goto retry;
>
> Look at the code. Looks like next_tid should never be 0

...

> pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>                         struct pid_namespace *ns)
> {
>         pid_t nr = 0;
>
>         rcu_read_lock();
>         if (!ns)
>                 ns = task_active_pid_ns(current);
>         nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^

Please note that *task_pid_ptr(task, type) can be NULL if this
task has already exited and called detach_pid().

detach_pid() does __change_pid(task, type, NULL), please note the

	*pid_ptr = new; // NULL in this case

assignment in __change_pid().

IOW, the problem is not that ns can change; the problem is that
task->thread_pid (and the other pid links) can be NULL, and in this
case pid_nr_ns() returns zero.
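
A small userspace sketch of this case, under stated assumptions: struct
model_task, struct model_pid and the model_* helpers are hypothetical
names, the pid link is cleared with a plain store instead of
__change_pid(), and the exited task is assumed to still be reachable by
a walker that already holds a pointer into the list. The loop plays the
role of the "goto retry" in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pid and task_struct. */
struct model_pid {
	int nr;
};

struct model_task {
	struct model_pid *thread_pid;	/* NULL once the pid is detached */
	struct model_task *next;	/* NULL after the last thread */
};

/* Models the "*pid_ptr = new" store with new == NULL. */
static void model_detach_pid(struct model_task *t)
{
	t->thread_pid = NULL;
}

/* Models the tid lookup: a missing pid link reads as tid 0. */
static int model_task_tid(const struct model_task *t)
{
	return t->thread_pid ? t->thread_pid->nr : 0;
}

/*
 * Returns the next thread that still has a nonzero tid, or NULL when
 * the list ends. A tid of 0 means "skip this one and retry", not "end
 * of the thread group".
 */
static struct model_task *model_get_next(struct model_task *t)
{
	assert(t != NULL);

	for (;;) {
		t = t->next;
		if (!t)
			return NULL;	/* end of list */
		if (model_task_tid(t) != 0)
			return t;
		/* pid already detached: retry with the next thread */
	}
}
```

The point of the design is that a zero tid and the end of the list are
separate conditions. Treating tid 0 as "no more threads" would end the
walk early whenever it meets a thread that has already detached its pid.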


This code should be rewritten from the very beginning; it should
not rely on pid_nr. If nothing else, common->pid and/or pid_visiting
can be reused. But currently my only concern is next_thread().

> Other than above, the change looks good to me.

Thanks for review!

Oleg.
Yonghong Song Nov. 16, 2023, 11:46 a.m. UTC | #3
On 11/16/23 4:34 AM, Oleg Nesterov wrote:
> On 11/15, Yonghong Song wrote:
>> On 11/14/23 11:32 AM, Oleg Nesterov wrote:
>>> @@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>>>   		return NULL;
>>>   retry:
>>> -	task = next_thread(task);
>>> +	task = __next_thread(task);
>>> +	if (!task)
>>> +		return NULL;
>>>   	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
>>> -	if (!next_tid || next_tid == common->pid) {
>>> -		/* Run out of tasks of a process.  The tasks of a
>>> -		 * thread_group are linked as circular linked list.
>>> -		 */
>>> -		return NULL;
>>> -	}
>>> +	if (!next_tid)
>>> +		goto retry;
>> Look at the code. Looks like next_tid should never be 0
> ...
>
>> pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>>                          struct pid_namespace *ns)
>> {
>>          pid_t nr = 0;
>>
>>          rcu_read_lock();
>>          if (!ns)
>>                  ns = task_active_pid_ns(current);
>>          nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
>                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Please note that *task_pid_ptr(task, type) can be NULL if this
> task has already exited and called detach_pid().
>
> detach_pid() does __change_pid(task, type, NULL), please note the
>
> 	*pid_ptr = new; // NULL in this case
>
> assignment in __change_pid().
>
> IOW. The problem is not that ns can change, the problem is that
> task->thread_pid (and other pid links) can be NULL, and in this
> case pid_nr_ns() returns zero.

Thanks for the explanation. I certainly missed the race between the
task iterator and __change_pid(). With that, the patch looks good to me.

Acked-by: Yonghong Song <yonghong.song@linux.dev>

>
>
> This code should be rewritten from the very beginning, it should
> not rely on pid_nr. If nothing else common->pid and/or pid_visiting
> can be reused. But currently my only concern is next_thread().
>
>> Other than above, the change looks good to me.
> Thanks for review!
>
> Oleg.
>

Patch

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 26082b97894d..51ae15e2b290 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -70,15 +70,13 @@  static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return NULL;
 
 retry:
-	task = next_thread(task);
+	task = __next_thread(task);
+	if (!task)
+		return NULL;
 
 	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
-	if (!next_tid || next_tid == common->pid) {
-		/* Run out of tasks of a process.  The tasks of a
-		 * thread_group are linked as circular linked list.
-		 */
-		return NULL;
-	}
+	if (!next_tid)
+		goto retry;
 
 	if (skip_if_dup_files && task->files == task->group_leader->files)
 		goto retry;