
[3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check

Message ID: 20230825161947.GA16871@redhat.com (mailing list archive)
State: Changes Requested
Delegated to: BPF
Series: bpf: task_group_seq_get_next: use __next_thread()

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1332 this patch: 1332
netdev/cc_maintainers warning 9 maintainers not CCed: kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com sdf@google.com song@kernel.org yonghong.song@linux.dev jolsa@kernel.org haoluo@google.com ast@kernel.org
netdev/build_clang success Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1355 this patch: 1355
netdev/checkpatch warning WARNING: line length of 86 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for veristat

Commit Message

Oleg Nesterov Aug. 25, 2023, 4:19 p.m. UTC
Unless I am totally confused, this check is wrong. We are going to return or
skip next_task, so we need to check next_task->files, not task->files.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Oleg Nesterov Aug. 25, 2023, 5:04 p.m. UTC | #1
Forgot to mention in the changelog...

In any case this doesn't look right: ->group_leader can exit before the other
threads and call exit_files(), and in this case task_group_seq_get_next() will
end up checking task->files == NULL.
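
For reference, exit_files() in kernel/exit.c looks roughly like this; a
simplified sketch from memory, the details may differ between kernel versions:

	void exit_files(struct task_struct *tsk)
	{
		struct files_struct *files = tsk->files;

		if (files) {
			task_lock(tsk);
			tsk->files = NULL;	/* the NULL the check above can then see */
			task_unlock(tsk);
			put_files_struct(files);
		}
	}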

On 08/25, Oleg Nesterov wrote:
>
> Unless I am totally confused, this check is wrong. We are going to return or
> skip next_task, so we need to check next_task->files, not task->files.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/bpf/task_iter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>
>  	common->pid_visiting = *tid;
>
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>  		task = next_task;
>  		goto retry;
>  	}
> --
> 2.25.1.362.g51ebf55
Yonghong Song Aug. 25, 2023, 10:49 p.m. UTC | #2
On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> Unless I am totally confused, this check is wrong. We are going to return or
> skip next_task, so we need to check next_task->files, not task->files.

Thanks for catching this. This is indeed an oversight.

Acked-by: Yonghong Song <yonghong.song@linux.dev>

> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>   kernel/bpf/task_iter.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>   
>   	common->pid_visiting = *tid;
>   
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>   		task = next_task;
>   		goto retry;
>   	}
Yonghong Song Aug. 25, 2023, 10:52 p.m. UTC | #3
On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> Forgot to mention in the changelog...
> 
> In any case this doesn't look right: ->group_leader can exit before the other
> threads and call exit_files(), and in this case task_group_seq_get_next() will
> end up checking task->files == NULL.

It is okay. This won't affect correctness. We will end up
calling the bpf program for 'next_task'.

> 
> On 08/25, Oleg Nesterov wrote:
>>
>> Unless I am totally confused, this check is wrong. We are going to return or
>> skip next_task, so we need to check next_task->files, not task->files.
>>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> ---
>>   kernel/bpf/task_iter.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 1589ec3faded..2264870ae3fc 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>>
>>   	common->pid_visiting = *tid;
>>
>> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
>> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>>   		task = next_task;
>>   		goto retry;
>>   	}
>> --
>> 2.25.1.362.g51ebf55
> 
>
Oleg Nesterov Aug. 27, 2023, 8:19 p.m. UTC | #4
On 08/25, Yonghong Song wrote:
>
> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> >Forgot to mention in the changelog...
> >
> >In any case this doesn't look right: ->group_leader can exit before the other
> >threads and call exit_files(), and in this case task_group_seq_get_next() will
> >end up checking task->files == NULL.
>
> It is okay. This won't affect correctness. We will end up
> calling the bpf program for 'next_task'.

Well, I didn't mean it is necessarily wrong; I simply do not know.

But let's suppose that we have a thread group with the main thread M + 1000
sub-threads. In the likely case they all have the same ->files; CLONE_THREAD
without CLONE_FILES is not that common.
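
For concreteness, a minimal userspace sketch of the uncommon case; this is
only an illustration: pthread_create() always passes CLONE_FILES, so it takes
a raw clone() to get a sub-thread with its own fd table:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdlib.h>
	#include <unistd.h>

	static int thread_fn(void *arg)
	{
		pause();	/* just keep the sub-thread alive */
		return 0;
	}

	int main(void)
	{
		char *stack = malloc(1024 * 1024);

		/* CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM.
		 * CLONE_FILES is deliberately omitted, so the new thread gets
		 * its own copy of the fd table and its ->files differs from
		 * the group leader's. */
		clone(thread_fn, stack + 1024 * 1024,
		      CLONE_VM | CLONE_FS | CLONE_SIGHAND | CLONE_THREAD, NULL);
		sleep(1);
		return 0;
	}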

Let's assume the BPF_TASK_ITER_TGID case for simplicity.

Now let's look at task_file_seq_get_next(), which passes skip_if_dup_files == 1
to task_seq_get_next() and thus to task_group_seq_get_next().

Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
task_file_seq_ops.stop() was called): it will return the group leader M first,
and then, after task_file_seq_get_next() "reports" all the fds of M and increments
info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
of the skip_if_dup_files check in task_group_seq_get_next().

Right?

But if the group leader M exits, then M->files == NULL. And in this case
task_seq_get_next() will need to "inspect" all the sub-threads even if they all
have the same ->files pointer.

No?
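
In pseudo-code, the effect I mean is roughly this; a simplified sketch, not
the actual iterator code, and visit() is just a made-up placeholder:

	for_each_thread(M, t) {
		/* if the leader M has exited, M->files == NULL, so this test
		 * is false for every live sub-thread even though they all
		 * share one fd table, and none of them get skipped */
		if (skip_if_dup_files && t->files == t->group_leader->files)
			continue;
		visit(t);
	}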

Again, I am not saying this is a bug and quite possibly I misread this code, but
in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.

Nevermind, please forget. This is minor even if I am right.

Thanks for the review!

Oleg.
Yonghong Song Aug. 28, 2023, 1:18 a.m. UTC | #5
On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> On 08/25, Yonghong Song wrote:
>>
>> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
>>> Forgot to mention in the changelog...
>>>
>>> In any case this doesn't look right: ->group_leader can exit before the other
>>> threads and call exit_files(), and in this case task_group_seq_get_next() will
>>> end up checking task->files == NULL.
>>
>> It is okay. This won't affect correctness. We will end up
>> calling the bpf program for 'next_task'.
> 
> Well, I didn't mean it is necessarily wrong; I simply do not know.
> 
> But let's suppose that we have a thread group with the main thread M + 1000
> sub-threads. In the likely case they all have the same ->files; CLONE_THREAD
> without CLONE_FILES is not that common.
> 
> Let's assume the BPF_TASK_ITER_TGID case for simplicity.
> 
> Now let's look at task_file_seq_get_next(), which passes skip_if_dup_files == 1
> to task_seq_get_next() and thus to task_group_seq_get_next().
> 
> Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
> task_file_seq_ops.stop() was called): it will return the group leader M first,
> and then, after task_file_seq_get_next() "reports" all the fds of M and increments
> info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
> of the skip_if_dup_files check in task_group_seq_get_next().
> 
> Right?
> 
> But if the group leader M exits, then M->files == NULL. And in this case
> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> have the same ->files pointer.

That is correct. I do not have practical data on how likely this
scenario is to happen; I assume the probability is very low.
If that is not the case, we might need to revisit.

> 
> No?
> 
> Again, I am not saying this is a bug and quite possibly I misread this code, but
> in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.
> 
> Nevermind, please forget. This is minor even if I am right.
> 
> Thanks for the review!
> 
> Oleg.
>
Oleg Nesterov Aug. 28, 2023, 10:54 a.m. UTC | #6
On 08/27, Yonghong Song wrote:
>
> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> >
> >But if the group leader M exits, then M->files == NULL. And in this case
> >task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> >have the same ->files pointer.
>
> That is correct. I do not have practical data on how likely this
> scenario is to happen; I assume the probability is very low.

Yes. I just tried to explain why the ->files check looks confusing to me.
Nevermind.

Could you review 6/6 as well?

Should I fold 1-5 into a single patch? I tried to document every change
and simplify the review, but I do not want to bloat the git history.

Oleg.
Yonghong Song Aug. 29, 2023, 12:30 a.m. UTC | #7
On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> On 08/27, Yonghong Song wrote:
>>
>> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
>>>
>>> But if the group leader M exits, then M->files == NULL. And in this case
>>> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
>>> have the same ->files pointer.
>>
>> That is correct. I do not have practical data on how likely this
>> scenario is to happen; I assume the probability is very low.
> 
> Yes. I just tried to explain why the ->files check looks confusing to me.
> Nevermind.
> 
> Could you review 6/6 as well?

I think we can wait on patch 6/6 until
    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
is merged.

> 
> Should I fold 1-5 into a single patch? I tried to document every change
> and simplify the review, but I do not want to bloat the git history.

Currently, because of patch 6, the whole patch set cannot be tested by
the bpf CI, since it has a build failure:
   https://github.com/kernel-patches/bpf/pull/5580
I suggest you take patches 1-5 and resubmit with a tag like
   "bpf-next v2"
   [PATCH bpf-next v2 x/5] ...
so the CI can build with different architectures and compilers to
ensure everything builds and runs fine.
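
For example, something like this should produce the expected subject lines
(the exact revision range is up to you):

	git format-patch --subject-prefix="PATCH bpf-next v2" -5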

> 
> Oleg.
>
Oleg Nesterov Aug. 30, 2023, 11:54 p.m. UTC | #8
On 08/28, Yonghong Song wrote:
>
> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> >
> >Could you review 6/6 as well?
>
> I think we can wait on patch 6/6 until
>    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> is merged.

OK.

> >Should I fold 1-5 into a single patch? I tried to document every change
> >and simplify the review, but I do not want to bloat the git history.
>
> Currently, because of patch 6, the whole patch set cannot be tested by
> the bpf CI, since it has a build failure:
>   https://github.com/kernel-patches/bpf/pull/5580

Heh. I thought this was obvious: I thought you could test 1-5 without 6/6
and _review_ 6/6.

I simply can't understand how this pull/5580 came about when I specifically
mentioned

	> 6/6 obviously depends on
	>
	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
	>
	> which was not merged yet.

in 0/6.

> I suggest you take patches 1-5 and resubmit with a tag like
>   "bpf-next v2"
>   [PATCH bpf-next v2 x/5] ...
> so the CI can build with different architectures and compilers to
> ensure everything builds and runs fine.

I think we can wait for

	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/

as you suggest above, then I'll send the s/next_thread/__next_thread/
one-liner without 1-5. I no longer think it makes sense to try to clean up
the poor task_group_seq_get_next() when IMHO the whole task_iter logic
needs a complete rewrite. Yes, yes, I know, it is very easy to blame
someone else's code, sorry, I can't resist ;)

The only "fix" in this series is 3/6, but this code has more serious
bugs, so I guess we can forget it.

Oleg.
Yonghong Song Aug. 31, 2023, 11:29 a.m. UTC | #9
On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> On 08/28, Yonghong Song wrote:
>>
>> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
>>>
>>> Could you review 6/6 as well?
>>
>> I think we can wait on patch 6/6 until
>>     https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
>> is merged.
> 
> OK.
> 
>>> Should I fold 1-5 into a single patch? I tried to document every change
>>> and simplify the review, but I do not want to bloat the git history.
>>
>> Currently, because of patch 6, the whole patch set cannot be tested by
>> the bpf CI, since it has a build failure:
>>    https://github.com/kernel-patches/bpf/pull/5580
> 
> Heh. I thought this was obvious: I thought you could test 1-5 without 6/6
> and _review_ 6/6.
> 
> I simply can't understand how this pull/5580 came about when I specifically
> mentioned
> 
> 	> 6/6 obviously depends on
> 	>
> 	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> 	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 	>
> 	> which was not merged yet.
> 
> in 0/6.

The CI testing process is fully automated, and it does
not look at the commit message. That is why it takes the whole
series. This is true for all other patch sets.

> 
>> I suggest you take patches 1-5 and resubmit with a tag like
>>    "bpf-next v2"
>>    [PATCH bpf-next v2 x/5] ...
>> so the CI can build with different architectures and compilers to
>> ensure everything builds and runs fine.
> 
> I think we can wait for
> 
> 	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 
> as you suggest above, then I'll send the s/next_thread/__next_thread/
> one-liner without 1-5. I no longer think it makes sense to try to clean up
> the poor task_group_seq_get_next() when IMHO the whole task_iter logic
> needs a complete rewrite. Yes, yes, I know, it is very easy to blame
> someone else's code, sorry, I can't resist ;)
> 
> The only "fix" in this series is 3/6, but this code has more serious
> bugs, so I guess we can forget it.
> 
> Oleg.
>
Oleg Nesterov Aug. 31, 2023, 12:06 p.m. UTC | #10
On 08/31, Yonghong Song wrote:
>
> On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> >
> >I simply can't understand how this pull/5580 came about when I specifically
> >mentioned
> >
> >	> 6/6 obviously depends on
> >	>
> >	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> >	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> >	>
> >	> which was not merged yet.
> >
> >in 0/6.
>
> The CI testing process is fully automated,

Ah, OK, sorry then.

> >>I suggest you take patches 1-5 and resubmit with a tag like
> >>   "bpf-next v2"
> >>   [PATCH bpf-next v2 x/5] ...
> >>so the CI can build with different architectures and compilers to
> >>ensure everything builds and runs fine.

OK, will do when I have time.

Thanks,

Oleg.

Patch

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 1589ec3faded..2264870ae3fc 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -82,7 +82,7 @@  static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 
 	common->pid_visiting = *tid;
 
-	if (skip_if_dup_files && task->files == task->group_leader->files) {
+	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
 		task = next_task;
 		goto retry;
 	}