diff mbox series

[1/3,v4,bpf-next] bpf: save correct stopping point in file seq iteration.

Message ID 20201218185032.2464558-2-jonathan.lemon@gmail.com (mailing list archive)
State Accepted, archived
Delegated to: BPF
Headers show
Series bpf: increment and use correct thread iterator | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes fail Link
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Jonathan Lemon <bsd@fb.com>' != 'Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>' WARNING: Unknown commit id 'a650da2ee52a', maybe rebased or not pulled?
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Jonathan Lemon Dec. 18, 2020, 6:50 p.m. UTC
From: Jonathan Lemon <bsd@fb.com>

On some systems, some variant of the following splat is
repeatedly seen.  The common factor in all traces seems
to be the entry point to task_file_seq_next().  With the
patch, all warnings go away.

    rcu: INFO: rcu_sched self-detected stall on CPU
    rcu: \x0926-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
    \x09(t=21033 jiffies g=159148529 q=223125)
    NMI backtrace for cpu 26
    CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
    Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
    Call Trace:
     <IRQ>
     dump_stack+0x50/0x70
     nmi_cpu_backtrace.cold.6+0x13/0x50
     ? lapic_can_unplug_cpu.cold.30+0x40/0x40
     nmi_trigger_cpumask_backtrace+0xba/0xca
     rcu_dump_cpu_stacks+0x99/0xc7
     rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
     ? tick_sched_do_timer+0x60/0x60
     update_process_times+0x24/0x50
     tick_sched_timer+0x37/0x70
     __hrtimer_run_queues+0xfe/0x270
     hrtimer_interrupt+0xf4/0x210
     smp_apic_timer_interrupt+0x5e/0x120
     apic_timer_interrupt+0xf/0x20
     </IRQ>
    RIP: 0010:get_pid_task+0x38/0x80
    Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
    RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
    RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
    RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
    R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
    R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
     ? find_ge_pid+0x1b/0x20
     task_seq_get_next+0x52/0xc0
     task_file_seq_get_next+0x159/0x220
     task_file_seq_next+0x4f/0xa0
     bpf_seq_read+0x159/0x390
     vfs_read+0x8a/0x140
     ksys_read+0x59/0xd0
     do_syscall_64+0x42/0x110
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f95ae73e76e
    Code: Bad RIP value.
    RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
    RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
    RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
    RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
    R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
    R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0

If unable to obtain the file structure for the current task,
proceed to the next task number after the one returned from
task_seq_get_next(), instead of the next task number from the
original iterator.

Also, save the stopping task number from task_seq_get_next()
on failure in case of restarts.

Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 kernel/bpf/task_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrii Nakryiko Dec. 18, 2020, 9:01 p.m. UTC | #1
On Fri, Dec 18, 2020 at 12:47 PM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> On some systems, some variant of the following splat is
> repeatedly seen.  The common factor in all traces seems
> to be the entry point to task_file_seq_next().  With the
> patch, all warnings go away.
>
>     rcu: INFO: rcu_sched self-detected stall on CPU
>     rcu: \x0926-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
>     \x09(t=21033 jiffies g=159148529 q=223125)
>     NMI backtrace for cpu 26
>     CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
>     Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
>     Call Trace:
>      <IRQ>
>      dump_stack+0x50/0x70
>      nmi_cpu_backtrace.cold.6+0x13/0x50
>      ? lapic_can_unplug_cpu.cold.30+0x40/0x40
>      nmi_trigger_cpumask_backtrace+0xba/0xca
>      rcu_dump_cpu_stacks+0x99/0xc7
>      rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
>      ? tick_sched_do_timer+0x60/0x60
>      update_process_times+0x24/0x50
>      tick_sched_timer+0x37/0x70
>      __hrtimer_run_queues+0xfe/0x270
>      hrtimer_interrupt+0xf4/0x210
>      smp_apic_timer_interrupt+0x5e/0x120
>      apic_timer_interrupt+0xf/0x20
>      </IRQ>
>     RIP: 0010:get_pid_task+0x38/0x80
>     Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
>     RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>     RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
>     RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
>     RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
>     R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
>     R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
>      ? find_ge_pid+0x1b/0x20
>      task_seq_get_next+0x52/0xc0
>      task_file_seq_get_next+0x159/0x220
>      task_file_seq_next+0x4f/0xa0
>      bpf_seq_read+0x159/0x390
>      vfs_read+0x8a/0x140
>      ksys_read+0x59/0xd0
>      do_syscall_64+0x42/0x110
>      entry_SYSCALL_64_after_hwframe+0x44/0xa9
>     RIP: 0033:0x7f95ae73e76e
>     Code: Bad RIP value.
>     RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>     RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
>     RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
>     RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
>     R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
>     R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0
>
> If unable to obtain the file structure for the current task,
> proceed to the next task number after the one returned from
> task_seq_get_next(), instead of the next task number from the
> original iterator.
>
> Also, save the stopping task number from task_seq_get_next()
> on failure in case of restarts.
>
> Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---

LGTM, thanks!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/task_iter.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 0458a40edf10..8033ab19138a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -158,13 +158,14 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>                 if (!curr_task) {
>                         info->task = NULL;
>                         info->files = NULL;
> +                       info->tid = curr_tid;
>                         return NULL;
>                 }
>
>                 curr_files = get_files_struct(curr_task);
>                 if (!curr_files) {
>                         put_task_struct(curr_task);
> -                       curr_tid = ++(info->tid);
> +                       curr_tid = curr_tid + 1;
>                         info->fd = 0;
>                         goto again;
>                 }
> --
> 2.24.1
>
Daniel Borkmann Dec. 24, 2020, 1:08 a.m. UTC | #2
On 12/18/20 10:01 PM, Andrii Nakryiko wrote:
> On Fri, Dec 18, 2020 at 12:47 PM Jonathan Lemon
> <jonathan.lemon@gmail.com> wrote:
>>
>> From: Jonathan Lemon <bsd@fb.com>
>>
>> On some systems, some variant of the following splat is
>> repeatedly seen.  The common factor in all traces seems
>> to be the entry point to task_file_seq_next().  With the
>> patch, all warnings go away.
>>
>>      rcu: INFO: rcu_sched self-detected stall on CPU
>>      rcu: \x0926-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
>>      \x09(t=21033 jiffies g=159148529 q=223125)
>>      NMI backtrace for cpu 26
>>      CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
>>      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
>>      Call Trace:
>>       <IRQ>
>>       dump_stack+0x50/0x70
>>       nmi_cpu_backtrace.cold.6+0x13/0x50
>>       ? lapic_can_unplug_cpu.cold.30+0x40/0x40
>>       nmi_trigger_cpumask_backtrace+0xba/0xca
>>       rcu_dump_cpu_stacks+0x99/0xc7
>>       rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
>>       ? tick_sched_do_timer+0x60/0x60
>>       update_process_times+0x24/0x50
>>       tick_sched_timer+0x37/0x70
>>       __hrtimer_run_queues+0xfe/0x270
>>       hrtimer_interrupt+0xf4/0x210
>>       smp_apic_timer_interrupt+0x5e/0x120
>>       apic_timer_interrupt+0xf/0x20
>>       </IRQ>
>>      RIP: 0010:get_pid_task+0x38/0x80
>>      Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
>>      RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>>      RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
>>      RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
>>      RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
>>      R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
>>      R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
>>       ? find_ge_pid+0x1b/0x20
>>       task_seq_get_next+0x52/0xc0
>>       task_file_seq_get_next+0x159/0x220
>>       task_file_seq_next+0x4f/0xa0
>>       bpf_seq_read+0x159/0x390
>>       vfs_read+0x8a/0x140
>>       ksys_read+0x59/0xd0
>>       do_syscall_64+0x42/0x110
>>       entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>      RIP: 0033:0x7f95ae73e76e
>>      Code: Bad RIP value.
>>      RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>>      RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
>>      RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
>>      RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
>>      R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
>>      R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0
>>
>> If unable to obtain the file structure for the current task,
>> proceed to the next task number after the one returned from
>> task_seq_get_next(), instead of the next task number from the
>> original iterator.
>>
>> Also, save the stopping task number from task_seq_get_next()
>> on failure in case of restarts.
>>
>> Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")

Commit sha is non-existent, I fixed it up. Took first 2 into bpf given first in
particular is a fix. Please submit 3/3 individually once we synced trees back.

>> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
>> ---
> 
> LGTM, thanks!
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
diff mbox series

Patch

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0458a40edf10..8033ab19138a 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -158,13 +158,14 @@  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 		if (!curr_task) {
 			info->task = NULL;
 			info->files = NULL;
+			info->tid = curr_tid;
 			return NULL;
 		}
 
 		curr_files = get_files_struct(curr_task);
 		if (!curr_files) {
 			put_task_struct(curr_task);
-			curr_tid = ++(info->tid);
+			curr_tid = curr_tid + 1;
 			info->fd = 0;
 			goto again;
 		}