diff mbox series

[bpf-next] bpf: fix NULL event->prog pointer access in bpf_overflow_handler

Message ID 20210819155209.1927994-1-yhs@fb.com (mailing list archive)
State Accepted
Commit 594286b7574c6e8217b1c233cc0d0650f2268a77
Delegated to: BPF
Headers show
Series [bpf-next] bpf: fix NULL event->prog pointer access in bpf_overflow_handler | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: peterz@infradead.org; 13 maintainers not CCed: linux-perf-users@vger.kernel.org mark.rutland@arm.com songliubraving@fb.com jolsa@redhat.com mingo@redhat.com kafai@fb.com netdev@vger.kernel.org peterz@infradead.org acme@kernel.org namhyung@kernel.org john.fastabend@gmail.com alexander.shishkin@linux.intel.com kpsingh@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 41 this patch: 41
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 41 this patch: 41
netdev/header_inline success Link

Commit Message

Yonghong Song Aug. 19, 2021, 3:52 p.m. UTC
Andrii reported that libbpf CI hit the following oops when
running selftest send_signal:
  [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
  [ 1243.161066] #PF: supervisor read access in kernel mode
  [ 1243.161066] #PF: error_code(0x0000) - not-present page
  [ 1243.161066] PGD 0 P4D 0
  [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
  [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
  [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
  [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
  [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
  [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
  [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
  [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
  [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
  [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
  [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
  [ 1243.161066] Call Trace:
  [ 1243.161066]  <IRQ>
  [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
  [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
  [ 1243.161066]  ? __lock_acquire+0x378/0x2730
  [ 1243.161066]  ? __lock_acquire+0x372/0x2730
  [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
  [ 1243.161066]  ? find_held_lock+0x2b/0x80
  [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
  [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
  [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
  [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
  [ 1243.161066]  hrtimer_interrupt+0x110/0x220
  [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
  [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
  [ 1243.161066]  </IRQ>
  [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
  [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
  [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
  [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
  [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
  [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
  [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
  [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
  [ 1243.161066]  ? finish_task_switch+0xab/0x250
  [ 1243.161066]  ? finish_task_switch+0x70/0x250
  [ 1243.161066]  __schedule+0x36b/0xbb0
  [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
  [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
  [ 1243.161066]  schedule+0x43/0xe0
  [ 1243.161066]  pipe_read+0x30b/0x450
  [ 1243.161066]  ? wait_woken+0x80/0x80
  [ 1243.161066]  new_sync_read+0x164/0x170
  [ 1243.161066]  vfs_read+0x122/0x1b0
  [ 1243.161066]  ksys_read+0x93/0xd0
  [ 1243.161066]  do_syscall_64+0x35/0x80
  [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae

The oops can also be reproduced with the following steps:
  ./vmtest.sh -s
  # at qemu shell
  cd /root/bpf && while true; do ./test_progs -t send_signal

Further analysis showed that the failure is introduced with
commit b89fbfbb854c ("bpf: Implement minimal BPF perf link").
With the above commit, the following scenario becomes possible:
    cpu1                        cpu2
                                hrtimer_interrupt -> bpf_overflow_handler
    (due to closing link_fd)
    bpf_perf_link_release ->
    perf_event_free_bpf_prog ->
    perf_event_free_bpf_handler ->
      WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler)
      event->prog = NULL
                                bpf_prog_run(event->prog, &ctx)

In the above case, the event->prog is NULL for bpf_prog_run, hence
causing oops.

To fix the issue, check whether event->prog is NULL or not. If it
is, do not call bpf_prog_run. This seems working as the above
reproducible step runs more than one hour and I didn't see any
failures.

Fixes: b89fbfbb854c ("bpf: Implement minimal BPF perf link")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/events/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Andrii Nakryiko Aug. 19, 2021, 6:48 p.m. UTC | #1
On Thu, Aug 19, 2021 at 8:52 AM Yonghong Song <yhs@fb.com> wrote:
>
> Andrii reported that libbpf CI hit the following oops when
> running selftest send_signal:
>   [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
>   [ 1243.161066] #PF: supervisor read access in kernel mode
>   [ 1243.161066] #PF: error_code(0x0000) - not-present page
>   [ 1243.161066] PGD 0 P4D 0
>   [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
>   [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
>   [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>   [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
>   [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
>   [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
>   [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
>   [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
>   [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
>   [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
>   [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
>   [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
>   [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
>   [ 1243.161066] Call Trace:
>   [ 1243.161066]  <IRQ>
>   [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
>   [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
>   [ 1243.161066]  ? __lock_acquire+0x378/0x2730
>   [ 1243.161066]  ? __lock_acquire+0x372/0x2730
>   [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
>   [ 1243.161066]  ? find_held_lock+0x2b/0x80
>   [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
>   [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
>   [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
>   [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
>   [ 1243.161066]  hrtimer_interrupt+0x110/0x220
>   [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
>   [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
>   [ 1243.161066]  </IRQ>
>   [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>   [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
>   [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
>   [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
>   [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
>   [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
>   [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
>   [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
>   [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
>   [ 1243.161066]  ? finish_task_switch+0xab/0x250
>   [ 1243.161066]  ? finish_task_switch+0x70/0x250
>   [ 1243.161066]  __schedule+0x36b/0xbb0
>   [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
>   [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
>   [ 1243.161066]  schedule+0x43/0xe0
>   [ 1243.161066]  pipe_read+0x30b/0x450
>   [ 1243.161066]  ? wait_woken+0x80/0x80
>   [ 1243.161066]  new_sync_read+0x164/0x170
>   [ 1243.161066]  vfs_read+0x122/0x1b0
>   [ 1243.161066]  ksys_read+0x93/0xd0
>   [ 1243.161066]  do_syscall_64+0x35/0x80
>   [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The oops can also be reproduced with the following steps:
>   ./vmtest.sh -s
>   # at qemu shell
>   cd /root/bpf && while true; do ./test_progs -t send_signal
>
> Further analysis showed that the failure is introduced with
> commit b89fbfbb854c ("bpf: Implement minimal BPF perf link").
> With the above commit, the following scenario becomes possible:
>     cpu1                        cpu2
>                                 hrtimer_interrupt -> bpf_overflow_handler
>     (due to closing link_fd)
>     bpf_perf_link_release ->
>     perf_event_free_bpf_prog ->
>     perf_event_free_bpf_handler ->
>       WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler)
>       event->prog = NULL
>                                 bpf_prog_run(event->prog, &ctx)
>
> In the above case, the event->prog is NULL for bpf_prog_run, hence
> causing oops.
>
> To fix the issue, check whether event->prog is NULL or not. If it
> is, do not call bpf_prog_run. This seems working as the above
> reproducible step runs more than one hour and I didn't see any
> failures.
>
> Fixes: b89fbfbb854c ("bpf: Implement minimal BPF perf link")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/events/core.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 2d1e63dd97f2..011cc5069b7b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9920,13 +9920,16 @@ static void bpf_overflow_handler(struct perf_event *event,
>                 .data = data,
>                 .event = event,
>         };
> +       struct bpf_prog *prog;
>         int ret = 0;
>
>         ctx.regs = perf_arch_bpf_user_pt_regs(regs);
>         if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
>                 goto out;
>         rcu_read_lock();
> -       ret = bpf_prog_run(event->prog, &ctx);
> +       prog = READ_ONCE(event->prog);
> +       if (prog)
> +               ret = bpf_prog_run(prog, &ctx);

Thanks a lot for figuring this out! I knew BPF_PROG_RUN_ARRAY does
NULL check, but I missed that raw perf_event has its own code path
that doesn't use any of that.

>         rcu_read_unlock();
>  out:
>         __this_cpu_dec(bpf_prog_active);
> --
> 2.30.2
>
patchwork-bot+netdevbpf@kernel.org Aug. 19, 2021, 7:30 p.m. UTC | #2
Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Thu, 19 Aug 2021 08:52:09 -0700 you wrote:
> Andrii reported that libbpf CI hit the following oops when
> running selftest send_signal:
>   [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
>   [ 1243.161066] #PF: supervisor read access in kernel mode
>   [ 1243.161066] #PF: error_code(0x0000) - not-present page
>   [ 1243.161066] PGD 0 P4D 0
>   [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
>   [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
>   [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>   [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
>   [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
>   [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
>   [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
>   [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
>   [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
>   [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
>   [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
>   [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
>   [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
>   [ 1243.161066] Call Trace:
>   [ 1243.161066]  <IRQ>
>   [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
>   [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
>   [ 1243.161066]  ? __lock_acquire+0x378/0x2730
>   [ 1243.161066]  ? __lock_acquire+0x372/0x2730
>   [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
>   [ 1243.161066]  ? find_held_lock+0x2b/0x80
>   [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
>   [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
>   [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
>   [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
>   [ 1243.161066]  hrtimer_interrupt+0x110/0x220
>   [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
>   [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
>   [ 1243.161066]  </IRQ>
>   [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>   [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
>   [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
>   [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
>   [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
>   [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
>   [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
>   [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
>   [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
>   [ 1243.161066]  ? finish_task_switch+0xab/0x250
>   [ 1243.161066]  ? finish_task_switch+0x70/0x250
>   [ 1243.161066]  __schedule+0x36b/0xbb0
>   [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
>   [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
>   [ 1243.161066]  schedule+0x43/0xe0
>   [ 1243.161066]  pipe_read+0x30b/0x450
>   [ 1243.161066]  ? wait_woken+0x80/0x80
>   [ 1243.161066]  new_sync_read+0x164/0x170
>   [ 1243.161066]  vfs_read+0x122/0x1b0
>   [ 1243.161066]  ksys_read+0x93/0xd0
>   [ 1243.161066]  do_syscall_64+0x35/0x80
>   [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> [...]

Here is the summary with links:
  - [bpf-next] bpf: fix NULL event->prog pointer access in bpf_overflow_handler
    https://git.kernel.org/bpf/bpf-next/c/594286b7574c

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d1e63dd97f2..011cc5069b7b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9920,13 +9920,16 @@  static void bpf_overflow_handler(struct perf_event *event,
 		.data = data,
 		.event = event,
 	};
+	struct bpf_prog *prog;
 	int ret = 0;
 
 	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
 		goto out;
 	rcu_read_lock();
-	ret = bpf_prog_run(event->prog, &ctx);
+	prog = READ_ONCE(event->prog);
+	if (prog)
+		ret = bpf_prog_run(prog, &ctx);
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);