[bpf,1/2] bpf: Fix a kernel verifier crash in stacksafe()

Message ID 20240812052106.3980303-1-yonghong.song@linux.dev (mailing list archive)
State Superseded
Delegated to: BPF

Commit Message

Yonghong Song Aug. 12, 2024, 5:21 a.m. UTC
Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
The crash dump is shown below:

  [   65.874474] BUG: kernel NULL pointer dereference, address: 0000000000000088
  [   65.888406] #PF: supervisor read access in kernel mode
  [   65.898682] #PF: error_code(0x0000) - not-present page
  [   65.908957] PGD 0 P4D 0
  [   65.914020] Oops: 0000 [#1] SMP
  [   65.920300] CPU: 19 PID: 9364 Comm: scx_layered Kdump: loaded Tainted: G S          E      6.9.5-g93cea04637ea-dirty #7
  [   65.941874] Hardware name: Quanta Delta Lake MP 29F0EMA01D0/Delta Lake-Class1, BIOS F0E_3A19 04/27/2023
  [   65.960664] RIP: 0010:states_equal+0x3ee/0x770
  [   65.969559] Code: 33 85 ed 89 e8 41 0f 48 c7 83 e0 f8 89 e9 29 c1 48 63 c1 4c 89 e9 48 c1 e1 07 49 8d 14 08 0f
                 b6 54 10 78 49 03 8a 58 05 00 00 <3a> 54 08 78 0f 85 60 03 00 00 49 c1 e5 07 43 8b 44 28 70 83 e0 03
  [   66.007120] RSP: 0018:ffffc9000ebeb8b8 EFLAGS: 00010202
  [   66.017570] RAX: 0000000000000000 RBX: ffff888149719680 RCX: 0000000000000010
  [   66.031843] RDX: 0000000000000000 RSI: ffff88907f4e0c08 RDI: ffff8881572f0000
  [   66.046115] RBP: 0000000000000000 R08: ffff8883d5014000 R09: ffffffff83065d50
  [   66.060386] R10: ffff8881bf9a1800 R11: 0000000000000002 R12: 0000000000000000
  [   66.074659] R13: 0000000000000000 R14: ffff888149719a40 R15: 0000000000000007
  [   66.088932] FS:  00007f5d5da96800(0000) GS:ffff88907f4c0000(0000) knlGS:0000000000000000
  [   66.105114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   66.116606] CR2: 0000000000000088 CR3: 0000000388261001 CR4: 00000000007706f0
  [   66.130873] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   66.145145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   66.159416] PKRU: 55555554
  [   66.164823] Call Trace:
  [   66.169709]  <TASK>
  [   66.173906]  ? __die_body+0x66/0xb0
  [   66.180890]  ? page_fault_oops+0x370/0x3d0
  [   66.189082]  ? console_unlock+0xb5/0x140
  [   66.196926]  ? exc_page_fault+0x4f/0xb0
  [   66.204597]  ? asm_exc_page_fault+0x22/0x30
  [   66.212974]  ? states_equal+0x3ee/0x770
  [   66.220643]  ? states_equal+0x529/0x770
  [   66.228312]  do_check+0x60f/0x5240
  [   66.235114]  do_check_common+0x388/0x840
  [   66.242960]  do_check_subprogs+0x101/0x150
  [   66.251150]  bpf_check+0x5d5/0x4b60
  [   66.258134]  ? __mod_memcg_state+0x79/0x110
  [   66.266506]  ? pcpu_alloc+0x892/0xba0
  [   66.273829]  bpf_prog_load+0x5bb/0x660
  [   66.281324]  ? bpf_prog_bind_map+0x1e1/0x290
  [   66.289862]  __sys_bpf+0x29d/0x3a0
  [   66.296664]  __x64_sys_bpf+0x18/0x20
  [   66.303811]  do_syscall_64+0x6a/0x140
  [   66.311133]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Further investigation shows that the crash is due to an invalid memory access in stacksafe().
More specifically, it is the following code:

    if (exact != NOT_EXACT &&
        old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
        cur->stack[spi].slot_type[i % BPF_REG_SIZE])
            return false;

If cur->allocated_stack is 0, cur->stack will be a ZERO_SIZE_PTR. In that case,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] crashes the kernel because the memory
address is invalid. This is exactly what happened in the crash dump above.
If cur->allocated_stack is nonzero but smaller than old->allocated_stack, the above
code could trigger an out-of-bounds array access.
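
As a cross-check of the explanation above, the faulting address in the dump
(CR2 = 0x88) can be reproduced by hand. The sketch below relies on the kernel
defining ZERO_SIZE_PTR as ((void *)16). The element stride (128 bytes, from the
shift by 7 in the 'Code:' bytes) and the slot_type displacement (0x78, from the
faulting compare) are readings of this particular oops, not values taken from
the headers, and may differ on other builds. The helper names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The kernel defines ZERO_SIZE_PTR as ((void *)16). */
#define ZERO_SIZE_PTR_ADDR 0x10ULL

/* Assumptions inferred from this oops only (Code: bytes and registers). */
#define STACK_SLOT_STRIDE  128ULL  /* bytes per cur->stack[] element */
#define SLOT_TYPE_OFFSET   0x78ULL /* offset of slot_type[] in an element */

/* Hypothetical helper: address of stack[spi].slot_type[byte]. */
static uint64_t slot_type_addr(uint64_t stack_base, uint64_t spi, uint64_t byte)
{
	return stack_base + spi * STACK_SLOT_STRIDE + SLOT_TYPE_OFFSET + byte;
}

/* Self-check: spi = 0 and byte = 0 give the CR2 value from the dump. */
static void check_fault_address(void)
{
	assert(slot_type_addr(ZERO_SIZE_PTR_ADDR, 0, 0) == 0x88ULL);
}
```

With RCX = 0x10 and RAX = 0 in the register dump, this matches the reported
fault address.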

The patch adds the condition 'i < cur->allocated_stack' to ensure that the
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is always legal.
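
The effect of the added condition can be sketched in a small user-space model.
This is only an illustration under simplifying assumptions: 'struct slot',
'struct func_state' and REG_SIZE are stand-ins for the verifier's structures
and BPF_REG_SIZE, and the function models nothing but the exact-mode slot_type
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_SIZE 8 /* stands in for BPF_REG_SIZE */

/* Simplified stand-ins for the verifier structures (assumptions). */
struct slot {
	unsigned char slot_type[REG_SIZE];
};

struct func_state {
	int allocated_stack; /* stack size in bytes */
	struct slot *stack;  /* allocated_stack / REG_SIZE elements */
};

/*
 * Models only the exact-mode slot_type comparison. Bytes beyond
 * cur->allocated_stack are skipped here; in stacksafe() they are left
 * to the later checks in the loop.
 */
static bool exact_slot_types_match(const struct func_state *old,
				   const struct func_state *cur)
{
	for (int i = 0; i < old->allocated_stack; i++) {
		int spi = i / REG_SIZE;

		if (i < cur->allocated_stack &&
		    old->stack[spi].slot_type[i % REG_SIZE] !=
		    cur->stack[spi].slot_type[i % REG_SIZE])
			return false;
	}
	return true;
}

static void check_guard(void)
{
	struct slot old_slots[1] = { { { 1, 1, 1, 1, 1, 1, 1, 1 } } };
	struct slot cur_slots[1] = { { { 1, 1, 1, 2, 1, 1, 1, 1 } } };
	struct func_state old = { REG_SIZE, old_slots };
	struct func_state empty = { 0, NULL };
	struct func_state differs = { REG_SIZE, cur_slots };

	/* cur has no stack: cur->stack is never dereferenced. */
	assert(exact_slot_types_match(&old, &empty));
	/* An in-bounds mismatch is still detected. */
	assert(!exact_slot_types_match(&old, &differs));
}
```

Because '&&' short-circuits, the bound test has to come before the
cur->stack[...] operand for the guard to be effective.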

Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Cc: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eduard Zingerman Aug. 12, 2024, 5:38 p.m. UTC | #1
On Sun, 2024-08-11 at 22:21 -0700, Yonghong Song wrote:
> [...]

My bad, for some reason I thought that 'if (i >= cur->allocated_stack) return false;'
check below would be sufficient. (Which is obviously not true, sigh...).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>


[...]
Alexei Starovoitov Aug. 12, 2024, 5:44 p.m. UTC | #2
On Mon, Aug 12, 2024 at 10:38 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2024-08-11 at 22:21 -0700, Yonghong Song wrote:
> > [...]
>
> My bad, for some reason I thought that 'if (i >= cur->allocated_stack) return false;'
> check below would be sufficient. (Which is obviously not true, sigh...).
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Should we move the check up instead?

if (i >= cur->allocated_stack)
          return false;

Checking it twice looks odd.
Eduard Zingerman Aug. 12, 2024, 5:47 p.m. UTC | #3
On Mon, 2024-08-12 at 10:44 -0700, Alexei Starovoitov wrote:

[...]

> Should we move the check up instead?
> 
> if (i >= cur->allocated_stack)
>           return false;
> 
> Checking it twice looks odd.

A few checks before that, namely:

		if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
		    && exact == NOT_EXACT) {
			i += BPF_REG_SIZE - 1;
			/* explored state didn't use this */
			continue;
		}

		if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
			continue;

		if (env->allow_uninit_stack &&
		    old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
			continue;

Should be done regardless of cur->allocated_stack.
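
A toy model may help show why the position of the bound check matters. It is a
sketch under stated assumptions: the structures and SLOT_* names are
hypothetical stand-ins, and the liveness and allow_uninit_stack checks are left
out, so only the STACK_INVALID-style skip is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_SIZE 8 /* stands in for BPF_REG_SIZE */

/* Hypothetical slot kinds; the kernel uses STACK_INVALID, STACK_MISC, ... */
enum { SLOT_INVALID = 0, SLOT_MISC = 1, SLOT_SPILL = 2 };

struct slot {
	unsigned char slot_type[REG_SIZE];
};

struct func_state {
	int allocated_stack; /* stack size in bytes */
	struct slot *stack;  /* allocated_stack / REG_SIZE elements */
};

/*
 * Toy version of the loop. With bound_check_first set, the
 * "i >= cur->allocated_stack" test runs before the old-only checks.
 */
static bool stack_safe_model(const struct func_state *old,
			     const struct func_state *cur,
			     bool bound_check_first)
{
	for (int i = 0; i < old->allocated_stack; i++) {
		int spi = i / REG_SIZE;
		unsigned char old_type = old->stack[spi].slot_type[i % REG_SIZE];

		if (bound_check_first && i >= cur->allocated_stack)
			return false;

		/* old never initialized this byte: nothing to compare */
		if (old_type == SLOT_INVALID)
			continue;

		/* old used a byte that cur does not have */
		if (i >= cur->allocated_stack)
			return false;

		if (old_type != cur->stack[spi].slot_type[i % REG_SIZE])
			return false;
	}
	return true;
}

static void check_order(void)
{
	struct slot unused[1] = { { { 0 } } }; /* all SLOT_INVALID */
	struct func_state old = { REG_SIZE, unused };
	struct func_state cur = { 0, NULL };

	/* Original order: old never used its stack, so the states match. */
	assert(stack_safe_model(&old, &cur, false));
	/* Bound check first: the same pair is reported as different. */
	assert(!stack_safe_model(&old, &cur, true));
}
```

In this model, hoisting the bound check changes the result for an old state
whose allocated stack was never used.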
Alexei Starovoitov Aug. 12, 2024, 5:50 p.m. UTC | #4
On Mon, Aug 12, 2024 at 10:47 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2024-08-12 at 10:44 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > Should we move the check up instead?
> >
> > if (i >= cur->allocated_stack)
> >           return false;
> >
> > Checking it twice looks odd.
>
> A few checks before that, namely:
>
>                 if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
>                     && exact == NOT_EXACT) {
>                         i += BPF_REG_SIZE - 1;
>                         /* explored state didn't use this */
>                         continue;
>                 }
>
>                 if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
>                         continue;
>
>                 if (env->allow_uninit_stack &&
>                     old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
>                         continue;
>
> Should be done regardless of cur->allocated_stack.

Right, but then let's sink old->slot_type != cur->slot_type down?
Eduard Zingerman Aug. 12, 2024, 5:57 p.m. UTC | #5
On Mon, 2024-08-12 at 10:50 -0700, Alexei Starovoitov wrote:
> On Mon, Aug 12, 2024 at 10:47 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Mon, 2024-08-12 at 10:44 -0700, Alexei Starovoitov wrote:
> > 
> > [...]
> > 
> > > Should we move the check up instead?
> > > 
> > > if (i >= cur->allocated_stack)
> > >           return false;
> > > 
> > > Checking it twice looks odd.
> > 
> > A few checks before that, namely:
> > 
> >                 if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
> >                     && exact == NOT_EXACT) {
> >                         i += BPF_REG_SIZE - 1;
> >                         /* explored state didn't use this */
> >                         continue;
> >                 }
> > 
> >                 if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
> >                         continue;
> > 
> >                 if (env->allow_uninit_stack &&
> >                     old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
> >                         continue;
> > 
> > > Should be done regardless of cur->allocated_stack.
> 
> Right, but then let's sink old->slot_type != cur->slot_type down?

It does not seem correct to swap the order for these two checks:

		if (exact != NOT_EXACT && i < cur->allocated_stack &&
		    old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
		    cur->stack[spi].slot_type[i % BPF_REG_SIZE])
			return false;

		if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
		    && exact == NOT_EXACT) {
			i += BPF_REG_SIZE - 1;
			/* explored state didn't use this */
			continue;
		}

If we do, 'slot_type' won't be checked for 'cur' when the 'old' register is not marked live.
Yonghong Song Aug. 12, 2024, 6:26 p.m. UTC | #6
On 8/12/24 10:50 AM, Alexei Starovoitov wrote:
> On Mon, Aug 12, 2024 at 10:47 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> On Mon, 2024-08-12 at 10:44 -0700, Alexei Starovoitov wrote:
>>
>> [...]
>>
>>> Should we move the check up instead?
>>>
>>> if (i >= cur->allocated_stack)
>>>            return false;
>>>
>>> Checking it twice looks odd.
>> A few checks before that, namely:
>>
>>                  if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
>>                      && exact == NOT_EXACT) {
>>                          i += BPF_REG_SIZE - 1;
>>                          /* explored state didn't use this */
>>                          continue;
>>                  }
>>
>>                  if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
>>                          continue;
>>
>>                  if (env->allow_uninit_stack &&
>>                      old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
>>                          continue;
>>
>> Should be done regardless of cur->allocated_stack.
> Right, but then let's sink old->slot_type != cur->slot_type down?

We could do the following to avoid double comparison: diff --git 
a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 
df3be12096cf..1906798f1a3d 100644 --- a/kernel/bpf/verifier.c +++ 
b/kernel/bpf/verifier.c @@ -17338,10 +17338,13 @@ static bool 
stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, */ 
for (i = 0; i < old->allocated_stack; i++) { struct bpf_reg_state 
*old_reg, *cur_reg; + bool cur_exceed_bound; spi = i / BPF_REG_SIZE; - 
if (exact != NOT_EXACT && + cur_exceed_bound = i >= 
cur->allocated_stack; + + if (exact != NOT_EXACT && !cur_exceed_bound && 
old->stack[spi].slot_type[i % BPF_REG_SIZE] != 
cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false; @@ -17363,7 
+17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct 
bpf_func_state *old, /* explored stack has more populated slots than 
current stack * and these slots were used */ - if (i >= 
cur->allocated_stack) + if (cur_exceed_bound) return false; /* 64-bit 
scalar spill vs all slots MISC and vice versa. WDYT?
Eduard Zingerman Aug. 12, 2024, 6:30 p.m. UTC | #7
On Mon, 2024-08-12 at 11:26 -0700, Yonghong Song wrote:

[...]

> 
> We could do the following to avoid double comparison: diff --git 
> a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 
> df3be12096cf..1906798f1a3d 100644 --- a/kernel/bpf/verifier.c +++ 
> b/kernel/bpf/verifier.c @@ -17338,10 +17338,13 @@ static bool 
> stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, */ 
> for (i = 0; i < old->allocated_stack; i++) { struct bpf_reg_state 
> *old_reg, *cur_reg; + bool cur_exceed_bound; spi = i / BPF_REG_SIZE; - 
> if (exact != NOT_EXACT && + cur_exceed_bound = i >= 
> cur->allocated_stack; + + if (exact != NOT_EXACT && !cur_exceed_bound && 
> old->stack[spi].slot_type[i % BPF_REG_SIZE] != 
> cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false; @@ -17363,7 
> +17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct 
> bpf_func_state *old, /* explored stack has more populated slots than 
> current stack * and these slots were used */ - if (i >= 
> cur->allocated_stack) + if (cur_exceed_bound) return false; /* 64-bit 
> scalar spill vs all slots MISC and vice versa. WDYT?
> 

Yonghong, something went wrong with formatting of the above email,
could you please resend?
Yonghong Song Aug. 12, 2024, 6:36 p.m. UTC | #8
On 8/12/24 11:30 AM, Eduard Zingerman wrote:
> On Mon, 2024-08-12 at 11:26 -0700, Yonghong Song wrote:
>
> [...]
>
>> We could do the following to avoid double comparison: diff --git
>> a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index
>> df3be12096cf..1906798f1a3d 100644 --- a/kernel/bpf/verifier.c +++
>> b/kernel/bpf/verifier.c @@ -17338,10 +17338,13 @@ static bool
>> stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, */
>> for (i = 0; i < old->allocated_stack; i++) { struct bpf_reg_state
>> *old_reg, *cur_reg; + bool cur_exceed_bound; spi = i / BPF_REG_SIZE; -
>> if (exact != NOT_EXACT && + cur_exceed_bound = i >=
>> cur->allocated_stack; + + if (exact != NOT_EXACT && !cur_exceed_bound &&
>> old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>> cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false; @@ -17363,7
>> +17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct
>> bpf_func_state *old, /* explored stack has more populated slots than
>> current stack * and these slots were used */ - if (i >=
>> cur->allocated_stack) + if (cur_exceed_bound) return false; /* 64-bit
>> scalar spill vs all slots MISC and vice versa. WDYT?
>>
> Yonghong, something went wrong with formatting of the above email,
> could you please resend?

Sorry, I copy-pasted the 'git diff' output into my email window. Not sure
why the formatting broke after I sent it. Anyway, the following
is the patch I suggested:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index df3be12096cf..1906798f1a3d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17338,10 +17338,13 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
          */
         for (i = 0; i < old->allocated_stack; i++) {
                 struct bpf_reg_state *old_reg, *cur_reg;
+               bool cur_exceed_bound;
  
                 spi = i / BPF_REG_SIZE;
  
-               if (exact != NOT_EXACT &&
+               cur_exceed_bound = i >= cur->allocated_stack;
+
+               if (exact != NOT_EXACT && !cur_exceed_bound &&
                     old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
                     cur->stack[spi].slot_type[i % BPF_REG_SIZE])
                         return false;
@@ -17363,7 +17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
                 /* explored stack has more populated slots than current stack
                  * and these slots were used
                  */
-               if (i >= cur->allocated_stack)
+               if (cur_exceed_bound)
                         return false;
  
                 /* 64-bit scalar spill vs all slots MISC and vice versa.
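
The two variants under discussion can be compared side by side in a reduced
model. This is a sketch, not the verifier code: the structures, the SLOT_*
names and the boolean 'exact' (standing for 'exact != NOT_EXACT') are
assumptions, and only the checks that involve the bound are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_SIZE 8 /* stands in for BPF_REG_SIZE */

/* Hypothetical slot kinds; the kernel uses STACK_INVALID, STACK_MISC, ... */
enum { SLOT_INVALID = 0, SLOT_MISC = 1, SLOT_SPILL = 2 };

struct slot {
	unsigned char slot_type[REG_SIZE];
};

struct func_state {
	int allocated_stack; /* stack size in bytes */
	struct slot *stack;  /* allocated_stack / REG_SIZE elements */
};

/* Bound tested in two places, as in the one-line fix. */
static bool safe_inline(const struct func_state *old,
			const struct func_state *cur, bool exact)
{
	for (int i = 0; i < old->allocated_stack; i++) {
		int spi = i / REG_SIZE, b = i % REG_SIZE;

		if (exact && i < cur->allocated_stack &&
		    old->stack[spi].slot_type[b] != cur->stack[spi].slot_type[b])
			return false;
		if (old->stack[spi].slot_type[b] == SLOT_INVALID)
			continue;
		if (i >= cur->allocated_stack)
			return false;
		if (old->stack[spi].slot_type[b] != cur->stack[spi].slot_type[b])
			return false;
	}
	return true;
}

/* Bound tested once and cached, as in the alternative diff. */
static bool safe_cached(const struct func_state *old,
			const struct func_state *cur, bool exact)
{
	for (int i = 0; i < old->allocated_stack; i++) {
		int spi = i / REG_SIZE, b = i % REG_SIZE;
		bool cur_exceed_bound = i >= cur->allocated_stack;

		if (exact && !cur_exceed_bound &&
		    old->stack[spi].slot_type[b] != cur->stack[spi].slot_type[b])
			return false;
		if (old->stack[spi].slot_type[b] == SLOT_INVALID)
			continue;
		if (cur_exceed_bound)
			return false;
		if (old->stack[spi].slot_type[b] != cur->stack[spi].slot_type[b])
			return false;
	}
	return true;
}

static void check_variants_agree(void)
{
	struct slot spill[1] = { { { 2, 2, 2, 2, 2, 2, 2, 2 } } };
	struct slot misc[1] = { { { 1, 1, 1, 1, 1, 1, 1, 1 } } };
	struct func_state states[3] = {
		{ 0, NULL }, { REG_SIZE, spill }, { REG_SIZE, misc },
	};

	/* Every old/cur/exact combination gives the same answer. */
	for (int o = 0; o < 3; o++)
		for (int c = 0; c < 3; c++)
			for (int exact = 0; exact <= 1; exact++)
				assert(safe_inline(&states[o], &states[c], exact) ==
				       safe_cached(&states[o], &states[c], exact));
}
```

In this model the cached variant differs only in where the comparison is
spelled out, not in what the loop decides.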
Eduard Zingerman Aug. 12, 2024, 6:41 p.m. UTC | #9
On Mon, 2024-08-12 at 11:36 -0700, Yonghong Song wrote:

[...]

> Sorry, I copy-pasted the 'git diff' output into my email window. Not sure
> why the formatting broke after I sent it.

Sure, no problem

> Anyway, the following is the patch I suggested:
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3be12096cf..1906798f1a3d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -17338,10 +17338,13 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>           */
>          for (i = 0; i < old->allocated_stack; i++) {
>                  struct bpf_reg_state *old_reg, *cur_reg;
> +               bool cur_exceed_bound;
>   
>                  spi = i / BPF_REG_SIZE;
>   
> -               if (exact != NOT_EXACT &&
> +               cur_exceed_bound = i >= cur->allocated_stack;

idk, I think the C compiler would do this anyway;
to me the code is fine both with and without this additional variable.

> +
> +               if (exact != NOT_EXACT && !cur_exceed_bound &&
>                      old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>                      cur->stack[spi].slot_type[i % BPF_REG_SIZE])
>                          return false;
> @@ -17363,7 +17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>                  /* explored stack has more populated slots than current stack
>                   * and these slots were used
>                   */
> -               if (i >= cur->allocated_stack)
> +               if (cur_exceed_bound)
>                          return false;
>   
>                  /* 64-bit scalar spill vs all slots MISC and vice versa.
>
Yonghong Song Aug. 12, 2024, 7:21 p.m. UTC | #10
On 8/12/24 11:41 AM, Eduard Zingerman wrote:
> On Mon, 2024-08-12 at 11:36 -0700, Yonghong Song wrote:
>
> [...]
>
>> Sorry, I copy-pasted the 'git diff' output into my email window. Not sure
>> why the formatting broke after I sent it.
> Sure, no problem
>
>> Anyway, the following is the patch I suggested:
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index df3be12096cf..1906798f1a3d 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -17338,10 +17338,13 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>>            */
>>           for (i = 0; i < old->allocated_stack; i++) {
>>                   struct bpf_reg_state *old_reg, *cur_reg;
>> +               bool cur_exceed_bound;
>>    
>>                   spi = i / BPF_REG_SIZE;
>>    
>> -               if (exact != NOT_EXACT &&
>> +               cur_exceed_bound = i >= cur->allocated_stack;
> idk, I think the C compiler would do this anyway;
> to me the code is fine both with and without this additional variable.

Okay, I will keep the original (simpler) patch then.

>
>> +
>> +               if (exact != NOT_EXACT && !cur_exceed_bound &&
>>                       old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>>                       cur->stack[spi].slot_type[i % BPF_REG_SIZE])
>>                           return false;
>> @@ -17363,7 +17366,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
>>                   /* explored stack has more populated slots than current stack
>>                    * and these slots were used
>>                    */
>> -               if (i >= cur->allocated_stack)
>> +               if (cur_exceed_bound)
>>                           return false;
>>    
>>                   /* 64-bit scalar spill vs all slots MISC and vice versa.
>>
>
Alexei Starovoitov Aug. 12, 2024, 7:29 p.m. UTC | #11
On Mon, Aug 12, 2024 at 10:57 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2024-08-12 at 10:50 -0700, Alexei Starovoitov wrote:
> > On Mon, Aug 12, 2024 at 10:47 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2024-08-12 at 10:44 -0700, Alexei Starovoitov wrote:
> > >
> > > [...]
> > >
> > > > Should we move the check up instead?
> > > >
> > > > if (i >= cur->allocated_stack)
> > > >           return false;
> > > >
> > > > Checking it twice looks odd.
> > >
> > > A few checks before that, namely:
> > >
> > >                 if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
> > >                     && exact == NOT_EXACT) {
> > >                         i += BPF_REG_SIZE - 1;
> > >                         /* explored state didn't use this */
> > >                         continue;
> > >                 }
> > >
> > >                 if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
> > >                         continue;
> > >
> > >                 if (env->allow_uninit_stack &&
> > >                     old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
> > >                         continue;
> > >
> > > Should be done regardless of cur->allocated_stack.
> >
> > Right, but then let's sink old->slot_type != cur->slot_type down?
>
> It does not seem correct to swap the order for these two checks:
>
>                 if (exact != NOT_EXACT && i < cur->allocated_stack &&
>                     old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>                     cur->stack[spi].slot_type[i % BPF_REG_SIZE])
>                         return false;
>
>                 if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
>                     && exact == NOT_EXACT) {
>                         i += BPF_REG_SIZE - 1;
>                         /* explored state didn't use this */
>                         continue;
>                 }
>
> if we do, 'slot_type' won't be checked for 'cur' when 'old' register is not marked live.

I see. This is to compare states in open coded iter loops when liveness
is not propagated yet, right?

Then when comparing for exact states we should probably do:
if (exact != NOT_EXACT &&
    (i >= cur->allocated_stack ||
     old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
     cur->stack[spi].slot_type[i % BPF_REG_SIZE]))
   return false;

?
Eduard Zingerman Aug. 12, 2024, 7:43 p.m. UTC | #12
On Mon, 2024-08-12 at 12:29 -0700, Alexei Starovoitov wrote:

[...]

> > It does not seem correct to swap the order for these two checks:
> > 
> >                 if (exact != NOT_EXACT && i < cur->allocated_stack &&
> >                     old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
> >                     cur->stack[spi].slot_type[i % BPF_REG_SIZE])
> >                         return false;
> > 
> >                 if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
> >                     && exact == NOT_EXACT) {
> >                         i += BPF_REG_SIZE - 1;
> >                         /* explored state didn't use this */
> >                         continue;
> >                 }
> > 
> > if we do, 'slot_type' won't be checked for 'cur' when 'old' register is not marked live.
> 
> I see. This is to compare states in open coded iter loops when liveness
> is not propagated yet, right?

Yes

> 
> Then when comparing for exact states we should probably do:
> if (exact != NOT_EXACT &&
>     (i >= cur->allocated_stack ||
>      old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>      cur->stack[spi].slot_type[i % BPF_REG_SIZE]))
>    return false;
> 
> ?

Hm, right, otherwise the old slots in the interval
[cur->allocated_stack..old->allocated_stack)
won't be checked using exact rules.
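
To make the difference between the two guards concrete, here is a minimal sketch of just the exact slot_type comparison, pulled out of the loop. It is not the verifier code: struct toy_state, TOY_STACK_MAX and both function names are made up for illustration, and the model keeps one slot_type byte per stack byte instead of the real stack[spi].slot_type[i % BPF_REG_SIZE] layout. In the real stacksafe() further checks run after this comparison, so "true" here only means that this one step did not reject the state.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_STACK_MAX 64

/* Hypothetical, simplified stand-in for struct bpf_func_state:
 * one slot_type byte per stack byte, no spilled registers.
 */
struct toy_state {
	int allocated_stack;                     /* bytes in use */
	unsigned char slot_type[TOY_STACK_MAX];
};

/* v1-style guard: skip the comparison once i is past cur's stack.
 * This avoids the out-of-bounds read, but bytes in
 * [cur->allocated_stack..old->allocated_stack) are never compared.
 */
static bool exact_slot_check_v1(const struct toy_state *old,
				const struct toy_state *cur)
{
	assert(old->allocated_stack <= TOY_STACK_MAX);
	for (int i = 0; i < old->allocated_stack; i++) {
		if (i < cur->allocated_stack &&
		    old->slot_type[i] != cur->slot_type[i])
			return false;
	}
	return true;
}

/* Suggested guard: a byte that exists only in old counts as a mismatch.
 * The || short-circuits, so cur->slot_type[i] is still never read
 * out of bounds.
 */
static bool exact_slot_check_v2(const struct toy_state *old,
				const struct toy_state *cur)
{
	assert(old->allocated_stack <= TOY_STACK_MAX);
	for (int i = 0; i < old->allocated_stack; i++) {
		if (i >= cur->allocated_stack ||
		    old->slot_type[i] != cur->slot_type[i])
			return false;
	}
	return true;
}
```

With old->allocated_stack == 16, cur->allocated_stack == 8 and all slot types equal on the first 8 bytes, the v1-style check passes this step while the suggested form rejects the state, which is the interval case described above.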
Yonghong Song Aug. 12, 2024, 8:02 p.m. UTC | #13
On 8/12/24 12:43 PM, Eduard Zingerman wrote:
> On Mon, 2024-08-12 at 12:29 -0700, Alexei Starovoitov wrote:
>
> [...]
>
>>> It does not seem correct to swap the order for these two checks:
>>>
>>>                  if (exact != NOT_EXACT && i < cur->allocated_stack &&
>>>                      old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>>>                      cur->stack[spi].slot_type[i % BPF_REG_SIZE])
>>>                          return false;
>>>
>>>                  if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
>>>                      && exact == NOT_EXACT) {
>>>                          i += BPF_REG_SIZE - 1;
>>>                          /* explored state didn't use this */
>>>                          continue;
>>>                  }
>>>
>>> if we do, 'slot_type' won't be checked for 'cur' when 'old' register is not marked live.
>> I see. This is to compare states in open coded iter loops when liveness
>> is not propagated yet, right?
> Yes
>
>> Then when comparing for exact states we should probably do:
>> if (exact != NOT_EXACT &&
>>      (i >= cur->allocated_stack ||
>>       old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
>>       cur->stack[spi].slot_type[i % BPF_REG_SIZE]))
>>     return false;
>>
>> ?
> Hm, right, otherwise the old slots in the interval
> [cur->allocated_stack..old->allocated_stack)
> won't be checked using exact rules.

Okay, so this is for the *exact* stack slot_type comparison. I will make
the change and send v2 soon.

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4cb5441ad75f..1e3d7794bf13 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16883,7 +16883,7 @@  static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 
 		spi = i / BPF_REG_SIZE;
 
-		if (exact != NOT_EXACT &&
+		if (exact != NOT_EXACT && i < cur->allocated_stack &&
 		    old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
 		    cur->stack[spi].slot_type[i % BPF_REG_SIZE])
 			return false;