
[bpf-next] selftests/bpf: Workaround iters/iter_arr_with_actual_elem_count failure when -mcpu=cpuv4

Message ID 20240708154634.283426-1-yonghong.song@linux.dev (mailing list archive)
State Changes Requested
Delegated to: BPF
Series [bpf-next] selftests/bpf: Workaround iters/iter_arr_with_actual_elem_count failure when -mcpu=cpuv4

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 12 maintainers not CCed: song@kernel.org sdf@google.com martin.lau@linux.dev mykolal@fb.com jose.marchesi@oracle.com jolsa@kernel.org linux-kselftest@vger.kernel.org eddyz87@gmail.com shuah@kernel.org haoluo@google.com kpsingh@kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 fail Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18

Commit Message

Yonghong Song July 8, 2024, 3:46 p.m. UTC
With the latest llvm19, the selftest iters/iter_arr_with_actual_elem_count
fails with -mcpu=v4.

The following are the details:
  0: R1=ctx() R10=fp0
  ; int iter_arr_with_actual_elem_count(const void *ctx) @ iters.c:1420
  0: (b4) w7 = 0                        ; R7_w=0
  ; int i, n = loop_data.n, sum = 0; @ iters.c:1422
  1: (18) r1 = 0xffffc90000191478       ; R1_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144)
  3: (61) r6 = *(u32 *)(r1 +128)        ; R1_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  ; if (n > ARRAY_SIZE(loop_data.data)) @ iters.c:1424
  4: (26) if w6 > 0x20 goto pc+27       ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
  5: (bf) r8 = r10                      ; R8_w=fp0 R10=fp0
  6: (07) r8 += -8                      ; R8_w=fp-8
  ; bpf_for(i, 0, n) { @ iters.c:1427
  7: (bf) r1 = r8                       ; R1_w=fp-8 R8_w=fp-8
  8: (b4) w2 = 0                        ; R2_w=0
  9: (bc) w3 = w6                       ; R3_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R6_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
  10: (85) call bpf_iter_num_new#45179          ; R0=scalar() fp-8=iter_num(ref_id=2,state=active,depth=0) refs=2
  11: (bf) r1 = r8                      ; R1=fp-8 R8=fp-8 refs=2
  12: (85) call bpf_iter_num_next#45181 13: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R6=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
  ; bpf_for(i, 0, n) { @ iters.c:1427
  13: (15) if r0 == 0x0 goto pc+2       ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) refs=2
  14: (81) r1 = *(s32 *)(r0 +0)         ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) refs=2
  15: (ae) if w1 < w6 goto pc+4 20: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1=scalar(smin=0xffffffff80000000,smax=smax32=umax32=31,umax=0xffffffff0000001f,smin32=0,var_off=(0x0; 0xffffffff0000001f)) R6=scalar(id=1,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
  ; sum += loop_data.data[i]; @ iters.c:1429
  20: (67) r1 <<= 2                     ; R1_w=scalar(smax=0x7ffffffc0000007c,umax=0xfffffffc0000007c,smin32=0,smax32=umax32=124,var_off=(0x0; 0xfffffffc0000007c)) refs=2
  21: (18) r2 = 0xffffc90000191478      ; R2_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144) refs=2
  23: (0f) r2 += r1
  math between map_value pointer and register with unbounded min value is not allowed

The source code:
  int iter_arr_with_actual_elem_count(const void *ctx)
  {
        int i, n = loop_data.n, sum = 0;

        if (n > ARRAY_SIZE(loop_data.data))
                return 0;

        bpf_for(i, 0, n) {
                /* no rechecking of i against ARRAY_SIZE(loop_data.n) */
                sum += loop_data.data[i];
        }

        return sum;
  }

Insn #14 is a sign-extension load corresponding to 'int i'.
Insn #15 does a subreg comparison. Note that smin=0xffffffff80000000, and this later
causes insn #23 to fail verification due to an unbounded min value.

Actually, the smin range after insn #15 could be better. After the comparison we know
smin32=0 and smax32=31, and since insn #14 is a sign-extension load, the top 32 bits
should be 0 as well. The current verifier is not able to derive this, so this patch
works around the test failure by changing the type of variable 'i' from 'int' to
'unsigned', which gives a proper range during the comparison.

  ; bpf_for(i, 0, n) { @ iters.c:1428
  13: (15) if r0 == 0x0 goto pc+2       ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) refs=2
  14: (61) r1 = *(u32 *)(r0 +0)         ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=2
  ...
  from 15 to 20: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1=scalar(smin=smin32=0,smax=umax=smax32=umax32=31,var_off=(0x0; 0x1f)) R6=scalar(id=1,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
  20: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1=scalar(smin=smin32=0,smax=umax=smax32=umax32=31,var_off=(0x0; 0x1f)) R6=scalar(id=1,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
  ; sum += loop_data.data[i]; @ iters.c:1430
  20: (67) r1 <<= 2                     ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=124,var_off=(0x0; 0x7c)) refs=2
  21: (18) r2 = 0xffffc90000185478      ; R2_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144) refs=2
  23: (0f) r2 += r1
  mark_precise: frame0: last_idx 23 first_idx 20 subseq_idx -1
  ...

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/testing/selftests/bpf/progs/iters.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
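
The change itself (patch body, also quoted in the replies below):

diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 16bdc3e25591..d1801d151a12 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -1419,7 +1419,8 @@ SEC("raw_tp")
 __success
 int iter_arr_with_actual_elem_count(const void *ctx)
 {
-	int i, n = loop_data.n, sum = 0;
+	unsigned i;
+	int n = loop_data.n, sum = 0;
 
 	if (n > ARRAY_SIZE(loop_data.data))
 		return 0;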

Comments

Alexei Starovoitov July 8, 2024, 4:27 p.m. UTC | #1
On Mon, Jul 8, 2024 at 8:46 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> [...]
>
> diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
> index 16bdc3e25591..d1801d151a12 100644
> --- a/tools/testing/selftests/bpf/progs/iters.c
> +++ b/tools/testing/selftests/bpf/progs/iters.c
> @@ -1419,7 +1419,8 @@ SEC("raw_tp")
>  __success
>  int iter_arr_with_actual_elem_count(const void *ctx)
>  {
> -       int i, n = loop_data.n, sum = 0;
> +       unsigned i;
> +       int n = loop_data.n, sum = 0;
>
>         if (n > ARRAY_SIZE(loop_data.data))
>                 return 0;

I think we only have one realistic test that
checks 'range vs range' verifier logic.
Since "int i; bpf_for(i"
is a very common pattern in all other bpf_for tests it feels
wrong to workaround like this.

What exactly needs to be improved in 'range vs range' logic to
handle this case?
Yonghong Song July 8, 2024, 6:34 p.m. UTC | #2
On 7/8/24 9:27 AM, Alexei Starovoitov wrote:
> On Mon, Jul 8, 2024 at 8:46 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> [...]
> I think we only have one realistic test that
> checks 'range vs range' verifier logic.
> Since "int i; bpf_for(i"
> is a very common pattern in all other bpf_for tests it feels
> wrong to workaround like this.

Agree. We should fix this in the verifier to be user friendly.

>
> What exactly needs to be improved in 'range vs range' logic to
> handle this case?

We can add a bit in struct bpf_reg_state like below
	struct bpf_reg_state {
		...
		enum bpf_reg_liveness live;
		bool precise;
	}
to
	struct bpf_reg_state {
		...
		enum bpf_reg_liveness live;
		unsigned precise:1;
		unsigned 32bit_sign_ext:1;  /* for *(s32 *)(...) */
	}
When insn 15 is processed with the 'true' branch
   14: (81) r1 = *(s32 *)(r0 +0)         ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) refs=2
   15: (ae) if w1 < w6 goto pc+4

the 32bit_sign_ext bit will indicate that register r1 comes from a 32-bit sign extension, so once the w1 range is refined, the upper 32 bits can be recalculated.

Can we avoid 32bit_sign_ext in the above? Let us say we have
   r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
   if w1 < w6 goto pc+4
where r1 achieves this range through means other than 32-bit sign extension, e.g.
   call bpf_get_prandom_u32;
   r1 = r0;
   r1 <<= 32;
   call bpf_get_prandom_u32;
   r1 |= r0;  /* r1 is 64bit random number */
   r2 = 0xffffffff80000000 ll;
   if r1 s< r2 goto end;
   if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
   if w1 < w6 goto end;
   ...  <=== w1 range [0,31]
        <=== but if the upper 32 bits were 0xffffffff, then the range would be
        <=== [0xffffffff00000000, 0xffffffff0000001f], which is not possible given the original r1 range.
        <=== so the only possible value for the upper 32 bits is 0.
end:

Therefore, it looks like we do not need 32bit_sign_ext. Just from
R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
with the refined range in the true path of 'if w1 < w6 goto ...',
we can further refine the w1 range properly.
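
To double-check this argument outside the verifier, here is a small standalone
C sketch (illustrative only, not part of the patch) encoding the same reasoning:

  /* Any 64-bit value v within [0xffffffff80000000, 0x7fffffff] is the
   * sign-extension of its low 32 bits. If the low 32 bits are additionally
   * known to be unsigned-below a bound that is <= 0x80000000 (here w6 <= 32),
   * the subreg is non-negative, so the upper 32 bits must be zero.
   */
  #include <assert.h>
  #include <stdint.h>

  static void check(int64_t v, uint32_t w6)
  {
          if (w6 > 0x80000000u)
                  return;                 /* the argument needs bound <= 0x80000000 */
          if (v < (int64_t)0xffffffff80000000ull || v > 0x7fffffffll)
                  return;                 /* outside r1's smin/smax range */
          uint32_t w1 = (uint32_t)v;      /* subregister view of r1 */
          if (w1 < w6)                    /* true branch of 'if w1 < w6' */
                  assert((uint64_t)v >> 32 == 0);
  }

  int main(void)
  {
          check(7, 32);                   /* upper 32 bits provably zero */
          check(-5, 32);                  /* w1 = 0xfffffffb, branch not taken */
          check(0x7fffffff, 32);          /* w1 not below w6, branch not taken */
          return 0;
  }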
Alexei Starovoitov July 8, 2024, 8:18 p.m. UTC | #3
On Mon, Jul 8, 2024 at 11:34 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 7/8/24 9:27 AM, Alexei Starovoitov wrote:
> > On Mon, Jul 8, 2024 at 8:46 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >> [...]
> > I think we only have one realistic test that
> > checks 'range vs range' verifier logic.
> > Since "int i; bpf_for(i"
> > is a very common pattern in all other bpf_for tests it feels
> > wrong to workaround like this.
>
> Agree. We should fix this in the verifier to be user friendly.
>
> >
> > What exactly needs to be improved in 'range vs range' logic to
> > handle this case?
>
> We can add a bit in struct bpf_reg_state like below
>         struct bpf_reg_state {
>                 ...
>                 enum bpf_reg_liveness live;
>                 bool precise;
>         }
> to
>         struct bpf_reg_state {
>                 ...
>                 enum bpf_reg_liveness live;
>                 unsigned precise:1;
>                 unsigned 32bit_sign_ext:1;  /* for *(s32 *)(...) */
>         }
> When insn 15 is processed with the 'true' branch
>    14: (81) r1 = *(s32 *)(r0 +0)         ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) refs=2
>    15: (ae) if w1 < w6 goto pc+4
>
> the 32bit_sign_ext bit will indicate that register r1 comes from a 32-bit sign extension, so once the w1 range is refined, the upper 32 bits can be recalculated.
>
> Can we avoid 32bit_sign_ext in the above? Let us say we have
>    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
>    if w1 < w6 goto pc+4
> where r1 achieves this range through means other than 32-bit sign extension, e.g.
>    call bpf_get_prandom_u32;
>    r1 = r0;
>    r1 <<= 32;
>    call bpf_get_prandom_u32;
>    r1 |= r0;  /* r1 is 64bit random number */
>    r2 = 0xffffffff80000000 ll;
>    if r1 s< r2 goto end;
>    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
>    if w1 < w6 goto end;
>    ...  <=== w1 range [0,31]
>         <=== but if the upper 32 bits were 0xffffffff, then the range would be
>         <=== [0xffffffff00000000, 0xffffffff0000001f], which is not possible given the original r1 range.

Just rephrasing for myself...
Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
then the lower 32 bits have to be negative.
And because we're doing an unsigned compare w1 < w6,
and w6 is less than 0x80000000,
we can conclude that the upper bits are zero.
Right?

>         <=== so the only possible value for the upper 32 bits is 0.
> end:
>
> Therefore, it looks like we do not need 32bit_sign_ext. Just from
> R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
> with the refined range in the true path of 'if w1 < w6 goto ...',
> we can further refine the w1 range properly.

yep. looks like it.
We can hard code this special logic for this specific smin/smax pair,
but the gut feel is that we can generalize it further.
Yonghong Song July 8, 2024, 9:19 p.m. UTC | #4
On 7/8/24 1:18 PM, Alexei Starovoitov wrote:
> On Mon, Jul 8, 2024 at 11:34 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 7/8/24 9:27 AM, Alexei Starovoitov wrote:
>>> On Mon, Jul 8, 2024 at 8:46 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> [...]
>>> I think we only have one realistic test that
>>> checks 'range vs range' verifier logic.
>>> Since "int i; bpf_for(i"
>>> is a very common pattern in all other bpf_for tests it feels
>>> wrong to workaround like this.
>> Agree. We should fix this in the verifier to be user friendly.
>>
>>> What exactly needs to be improved in 'range vs range' logic to
>>> handle this case?
>> [...]
> Just rephrasing for myself...
> Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> then the lower 32 bits have to be negative.
> And because we're doing an unsigned compare w1 < w6,
> and w6 is less than 0x80000000,
> we can conclude that the upper bits are zero.
> Right?

Right.

>
>>          <=== so the only possible value for the upper 32 bits is 0.
>> end:
>>
>> Therefore, it looks like we do not need 32bit_sign_ext. Just from
>> R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
>> with the refined range in the true path of 'if w1 < w6 goto ...',
>> we can further refine the w1 range properly.
> yep. looks like it.
> We can hard code this special logic for this specific smin/smax pair,
> but the gut feel is that we can generalize it further.

Great. Let me try to implement such logic in the verifier.
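
For reference, a rough sketch of what such a refinement could look like
(illustrative only; it reuses struct bpf_reg_state field names and tnum helpers
from the kernel, but it is not the actual patch):

  /* If the 64-bit signed bounds say the register holds a sign-extended
   * 32-bit value (exactly the [S32_MIN, S32_MAX] pair discussed above) and
   * the refined subregister range is known non-negative, the upper 32 bits
   * must be zero, so the 64-bit bounds can collapse onto the 32-bit ones.
   */
  static void refine_sext32_bounds(struct bpf_reg_state *reg)
  {
          if (reg->smin_value == S32_MIN && reg->smax_value == S32_MAX &&
              reg->s32_min_value >= 0) {
                  reg->smin_value = reg->s32_min_value;
                  reg->smax_value = reg->s32_max_value;
                  reg->umin_value = reg->u32_min_value;
                  reg->umax_value = reg->u32_max_value;
                  reg->var_off = tnum_intersect(reg->var_off,
                                                tnum_range(reg->umin_value,
                                                           reg->umax_value));
          }
  }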
Eduard Zingerman July 8, 2024, 9:31 p.m. UTC | #5
On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:

[...]

> > the 32bit_sign_ext bit will indicate that register r1 comes from a 32-bit sign extension, so once the w1 range is refined, the upper 32 bits can be recalculated.
> >
> > Can we avoid 32bit_sign_ext in the above? Let us say we have
> >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> >    if w1 < w6 goto pc+4
> > where r1 achieves this range through means other than 32-bit sign extension, e.g.
> >    call bpf_get_prandom_u32;
> >    r1 = r0;
> >    r1 <<= 32;
> >    call bpf_get_prandom_u32;
> >    r1 |= r0;  /* r1 is 64bit random number */
> >    r2 = 0xffffffff80000000 ll;
> >    if r1 s< r2 goto end;
> >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> >    if w1 < w6 goto end;
> >    ...  <=== w1 range [0,31]
> >         <=== but if the upper 32 bits were 0xffffffff, then the range would be
> >         <=== [0xffffffff00000000, 0xffffffff0000001f], which is not possible given the original r1 range.
> 
> Just rephrasing for myself...
> Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> then the lower 32 bits have to be negative.
> And because we're doing an unsigned compare w1 < w6,
> and w6 is less than 0x80000000,
> we can conclude that the upper bits are zero.
> Right?

Sorry, could you please explain this a bit more.
The w1 < w6 comparison only infers information about sub-registers.
So the range for the full register r1 would still have 0xffffFFFF
for upper bits => r1 += r2 would fail.
What do I miss?

The non-cpuv4 version of the program does non-sign-extended load:

14: (61) r1 = *(u32 *)(r0 +0)   ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4)
                                  R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
15: (ae) if w1 < w6 goto pc+4   ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
                                  R6=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))

Tbh, it looks like LLVM deleted some info that could not be recovered
in this instance.

> [...]
Andrii Nakryiko July 8, 2024, 10:11 p.m. UTC | #6
On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > [...]
> >
> > Just rephrasing for myself...
> > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > then the lower 32 bits have to be negative.
> > And because we're doing an unsigned compare w1 < w6,
> > and w6 is less than 0x80000000,
> > we can conclude that the upper bits are zero.
> > Right?
>
> Sorry, could you please explain this a bit more.

Yep, also curious.

But meanwhile, I'm intending to update bpf_for() to something like
below to avoid this code generation pattern:

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 305c62817dd3..86dc854a713b 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -394,7 +394,18 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
                /* iteration step */                                           \
                int *___t = bpf_iter_num_next(&___it);                         \
                /* termination and bounds check */                             \
-               (___t && ((i) = *___t, (i) >= (start) && (i) < (end)));        \
+               (___t && ({                                                    \
+                       __label__ l_false;                                     \
+                       _Bool ok = 0;                                          \
+                       (i) = *___t;                                           \
+                       asm volatile goto ("                                   \
+                               if %[_i] s< %[_start] goto %l[l_false];        \
+                               if %[_i] s>= %[_end] goto %l[l_false];         \
+                       " :: [_i]"r"(i), [_start]"ri"(start), [_end]"ri"(end) :: l_false); \
+                       ok = 1;                                                \
+               l_false:                                                       \
+                       ok;                                                    \
+               }));                                                           \
        });                                                                    \
 )
 #endif /* bpf_for */

This produces this code for cpuv4:

    1294:       85 10 00 00 ff ff ff ff call -0x1
    1295:       15 00 10 00 00 00 00 00 if r0 == 0x0 goto +0x10 <LBB34_4>
    1296:       61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0x0)
    1297:       c5 01 0e 00 00 00 00 00 if r1 s< 0x0 goto +0xe <LBB34_4>
    1298:       7d 71 0d 00 00 00 00 00 if r1 s>= r7 goto +0xd <LBB34_4>
    1299:       bf 11 20 00 00 00 00 00 r1 = (s32)r1

> [...]
Andrii Nakryiko July 8, 2024, 10:12 p.m. UTC | #7
On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> Yep, also curious.
>
> But meanwhile, I'm intending to update bpf_for() to something like
> below to avoid this code generation pattern:
>

Well, thank you, Gmail, for the messed-up formatting. See [0] for the
properly formatted diff.

  [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb

>
> [...]
Alexei Starovoitov July 9, 2024, 2:03 a.m. UTC | #8
On Mon, Jul 8, 2024 at 3:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > >
> > > [...]
> > >
> > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > >
> > > > > Can we avoid 32bit_sign_exit in the above? Let us say we have
> > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > >    if w1 < w6 goto pc+4
> > > > > where r1 achieves is trange through other means than 32bit sign extension e.g.
> > > > >    call bpf_get_prandom_u32;
> > > > >    r1 = r0;
> > > > >    r1 <<= 32;
> > > > >    call bpf_get_prandom_u32;
> > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > >    r2 = 0xffffffff80000000 ll;
> > > > >    if r1 s< r2 goto end;
> > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > >    if w1 < w6 goto end;
> > > > >    ...  <=== w1 range [0,31]
> > > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > > >
> > > > Just rephrasing for myself...
> > > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > > then the lower 32 bits have to be negative.
> > > > And because we're doing the unsigned compare w1 < w6,
> > > > and w6 is less than 0x80000000,
> > > > we can conclude that the upper bits are zero.
> > > > right?
> > >
> > > Sorry, could you please explain this a bit more.
> >
> > Yep, also curious.
> >
> > But meanwhile, I'm intending to update bpf_for() to something like
> > below to avoid this code generation pattern:
> >
>
> Well, thank you, Gmail, for the messed-up formatting. See [0] for a properly
> formatted diff.
>
>   [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb

Not that simple. It needs an extra sizeof(start)==8 hack like bpf_cmp().
And the same goes for 'end'. So it will get just as ugly.
Let's make the verifier smarter instead.
Alexei Starovoitov July 9, 2024, 2:09 a.m. UTC | #9
On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > >
> > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > >    if w1 < w6 goto pc+4
> > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > >    call bpf_get_prandom_u32;
> > >    r1 = r0;
> > >    r1 <<= 32;
> > >    call bpf_get_prandom_u32;
> > >    r1 |= r0;  /* r1 is 64bit random number */
> > >    r2 = 0xffffffff80000000 ll;
> > >    if r1 s< r2 goto end;
> > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > >    if w1 < w6 goto end;
> > >    ...  <=== w1 range [0,31]
> > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> >
> > Just rephrasing for myself...
> > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > then the lower 32 bits have to be negative.
> > > And because we're doing the unsigned compare w1 < w6,
> > > and w6 is less than 0x80000000,
> > > we can conclude that the upper bits are zero.
> > right?
>
> Sorry, could you please explain this a bit more.
> The w1 < w6 comparison only infers information about sub-registers.
> So the range for the full register r1 would still have 0xffffFFFF
> for upper bits => r1 += r2 would fail.
> What am I missing?

Not sure how to rephrase the above differently...
Because smin=0xffffffff80000000...
the full reg cannot be 0xffffFFFF0...123,
so when the lower 32 bits are compared unsigned and the range of the rhs
is less than 0x80000000, it means the upper 32 bits of the full reg are zero.
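
To make that case analysis concrete, below is a minimal user-space model
of the argument (a sketch that assumes only the [smin, smax] and w6
ranges discussed above; it is not code from this thread):

  #include <assert.h>
  #include <stdint.h>

  /* With smin = 0xffffffff80000000 (INT32_MIN) and smax = 0x7fffffff,
   * any in-range 64-bit value is the sign-extension of its low 32 bits.
   * If the value is negative, its low 32 bits read unsigned are
   * >= 0x80000000, so the true path of 'if w1 < w6' (with w6 <= 32)
   * rules the negative case out.
   */
  static void check(int64_t v, uint32_t w6)
  {
          if (v < INT32_MIN || v > INT32_MAX || w6 > 32)
                  return;                         /* outside the tracked ranges */
          if ((uint32_t)v < w6)                   /* true path of 'if w1 < w6' */
                  assert((uint64_t)v >> 32 == 0); /* upper 32 bits must be zero */
  }

  int main(void)
  {
          check(-1, 32);          /* w1 = 0xffffffff, branch not taken */
          check(5, 32);           /* w1 = 5 < 32, upper 32 bits are zero */
          check(INT32_MIN, 32);   /* w1 = 0x80000000, branch not taken */
          return 0;
  }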
Andrii Nakryiko July 9, 2024, 4:45 p.m. UTC | #10
On Mon, Jul 8, 2024 at 7:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 3:12 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > >
> > > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > > >
> > > > [...]
> > > >
> > > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > > >
> > > > > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > > >    if w1 < w6 goto pc+4
> > > > > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > > > > >    call bpf_get_prandom_u32;
> > > > > >    r1 = r0;
> > > > > >    r1 <<= 32;
> > > > > >    call bpf_get_prandom_u32;
> > > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > > >    r2 = 0xffffffff80000000 ll;
> > > > > >    if r1 s< r2 goto end;
> > > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > > >    if w1 < w6 goto end;
> > > > > >    ...  <=== w1 range [0,31]
> > > > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > > > >
> > > > > Just rephrasing for myself...
> > > > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > > > then the lower 32 bits have to be negative.
> > > > > And because we're doing the unsigned compare w1 < w6,
> > > > > and w6 is less than 0x80000000,
> > > > > we can conclude that the upper bits are zero.
> > > > > right?
> > > >
> > > > Sorry, could you please explain this a bit more.
> > >
> > > Yep, also curious.
> > >
> > > But meanwhile, I'm intending to update bpf_for() to something like
> > > below to avoid this code generation pattern:
> > >
> >
> > Well, thank you, Gmail, for the messed-up formatting. See [0] for a properly
> > formatted diff.
> >
> >   [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb
>
> Not that simple. It needs an extra sizeof(start)==8 hack like bpf_cmp().

I'm forgetting the details, but I feel like sizeof() == 4 was
important for bpf_cmp() to compare wX registers instead of always
comparing Rx. But in this case I think we are fine with always working
with full 64-bit Rx registers. Or is there some correctness issue
involved?

> And the same goes for 'end'. So it will get just as ugly.
> Let's make the verifier smarter instead.

Oh, absolutely, let's. But that doesn't solve the problem of someone
using bpf_for() with the latest Clang on an older kernel that doesn't
yet have this smartness, does it? Which is why I want to mitigate that
on the bpf_for() side in addition to improvements on the verifier
side.
Andrii Nakryiko July 9, 2024, 4:49 p.m. UTC | #11
On Mon, Jul 8, 2024 at 7:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> >
> > [...]
> >
> > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > >
> > > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > >    if w1 < w6 goto pc+4
> > > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > > >    call bpf_get_prandom_u32;
> > > >    r1 = r0;
> > > >    r1 <<= 32;
> > > >    call bpf_get_prandom_u32;
> > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > >    r2 = 0xffffffff80000000 ll;
> > > >    if r1 s< r2 goto end;
> > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > >    if w1 < w6 goto end;
> > > >    ...  <=== w1 range [0,31]
> > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > >
> > > Just rephrasing for myself...
> > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > then the lower 32 bits have to be negative.
> > > And because we're doing the unsigned compare w1 < w6,
> > > and w6 is less than 0x80000000,
> > > we can conclude that the upper bits are zero.
> > > right?
> >
> > Sorry, could you please explain this a bit more.
> > The w1 < w6 comparison only infers information about sub-registers.
> > So the range for the full register r1 would still have 0xffffFFFF
> > for upper bits => r1 += r2 would fail.
> > What am I missing?
>
> Not sure how to rephrase the above differently...
> Because smin=0xffffffff80000000...
> the full reg cannot be 0xffffFFFF0...123,
> so when the lower 32 bits are compared unsigned and the range of the rhs
> is less than 0x80000000, it means the upper 32 bits of the full reg are zero.

yep, I think that makes sense. This has to be a special case when the
upper 32 bits are either all zeroes or all ones, right?
Alexei Starovoitov July 9, 2024, 5:22 p.m. UTC | #12
On Tue, Jul 9, 2024 at 9:45 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 7:04 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2024 at 3:12 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > >
> > > > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > > > >
> > > > > > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > > > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > > > >    if w1 < w6 goto pc+4
> > > > > > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > > > > > >    call bpf_get_prandom_u32;
> > > > > > >    r1 = r0;
> > > > > > >    r1 <<= 32;
> > > > > > >    call bpf_get_prandom_u32;
> > > > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > > > >    r2 = 0xffffffff80000000 ll;
> > > > > > >    if r1 s< r2 goto end;
> > > > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > > > >    if w1 < w6 goto end;
> > > > > > >    ...  <=== w1 range [0,31]
> > > > > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > > > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > > > > >
> > > > > > Just rephrasing for myself...
> > > > > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > > > > then the lower 32 bits have to be negative.
> > > > > > And because we're doing the unsigned compare w1 < w6,
> > > > > > and w6 is less than 0x80000000,
> > > > > > we can conclude that the upper bits are zero.
> > > > > > right?
> > > > >
> > > > > Sorry, could you please explain this a bit more.
> > > >
> > > > Yep, also curious.
> > > >
> > > > But meanwhile, I'm intending to update bpf_for() to something like
> > > > below to avoid this code generation pattern:
> > > >
> > >
> > > Well, thank you, Gmail, for the messed-up formatting. See [0] for a properly
> > > formatted diff.
> > >
> > >   [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb
> >
> > Not that simple. It needs an extra sizeof(start)==8 hack like bpf_cmp().
>
> I'm forgetting the details, but I feel like sizeof() == 4 was
> important for bpf_cmp() to compare wX registers instead of always
> comparing Rx. But in this case I think we are fine with always working
> with full 64-bit Rx registers. Or is there some correctness issue
> involved?

it's a correctness issue.
sizeof()==8 has to go via "r", otherwise it's a silent truncation
by llvm.
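
To illustrate the hazard (a hypothetical sketch, not the actual bpf_cmp()
code; the macro name is invented): widening to 64 bits and pinning the
value in a full "r" register is what keeps LLVM from passing only the
low 32 bits of an 8-byte 'start'/'end'.

  /* Hypothetical sketch: force a value through a full 64-bit "r"
   * register so an 8-byte operand cannot be silently truncated to a
   * 32-bit subregister by LLVM.
   */
  #define __launder64(v)                                  \
          ({                                              \
                  unsigned long long __x = (v);           \
                  asm volatile("" : "+r"(__x));           \
                  (typeof(v))__x;                         \
          })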

> > And the same goes for 'end'. So it will get just as ugly.
> > Let's make the verifier smarter instead.
>
> Oh, absolutely, let's. But that doesn't solve the problem of someone
> using bpf_for() with the latest Clang on an older kernel that doesn't
> yet have this smartness, does it? Which is why I want to mitigate that
> on the bpf_for() side in addition to improvements on the verifier
> side.

There is no urgency here.
Here it's a combination of the very latest llvm trunk and -mcpu=v4.
-mcpu=v4 is rare. Users can continue with -mcpu=v3 or released llvm.
Alexei Starovoitov July 9, 2024, 5:23 p.m. UTC | #13
On Tue, Jul 9, 2024 at 9:49 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 7:09 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > >
> > > [...]
> > >
> > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > >
> > > > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > >    if w1 < w6 goto pc+4
> > > > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > > > >    call bpf_get_prandom_u32;
> > > > >    r1 = r0;
> > > > >    r1 <<= 32;
> > > > >    call bpf_get_prandom_u32;
> > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > >    r2 = 0xffffffff80000000 ll;
> > > > >    if r1 s< r2 goto end;
> > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > >    if w1 < w6 goto end;
> > > > >    ...  <=== w1 range [0,31]
> > > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > > >
> > > > Just rephrasing for myself...
> > > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > > then the lower 32 bits have to be negative.
> > > > And because we're doing the unsigned compare w1 < w6,
> > > > and w6 is less than 0x80000000,
> > > > we can conclude that the upper bits are zero.
> > > > right?
> > >
> > > Sorry, could you please explain this a bit more.
> > > The w1 < w6 comparison only infers information about sub-registers.
> > > So the range for the full register r1 would still have 0xffffFFFF
> > > for upper bits => r1 += r2 would fail.
> > > What am I missing?
> >
> > Not sure how to rephrase the above differently...
> > Because smin=0xffffffff80000000...
> > the full reg cannot be 0xffffFFFF0...123,
> > so when the lower 32 bits are compared unsigned and the range of the rhs
> > is less than 0x80000000, it means the upper 32 bits of the full reg are zero.
>
> yep, I think that makes sense. This has to be a special case when the
> upper 32 bits are either all zeroes or all ones, right?

yes. exactly.
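
A rough standalone model of that special case (struct and function names
are invented here; this is not actual verifier code):

  #include <stdbool.h>
  #include <stdint.h>

  struct range { int64_t smin, smax; uint32_t umax32; };

  /* If [smin, smax] fits in [INT32_MIN, INT32_MAX], the 64-bit value is
   * the sign-extension of its low 32 bits, i.e. the upper half is all
   * zeroes or all ones. A refined umax32 < 0x80000000 then rules out
   * the all-ones case, so the upper 32 bits are known to be zero.
   */
  static bool upper32_known_zero(const struct range *r)
  {
          bool sext32 = r->smin >= INT32_MIN && r->smax <= INT32_MAX;

          return sext32 && r->umax32 < 0x80000000u;
  }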
Andrii Nakryiko July 9, 2024, 6:12 p.m. UTC | #14
On Tue, Jul 9, 2024 at 10:22 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jul 9, 2024 at 9:45 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2024 at 7:04 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Jul 8, 2024 at 3:12 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > > > > >
> > > > > > > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > > > > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > > > > >    if w1 < w6 goto pc+4
> > > > > > > > where r1 achieves its range through means other than 32-bit sign extension, e.g.
> > > > > > > >    call bpf_get_prandom_u32;
> > > > > > > >    r1 = r0;
> > > > > > > >    r1 <<= 32;
> > > > > > > >    call bpf_get_prandom_u32;
> > > > > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > > > > >    r2 = 0xffffffff80000000 ll;
> > > > > > > >    if r1 s< r2 goto end;
> > > > > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > > > > >    if w1 < w6 goto end;
> > > > > > > >    ...  <=== w1 range [0,31]
> > > > > > > >         <=== but if we had the upper bits as 0xffffffff........, then the range would be
> > > > > > > >         <=== [0xffffffff00000000, 0xffffffff0000001f], and this range is not possible given the original r1 range.
> > > > > > >
> > > > > > > Just rephrasing for myself...
> > > > > > > Because smin=0xffffffff80000000, if the upper 32 bits == 0xffffFFFF
> > > > > > > then the lower 32 bits have to be negative.
> > > > > > > And because we're doing the unsigned compare w1 < w6,
> > > > > > > and w6 is less than 0x80000000,
> > > > > > > we can conclude that the upper bits are zero.
> > > > > > > right?
> > > > > >
> > > > > > Sorry, could you please explain this a bit more.
> > > > >
> > > > > Yep, also curious.
> > > > >
> > > > > But meanwhile, I'm intending to update bpf_for() to something like
> > > > > below to avoid this code generation pattern:
> > > > >
> > > >
> > > > Well, thank you, Gmail, for the messed-up formatting. See [0] for a properly
> > > > formatted diff.
> > > >
> > > >   [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb
> > >
> > > Not that simple. It needs an extra sizeof(start)==8 hack like bpf_cmp().
> >
> > I'm forgetting the details, but I feel like sizeof() == 4 was
> > important for bpf_cmp() to compare wX registers instead of always
> > comparing Rx. But in this case I think we are fine with always working
> > with full 64-bit Rx registers. Or is there some correctness issue
> > involved?
>
> it's a correctness issue.
> sizeof()==8 has to go via "r", otherwise it's a silent truncation
> by llvm.
>
> > > And the same goes for 'end'. So it will get just as ugly.
> > > Let's make the verifier smarter instead.
> >
> > Oh, absolutely, let's. But that doesn't solve the problem of someone
> > using bpf_for() with the latest Clang on an older kernel that doesn't
> > yet have this smartness, does it? Which is why I want to mitigate that
> > on the bpf_for() side in addition to improvements on the verifier
> > side.
>
> There is no urgency here.
> Here it's a combination of the very latest llvm trunk and -mcpu=v4.
> -mcpu=v4 is rare. Users can continue with -mcpu=v3 or released llvm.

Ok, fair enough, I can hold off on this for now (though eventually
people will switch and will want to run on old kernels, so the problem
doesn't really go away).


But really it's quite annoying that we don't have a proper way to just
say "give me whatever register makes sense so I can embed it in the
assembly and do something simple with it".

Yonghong, Eduard, is this something that can be figured out to let us
do straightforward pieces of embedded assembly like bpf_cmp() or this
bpf_for() hack?
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 16bdc3e25591..d1801d151a12 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -1419,7 +1419,8 @@  SEC("raw_tp")
 __success
 int iter_arr_with_actual_elem_count(const void *ctx)
 {
-	int i, n = loop_data.n, sum = 0;
+	unsigned i;
+	int n = loop_data.n, sum = 0;
 
 	if (n > ARRAY_SIZE(loop_data.data))
 		return 0;