[bpf-next,0/7] bpf: Improve verifier for cond_op and spilled loop index variables

Message ID	20230330055600.86870-1-yhs@fb.com (mailing list archive)
Headers	show Return-Path: <bpf-owner@vger.kernel.org> From: Yonghong Song <yhs@fb.com> To: <bpf@vger.kernel.org> CC: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com>, Martin KaFai Lau <martin.lau@kernel.org> Subject: [PATCH bpf-next 0/7] bpf: Improve verifier for cond_op and spilled loop index variables Date: Wed, 29 Mar 2023 22:56:00 -0700 Message-ID: <20230330055600.86870-1-yhs@fb.com> Content-Type: text/plain Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Precedence: bulk
Series	bpf: Improve verifier for cond_op and spilled loop index variables \| expand [bpf-next,0/7] bpf: Improve verifier for cond_op and spilled loop index variables [bpf-next,1/7] bpf: Improve verifier JEQ/JNE insn branch taken checking [bpf-next,2/7] selftests/bpf: Add tests for non-constant cond_op NE/EQ bound deduction [bpf-next,3/7] bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in verifier [bpf-next,4/7] selftests/bpf: Add verifier tests for code pattern '<const> <cond_op> <non_const>' [bpf-next,5/7] bpf: Mark potential spilled loop index variable as precise [bpf-next,6/7] selftests/bpf: Remove previous workaround for test verif_scale_loop6 [bpf-next,7/7] selftests/bpf: Add a new test based on loop6.c

Message ID

20230330055600.86870-1-yhs@fb.com (mailing list archive)

Headers

From: Yonghong Song <yhs@fb.com>
To: <bpf@vger.kernel.org>
CC: Alexei Starovoitov <ast@kernel.org>,
        Andrii Nakryiko <andrii@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com>,
        Martin KaFai Lau <martin.lau@kernel.org>
Subject: [PATCH bpf-next 0/7] bpf: Improve verifier for cond_op and spilled
 loop index variables
Date: Wed, 29 Mar 2023 22:56:00 -0700
Message-ID: <20230330055600.86870-1-yhs@fb.com>
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Precedence: bulk

Series

bpf: Improve verifier for cond_op and spilled loop index variables | expand

Message

Yonghong Song March 30, 2023, 5:56 a.m. UTC

LLVM commit [1] introduced hoistMinMax optimization like
  (i < VIRTIO_MAX_SGS) && (i < out_sgs)
to
  upper = MIN(VIRTIO_MAX_SGS, out_sgs)
  ... i < upper ...
and caused the verification failure. Commit [2] workarounded the issue by
adding some bpf assembly code to prohibit the above optimization.
This patch improved verifier such that verification can succeed without
the above workaround.

Without [2], the current verifier will hit the following failures:
  ...
  119: (15) if r1 == 0x0 goto pc+1      
  The sequence of 8193 jumps is too complex.
  verification time 525829 usec
  stack depth 64
  processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
  -- END PROG LOAD LOG --
  libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
  libbpf: failed to load object 'loop6.bpf.o'
  ...
The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will
go through both pathes and generate the following verificaiton states:
  ...
  89: (07) r2 += 1                      ; R2_w=5
  90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
  91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
  92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))
      R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
      R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm    
  ...
  89: (07) r2 += 1                      ; R2_w=6
  90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
  91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
  92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
      R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
      R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
    ...
  89: (07) r2 += 1                      ; R2_w=4088
  90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
  91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
  92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
      R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
      R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm

Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
During developing selftests for Patch 3, I found some issues with bound deduction with
BPF_EQ/BPF_NE and fixed the issue in Patch 1.

After the above issue is fixed, the second issue shows up.
  ...
  67: (07) r1 += -16                    ; R1_w=fp-16
  ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
  68: (b4) w2 = 8                       ; R2_w=8
  69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
  ; return sgp;
  70: (79) r6 = *(u64 *)(r10 -16)       ; R6=scalar() R10=fp0
  ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
  71: (15) if r6 == 0x0 goto pc-49      ; R6=scalar()
  72: (b4) w1 = 0                       ; R1_w=0
  73: (05) goto pc-46
  ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
  28: (bc) w7 = w1                      ; R1_w=0 R7_w=0
  ; bpf_probe_read_kernel(&len, sizeof(len), &sgp->length);
  ...
  23: (79) r3 = *(u64 *)(r10 -40)       ; R3_w=2 R10=fp0
  ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
  24: (07) r3 += 1                      ; R3_w=3
  ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
  25: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
  26: (ad) if r3 < r1 goto pc+34 61: R0=scalar() R1_w=scalar(umin=4,umax=5,var_off=(0x4; 0x1)) R3_w=3 R6=scalar(id=1658)
     R7=0 R8=scalar(id=1653) R9=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm
     fp-24=mmmm???? fp-32= fp-40=2 fp-56= fp-64=mmmmmmmm
  ; if (sg_is_chain(&sg))               
  61: (7b) *(u64 *)(r10 -40) = r3       ; R3_w=3 R10=fp0 fp-40_w=3
    ...
  67: (07) r1 += -16                    ; R1_w=fp-16
  ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
  68: (b4) w2 = 8                       ; R2_w=8
  69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
  ; return sgp;
  70: (79) r6 = *(u64 *)(r10 -16)       
  ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
  infinite loop detected at insn 71     
  verification time 90800 usec
  stack depth 64
  processed 25017 insns (limit 1000000) max_states_per_insn 20 total_states 491 peak_states 169 mark_read 12
  -- END PROG LOAD LOG --
  libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -22

Further analysis found the index variable 'i' is spilled but since it is not marked as precise, regsafe will ignore
comparison since they are scalar values.

Since it is hard for verifier to determine whether a particular scalar is index variable or not, Patch 5 implemented
a heuristic such that if both old and new reg states are constant, mark the old one as precise to force scalar value
comparison and this fixed the problem.

The rest of patches are selftests related.

  [1] https://reviews.llvm.org/D143726
  [2] Commit 3c2611bac08a ("selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.")

Yonghong Song (7):
  bpf: Improve verifier JEQ/JNE insn branch taken checking
  selftests/bpf: Add tests for non-constant cond_op NE/EQ bound
    deduction
  bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in
    verifier
  selftests/bpf: Add verifier tests for code pattern '<const> <cond_op>
    <non_const>'
  bpf: Mark potential spilled loop index variable as precise
  selftests/bpf: Remove previous workaround for test verif_scale_loop6
  selftests/bpf: Add a new test based on loop6.c

 kernel/bpf/verifier.c                         |  40 +-
 .../bpf/prog_tests/bpf_verif_scale.c          |   5 +
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 tools/testing/selftests/bpf/progs/loop6.c     |   2 -
 tools/testing/selftests/bpf/progs/loop7.c     | 102 ++++
 .../verifier_bounds_deduction_non_const.c     | 553 ++++++++++++++++++
 .../progs/verifier_bounds_mix_sign_unsign.c   |   2 +-
 7 files changed, 701 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/loop7.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bounds_deduction_non_const.c

Comments

Andrii Nakryiko April 4, 2023, 9:46 p.m. UTC | #1

On Wed, Mar 29, 2023 at 10:56 PM Yonghong Song <yhs@fb.com> wrote:
>
> LLVM commit [1] introduced hoistMinMax optimization like
>   (i < VIRTIO_MAX_SGS) && (i < out_sgs)
> to
>   upper = MIN(VIRTIO_MAX_SGS, out_sgs)
>   ... i < upper ...
> and caused the verification failure. Commit [2] workarounded the issue by
> adding some bpf assembly code to prohibit the above optimization.
> This patch improved verifier such that verification can succeed without
> the above workaround.
>
> Without [2], the current verifier will hit the following failures:
>   ...
>   119: (15) if r1 == 0x0 goto pc+1
>   The sequence of 8193 jumps is too complex.
>   verification time 525829 usec
>   stack depth 64
>   processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
>   -- END PROG LOAD LOG --
>   libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
>   libbpf: failed to load object 'loop6.bpf.o'
>   ...
> The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will
> go through both pathes and generate the following verificaiton states:
>   ...
>   89: (07) r2 += 1                      ; R2_w=5
>   90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>   91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>   92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))

offtopic, but if this is a real output, then something is wrong with
scratching register logic for conditional, it should have emitted
states of R1 and R2, maybe you can take a look while working on this
patch set?

>       R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
>       R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>   ...
>   89: (07) r2 += 1                      ; R2_w=6
>   90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>   91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>   92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
>       R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
>       R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>     ...
>   89: (07) r2 += 1                      ; R2_w=4088
>   90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>   91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>   92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
>       R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
>       R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>
> Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
> During developing selftests for Patch 3, I found some issues with bound deduction with
> BPF_EQ/BPF_NE and fixed the issue in Patch 1.
>
> After the above issue is fixed, the second issue shows up.
>   ...
>   67: (07) r1 += -16                    ; R1_w=fp-16
>   ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
>   68: (b4) w2 = 8                       ; R2_w=8
>   69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
>   ; return sgp;
>   70: (79) r6 = *(u64 *)(r10 -16)       ; R6=scalar() R10=fp0
>   ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
>   71: (15) if r6 == 0x0 goto pc-49      ; R6=scalar()
>   72: (b4) w1 = 0                       ; R1_w=0
>   73: (05) goto pc-46
>   ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
>   28: (bc) w7 = w1                      ; R1_w=0 R7_w=0
>   ; bpf_probe_read_kernel(&len, sizeof(len), &sgp->length);
>   ...
>   23: (79) r3 = *(u64 *)(r10 -40)       ; R3_w=2 R10=fp0
>   ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
>   24: (07) r3 += 1                      ; R3_w=3
>   ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < out_sgs); i++) {
>   25: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>   26: (ad) if r3 < r1 goto pc+34 61: R0=scalar() R1_w=scalar(umin=4,umax=5,var_off=(0x4; 0x1)) R3_w=3 R6=scalar(id=1658)
>      R7=0 R8=scalar(id=1653) R9=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm
>      fp-24=mmmm???? fp-32= fp-40=2 fp-56= fp-64=mmmmmmmm
>   ; if (sg_is_chain(&sg))
>   61: (7b) *(u64 *)(r10 -40) = r3       ; R3_w=3 R10=fp0 fp-40_w=3
>     ...
>   67: (07) r1 += -16                    ; R1_w=fp-16
>   ; bpf_probe_read_kernel(&sgp, sizeof(sgp), sgs + i);
>   68: (b4) w2 = 8                       ; R2_w=8
>   69: (85) call bpf_probe_read_kernel#113       ; R0_w=scalar() fp-16=mmmmmmmm
>   ; return sgp;
>   70: (79) r6 = *(u64 *)(r10 -16)
>   ; for (n = 0, sgp = get_sgp(sgs, i); sgp && (n < SG_MAX);
>   infinite loop detected at insn 71
>   verification time 90800 usec
>   stack depth 64
>   processed 25017 insns (limit 1000000) max_states_per_insn 20 total_states 491 peak_states 169 mark_read 12
>   -- END PROG LOAD LOG --
>   libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -22
>
> Further analysis found the index variable 'i' is spilled but since it is not marked as precise, regsafe will ignore
> comparison since they are scalar values.
>
> Since it is hard for verifier to determine whether a particular scalar is index variable or not, Patch 5 implemented
> a heuristic such that if both old and new reg states are constant, mark the old one as precise to force scalar value
> comparison and this fixed the problem.
>
> The rest of patches are selftests related.
>
>   [1] https://reviews.llvm.org/D143726
>   [2] Commit 3c2611bac08a ("selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.")
>
> Yonghong Song (7):
>   bpf: Improve verifier JEQ/JNE insn branch taken checking
>   selftests/bpf: Add tests for non-constant cond_op NE/EQ bound
>     deduction
>   bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in
>     verifier
>   selftests/bpf: Add verifier tests for code pattern '<const> <cond_op>
>     <non_const>'
>   bpf: Mark potential spilled loop index variable as precise
>   selftests/bpf: Remove previous workaround for test verif_scale_loop6
>   selftests/bpf: Add a new test based on loop6.c
>
>  kernel/bpf/verifier.c                         |  40 +-
>  .../bpf/prog_tests/bpf_verif_scale.c          |   5 +
>  .../selftests/bpf/prog_tests/verifier.c       |   2 +
>  tools/testing/selftests/bpf/progs/loop6.c     |   2 -
>  tools/testing/selftests/bpf/progs/loop7.c     | 102 ++++
>  .../verifier_bounds_deduction_non_const.c     | 553 ++++++++++++++++++
>  .../progs/verifier_bounds_mix_sign_unsign.c   |   2 +-
>  7 files changed, 701 insertions(+), 5 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/loop7.c
>  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bounds_deduction_non_const.c
>
> --
> 2.34.1
>

Yonghong Song April 6, 2023, 4:49 p.m. UTC | #2

On 4/4/23 2:46 PM, Andrii Nakryiko wrote:
> On Wed, Mar 29, 2023 at 10:56 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> LLVM commit [1] introduced hoistMinMax optimization like
>>    (i < VIRTIO_MAX_SGS) && (i < out_sgs)
>> to
>>    upper = MIN(VIRTIO_MAX_SGS, out_sgs)
>>    ... i < upper ...
>> and caused the verification failure. Commit [2] workarounded the issue by
>> adding some bpf assembly code to prohibit the above optimization.
>> This patch improved verifier such that verification can succeed without
>> the above workaround.
>>
>> Without [2], the current verifier will hit the following failures:
>>    ...
>>    119: (15) if r1 == 0x0 goto pc+1
>>    The sequence of 8193 jumps is too complex.
>>    verification time 525829 usec
>>    stack depth 64
>>    processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
>>    -- END PROG LOAD LOG --
>>    libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
>>    libbpf: failed to load object 'loop6.bpf.o'
>>    ...
>> The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will
>> go through both pathes and generate the following verificaiton states:
>>    ...
>>    89: (07) r2 += 1                      ; R2_w=5
>>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))
> 
> offtopic, but if this is a real output, then something is wrong with
> scratching register logic for conditional, it should have emitted
> states of R1 and R2, maybe you can take a look while working on this
> patch set?

Yes, this is the real output. Yes, the above R1_w should be an 
impossible state. This is what this patch tries to fix.
I am not what verifier should do if this state indeed happens,
return an -EFAULT or something?

> 
>>        R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
>>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>>    ...
>>    89: (07) r2 += 1                      ; R2_w=6
>>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
>>        R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
>>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>>      ...
>>    89: (07) r2 += 1                      ; R2_w=4088
>>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
>>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
>>    92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
>>        R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
>>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
>>
>> Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
>> During developing selftests for Patch 3, I found some issues with bound deduction with
>> BPF_EQ/BPF_NE and fixed the issue in Patch 1.
[...]

Andrii Nakryiko April 6, 2023, 6:38 p.m. UTC | #3

On Thu, Apr 6, 2023 at 9:49 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 4/4/23 2:46 PM, Andrii Nakryiko wrote:
> > On Wed, Mar 29, 2023 at 10:56 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >> LLVM commit [1] introduced hoistMinMax optimization like
> >>    (i < VIRTIO_MAX_SGS) && (i < out_sgs)
> >> to
> >>    upper = MIN(VIRTIO_MAX_SGS, out_sgs)
> >>    ... i < upper ...
> >> and caused the verification failure. Commit [2] workarounded the issue by
> >> adding some bpf assembly code to prohibit the above optimization.
> >> This patch improved verifier such that verification can succeed without
> >> the above workaround.
> >>
> >> Without [2], the current verifier will hit the following failures:
> >>    ...
> >>    119: (15) if r1 == 0x0 goto pc+1
> >>    The sequence of 8193 jumps is too complex.
> >>    verification time 525829 usec
> >>    stack depth 64
> >>    processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
> >>    -- END PROG LOAD LOG --
> >>    libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
> >>    libbpf: failed to load object 'loop6.bpf.o'
> >>    ...
> >> The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will
> >> go through both pathes and generate the following verificaiton states:
> >>    ...
> >>    89: (07) r2 += 1                      ; R2_w=5
> >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> >>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))
> >
> > offtopic, but if this is a real output, then something is wrong with
> > scratching register logic for conditional, it should have emitted
> > states of R1 and R2, maybe you can take a look while working on this
> > patch set?
>
> Yes, this is the real output. Yes, the above R1_w should be an
> impossible state. This is what this patch tries to fix.
> I am not what verifier should do if this state indeed happens,
> return an -EFAULT or something?

no-no, that's not what I'm talking about. Look at the instruction, it
compares R1 and R2, yet we print the state of R0 and R1, as if that
instruction worked with R0 and R1 instead. That's confusing and wrong.
There is some off-by-one error in mark_register_scratched() call, or
something like that.

>
> >
> >>        R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
> >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> >>    ...
> >>    89: (07) r2 += 1                      ; R2_w=6
> >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> >>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
> >>        R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
> >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> >>      ...
> >>    89: (07) r2 += 1                      ; R2_w=4088
> >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> >>    92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
> >>        R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
> >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> >>
> >> Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
> >> During developing selftests for Patch 3, I found some issues with bound deduction with
> >> BPF_EQ/BPF_NE and fixed the issue in Patch 1.
> [...]

Alexei Starovoitov April 6, 2023, 7:54 p.m. UTC | #4

On Thu, Apr 6, 2023 at 11:39 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 6, 2023 at 9:49 AM Yonghong Song <yhs@meta.com> wrote:
> >
> >
> >
> > On 4/4/23 2:46 PM, Andrii Nakryiko wrote:
> > > On Wed, Mar 29, 2023 at 10:56 PM Yonghong Song <yhs@fb.com> wrote:
> > >>
> > >> LLVM commit [1] introduced hoistMinMax optimization like
> > >>    (i < VIRTIO_MAX_SGS) && (i < out_sgs)
> > >> to
> > >>    upper = MIN(VIRTIO_MAX_SGS, out_sgs)
> > >>    ... i < upper ...
> > >> and caused the verification failure. Commit [2] workarounded the issue by
> > >> adding some bpf assembly code to prohibit the above optimization.
> > >> This patch improved verifier such that verification can succeed without
> > >> the above workaround.
> > >>
> > >> Without [2], the current verifier will hit the following failures:
> > >>    ...
> > >>    119: (15) if r1 == 0x0 goto pc+1
> > >>    The sequence of 8193 jumps is too complex.
> > >>    verification time 525829 usec
> > >>    stack depth 64
> > >>    processed 156616 insns (limit 1000000) max_states_per_insn 8 total_states 1754 peak_states 1712 mark_read 12
> > >>    -- END PROG LOAD LOG --
> > >>    libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
> > >>    libbpf: failed to load object 'loop6.bpf.o'
> > >>    ...
> > >> The failure is due to verifier inadequately handling '<const> <cond_op> <non_const>' which will
> > >> go through both pathes and generate the following verificaiton states:
> > >>    ...
> > >>    89: (07) r2 += 1                      ; R2_w=5
> > >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> > >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> > >>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=6,umax=5,var_off=(0x4; 0x3))
> > >
> > > offtopic, but if this is a real output, then something is wrong with
> > > scratching register logic for conditional, it should have emitted
> > > states of R1 and R2, maybe you can take a look while working on this
> > > patch set?
> >
> > Yes, this is the real output. Yes, the above R1_w should be an
> > impossible state. This is what this patch tries to fix.
> > I am not what verifier should do if this state indeed happens,
> > return an -EFAULT or something?
>
> no-no, that's not what I'm talking about. Look at the instruction, it
> compares R1 and R2, yet we print the state of R0 and R1, as if that
> instruction worked with R0 and R1 instead. That's confusing and wrong.
> There is some off-by-one error in mark_register_scratched() call, or
> something like that.

I suspect Yonghong just trimmed the output.
Line 92 has both R1 and R2 below.
Why R0, R6, R7, R8 are also there... is indeed wrong.
I've looked at scratch marking logic and don't an obvious bug.
Something to investigate.

> >
> > >
> > >>        R2_w=5 R6_w=scalar(id=385) R7_w=0 R8_w=scalar() R9_w=scalar(umax=21474836475,var_off=(0x0; 0x7ffffffff))
> > >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40_w=4 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> > >>    ...
> > >>    89: (07) r2 += 1                      ; R2_w=6
> > >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> > >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> > >>    92: (ad) if r2 < r1 goto pc+41        ; R0_w=scalar() R1_w=scalar(umin=7,umax=5,var_off=(0x4; 0x3))
> > >>        R2_w=6 R6=scalar(id=388) R7=0 R8_w=scalar() R9_w=scalar(umax=25769803770,var_off=(0x0; 0x7ffffffff))
> > >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=5 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> > >>      ...
> > >>    89: (07) r2 += 1                      ; R2_w=4088
> > >>    90: (79) r8 = *(u64 *)(r10 -48)       ; R8_w=scalar() R10=fp0
> > >>    91: (79) r1 = *(u64 *)(r10 -56)       ; R1_w=scalar(umax=5,var_off=(0x0; 0x7)) R10=fp0
> > >>    92: (ad) if r2 < r1 goto pc+41        ; R0=scalar() R1=scalar(umin=4089,umax=5,var_off=(0x0; 0x7))
> > >>        R2=4088 R6=scalar(id=12634) R7=0 R8=scalar() R9=scalar(umax=17557826301960,var_off=(0x0; 0xfffffffffff))
> > >>        R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-32= fp-40=4087 fp-48=mmmmmmmm fp-56= fp-64=mmmmmmmm
> > >>
> > >> Patch 3 fixed the above issue by handling '<const> <cond_op> <non_const>' properly.
> > >> During developing selftests for Patch 3, I found some issues with bound deduction with
> > >> BPF_EQ/BPF_NE and fixed the issue in Patch 1.
> > [...]