
[bpf-next,v3,3/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

Message ID 20230606222411.1820404-4-eddyz87@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series verify scalar ids mapping in regsafe()

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers fail 1 blamed authors not CCed: john.fastabend@gmail.com; 9 maintainers not CCed: kpsingh@kernel.org shuah@kernel.org sdf@google.com john.fastabend@gmail.com song@kernel.org mykolal@fb.com linux-kselftest@vger.kernel.org jolsa@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff fail author Signed-off-by missing
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch warning CHECK: spaces preferred around that '%' (ctx:WxV) WARNING: line length of 82 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat

Commit Message

Eduard Zingerman June 6, 2023, 10:24 p.m. UTC
Make sure that the following unsafe example is rejected by the verifier:

1: r9 = ... some pointer with range X ...
2: r6 = ... unbound scalar ID=a ...
3: r7 = ... unbound scalar ID=b ...
4: if (r6 > r7) goto +1
5: r6 = r7
6: if (r6 > X) goto ...
--- checkpoint ---
7: r9 += r7
8: *(u64 *)r9 = Y

This example is unsafe because not all execution paths verify the range
of r7. Because of the jump at (4) the verifier would arrive at (6) in
two states:
I.  r6{.id=b}, r7{.id=b} via path 1-6;
II. r6{.id=a}, r7{.id=b} via path 1-4, 6.

Currently regsafe() does not call check_ids() for scalar registers, thus
from the point of view of regsafe() states (I) and (II) are identical.
If path 1-6 is taken by the verifier first and a checkpoint is created
at (6), path [1-4, 6] would be considered safe.

This commit updates regsafe() to call check_ids() for precise scalar
registers.
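
For reference, a minimal sketch of the resulting SCALAR_VALUE branch in
regsafe() (shape only, reconstructed from snippets quoted later in this
thread, not the verbatim patch hunk):

	case SCALAR_VALUE:
		...
		/* if rold was not marked precise, its exact value did
		 * not affect the safety of the already-verified path
		 */
		if (!rold->precise)
			return true;
		/* require a consistent old->cur ID mapping, so that
		 * scalars linked in the old state are also linked in
		 * the current one
		 */
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off) &&
		       check_ids(rold->id, rcur->id, idmap);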

To minimize the impact on verification performance, avoid generating
bpf_reg_state::id for constant scalar values when processing BPF_MOV
in check_alu_op(). Scalar IDs are utilized by find_equal_scalars() to
propagate information about value ranges for registers that hold the
same value. However, there is no need to propagate range information
for constants.
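
To illustrate with a hypothetical instruction sequence (not taken from
the patch or the selftests):

  1: r1 = 7                  ; constant scalar
  2: r2 = r1                 ; with this patch no ID is assigned, since
                             ; r1 holds a known constant
  3: r3 = *(u64 *)(r10 - 8)  ; unbound scalar
  4: r4 = r3                 ; r3 and r4 get the same ID, so a later
                             ; range check on r4 also narrows r3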

Still, there is some performance impact because of this change.
Using veristat to compare the number of processed states for the
selftest object files listed in tools/testing/selftests/bpf/veristat.cfg
and the Cilium object files from [1] gives the following statistics:

$ ./veristat -e file,prog,states -f "states_pct>10" \
    -C master-baseline.log current.log
File         Program                         States  (DIFF)
-----------  ------------------------------  --------------
bpf_xdp.o    tail_handle_nat_fwd_ipv6        +155 (+23.92%)
bpf_xdp.o    tail_nodeport_nat_ingress_ipv4  +102 (+27.20%)
bpf_xdp.o    tail_rev_nodeport_lb4            +83 (+20.85%)
loop6.bpf.o  trace_virtqueue_add_sgs          +25 (+11.06%)

Also, the test case verifier_search_pruning/allocated_stack has to be
updated to avoid conflicts in register ID assignments between cached
and new states.

[1] git@github.com:anakryiko/cilium.git

Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/verifier.c                         | 34 ++++++++++++++++---
 .../bpf/progs/verifier_search_pruning.c       |  3 +-
 2 files changed, 32 insertions(+), 5 deletions(-)

Comments

Andrii Nakryiko June 7, 2023, 9:40 p.m. UTC | #1
On Tue, Jun 6, 2023 at 3:24 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Make sure that the following unsafe example is rejected by the verifier:
>
> [...]

So I checked it also on our internal BPF object files, and it looks
mostly good. Here are the only regressions:

Program                             States (A)  States (B)  States   (DIFF)
----------------------------------  ----------  ----------  ---------------
balancer_ingress                         29219       34531  +5312 (+18.18%)
syar_bind6_protect6                       3257        3599   +342 (+10.50%)
syar_bind4_protect4                       2590        2931   +341 (+13.17%)
on_alloc                                   415         526   +111 (+26.75%)
on_free                                    406         517   +111 (+27.34%)
pycallcount                                395         506   +111 (+28.10%)
resume_context                             405         516   +111 (+27.41%)
on_py_event                                395         506   +111 (+28.10%)
on_event                                   284         394   +110 (+38.73%)
handle_cuda_event                          268         378   +110 (+41.04%)
handle_cuda_launch                         276         386   +110 (+39.86%)
handle_cuda_malloc_ret                     272         382   +110 (+40.44%)
handle_cuda_memcpy                         270         380   +110 (+40.74%)
handle_cuda_memcpy_async                   270         380   +110 (+40.74%)
handle_pytorch_allocate_ret                271         381   +110 (+40.59%)
handle_pytorch_malloc_ret                  272         382   +110 (+40.44%)
on_event                                   284         394   +110 (+38.73%)
on_event                                   284         394   +110 (+38.73%)
syar_task_enter_execve                     309         329    +20 (+6.47%)
kprobe__security_inode_link                968         986    +18 (+1.86%)
kprobe__security_inode_symlink             838         854    +16 (+1.91%)
tw_twfw_egress                             249         251     +2 (+0.80%)
tw_twfw_ingress                            250         252     +2 (+0.80%)
tw_twfw_tc_eg                              248         250     +2 (+0.81%)
tw_twfw_tc_in                              250         252     +2 (+0.80%)
raw_tracepoint__sched_process_exec         136         139     +3 (+2.21%)
kprobe_ret__do_filp_open                   869         871     +2 (+0.23%)
read_erlang_stack                          572         573     +1 (+0.17%)


They are mostly on small-ish programs. The only mild concern from my
side is balancer_ingress, which is one of the Katran BPF programs. It
adds +18% of states (which translates to about 70K more instructions
verified, up from 350K). I think we can live with this, but it would be
nice to check why it's happening.

I suspect that dropping SCALAR IDs as we discussed (after fixing
register fill/spill ID generation) might completely mitigate that.

Overall, LGTM:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/verifier.c                         | 34 ++++++++++++++++---
>  .../bpf/progs/verifier_search_pruning.c       |  3 +-
>  2 files changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2aa60b73f1b5..175ca22b868e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12933,12 +12933,14 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 if (BPF_SRC(insn->code) == BPF_X) {
>                         struct bpf_reg_state *src_reg = regs + insn->src_reg;
>                         struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
> +                       bool need_id = (src_reg->type == SCALAR_VALUE && !src_reg->id &&
> +                                       !tnum_is_const(src_reg->var_off));
>

nit: unnecessary outer ()

>                         if (BPF_CLASS(insn->code) == BPF_ALU64) {
>                                 /* case: R1 = R2
>                                  * copy register state to dest reg
>                                  */
> -                               if (src_reg->type == SCALAR_VALUE && !src_reg->id)
> +                               if (need_id)
>                                         /* Assign src and dst registers the same ID
>                                          * that will be used by find_equal_scalars()
>                                          * to propagate min/max range.

[...]
Eduard Zingerman June 8, 2023, 1:26 a.m. UTC | #2
On Wed, 2023-06-07 at 14:40 -0700, Andrii Nakryiko wrote:
> On Tue, Jun 6, 2023 at 3:24 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > [...]
> 
> So I checked it also on our internal BPF object files, and it looks
> mostly good. Here are the only regressions:
> 
> [...]
>
> They are mostly on small-ish programs. The only mild concern from my
> side is balancer_ingress, which is one of the Katran BPF programs. It
> adds +18% of states (which translates to about 70K more instructions
> verified, up from 350K). I think we can live with this, but it would be
> nice to check why it's happening.

Thank you for reviewing this series.

I looked at the logs that you've shared; the difference is indeed
caused by some scalar registers having a unique ID in the cached state
and no ID in the current state, or vice versa. The !rold->id trick that
we discussed for V2 helps :)

What do you think about an alternative way to exclude unique scalars
as in the patch below? (on top of this patch-set):

--- 8< -------------------------

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 235d7eded565..ece9722dff3b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15149,6 +15149,13 @@ static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
        return false;
 }
 
+static bool check_scalar_ids(u32 old_id, u32 cur_id, struct bpf_verifier_env *env)
+{
+       old_id = old_id ? old_id : env->id_gen++;
+       cur_id = cur_id ? cur_id : env->id_gen++;
+       return check_ids(old_id, cur_id, env->idmap_scratch);
+}
+
 static void clean_func_state(struct bpf_verifier_env *env,
                             struct bpf_func_state *st)
 {
@@ -15325,7 +15332,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
                 */
                return range_within(rold, rcur) &&
                       tnum_in(rold->var_off, rcur->var_off) &&
-                      check_ids(rold->id, rcur->id, idmap);
+                      check_scalar_ids(rold->id, rcur->id, env);
        case PTR_TO_MAP_KEY:
        case PTR_TO_MAP_VALUE:
        case PTR_TO_MEM:

------------------------- >8 ---

For me this patch removes all veristat differences compared to master.
If doing it for real, I'd like to reset env->id_gen at exit from
states_equal() to the value it had at entry (to avoid allocating too
many IDs).
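
A hypothetical sketch of such a reset (placement and names assumed, not
part of the posted patch):

	static bool states_equal(struct bpf_verifier_env *env,
				 struct bpf_verifier_state *old,
				 struct bpf_verifier_state *cur)
	{
		u32 saved_id_gen = env->id_gen;
		bool ret;

		ret = ...; /* existing state comparison logic */
		/* roll back temporary IDs allocated by check_scalar_ids() */
		env->id_gen = saved_id_gen;
		return ret;
	}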

> [...]
Andrii Nakryiko June 8, 2023, 5:21 p.m. UTC | #3
On Wed, Jun 7, 2023 at 6:26 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2023-06-07 at 14:40 -0700, Andrii Nakryiko wrote:
> > [...]
>
> Thank you for reviewing this series.
>
> I looked at the logs that you've shared; the difference is indeed
> caused by some scalar registers having a unique ID in the cached state
> and no ID in the current state, or vice versa. The !rold->id trick that
> we discussed for V2 helps :)
>
> What do you think about an alternative way to exclude unique scalars
> as in the patch below? (on top of this patch-set):
>
> [...]
>
> For me this patch removes all veristat differences compared to master.
> If doing it for real, I'd like to reset env->id_gen at exit from
> states_equal() to the value it had at entry (to avoid allocating too
> many IDs).

Hm.. It's clever and pretty minimal, I like it. We are basically
allocating a virtual ID for a SCALAR that doesn't have an ID, just to
make sure we get a conflict if a SCALAR with an ID cannot be mapped
into two different SCALARs, right?

The only question would be if it's safe to do that for the case when
old_reg->id != 0 and cur_reg->id == 0? E.g., if in the old (verified)
state we have r6.id = r7.id = 123, and in the new state we have
r6.id = 0 and r7.id = 0, then your implementation above will say that
the states are equivalent. But are they, given there is a link between
r6 and r7 that might be required for correctness? We don't have that
link in the current state.

So with this we are getting to my original concerns with your
!rold->id approach, which tries to ignore the necessity of a link
between registers.

What if we do this only for old registers? Then, (in old state) r6.id
= 0, r7.id = 0, (in new state) r6.id = r7.id = 123. This will be
rejected because first we'll map 123 to newly allocated X for r6.id,
and then when we try to match r7.id=123 to another allocated ID X+1
we'll get a conflict, right?

>
> [...]
Eduard Zingerman June 8, 2023, 7:05 p.m. UTC | #4
On Thu, 2023-06-08 at 10:21 -0700, Andrii Nakryiko wrote:
> On Wed, Jun 7, 2023 at 6:26 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > [...]
>
> Hm.. It's clever and pretty minimal, I like it. We are basically
> allocating a virtual ID for a SCALAR that doesn't have an ID, just to
> make sure we get a conflict if a SCALAR with an ID cannot be mapped
> into two different SCALARs, right?
>
> The only question would be if it's safe to do that for the case when
> old_reg->id != 0 and cur_reg->id == 0? E.g., if in the old (verified)
> state we have r6.id = r7.id = 123, and in the new state we have
> r6.id = 0 and r7.id = 0, then your implementation above will say that
> the states are equivalent. But are they, given there is a link between
> r6 and r7 that might be required for correctness? We don't have that
> link in the current state.

You mean the other way around, rold.id == 0, rcur.id != 0, right?
(below 0/2 means: original value 0, replaced by new id 2).

(1)   old  cur
r6.id 0/2  1
r7.id 0/3  1    -> check_ids() returns true

(2)   old  cur
r6.id 1    0/2
r7.id 1    0/3  -> check_ids() returns false

Also, (1) is no different from (3):

(3)   old  cur
r6.id 1    3
r7.id 2    3    -> check_ids() returns true

Current check:

		if (!rold->precise)
			return true;
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off) &&
		       check_ids(rold->id, rcur->id, idmap);

Will reject (1) and (2), but will accept (3).

New check:

		if (!rold->precise)
			return true;
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off) &&
		       check_scalar_ids(rold->id, rcur->id, env);

Will reject (2), but will accept (1) and (3).

And modification of check_scalar_ids() to generate IDs only for rold
or only for rcur would not reject (3) either.

(3) would be rejected only if the current check is used together with
elimination of unique scalar IDs from old states.

My previous experiments show that eliminating unique IDs from old
states and not eliminating unique IDs from cur states is *very* bad
for performance.

> [...]
Eduard Zingerman June 8, 2023, 8:58 p.m. UTC | #5
On Thu, 2023-06-08 at 22:05 +0300, Eduard Zingerman wrote:
[...]

Ok, here is what I think is the final version:
a. for each old or cur ID, generate a temporary unique ID;
b. for scalars, use a modified check_ids() that forbids mapping the
   same 'cur' ID to several different 'old' IDs.

(a) allows rejecting situations like:

  (1)   old cur   (2)   old cur
  r6.id 0   1     r6.id 1   0
  r7.id 0   1     r7.id 1   0

(b) allows rejecting situations like:

  (3)   old cur
  r6.id 1   3
  r7.id 2   3

And whether some scalar ID is unique or not does not matter for the
algorithm.

Tests are passing, the Katran example is fine (350K insns, 29K states),
minor veristat regression:

File       Program                         States (DIFF)
---------  ------------------------------  -------------
bpf_xdp.o  tail_nodeport_nat_ingress_ipv4    +3 (+0.80%)
bpf_xdp.o  tail_rev_nodeport_lb4             +2 (+0.50%)

--- 8< -----------------------------

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 235d7eded565..5794dc7830db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15149,6 +15149,31 @@ static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
        return false;
 }
 
+static bool check_scalar_ids(struct bpf_verifier_env *env, u32 old_id, u32 cur_id,
+                            struct bpf_id_pair *idmap)
+{
+       unsigned int i;
+
+       old_id = old_id ? old_id : env->id_gen++;
+       cur_id = cur_id ? cur_id : env->id_gen++;
+
+       for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
+               if (!idmap[i].old) {
+                       /* Reached an empty slot; haven't seen this id before */
+                       idmap[i].old = old_id;
+                       idmap[i].cur = cur_id;
+                       return true;
+               }
+               if (idmap[i].old == old_id)
+                       return idmap[i].cur == cur_id;
+               if (idmap[i].cur == cur_id)
+                       return false;
+       }
+       /* We ran out of idmap slots, which should be impossible */
+       WARN_ON_ONCE(1);
+       return false;
+}
+
 static void clean_func_state(struct bpf_verifier_env *env,
                             struct bpf_func_state *st)
 {
@@ -15325,7 +15350,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
                 */
                return range_within(rold, rcur) &&
                       tnum_in(rold->var_off, rcur->var_off) &&
-                      check_ids(rold->id, rcur->id, idmap);
+                      check_scalar_ids(env, rold->id, rcur->id, idmap);
        case PTR_TO_MAP_KEY:
        case PTR_TO_MAP_VALUE:
        case PTR_TO_MEM:

----------------------------- >8 ---
Andrii Nakryiko June 8, 2023, 10:37 p.m. UTC | #6
On Thu, Jun 8, 2023 at 1:58 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-06-08 at 22:05 +0300, Eduard Zingerman wrote:
> [...]
> > > [...]
> > >
> > > The only question would be if it's safe to do that for the case when
> > > old_reg->id != 0 and cur_reg->id == 0? E.g., if in the old (verified)
> > > state we have r6.id = r7.id = 123, and in the new state we have
> > > r6.id = 0 and r7.id = 0, then your implementation above will say that
> > > the states are equivalent. But are they, given there is a link between
> > > r6 and r7 that might be required for correctness? We don't have that
> > > link in the current state.
> >
> > You mean the other way around, rold.id == 0, rcur.id != 0, right?
> > (below 0/2 means: original value 0, replaced by new id 2).

no, I actually meant what I wrote, but I didn't realize that
check_ids() is kind of broken... Because it shouldn't allow the same
ID from the cur state to be mapped to two different IDs in the old
state, should it?

> >
> > (1)   old  cur
> > r6.id 0/2  1
> > r7.id 0/3  1    -> check_ids() returns true

I think this should be rejected.

> >
> > (2)   old  cur
> > r6.id 1    0/2
> > r7.id 1    0/3  -> check_ids() returns false

And this should be rejected.

> >
> > Also, (1) is no different from (3):
> >
> > (3)   old  cur
> > r6.id 1    3
> > r7.id 2    3    -> check_ids() returns true

And this definitely should be rejected.

The only situation that might not be rejected would be:

        old    cur
r6.id   0/1    3
r7.id   0/2    4

And perhaps this one is ok as well?

        old    cur
r6.id   3      0/1
r7.id   4      0/2


And my assumption was that that's what you are trying to do. But
weirdly, check_ids() is enforcing only that an old ID has to have a
unique mapping, which seems like a bug.
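
For reference, the matching loop of the existing check_ids() looks
roughly like this (reconstructed here for comparison; note the absence
of a cur-side uniqueness test):

	for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
		if (!idmap[i].old) {
			/* Reached an empty slot; haven't seen this id before */
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
		/* nothing rejects the case where cur_id is already
		 * mapped to a different old_id
		 */
	}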

> [...]
>
> Ok, here is what I think is the final version:
> [...]
>
> --- 8< -----------------------------
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 235d7eded565..5794dc7830db 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15149,6 +15149,31 @@ static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
>         return false;
>  }
>
> +static bool check_scalar_ids(struct bpf_verifier_env *env, u32 old_id, u32 cur_id,
> +                            struct bpf_id_pair *idmap)
> +{
> +       unsigned int i;
> +
> +       old_id = old_id ? old_id : env->id_gen++;
> +       cur_id = cur_id ? cur_id : env->id_gen++;
> +
> +       for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
> +               if (!idmap[i].old) {
> +                       /* Reached an empty slot; haven't seen this id before */
> +                       idmap[i].old = old_id;
> +                       idmap[i].cur = cur_id;
> +                       return true;
> +               }
> +               if (idmap[i].old == old_id)
> +                       return idmap[i].cur == cur_id;
> +               if (idmap[i].cur == cur_id)
> +                       return false;

I think this should just be added to the existing check_ids(); it's a
bug that we don't check this condition in check_ids() today.

But I'd say let's land the fixes you have right now, and then work on
fixing and optimizing scalar ID checks separately. We are doing too
many things at the same time :(

> +       }
> +       /* We ran out of idmap slots, which should be impossible */
> +       WARN_ON_ONCE(1);
> +       return false;
> +}
> +
>  static void clean_func_state(struct bpf_verifier_env *env,
>                              struct bpf_func_state *st)
>  {
> @@ -15325,7 +15350,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>                  */
>                 return range_within(rold, rcur) &&
>                        tnum_in(rold->var_off, rcur->var_off) &&
> -                      check_ids(rold->id, rcur->id, idmap);
> +                      check_scalar_ids(env, rold->id, rcur->id, idmap);
>         case PTR_TO_MAP_KEY:
>         case PTR_TO_MAP_VALUE:
>         case PTR_TO_MEM:
>
> ----------------------------- >8 ---
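
For readers who want to experiment with these mapping rules outside the
kernel, below is a minimal user-space sketch of the same idmap logic.
ID_MAP_SIZE, struct id_pair and next_tmp_id are stand-ins for the kernel's
BPF_ID_MAP_SIZE, struct bpf_id_pair and env->id_gen; none of this is
actual kernel API.

#include <stdbool.h>
#include <stdio.h>

#define ID_MAP_SIZE 64

struct id_pair { unsigned int old, cur; };

static unsigned int next_tmp_id = 1000; /* stand-in for env->id_gen */

static bool check_scalar_ids(unsigned int old_id, unsigned int cur_id,
			     struct id_pair *idmap)
{
	unsigned int i;

	/* (a) a zero ID gets a fresh temporary ID, so "no ID" never
	 * silently matches a real ID on the other side
	 */
	old_id = old_id ? old_id : next_tmp_id++;
	cur_id = cur_id ? cur_id : next_tmp_id++;

	for (i = 0; i < ID_MAP_SIZE; i++) {
		if (!idmap[i].old) {
			/* empty slot: record the new pairing */
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
		/* (b) forbid mapping one cur ID to several old IDs */
		if (idmap[i].cur == cur_id)
			return false;
	}
	return false; /* out of slots */
}

int main(void)
{
	/* case (3) above: old r6.id=1, r7.id=2; cur r6.id=r7.id=3 */
	struct id_pair idmap[ID_MAP_SIZE] = {0};
	bool ok = check_scalar_ids(1, 3, idmap) &&
		  check_scalar_ids(2, 3, idmap);

	printf("case (3) accepted: %s\n", ok ? "yes" : "no"); /* "no" */
	return 0;
}
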
Eduard Zingerman June 8, 2023, 11:40 p.m. UTC | #7
On Thu, 2023-06-08 at 15:37 -0700, Andrii Nakryiko wrote:
> On Thu, Jun 8, 2023 at 1:58 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Thu, 2023-06-08 at 22:05 +0300, Eduard Zingerman wrote:
> > [...]
> > > > Hm.. It's clever and pretty minimal, I like it. We are basically
> > > > allocating a virtual ID for a SCALAR that doesn't have an ID, just to
> > > > make sure we get a conflict if a SCALAR with an ID would otherwise be
> > > > mapped onto two different SCALARs, right?
> > > > 
> > > > The only question would be whether it's safe to do that for the case
> > > > when old_reg->id != 0 and cur_reg->id == 0. E.g., if in the old
> > > > (verified) state we have r6.id = r7.id = 123, and in the new state we
> > > > have r6.id = 0 and r7.id = 0, then your implementation above will say
> > > > that the states are equivalent. But are they, given that there is a
> > > > link between r6 and r7 that might be required for correctness, and
> > > > which the current state does not have?
> > > 
> > > You mean the other way around, rold.id == 0, rcur.id != 0, right?
> > > (below, 0/2 means: original value 0, replaced by new ID 2).
> 
> no, I actually meant what I wrote, but I didn't realize that
> check_ids() is kind of broken... Because it shouldn't allow the same
> ID from the cur state to be mapped to two different IDs in the old
> state, should it?

IDs are used for several things, and it looks like the answer might vary
by use case.

For example, I looked at mark_ptr_or_null_regs():
- it is called when a conditional of the form (ptr == NULL) is checked;
- it marks every pointer register that has the same ID as ptr as
  null/non-null;
- when a register is marked non-null, its ID is removed (not for locks,
  but ignore that for now).

Assume r6 and r7 are both PTR_MAYBE_NULL and the ID assignments look as
follows:

        old cur
  r6.id 1   3
  r7.id 2   3
  
'old' is safe, which means the following:
- either r6 was not accessed, or the access was guarded by an (r6 == NULL)
  check;
- either r7 was not accessed, or the access was guarded by an (r7 == NULL)
  check.

In both cases it should be ok if r6 and r7 are in fact the same pointer:
it would be checked against NULL twice, but that's fine. So I'd say that
'cur' is a special case of 'old' and check_ids() is correct for it. But
this is the same argument I used for scalars, and you were not convinced :)
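
To make the argument concrete, consider a hypothetical program of the
following shape (sketched in the style of the comment in the patch below;
not taken from an actual test):

  1: r6 = ... map lookup, PTR_MAYBE_NULL, ID=1 ...
  2: r7 = ... map lookup, PTR_MAYBE_NULL, ID=2 ...
  3: if (r6 == NULL) goto exit
  4: ... memory access through r6 ...
  5: if (r7 == NULL) goto exit
  6: ... memory access through r7 ...

If in 'cur' both registers carry the same ID (i.e. they are the same
pointer), the check at (3) marks both as non-null, the access at (4) stays
guarded, and the check at (5) is merely redundant.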

Need to examine each use case carefully.

> > > (1)   old cur
> > > r6.id 0/2   1
> > > r7.id 0/3   1 check_ids returns true
> 
> I think this should be rejected.

That's what we agreed upon when we decided not to do !rold->id, so yes.

> > > (2)   old cur
> > > r6.id 1   0/2
> > > r7.id 1   0/3 check_ids returns false
> 
> And this should be rejected.

For sure.

> > > Also, (1) is no different from (3):
> > > 
> > > (3)   old cur
> > > r6.id 1     3
> > > r7.id 2     3 check_ids returns true
> 
> And this definitely should be rejected.

Same as (1).
 
> The only situation that might not be rejected would be:
> 
>         old    cur
> r6.id   0/1    3
> r7.id   0/2    4
> 
> And perhaps this one is ok as well?
> 
>         old    cur
> r6.id   3      0/1
> r7.id   4      0/2

I think these two should be accepted.
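
A quick trace under the scheme above (tmpN again denotes a generated
temporary ID; a sketch, not verifier output):

  first table:  r6: old 0 -> tmp1, cur 3; record {tmp1 -> 3}
                r7: old 0 -> tmp2, cur 4; record {tmp2 -> 4}  => accepted
  second table: r6: old 3, cur 0 -> tmp1; record {3 -> tmp1}
                r7: old 4, cur 0 -> tmp2; record {4 -> tmp2}  => accepted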

[...]
> > +static bool check_scalar_ids(struct bpf_verifier_env *env, u32 old_id, u32 cur_id,
> > +                            struct bpf_id_pair *idmap)
> > +{
> > +       unsigned int i;
> > +
> > +       old_id = old_id ? old_id : env->id_gen++;
> > +       cur_id = cur_id ? cur_id : env->id_gen++;
> > +
> > +       for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
> > +               if (!idmap[i].old) {
> > +                       /* Reached an empty slot; haven't seen this id before */
> > +                       idmap[i].old = old_id;
> > +                       idmap[i].cur = cur_id;
> > +                       return true;
> > +               }
> > +               if (idmap[i].old == old_id)
> > +                       return idmap[i].cur == cur_id;
> > +               if (idmap[i].cur == cur_id)
> > +                       return false;
> 
> I think this should just be added to the existing check_ids(); it's a bug
> that we don't check this condition in check_ids() today.
> 
> But I'd say let's land the fixes you have right now, and then work on
> fixing and optimizing scalar ID checks separately. We are doing too many
> things at the same time :(

Ok, will post V4 with these changes and examine other cases later.
Thanks again for the discussion.

[...]
Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2aa60b73f1b5..175ca22b868e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12933,12 +12933,14 @@  static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (BPF_SRC(insn->code) == BPF_X) {
 			struct bpf_reg_state *src_reg = regs + insn->src_reg;
 			struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
+			bool need_id = (src_reg->type == SCALAR_VALUE && !src_reg->id &&
+					!tnum_is_const(src_reg->var_off));
 
 			if (BPF_CLASS(insn->code) == BPF_ALU64) {
 				/* case: R1 = R2
 				 * copy register state to dest reg
 				 */
-				if (src_reg->type == SCALAR_VALUE && !src_reg->id)
+				if (need_id)
 					/* Assign src and dst registers the same ID
 					 * that will be used by find_equal_scalars()
 					 * to propagate min/max range.
@@ -12957,7 +12959,7 @@  static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				} else if (src_reg->type == SCALAR_VALUE) {
 					bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX;
 
-					if (is_src_reg_u32 && !src_reg->id)
+					if (is_src_reg_u32 && need_id)
 						src_reg->id = ++env->id_gen;
 					copy_register_state(dst_reg, src_reg);
 					/* Make sure ID is cleared if src_reg is not in u32 range otherwise
@@ -15289,9 +15291,33 @@  static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 			return false;
 		if (!rold->precise)
 			return true;
-		/* new val must satisfy old val knowledge */
+		/* Why check_ids() for scalar registers?
+		 *
+		 * Consider the following BPF code:
+		 *   1: r6 = ... unbound scalar, ID=a ...
+		 *   2: r7 = ... unbound scalar, ID=b ...
+		 *   3: if (r6 > r7) goto +1
+		 *   4: r6 = r7
+		 *   5: if (r6 > X) goto ...
+		 *   6: ... memory operation using r7 ...
+		 *
+		 * First verification path is [1-6]:
+		 * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
+		 * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
+		 *   r7 <= X, because r6 and r7 share same id.
+		 * Next verification path is [1-4, 6].
+		 *
+		 * Instruction (6) would be reached in two states:
+		 *   I.  r6{.id=b}, r7{.id=b} via path 1-6;
+		 *   II. r6{.id=a}, r7{.id=b} via path 1-4, 6.
+		 *
+		 * Use check_ids() to distinguish these states.
+		 * ---
+		 * Also verify that new value satisfies old value range knowledge.
+		 */
 		return range_within(rold, rcur) &&
-		       tnum_in(rold->var_off, rcur->var_off);
+		       tnum_in(rold->var_off, rcur->var_off) &&
+		       check_ids(rold->id, rcur->id, idmap);
 	case PTR_TO_MAP_KEY:
 	case PTR_TO_MAP_VALUE:
 	case PTR_TO_MEM:
diff --git a/tools/testing/selftests/bpf/progs/verifier_search_pruning.c b/tools/testing/selftests/bpf/progs/verifier_search_pruning.c
index 5a14498d352f..bb3cd14bb3a1 100644
--- a/tools/testing/selftests/bpf/progs/verifier_search_pruning.c
+++ b/tools/testing/selftests/bpf/progs/verifier_search_pruning.c
@@ -271,7 +271,7 @@  l2_%=:	r0 = 0;						\
 
 SEC("socket")
 __description("allocated_stack")
-__success __msg("processed 15 insns")
+__success __msg("processed 16 insns")
 __success_unpriv __msg_unpriv("") __log_level(1) __retval(0)
 __naked void allocated_stack(void)
 {
@@ -279,6 +279,7 @@  __naked void allocated_stack(void)
 	r6 = r1;					\
 	call %[bpf_get_prandom_u32];			\
 	r7 = r0;					\
+	call %[bpf_get_prandom_u32];			\
 	if r0 == 0 goto l0_%=;				\
 	r0 = 0;						\
 	*(u64*)(r10 - 8) = r6;				\