Message ID: 20230106142214.1040390-1-eddyz87@gmail.com (mailing list archive)
Series: bpf: Fix to preserve reg parent/live fields when copying range info
On Fri, Jan 6, 2023 at 6:22 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > Struct bpf_reg_state is copied directly in several places including: > - check_stack_write_fixed_off() (via save_register_state()); > - check_stack_read_fixed_off(); > - find_equal_scalars(). > > However, a literal copy of this struct also copies the following fields: > > struct bpf_reg_state { > ... > struct bpf_reg_state *parent; > ... > enum bpf_reg_liveness live; > ... > }; > > This breaks register parentage chain and liveness marking logic. > The commit message for the first patch has a detailed example. > This patch-set replaces direct copies with a call to a function > copy_register_state(dst,src), which preserves 'parent' and 'live' > fields of the 'dst'. > > The fix comes with a significant verifier runtime penalty for some > selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg > and cilium BPF binaries (see [1]): > > $ ./veristat -e file,prog,states -C -f 'states_diff>10' master-baseline.log current.log > File Program States (A) States (B) States (DIFF) > -------------------------- -------------------------------- ---------- ---------- --------------- > bpf_host.o tail_handle_ipv4_from_host 231 299 +68 (+29.44%) > bpf_host.o tail_handle_nat_fwd_ipv4 1088 1320 +232 (+21.32%) > bpf_host.o tail_handle_nat_fwd_ipv6 716 729 +13 (+1.82%) > bpf_host.o tail_nodeport_nat_ingress_ipv4 281 314 +33 (+11.74%) > bpf_host.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%) > bpf_lxc.o tail_handle_nat_fwd_ipv4 1088 1320 +232 (+21.32%) > bpf_lxc.o tail_handle_nat_fwd_ipv6 716 729 +13 (+1.82%) > bpf_lxc.o tail_ipv4_ct_egress 239 262 +23 (+9.62%) > bpf_lxc.o tail_ipv4_ct_ingress 239 262 +23 (+9.62%) > bpf_lxc.o tail_ipv4_ct_ingress_policy_only 239 262 +23 (+9.62%) > bpf_lxc.o tail_ipv6_ct_egress 181 195 +14 (+7.73%) > bpf_lxc.o tail_ipv6_ct_ingress 181 195 +14 (+7.73%) > bpf_lxc.o tail_ipv6_ct_ingress_policy_only 181 195 +14 (+7.73%) > bpf_lxc.o tail_nodeport_nat_ingress_ipv4 
281 314 +33 (+11.74%) > bpf_lxc.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%) > bpf_overlay.o tail_handle_nat_fwd_ipv4 799 829 +30 (+3.75%) > bpf_overlay.o tail_nodeport_nat_ingress_ipv4 281 314 +33 (+11.74%) > bpf_overlay.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%) > bpf_sock.o cil_sock4_connect 47 70 +23 (+48.94%) > bpf_sock.o cil_sock4_sendmsg 45 68 +23 (+51.11%) > bpf_sock.o cil_sock6_post_bind 31 42 +11 (+35.48%) > bpf_xdp.o tail_lb_ipv4 4413 6457 +2044 (+46.32%) > bpf_xdp.o tail_lb_ipv6 6876 7249 +373 (+5.42%) > test_cls_redirect.bpf.o cls_redirect 4704 4799 +95 (+2.02%) > test_tcp_hdr_options.bpf.o estab 180 206 +26 (+14.44%) > xdp_synproxy_kern.bpf.o syncookie_tc 21059 21485 +426 (+2.02%) > xdp_synproxy_kern.bpf.o syncookie_xdp 21857 23122 +1265 (+5.79%) > -------------------------- -------------------------------- ---------- ---------- --------------- > > I looked through verification log for bpf_xdp.o tail_lb_ipv4 program in > order to identify the reason for ~50% visited states increase. It's just a 2K increase and it is fixing a real issue, so I think it's totally acceptable (and see below about STACK_INVALID vs STACK_MISC). Thanks for taking the time to analyze and explain this! > The slowdown is triggered by a difference in handling of three stack slots: > fp-56, fp-72 and fp-80, with the main difference coming from fp-72. 
> In fact the following change removes all the difference:
>
> @@ -3256,7 +3256,10 @@ static void save_register_state(struct bpf_func_state *state,
>  {
>  	int i;
>
> -	copy_register_state(&state->stack[spi].spilled_ptr, reg);
> +	if ((spi == 6 /*56*/ || spi == 8 /*72*/ || spi == 9 /*80*/) && size != BPF_REG_SIZE)
> +		state->stack[spi].spilled_ptr = *reg;
> +	else
> +		copy_register_state(&state->stack[spi].spilled_ptr, reg);
>
> For fp-56 I found the following pattern for divergences between
> verification logs with and w/o this patch:
>
> - At some point insn 1862 is reached and a checkpoint is created;
> - At some other point insn 1862 is reached again:
>   - with this patch:
>     - the current state is considered *not* equivalent to the old checkpoint;
>     - the reason for the mismatch is the state of fp-56:
>       - current state: fp-56=????mmmm
>       - checkpoint:    fp-56_rD=mmmmmmmm

I'm wondering if we should consider allowing uninitialized (STACK_INVALID) reads from stack, in general. It feels like it's causing more issues than is actually helpful in practice. A common code pattern is to __builtin_memset() some struct first, and only then initialize it, basically doing unnecessary work of zeroing out. All just to avoid the verifier complaining about some irrelevant padding not being initialized. I haven't thought about this much, but it feels that STACK_MISC (initialized, but unknown scalar value) is basically equivalent to STACK_INVALID for all intents and purposes. Thoughts?

Obviously, this is a completely separate change and issue from what you are addressing in this patch set.

Awesome job on tracking this down and fixing it! For the patch set:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> - without this patch the current state is considered equivalent to the
>   checkpoint; fp-56 is not present in the checkpoint.
>
> Here is a fragment of the verification log for when the checkpoint in
> question is created at insn 1862:
>
> checkpoint 1862: ... fp-56=mmmmmmmm ...
> 1862: ...
> 1863: ...
> 1864: (61) r1 = *(u32 *)(r0 +0)
> 1865: ...
> 1866: (63) *(u32 *)(r10 -56) = r1    ; R1_w=scalar(...) R10=fp0 fp-56=
> 1867: (bf) r2 = r10                  ; R2_w=fp0 R10=fp0
> 1868: (07) r2 += -56                 ; R2_w=fp-56
> ; return map_lookup_elem(&LB4_BACKEND_MAP_V2, &backend_id);
> 1869: (18) r1 = 0xffff888100286000   ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
> 1871: (85) call bpf_map_lookup_elem#1
>
> - Without this patch:
>   - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
>   - at insn 1866 fp-56 liveness is set to the REG_LIVE_WRITTEN mark because
>     of the direct r1 copy in save_register_state();
>   - at insn 1871 REG_LIVE_READ is not propagated to fp-56 at
>     checkpoint 1862 because of the REG_LIVE_WRITTEN mark;
>   - eventually fp-56 is pruned from the checkpoint at 1862 in
>     clean_func_state().
> - With this patch:
>   - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
>   - at insn 1866 fp-56 liveness is *not* set to the REG_LIVE_WRITTEN mark
>     because the write size is not equal to BPF_REG_SIZE;
>   - at insn 1871 REG_LIVE_READ is propagated to fp-56 at checkpoint 1862.
>
> Hence more states have to be visited by the verifier with this patch compared
> to the current master.
>
> Similar patterns could be found for both fp-72 and fp-80, although these
> are harder to track through the log because of the large number of insns
> between the slot write and the bpf_map_lookup_elem() call triggering the
> read mark; it boils down to the following C code:
>
>     struct ipv4_frag_id frag_id = {
>             .daddr = ip4->daddr,
>             .saddr = ip4->saddr,
>             .id = ip4->id,
>             .proto = ip4->protocol,
>             .pad = 0,
>     };
>     ...
>     map_lookup_elem(..., &frag_id);
>
> Where:
> - .id is mapped to fp-72, write of size u16;
> - .saddr is mapped to fp-80, write of size u32.
>
> This patch-set is a continuation of the discussion from [2].
>
> Changes v1 -> v2 (no changes in the code itself):
> - added analysis for the tail_lb_ipv4 verification slowdown;
> - rebase against a fresh master branch.
> > [1] git@github.com:anakryiko/cilium.git > [2] https://lore.kernel.org/bpf/517af2c57ee4b9ce2d96a8cf33f7295f2d2dfe13.camel@gmail.com/ > > Eduard Zingerman (2): > bpf: Fix to preserve reg parent/live fields when copying range info > selftests/bpf: Verify copy_register_state() preserves parent/live > fields > > kernel/bpf/verifier.c | 25 +++++++++---- > .../selftests/bpf/verifier/search_pruning.c | 36 +++++++++++++++++++ > 2 files changed, 54 insertions(+), 7 deletions(-) > > -- > 2.39.0 >
On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: [...] > > I'm wondering if we should consider allowing uninitialized > (STACK_INVALID) reads from stack, in general. It feels like it's > causing more issues than is actually helpful in practice. Common code > pattern is to __builtin_memset() some struct first, and only then > initialize it, basically doing unnecessary work of zeroing out. All > just to avoid verifier to complain about some irrelevant padding not > being initialized. I haven't thought about this much, but it feels > that STACK_MISC (initialized, but unknown scalar value) is basically > equivalent to STACK_INVALID for all intents and purposes. Thoughts? Do you have an example of the __builtin_memset() usage? I tried passing partially initialized stack allocated structure to bpf_map_update_elem() and bpf_probe_write_user() and verifier did not complain. Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU instructions because after LDX you would get a full range register and you can't do much with a full range value. However, if a structure containing un-initialized fields (e.g. not just padding) is passed to a helper or kfunc is it an error? > Obviously, this is a completely separate change and issue from what > you are addressing in this patch set. > > Awesome job on tracking this down and fixing it! For the patch set: Thank you for reviewing this issue with me. > > Acked-by: Andrii Nakryiko <andrii@kernel.org> > > [...]
On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > [...] > > > > I'm wondering if we should consider allowing uninitialized > > (STACK_INVALID) reads from stack, in general. It feels like it's > > causing more issues than is actually helpful in practice. Common code > > pattern is to __builtin_memset() some struct first, and only then > > initialize it, basically doing unnecessary work of zeroing out. All > > just to avoid verifier to complain about some irrelevant padding not > > being initialized. I haven't thought about this much, but it feels > > that STACK_MISC (initialized, but unknown scalar value) is basically > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > Do you have an example of the __builtin_memset() usage? > I tried passing partially initialized stack allocated structure to > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > complain. > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > instructions because after LDX you would get a full range register and > you can't do much with a full range value. However, if a structure > containing un-initialized fields (e.g. not just padding) is passed to > a helper or kfunc is it an error? if we are passing stack as a memory to helper/kfunc (which should be the only valid use case with STACK_MISC, right?), then I think we expect helper/kfunc to treat it as memory with unknowable contents. Not sure if I'm missing something, but MISC says it's some unknown value, and the only difference between INVALID and MISC is that MISC's value was written by program explicitly, while for INVALID that garbage value was there on the stack already (but still unknowable scalar), which effectively is the same thing. 
> > > Obviously, this is a completely separate change and issue from what > > you are addressing in this patch set. > > > > Awesome job on tracking this down and fixing it! For the patch set: > > Thank you for reviewing this issue with me. > > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org> > > > > > [...]
On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > [...] > > > > > > I'm wondering if we should consider allowing uninitialized > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > causing more issues than is actually helpful in practice. Common code > > > pattern is to __builtin_memset() some struct first, and only then > > > initialize it, basically doing unnecessary work of zeroing out. All > > > just to avoid verifier to complain about some irrelevant padding not > > > being initialized. I haven't thought about this much, but it feels > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > Do you have an example of the __builtin_memset() usage? > > I tried passing partially initialized stack allocated structure to > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > complain. > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > instructions because after LDX you would get a full range register and > > you can't do much with a full range value. However, if a structure > > containing un-initialized fields (e.g. not just padding) is passed to > > a helper or kfunc is it an error? > > if we are passing stack as a memory to helper/kfunc (which should be > the only valid use case with STACK_MISC, right?), then I think we > expect helper/kfunc to treat it as memory with unknowable contents. 
> Not sure if I'm missing something, but MISC says it's some unknown
> value, and the only difference between INVALID and MISC is that MISC's
> value was written by program explicitly, while for INVALID that
> garbage value was there on the stack already (but still unknowable
> scalar), which effectively is the same thing.

I looked through the places where STACK_INVALID is used; here is the list:

- unmark_stack_slots_dynptr()
  Destroys dynptr marks. Suppose STACK_INVALID is replaced by STACK_MISC
  here; in this case a scalar read would be possible from such a slot,
  which in turn might lead to a pointer leak. Might be a problem?

- scrub_spilled_slot()
  Marks a spill slot STACK_MISC if it is not STACK_INVALID.
  Called from:
  - save_register_state(), called from check_stack_write_fixed_off();
    marks only the slots not covered by the spill, i.e. only for writes
    narrower than 8 bytes (e.g. 32-bit writes);
  - check_stack_write_fixed_off(), for insns like `fp[-8] = <const>`,
    to destroy previous stack marks;
  - check_stack_range_initialized(), where it always marks all 8 spi
    slots as STACK_MISC.
  Looks like STACK_MISC instead of STACK_INVALID wouldn't make a
  difference in these cases.

- check_stack_write_fixed_off()
  Marks an insn as sanitize_stack_spill if a pointer is spilled to a
  stack slot that is marked STACK_INVALID. This one is a bit strange.
  E.g. a program like this:

      ...
      42: fp[-8] = ptr
      ...

  will mark insn (42) as sanitize_stack_spill.
  However, a program like this:

      ...
      21: fp[-8] = 22   ;; marks as STACK_MISC
      ...
      42: fp[-8] = ptr
      ...

  won't mark insn (42) as sanitize_stack_spill, which seems strange.

- stack_write_var_off()
  If !env->allow_ptr_leaks, only allows writes if slots are not
  STACK_INVALID. I'm not sure I understand the intention.

- clean_func_state()
  STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as
  ones that should not take part in the state comparison. However,
  stacksafe() has a REG_LIVE_READ check as well, so this marking might
  be unnecessary.
- stacksafe() STACK_INVALID is used as a mark that some bytes of an spi are not important in a state cached for state comparison. E.g. a slot in an old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in a new state. However other checks in stacksafe() would catch these variations. The conclusion being that some pointer leakage checks might need adjustment if STACK_INVALID is replaced by STACK_MISC. > > > > > > Obviously, this is a completely separate change and issue from what > > > you are addressing in this patch set. > > > > > > Awesome job on tracking this down and fixing it! For the patch set: > > > > Thank you for reviewing this issue with me. > > > > > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org> > > > > > > > > [...]
On Fri, Jan 13, 2023 at 4:10 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > > [...] > > > > > > > > I'm wondering if we should consider allowing uninitialized > > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > > causing more issues than is actually helpful in practice. Common code > > > > pattern is to __builtin_memset() some struct first, and only then > > > > initialize it, basically doing unnecessary work of zeroing out. All > > > > just to avoid verifier to complain about some irrelevant padding not > > > > being initialized. I haven't thought about this much, but it feels > > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > > > Do you have an example of the __builtin_memset() usage? > > > I tried passing partially initialized stack allocated structure to > > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > > complain. > > > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > > instructions because after LDX you would get a full range register and > > > you can't do much with a full range value. However, if a structure > > > containing un-initialized fields (e.g. not just padding) is passed to > > > a helper or kfunc is it an error? > > > > if we are passing stack as a memory to helper/kfunc (which should be > > the only valid use case with STACK_MISC, right?), then I think we > > expect helper/kfunc to treat it as memory with unknowable contents. 
> > Not sure if I'm missing something, but MISC says it's some unknown > > value, and the only difference between INVALID and MISC is that MISC's > > value was written by program explicitly, while for INVALID that > > garbage value was there on the stack already (but still unknowable > > scalar), which effectively is the same thing. > > I looked through the places where STACK_INVALID is used, here is the list: > > - unmark_stack_slots_dynptr() > Destroy dynptr marks. Suppose STACK_INVALID is replaced by > STACK_MISC here, in this case a scalar read would be possible from > such slot, which in turn might lead to pointer leak. > Might be a problem? We are already talking to enable reading STACK_DYNPTR slots directly. So not a problem? > > - scrub_spilled_slot() > mark spill slot STACK_MISC if not STACK_INVALID > Called from: > - save_register_state() called from check_stack_write_fixed_off() > Would mark not all slots only for 32-bit writes. > - check_stack_write_fixed_off() for insns like `fp[-8] = <const>` to > destroy previous stack marks. > - check_stack_range_initialized() > here it always marks all 8 spi slots as STACK_MISC. > Looks like STACK_MISC instead of STACK_INVALID wouldn't make a > difference in these cases. > > - check_stack_write_fixed_off() > Mark insn as sanitize_stack_spill if pointer is spilled to a stack > slot that is marked STACK_INVALID. This one is a bit strange. > E.g. the program like this: > > ... > 42: fp[-8] = ptr > ... > > Will mark insn (42) as sanitize_stack_spill. > However, the program like this: > > ... > 21: fp[-8] = 22 ;; marks as STACK_MISC > ... > 42: fp[-8] = ptr > ... > > Won't mark insn (42) as sanitize_stack_spill, which seems strange. > > - stack_write_var_off() > If !env->allow_ptr_leaks only allow writes if slots are not > STACK_INVALID. I'm not sure I understand the intention. 
> > - clean_func_state() > STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as > such that should not take part in the state comparison. However, > stacksafe() has REG_LIVE_READ check as well, so this marking might > be unnecessary. > > - stacksafe() > STACK_INVALID is used as a mark that some bytes of an spi are not > important in a state cached for state comparison. E.g. a slot in an > old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in > a new state. However other checks in stacksafe() would catch these > variations. > > The conclusion being that some pointer leakage checks might need > adjustment if STACK_INVALID is replaced by STACK_MISC. Just to be clear. My suggestion was to *treat* STACK_INVALID as equivalent to STACK_MISC in stacksafe(), not really replace all the uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd do it only if env->allow_ptr_leaks, of course. > > > > > > > > > > Obviously, this is a completely separate change and issue from what > > > > you are addressing in this patch set. > > > > > > > > Awesome job on tracking this down and fixing it! For the patch set: > > > > > > Thank you for reviewing this issue with me. > > > > > > > > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org> > > > > > > > > > > > [...] >
On Fri, Jan 13, 2023 at 5:17 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Fri, Jan 13, 2023 at 4:10 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > > > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > > > [...] > > > > > > > > > > I'm wondering if we should consider allowing uninitialized > > > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > > > causing more issues than is actually helpful in practice. Common code > > > > > pattern is to __builtin_memset() some struct first, and only then > > > > > initialize it, basically doing unnecessary work of zeroing out. All > > > > > just to avoid verifier to complain about some irrelevant padding not > > > > > being initialized. I haven't thought about this much, but it feels > > > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > > > > > Do you have an example of the __builtin_memset() usage? > > > > I tried passing partially initialized stack allocated structure to > > > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > > > complain. > > > > > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > > > instructions because after LDX you would get a full range register and > > > > you can't do much with a full range value. However, if a structure > > > > containing un-initialized fields (e.g. not just padding) is passed to > > > > a helper or kfunc is it an error? 
> > > > > > if we are passing stack as a memory to helper/kfunc (which should be > > > the only valid use case with STACK_MISC, right?), then I think we > > > expect helper/kfunc to treat it as memory with unknowable contents. > > > Not sure if I'm missing something, but MISC says it's some unknown > > > value, and the only difference between INVALID and MISC is that MISC's > > > value was written by program explicitly, while for INVALID that > > > garbage value was there on the stack already (but still unknowable > > > scalar), which effectively is the same thing. > > > > I looked through the places where STACK_INVALID is used, here is the list: > > > > - unmark_stack_slots_dynptr() > > Destroy dynptr marks. Suppose STACK_INVALID is replaced by > > STACK_MISC here, in this case a scalar read would be possible from > > such slot, which in turn might lead to pointer leak. > > Might be a problem? > > We are already talking to enable reading STACK_DYNPTR slots directly. > So not a problem? > > > > > - scrub_spilled_slot() > > mark spill slot STACK_MISC if not STACK_INVALID > > Called from: > > - save_register_state() called from check_stack_write_fixed_off() > > Would mark not all slots only for 32-bit writes. > > - check_stack_write_fixed_off() for insns like `fp[-8] = <const>` to > > destroy previous stack marks. > > - check_stack_range_initialized() > > here it always marks all 8 spi slots as STACK_MISC. > > Looks like STACK_MISC instead of STACK_INVALID wouldn't make a > > difference in these cases. > > > > - check_stack_write_fixed_off() > > Mark insn as sanitize_stack_spill if pointer is spilled to a stack > > slot that is marked STACK_INVALID. This one is a bit strange. > > E.g. the program like this: > > > > ... > > 42: fp[-8] = ptr > > ... > > > > Will mark insn (42) as sanitize_stack_spill. > > However, the program like this: > > > > ... > > 21: fp[-8] = 22 ;; marks as STACK_MISC > > ... > > 42: fp[-8] = ptr > > ... 
> > > > Won't mark insn (42) as sanitize_stack_spill, which seems strange. > > > > - stack_write_var_off() > > If !env->allow_ptr_leaks only allow writes if slots are not > > STACK_INVALID. I'm not sure I understand the intention. > > > > - clean_func_state() > > STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as > > such that should not take part in the state comparison. However, > > stacksafe() has REG_LIVE_READ check as well, so this marking might > > be unnecessary. > > > > - stacksafe() > > STACK_INVALID is used as a mark that some bytes of an spi are not > > important in a state cached for state comparison. E.g. a slot in an > > old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in > > a new state. However other checks in stacksafe() would catch these > > variations. > > > > The conclusion being that some pointer leakage checks might need > > adjustment if STACK_INVALID is replaced by STACK_MISC. > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > equivalent to STACK_MISC in stacksafe(), not really replace all the > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > do it only if env->allow_ptr_leaks, of course. Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in check_stack_read_fixed_off(), of course, to avoid "invalid read from stack off %d+%d size %d\n" error (that's fixing at least part of the problem with uninitialized struct padding). > > > > > > > > > > > > > > > Obviously, this is a completely separate change and issue from what > > > > > you are addressing in this patch set. > > > > > > > > > > Awesome job on tracking this down and fixing it! For the patch set: > > > > > > > > Thank you for reviewing this issue with me. > > > > > > > > > > > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org> > > > > > > > > > > > > > > [...] > >
Hello: This series was applied to bpf/bpf.git (master) by Alexei Starovoitov <ast@kernel.org>: On Fri, 6 Jan 2023 16:22:12 +0200 you wrote: > Struct bpf_reg_state is copied directly in several places including: > - check_stack_write_fixed_off() (via save_register_state()); > - check_stack_read_fixed_off(); > - find_equal_scalars(). > > However, a literal copy of this struct also copies the following fields: > > [...] Here is the summary with links: - [bpf-next,v2,1/2] bpf: Fix to preserve reg parent/live fields when copying range info https://git.kernel.org/bpf/bpf/c/71f656a50176 - [bpf-next,v2,2/2] selftests/bpf: Verify copy_register_state() preserves parent/live fields https://git.kernel.org/bpf/bpf/c/b9fa9bc83929 You are awesome, thank you!
On Fri, Jan 13, 2023 at 5:31 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Fri, Jan 13, 2023 at 5:17 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Fri, Jan 13, 2023 at 4:10 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > > > > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > > > > [...] > > > > > > > > > > > > I'm wondering if we should consider allowing uninitialized > > > > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > > > > causing more issues than is actually helpful in practice. Common code > > > > > > pattern is to __builtin_memset() some struct first, and only then > > > > > > initialize it, basically doing unnecessary work of zeroing out. All > > > > > > just to avoid verifier to complain about some irrelevant padding not > > > > > > being initialized. I haven't thought about this much, but it feels > > > > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > > > > > > > Do you have an example of the __builtin_memset() usage? > > > > > I tried passing partially initialized stack allocated structure to > > > > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > > > > complain. > > > > > > > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > > > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > > > > instructions because after LDX you would get a full range register and > > > > > you can't do much with a full range value. However, if a structure > > > > > containing un-initialized fields (e.g. not just padding) is passed to > > > > > a helper or kfunc is it an error? 
> > > > > > > > if we are passing stack as a memory to helper/kfunc (which should be > > > > the only valid use case with STACK_MISC, right?), then I think we > > > > expect helper/kfunc to treat it as memory with unknowable contents. > > > > Not sure if I'm missing something, but MISC says it's some unknown > > > > value, and the only difference between INVALID and MISC is that MISC's > > > > value was written by program explicitly, while for INVALID that > > > > garbage value was there on the stack already (but still unknowable > > > > scalar), which effectively is the same thing. > > > > > > I looked through the places where STACK_INVALID is used, here is the list: > > > > > > - unmark_stack_slots_dynptr() > > > Destroy dynptr marks. Suppose STACK_INVALID is replaced by > > > STACK_MISC here, in this case a scalar read would be possible from > > > such slot, which in turn might lead to pointer leak. > > > Might be a problem? > > > > We are already talking to enable reading STACK_DYNPTR slots directly. > > So not a problem? > > > > > > > > - scrub_spilled_slot() > > > mark spill slot STACK_MISC if not STACK_INVALID > > > Called from: > > > - save_register_state() called from check_stack_write_fixed_off() > > > Would mark not all slots only for 32-bit writes. > > > - check_stack_write_fixed_off() for insns like `fp[-8] = <const>` to > > > destroy previous stack marks. > > > - check_stack_range_initialized() > > > here it always marks all 8 spi slots as STACK_MISC. > > > Looks like STACK_MISC instead of STACK_INVALID wouldn't make a > > > difference in these cases. > > > > > > - check_stack_write_fixed_off() > > > Mark insn as sanitize_stack_spill if pointer is spilled to a stack > > > slot that is marked STACK_INVALID. This one is a bit strange. > > > E.g. the program like this: > > > > > > ... > > > 42: fp[-8] = ptr > > > ... > > > > > > Will mark insn (42) as sanitize_stack_spill. > > > However, the program like this: > > > > > > ... 
> > > 21: fp[-8] = 22 ;; marks as STACK_MISC > > > ... > > > 42: fp[-8] = ptr > > > ... > > > > > > Won't mark insn (42) as sanitize_stack_spill, which seems strange. > > > > > > - stack_write_var_off() > > > If !env->allow_ptr_leaks only allow writes if slots are not > > > STACK_INVALID. I'm not sure I understand the intention. > > > > > > - clean_func_state() > > > STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as > > > such that should not take part in the state comparison. However, > > > stacksafe() has REG_LIVE_READ check as well, so this marking might > > > be unnecessary. > > > > > > - stacksafe() > > > STACK_INVALID is used as a mark that some bytes of an spi are not > > > important in a state cached for state comparison. E.g. a slot in an > > > old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in > > > a new state. However other checks in stacksafe() would catch these > > > variations. > > > > > > The conclusion being that some pointer leakage checks might need > > > adjustment if STACK_INVALID is replaced by STACK_MISC. > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > > equivalent to STACK_MISC in stacksafe(), not really replace all the > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > > do it only if env->allow_ptr_leaks, of course. > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in > check_stack_read_fixed_off(), of course, to avoid "invalid read from > stack off %d+%d size %d\n" error (that's fixing at least part of the > problem with uninitialized struct padding). +1 to Andrii's idea. It should help us recover this small increase in processed states. Eduard, The fix itself is brilliant. Thank you for investigating and providing the detailed explanation. I've read this thread and the previous one, walked through all the points and it all looks correct. Sorry it took me a long time to remember the details of liveness logic to review it properly. 
While you, Andrii, and I keep this tricky knowledge in our heads, could
you please document how liveness works in Documentation/bpf/verifier.rst?
We'll be able to review it now, and next time it will be easier to remember.

I've tried Andrii's suggestion:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7ee218827259..0f71ba6a56e2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 
 			copy_register_state(&state->regs[dst_regno], reg);
 			state->regs[dst_regno].subreg_def = subreg_def;
 		} else {
-			for (i = 0; i < size; i++) {
+			for (i = 0; i < size && !env->allow_uninit_stack; i++) {
 				type = stype[(slot - i) % BPF_REG_SIZE];
 				if (type == STACK_SPILL)
 					continue;
@@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 		}
 		mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
 	} else {
-		for (i = 0; i < size; i++) {
+		for (i = 0; i < size && !env->allow_uninit_stack; i++) {
 			type = stype[(slot - i) % BPF_REG_SIZE];
 			if (type == STACK_MISC)
 				continue;
@@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 		if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
 			continue;
 
+		if (env->allow_uninit_stack &&
+		    old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
+			continue;

and only dynptr/invalid_read[134] tests failed, which is expected and
acceptable. We can tweak those tests.

Could you take over this diff, run veristat analysis and
submit it as an official patch? I suspect we should see nice
improvements in states processed.

Thanks!
On Thu, Jan 19, 2023 at 3:52 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Jan 13, 2023 at 5:31 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Fri, Jan 13, 2023 at 5:17 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Fri, Jan 13, 2023 at 4:10 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > > > > > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > > > > > [...] > > > > > > > > > > > > > > I'm wondering if we should consider allowing uninitialized > > > > > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > > > > > causing more issues than is actually helpful in practice. Common code > > > > > > > pattern is to __builtin_memset() some struct first, and only then > > > > > > > initialize it, basically doing unnecessary work of zeroing out. All > > > > > > > just to avoid verifier to complain about some irrelevant padding not > > > > > > > being initialized. I haven't thought about this much, but it feels > > > > > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > > > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > > > > > > > > > Do you have an example of the __builtin_memset() usage? > > > > > > I tried passing partially initialized stack allocated structure to > > > > > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > > > > > complain. > > > > > > > > > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > > > > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > > > > > instructions because after LDX you would get a full range register and > > > > > > you can't do much with a full range value. 
However, if a structure > > > > > > containing un-initialized fields (e.g. not just padding) is passed to > > > > > > a helper or kfunc is it an error? > > > > > > > > > > if we are passing stack as a memory to helper/kfunc (which should be > > > > > the only valid use case with STACK_MISC, right?), then I think we > > > > > expect helper/kfunc to treat it as memory with unknowable contents. > > > > > Not sure if I'm missing something, but MISC says it's some unknown > > > > > value, and the only difference between INVALID and MISC is that MISC's > > > > > value was written by program explicitly, while for INVALID that > > > > > garbage value was there on the stack already (but still unknowable > > > > > scalar), which effectively is the same thing. > > > > > > > > I looked through the places where STACK_INVALID is used, here is the list: > > > > > > > > - unmark_stack_slots_dynptr() > > > > Destroy dynptr marks. Suppose STACK_INVALID is replaced by > > > > STACK_MISC here, in this case a scalar read would be possible from > > > > such slot, which in turn might lead to pointer leak. > > > > Might be a problem? > > > > > > We are already talking to enable reading STACK_DYNPTR slots directly. > > > So not a problem? > > > > > > > > > > > - scrub_spilled_slot() > > > > mark spill slot STACK_MISC if not STACK_INVALID > > > > Called from: > > > > - save_register_state() called from check_stack_write_fixed_off() > > > > Would mark not all slots only for 32-bit writes. > > > > - check_stack_write_fixed_off() for insns like `fp[-8] = <const>` to > > > > destroy previous stack marks. > > > > - check_stack_range_initialized() > > > > here it always marks all 8 spi slots as STACK_MISC. > > > > Looks like STACK_MISC instead of STACK_INVALID wouldn't make a > > > > difference in these cases. > > > > > > > > - check_stack_write_fixed_off() > > > > Mark insn as sanitize_stack_spill if pointer is spilled to a stack > > > > slot that is marked STACK_INVALID. 
This one is a bit strange. > > > > E.g. the program like this: > > > > > > > > ... > > > > 42: fp[-8] = ptr > > > > ... > > > > > > > > Will mark insn (42) as sanitize_stack_spill. > > > > However, the program like this: > > > > > > > > ... > > > > 21: fp[-8] = 22 ;; marks as STACK_MISC > > > > ... > > > > 42: fp[-8] = ptr > > > > ... > > > > > > > > Won't mark insn (42) as sanitize_stack_spill, which seems strange. > > > > > > > > - stack_write_var_off() > > > > If !env->allow_ptr_leaks only allow writes if slots are not > > > > STACK_INVALID. I'm not sure I understand the intention. > > > > > > > > - clean_func_state() > > > > STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as > > > > such that should not take part in the state comparison. However, > > > > stacksafe() has REG_LIVE_READ check as well, so this marking might > > > > be unnecessary. > > > > > > > > - stacksafe() > > > > STACK_INVALID is used as a mark that some bytes of an spi are not > > > > important in a state cached for state comparison. E.g. a slot in an > > > > old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in > > > > a new state. However other checks in stacksafe() would catch these > > > > variations. > > > > > > > > The conclusion being that some pointer leakage checks might need > > > > adjustment if STACK_INVALID is replaced by STACK_MISC. > > > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > > > equivalent to STACK_MISC in stacksafe(), not really replace all the > > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > > > do it only if env->allow_ptr_leaks, of course. > > > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in > > check_stack_read_fixed_off(), of course, to avoid "invalid read from > > stack off %d+%d size %d\n" error (that's fixing at least part of the > > problem with uninitialized struct padding). > > +1 to Andrii's idea. 
> It should help us recover this small increase in processed states. > > Eduard, > > The fix itself is brilliant. Thank you for investigating > and providing the detailed explanation. > I've read this thread and the previous one, > walked through all the points and it all looks correct. > Sorry it took me a long time to remember the details > of liveness logic to review it properly. > > While you, Andrii and me keep this tricky knowledge in our > heads could you please document how liveness works in > Documentation/bpf/verifier.rst ? > We'll be able to review it now and next time it will be > easier to remember. > > I've tried Andrii's suggestion: > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 7ee218827259..0f71ba6a56e2 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct > bpf_verifier_env *env, > > copy_register_state(&state->regs[dst_regno], reg); > state->regs[dst_regno].subreg_def = subreg_def; > } else { > - for (i = 0; i < size; i++) { > + for (i = 0; i < size && > !env->allow_uninit_stack; i++) { > type = stype[(slot - i) % BPF_REG_SIZE]; > if (type == STACK_SPILL) > continue; > @@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct > bpf_verifier_env *env, > } > mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64); > } else { > - for (i = 0; i < size; i++) { > + for (i = 0; i < size && !env->allow_uninit_stack; i++) { > type = stype[(slot - i) % BPF_REG_SIZE]; > if (type == STACK_MISC) > continue; > @@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env > *env, struct bpf_func_state *old, > if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == > STACK_INVALID) > continue; > > + if (env->allow_uninit_stack && > + old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC) > + continue; > > and only dynptr/invalid_read[134] tests failed > which is expected and acceptable. > We can tweak those tests. 
>
> Could you take over this diff, run veristat analysis and
> submit it as an official patch? I suspect we should see nice
> improvements in states processed.

Indeed, some massive improvements:

./veristat -e file,prog,states -C -f 'states_diff<-10' bb aa
File                              Program                  States (A)  States (B)  States (DIFF)
--------------------------------  -----------------------  ----------  ----------  ----------------
bpf_flow.bpf.o                    flow_dissector_0                 78          67    -11 (-14.10%)
loop6.bpf.o                       trace_virtqueue_add_sgs         336         316    -20 (-5.95%)
pyperf100.bpf.o                   on_event                       6213        4670  -1543 (-24.84%)
pyperf180.bpf.o                   on_event                      11470        8364  -3106 (-27.08%)
pyperf50.bpf.o                    on_event                       3263        2370   -893 (-27.37%)
pyperf600.bpf.o                   on_event                      30335       22200  -8135 (-26.82%)
pyperf600_bpf_loop.bpf.o          on_event                        287         145   -142 (-49.48%)
pyperf600_nounroll.bpf.o          on_event                      37101       34169  -2932 (-7.90%)
strobemeta.bpf.o                  on_event                      15939        4893  -11046 (-69.30%)
strobemeta_nounroll1.bpf.o        on_event                       1936        1538   -398 (-20.56%)
strobemeta_nounroll2.bpf.o        on_event                       4436        3991   -445 (-10.03%)
strobemeta_subprogs.bpf.o         on_event                       2025        1689   -336 (-16.59%)
test_cls_redirect.bpf.o           cls_redirect                   4865        4042   -823 (-16.92%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                   4506        4389   -117 (-2.60%)
test_tcp_hdr_options.bpf.o        estab                           211         178    -33 (-15.64%)
test_xdp_noinline.bpf.o           balancer_ingress_v4             262         235    -27 (-10.31%)
test_xdp_noinline.bpf.o           balancer_ingress_v6             253         210    -43 (-17.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                  25086        7016  -18070 (-72.03%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                 24206        6941  -17265 (-71.33%)
--------------------------------  -----------------------  ----------  ----------  ----------------
On Thu, 2023-01-19 at 15:52 -0800, Alexei Starovoitov wrote: > On Fri, Jan 13, 2023 at 5:31 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Fri, Jan 13, 2023 at 5:17 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Fri, Jan 13, 2023 at 4:10 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > On Fri, 2023-01-13 at 14:22 -0800, Andrii Nakryiko wrote: > > > > > On Fri, Jan 13, 2023 at 12:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > > > > > On Wed, 2023-01-11 at 16:24 -0800, Andrii Nakryiko wrote: > > > > > > [...] > > > > > > > > > > > > > > I'm wondering if we should consider allowing uninitialized > > > > > > > (STACK_INVALID) reads from stack, in general. It feels like it's > > > > > > > causing more issues than is actually helpful in practice. Common code > > > > > > > pattern is to __builtin_memset() some struct first, and only then > > > > > > > initialize it, basically doing unnecessary work of zeroing out. All > > > > > > > just to avoid verifier to complain about some irrelevant padding not > > > > > > > being initialized. I haven't thought about this much, but it feels > > > > > > > that STACK_MISC (initialized, but unknown scalar value) is basically > > > > > > > equivalent to STACK_INVALID for all intents and purposes. Thoughts? > > > > > > > > > > > > Do you have an example of the __builtin_memset() usage? > > > > > > I tried passing partially initialized stack allocated structure to > > > > > > bpf_map_update_elem() and bpf_probe_write_user() and verifier did not > > > > > > complain. > > > > > > > > > > > > Regarding STACK_MISC vs STACK_INVALID, I think it's ok to replace > > > > > > STACK_INVALID with STACK_MISC if we are talking about STX/LDX/ALU > > > > > > instructions because after LDX you would get a full range register and > > > > > > you can't do much with a full range value. However, if a structure > > > > > > containing un-initialized fields (e.g. 
not just padding) is passed to > > > > > > a helper or kfunc is it an error? > > > > > > > > > > if we are passing stack as a memory to helper/kfunc (which should be > > > > > the only valid use case with STACK_MISC, right?), then I think we > > > > > expect helper/kfunc to treat it as memory with unknowable contents. > > > > > Not sure if I'm missing something, but MISC says it's some unknown > > > > > value, and the only difference between INVALID and MISC is that MISC's > > > > > value was written by program explicitly, while for INVALID that > > > > > garbage value was there on the stack already (but still unknowable > > > > > scalar), which effectively is the same thing. > > > > > > > > I looked through the places where STACK_INVALID is used, here is the list: > > > > > > > > - unmark_stack_slots_dynptr() > > > > Destroy dynptr marks. Suppose STACK_INVALID is replaced by > > > > STACK_MISC here, in this case a scalar read would be possible from > > > > such slot, which in turn might lead to pointer leak. > > > > Might be a problem? > > > > > > We are already talking to enable reading STACK_DYNPTR slots directly. > > > So not a problem? > > > > > > > > > > > - scrub_spilled_slot() > > > > mark spill slot STACK_MISC if not STACK_INVALID > > > > Called from: > > > > - save_register_state() called from check_stack_write_fixed_off() > > > > Would mark not all slots only for 32-bit writes. > > > > - check_stack_write_fixed_off() for insns like `fp[-8] = <const>` to > > > > destroy previous stack marks. > > > > - check_stack_range_initialized() > > > > here it always marks all 8 spi slots as STACK_MISC. > > > > Looks like STACK_MISC instead of STACK_INVALID wouldn't make a > > > > difference in these cases. > > > > > > > > - check_stack_write_fixed_off() > > > > Mark insn as sanitize_stack_spill if pointer is spilled to a stack > > > > slot that is marked STACK_INVALID. This one is a bit strange. > > > > E.g. the program like this: > > > > > > > > ... 
> > > > 42: fp[-8] = ptr > > > > ... > > > > > > > > Will mark insn (42) as sanitize_stack_spill. > > > > However, the program like this: > > > > > > > > ... > > > > 21: fp[-8] = 22 ;; marks as STACK_MISC > > > > ... > > > > 42: fp[-8] = ptr > > > > ... > > > > > > > > Won't mark insn (42) as sanitize_stack_spill, which seems strange. > > > > > > > > - stack_write_var_off() > > > > If !env->allow_ptr_leaks only allow writes if slots are not > > > > STACK_INVALID. I'm not sure I understand the intention. > > > > > > > > - clean_func_state() > > > > STACK_INVALID is used to mark spi's that are not REG_LIVE_READ as > > > > such that should not take part in the state comparison. However, > > > > stacksafe() has REG_LIVE_READ check as well, so this marking might > > > > be unnecessary. > > > > > > > > - stacksafe() > > > > STACK_INVALID is used as a mark that some bytes of an spi are not > > > > important in a state cached for state comparison. E.g. a slot in an > > > > old state might be marked 'mmmm????' and 'mmmmmmmm' or 'mmmm0000' in > > > > a new state. However other checks in stacksafe() would catch these > > > > variations. > > > > > > > > The conclusion being that some pointer leakage checks might need > > > > adjustment if STACK_INVALID is replaced by STACK_MISC. > > > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > > > equivalent to STACK_MISC in stacksafe(), not really replace all the > > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > > > do it only if env->allow_ptr_leaks, of course. > > > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in > > check_stack_read_fixed_off(), of course, to avoid "invalid read from > > stack off %d+%d size %d\n" error (that's fixing at least part of the > > problem with uninitialized struct padding). > > +1 to Andrii's idea. > It should help us recover this small increase in processed states. > > Eduard, > > The fix itself is brilliant. 
Thank you for investigating > and providing the detailed explanation. > I've read this thread and the previous one, > walked through all the points and it all looks correct. > Sorry it took me a long time to remember the details > of liveness logic to review it properly. > > While you, Andrii and me keep this tricky knowledge in our > heads could you please document how liveness works in > Documentation/bpf/verifier.rst ? > We'll be able to review it now and next time it will be > easier to remember. I'll extend the doc and finalize your patch over the weekend, thank you for the review. > > I've tried Andrii's suggestion: > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 7ee218827259..0f71ba6a56e2 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct > bpf_verifier_env *env, > > copy_register_state(&state->regs[dst_regno], reg); > state->regs[dst_regno].subreg_def = subreg_def; > } else { > - for (i = 0; i < size; i++) { > + for (i = 0; i < size && > !env->allow_uninit_stack; i++) { > type = stype[(slot - i) % BPF_REG_SIZE]; > if (type == STACK_SPILL) > continue; > @@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct > bpf_verifier_env *env, > } > mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64); > } else { > - for (i = 0; i < size; i++) { > + for (i = 0; i < size && !env->allow_uninit_stack; i++) { > type = stype[(slot - i) % BPF_REG_SIZE]; > if (type == STACK_MISC) > continue; > @@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env > *env, struct bpf_func_state *old, > if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == > STACK_INVALID) > continue; > > + if (env->allow_uninit_stack && > + old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC) > + continue; > > and only dynptr/invalid_read[134] tests failed > which is expected and acceptable. > We can tweak those tests. 
> > Could you take over this diff, run veristat analysis and > submit it as an official patch? I suspect we should see nice > improvements in states processed. > > Thanks!
On Thu, 2023-01-19 at 16:16 -0800, Alexei Starovoitov wrote: [...] > > > > > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > > > > equivalent to STACK_MISC in stacksafe(), not really replace all the > > > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > > > > do it only if env->allow_ptr_leaks, of course. > > > > > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in > > > check_stack_read_fixed_off(), of course, to avoid "invalid read from > > > stack off %d+%d size %d\n" error (that's fixing at least part of the > > > problem with uninitialized struct padding). > > > > +1 to Andrii's idea. > > It should help us recover this small increase in processed states. > > [...] > > > > I've tried Andrii's suggestion: > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 7ee218827259..0f71ba6a56e2 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct > > bpf_verifier_env *env, > > > > copy_register_state(&state->regs[dst_regno], reg); > > state->regs[dst_regno].subreg_def = subreg_def; > > } else { > > - for (i = 0; i < size; i++) { > > + for (i = 0; i < size && > > !env->allow_uninit_stack; i++) { > > type = stype[(slot - i) % BPF_REG_SIZE]; > > if (type == STACK_SPILL) > > continue; > > @@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct > > bpf_verifier_env *env, > > } > > mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64); > > } else { > > - for (i = 0; i < size; i++) { > > + for (i = 0; i < size && !env->allow_uninit_stack; i++) { > > type = stype[(slot - i) % BPF_REG_SIZE]; > > if (type == STACK_MISC) > > continue; > > @@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env > > *env, struct bpf_func_state *old, > > if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == > > STACK_INVALID) > > continue; > > > > + if (env->allow_uninit_stack && > > + 
old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
> > +			continue;
> >
> > and only dynptr/invalid_read[134] tests failed
> > which is expected and acceptable.
> > We can tweak those tests.
> >
> > Could you take over this diff, run veristat analysis and
> > submit it as an official patch? I suspect we should see nice
> > improvements in states processed.
>

Hi Alexei, Andrii,

Please note that the patch
"bpf: Fix to preserve reg parent/live fields when copying range info"
that started this conversation was applied to the `bpf` tree, not `bpf-next`,
so I'll wait until it makes its way to `bpf-next` before submitting formal
patches, as it changes the performance numbers collected by veristat.
I did all my experiments with this patch applied on top of `bpf-next`.

I adapted the patch suggested by Alexei and put it on my github for
now [1]. The performance gains are indeed significant:

$ ./veristat -e file,states -C -f 'states_pct<-30' master.log uninit-reads.log
File                        States (A)  States (B)  States (DIFF)
--------------------------  ----------  ----------  ----------------
bpf_host.o                         349         244    -105 (-30.09%)
bpf_host.o                        1320         895    -425 (-32.20%)
bpf_lxc.o                         1320         895    -425 (-32.20%)
bpf_sock.o                          70          48     -22 (-31.43%)
bpf_sock.o                          68          46     -22 (-32.35%)
bpf_xdp.o                         1554         803    -751 (-48.33%)
bpf_xdp.o                         6457        2473   -3984 (-61.70%)
bpf_xdp.o                         7249        3908   -3341 (-46.09%)
pyperf600_bpf_loop.bpf.o           287         145    -142 (-49.48%)
strobemeta.bpf.o                 15879        4790  -11089 (-69.83%)
strobemeta_nounroll2.bpf.o       20505        3931  -16574 (-80.83%)
xdp_synproxy_kern.bpf.o          22564        7009  -15555 (-68.94%)
xdp_synproxy_kern.bpf.o          24206        6941  -17265 (-71.33%)
--------------------------  ----------  ----------  ----------------

However, this comes at the cost of allowing reads from uninitialized
stack locations. As far as I understand, access to an uninitialized local
variable is one of the most common errors when programming in C
(although a citation is needed).
Also, more tests are failing after the register parentage chains patch is
applied than in Alexei's initial try: 10 verifier tests and 1 progs
test (test_global_func10.c, which I have not modified yet; it should wait
for my changes for unprivileged execution mode support in
test_loader.c). I don't really like how I had to fix those tests.

I took a detailed look at the difference in verifier behavior between
master and the branch [1] for pyperf600_bpf_loop.bpf.o and identified
that the difference is caused by the fact that helper functions do not
mark the stack they access as REG_LIVE_WRITTEN. The details are in the
commit message [3], but the TLDR is the following example:

1: bpf_probe_read_user(&foo, ...);
2: if (*foo) ...

Here `*foo` will not get a REG_LIVE_WRITTEN mark when (1) is verified,
thus the `*foo` read at (2) might lead to excessive REG_LIVE_READ marks
and thus more verification states.

I prepared a patch that changes helper call verification to apply
REG_LIVE_WRITTEN when write size and alignment allow this, again
currently on my github [2]. This patch has a less dramatic performance
impact, but it is nonetheless significant:

$ veristat -e file,states -C -f 'states_pct<-30' master.log helpers-written.log
File                        States (A)  States (B)  States (DIFF)
--------------------------  ----------  ----------  ----------------
pyperf600_bpf_loop.bpf.o           287         156    -131 (-45.64%)
strobemeta.bpf.o                 15879        4772  -11107 (-69.95%)
strobemeta_nounroll1.bpf.o        2065        1337    -728 (-35.25%)
strobemeta_nounroll2.bpf.o       20505        3788  -16717 (-81.53%)
test_cls_redirect.bpf.o           8129        4799   -3330 (-40.96%)
--------------------------  ----------  ----------  ----------------

I suggest that instead of dropping a useful safety check I can further
investigate the difference in behavior between "uninit-reads.log" and
"helpers-written.log" and maybe figure out other improvements.
Unfortunately, the comparison process is extremely time consuming.

wdyt?
[1] https://github.com/eddyz87/bpf/tree/allow-uninit-stack-reads
[2] https://github.com/eddyz87/bpf/tree/mark-helper-stack-as-written
[3] https://github.com/kernel-patches/bpf/commit/b29842309271c21cbcb3f85d56cdf9f45f8382d2

> Indeed, some massive improvements:
>
> ./veristat -e file,prog,states -C -f 'states_diff<-10' bb aa
> File                              Program                  States (A)  States (B)  States (DIFF)
> --------------------------------  -----------------------  ----------  ----------  ----------------
> bpf_flow.bpf.o                    flow_dissector_0                 78          67    -11 (-14.10%)
> loop6.bpf.o                       trace_virtqueue_add_sgs         336         316    -20 (-5.95%)
> pyperf100.bpf.o                   on_event                       6213        4670  -1543 (-24.84%)
> pyperf180.bpf.o                   on_event                      11470        8364  -3106 (-27.08%)
> pyperf50.bpf.o                    on_event                       3263        2370   -893 (-27.37%)
> pyperf600.bpf.o                   on_event                      30335       22200  -8135 (-26.82%)
> pyperf600_bpf_loop.bpf.o          on_event                        287         145   -142 (-49.48%)
> pyperf600_nounroll.bpf.o          on_event                      37101       34169  -2932 (-7.90%)
> strobemeta.bpf.o                  on_event                      15939        4893  -11046 (-69.30%)
> strobemeta_nounroll1.bpf.o        on_event                       1936        1538   -398 (-20.56%)
> strobemeta_nounroll2.bpf.o        on_event                       4436        3991   -445 (-10.03%)
> strobemeta_subprogs.bpf.o         on_event                       2025        1689   -336 (-16.59%)
> test_cls_redirect.bpf.o           cls_redirect                   4865        4042   -823 (-16.92%)
> test_cls_redirect_subprogs.bpf.o  cls_redirect                   4506        4389   -117 (-2.60%)
> test_tcp_hdr_options.bpf.o        estab                           211         178    -33 (-15.64%)
> test_xdp_noinline.bpf.o           balancer_ingress_v4             262         235    -27 (-10.31%)
> test_xdp_noinline.bpf.o           balancer_ingress_v6             253         210    -43 (-17.00%)
> xdp_synproxy_kern.bpf.o           syncookie_tc                  25086        7016  -18070 (-72.03%)
> xdp_synproxy_kern.bpf.o           syncookie_xdp                 24206        6941  -17265 (-71.33%)
> --------------------------------  -----------------------  ----------  ----------  ----------------
On Mon, Jan 30, 2023 at 7:34 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Thu, 2023-01-19 at 16:16 -0800, Alexei Starovoitov wrote: > [...] > > > > > > > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as > > > > > equivalent to STACK_MISC in stacksafe(), not really replace all the > > > > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd > > > > > do it only if env->allow_ptr_leaks, of course. > > > > > > > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in > > > > check_stack_read_fixed_off(), of course, to avoid "invalid read from > > > > stack off %d+%d size %d\n" error (that's fixing at least part of the > > > > problem with uninitialized struct padding). > > > > > > +1 to Andrii's idea. > > > It should help us recover this small increase in processed states. > > > > [...] > > > > > > I've tried Andrii's suggestion: > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index 7ee218827259..0f71ba6a56e2 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct > > > bpf_verifier_env *env, > > > > > > copy_register_state(&state->regs[dst_regno], reg); > > > state->regs[dst_regno].subreg_def = subreg_def; > > > } else { > > > - for (i = 0; i < size; i++) { > > > + for (i = 0; i < size && > > > !env->allow_uninit_stack; i++) { > > > type = stype[(slot - i) % BPF_REG_SIZE]; > > > if (type == STACK_SPILL) > > > continue; > > > @@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct > > > bpf_verifier_env *env, > > > } > > > mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64); > > > } else { > > > - for (i = 0; i < size; i++) { > > > + for (i = 0; i < size && !env->allow_uninit_stack; i++) { > > > type = stype[(slot - i) % BPF_REG_SIZE]; > > > if (type == STACK_MISC) > > > continue; > > > @@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env > > > *env, struct 
bpf_func_state *old, > > > if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == > > > STACK_INVALID) > > > continue; > > > > > > + if (env->allow_uninit_stack && > > > + old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC) > > > + continue; > > > > > > and only dynptr/invalid_read[134] tests failed > > > which is expected and acceptable. > > > We can tweak those tests. > > > > > > Could you take over this diff, run veristat analysis and > > > submit it as an official patch? I suspect we should see nice > > > improvements in states processed. > > > > Hi Alexei, Andrii, > > Please note that the patch > "bpf: Fix to preserve reg parent/live fields when copying range info" > that started this conversation was applied to `bpf` tree, not `bpf-next`, > so I'll wait until it gets its way to `bpf-next` before submitting formal > patches, as it changes the performance numbers collected by veristat. > I did all my experiments with this patch applied on top of `bpf-next`. > > I adapted the patch suggested by Alexei and put it to my github for > now [1]. 
The performance gains are indeed significant:
>
> $ ./veristat -e file,states -C -f 'states_pct<-30' master.log uninit-reads.log
> File                        States (A)  States (B)  States (DIFF)
> --------------------------  ----------  ----------  ----------------
> bpf_host.o                         349         244    -105 (-30.09%)
> bpf_host.o                        1320         895    -425 (-32.20%)
> bpf_lxc.o                         1320         895    -425 (-32.20%)
> bpf_sock.o                          70          48     -22 (-31.43%)
> bpf_sock.o                          68          46     -22 (-32.35%)
> bpf_xdp.o                         1554         803    -751 (-48.33%)
> bpf_xdp.o                         6457        2473   -3984 (-61.70%)
> bpf_xdp.o                         7249        3908   -3341 (-46.09%)
> pyperf600_bpf_loop.bpf.o           287         145    -142 (-49.48%)
> strobemeta.bpf.o                 15879        4790  -11089 (-69.83%)
> strobemeta_nounroll2.bpf.o       20505        3931  -16574 (-80.83%)
> xdp_synproxy_kern.bpf.o          22564        7009  -15555 (-68.94%)
> xdp_synproxy_kern.bpf.o          24206        6941  -17265 (-71.33%)
> --------------------------  ----------  ----------  ----------------
>
> However, this comes at a cost of allowing reads from uninitialized
> stack locations. As far as I understand access to uninitialized local
> variable is one of the most common errors when programming in C
> (although citation is needed).

Yeah, a citation is really needed :) I don't see this often in practice,
tbh. What I do see in practice is that people unnecessarily
__builtin_memset(0) a struct and then initialize all fields with
field-by-field initialization, instead of just using a nice C syntax:

struct my_struct s = {
	.field_a = 123,
	.field_b = 234,
};

And all that just because there is some padding between field_a and
field_b which the compiler won't zero-initialize.

>
> Also more tests are failing after register parentage chains patch is
> applied than in Alexei's initial try: 10 verifier tests and 1 progs
> test (test_global_func10.c, I have not modified it yet, it should wait
> for my changes for unprivileged execution mode support in
> test_loader.c). I don't really like how I had to fix those tests.
> > I took a detailed look at the difference in verifier behavior between > master and the branch [1] for pyperf600_bpf_loop.bpf.o and identified > that the difference is caused by the fact that helper functions do not > mark the stack they access as REG_LIVE_WRITTEN, the details are in the > commit message [3], but TLDR is the following example: > > 1: bpf_probe_read_user(&foo, ...); > 2: if (*foo) ... > > Here `*foo` will not get REG_LIVE_WRITTEN mark when (1) is verified, > thus `*foo` read at (2) might lead to excessive REG_LIVE_READ marks > and thus more verification states. This is a good fix in its own right, of course, we should definitely do this! > > I prepared a patch that changes helper calls verification to apply > REG_LIVE_WRITTEN when write size and alignment allow this, again > currently on my github [2]. This patch has less dramatic performance > impact, but nonetheless significant: > > $ veristat -e file,states -C -f 'states_pct<-30' master.log helpers-written.log > File States (A) States (B) States (DIFF) > -------------------------- ---------- ---------- ---------------- > pyperf600_bpf_loop.bpf.o 287 156 -131 (-45.64%) > strobemeta.bpf.o 15879 4772 -11107 (-69.95%) > strobemeta_nounroll1.bpf.o 2065 1337 -728 (-35.25%) > strobemeta_nounroll2.bpf.o 20505 3788 -16717 (-81.53%) > test_cls_redirect.bpf.o 8129 4799 -3330 (-40.96%) > -------------------------- ---------- ---------- ---------------- > > I suggest that instead of dropping a useful safety check I can further > investigate difference in behavior between "uninit-reads.log" and > "helpers-written.log" and maybe figure out other improvements. > Unfortunately the comparison process is extremely time consuming. > > wdyt? I think reading uninitialized stack slot concerns are overblown in practice (in terms of their good implications for programmer's productivity), I'd still do it if only in the name of improving user experience. 
>
> [1] https://github.com/eddyz87/bpf/tree/allow-uninit-stack-reads
> [2] https://github.com/eddyz87/bpf/tree/mark-helper-stack-as-written
> [3] https://github.com/kernel-patches/bpf/commit/b29842309271c21cbcb3f85d56cdf9f45f8382d2
>
> > Indeed, some massive improvements:
> > ./veristat -e file,prog,states -C -f 'states_diff<-10' bb aa
> > File                              Program                  States (A)  States (B)  States (DIFF)
> > --------------------------------  -----------------------  ----------  ----------  ----------------
> > bpf_flow.bpf.o                    flow_dissector_0                 78          67     -11 (-14.10%)
> > loop6.bpf.o                       trace_virtqueue_add_sgs         336         316     -20 (-5.95%)
> > pyperf100.bpf.o                   on_event                       6213        4670   -1543 (-24.84%)
> > pyperf180.bpf.o                   on_event                      11470        8364   -3106 (-27.08%)
> > pyperf50.bpf.o                    on_event                       3263        2370    -893 (-27.37%)
> > pyperf600.bpf.o                   on_event                      30335       22200   -8135 (-26.82%)
> > pyperf600_bpf_loop.bpf.o          on_event                        287         145    -142 (-49.48%)
> > pyperf600_nounroll.bpf.o          on_event                      37101       34169   -2932 (-7.90%)
> > strobemeta.bpf.o                  on_event                      15939        4893  -11046 (-69.30%)
> > strobemeta_nounroll1.bpf.o        on_event                       1936        1538    -398 (-20.56%)
> > strobemeta_nounroll2.bpf.o        on_event                       4436        3991    -445 (-10.03%)
> > strobemeta_subprogs.bpf.o         on_event                       2025        1689    -336 (-16.59%)
> > test_cls_redirect.bpf.o           cls_redirect                   4865        4042    -823 (-16.92%)
> > test_cls_redirect_subprogs.bpf.o  cls_redirect                   4506        4389    -117 (-2.60%)
> > test_tcp_hdr_options.bpf.o        estab                           211         178     -33 (-15.64%)
> > test_xdp_noinline.bpf.o           balancer_ingress_v4             262         235     -27 (-10.31%)
> > test_xdp_noinline.bpf.o           balancer_ingress_v6             253         210     -43 (-17.00%)
> > xdp_synproxy_kern.bpf.o           syncookie_tc                  25086        7016  -18070 (-72.03%)
> > xdp_synproxy_kern.bpf.o           syncookie_xdp                 24206        6941  -17265 (-71.33%)
> > --------------------------------  -----------------------  ----------  ----------  ----------------
On Mon, Jan 30, 2023 at 05:17:19PM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 30, 2023 at 7:34 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Thu, 2023-01-19 at 16:16 -0800, Alexei Starovoitov wrote:
> > [...]
> > > > > >
> > > > > > Just to be clear. My suggestion was to *treat* STACK_INVALID as
> > > > > > equivalent to STACK_MISC in stacksafe(), not really replace all the
> > > > > > uses of STACK_INVALID with STACK_MISC. And to be on the safe side, I'd
> > > > > > do it only if env->allow_ptr_leaks, of course.
> > > > > >
> > > > > Well, that, and to allow STACK_INVALID if env->allow_ptr_leaks in
> > > > > check_stack_read_fixed_off(), of course, to avoid "invalid read from
> > > > > stack off %d+%d size %d\n" error (that's fixing at least part of the
> > > > > problem with uninitialized struct padding).
> > > >
> > > > +1 to Andrii's idea.
> > > > It should help us recover this small increase in processed states.
> > > >
> > [...]
> > > >
> > > > I've tried Andrii's suggestion:
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 7ee218827259..0f71ba6a56e2 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -3591,7 +3591,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> > > >
> > > >                         copy_register_state(&state->regs[dst_regno], reg);
> > > >                         state->regs[dst_regno].subreg_def = subreg_def;
> > > >                 } else {
> > > > -                       for (i = 0; i < size; i++) {
> > > > +                       for (i = 0; i < size && !env->allow_uninit_stack; i++) {
> > > >                                 type = stype[(slot - i) % BPF_REG_SIZE];
> > > >                                 if (type == STACK_SPILL)
> > > >                                         continue;
> > > > @@ -3628,7 +3628,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> > > >                 }
> > > >                 mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
> > > >         } else {
> > > > -               for (i = 0; i < size; i++) {
> > > > +               for (i = 0; i < size && !env->allow_uninit_stack; i++) {
> > > >                         type = stype[(slot - i) % BPF_REG_SIZE];
> > > >                         if (type == STACK_MISC)
> > > >                                 continue;
> > > > @@ -13208,6 +13208,10 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
> > > >                 if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID)
> > > >                         continue;
> > > >
> > > > +               if (env->allow_uninit_stack &&
> > > > +                   old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
> > > > +                       continue;
> > > >
> > > > and only dynptr/invalid_read[134] tests failed
> > > > which is expected and acceptable.
> > > > We can tweak those tests.
> > > >
> > > > Could you take over this diff, run veristat analysis and
> > > > submit it as an official patch? I suspect we should see nice
> > > > improvements in states processed.
> >
> > Hi Alexei, Andrii,
> >
> > Please note that the patch
> > "bpf: Fix to preserve reg parent/live fields when copying range info"
> > that started this conversation was applied to `bpf` tree, not `bpf-next`,
> > so I'll wait until it gets its way to `bpf-next` before submitting formal
> > patches, as it changes the performance numbers collected by veristat.
> > I did all my experiments with this patch applied on top of `bpf-next`.
> >
> > I adapted the patch suggested by Alexei and put it to my github for
> > now [1].
The performance gains are indeed significant: > > > > $ ./veristat -e file,states -C -f 'states_pct<-30' master.log uninit-reads.log > > File States (A) States (B) States (DIFF) > > -------------------------- ---------- ---------- ---------------- > > bpf_host.o 349 244 -105 (-30.09%) > > bpf_host.o 1320 895 -425 (-32.20%) > > bpf_lxc.o 1320 895 -425 (-32.20%) > > bpf_sock.o 70 48 -22 (-31.43%) > > bpf_sock.o 68 46 -22 (-32.35%) > > bpf_xdp.o 1554 803 -751 (-48.33%) > > bpf_xdp.o 6457 2473 -3984 (-61.70%) > > bpf_xdp.o 7249 3908 -3341 (-46.09%) > > pyperf600_bpf_loop.bpf.o 287 145 -142 (-49.48%) > > strobemeta.bpf.o 15879 4790 -11089 (-69.83%) > > strobemeta_nounroll2.bpf.o 20505 3931 -16574 (-80.83%) > > xdp_synproxy_kern.bpf.o 22564 7009 -15555 (-68.94%) > > xdp_synproxy_kern.bpf.o 24206 6941 -17265 (-71.33%) > > -------------------------- ---------- ---------- ---------------- > > > > However, this comes at a cost of allowing reads from uninitialized > > stack locations. As far as I understand access to uninitialized local > > variable is one of the most common errors when programming in C > > (although citation is needed). > > Yeah, a citation is really needed :) I don't see this often in > practice, tbh. What I do see in practice is that people are > unnecessarily __builtint_memset(0) struct and initialize all fields > with field-by-field initialization, instead of just using a nice C > syntax: > > struct my_struct s = { > .field_a = 123, > .field_b = 234, > }; > > > And all that just because there is some padding between field_a and > field_b which the compiler won't zero-initialize. > > > > > Also more tests are failing after register parentage chains patch is > > applied than in Alexei's initial try: 10 verifier tests and 1 progs > > test (test_global_func10.c, I have not modified it yet, it should wait > > for my changes for unprivileged execution mode support in > > test_loader.c). I don't really like how I had to fix those tests. 
> > > > I took a detailed look at the difference in verifier behavior between > > master and the branch [1] for pyperf600_bpf_loop.bpf.o and identified > > that the difference is caused by the fact that helper functions do not > > mark the stack they access as REG_LIVE_WRITTEN, the details are in the > > commit message [3], but TLDR is the following example: > > > > 1: bpf_probe_read_user(&foo, ...); > > 2: if (*foo) ... > > > > Here `*foo` will not get REG_LIVE_WRITTEN mark when (1) is verified, > > thus `*foo` read at (2) might lead to excessive REG_LIVE_READ marks > > and thus more verification states. > > This is a good fix in its own right, of course, we should definitely do this! +1 > > > > I prepared a patch that changes helper calls verification to apply > > REG_LIVE_WRITTEN when write size and alignment allow this, again > > currently on my github [2]. This patch has less dramatic performance > > impact, but nonetheless significant: > > > > $ veristat -e file,states -C -f 'states_pct<-30' master.log helpers-written.log > > File States (A) States (B) States (DIFF) > > -------------------------- ---------- ---------- ---------------- > > pyperf600_bpf_loop.bpf.o 287 156 -131 (-45.64%) > > strobemeta.bpf.o 15879 4772 -11107 (-69.95%) > > strobemeta_nounroll1.bpf.o 2065 1337 -728 (-35.25%) > > strobemeta_nounroll2.bpf.o 20505 3788 -16717 (-81.53%) > > test_cls_redirect.bpf.o 8129 4799 -3330 (-40.96%) > > -------------------------- ---------- ---------- ---------------- > > > > I suggest that instead of dropping a useful safety check I can further > > investigate difference in behavior between "uninit-reads.log" and > > "helpers-written.log" and maybe figure out other improvements. > > Unfortunately the comparison process is extremely time consuming. > > > > wdyt? 
>
> I think reading uninitialized stack slot concerns are overblown in
> practice (in terms of their good implications for programmer's
> productivity), I'd still do it if only in the name of improving user
> experience.

+1
Let's do both (REG_LIVE_WRITTEN for helpers and allow uninit).

Uninit access should be caught by the compiler.
The verifier is repeating the check for historical reasons when we
tried to make it work for unpriv.
Allow uninit won't increase the number of errors in bpf progs.
On Mon, 2023-01-30 at 18:42 -0800, Alexei Starovoitov wrote: [...] > > > > > > Hi Alexei, Andrii, > > > > > > Please note that the patch > > > "bpf: Fix to preserve reg parent/live fields when copying range info" > > > that started this conversation was applied to `bpf` tree, not `bpf-next`, > > > so I'll wait until it gets its way to `bpf-next` before submitting formal > > > patches, as it changes the performance numbers collected by veristat. > > > I did all my experiments with this patch applied on top of `bpf-next`. > > > > > > I adapted the patch suggested by Alexei and put it to my github for > > > now [1]. The performance gains are indeed significant: > > > > > > $ ./veristat -e file,states -C -f 'states_pct<-30' master.log uninit-reads.log > > > File States (A) States (B) States (DIFF) > > > -------------------------- ---------- ---------- ---------------- > > > bpf_host.o 349 244 -105 (-30.09%) > > > bpf_host.o 1320 895 -425 (-32.20%) > > > bpf_lxc.o 1320 895 -425 (-32.20%) > > > bpf_sock.o 70 48 -22 (-31.43%) > > > bpf_sock.o 68 46 -22 (-32.35%) > > > bpf_xdp.o 1554 803 -751 (-48.33%) > > > bpf_xdp.o 6457 2473 -3984 (-61.70%) > > > bpf_xdp.o 7249 3908 -3341 (-46.09%) > > > pyperf600_bpf_loop.bpf.o 287 145 -142 (-49.48%) > > > strobemeta.bpf.o 15879 4790 -11089 (-69.83%) > > > strobemeta_nounroll2.bpf.o 20505 3931 -16574 (-80.83%) > > > xdp_synproxy_kern.bpf.o 22564 7009 -15555 (-68.94%) > > > xdp_synproxy_kern.bpf.o 24206 6941 -17265 (-71.33%) > > > -------------------------- ---------- ---------- ---------------- > > > > > > However, this comes at a cost of allowing reads from uninitialized > > > stack locations. As far as I understand access to uninitialized local > > > variable is one of the most common errors when programming in C > > > (although citation is needed). > > > > Yeah, a citation is really needed :) I don't see this often in > > practice, tbh. 
> > What I do see in practice is that people are
> > unnecessarily __builtint_memset(0) struct and initialize all fields
> > with field-by-field initialization, instead of just using a nice C
> > syntax:
> >
> > struct my_struct s = {
> >     .field_a = 123,
> >     .field_b = 234,
> > };
> >
> > And all that just because there is some padding between field_a and
> > field_b which the compiler won't zero-initialize.

Andrii, do you have such example somewhere? If the use case is to pass
's' to some helper function it should already work if function
argument type is 'ARG_PTR_TO_UNINIT_MEM'. Handled by the following
code in the verifier.c:check_stack_range_initialized():

    ...
    if (meta && meta->raw_mode) {
        ...
        meta->access_size = access_size;
        meta->regno = regno;
        return 0;
    }

But only for fixed stack offsets. I'm asking because it might point to
some bug.

> >
> > >
> > > Also more tests are failing after register parentage chains patch is
> > > applied than in Alexei's initial try: 10 verifier tests and 1 progs
> > > test (test_global_func10.c, I have not modified it yet, it should wait
> > > for my changes for unprivileged execution mode support in
> > > test_loader.c). I don't really like how I had to fix those tests.
> > >
> > > I took a detailed look at the difference in verifier behavior between
> > > master and the branch [1] for pyperf600_bpf_loop.bpf.o and identified
> > > that the difference is caused by the fact that helper functions do not
> > > mark the stack they access as REG_LIVE_WRITTEN, the details are in the
> > > commit message [3], but TLDR is the following example:
> > >
> > > 1: bpf_probe_read_user(&foo, ...);
> > > 2: if (*foo) ...
> > >
> > > Here `*foo` will not get REG_LIVE_WRITTEN mark when (1) is verified,
> > > thus `*foo` read at (2) might lead to excessive REG_LIVE_READ marks
> > > and thus more verification states.
> >
> > This is a good fix in its own right, of course, we should definitely do this!
>
> +1
>
> > >
> > > I prepared a patch that changes helper calls verification to apply
> > > REG_LIVE_WRITTEN when write size and alignment allow this, again
> > > currently on my github [2]. This patch has less dramatic performance
> > > impact, but nonetheless significant:
> > >
> > > $ veristat -e file,states -C -f 'states_pct<-30' master.log helpers-written.log
> > > File                        States (A)  States (B)  States (DIFF)
> > > --------------------------  ----------  ----------  ----------------
> > > pyperf600_bpf_loop.bpf.o           287         156    -131 (-45.64%)
> > > strobemeta.bpf.o                 15879        4772  -11107 (-69.95%)
> > > strobemeta_nounroll1.bpf.o        2065        1337    -728 (-35.25%)
> > > strobemeta_nounroll2.bpf.o       20505        3788  -16717 (-81.53%)
> > > test_cls_redirect.bpf.o           8129        4799   -3330 (-40.96%)
> > > --------------------------  ----------  ----------  ----------------
> > >
> > > I suggest that instead of dropping a useful safety check I can further
> > > investigate difference in behavior between "uninit-reads.log" and
> > > "helpers-written.log" and maybe figure out other improvements.
> > > Unfortunately the comparison process is extremely time consuming.
> > >
> > > wdyt?
> >
> > I think reading uninitialized stack slot concerns are overblown in
> > practice (in terms of their good implications for programmer's
> > productivity), I'd still do it if only in the name of improving user
> > experience.
>
> +1
> Let's do both (REG_LIVE_WRITTEN for helpers and allow uninit).
>
> Uninit access should be caught by the compiler.
> The verifier is repeating the check for historical reasons when we
> tried to make it work for unpriv.
> Allow uninit won't increase the number of errors in bpf progs.

Thank you for the feedback. I'll submit both patches when
"bpf: Fix to preserve reg parent/live fields when copying range info"
gets to bpf-next.

Thanks,
Eduard.
On Tue, Jan 31, 2023 at 12:29 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Mon, 2023-01-30 at 18:42 -0800, Alexei Starovoitov wrote: > [...] > > > > > > > > Hi Alexei, Andrii, > > > > > > > > Please note that the patch > > > > "bpf: Fix to preserve reg parent/live fields when copying range info" > > > > that started this conversation was applied to `bpf` tree, not `bpf-next`, > > > > so I'll wait until it gets its way to `bpf-next` before submitting formal > > > > patches, as it changes the performance numbers collected by veristat. > > > > I did all my experiments with this patch applied on top of `bpf-next`. > > > > > > > > I adapted the patch suggested by Alexei and put it to my github for > > > > now [1]. The performance gains are indeed significant: > > > > > > > > $ ./veristat -e file,states -C -f 'states_pct<-30' master.log uninit-reads.log > > > > File States (A) States (B) States (DIFF) > > > > -------------------------- ---------- ---------- ---------------- > > > > bpf_host.o 349 244 -105 (-30.09%) > > > > bpf_host.o 1320 895 -425 (-32.20%) > > > > bpf_lxc.o 1320 895 -425 (-32.20%) > > > > bpf_sock.o 70 48 -22 (-31.43%) > > > > bpf_sock.o 68 46 -22 (-32.35%) > > > > bpf_xdp.o 1554 803 -751 (-48.33%) > > > > bpf_xdp.o 6457 2473 -3984 (-61.70%) > > > > bpf_xdp.o 7249 3908 -3341 (-46.09%) > > > > pyperf600_bpf_loop.bpf.o 287 145 -142 (-49.48%) > > > > strobemeta.bpf.o 15879 4790 -11089 (-69.83%) > > > > strobemeta_nounroll2.bpf.o 20505 3931 -16574 (-80.83%) > > > > xdp_synproxy_kern.bpf.o 22564 7009 -15555 (-68.94%) > > > > xdp_synproxy_kern.bpf.o 24206 6941 -17265 (-71.33%) > > > > -------------------------- ---------- ---------- ---------------- > > > > > > > > However, this comes at a cost of allowing reads from uninitialized > > > > stack locations. As far as I understand access to uninitialized local > > > > variable is one of the most common errors when programming in C > > > > (although citation is needed). 
> > > > > > Yeah, a citation is really needed :) I don't see this often in > > > practice, tbh. What I do see in practice is that people are > > > unnecessarily __builtint_memset(0) struct and initialize all fields > > > with field-by-field initialization, instead of just using a nice C > > > syntax: > > > > > > struct my_struct s = { > > > .field_a = 123, > > > .field_b = 234, > > > }; > > > > > > > > > And all that just because there is some padding between field_a and > > > field_b which the compiler won't zero-initialize. > > Andrii, do you have such example somewhere? If the use case is to pass > 's' to some helper function it should already work if function > argument type is 'ARG_PTR_TO_UNINIT_MEM'. Handled by the following > code in the verifier.c:check_stack_range_initialized(): > > ... > if (meta && meta->raw_mode) { > ... > meta->access_size = access_size; > meta->regno = regno; > return 0; > } > > But only for fixed stack offsets. I'm asking because it might point to > some bug. I don't have specifics at hand, but there were at least few different internal cases where this popped up. And just grepping code base, we have tons of __builtin_memset(&event, 0, sizeof(event)); in our code base, even if struct initialization would be more convenient. I don't remember how exactly that struct was used, but one of the use cases was passing it to bpf_perf_event_output(), which expects initialized memory. > > > > > > > > > > > > Also more tests are failing after register parentage chains patch is > > > > applied than in Alexei's initial try: 10 verifier tests and 1 progs > > > > test (test_global_func10.c, I have not modified it yet, it should wait > > > > for my changes for unprivileged execution mode support in > > > > test_loader.c). I don't really like how I had to fix those tests. 
> > > > > > > > I took a detailed look at the difference in verifier behavior between > > > > master and the branch [1] for pyperf600_bpf_loop.bpf.o and identified > > > > that the difference is caused by the fact that helper functions do not > > > > mark the stack they access as REG_LIVE_WRITTEN, the details are in the > > > > commit message [3], but TLDR is the following example: > > > > > > > > 1: bpf_probe_read_user(&foo, ...); > > > > 2: if (*foo) ... > > > > > > > > Here `*foo` will not get REG_LIVE_WRITTEN mark when (1) is verified, > > > > thus `*foo` read at (2) might lead to excessive REG_LIVE_READ marks > > > > and thus more verification states. > > > > > > This is a good fix in its own right, of course, we should definitely do this! > > > > +1 > > > > > > > > > > I prepared a patch that changes helper calls verification to apply > > > > REG_LIVE_WRITTEN when write size and alignment allow this, again > > > > currently on my github [2]. This patch has less dramatic performance > > > > impact, but nonetheless significant: > > > > > > > > $ veristat -e file,states -C -f 'states_pct<-30' master.log helpers-written.log > > > > File States (A) States (B) States (DIFF) > > > > -------------------------- ---------- ---------- ---------------- > > > > pyperf600_bpf_loop.bpf.o 287 156 -131 (-45.64%) > > > > strobemeta.bpf.o 15879 4772 -11107 (-69.95%) > > > > strobemeta_nounroll1.bpf.o 2065 1337 -728 (-35.25%) > > > > strobemeta_nounroll2.bpf.o 20505 3788 -16717 (-81.53%) > > > > test_cls_redirect.bpf.o 8129 4799 -3330 (-40.96%) > > > > -------------------------- ---------- ---------- ---------------- > > > > > > > > I suggest that instead of dropping a useful safety check I can further > > > > investigate difference in behavior between "uninit-reads.log" and > > > > "helpers-written.log" and maybe figure out other improvements. > > > > Unfortunately the comparison process is extremely time consuming. > > > > > > > > wdyt? 
> > > > > > I think reading uninitialized stack slot concerns are overblown in > > > practice (in terms of their good implications for programmer's > > > productivity), I'd still do it if only in the name of improving user > > > experience. > > > > +1 > > Let's do both (REG_LIVE_WRITTEN for helpers and allow uninit). > > > > Uninit access should be caught by the compiler. > > The verifier is repeating the check for historical reasons when we > > tried to make it work for unpriv. > > Allow uninit won't increase the number of errors in bpf progs. > > Thank you for the feedback. I'll submit both patches when > "bpf: Fix to preserve reg parent/live fields when copying range info" > will get to bpf-next. > > Thanks, > Eduard.