Message ID | 162318061518.323820.4329181800429686297.stgit@john-XPS-13-9370 (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | bpf fix for mixed tail calls and subprograms | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 5 maintainers not CCed: yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com songliubraving@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 20 this patch: 20 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 24 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 20 this patch: 20 |
netdev/header_inline | success | Link |
On 6/8/21 12:30 PM, John Fastabend wrote: > The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and > then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the > individual JITs. The poke_tab[] to use is stored in the insn->imm by > the code adding it to that array slot. The JIT then uses imm to find the > right entry for an individual instruction. In the x86 bpf_jit_comp.c > this is done by calling emit_bpf_tail_call_direct with the poke_tab[] > of the imm value. > > However, we observed the below null-ptr-deref when mixing tail call > programs with subprog programs. For this to happen we just need to > mix bpf-2-bpf calls and tailcalls with some extra calls or instructions > that would be patched later by one of the fixup routines. So whats > happening? > > Before the fixup_call_args() -- where the jit op is done -- various > code patching is done by do_misc_fixups(). This may increase the > insn count, for example when we patch map_lookup_up using map_gen_lookup > hook. This does two things. First, it means the instruction index, > insn_idx field, of a tail call instruction will move by a 'delta'. > > In verifier code, > > struct bpf_jit_poke_descriptor desc = { > .reason = BPF_POKE_REASON_TAIL_CALL, > .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state), > .tail_call.key = bpf_map_key_immediate(aux), > .insn_idx = i + delta, > }; > > Then subprog start values subprog_info[i].start will be updated > with the delta and any poke descriptor index will also be updated > with the delta in adjust_poke_desc(). If we look at the adjust > subprog starts though we see its only adjusted when the delta > occurs before the new instructions, > > /* NOTE: fake 'exit' subprog should be updated as well. */ > for (i = 0; i <= env->subprog_cnt; i++) { > if (env->subprog_info[i].start <= off) > continue; > > Earlier subprograms are not changed because their start values > are not moved. But, adjust_poke_desc() does the offset + delta > indiscriminately. The result is poke descriptors are potentially > corrupted. > > Then in jit_subprogs() we only populate the poke_tab[] > when the above insn_idx is less than the next subprogram start. From > above we corrupted our insn_idx so we might incorrectly assume a > poke descriptor is not used in a subprogram omitting it from the > subprogram. And finally when the jit runs it does the deref of poke_tab > when emitting the instruction and crashes with below. Because earlier > step omitted the poke descriptor. > > The fix is straight forward with above context. Simply move same logic > from adjust_subprog_starts() into adjust_poke_descs() and only adjust > insn_idx when needed. > > [ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290 > [ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295 > [ 88.487490] Call Trace: > [ 88.487498] dump_stack+0x93/0xc2 > [ 88.487515] kasan_report.cold+0x5f/0xd8 > [ 88.487530] ? do_jit+0x184a/0x3290 > [ 88.487542] do_jit+0x184a/0x3290 > ... > [ 88.487709] bpf_int_jit_compile+0x248/0x810 > ... > [ 88.487765] bpf_check+0x3718/0x5140 > ... > [ 88.487920] bpf_prog_load+0xa22/0xf10 > > CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") > Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com>
On Tue, Jun 08, 2021 at 12:30:15PM -0700, John Fastabend wrote: > The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and > then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the > individual JITs. The poke_tab[] to use is stored in the insn->imm by > the code adding it to that array slot. The JIT then uses imm to find the > right entry for an individual instruction. In the x86 bpf_jit_comp.c > this is done by calling emit_bpf_tail_call_direct with the poke_tab[] > of the imm value. > > However, we observed the below null-ptr-deref when mixing tail call > programs with subprog programs. For this to happen we just need to > mix bpf-2-bpf calls and tailcalls with some extra calls or instructions > that would be patched later by one of the fixup routines. So whats > happening? > > Before the fixup_call_args() -- where the jit op is done -- various > code patching is done by do_misc_fixups(). This may increase the > insn count, for example when we patch map_lookup_up using map_gen_lookup > hook. This does two things. First, it means the instruction index, > insn_idx field, of a tail call instruction will move by a 'delta'. > > In verifier code, > > struct bpf_jit_poke_descriptor desc = { > .reason = BPF_POKE_REASON_TAIL_CALL, > .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state), > .tail_call.key = bpf_map_key_immediate(aux), > .insn_idx = i + delta, > }; > > Then subprog start values subprog_info[i].start will be updated > with the delta and any poke descriptor index will also be updated > with the delta in adjust_poke_desc(). If we look at the adjust > subprog starts though we see its only adjusted when the delta > occurs before the new instructions, > > /* NOTE: fake 'exit' subprog should be updated as well. */ > for (i = 0; i <= env->subprog_cnt; i++) { > if (env->subprog_info[i].start <= off) > continue; > > Earlier subprograms are not changed because their start values > are not moved. But, adjust_poke_desc() does the offset + delta > indiscriminately. The result is poke descriptors are potentially > corrupted. > > Then in jit_subprogs() we only populate the poke_tab[] > when the above insn_idx is less than the next subprogram start. From > above we corrupted our insn_idx so we might incorrectly assume a > poke descriptor is not used in a subprogram omitting it from the > subprogram. And finally when the jit runs it does the deref of poke_tab > when emitting the instruction and crashes with below. Because earlier > step omitted the poke descriptor. > > The fix is straight forward with above context. Simply move same logic > from adjust_subprog_starts() into adjust_poke_descs() and only adjust > insn_idx when needed. > > [ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290 > [ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295 > [ 88.487490] Call Trace: > [ 88.487498] dump_stack+0x93/0xc2 > [ 88.487515] kasan_report.cold+0x5f/0xd8 > [ 88.487530] ? do_jit+0x184a/0x3290 > [ 88.487542] do_jit+0x184a/0x3290 > ... > [ 88.487709] bpf_int_jit_compile+0x248/0x810 > ... > [ 88.487765] bpf_check+0x3718/0x5140 > ... > [ 88.487920] bpf_prog_load+0xa22/0xf10 > > CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") > Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: John Fastabend <john.fastabend@gmail.com> > --- > kernel/bpf/verifier.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 94ba5163d4c5..ac8373da849c 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -11408,7 +11408,7 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len > } > } > > -static void adjust_poke_descs(struct bpf_prog *prog, u32 len) > +static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len) > { > struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab; > int i, sz = prog->aux->size_poke_tab; > @@ -11416,6 +11416,8 @@ static void adjust_poke_descs(struct bpf_prog *prog, u32 len) > > for (i = 0; i < sz; i++) { > desc = &tab[i]; Can we have a comment below that would say something like: "don't update taicall's insn idx if the patching is being done on higher insns" ? What I'm saying is that after a long break from that code I find 'off' as a confusing name. It's the offset within the flat-structured bpf prog (so the prog that is not yet sliced onto subprogs). Maybe we could find a better name for that, like "curr_insn_idx". I'm not sure what's your view on that. OTOH I'm aware that whole content of bpf_patch_insn_data operates on 'off'. Generally sorry that I missed that, it didn't come to my mind to mix in other helpers that include patching. Anyway: Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > + if (desc->insn_idx <= off) > + continue; > desc->insn_idx += len - 1; > } > } > @@ -11436,7 +11438,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of > if (adjust_insn_aux_data(env, new_prog, off, len)) > return NULL; > adjust_subprog_starts(env, off, len); > - adjust_poke_descs(new_prog, len); > + adjust_poke_descs(new_prog, off, len); > return new_prog; > } > > >
Maciej Fijalkowski wrote: > On Tue, Jun 08, 2021 at 12:30:15PM -0700, John Fastabend wrote: > > The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and > > then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the > > individual JITs. The poke_tab[] to use is stored in the insn->imm by > > the code adding it to that array slot. The JIT then uses imm to find the > > right entry for an individual instruction. In the x86 bpf_jit_comp.c > > this is done by calling emit_bpf_tail_call_direct with the poke_tab[] > > of the imm value. > > > > However, we observed the below null-ptr-deref when mixing tail call > > programs with subprog programs. For this to happen we just need to > > mix bpf-2-bpf calls and tailcalls with some extra calls or instructions > > that would be patched later by one of the fixup routines. So whats > > happening? > > > > Before the fixup_call_args() -- where the jit op is done -- various > > code patching is done by do_misc_fixups(). This may increase the > > insn count, for example when we patch map_lookup_up using map_gen_lookup > > hook. This does two things. First, it means the instruction index, > > insn_idx field, of a tail call instruction will move by a 'delta'. > > > > In verifier code, > > > > struct bpf_jit_poke_descriptor desc = { > > .reason = BPF_POKE_REASON_TAIL_CALL, > > .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state), > > .tail_call.key = bpf_map_key_immediate(aux), > > .insn_idx = i + delta, > > }; > > > > Then subprog start values subprog_info[i].start will be updated > > with the delta and any poke descriptor index will also be updated > > with the delta in adjust_poke_desc(). If we look at the adjust > > subprog starts though we see its only adjusted when the delta > > occurs before the new instructions, > > > > /* NOTE: fake 'exit' subprog should be updated as well. */ > > for (i = 0; i <= env->subprog_cnt; i++) { > > if (env->subprog_info[i].start <= off) > > continue; > > > > Earlier subprograms are not changed because their start values > > are not moved. But, adjust_poke_desc() does the offset + delta > > indiscriminately. The result is poke descriptors are potentially > > corrupted. > > > > Then in jit_subprogs() we only populate the poke_tab[] > > when the above insn_idx is less than the next subprogram start. From > > above we corrupted our insn_idx so we might incorrectly assume a > > poke descriptor is not used in a subprogram omitting it from the > > subprogram. And finally when the jit runs it does the deref of poke_tab > > when emitting the instruction and crashes with below. Because earlier > > step omitted the poke descriptor. > > > > The fix is straight forward with above context. Simply move same logic > > from adjust_subprog_starts() into adjust_poke_descs() and only adjust > > insn_idx when needed. > > > > [ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290 > > [ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295 > > [ 88.487490] Call Trace: > > [ 88.487498] dump_stack+0x93/0xc2 > > [ 88.487515] kasan_report.cold+0x5f/0xd8 > > [ 88.487530] ? do_jit+0x184a/0x3290 > > [ 88.487542] do_jit+0x184a/0x3290 > > ... > > [ 88.487709] bpf_int_jit_compile+0x248/0x810 > > ... > > [ 88.487765] bpf_check+0x3718/0x5140 > > ... > > [ 88.487920] bpf_prog_load+0xa22/0xf10 > > > > CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") > > Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> > > Signed-off-by: John Fastabend <john.fastabend@gmail.com> > > --- > > kernel/bpf/verifier.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 94ba5163d4c5..ac8373da849c 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -11408,7 +11408,7 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len > > } > > } > > > > -static void adjust_poke_descs(struct bpf_prog *prog, u32 len) > > +static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len) > > { > > struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab; > > int i, sz = prog->aux->size_poke_tab; > > @@ -11416,6 +11416,8 @@ static void adjust_poke_descs(struct bpf_prog *prog, u32 len) > > > > for (i = 0; i < sz; i++) { > > desc = &tab[i]; > > Can we have a comment below that would say something like: > "don't update taicall's insn idx if the patching is being done on higher > insns" ? > > What I'm saying is that after a long break from that code I find 'off' as > a confusing name. It's the offset within the flat-structured bpf prog (so > the prog that is not yet sliced onto subprogs). Maybe we could find a > better name for that, like "curr_insn_idx". I'm not sure what's your view > on that. > > OTOH I'm aware that whole content of bpf_patch_insn_data operates on > 'off'. I'm not necessarily opposed to a comment there but we don't have a comment above for the same operation on start offsets. I'll think about it for a follow up patch assuming no one shouts. > > Generally sorry that I missed that, it didn't come to my mind to mix in > other helpers that include patching. Thanks for testing. > > Anyway: > Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > > + if (desc->insn_idx <= off) > > + continue; > > desc->insn_idx += len - 1; > > } > > } > > @@ -11436,7 +11438,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of > > if (adjust_insn_aux_data(env, new_prog, off, len)) > > return NULL; > > adjust_subprog_starts(env, off, len); > > - adjust_poke_descs(new_prog, len); > > + adjust_poke_descs(new_prog, off, len); > > return new_prog; > > } > > > > > >
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 94ba5163d4c5..ac8373da849c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11408,7 +11408,7 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len } } -static void adjust_poke_descs(struct bpf_prog *prog, u32 len) +static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len) { struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab; int i, sz = prog->aux->size_poke_tab; @@ -11416,6 +11416,8 @@ static void adjust_poke_descs(struct bpf_prog *prog, u32 len) for (i = 0; i < sz; i++) { desc = &tab[i]; + if (desc->insn_idx <= off) + continue; desc->insn_idx += len - 1; } } @@ -11436,7 +11438,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of if (adjust_insn_aux_data(env, new_prog, off, len)) return NULL; adjust_subprog_starts(env, off, len); - adjust_poke_descs(new_prog, len); + adjust_poke_descs(new_prog, off, len); return new_prog; }