[RFC,bpf-next,v1,3/9] bpf: Implement bpf_throw kfunc

Message ID	20230405004239.1375399-4-memxor@gmail.com (mailing list archive)
State	RFC
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@vger.kernel.org> From: Kumar Kartikeya Dwivedi <memxor@gmail.com> To: bpf@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Martin KaFai Lau <martin.lau@kernel.org>, David Vernet <void@manifault.com> Subject: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc Date: Wed, 5 Apr 2023 02:42:33 +0200 Message-Id: <20230405004239.1375399-4-memxor@gmail.com> In-Reply-To: <20230405004239.1375399-1-memxor@gmail.com> References: <20230405004239.1375399-1-memxor@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Exceptions - 1/2 \| expand [RFC,bpf-next,v1,0/9] Exceptions - 1/2 [RFC,bpf-next,v1,1/9] bpf: Fix kfunc callback handling [RFC,bpf-next,v1,2/9] bpf: Refactor and generalize optimize_bpf_loop [RFC,bpf-next,v1,3/9] bpf: Implement bpf_throw kfunc [RFC,bpf-next,v1,4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs [RFC,bpf-next,v1,5/9] bpf: Add pass to fixup global function throw information [RFC,bpf-next,v1,6/9] bpf: Add KF_THROW annotation for kfuncs [RFC,bpf-next,v1,7/9] bpf: Introduce bpf_set_exception_callback kfunc [RFC,bpf-next,v1,8/9] bpf: Introduce BPF assertion macros [RFC,bpf-next,v1,9/9] selftests/bpf: Add tests for BPF exceptions

Context	Check	Description
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-4	pending	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for build for x86_64 with llvm-16
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 18485 this patch: 18485
netdev/cc_maintainers	warning	21 maintainers not CCed: mgorman@suse.de mykolal@fb.com sdf@google.com mingo@redhat.com yhs@fb.com john.fastabend@gmail.com dietmar.eggemann@arm.com jolsa@kernel.org martin.lau@linux.dev linux-kselftest@vger.kernel.org bsegall@google.com vschneid@redhat.com peterz@infradead.org vincent.guittot@linaro.org kpsingh@kernel.org bristot@redhat.com shuah@kernel.org haoluo@google.com song@kernel.org rostedt@goodmis.org juri.lelli@redhat.com
netdev/build_clang	success	Errors and warnings before: 3988 this patch: 3988
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 19432 this patch: 19432
netdev/checkpatch	fail	CHECK: Please don't use multiple blank lines CHECK: Unnecessary parentheses around 'prog->expected_attach_type == BPF_TRACE_FEXIT' CHECK: extern prototypes should be avoided in .h files CHECK: multiple assignments should be avoided ERROR: open brace '{' following function definitions go on the next line WARNING: Prefer 'fallthrough;' over fallthrough comment WARNING: Prefer __noreturn over __attribute__((noreturn)) WARNING: line length of 100 exceeds 80 columns WARNING: line length of 104 exceeds 80 columns WARNING: line length of 110 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

On Wed, Apr 12, 2023 at 09:36:12PM CEST, Alexei Starovoitov wrote: > On Fri, Apr 07, 2023 at 04:46:55AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote: > > > On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote: > > > > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > > > > > > > - The exception state is represented using four booleans in the > > > > > > task_struct of current task. Each boolean corresponds to the exception > > > > > > state for each kernel context. This allows BPF programs to be > > > > > > interrupted and still not clobber the other's exception state. > > > > > > > > > > that doesn't work for sleepable bpf progs and in RT for regular progs too. > > > > > > > > > > > > > Can you elaborate? If a sleepable program blocks, that means the task is > > > > scheduled out, so the next program will use the other task's task_struct. > > > > Same for preemption for normal progs (under RT or not). > > > > > > I was worried about the case of the same task but different code paths > > > in the kernel with tracing prog stepping on preempted lsm.s prog. > > > I think you point that in this case they gotta be in different interrupt_context_level. > > > I need to think it through a bit more. > > > > > > > If there is nesting, the programs always clear their exception state on exit, so > > the prog that calls into the kernel which then calls into the tracing prog etc. > > won't see its exception state on return. The only path where attaching programs > > would screw things up is when we see a thrown exception and start unwinding > > (where clearing would be a problem since its done frame-by-frame). For that, I > > already prevent _throwing_ fexit programs from attaching to subprogs in this > > series (normal ones are still ok and supported, because fentry/fexit is > > important for stats etc.). There might be some other corner cases I missed but > > ensuring this property alone in general should make things work correctly. > > > > > > > How about using a scratch space in prog->aux->exception[] instead of current task? > > > > > > > > > > > > > I actually had this thought. It's even better because we can hardcode the > > > > address of the exception state right in the program (since prog->aux remains > > > > stable during bpf_patch_insn_data). > > > > > > exactly. > > > > > > > However, concurrent program invocations on > > > > multiple CPUs doesn't work well with this. It's like, one program sets the state > > > > while the other tries to check it. > > > > > > Right. If it asserts on one cpu all other cpus will unwind as well, > > > since we're saying bpf_assert is for exceptions when user cannot convince > > > the verifier that the program is correct. > > > So it doesn't matter that it aborted everywhere. It's probably a good thing too. > > > > > > > We can discuss the semantics (this makes bpf_assert more stronger and basically > > poisons the program globally in some sense), but implementation wise it's going > > to be a lot more tricky to reason about correctness. > > > > Right now, the verifier follows paths and knows what resources are held when we > > throw from a nested call chain (to complain on leaks). Callers doing the check > > for exception state at runtime expect only certain throwing points to trigger > > the check and rely on that for leak freedom. > > > > With a global prog->aux->exception, things will be ok for the CPU on which the > > exception was thrown, but some other CPU will see the check returning true in a > > caller even if the callee subprog for it did not throw and was possibly > > transferring its acquired references to the caller after completing execution, > > which now causes leaks (because subprogs are allowed to acquire and return to > > their caller). > > > > The way to handle this would be that we assume every callee which throws may > > also notionally throw right when returning (due to some other CPU's throw which > > we may see). Then every exit from throwing callees may be processed as throwing > > if we see the global state as set. > > > > However, this completely prevents subprogs from transferring some acquired > > resource to their caller (which I think is too restrictive). If I'm acquiring > > memory from static subprog and returning to my caller, I can't any longer since > > I notionally throw when exiting and holding resources when doing bpf_throw is > > disallowed, so transfer is out of the question. > > I was under impression that subprogs cannot acquire refs and transfer them > to caller. > Looks like your commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks") > allowed too much. I think you misunderstood the change in that commit. It was about restricting callback functions from acquiring references and not releasing them before their BPF_EXIT (since our handling is not completely correct for more than one iteration). The verifier has always allowed acquiring references and transferring them to the caller for subprog calls. > I don't think it's a good idea to support coding patterns like: > void my_alloc_foo(struct foo **ptr) > { > struct foo *p = bpf_obj_new(typeof(*p)); > *ptr = p; > } > > It's a correct C, of course, but do we really want to support such code? > I don't think the verifier can fully support it anyway. > That commit of yours allowed some of it in theory, but above example probably won't work, > since 'transfer' isn't understood by the verifier. I have no strong opinions about restricting (I think the code for handling transfers is sane and correct, we just transfer the modified reference state, and it's a natural valid form of writing programs), especially since static subprogs do not have the limitations as global subprogs, and you really don't want to inline everything all the time. But I think we may end up breaking existing code/programs if we do. Whether that fallout will be small or not, I have no data yet to predict. > > Regardless whether we tighten the verifier now or later such subprogs shouldn't be throwing. > So I don't see an issue doing global prog->aux->exception. That's certainly an option, but I still think we need to be a bit careful. The reason is that during analysis, we need to determine that whenever a subprog call exits, are we in a state where we can safely unwind? It might end up restricting a large set of use cases, but I can only say with certainty after I try it out. Right now, I heavily rely on the assumption that the checks only become true when something throws (to also minimize rewrites, but that's a minor reason). The core reason is being able to argue about correctness. With global exception state, they can become true anytime, so we need to be a lot more conservative even if we e.g. didn't see a subprog as throwing from all callsites. call subprog(A) // will be rewritten, as using R1=A can throw call subprog(B) // not rewritten, as using R1=B does not throw

diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 002a811b6b90..04b81f5fe809 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1287,6 +1287,7 @@ static inline bool bpf_prog_has_trampoline(const struct bpf_prog *prog) struct bpf_func_info_aux { u16 linkage; bool unreliable; + bool throws_exception; }; enum bpf_jit_poke_reason { @@ -1430,7 +1431,8 @@ struct bpf_prog { enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ call_get_func_ip:1, /* Do we call get_func_ip() */ - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ + tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */ + throws_exception:1; /* Does this program throw exceptions? */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ @@ -3035,4 +3037,9 @@ static inline gfp_t bpf_memcg_flags(gfp_t flags) return flags; } +/* BPF Exception helpers */ +void bpf_reset_exception(void); +u64 bpf_get_exception(void); +void bpf_throw(void); + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 81d525d057c7..bc067223d3ee 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -430,6 +430,17 @@ struct bpf_loop_inline_state { u32 callback_subprogno; /* valid when fit_for_inline is true */ }; +enum { + BPF_THROW_NONE, + BPF_THROW_OUTER, + BPF_THROW_INNER, +}; + +struct bpf_throw_state { + int type; + bool check_helper_ret_code; +}; + /* Possible states for alu_state member. */ #define BPF_ALU_SANITIZE_SRC (1U << 0) #define BPF_ALU_SANITIZE_DST (1U << 1) @@ -464,6 +475,7 @@ struct bpf_insn_aux_data { */ struct bpf_loop_inline_state loop_inline_state; }; + struct bpf_throw_state throw_state; u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */ struct btf_struct_meta *kptr_struct_meta; u64 map_key_state; /* constant (32 bit) key tracking for maps */ @@ -537,6 +549,7 @@ struct bpf_subprog_info { bool tail_call_reachable; bool has_ld_abs; bool is_async_cb; + bool can_throw; }; /* single container for all structs diff --git a/include/linux/sched.h b/include/linux/sched.h index b11b4517760f..a568245b59a2 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1480,6 +1480,7 @@ struct task_struct { struct bpf_local_storage __rcu *bpf_storage; /* Used for BPF run context */ struct bpf_run_ctx *bpf_ctx; + bool bpf_exception_thrown[4]; #endif #ifdef CONFIG_GCC_PLUGIN_STACKLEAK diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 2058e89b5ddd..de0eadf8706f 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -905,7 +905,14 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map, if (IS_ERR(prog)) return prog; - if (!bpf_prog_map_compatible(map, prog)) { + /* Programs which throw exceptions are not allowed to be tail call + * targets. This is because it forces us to be conservative for each + * bpf_tail_call invocation and assume it may throw, since we do not + * know what the target program may do, thus causing us to propagate the + * exception and mark calling prog as potentially throwing. Just be + * restrictive for now and disallow this. + */ + if (prog->throws_exception || !bpf_prog_map_compatible(map, prog)) { bpf_prog_put(prog); return ERR_PTR(-EINVAL); } diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 6be16db9f188..89e70907257c 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1879,6 +1879,20 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root, } } +notrace void bpf_reset_exception(void) +{ + int i = interrupt_context_level(); + + current->bpf_exception_thrown[i] = false; +} + +notrace u64 bpf_get_exception(void) +{ + int i = interrupt_context_level(); + + return current->bpf_exception_thrown[i]; +} + __diag_push(); __diag_ignore_all("-Wmissing-prototypes", "Global functions as their definitions will be in vmlinux BTF"); @@ -2295,6 +2309,13 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) rcu_read_unlock(); } +__bpf_kfunc notrace void bpf_throw(void) +{ + int i = interrupt_context_level(); + + current->bpf_exception_thrown[i] = true; +} + __diag_pop(); BTF_SET8_START(generic_btf_ids) @@ -2321,6 +2342,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_throw) BTF_SET8_END(generic_btf_ids) static const struct btf_kfunc_id_set generic_kfunc_set = { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e18ac7fdc210..f82e7a174d6a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3144,6 +3144,16 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog, tgt_prog = prog->aux->dst_prog; } + /* Don't allow tracing programs to attach to fexit and clear exception + * state when we are unwinding the program. + */ + if (prog->type == BPF_PROG_TYPE_TRACING && + (prog->expected_attach_type == BPF_TRACE_FEXIT) && + tgt_prog && tgt_prog->throws_exception && prog->throws_exception) { + err = -EINVAL; + goto out_unlock; + } + err = bpf_link_prime(&link->link.link, &link_primer); if (err) goto out_unlock; diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index f61d5138b12b..e9f9dd52f16c 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -514,7 +514,9 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr kind = bpf_attach_type_to_tramp(link->link.prog); if (tr->extension_prog) /* cannot attach fentry/fexit if extension prog is attached. - * cannot overwrite extension prog either. + * cannot overwrite extension prog either. We rely on this to + * not check extension prog's exception specification (since + * throwing extension may not replace non-throwing). */ return -EBUSY; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8ecd5df73b07..6981d8817c71 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2787,6 +2787,8 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env) return 0; } +static bool is_bpf_throw_call(struct bpf_insn *insn); + static int check_subprogs(struct bpf_verifier_env *env) { int i, subprog_start, subprog_end, off, cur_subprog = 0; @@ -2820,11 +2822,12 @@ static int check_subprogs(struct bpf_verifier_env *env) if (i == subprog_end - 1) { /* to avoid fall-through from one subprog into another * the last insn of the subprog should be either exit - * or unconditional jump back + * or unconditional jump back or bpf_throw call */ if (code != (BPF_JMP | BPF_EXIT) && - code != (BPF_JMP | BPF_JA)) { - verbose(env, "last insn is not an exit or jmp\n"); + code != (BPF_JMP | BPF_JA) && + !is_bpf_throw_call(insn + i)) { + verbose(env, "last insn is not an exit or jmp or bpf_throw call\n"); return -EINVAL; } subprog_start = subprog_end; @@ -8200,6 +8203,7 @@ static int set_callee_state(struct bpf_verifier_env *env, struct bpf_func_state *callee, int insn_idx); static bool is_callback_calling_kfunc(u32 btf_id); +static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx); static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int *insn_idx, int subprog, @@ -8247,6 +8251,12 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG; /* continue with next insn after call */ + + /* We don't explore the global function, but if it + * throws, mark the callchain as throwing. + */ + if (env->subprog_info[subprog].can_throw) + return mark_chain_throw(env, *insn_idx); return 0; } } @@ -8382,6 +8392,53 @@ static int set_callee_state(struct bpf_verifier_env *env, return 0; } +static int set_throw_state_type(struct bpf_verifier_env *env, int insn_idx, + int frame, int subprog) +{ + struct bpf_throw_state *ts = &env->insn_aux_data[insn_idx].throw_state; + int type; + + if (!frame && !subprog && env->prog->type != BPF_PROG_TYPE_EXT) + type = BPF_THROW_OUTER; + else + type = BPF_THROW_INNER; + if (ts->type != BPF_THROW_NONE) { + if (ts->type != type) { + verbose(env, + "conflicting rewrite type for throwing call insn %d: %d and %d\n", + insn_idx, ts->type, type); + return -EINVAL; + } + } + ts->type = type; + return 0; +} + +static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx) { + struct bpf_func_info_aux *func_info_aux = env->prog->aux->func_info_aux; + struct bpf_subprog_info *subprog = env->subprog_info; + struct bpf_verifier_state *state = env->cur_state; + struct bpf_func_state **frame = state->frame; + u32 cur_subprogno; + int ret; + + /* Mark all callsites leading up to this throw and their corresponding + * subprogs and update their func_info_aux table. + */ + for (int i = 1; i <= state->curframe; i++) { + u32 subprogno = frame[i - 1]->subprogno; + + func_info_aux[subprogno].throws_exception = subprog[subprogno].can_throw = true; + ret = set_throw_state_type(env, frame[i]->callsite, i - 1, subprogno); + if (ret < 0) + return ret; + } + /* Now mark actual instruction which caused the throw */ + cur_subprogno = frame[state->curframe]->subprogno; + func_info_aux[cur_subprogno].throws_exception = subprog[cur_subprogno].can_throw = true; + return set_throw_state_type(env, insn_idx, state->curframe, cur_subprogno); +} + static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int *insn_idx) { @@ -8394,7 +8451,6 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, target_insn); return -EFAULT; } - return __check_func_call(env, insn, insn_idx, subprog, set_callee_state); } @@ -8755,17 +8811,17 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, return 0; } -static int check_reference_leak(struct bpf_verifier_env *env) +static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit) { struct bpf_func_state *state = cur_func(env); bool refs_lingering = false; int i; - if (state->frameno && !state->in_callback_fn) + if (!exception_exit && state->frameno && !state->in_callback_fn) return 0; for (i = 0; i < state->acquired_refs; i++) { - if (state->in_callback_fn && state->refs[i].callback_ref != state->frameno) + if (!exception_exit && state->in_callback_fn && state->refs[i].callback_ref != state->frameno) continue; verbose(env, "Unreleased reference id=%d alloc_insn=%d\n", state->refs[i].id, state->refs[i].insn_idx); @@ -8999,7 +9055,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn switch (func_id) { case BPF_FUNC_tail_call: - err = check_reference_leak(env); + err = check_reference_leak(env, false); if (err) { verbose(env, "tail_call would lead to reference leak\n"); return err; @@ -9615,6 +9671,7 @@ enum special_kfunc_type { KF_bpf_dynptr_from_xdp, KF_bpf_dynptr_slice, KF_bpf_dynptr_slice_rdwr, + KF_bpf_throw, }; BTF_SET_START(special_kfunc_set) @@ -9633,6 +9690,7 @@ BTF_ID(func, bpf_dynptr_from_skb) BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) +BTF_ID(func, bpf_throw) BTF_SET_END(special_kfunc_set) BTF_ID_LIST(special_kfunc_list) @@ -9653,6 +9711,7 @@ BTF_ID(func, bpf_dynptr_from_skb) BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) +BTF_ID(func, bpf_throw) static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta) { @@ -10736,6 +10795,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, } } + if (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_throw]) { + err = mark_chain_throw(env, insn_idx); + if (err < 0) + return err; + return 1; + } + for (i = 0; i < CALLER_SAVED_REGS; i++) mark_reg_not_init(env, regs, caller_saved[i]); @@ -13670,7 +13736,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn) * gen_ld_abs() may terminate the program at runtime, leading to * reference leak. */ - err = check_reference_leak(env); + err = check_reference_leak(env, false); if (err) { verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n"); return err; @@ -14075,6 +14141,10 @@ static int visit_insn(int t, struct bpf_verifier_env *env) if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { struct bpf_kfunc_call_arg_meta meta; + /* 'call bpf_throw' has no fallthrough edge, same as BPF_EXIT */ + if (is_bpf_throw_call(insn)) + return DONE_EXPLORING; + ret = fetch_kfunc_meta(env, insn, &meta, NULL); if (ret == 0 && is_iter_next_kfunc(&meta)) { mark_prune_point(env, t); @@ -14738,7 +14808,7 @@ static bool regs_exact(const struct bpf_reg_state *rold, const struct bpf_reg_state *rcur, struct bpf_id_pair *idmap) { - return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 && + return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 && check_ids(rold->id, rcur->id, idmap) && check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap); } @@ -15617,6 +15687,7 @@ static int do_check(struct bpf_verifier_env *env) int prev_insn_idx = -1; for (;;) { + bool exception_exit = false; struct bpf_insn *insn; u8 class; int err; @@ -15830,12 +15901,18 @@ static int do_check(struct bpf_verifier_env *env) return -EINVAL; } } - if (insn->src_reg == BPF_PSEUDO_CALL) + if (insn->src_reg == BPF_PSEUDO_CALL) { err = check_func_call(env, insn, &env->insn_idx); - else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) + } else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { err = check_kfunc_call(env, insn, &env->insn_idx); - else + if (err == 1) { + err = 0; + exception_exit = true; + goto process_bpf_exit_full; + } + } else { err = check_helper_call(env, insn, &env->insn_idx); + } if (err) return err; @@ -15863,6 +15940,7 @@ static int do_check(struct bpf_verifier_env *env) return -EINVAL; } +process_bpf_exit_full: if (env->cur_state->active_lock.ptr && !in_rbtree_lock_required_cb(env)) { verbose(env, "bpf_spin_unlock is missing\n"); @@ -15880,10 +15958,23 @@ static int do_check(struct bpf_verifier_env *env) * function, for which reference_state must * match caller reference state when it exits. */ - err = check_reference_leak(env); + err = check_reference_leak(env, exception_exit); if (err) return err; + /* The side effect of the prepare_func_exit + * which is being skipped is that it frees + * bpf_func_state. Typically, process_bpf_exit + * will only be hit with outermost exit. + * copy_verifier_state in pop_stack will handle + * freeing of any extra bpf_func_state left over + * from not processing all nested function + * exits. We also skip return code checks as + * they are not needed for exceptional exits. + */ + if (exception_exit) + goto process_bpf_exit; + if (state->curframe) { /* exit from nested function */ err = prepare_func_exit(env, &env->insn_idx); @@ -17438,6 +17529,33 @@ static int do_misc_fixups(struct bpf_verifier_env *env) int i, ret, cnt, delta = 0; for (i = 0; i < insn_cnt; i++, insn++) { + /* Typically, exception state is always cleared on entry and we + * ensure to clear it before exiting, but in some cases, our + * invocation can occur after a BPF callback has been executed + * asynchronously in the context of the current task, which may + * clobber the state (think of BPF timer callbacks). Callbacks + * never reset exception state (as they may be called from + * within a program). Thus, if we rely on seeing the exception + * state, always clear it on entry. + */ + if (i == 0 && prog->throws_exception) { + struct bpf_insn entry_insns[] = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_EMIT_CALL(bpf_reset_exception), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + insn[i], + }; + + cnt = ARRAY_SIZE(entry_insns); + new_prog = bpf_patch_insn_data(env, i + delta, entry_insns, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = new_prog; + insn = new_prog->insnsi + i + delta; + } + /* Make divide-by-zero exceptions impossible. */ if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) || insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) || @@ -18030,7 +18148,33 @@ static bool is_inlineable_bpf_loop_call(struct bpf_insn *insn, return insn->code == (BPF_JMP | BPF_CALL) && insn->src_reg == 0 && insn->imm == BPF_FUNC_loop && - aux->loop_inline_state.fit_for_inline; + aux->loop_inline_state.fit_for_inline && + aux->throw_state.type == BPF_THROW_NONE; +} + +static struct bpf_prog *rewrite_bpf_throw_call(struct bpf_verifier_env *env, + int position, + struct bpf_throw_state *tstate, + u32 *cnt) +{ + struct bpf_insn insn_buf[] = { + env->prog->insnsi[position], + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }; + + *cnt = ARRAY_SIZE(insn_buf); + /* We don't need the call instruction for throws in frame 0 */ + if (tstate->type == BPF_THROW_OUTER) + return bpf_patch_insn_data(env, position, insn_buf + 1, *cnt - 1); + return bpf_patch_insn_data(env, position, insn_buf, *cnt); +} + +static bool is_bpf_throw_call(struct bpf_insn *insn) +{ + return insn->code == (BPF_JMP | BPF_CALL) && + insn->src_reg == BPF_PSEUDO_KFUNC_CALL && + insn->off == 0 && insn->imm == special_kfunc_list[KF_bpf_throw]; } /* For all sub-programs in the program (including main) check @@ -18069,8 +18213,24 @@ static int do_misc_rewrites(struct bpf_verifier_env *env) &cnt); if (!new_prog) return -ENOMEM; + } else if (is_bpf_throw_call(insn)) { + struct bpf_throw_state *throw_state = &insn_aux->throw_state; + + /* The verifier was able to prove that the bpf_throw + * call was unreachable, hence it must have not been + * seen and will be removed by opt_remove_dead_code. + */ + if (throw_state->type == BPF_THROW_NONE) { + WARN_ON_ONCE(insn_aux->seen); + goto skip; + } + + new_prog = rewrite_bpf_throw_call(env, i + delta, throw_state, &cnt); + if (!new_prog) + return -ENOMEM; } +skip: if (new_prog) { delta += cnt - 1; env->prog = new_prog; @@ -18240,6 +18400,12 @@ static int do_check_subprogs(struct bpf_verifier_env *env) "Func#%d is safe for any args that match its prototype\n", i); } + /* Only reliable functions from BTF PoV can be extended, hence + * remember their exception specification to check that we don't + * replace non-throwing subprog with throwing subprog. The + * opposite is fine though. + */ + aux->func_info_aux[i].throws_exception = env->subprog_info[i].can_throw; } return 0; } @@ -18250,8 +18416,12 @@ static int do_check_main(struct bpf_verifier_env *env) env->insn_idx = 0; ret = do_check_common(env, 0); - if (!ret) + if (!ret) { env->prog->aux->stack_depth = env->subprog_info[0].stack_depth; + env->prog->throws_exception = env->subprog_info[0].can_throw; + if (env->prog->aux->func_info) + env->prog->aux->func_info_aux[0].throws_exception = env->prog->throws_exception; + } return ret; } @@ -18753,6 +18923,42 @@ struct btf *bpf_get_btf_vmlinux(void) return btf_vmlinux; } +static int check_ext_prog(struct bpf_verifier_env *env) +{ + struct bpf_prog *tgt_prog = env->prog->aux->dst_prog; + u32 btf_id = env->prog->aux->attach_btf_id; + struct bpf_prog *prog = env->prog; + int subprog = -1; + + if (prog->type != BPF_PROG_TYPE_EXT) + return 0; + for (int i = 0; i < tgt_prog->aux->func_info_cnt; i++) { + if (tgt_prog->aux->func_info[i].type_id == btf_id) { + subprog = i; + break; + } + } + if (subprog == -1) { + verbose(env, "verifier internal error: extension prog's subprog not found\n"); + return -EFAULT; + } + /* BPF_THROW_OUTER rewrites won't match BPF_PROG_TYPE_EXT's + * BPF_THROW_INNER rewrites. + */ + if (!subprog && prog->throws_exception) { + verbose(env, "Cannot attach throwing extension to main subprog\n"); + return -EINVAL; + } + /* Overwriting extensions is not allowed, so we can simply check + * the specification of the subprog we are replacing. + */ + if (!tgt_prog->aux->func_info_aux[subprog].throws_exception && prog->throws_exception) { + verbose(env, "Cannot attach throwing extension to non-throwing subprog\n"); + return -EINVAL; + } + return 0; +} + int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) { u64 start_time = ktime_get_ns(); @@ -18871,6 +19077,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) ret = do_check_subprogs(env); ret = ret ?: do_check_main(env); + ret = ret ?: check_ext_prog(env); + + if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux)) ret = bpf_prog_offload_finalize(env); diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index dbd2c729781a..d5de9251e775 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -89,4 +89,13 @@ extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node, */ extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym; +/* Description + * Throw an exception, terminating the execution of the program immediately. + * The eBPF runtime unwinds the stack automatically and exits the program with + * the default return value of 0. + * Returns + * This function never returns. + */ +extern void bpf_throw(void) __attribute__((noreturn)) __ksym; + #endif

[RFC,bpf-next,v1,3/9] bpf: Implement bpf_throw kfunc

Checks

Commit Message

Comments

Patch