[RFC,v1,07/14] bpf: Use hidden subprog trampoline for bpf_throw

Message ID: 20240201042109.1150490-8-memxor@gmail.com
State: RFC
Delegated to: BPF
Series: Exceptions - Resource Cleanup


Commit Message

Kumar Kartikeya Dwivedi Feb. 1, 2024, 4:21 a.m. UTC
When we perform a bpf_throw kfunc call, the callee-saved registers of
the BPF calling convention (R6-R9) may end up being saved and then
clobbered by bpf_throw. Typically, the kernel restores these registers
before returning to the BPF program, but in the case of bpf_throw, the
function never returns. Therefore, any acquired resources held in these
registers are lost if they were not spilled to the stack, without any
cleanup happening for them.

Also, within a BPF call chain, caller frames may hold acquired
resources in R6-R9 when calling their subprogs, and a callee may spill
these registers to its own stack frame in order to reuse them before
entering the bpf_throw kfunc. Thus, we also need to locate and free
such spilled callee-saved registers for each frame.
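
To make the hazard concrete, here is a minimal sketch of a program
that holds a resource in a callee-saved register across bpf_throw
(hypothetical program; it assumes the bpf_obj_new()/bpf_obj_drop()/
bpf_throw() declarations from the selftests' bpf_experimental.h):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct node_data { long payload; };

SEC("tc")
int throw_with_held_resource(struct __sk_buff *ctx)
{
	/* acquired resource; the compiler will typically keep p in a
	 * callee-saved register (R6-R9) across the calls below
	 */
	struct node_data *p = bpf_obj_new(typeof(*p));

	if (!p)
		return 0;
	if (ctx->len > 128)
		bpf_throw(0); /* never returns; p must still be cleaned up */
	bpf_obj_drop(p);
	return 0;
}

char _license[] SEC("license") = "GPL";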

It is thus necessary to save these registers somewhere before we call
into the bpf_throw kfunc. Instead of adding spills at every bpf_throw
call site, we can use a new hidden subprog that saves R6-R9 on its
stack and then calls into bpf_throw. This way, all bpf_throw call sites
can be rewritten into call instructions targeting this subprog, which
in turn saves the callee-saved registers before calling into the
bpf_throw kfunc.

Then, when unwinding the stack, we can locate the callee-saved
registers in the hidden subprog's stack frame and perform their
cleanup.
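
Concretely, the hidden subprog's body boils down to the following
instruction sequence (a sketch mirroring the do_misc_fixups() change
in the patch below; the kfunc's BTF ID is resolved by the verifier and
elided here):

struct bpf_insn bpf_throw_tramp_body[] = {
	/* Scratch R6-R9 so the JIT considers them used and spills the
	 * caller's values to this frame in the prologue; R1 (the throw
	 * cookie) passes through untouched.
	 */
	BPF_MOV64_IMM(BPF_REG_6, 0),
	BPF_MOV64_IMM(BPF_REG_7, 0),
	BPF_MOV64_IMM(BPF_REG_8, 0),
	BPF_MOV64_IMM(BPF_REG_9, 0),
	/* call into the bpf_throw kfunc; it never returns */
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
		     0 /* BTF ID of bpf_throw */),
};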

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c  | 24 ++++++++--------
 include/linux/bpf.h          |  5 ++++
 include/linux/bpf_verifier.h |  3 +-
 kernel/bpf/verifier.c        | 55 ++++++++++++++++++++++++++++++++++--
 4 files changed, 71 insertions(+), 16 deletions(-)

Comments

Eduard Zingerman Feb. 15, 2024, 10:11 p.m. UTC | #1
On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> When we perform a bpf_throw kfunc call, callee saved registers in BPF
> calling convention (R6-R9) may end up getting saved and clobbered by
> bpf_throw. Typically, the kernel will restore the registers before
> returning back to the BPF program, but in case of bpf_throw, the
> function will never return. Therefore, any acquired resources sitting in
> these registers will end up getting destroyed if not saved on the
> stack, without any cleanup happening for them.

Could you please paraphrase this description a bit?
It took me a while to figure out the difference between regular bpf
calls and kfunc calls. Something like:

  - For regular bpf subprogram calls the jit emits code that pushes R6-R9 to
    the stack before jumping into the callee.
  - For kfunc calls the jit emits instructions that do not guarantee that
    R6-R9 are preserved on the stack. E.g. on x86 a kfunc call is translated
    into a "call" instruction, which only pushes the return address to the stack.

--

Also, what do you think about the following hack:
- declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
- replace all calls to bpf_throw() with calls to bpf_throw_r()
  (r1-r5 do not have to be preserved anyways).
This would avoid the necessity of introducing the trampoline.

[...]
Kumar Kartikeya Dwivedi Feb. 16, 2024, 9:59 p.m. UTC | #2
On Thu, 15 Feb 2024 at 23:11, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> > When we perform a bpf_throw kfunc call, callee saved registers in BPF
> > calling convention (R6-R9) may end up getting saved and clobbered by
> > bpf_throw. Typically, the kernel will restore the registers before
> > returning back to the BPF program, but in case of bpf_throw, the
> > function will never return. Therefore, any acquired resources sitting in
> > these registers will end up getting destroyed if not saved on the
> > stack, without any cleanup happening for them.
>
> Could you please paraphrase this description a bit?
> It took me a while to figure out the difference between regular bpf
> calls and kfunc calls. Something like:
>
>   - For regular bpf subprogram calls the jit emits code that pushes R6-R9 to
>     the stack before jumping into the callee.
>   - For kfunc calls the jit emits instructions that do not guarantee that
>     R6-R9 are preserved on the stack. E.g. on x86 a kfunc call is translated
>     into a "call" instruction, which only pushes the return address to the stack.
>

Will rephrase like this, thanks for the suggestions.

> --
>
> Also, what do you think about the following hack:
> - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
> - replace all calls to bpf_throw() with calls to bpf_throw_r()
>   (r1-r5 do not have to be preserved anyways).
> This would avoid the necessity of introducing the trampoline.
>

I think we can do such a thing as well, but there are other tradeoffs.

Do you mean that R6 to R9 would be copied to R1 to R5? We would have to
special-case such calls in each architecture's JIT and add extra code
to handle them, since fixups from the verifier would also need to pass
a 6th argument, the cookie value of the bpf_throw call, which does not
fit within the 5-argument limit for existing kfuncs. I did contemplate
this solution but decided against it for these reasons.

One of the advantages of this bpf_throw_tramp approach is that it does
not increase code size for all callees, since the saving happens only
when the hidden subprog is called. We could do something similar for
bpf_throw_r, but it would live in architecture-specific code in the
JIT or in some arch_bpf_throw_r instead.

Let me know if you suggested something different than what I understood above.

> [...]
>
>
Eduard Zingerman Feb. 17, 2024, 2:22 p.m. UTC | #3
On Fri, 2024-02-16 at 22:59 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> > Also, what do you think about the following hack:
> > - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
> > - replace all calls to bpf_throw() with calls to bpf_throw_r()
> >   (r1-r5 do not have to be preserved anyways).
> > This would avoid the necessity of introducing the trampoline.
> > 
> 
> I think we can do such a thing as well, but there are other tradeoffs.
> 
> Do you mean that R6 to R9 would be copied to R1 to R5? We would have to
> special-case such calls in each architecture's JIT and add extra code
> to handle them, since fixups from the verifier would also need to pass
> a 6th argument, the cookie value of the bpf_throw call, which does not
> fit within the 5-argument limit for existing kfuncs. I did contemplate
> this solution but decided against it for these reasons.
> 
> One of the advantages of this bpf_throw_tramp approach is that it does
> not increase code size for all callees, since the saving happens only
> when the hidden subprog is called. We could do something similar for
> bpf_throw_r, but it would live in architecture-specific code in the
> JIT or in some arch_bpf_throw_r instead.
> 
> Let me know if you suggested something different than what I understood above.

Forgot about the cookie; however, R6-R9 fit in R2-R5, so the cookie
would be fine. An arch_bpf_throw_r() that saves R6-R9 right after the
call is probably better than plain bpf register copying.

But you are correct that the trampoline allows uniform processing in
arch_bpf_cleanup_frame_resource(), so it would take less C code to
implement this feature in the end.
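
For concreteness, a hypothetical sketch of that alternative (signature
and naming assumed, not part of this patch):

#include <linux/btf.h>

/* R1 carries the cookie; R6-R9 are copied into R2-R5, which are
 * clobbered across calls anyway, so nothing else needs preserving.
 */
__bpf_kfunc void bpf_throw_r(u64 cookie, u64 r6, u64 r7, u64 r8, u64 r9)
{
	/* stash r6-r9 where the unwinder can find them, then unwind
	 * exactly as bpf_throw(cookie) does today
	 */
}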

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e1390d1e331b..87692d983ffd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -640,9 +640,10 @@  static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
 	EMIT2(X86_JE, offset);                    /* je out */
 
-	if (bpf_prog->aux->exception_boundary) {
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 		pop_callee_regs(&prog, all_callee_regs_used);
-		pop_r12(&prog);
+		if (bpf_prog->aux->exception_boundary)
+			pop_r12(&prog);
 	} else {
 		pop_callee_regs(&prog, callee_regs_used);
 	}
@@ -699,9 +700,10 @@  static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	if (bpf_prog->aux->exception_boundary) {
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 		pop_callee_regs(&prog, all_callee_regs_used);
-		pop_r12(&prog);
+		if (bpf_prog->aux->exception_boundary)
+			pop_r12(&prog);
 	} else {
 		pop_callee_regs(&prog, callee_regs_used);
 	}
@@ -1164,12 +1166,9 @@  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 	/* Exception callback will clobber callee regs for its own use, and
 	 * restore the original callee regs from main prog's stack frame.
 	 */
-	if (bpf_prog->aux->exception_boundary) {
-		/* We also need to save r12, which is not mapped to any BPF
-		 * register, as we throw after entry into the kernel, which may
-		 * overwrite r12.
-		 */
-		push_r12(&prog);
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
+		if (bpf_prog->aux->exception_boundary)
+			push_r12(&prog);
 		push_callee_regs(&prog, all_callee_regs_used);
 	} else {
 		push_callee_regs(&prog, callee_regs_used);
@@ -2031,9 +2030,10 @@  st:			if (is_imm8(insn->off))
 			seen_exit = true;
 			/* Update cleanup_addr */
 			ctx->cleanup_addr = proglen;
-			if (bpf_prog->aux->exception_boundary) {
+			if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 				pop_callee_regs(&prog, all_callee_regs_used);
-				pop_r12(&prog);
+				if (bpf_prog->aux->exception_boundary)
+					pop_r12(&prog);
 			} else {
 				pop_callee_regs(&prog, callee_regs_used);
 			}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 463c8d22ad72..83cff18a1b66 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3369,6 +3369,11 @@  static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+static inline bool bpf_is_hidden_subprog(const struct bpf_prog *prog)
+{
+	return prog->aux->func_idx >= prog->aux->func_cnt;
+}
+
 struct bpf_frame_desc_reg_entry {
 	u32 type;
 	s16 spill_type;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0113a3a940e2..04e27fce33d6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -683,6 +683,7 @@  struct bpf_verifier_env {
 	u32 id_gen;			/* used to generate unique reg IDs */
 	u32 hidden_subprog_cnt;		/* number of hidden subprogs */
 	int exception_callback_subprog;
+	int bpf_throw_tramp_subprog;
 	bool explore_alu_limits;
 	bool allow_ptr_leaks;
 	/* Allow access to uninitialized stack memory. Writes with fixed offset are
@@ -699,7 +700,7 @@  struct bpf_verifier_env {
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
 	struct bpf_verifier_log log;
-	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 2]; /* max + 2 for the fake and exception subprogs */
+	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 3]; /* max + 3 for the fake, exception, and bpf_throw_tramp subprogs */
 	union {
 		struct bpf_idmap idmap_scratch;
 		struct bpf_idset idset_scratch;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e5b1db1db679..942243cba9f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19836,9 +19836,9 @@  static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
 	int cnt = env->subprog_cnt;
 	struct bpf_prog *prog;
 
-	/* We only reserve one slot for hidden subprogs in subprog_info. */
-	if (env->hidden_subprog_cnt) {
-		verbose(env, "verifier internal error: only one hidden subprog supported\n");
+	/* We only reserve two slots for hidden subprogs in subprog_info. */
+	if (env->hidden_subprog_cnt == 2) {
+		verbose(env, "verifier internal error: only two hidden subprogs supported\n");
 		return -EFAULT;
 	}
 	/* We're not patching any existing instruction, just appending the new
@@ -19892,6 +19892,42 @@  static int do_misc_fixups(struct bpf_verifier_env *env)
 		mark_subprog_exc_cb(env, env->exception_callback_subprog);
 	}
 
+	if (env->seen_exception) {
+		struct bpf_insn patch[] = {
+			/* Use the correct insn_cnt here, as we want to append past the hidden subprog above. */
+			env->prog->insnsi[env->prog->len - 1],
+			/* Scratch R6-R9 so that the JIT spills them to the stack on entry. */
+			BPF_MOV64_IMM(BPF_REG_6, 0),
+			BPF_MOV64_IMM(BPF_REG_7, 0),
+			BPF_MOV64_IMM(BPF_REG_8, 0),
+			BPF_MOV64_IMM(BPF_REG_9, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, special_kfunc_list[KF_bpf_throw]),
+		};
+		const bool all_callee_regs_used[4] = {true, true, true, true};
+
+		ret = add_hidden_subprog(env, patch, ARRAY_SIZE(patch));
+		if (ret < 0)
+			return ret;
+		prog = env->prog;
+		insn = prog->insnsi;
+
+		env->bpf_throw_tramp_subprog = env->subprog_cnt - 1;
+		/* Ensure to mark callee_regs_used, so that we can collect any saved_regs if necessary. */
+		memcpy(env->subprog_info[env->bpf_throw_tramp_subprog].callee_regs_used, all_callee_regs_used, sizeof(all_callee_regs_used));
+		/* Certainly, we have seen a bpf_throw call in this program, as
+		 * seen_exception is true, therefore the bpf_kfunc_desc entry for it must
+		 * be populated and found here. We need to do the fixup now, otherwise
+		 * the loop over insn_cnt below won't see this kfunc call.
+		 */
+		ret = fixup_kfunc_call(env, &prog->insnsi[prog->len - 1], insn_buf, prog->len - 1, &cnt);
+		if (ret < 0)
+			return ret;
+		if (cnt != 0) {
+			verbose(env, "verifier internal error: unhandled patching for bpf_throw fixup in bpf_throw_tramp subprog\n");
+			return -EFAULT;
+		}
+	}
+
 	for (i = 0; i < insn_cnt; i++, insn++) {
 		/* Make divide-by-zero exceptions impossible. */
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
@@ -20012,6 +20048,19 @@  static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
+			/* All bpf_throw calls in this program must be patched to call the
+			 * bpf_throw_tramp subprog instead.  This ensures we correctly save
+			 * the R6-R9 before entry into kernel, and can clean them up if
+			 * needed.
+			 * Note: seen_exception must be set, otherwise no bpf_throw_tramp is
+			 * generated.
+			 */
+			if (env->seen_exception && is_bpf_throw_kfunc(insn)) {
+				*insn = BPF_CALL_REL(0);
+				insn->imm = (int)env->subprog_info[env->bpf_throw_tramp_subprog].start - (i + delta) - 1;
+				continue;
+			}
+
 			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
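
As a closing note on the call-site fixup above: the immediate of a
BPF_PSEUDO_CALL counts instructions from the one following the call,
hence the "- 1" in the patch. A small standalone illustration, with
made-up instruction indices:

#include <stdio.h>

/* imm for a bpf-to-bpf call: offset from the instruction after the
 * call to the start of the target subprog (values are hypothetical)
 */
static int call_rel_imm(int target_start, int call_index)
{
	return target_start - call_index - 1;
}

int main(void)
{
	/* bpf_throw_tramp starts at insn 100, patched call sits at insn 40 */
	printf("imm = %d\n", call_rel_imm(100, 40)); /* prints "imm = 59" */
	return 0;
}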