
[v9,bpf-next,9/9] bpf, x86_64: use bpf_jit_binary_pack_alloc

Message ID 20220204185742.271030-10-song@kernel.org
State Accepted
Commit 1022a5498f6f745c3b5fd3f050a5e11e7ca354f0
Delegated to: BPF
Series bpf_prog_pack allocator

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 12 maintainers not CCed: hpa@zytor.com bp@alien8.de kpsingh@kernel.org yoshfuji@linux-ipv6.org john.fastabend@gmail.com kafai@fb.com dsahern@kernel.org dave.hansen@linux.intel.com yhs@fb.com mingo@redhat.com tglx@linutronix.de davem@davemloft.net
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 7 this patch: 7
netdev/checkpatch warning
  WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
  WARNING: line length of 81 exceeds 80 columns
  WARNING: line length of 83 exceeds 80 columns
  WARNING: line length of 85 exceeds 80 columns
  WARNING: line length of 86 exceeds 80 columns
  WARNING: line length of 90 exceeds 80 columns
  WARNING: line length of 96 exceeds 80 columns
  WARNING: line length of 98 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next fail VM_Test

Commit Message

Song Liu Feb. 4, 2022, 6:57 p.m. UTC
From: Song Liu <songliubraving@fb.com>

Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes
the program to the rw buffer. When the jit is done, the program is copied
to the final location with bpf_jit_binary_pack_finalize.
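A minimal userspace model of the two-buffer flow described above, for illustration only. struct pack_model and the pack_model_* helpers are hypothetical stand-ins, not kernel APIs: ro plays the final image, rw the writable buffer every JIT pass emits into, and finalize is a single copy from rw to ro. In the kernel that copy is done by bpf_jit_binary_pack_finalize(), which, per the comment in the patch below, relies on bpf_arch_text_copy() because the destination is read-only text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not kernel code: "ro" stands in for the final image
 * (read-only text in the kernel), "rw" for the writable scratch copy. */
struct pack_model {
	uint8_t *ro;	/* final location: the only copy readers ever see */
	uint8_t *rw;	/* scratch buffer every JIT pass writes into */
	size_t size;
};

/* Allocate both buffers and pre-fill them with 0xcc (int3). */
static int pack_model_alloc(struct pack_model *p, size_t size)
{
	p->ro = malloc(size);
	p->rw = malloc(size);
	p->size = size;
	if (!p->ro || !p->rw) {
		free(p->ro);
		free(p->rw);
		p->ro = NULL;
		p->rw = NULL;
		return -1;
	}
	memset(p->ro, 0xcc, size);
	memset(p->rw, 0xcc, size);
	return 0;
}

/* JIT passes emit instruction bytes through the rw pointer only. */
static void pack_model_emit(struct pack_model *p, size_t off,
			    const uint8_t *insn, size_t len)
{
	assert(off + len <= p->size);
	memcpy(p->rw + off, insn, len);
}

/* Finalize: copy the finished program to its final location in one step,
 * then drop the scratch buffer (this model's analogue of
 * bpf_jit_binary_pack_finalize()). */
static void pack_model_finalize(struct pack_model *p)
{
	memcpy(p->ro, p->rw, p->size);
	free(p->rw);
	p->rw = NULL;
}
```

In this model, anything that reads ro before pack_model_finalize() still sees only the 0xcc filler, even though rw already holds the emitted bytes.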

Note that we need to do bpf_tail_call_direct_fixup after finalize.
Therefore, the text_live = false logic in __bpf_arch_text_poke is no
longer needed.
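Because the tail-call fixup now runs after finalize, the image already sits in read-only memory, so the plain memcpy() branch for a not-yet-live image no longer has a caller and every write goes through text_poke_bp(). The sketch below models, in userspace, only the compare-and-patch decision that __bpf_arch_text_poke() makes around that write. encode_jmp() and poke_model() are hypothetical names, a plain memcpy() stands in for text_poke_bp(), and -1 stands in for -EBUSY.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_SIZE 5	/* bytes in an x86-64 "jmp rel32" or 5-byte nop */

/* The 5-byte nop an unpatched site holds (0f 1f 44 00 00). */
static const uint8_t nop5[PATCH_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "jmp target" for an instruction located at address ip. The
 * displacement is relative to the end of the 5-byte instruction, must fit
 * in a signed 32-bit value, and is stored little-endian as on x86. */
static void encode_jmp(uint8_t insn[PATCH_SIZE], uint64_t ip, uint64_t target)
{
	int64_t d = (int64_t)(target - (ip + PATCH_SIZE));
	uint32_t rel;

	assert(d >= INT32_MIN && d <= INT32_MAX);
	rel = (uint32_t)d;
	insn[0] = 0xe9;
	insn[1] = rel & 0xff;
	insn[2] = (rel >> 8) & 0xff;
	insn[3] = (rel >> 16) & 0xff;
	insn[4] = (rel >> 24) & 0xff;
}

/* Compare-and-patch, mirroring the return values in the patch hunk:
 * -1 (stand-in for -EBUSY) if the site does not hold old_insn,
 *  1 if it already holds new_insn, 0 after writing new_insn.
 * memcpy() stands in for text_poke_bp(). */
static int poke_model(uint8_t site[PATCH_SIZE],
		      const uint8_t old_insn[PATCH_SIZE],
		      const uint8_t new_insn[PATCH_SIZE])
{
	if (memcmp(site, old_insn, PATCH_SIZE))
		return -1;
	if (!memcmp(site, new_insn, PATCH_SIZE))
		return 1;
	memcpy(site, new_insn, PATCH_SIZE);
	return 0;
}
```

Checking the old bytes first is what makes a concurrent or repeated poke safe to detect: a site that no longer holds the expected instruction is left alone.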

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 arch/x86/net/bpf_jit_comp.c | 58 ++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 27 deletions(-)

Comments

Alexei Starovoitov Feb. 8, 2022, 2:24 a.m. UTC | #1
On Fri, Feb 04, 2022 at 10:57:42AM -0800, Song Liu wrote:
>  	if (image) {
>  		if (!prog->is_func || extra_pass) {
> +			/*
> +			 * bpf_jit_binary_pack_finalize fails in two scenarios:
> +			 *   1) header is not pointing to proper module memory;
> +			 *   2) the arch doesn't support bpf_arch_text_copy().
> +			 *
> +			 * Both cases are serious bugs that we should not continue.
> +			 */
> +			BUG_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header));
>  			bpf_tail_call_direct_fixup(prog);
> -			bpf_jit_binary_lock_ro(header);

BUG_ON is discouraged.
It should only be used when the kernel absolutely cannot continue.
Here ro/rw_headers will be freed. We can WARN and goto out_addrs without drama.
Please send a follow up.

The rest looks great. Applied to bpf-next.
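The follow-up requested here replaces the BUG_ON() with a warning plus the existing error path. A userspace sketch of that shape, under stated assumptions: warn_on() stands in for WARN_ON(), free() stands in for releasing the ro/rw headers, and returning orig_prog mirrors what the out_image path does before jumping to out_addrs. None of these names come from the kernel, and the actual follow-up patch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bpf_prog: just the fields we touch. */
struct prog_model {
	bool jited;
	void *image;
};

/* Stand-in for WARN_ON(): report the condition and keep running. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Tail of a compile path. finalize_ret is the (modelled) result of the
 * finalize step. On failure: warn, free both buffers and hand back the
 * original program, instead of halting the machine. */
static struct prog_model *finish_compile(struct prog_model *prog,
					 struct prog_model *orig_prog,
					 void *ro, void *rw, int finalize_ret)
{
	assert(prog && orig_prog);
	if (warn_on(finalize_ret != 0, "finalize failed")) {
		free(ro);
		free(rw);
		return orig_prog;
	}
	free(rw);		/* scratch copy no longer needed */
	prog->image = ro;
	prog->jited = true;
	return prog;
}
```

The caller keeps working with an unJITed program rather than losing the whole machine, which is the trade-off the review points at.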
Andres Freund July 3, 2022, 3:02 a.m. UTC | #2
Hi,

On 2022-02-04 10:57:42 -0800, Song Liu wrote:
> From: Song Liu <songliubraving@fb.com>
> 
> Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes
> the program to the rw buffer. When the jit is done, the program is copied
> to the final location with bpf_jit_binary_pack_finalize.
> 
> Note that we need to do bpf_tail_call_direct_fixup after finalize.
> Therefore, the text_live = false logic in __bpf_arch_text_poke is no
> longer needed.

I think this broke bpf_jit_enable = 2. I just tried to use that, to verify I
didn't break tools/bpf/bpf_jit_disasm, and I just see output like

Jul 02 18:34:40 awork3 kernel: flen=142 proglen=735 pass=5 image=00000000d076e0db from=sshd pid=440127
Jul 02 18:34:40 awork3 kernel: JIT code: 00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Jul 02 18:34:40 awork3 kernel: JIT code: 00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Jul 02 18:34:40 awork3 kernel: JIT code: 00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Jul 02 18:34:40 awork3 kernel: JIT code: 00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
...

while bpftool keeps showing reasonable content. The 'cc' content only started
with a later commit, but I think this is the commit that broke bpf_jit_enable
== 2.

At the time bpf_jit_dump() is called, bpf_jit_binary_pack_alloc() has already
pointed image at ro_header->image, but nothing has been written there yet,
because bpf_jit_binary_pack_finalize() hasn't been called.
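The ordering problem can be reproduced with a small userspace model: the final image starts out filled with 0xcc (int3, which is what jit_fill_hole() writes on x86), the JIT output lives only in the rw buffer, and a dump of the final image taken before the finalize copy shows nothing but filler. struct img_model and its helpers are hypothetical; format_dump() only imitates the layout of the log lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IMG_SIZE 16

/* Hypothetical model: ro is the final image the dump reads through
 * "image"; rw is where the JIT actually wrote the program. */
struct img_model {
	uint8_t ro[IMG_SIZE];
	uint8_t rw[IMG_SIZE];
};

/* Both buffers start out as int3 filler, as after allocation. */
static void img_model_init(struct img_model *m)
{
	memset(m->ro, 0xcc, IMG_SIZE);
	memset(m->rw, 0xcc, IMG_SIZE);
}

/* The finalize copy: only after this does ro hold the program. */
static void img_model_finalize(struct img_model *m)
{
	memcpy(m->ro, m->rw, IMG_SIZE);
}

/* Format up to len bytes in the style of the "JIT code:" log lines. */
static void format_dump(char *out, size_t outlen, const uint8_t *image,
			size_t len)
{
	size_t pos;

	assert(outlen > 0);
	pos = (size_t)snprintf(out, outlen, "JIT code: 00000000:");
	for (size_t i = 0; i < len && pos < outlen; i++)
		pos += (size_t)snprintf(out + pos, outlen - pos, " %02x",
					(unsigned int)image[i]);
}
```

Dumping m.ro before img_model_finalize() yields only cc bytes, matching the log; dumping after it, or dumping m.rw, yields the emitted bytes.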

Greetings,

Andres Freund
Alexei Starovoitov July 3, 2022, 3:03 a.m. UTC | #3
On Sat, Jul 2, 2022 at 8:02 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-04 10:57:42 -0800, Song Liu wrote:
> > From: Song Liu <songliubraving@fb.com>
> >
> > Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes
> > the program to the rw buffer. When the jit is done, the program is copied
> > to the final location with bpf_jit_binary_pack_finalize.
> >
> > Note that we need to do bpf_tail_call_direct_fixup after finalize.
> > Therefore, the text_live = false logic in __bpf_arch_text_poke is no
> > longer needed.
>
> I think this broke bpf_jit_enable = 2.

Good. We need to remove that knob.
It's been wrong for a long time.
Andres Freund July 3, 2022, 3:14 a.m. UTC | #4
Hi,

On 2022-07-02 20:03:56 -0700, Alexei Starovoitov wrote:
> On Sat, Jul 2, 2022 at 8:02 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-02-04 10:57:42 -0800, Song Liu wrote:
> > > From: Song Liu <songliubraving@fb.com>
> > >
> > > Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes
> > > the program to the rw buffer. When the jit is done, the program is copied
> > > to the final location with bpf_jit_binary_pack_finalize.
> > >
> > > Note that we need to do bpf_tail_call_direct_fixup after finalize.
> > > Therefore, the text_live = false logic in __bpf_arch_text_poke is no
> > > longer needed.
> >
> > I think this broke bpf_jit_enable = 2.
> 
> Good. We need to remove that knob.
> It's been wrong for a long time.

Fine with me - I've never used it before trying to verify I am not breaking
tools/bpf/bpf_jit_disasm...

And yeah, it does look like bpf_jit_dump() was called too early before that
commit as well, just with less severe consequences.

Greetings,

Andres Freund

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c13d148f7396..643f38b91e30 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -330,8 +330,7 @@  static int emit_jump(u8 **pprog, void *func, void *ip)
 }
 
 static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
-				void *old_addr, void *new_addr,
-				const bool text_live)
+				void *old_addr, void *new_addr)
 {
 	const u8 *nop_insn = x86_nops[5];
 	u8 old_insn[X86_PATCH_SIZE];
@@ -365,10 +364,7 @@  static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		goto out;
 	ret = 1;
 	if (memcmp(ip, new_insn, X86_PATCH_SIZE)) {
-		if (text_live)
-			text_poke_bp(ip, new_insn, X86_PATCH_SIZE, NULL);
-		else
-			memcpy(ip, new_insn, X86_PATCH_SIZE);
+		text_poke_bp(ip, new_insn, X86_PATCH_SIZE, NULL);
 		ret = 0;
 	}
 out:
@@ -384,7 +380,7 @@  int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		/* BPF poking in modules is not supported */
 		return -EINVAL;
 
-	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
+	return __bpf_arch_text_poke(ip, t, old_addr, new_addr);
 }
 
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
@@ -558,24 +554,15 @@  static void bpf_tail_call_direct_fixup(struct bpf_prog *prog)
 		mutex_lock(&array->aux->poke_mutex);
 		target = array->ptrs[poke->tail_call.key];
 		if (target) {
-			/* Plain memcpy is used when image is not live yet
-			 * and still not locked as read-only. Once poke
-			 * location is active (poke->tailcall_target_stable),
-			 * any parallel bpf_arch_text_poke() might occur
-			 * still on the read-write image until we finally
-			 * locked it as read-only. Both modifications on
-			 * the given image are under text_mutex to avoid
-			 * interference.
-			 */
 			ret = __bpf_arch_text_poke(poke->tailcall_target,
 						   BPF_MOD_JUMP, NULL,
 						   (u8 *)target->bpf_func +
-						   poke->adj_off, false);
+						   poke->adj_off);
 			BUG_ON(ret < 0);
 			ret = __bpf_arch_text_poke(poke->tailcall_bypass,
 						   BPF_MOD_JUMP,
 						   (u8 *)poke->tailcall_target +
-						   X86_PATCH_SIZE, NULL, false);
+						   X86_PATCH_SIZE, NULL);
 			BUG_ON(ret < 0);
 		}
 		WRITE_ONCE(poke->tailcall_target_stable, true);
@@ -866,7 +853,7 @@  static void emit_nops(u8 **pprog, int len)
 
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
-static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
+static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
 	bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
@@ -893,8 +880,8 @@  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	push_callee_regs(&prog, callee_regs_used);
 
 	ilen = prog - temp;
-	if (image)
-		memcpy(image + proglen, temp, ilen);
+	if (rw_image)
+		memcpy(rw_image + proglen, temp, ilen);
 	proglen += ilen;
 	addrs[0] = proglen;
 	prog = temp;
@@ -1323,6 +1310,9 @@  st:			if (is_imm8(insn->off))
 					pr_err("extable->insn doesn't fit into 32-bit\n");
 					return -EFAULT;
 				}
+				/* switch ex to rw buffer for writes */
+				ex = (void *)rw_image + ((void *)ex - (void *)image);
+
 				ex->insn = delta;
 
 				ex->data = EX_TYPE_BPF;
@@ -1705,7 +1695,7 @@  st:			if (is_imm8(insn->off))
 				pr_err("bpf_jit: fatal error\n");
 				return -EFAULT;
 			}
-			memcpy(image + proglen, temp, ilen);
+			memcpy(rw_image + proglen, temp, ilen);
 		}
 		proglen += ilen;
 		addrs[i] = proglen;
@@ -2246,6 +2236,7 @@  int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
 }
 
 struct x64_jit_data {
+	struct bpf_binary_header *rw_header;
 	struct bpf_binary_header *header;
 	int *addrs;
 	u8 *image;
@@ -2258,6 +2249,7 @@  struct x64_jit_data {
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
+	struct bpf_binary_header *rw_header = NULL;
 	struct bpf_binary_header *header = NULL;
 	struct bpf_prog *tmp, *orig_prog = prog;
 	struct x64_jit_data *jit_data;
@@ -2266,6 +2258,7 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	bool tmp_blinded = false;
 	bool extra_pass = false;
 	bool padding = false;
+	u8 *rw_image = NULL;
 	u8 *image = NULL;
 	int *addrs;
 	int pass;
@@ -2301,6 +2294,8 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		oldproglen = jit_data->proglen;
 		image = jit_data->image;
 		header = jit_data->header;
+		rw_header = jit_data->rw_header;
+		rw_image = (void *)rw_header + ((void *)image - (void *)header);
 		extra_pass = true;
 		padding = true;
 		goto skip_init_addrs;
@@ -2331,12 +2326,12 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	for (pass = 0; pass < MAX_PASSES || image; pass++) {
 		if (!padding && pass >= PADDING_PASSES)
 			padding = true;
-		proglen = do_jit(prog, addrs, image, oldproglen, &ctx, padding);
+		proglen = do_jit(prog, addrs, image, rw_image, oldproglen, &ctx, padding);
 		if (proglen <= 0) {
 out_image:
 			image = NULL;
 			if (header)
-				bpf_jit_binary_free(header);
+				bpf_jit_binary_pack_free(header, rw_header);
 			prog = orig_prog;
 			goto out_addrs;
 		}
@@ -2360,8 +2355,9 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 				sizeof(struct exception_table_entry);
 
 			/* allocate module memory for x86 insns and extable */
-			header = bpf_jit_binary_alloc(roundup(proglen, align) + extable_size,
-						      &image, align, jit_fill_hole);
+			header = bpf_jit_binary_pack_alloc(roundup(proglen, align) + extable_size,
+							   &image, align, &rw_header, &rw_image,
+							   jit_fill_hole);
 			if (!header) {
 				prog = orig_prog;
 				goto out_addrs;
@@ -2377,14 +2373,22 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	if (image) {
 		if (!prog->is_func || extra_pass) {
+			/*
+			 * bpf_jit_binary_pack_finalize fails in two scenarios:
+			 *   1) header is not pointing to proper module memory;
+			 *   2) the arch doesn't support bpf_arch_text_copy().
+			 *
+			 * Both cases are serious bugs that we should not continue.
+			 */
+			BUG_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header));
 			bpf_tail_call_direct_fixup(prog);
-			bpf_jit_binary_lock_ro(header);
 		} else {
 			jit_data->addrs = addrs;
 			jit_data->ctx = ctx;
 			jit_data->proglen = proglen;
 			jit_data->image = image;
 			jit_data->header = header;
+			jit_data->rw_header = rw_header;
 		}
 		prog->bpf_func = (void *)image;
 		prog->jited = 1;
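Two hunks above (the extable fixup in do_jit() and the extra_pass setup in bpf_int_jit_compile()) rely on the same trick: the rw buffer mirrors the layout of the final image, so a pointer into one is translated into the other by keeping its offset from the base. A portable sketch of that translation follows; ro_to_rw() and store_u32_via_rw() are hypothetical names. The kernel does the arithmetic on void * (a GNU C extension), while this version uses uint8_t *. As the extable hunk appears to do, values that describe the final program are computed against the ro addresses, and only the store goes through the rw copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a pointer into the final (ro) image to the same offset in the rw
 * copy. Both buffers share one layout, so only the base differs. */
static void *ro_to_rw(void *ro_ptr, void *ro_base, void *rw_base)
{
	ptrdiff_t off = (uint8_t *)ro_ptr - (uint8_t *)ro_base;

	assert(off >= 0);
	return (uint8_t *)rw_base + off;
}

/* Store a 32-bit value at the rw location that corresponds to ro_ptr;
 * memcpy() avoids alignment and aliasing issues in portable C. */
static void store_u32_via_rw(void *ro_ptr, void *ro_base, void *rw_base,
			     uint32_t v)
{
	memcpy(ro_to_rw(ro_ptr, ro_base, rw_base), &v, sizeof(v));
}
```

After the finalize copy, the value stored at the rw offset appears at the matching ro address, which is why the translation is only needed while the JIT is still writing.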