
[RFC,bpf-next,1/2] bpf, x64: Fix tailcall infinite loop bug

Message ID 20230814134147.70289-2-hffilwlqm@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Series bpf, x64: Fix tailcall infinite loop bug

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs_no_alu32 on s390x with gcc
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2812 this patch: 2812
netdev/cc_maintainers success CCed 22 of 22 maintainers
netdev/build_clang success Errors and warnings before: 1526 this patch: 1526
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 2840 this patch: 2840
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc

Commit Message

Leon Hwang Aug. 14, 2023, 1:41 p.m. UTC
Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), tail calls on x64 have worked better than before.

Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tail calls can run in BPF subprograms on x64.

Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), a BPF program can trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.

As a result, a tailcall infinite loop comes up, and the loop halts the
machine.
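
For illustration, here is a minimal sketch of such a scenario. The program,
map and section names are made up for this example (they are not taken from
this series' selftests), and the fentry program is assumed to be attached to
the tc program's subprogram by setting the attach target at load time:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	struct {
		__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
		__uint(max_entries, 1);
		__uint(key_size, sizeof(__u32));
		__uint(value_size, sizeof(__u32));
	} jmp_table SEC(".maps");

	/* Userspace stores the fd of entry() in slot 0 of jmp_table. */
	static __noinline int subprog_tail(struct __sk_buff *skb)
	{
		bpf_tail_call_static(skb, &jmp_table, 0); /* re-enters entry() */
		return 0;
	}

	SEC("tc")
	int entry(struct __sk_buff *skb)
	{
		return subprog_tail(skb);
	}

	/* FENTRY on the BPF subprogram. Without this fix, the trampoline does
	 * not propagate tail_call_cnt (RAX), so the tail call limit never
	 * triggers and the self tail call loops forever.
	 */
	SEC("fentry/subprog_tail")
	int BPF_PROG(fentry_hook)
	{
		return 0;
	}

	char _license[] SEC("license") = "GPL";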

As we know, in tail call context, tail_call_cnt is propagated between BPF
subprograms via the stack and the RAX register. Do the same in FENTRY/FEXIT
trampolines: cache the incoming tail_call_cnt on the trampoline's stack and
restore it into RAX before the traced function runs.

Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 23 +++++++++++++++++++----
 include/linux/bpf.h         |  6 ++++++
 kernel/bpf/trampoline.c     |  5 +++--
 kernel/bpf/verifier.c       |  9 +++++++--
 4 files changed, 35 insertions(+), 8 deletions(-)

Comments

Eduard Zingerman Aug. 15, 2023, 12:52 a.m. UTC | #1
On Mon, 2023-08-14 at 21:41 +0800, Leon Hwang wrote:
> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
> handling in JIT"), the tailcall on x64 works better than before.
> 
> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
> 
> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
> to other BPF programs"), BPF program is able to trace other BPF programs.
> 
> How about combining them all together?
> 
> 1. FENTRY/FEXIT on a BPF subprogram.
> 2. A tailcall runs in the BPF subprogram.
> 3. The tailcall calls itself.
> 
> As a result, a tailcall infinite loop comes up. And the loop halts the
> machine.
> 
> As we know, in tail call context, the tail_call_cnt propagates by stack
> and RAX register between BPF subprograms. So do it in FENTRY/FEXIT
> trampolines.

Hi Leon,

I'm not familiar with this part of the JIT compiler, so I decided that
taking a look at your series might be a good learning point.
I think I got the gist of it, but I don't understand where
the initial value of RAX (== 0) is coming from in
arch_prepare_bpf_trampoline(); could you please help me out?

Also a nitpick:
- in arch_prepare_bpf_trampoline() there is a comment detailing 
  the stack layout, it probably should be updated to say that
  tail call count is stored as well;
- before arch_prepare_bpf_trampoline() there is a comment with
  an example of generated assembly, should it be updated?

Thanks,
Eduard

> 
> Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
> Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 23 +++++++++++++++++++----
>  include/linux/bpf.h         |  6 ++++++
>  kernel/bpf/trampoline.c     |  5 +++--
>  kernel/bpf/verifier.c       |  9 +++++++--
>  4 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index a5930042139d3..ca5366d97ad04 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1018,6 +1018,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>  
>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>  
> +/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> +#define RESTORE_TAIL_CALL_CNT(stack)				\
> +	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
> +
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>  		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
>  {
> @@ -1623,9 +1627,7 @@ st:			if (is_imm8(insn->off))
>  
>  			func = (u8 *) __bpf_call_base + imm32;
>  			if (tail_call_reachable) {
> -				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> -				EMIT3_off32(0x48, 0x8B, 0x85,
> -					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
> +				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
>  				if (!imm32)
>  					return -EINVAL;
>  				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
> @@ -2464,6 +2466,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	else
>  		/* sub rsp, stack_size */
>  		EMIT4(0x48, 0x83, 0xEC, stack_size);
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		EMIT1(0x50);		/* push rax */
>  	/* mov QWORD PTR [rbp - rbx_off], rbx */
>  	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>  
> @@ -2516,6 +2520,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  		restore_regs(m, &prog, regs_off);
>  		save_args(m, &prog, arg_stack_off, true);
>  
> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +			/* Before calling the original function, restore the
> +			 * tail_call_cnt from stack.
> +			 */
> +			RESTORE_TAIL_CALL_CNT(stack_size);
> +
>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
>  			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
>  			EMIT2(0xff, 0xd0); /* call *rax */
> @@ -2569,7 +2579,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  			ret = -EINVAL;
>  			goto cleanup;
>  		}
> -	}
> +	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		/* Before running the original function, restore the
> +		 * tail_call_cnt from stack.
> +		 */
> +		RESTORE_TAIL_CALL_CNT(stack_size);
> +
>  	/* restore return value of orig_call or fentry prog back into RAX */
>  	if (save_ret)
>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cfabbcf47bdb8..55c72086034ef 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1028,6 +1028,11 @@ struct btf_func_model {
>   */
>  #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
>  
> +/* Indicate that current trampoline is in a tail call context. Then, it has to
> + * cache and restore tail_call_cnt to avoid infinite tail call loop.
> + */
> +#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
> +
>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>   * bytes on x86.
>   */
> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
>  	struct module *tgt_mod;
>  	const char *tgt_name;
>  	const struct btf_type *tgt_type;
> +	bool tail_call_ctx;
>  };
>  
>  #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 78acf28d48732..0fae334e3f7b8 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>  		goto out;
>  	}
>  
> -	/* clear all bits except SHARE_IPMODIFY */
> -	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
> +	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
> +	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
>  
>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
> @@ -783,6 +783,7 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
>  
>  	memcpy(&tr->func.model, &tgt_info->fmodel, sizeof(tgt_info->fmodel));
>  	tr->func.addr = (void *)tgt_info->tgt_addr;
> +	tr->flags = (tgt_info->tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);
>  out:
>  	mutex_unlock(&tr->mutex);
>  	return tr;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4ccca1f6c9981..a78e5a2ae5c72 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19400,10 +19400,15 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  			return -EINVAL;
>  		fallthrough;
>  	case BPF_MODIFY_RETURN:
> -	case BPF_LSM_MAC:
> -	case BPF_LSM_CGROUP:
>  	case BPF_TRACE_FENTRY:
>  	case BPF_TRACE_FEXIT:
> +		if (tgt_prog && subprog > 0 &&
> +		    tgt_prog->aux->func[subprog]->is_func &&
> +		    tgt_prog->aux->tail_call_reachable)
> +			tgt_info->tail_call_ctx = true;
> +		fallthrough;
> +	case BPF_LSM_MAC:
> +	case BPF_LSM_CGROUP:
>  		if (!btf_type_is_func(t)) {
>  			bpf_log(log, "attach_btf_id %u is not a function\n",
>  				btf_id);
Leon Hwang Aug. 15, 2023, 3:01 a.m. UTC | #2
On 15/8/23 08:52, Eduard Zingerman wrote:
> On Mon, 2023-08-14 at 21:41 +0800, Leon Hwang wrote:
>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>> handling in JIT"), the tailcall on x64 works better than before.
>>
>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>>
>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
>> to other BPF programs"), BPF program is able to trace other BPF programs.
>>
>> How about combining them all together?
>>
>> 1. FENTRY/FEXIT on a BPF subprogram.
>> 2. A tailcall runs in the BPF subprogram.
>> 3. The tailcall calls itself.
>>
>> As a result, a tailcall infinite loop comes up. And the loop halts the
>> machine.
>>
>> As we know, in tail call context, the tail_call_cnt propagates by stack
>> and RAX register between BPF subprograms. So do it in FENTRY/FEXIT
>> trampolines.
> 
> Hi Leon,
> 
> I'm not familiar with this part of the jit compiler, so decided that
> taking a look at your series might be a good learning point.
> I think I got the gist of it, but I don't understand where
> the initial value of RAX (== 0) is coming from in
> arch_prepare_bpf_trampoline(), could you please help me out?
> 
> Also a nitpick:
> - in arch_prepare_bpf_trampoline() there is a comment detailing 
>   the stack layout, it probably should be updated to say that
>   tail call count is stored as well;
> - before arch_prepare_bpf_trampoline() there is a comment with
>   an example of generated assembly, should it be updated?
> 
> Thanks,
> Eduard
> 

a) The initial value of RAX is set in emit_prologue():
	if (!ebpf_from_cbpf) {
		if (tail_call_reachable && !is_subprog)
			/* When it's the entry of the whole
			 * tailcall context, zeroing the RAX
			 * means init tail_call_cnt.
			 */
			EMIT2(0x31, 0xC0); /* xor eax, eax */
		else
			// Keep the same asm layout.
			EMIT2(0x66, 0x90); /* nop2 */
	}
   I'd like to add this comment to emit_prologue().

b) Good point about the stack layout comment; I'll update it.

c) The example-assembly comment will be updated as well.

Thanks,
Leon

>>
>> Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
>> Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
>> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 23 +++++++++++++++++++----
>>  include/linux/bpf.h         |  6 ++++++
>>  kernel/bpf/trampoline.c     |  5 +++--
>>  kernel/bpf/verifier.c       |  9 +++++++--
>>  4 files changed, 35 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index a5930042139d3..ca5366d97ad04 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -1018,6 +1018,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>>  
>>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>>  
>> +/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
>> +#define RESTORE_TAIL_CALL_CNT(stack)				\
>> +	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
>> +
>>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>>  		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
>>  {
>> @@ -1623,9 +1627,7 @@ st:			if (is_imm8(insn->off))
>>  
>>  			func = (u8 *) __bpf_call_base + imm32;
>>  			if (tail_call_reachable) {
>> -				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
>> -				EMIT3_off32(0x48, 0x8B, 0x85,
>> -					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
>> +				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
>>  				if (!imm32)
>>  					return -EINVAL;
>>  				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
>> @@ -2464,6 +2466,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  	else
>>  		/* sub rsp, stack_size */
>>  		EMIT4(0x48, 0x83, 0xEC, stack_size);
>> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +		EMIT1(0x50);		/* push rax */
>>  	/* mov QWORD PTR [rbp - rbx_off], rbx */
>>  	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>>  
>> @@ -2516,6 +2520,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  		restore_regs(m, &prog, regs_off);
>>  		save_args(m, &prog, arg_stack_off, true);
>>  
>> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +			/* Before calling the original function, restore the
>> +			 * tail_call_cnt from stack.
>> +			 */
>> +			RESTORE_TAIL_CALL_CNT(stack_size);
>> +
>>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
>>  			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
>>  			EMIT2(0xff, 0xd0); /* call *rax */
>> @@ -2569,7 +2579,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  			ret = -EINVAL;
>>  			goto cleanup;
>>  		}
>> -	}
>> +	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
>> +		/* Before running the original function, restore the
>> +		 * tail_call_cnt from stack.
>> +		 */
>> +		RESTORE_TAIL_CALL_CNT(stack_size);
>> +
>>  	/* restore return value of orig_call or fentry prog back into RAX */
>>  	if (save_ret)
>>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index cfabbcf47bdb8..55c72086034ef 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1028,6 +1028,11 @@ struct btf_func_model {
>>   */
>>  #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
>>  
>> +/* Indicate that current trampoline is in a tail call context. Then, it has to
>> + * cache and restore tail_call_cnt to avoid infinite tail call loop.
>> + */
>> +#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
>> +
>>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>>   * bytes on x86.
>>   */
>> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
>>  	struct module *tgt_mod;
>>  	const char *tgt_name;
>>  	const struct btf_type *tgt_type;
>> +	bool tail_call_ctx;
>>  };
>>  
>>  #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
>> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
>> index 78acf28d48732..0fae334e3f7b8 100644
>> --- a/kernel/bpf/trampoline.c
>> +++ b/kernel/bpf/trampoline.c
>> @@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>>  		goto out;
>>  	}
>>  
>> -	/* clear all bits except SHARE_IPMODIFY */
>> -	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
>> +	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
>> +	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
>>  
>>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
>> @@ -783,6 +783,7 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
>>  
>>  	memcpy(&tr->func.model, &tgt_info->fmodel, sizeof(tgt_info->fmodel));
>>  	tr->func.addr = (void *)tgt_info->tgt_addr;
>> +	tr->flags = (tgt_info->tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);
>>  out:
>>  	mutex_unlock(&tr->mutex);
>>  	return tr;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4ccca1f6c9981..a78e5a2ae5c72 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -19400,10 +19400,15 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>>  			return -EINVAL;
>>  		fallthrough;
>>  	case BPF_MODIFY_RETURN:
>> -	case BPF_LSM_MAC:
>> -	case BPF_LSM_CGROUP:
>>  	case BPF_TRACE_FENTRY:
>>  	case BPF_TRACE_FEXIT:
>> +		if (tgt_prog && subprog > 0 &&
>> +		    tgt_prog->aux->func[subprog]->is_func &&
>> +		    tgt_prog->aux->tail_call_reachable)
>> +			tgt_info->tail_call_ctx = true;
>> +		fallthrough;
>> +	case BPF_LSM_MAC:
>> +	case BPF_LSM_CGROUP:
>>  		if (!btf_type_is_func(t)) {
>>  			bpf_log(log, "attach_btf_id %u is not a function\n",
>>  				btf_id);
>
Eduard Zingerman Aug. 15, 2023, 2:35 p.m. UTC | #3
On Tue, 2023-08-15 at 11:01 +0800, Leon Hwang wrote:
[...]
> a) Initial value of RAX is in emit_prologue().
> 	if (!ebpf_from_cbpf) {
> 		if (tail_call_reachable && !is_subprog)
> 			/* When it's the entry of the whole
> 			 * tailcall context, zeroing the RAX
> 			 * means init tail_call_cnt.
> 			 */
> 			EMIT2(0x31, 0xC0); /* xor eax, eax */
> 		else
> 			// Keep the same asm layout.
> 			EMIT2(0x66, 0x90); /* nop2 */
> 	}
>    I'd like to add this comment to emit_prologue().

Got it, thank you.


[...]
Alexei Starovoitov Aug. 17, 2023, 10:31 p.m. UTC | #4
On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
>  	struct module *tgt_mod;
>  	const char *tgt_name;
>  	const struct btf_type *tgt_type;
> +	bool tail_call_ctx;

Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id()
and set tr->flags there?
Other than this the fix makes sense.
Please trim your cc list when you respin.
Just maintainers, Maciej (author of fixes tag) and bpf@vger is enough.
Leon Hwang Aug. 18, 2023, 2:10 a.m. UTC | #5
On 18/8/23 06:31, Alexei Starovoitov wrote:
> On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
>> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
>>  	struct module *tgt_mod;
>>  	const char *tgt_name;
>>  	const struct btf_type *tgt_type;
>> +	bool tail_call_ctx;
> 
> Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id()
> and set tr->flags there?

Should we check tgt_prog->aux->func[subprog]->is_func, or is tgt_prog->aux->tail_call_reachable
enough on its own?

I think tgt_prog->aux->func[subprog]->is_func needs to be checked as well, because the bug is
about the subprog rather than the tgt_prog.

In check_attach_btf_id():

bool tail_call_ctx;
// ...
ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id, &tgt_info, &tail_call_ctx);
// ...
tr->flags = (tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);

How about changing it like this? However, it is not great to change the bpf_check_attach_target() declaration.
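
For comparison, a rough sketch of the other option as I understand Alexei's
suggestion (illustrative only, untested): keep bpf_check_attach_target()
unchanged and derive the flag in check_attach_btf_id() itself. Note that this
only checks tgt_prog->aux->tail_call_reachable, not the subprog's is_func:

	ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id, &tgt_info);
	if (ret)
		return ret;
	// ...
	tr = bpf_trampoline_get(key, &tgt_info);
	if (!tr)
		return -ENOMEM;
	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
		tr->flags |= BPF_TRAMP_F_TAIL_CALL_CTX;
	prog->aux->dst_trampoline = tr;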

> Other than this the fix makes sense.
> Please trim your cc list when you respin.
> Just maintainers, Maciej (author of fixes tag) and bpf@vger is enough.

I'll trim it.

Thanks,
Leon
Alexei Starovoitov Aug. 18, 2023, 7:59 p.m. UTC | #6
On Thu, Aug 17, 2023 at 7:10 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
>
>
> On 18/8/23 06:31, Alexei Starovoitov wrote:
> > On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
> >> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
> >>      struct module *tgt_mod;
> >>      const char *tgt_name;
> >>      const struct btf_type *tgt_type;
> >> +    bool tail_call_ctx;
> >
> > Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id()
> > and set tr->flags there?
>
> Should we check tgt_prog->aux->func[subprog]->is_func? Or, tgt_prog->aux->tail_call_reachable
> is enough?

Please let the thread continue to a logical conclusion before resending a
new version. Will reply there.
Leon Hwang Aug. 19, 2023, 3:38 a.m. UTC | #7
On 2023/8/19 03:59, Alexei Starovoitov wrote:
> On Thu, Aug 17, 2023 at 7:10 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>>
>> On 18/8/23 06:31, Alexei Starovoitov wrote:
>>> On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
>>>> @@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
>>>>      struct module *tgt_mod;
>>>>      const char *tgt_name;
>>>>      const struct btf_type *tgt_type;
>>>> +    bool tail_call_ctx;
>>>
>>> Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id()
>>> and set tr->flags there?
>>
>> Should we check tgt_prog->aux->func[subprog]->is_func? Or, tgt_prog->aux->tail_call_reachable
>> is enough?
> 
> Please let the thread continue to a logical conclusion before resending
> new version. Will reply there.

Sorry for sending a new version before the thread reached a logical conclusion.

I'll handle it better in the future.

Additionally, I'm looking forward to fixing this, and then I plan to add a
feature to trace tail calls with trampolines.

Thanks,
Leon

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a5930042139d3..ca5366d97ad04 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1018,6 +1018,10 @@  static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
+/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
+#define RESTORE_TAIL_CALL_CNT(stack)				\
+	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
@@ -1623,9 +1627,7 @@  st:			if (is_imm8(insn->off))
 
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
-				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
-				EMIT3_off32(0x48, 0x8B, 0x85,
-					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
+				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
 				if (!imm32)
 					return -EINVAL;
 				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
@@ -2464,6 +2466,8 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	else
 		/* sub rsp, stack_size */
 		EMIT4(0x48, 0x83, 0xEC, stack_size);
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		EMIT1(0x50);		/* push rax */
 	/* mov QWORD PTR [rbp - rbx_off], rbx */
 	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
@@ -2516,6 +2520,12 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		restore_regs(m, &prog, regs_off);
 		save_args(m, &prog, arg_stack_off, true);
 
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			/* Before calling the original function, restore the
+			 * tail_call_cnt from stack.
+			 */
+			RESTORE_TAIL_CALL_CNT(stack_size);
+
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
 			EMIT2(0xff, 0xd0); /* call *rax */
@@ -2569,7 +2579,12 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 			ret = -EINVAL;
 			goto cleanup;
 		}
-	}
+	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		/* Before running the original function, restore the
+		 * tail_call_cnt from stack.
+		 */
+		RESTORE_TAIL_CALL_CNT(stack_size);
+
 	/* restore return value of orig_call or fentry prog back into RAX */
 	if (save_ret)
 		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb8..55c72086034ef 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1028,6 +1028,11 @@  struct btf_func_model {
  */
 #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
 
+/* Indicate that current trampoline is in a tail call context. Then, it has to
+ * cache and restore tail_call_cnt to avoid infinite tail call loop.
+ */
+#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
+
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.
  */
@@ -1147,6 +1152,7 @@  struct bpf_attach_target_info {
 	struct module *tgt_mod;
 	const char *tgt_name;
 	const struct btf_type *tgt_type;
+	bool tail_call_ctx;
 };
 
 #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 78acf28d48732..0fae334e3f7b8 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -415,8 +415,8 @@  static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 		goto out;
 	}
 
-	/* clear all bits except SHARE_IPMODIFY */
-	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
+	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
 
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
 	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
@@ -783,6 +783,7 @@  struct bpf_trampoline *bpf_trampoline_get(u64 key,
 
 	memcpy(&tr->func.model, &tgt_info->fmodel, sizeof(tgt_info->fmodel));
 	tr->func.addr = (void *)tgt_info->tgt_addr;
+	tr->flags = (tgt_info->tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);
 out:
 	mutex_unlock(&tr->mutex);
 	return tr;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c9981..a78e5a2ae5c72 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19400,10 +19400,15 @@  int bpf_check_attach_target(struct bpf_verifier_log *log,
 			return -EINVAL;
 		fallthrough;
 	case BPF_MODIFY_RETURN:
-	case BPF_LSM_MAC:
-	case BPF_LSM_CGROUP:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
+		if (tgt_prog && subprog > 0 &&
+		    tgt_prog->aux->func[subprog]->is_func &&
+		    tgt_prog->aux->tail_call_reachable)
+			tgt_info->tail_call_ctx = true;
+		fallthrough;
+	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);