
bpf, x86: allow function arguments up to 12 for TRACING

Message ID 20230530044423.3897681-1-imagedong@tencent.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf, x86: allow function arguments up to 12 for TRACING

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc

Commit Message

Menglong Dong May 30, 2023, 4:44 a.m. UTC
From: Menglong Dong <imagedong@tencent.com>

For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
on kernel functions with at most 6 arguments. This is quite limiting,
as many kernel functions have more than 6 arguments.

Therefore, let's enhance it by increasing the number of function
arguments allowed in arch_prepare_bpf_trampoline(), for now on x86_64
only.

For the case where we don't need to call the origin function, i.e.
without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
arguments stored in the caller's frame into the current frame. The
arguments arg6-argN are stored at "$rbp + 0x18", and we copy them to
"$rbp - regs_off + (6 * 8)".

For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the
arguments on the stack before calling the origin function, which means
we need to allocate an extra "8 * (arg_count - 6)" bytes at the top of
the stack. Note that nothing may be pushed to the stack before the
origin function is called; therefore, we have to save rbx with a 'mov'
instead of a 'push'.

This works well for FENTRY and FEXIT; I'm not sure whether there are
other complicated cases.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
 arch/x86/net/bpf_jit_comp.c | 88 ++++++++++++++++++++++++++++++++-----
 1 file changed, 77 insertions(+), 11 deletions(-)

Comments

Jiri Olsa May 31, 2023, 8:01 a.m. UTC | #1
On Tue, May 30, 2023 at 12:44:23PM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
> on the kernel functions whose arguments count less than 6. This is not
> friendly at all, as too many functions have arguments count more than 6.
> 
> Therefore, let's enhance it by increasing the function arguments count
> allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
> 
> For the case that we don't need to call origin function, which means
> without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
> that stored in the frame of the caller to current frame. The arguments
> of arg6-argN are stored in "$rbp + 0x18", we need copy them to
> "$rbp - regs_off + (6 * 8)".
> 
> For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
> in stack before call origin function, which means we need alloc extra
> "8 * (arg_count - 6)" memory in the top of the stack. Note, there should
> not be any data be pushed to the stack before call the origin function.
> Then, we have to store rbx with 'mov' instead of 'push'.
> 
> It works well for the FENTRY and FEXIT, I'm not sure if there are other
> complicated cases.
> 
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 88 ++++++++++++++++++++++++++++++++-----

please add selftests for this.. I had to add one to be able to check
the generated trampoline

jirka


>  1 file changed, 77 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 1056bbf55b17..a3bc7e86ca19 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1868,7 +1868,7 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>  	 * mov QWORD PTR [rbp-0x10],rdi
>  	 * mov QWORD PTR [rbp-0x8],rsi
>  	 */
> -	for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
> +	for (i = 0, j = 0; i < min(nr_regs, 12); i++) {
>  		/* The arg_size is at most 16 bytes, enforced by the verifier. */
>  		arg_size = m->arg_size[j];
>  		if (arg_size > 8) {
> @@ -1876,10 +1876,22 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>  			next_same_struct = !next_same_struct;
>  		}
>  
> -		emit_stx(prog, bytes_to_bpf_size(arg_size),
> -			 BPF_REG_FP,
> -			 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> -			 -(stack_size - i * 8));
> +		if (i <= 5) {
> +			/* store function arguments in regs */
> +			emit_stx(prog, bytes_to_bpf_size(arg_size),
> +				 BPF_REG_FP,
> +				 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> +				 -(stack_size - i * 8));
> +		} else {
> +			/* store function arguments in stack */
> +			emit_ldx(prog, bytes_to_bpf_size(arg_size),
> +				 BPF_REG_0, BPF_REG_FP,
> +				 (i - 6) * 8 + 0x18);
> +			emit_stx(prog, bytes_to_bpf_size(arg_size),
> +				 BPF_REG_FP,
> +				 BPF_REG_0,
> +				 -(stack_size - i * 8));
> +		}
>  
>  		j = next_same_struct ? j : j + 1;
>  	}
> @@ -1913,6 +1925,41 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>  	}
>  }
>  
> +static void prepare_origin_stack(const struct btf_func_model *m, u8 **prog,
> +				 int nr_regs, int stack_size)
> +{
> +	int i, j, arg_size;
> +	bool next_same_struct = false;
> +
> +	if (nr_regs <= 6)
> +		return;
> +
> +	/* Prepare the function arguments in stack before call origin
> +	 * function. These arguments must be stored in the top of the
> +	 * stack.
> +	 */
> +	for (i = 0, j = 0; i < min(nr_regs, 12); i++) {
> +		/* The arg_size is at most 16 bytes, enforced by the verifier. */
> +		arg_size = m->arg_size[j];
> +		if (arg_size > 8) {
> +			arg_size = 8;
> +			next_same_struct = !next_same_struct;
> +		}
> +
> +		if (i > 5) {
> +			emit_ldx(prog, bytes_to_bpf_size(arg_size),
> +				 BPF_REG_0, BPF_REG_FP,
> +				 (i - 6) * 8 + 0x18);
> +			emit_stx(prog, bytes_to_bpf_size(arg_size),
> +				 BPF_REG_FP,
> +				 BPF_REG_0,
> +				 -(stack_size - (i - 6) * 8));
> +		}
> +
> +		j = next_same_struct ? j : j + 1;
> +	}
> +}
> +
>  static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>  			   struct bpf_tramp_link *l, int stack_size,
>  			   int run_ctx_off, bool save_ret)
> @@ -2136,7 +2183,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  				void *func_addr)
>  {
>  	int i, ret, nr_regs = m->nr_args, stack_size = 0;
> -	int regs_off, nregs_off, ip_off, run_ctx_off;
> +	int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
>  	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>  	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>  	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> @@ -2150,8 +2197,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
>  			nr_regs += (m->arg_size[i] + 7) / 8 - 1;
>  
> -	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
> -	if (nr_regs > 6)
> +	/* x86-64 supports up to 12 arguments. 1-6 are passed through
> +	 * regs, the remains are through stack.
> +	 */
> +	if (nr_regs > 12)
>  		return -ENOTSUPP;
>  
>  	/* Generated trampoline stack layout:
> @@ -2170,7 +2219,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	 *
>  	 * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
>  	 *
> +	 * RBP - rbx_off   [ rbx value       ]  always
> +	 *
>  	 * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
> +	 *
> +	 *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
> +	 *                     [ ...        ]
> +	 *                     [ stack_arg2 ]
> +	 * RBP - arg_stack_off [ stack_arg1 ]
>  	 */
>  
>  	/* room for return value of orig_call or fentry prog */
> @@ -2190,9 +2246,17 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  
>  	ip_off = stack_size;
>  
> +	stack_size += 8;
> +	rbx_off = stack_size;
> +
>  	stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
>  	run_ctx_off = stack_size;
>  
> +	if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG))
> +		stack_size += (nr_regs - 6) * 8;
> +
> +	arg_stack_off = stack_size;
> +
>  	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
>  		/* skip patched call instruction and point orig_call to actual
>  		 * body of the kernel function.
> @@ -2212,8 +2276,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	x86_call_depth_emit_accounting(&prog, NULL);
>  	EMIT1(0x55);		 /* push rbp */
>  	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
> -	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
> -	EMIT1(0x53);		 /* push rbx */
> +	EMIT3_off32(0x48, 0x81, 0xEC, stack_size); /* sub rsp, stack_size */
> +	/* mov QWORD PTR [rbp - rbx_off], rbx */
> +	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>  
>  	/* Store number of argument registers of the traced function:
>  	 *   mov rax, nr_regs
> @@ -2262,6 +2327,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  
>  	if (flags & BPF_TRAMP_F_CALL_ORIG) {
>  		restore_regs(m, &prog, nr_regs, regs_off);
> +		prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
>  
>  		if (flags & BPF_TRAMP_F_ORIG_STACK) {
>  			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> @@ -2321,7 +2387,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	if (save_ret)
>  		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>  
> -	EMIT1(0x5B); /* pop rbx */
> +	emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
>  	EMIT1(0xC9); /* leave */
>  	if (flags & BPF_TRAMP_F_SKIP_FRAME)
>  		/* skip our return address and return to parent */
> -- 
> 2.40.1
>
Menglong Dong May 31, 2023, 9:03 a.m. UTC | #2
On Wed, May 31, 2023 at 4:01 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, May 30, 2023 at 12:44:23PM +0800, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
> > on the kernel functions whose arguments count less than 6. This is not
> > friendly at all, as too many functions have arguments count more than 6.
> >
> > Therefore, let's enhance it by increasing the function arguments count
> > allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
> >
> > For the case that we don't need to call origin function, which means
> > without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
> > that stored in the frame of the caller to current frame. The arguments
> > of arg6-argN are stored in "$rbp + 0x18", we need copy them to
> > "$rbp - regs_off + (6 * 8)".
> >
> > For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
> > in stack before call origin function, which means we need alloc extra
> > "8 * (arg_count - 6)" memory in the top of the stack. Note, there should
> > not be any data be pushed to the stack before call the origin function.
> > Then, we have to store rbx with 'mov' instead of 'push'.
> >
> > It works well for the FENTRY and FEXIT, I'm not sure if there are other
> > complicated cases.
> >
> > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > ---
> >  arch/x86/net/bpf_jit_comp.c | 88 ++++++++++++++++++++++++++++++++-----
>
> please add selftests for this.. I had to add one to be able to check
> the generated trampoline
>

Okay!

BTW, I failed to compile the latest selftests/bpf with
the following errors:

progs/verifier_and.c:58:16: error: invalid operand for instruction
        asm volatile ("                                 \

The version of clang I used is:

clang --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Does anyone know the reason?

Thanks!
Menglong Dong

> jirka
>
>
Eduard Zingerman May 31, 2023, 12:02 p.m. UTC | #3
On Wed, 2023-05-31 at 17:03 +0800, Menglong Dong wrote:
> On Wed, May 31, 2023 at 4:01 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> > 
> > On Tue, May 30, 2023 at 12:44:23PM +0800, menglong8.dong@gmail.com wrote:
> > > From: Menglong Dong <imagedong@tencent.com>
> > > 
> > > For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
> > > on the kernel functions whose arguments count less than 6. This is not
> > > friendly at all, as too many functions have arguments count more than 6.
> > > 
> > > Therefore, let's enhance it by increasing the function arguments count
> > > allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
> > > 
> > > For the case that we don't need to call origin function, which means
> > > without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
> > > that stored in the frame of the caller to current frame. The arguments
> > > of arg6-argN are stored in "$rbp + 0x18", we need copy them to
> > > "$rbp - regs_off + (6 * 8)".
> > > 
> > > For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
> > > in stack before call origin function, which means we need alloc extra
> > > "8 * (arg_count - 6)" memory in the top of the stack. Note, there should
> > > not be any data be pushed to the stack before call the origin function.
> > > Then, we have to store rbx with 'mov' instead of 'push'.
> > > 
> > > It works well for the FENTRY and FEXIT, I'm not sure if there are other
> > > complicated cases.
> > > 
> > > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > > ---
> > >  arch/x86/net/bpf_jit_comp.c | 88 ++++++++++++++++++++++++++++++++-----
> > 
> > please add selftests for this.. I had to add one to be able to check
> > the generated trampoline
> > 
> 
> Okay!
> 
> BTW, I failed to compile the latest selftests/bpf with
> the following errors:
> 
> progs/verifier_and.c:58:16: error: invalid operand for instruction
>         asm volatile ("                                 \
> 

These tests were moved to use inline assembly recently (2 months ago).
The discussion at the time was whether to use \n\ or \ terminators at
the end of each line, and people opted for \ as easier to read.
Replacing \ with \n\ and compiling this test with clang 14 shows a
more informative error message:

$ make -j14 `pwd`/verifier_and.bpf.o
  CLNG-BPF [test_maps] verifier_and.bpf.o
progs/verifier_and.c:68:1: error: invalid operand for instruction
        w1 %%= 2;                                       \n\
^
<inline asm>:11:5: note: instantiated into assembly here
        w1 %= 2; 

My guess is that clang 14 does not know how to handle operations on
32-bit sub-registers w[0-9].

But using clang 14 I get some errors not related to inline assembly as well.
Also, I recall that there were runtime issues with clang 14 and
tests using enum64.

All in all, you need a newer version of clang for the tests nowadays;
sorry for the inconvenience.

> The version of clang I used is:
> 
> clang --version
> Debian clang version 14.0.6
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> 
> Does anyone know the reason?
> 
> Thanks!
> Menglong Dong
> 
> > jirka
> > 
> > 
>
Menglong Dong May 31, 2023, 1:10 p.m. UTC | #4
On Wed, May 31, 2023 at 8:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2023-05-31 at 17:03 +0800, Menglong Dong wrote:
> > On Wed, May 31, 2023 at 4:01 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Tue, May 30, 2023 at 12:44:23PM +0800, menglong8.dong@gmail.com wrote:
> > > > From: Menglong Dong <imagedong@tencent.com>
> > > >
> > > > For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
> > > > on the kernel functions whose arguments count less than 6. This is not
> > > > friendly at all, as too many functions have arguments count more than 6.
> > > >
> > > > Therefore, let's enhance it by increasing the function arguments count
> > > > allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
> > > >
> > > > For the case that we don't need to call origin function, which means
> > > > without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
> > > > that stored in the frame of the caller to current frame. The arguments
> > > > of arg6-argN are stored in "$rbp + 0x18", we need copy them to
> > > > "$rbp - regs_off + (6 * 8)".
> > > >
> > > > For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
> > > > in stack before call origin function, which means we need alloc extra
> > > > "8 * (arg_count - 6)" memory in the top of the stack. Note, there should
> > > > not be any data be pushed to the stack before call the origin function.
> > > > Then, we have to store rbx with 'mov' instead of 'push'.
> > > >
> > > > It works well for the FENTRY and FEXIT, I'm not sure if there are other
> > > > complicated cases.
> > > >
> > > > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > > > ---
> > > >  arch/x86/net/bpf_jit_comp.c | 88 ++++++++++++++++++++++++++++++++-----
> > >
> > > please add selftests for this.. I had to add one to be able to check
> > > the generated trampoline
> > >
> >
> > Okay!
> >
> > BTW, I failed to compile the latest selftests/bpf with
> > the following errors:
> >
> > progs/verifier_and.c:58:16: error: invalid operand for instruction
> >         asm volatile ("                                 \
> >
>
> These tests were moved to use inline assembly recently (2 month ago).
> Discussion at the time was whether to use \n\ or \ terminators at the
> end of each line. People opted for \ as easier to read.
> Replacing \ with \n\ and compiling this test using clang 14 shows
> more informative error message:
>
> $ make -j14 `pwd`/verifier_and.bpf.o
>   CLNG-BPF [test_maps] verifier_and.bpf.o
> progs/verifier_and.c:68:1: error: invalid operand for instruction
>         w1 %%= 2;                                       \n\
> ^
> <inline asm>:11:5: note: instantiated into assembly here
>         w1 %= 2;
>
> My guess is that clang 14 does not know how to handle operations on
> 32-bit sub-registers w[0-9].
>
> But using clang 14 I get some errors not related to inline assembly as well.
> Also, I recall that there were runtime issues with clang 14 and
> tests using enum64.
>
> All-in-all, you need newer version of clang for tests nowadays,
> sorry for inconvenience.

Thanks for your explanation! It works well after updating my
clang to a newer version.

Menglong Dong
>
> > The version of clang I used is:
> >
> > clang --version
> > Debian clang version 14.0.6
> > Target: x86_64-pc-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> >
> > Does anyone know the reason?
> >
> > Thanks!
> > Menglong Dong
> >
> > > jirka
> > >
> > >
> >
>

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 1056bbf55b17..a3bc7e86ca19 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1868,7 +1868,7 @@  static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 	 * mov QWORD PTR [rbp-0x10],rdi
 	 * mov QWORD PTR [rbp-0x8],rsi
 	 */
-	for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
+	for (i = 0, j = 0; i < min(nr_regs, 12); i++) {
 		/* The arg_size is at most 16 bytes, enforced by the verifier. */
 		arg_size = m->arg_size[j];
 		if (arg_size > 8) {
@@ -1876,10 +1876,22 @@  static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 			next_same_struct = !next_same_struct;
 		}
 
-		emit_stx(prog, bytes_to_bpf_size(arg_size),
-			 BPF_REG_FP,
-			 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-			 -(stack_size - i * 8));
+		if (i <= 5) {
+			/* store function arguments in regs */
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
+				 -(stack_size - i * 8));
+		} else {
+			/* store function arguments in stack */
+			emit_ldx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_0, BPF_REG_FP,
+				 (i - 6) * 8 + 0x18);
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 BPF_REG_0,
+				 -(stack_size - i * 8));
+		}
 
 		j = next_same_struct ? j : j + 1;
 	}
@@ -1913,6 +1925,41 @@  static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 	}
 }
 
+static void prepare_origin_stack(const struct btf_func_model *m, u8 **prog,
+				 int nr_regs, int stack_size)
+{
+	int i, j, arg_size;
+	bool next_same_struct = false;
+
+	if (nr_regs <= 6)
+		return;
+
+	/* Prepare the function arguments in stack before call origin
+	 * function. These arguments must be stored in the top of the
+	 * stack.
+	 */
+	for (i = 0, j = 0; i < min(nr_regs, 12); i++) {
+		/* The arg_size is at most 16 bytes, enforced by the verifier. */
+		arg_size = m->arg_size[j];
+		if (arg_size > 8) {
+			arg_size = 8;
+			next_same_struct = !next_same_struct;
+		}
+
+		if (i > 5) {
+			emit_ldx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_0, BPF_REG_FP,
+				 (i - 6) * 8 + 0x18);
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 BPF_REG_0,
+				 -(stack_size - (i - 6) * 8));
+		}
+
+		j = next_same_struct ? j : j + 1;
+	}
+}
+
 static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 			   struct bpf_tramp_link *l, int stack_size,
 			   int run_ctx_off, bool save_ret)
@@ -2136,7 +2183,7 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 				void *func_addr)
 {
 	int i, ret, nr_regs = m->nr_args, stack_size = 0;
-	int regs_off, nregs_off, ip_off, run_ctx_off;
+	int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -2150,8 +2197,10 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
 			nr_regs += (m->arg_size[i] + 7) / 8 - 1;
 
-	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
-	if (nr_regs > 6)
+	/* x86-64 supports up to 12 arguments. 1-6 are passed through
+	 * regs, the remains are through stack.
+	 */
+	if (nr_regs > 12)
 		return -ENOTSUPP;
 
 	/* Generated trampoline stack layout:
@@ -2170,7 +2219,14 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	 *
 	 * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
 	 *
+	 * RBP - rbx_off   [ rbx value       ]  always
+	 *
 	 * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
+	 *
+	 *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
+	 *                     [ ...        ]
+	 *                     [ stack_arg2 ]
+	 * RBP - arg_stack_off [ stack_arg1 ]
 	 */
 
 	/* room for return value of orig_call or fentry prog */
@@ -2190,9 +2246,17 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 
 	ip_off = stack_size;
 
+	stack_size += 8;
+	rbx_off = stack_size;
+
 	stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
 	run_ctx_off = stack_size;
 
+	if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG))
+		stack_size += (nr_regs - 6) * 8;
+
+	arg_stack_off = stack_size;
+
 	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
 		/* skip patched call instruction and point orig_call to actual
 		 * body of the kernel function.
@@ -2212,8 +2276,9 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	x86_call_depth_emit_accounting(&prog, NULL);
 	EMIT1(0x55);		 /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
-	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
-	EMIT1(0x53);		 /* push rbx */
+	EMIT3_off32(0x48, 0x81, 0xEC, stack_size); /* sub rsp, stack_size */
+	/* mov QWORD PTR [rbp - rbx_off], rbx */
+	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
 	/* Store number of argument registers of the traced function:
 	 *   mov rax, nr_regs
@@ -2262,6 +2327,7 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		restore_regs(m, &prog, nr_regs, regs_off);
+		prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
 
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
@@ -2321,7 +2387,7 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	if (save_ret)
 		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
 
-	EMIT1(0x5B); /* pop rbx */
+	emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
 	EMIT1(0xC9); /* leave */
 	if (flags & BPF_TRAMP_F_SKIP_FRAME)
 		/* skip our return address and return to parent */