[bpf-next,v2,2/5] bpf, x86: allow function arguments up to 14 for TRACING

Message ID 20230602065958.2869555-3-imagedong@tencent.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series bpf, x86: allow function arguments up to 14 for TRACING

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-PR fail PR summary
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 21 of 21 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Menglong Dong June 2, 2023, 6:59 a.m. UTC
From: Menglong Dong <imagedong@tencent.com>

For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
on kernel functions that take no more than 6 arguments. This is quite
limiting, as many kernel functions take more than 6 arguments.

Therefore, let's enhance it by increasing the number of function
arguments allowed in arch_prepare_bpf_trampoline(), for now on x86_64
only.

For the case where we don't need to call the original function, i.e.
without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
arguments stored in the caller's stack frame into the current frame.
The arguments beyond the first 6 are stored at "$rbp + 0x18" and up,
and we copy them to "$rbp - regs_off + (6 * 8)" and up.

For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the
arguments on the stack before calling the original function, which
means we need to allocate an extra "8 * (arg_count - 6)" bytes at the
top of the stack. Note that no data may be pushed to the stack before
calling the original function; consequently, we have to save rbx with
'mov' instead of 'push'.

This works well for FENTRY and FEXIT; I'm not sure whether there are
other, more complicated cases.
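
For reference, a sketch of the frame math assumed above, for a typical
fentry attachment (the trampoline is entered with two return addresses
already on the stack):

	/*
	 * after "push rbp; mov rbp, rsp" in the trampoline:
	 *
	 *   rbp + 0x00   saved rbp
	 *   rbp + 0x08   return address into the traced function
	 *   rbp + 0x10   return address into the traced function's caller
	 *   rbp + 0x18   7th argument (the first stack argument)
	 *   rbp + 0x20   8th argument
	 *   ...
	 *
	 * each stack argument is then copied to
	 * "$rbp - regs_off + (0-based arg index) * 8", right after the
	 * slots of the 6 register arguments.
	 */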

Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v2:
- replace EMIT4 with EMIT3_off32 for "lea" to prevent overflow
- use MAX_BPF_FUNC_ARGS as the maximum argument count
---
 arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++------
 1 file changed, 81 insertions(+), 15 deletions(-)

Comments

Menglong Dong June 2, 2023, 7:40 a.m. UTC | #1
On Fri, Jun 2, 2023 at 3:01 PM <menglong8.dong@gmail.com> wrote:
>
> From: Menglong Dong <imagedong@tencent.com>
> @@ -2262,6 +2327,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>
>         if (flags & BPF_TRAMP_F_CALL_ORIG) {
>                 restore_regs(m, &prog, nr_regs, regs_off);
> +               prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
>
>                 if (flags & BPF_TRAMP_F_ORIG_STACK) {
>                         emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> @@ -2321,14 +2387,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>         if (save_ret)
>                 emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>
> -       EMIT1(0x5B); /* pop rbx */
> +       emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
>         EMIT1(0xC9); /* leave */
>         if (flags & BPF_TRAMP_F_SKIP_FRAME)
>                 /* skip our return address and return to parent */
>                 EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
>         emit_return(&prog, prog);
>         /* Make sure the trampoline generation logic doesn't overflow */
> -       if (WARN_ON_ONCE(prog > (u8 *)image_end - BPF_INSN_SAFETY)) {
> +       if (prog > (u8 *)image_end - BPF_INSN_SAFETY) {

Oops, this change is a mistake; I should have kept this line as it was.

>                 ret = -EFAULT;
>                 goto cleanup;
>         }
> --
> 2.40.1
>
Alexei Starovoitov June 2, 2023, 6:31 p.m. UTC | #2
On Fri, Jun 2, 2023 at 12:01 AM <menglong8.dong@gmail.com> wrote:
>
> From: Menglong Dong <imagedong@tencent.com>

Please trim your cc when you respin. It's unnecessarily huge.

> For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
> on kernel functions that take no more than 6 arguments. This is quite
> limiting, as many kernel functions take more than 6 arguments.
>
> Therefore, let's enhance it by increasing the number of function
> arguments allowed in arch_prepare_bpf_trampoline(), for now on x86_64
> only.
>
> For the case where we don't need to call the original function, i.e.
> without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
> arguments stored in the caller's stack frame into the current frame.
> The arguments beyond the first 6 are stored at "$rbp + 0x18" and up,
> and we copy them to "$rbp - regs_off + (6 * 8)" and up.
>
> For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the
> arguments on the stack before calling the original function, which
> means we need to allocate an extra "8 * (arg_count - 6)" bytes at the
> top of the stack. Note that no data may be pushed to the stack before
> calling the original function; consequently, we have to save rbx with
> 'mov' instead of 'push'.
>
> This works well for FENTRY and FEXIT; I'm not sure whether there are
> other, more complicated cases.
>
> Reviewed-by: Jiang Biao <benbjiang@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
> v2:
> - replace EMIT4 with EMIT3_off32 for "lea" to prevent overflow
> - use MAX_BPF_FUNC_ARGS as the maximum argument count
> ---
>  arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++------
>  1 file changed, 81 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 1056bbf55b17..0e247bb7d6f6 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1868,7 +1868,7 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>          * mov QWORD PTR [rbp-0x10],rdi
>          * mov QWORD PTR [rbp-0x8],rsi
>          */
> -       for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
> +       for (i = 0, j = 0; i < min(nr_regs, MAX_BPF_FUNC_ARGS); i++) {
>                 /* The arg_size is at most 16 bytes, enforced by the verifier. */
>                 arg_size = m->arg_size[j];
>                 if (arg_size > 8) {
> @@ -1876,10 +1876,22 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>                         next_same_struct = !next_same_struct;
>                 }
>
> -               emit_stx(prog, bytes_to_bpf_size(arg_size),
> -                        BPF_REG_FP,
> -                        i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> -                        -(stack_size - i * 8));
> +               if (i <= 5) {
> +                       /* store function arguments in regs */

The comment is confusing.
It's not storing arguments in regs;
it copies them from regs into the stack.

> +                       emit_stx(prog, bytes_to_bpf_size(arg_size),
> +                                BPF_REG_FP,
> +                                i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> +                                -(stack_size - i * 8));
> +               } else {
> +                       /* store function arguments in stack */
> +                       emit_ldx(prog, bytes_to_bpf_size(arg_size),
> +                                BPF_REG_0, BPF_REG_FP,
> +                                (i - 6) * 8 + 0x18);
> +                       emit_stx(prog, bytes_to_bpf_size(arg_size),

and we will have garbage values in the upper bytes.
We should probably fix that both here and in the regular copy from regs.

> +                                BPF_REG_FP,
> +                                BPF_REG_0,
> +                                -(stack_size - i * 8));
> +               }
>
>                 j = next_same_struct ? j : j + 1;
>         }
> @@ -1913,6 +1925,41 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
>         }
>  }
>
> +static void prepare_origin_stack(const struct btf_func_model *m, u8 **prog,
> +                                int nr_regs, int stack_size)
> +{
> +       int i, j, arg_size;
> +       bool next_same_struct = false;
> +
> +       if (nr_regs <= 6)
> +               return;
> +
> +       /* Prepare the function arguments in stack before call origin
> +        * function. These arguments must be stored in the top of the
> +        * stack.
> +        */
> +       for (i = 0, j = 0; i < min(nr_regs, MAX_BPF_FUNC_ARGS); i++) {
> +               /* The arg_size is at most 16 bytes, enforced by the verifier. */
> +               arg_size = m->arg_size[j];
> +               if (arg_size > 8) {
> +                       arg_size = 8;
> +                       next_same_struct = !next_same_struct;
> +               }
> +
> +               if (i > 5) {
> +                       emit_ldx(prog, bytes_to_bpf_size(arg_size),
> +                                BPF_REG_0, BPF_REG_FP,
> +                                (i - 6) * 8 + 0x18);
> +                       emit_stx(prog, bytes_to_bpf_size(arg_size),
> +                                BPF_REG_FP,
> +                                BPF_REG_0,
> +                                -(stack_size - (i - 6) * 8));
> +               }
> +
> +               j = next_same_struct ? j : j + 1;
> +       }
> +}
> +
>  static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>                            struct bpf_tramp_link *l, int stack_size,
>                            int run_ctx_off, bool save_ret)
> @@ -1938,7 +1985,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>         /* arg1: mov rdi, progs[i] */
>         emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
>         /* arg2: lea rsi, [rbp - ctx_cookie_off] */
> -       EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
> +       EMIT3_off32(0x48, 0x8D, 0xB5, -run_ctx_off);
>
>         if (emit_rsb_call(&prog, bpf_trampoline_enter(p), prog))
>                 return -EINVAL;
> @@ -1954,7 +2001,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>         emit_nops(&prog, 2);
>
>         /* arg1: lea rdi, [rbp - stack_size] */
> -       EMIT4(0x48, 0x8D, 0x7D, -stack_size);
> +       EMIT3_off32(0x48, 0x8D, 0xBD, -stack_size);
>         /* arg2: progs[i]->insnsi for interpreter */
>         if (!p->jited)
>                 emit_mov_imm64(&prog, BPF_REG_2,
> @@ -1984,7 +2031,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>         /* arg2: mov rsi, rbx <- start time in nsec */
>         emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
>         /* arg3: lea rdx, [rbp - run_ctx_off] */
> -       EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
> +       EMIT3_off32(0x48, 0x8D, 0x95, -run_ctx_off);
>         if (emit_rsb_call(&prog, bpf_trampoline_exit(p), prog))
>                 return -EINVAL;
>
> @@ -2136,7 +2183,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>                                 void *func_addr)
>  {
>         int i, ret, nr_regs = m->nr_args, stack_size = 0;
> -       int regs_off, nregs_off, ip_off, run_ctx_off;
> +       int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
>         struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>         struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>         struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> @@ -2150,8 +2197,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>                 if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
>                         nr_regs += (m->arg_size[i] + 7) / 8 - 1;
>
> -       /* x86-64 supports up to 6 arguments. 7+ can be added in the future */
> -       if (nr_regs > 6)
> +       /* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6
> +        * are passed through regs, the remains are through stack.
> +        */
> +       if (nr_regs > MAX_BPF_FUNC_ARGS)
>                 return -ENOTSUPP;
>
>         /* Generated trampoline stack layout:
> @@ -2170,7 +2219,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>          *
>          * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
>          *
> +        * RBP - rbx_off   [ rbx value       ]  always
> +        *

That is the case already and we just didn't document it, right?

>          * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
> +        *
> +        *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
> +        *                     [ ...        ]
> +        *                     [ stack_arg2 ]
> +        * RBP - arg_stack_off [ stack_arg1 ]
>          */
>
>         /* room for return value of orig_call or fentry prog */
> @@ -2190,9 +2246,17 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>
>         ip_off = stack_size;
>
> +       stack_size += 8;
> +       rbx_off = stack_size;
> +
>         stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
>         run_ctx_off = stack_size;
>
> +       if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG))
> +               stack_size += (nr_regs - 6) * 8;
> +
> +       arg_stack_off = stack_size;
> +
>         if (flags & BPF_TRAMP_F_SKIP_FRAME) {
>                 /* skip patched call instruction and point orig_call to actual
>                  * body of the kernel function.
> @@ -2212,8 +2276,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>         x86_call_depth_emit_accounting(&prog, NULL);
>         EMIT1(0x55);             /* push rbp */
>         EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
> -       EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
> -       EMIT1(0x53);             /* push rbx */
> +       EMIT3_off32(0x48, 0x81, 0xEC, stack_size); /* sub rsp, stack_size */
> +       /* mov QWORD PTR [rbp - rbx_off], rbx */
> +       emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
>
>         /* Store number of argument registers of the traced function:
>          *   mov rax, nr_regs
> @@ -2262,6 +2327,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>
>         if (flags & BPF_TRAMP_F_CALL_ORIG) {
>                 restore_regs(m, &prog, nr_regs, regs_off);
> +               prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
>
>                 if (flags & BPF_TRAMP_F_ORIG_STACK) {
>                         emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> @@ -2321,14 +2387,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>         if (save_ret)
>                 emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>
> -       EMIT1(0x5B); /* pop rbx */
> +       emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);

It can stay as 'pop', no?

>         EMIT1(0xC9); /* leave */
>         if (flags & BPF_TRAMP_F_SKIP_FRAME)
>                 /* skip our return address and return to parent */
>                 EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
>         emit_return(&prog, prog);
>         /* Make sure the trampoline generation logic doesn't overflow */
> -       if (WARN_ON_ONCE(prog > (u8 *)image_end - BPF_INSN_SAFETY)) {
> +       if (prog > (u8 *)image_end - BPF_INSN_SAFETY) {
>                 ret = -EFAULT;
>                 goto cleanup;
>         }
> --
> 2.40.1
>
Menglong Dong June 5, 2023, 2:40 a.m. UTC | #3
On Sat, Jun 3, 2023 at 2:31 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jun 2, 2023 at 12:01 AM <menglong8.dong@gmail.com> wrote:
> >
> > From: Menglong Dong <imagedong@tencent.com>
>
> Please trim your cc when you respin. It's unnecessarily huge.

Sorry for bothering unrelated people. The cc list was generated
by ./scripts/get_maintainer.pl; I'll keep it under 15.

>
[...]
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 1056bbf55b17..0e247bb7d6f6 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1868,7 +1868,7 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
> >          * mov QWORD PTR [rbp-0x10],rdi
> >          * mov QWORD PTR [rbp-0x8],rsi
> >          */
> > -       for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
> > +       for (i = 0, j = 0; i < min(nr_regs, MAX_BPF_FUNC_ARGS); i++) {
> >                 /* The arg_size is at most 16 bytes, enforced by the verifier. */
> >                 arg_size = m->arg_size[j];
> >                 if (arg_size > 8) {
> > @@ -1876,10 +1876,22 @@ static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
> >                         next_same_struct = !next_same_struct;
> >                 }
> >
> > -               emit_stx(prog, bytes_to_bpf_size(arg_size),
> > -                        BPF_REG_FP,
> > -                        i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> > -                        -(stack_size - i * 8));
> > +               if (i <= 5) {
> > +                       /* store function arguments in regs */
>
> The comment is confusing.
> It's not storing arguments in regs;
> it copies them from regs into the stack.

Right, I'll use "copy arguments from regs into stack"
instead.

>
> > +                       emit_stx(prog, bytes_to_bpf_size(arg_size),
> > +                                BPF_REG_FP,
> > +                                i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
> > +                                -(stack_size - i * 8));
> > +               } else {
> > +                       /* store function arguments in stack */
> > +                       emit_ldx(prog, bytes_to_bpf_size(arg_size),
> > +                                BPF_REG_0, BPF_REG_FP,
> > +                                (i - 6) * 8 + 0x18);
> > +                       emit_stx(prog, bytes_to_bpf_size(arg_size),
>
> and we will have garbage values in the upper bytes.
> We should probably fix that both here and in the regular copy from regs.
>

I noticed it too... I'll dig deeper to find a solution.
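
To illustrate the issue as I understand it, a sketch for a 4-byte
argument (offsets as in the code above):

	/*
	 * arg7 is a 4-byte int read from [rbp + 0x18]:
	 *
	 *   mov eax, DWORD PTR [rbp + 0x18]   ; emit_ldx, 4-byte load
	 *   mov DWORD PTR [rbp - off], eax    ; emit_stx, 4-byte store
	 *
	 * only the low 4 bytes of the 8-byte stack slot are written,
	 * so its upper 4 bytes keep whatever was on the stack before.
	 */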

> > +                                BPF_REG_FP,
> > +                                BPF_REG_0,
> > +                                -(stack_size - i * 8));
> > +               }
> >
[......]
> >         /* Generated trampoline stack layout:
> > @@ -2170,7 +2219,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >          *
> >          * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
> >          *
> > +        * RBP - rbx_off   [ rbx value       ]  always
> > +        *
>
> That is the case already and we just didn't document it, right?
>

I'm afraid not anymore. In the original logic, we used
"push rbx" after "sub rsp, stack_size", which stored
rbx at the top of the stack.

However, now we need to make sure that the arguments,
which we copy from the caller's stack frame into the
current stack frame in prepare_origin_stack(), stay at
the top of the stack, so that they can be passed to
orig_call through the stack.
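
Concretely, a sketch of what the top of the trampoline stack is
expected to look like right before the call to the original function
(names as in this patch):

	rsp + 0x00 == rbp - arg_stack_off   [ stack_arg1 (7th arg) ]
	rsp + 0x08                          [ stack_arg2 (8th arg) ]
	...

Once the call instruction pushes its return address, the original
function finds its stack arguments at the ABI-mandated offsets;
anything pushed between preparing these slots and the call would
shift rsp and break this layout.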

> >          * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
> > +        *
> > +        *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
> > +        *                     [ ...        ]
> > +        *                     [ stack_arg2 ]
> > +        * RBP - arg_stack_off [ stack_arg1 ]
> >          */
> >
> >         /* room for return value of orig_call or fentry prog */
> > @@ -2190,9 +2246,17 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >
> >         ip_off = stack_size;
> >
> > +       stack_size += 8;
> > +       rbx_off = stack_size;
> > +
> >         stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
> >         run_ctx_off = stack_size;
> >
> > +       if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG))
> > +               stack_size += (nr_regs - 6) * 8;
> > +
> > +       arg_stack_off = stack_size;
> > +
> >         if (flags & BPF_TRAMP_F_SKIP_FRAME) {
> >                 /* skip patched call instruction and point orig_call to actual
> >                  * body of the kernel function.
> > @@ -2212,8 +2276,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >         x86_call_depth_emit_accounting(&prog, NULL);
> >         EMIT1(0x55);             /* push rbp */
> >         EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
> > -       EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
> > -       EMIT1(0x53);             /* push rbx */
> > +       EMIT3_off32(0x48, 0x81, 0xEC, stack_size); /* sub rsp, stack_size */
> > +       /* mov QWORD PTR [rbp - rbx_off], rbx */
> > +       emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
> >
> >         /* Store number of argument registers of the traced function:
> >          *   mov rax, nr_regs
> > @@ -2262,6 +2327,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >
> >         if (flags & BPF_TRAMP_F_CALL_ORIG) {
> >                 restore_regs(m, &prog, nr_regs, regs_off);
> > +               prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
> >
> >                 if (flags & BPF_TRAMP_F_ORIG_STACK) {
> >                         emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
> > @@ -2321,14 +2387,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >         if (save_ret)
> >                 emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
> >
> > -       EMIT1(0x5B); /* pop rbx */
> > +       emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
>
> It can stay as 'pop', no?
>

As we don't use "push rbx" anymore,
we can't use "pop" here either: rbx is now stored
at a specific location in the stack.
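
Roughly, a sketch based on the hunks above:

	; before                          ; after
	push rbp                          push rbp
	mov  rbp, rsp                     mov  rbp, rsp
	sub  rsp, stack_size   ; imm8     sub  rsp, stack_size   ; imm32
	push rbx                          mov  [rbp - rbx_off], rbx
	...                               ...
	pop  rbx                          mov  rbx, [rbp - rbx_off]
	leave                             leave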

Thanks!
Menglong Dong

> > [...]
Jiri Olsa June 5, 2023, 8:10 p.m. UTC | #4
On Fri, Jun 02, 2023 at 02:59:55PM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
> on kernel functions that take no more than 6 arguments. This is quite
> limiting, as many kernel functions take more than 6 arguments.
> 
> Therefore, let's enhance it by increasing the number of function
> arguments allowed in arch_prepare_bpf_trampoline(), for now on x86_64
> only.
> 
> For the case where we don't need to call the original function, i.e.
> without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
> arguments stored in the caller's stack frame into the current frame.
> The arguments beyond the first 6 are stored at "$rbp + 0x18" and up,
> and we copy them to "$rbp - regs_off + (6 * 8)" and up.
> 
> For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the
> arguments on the stack before calling the original function, which
> means we need to allocate an extra "8 * (arg_count - 6)" bytes at the
> top of the stack. Note that no data may be pushed to the stack before
> calling the original function; consequently, we have to save rbx with
> 'mov' instead of 'push'.
> 
> This works well for FENTRY and FEXIT; I'm not sure whether there are
> other, more complicated cases.
> 
> Reviewed-by: Jiang Biao <benbjiang@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
> v2:
> - replace EMIT4 with EMIT3_off32 for "lea" to prevent overflow

could you please describe in more detail what's the problem with that?
you also changed that for 'sub rsp, stack_size'

thanks
jirka


> - use MAX_BPF_FUNC_ARGS as the maximum argument count
> [...]
Menglong Dong June 6, 2023, 2:02 a.m. UTC | #5
On Tue, Jun 6, 2023 at 4:11 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Fri, Jun 02, 2023 at 02:59:55PM +0800, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
> > on kernel functions that take no more than 6 arguments. This is quite
> > limiting, as many kernel functions take more than 6 arguments.
> >
> > Therefore, let's enhance it by increasing the number of function
> > arguments allowed in arch_prepare_bpf_trampoline(), for now on x86_64
> > only.
> >
> > For the case where we don't need to call the original function, i.e.
> > without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
> > arguments stored in the caller's stack frame into the current frame.
> > The arguments beyond the first 6 are stored at "$rbp + 0x18" and up,
> > and we copy them to "$rbp - regs_off + (6 * 8)" and up.
> >
> > For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the
> > arguments on the stack before calling the original function, which
> > means we need to allocate an extra "8 * (arg_count - 6)" bytes at the
> > top of the stack. Note that no data may be pushed to the stack before
> > calling the original function; consequently, we have to save rbx with
> > 'mov' instead of 'push'.
> >
> > This works well for FENTRY and FEXIT; I'm not sure whether there are
> > other, more complicated cases.
> >
> > Reviewed-by: Jiang Biao <benbjiang@tencent.com>
> > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > ---
> > v2:
> > - replace EMIT4 with EMIT3_off32 for "lea" to prevent overflow
>
> could you please describe in more detail what's the problem with that?
> you also changed that for 'sub rsp, stack_size'
>

Sorry for the confusion. Take 'sub rsp, stack_size' as an
example. In the original logic, which is:

  EMIT4(0x48, 0x83, 0xEC, stack_size)

the imm in the instruction is a signed char, so the maximum
value of the imm is 127.

However, stack_size now exceeds 127 when the function has
more than 8 arguments.

Therefore, I use:

  EMIT3_off32(0x48, 0x81, 0xEC, stack_size)

where the imm in the instruction is a signed int.

The same reasoning applies to the "lea" instructions.
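
For example, a sketch of the two encodings (the 0x98 value is made
up for illustration):

  48 83 EC 18             sub rsp, 0x18   ; EMIT4: imm8, -128..127
  48 81 EC 98 00 00 00    sub rsp, 0x98   ; EMIT3_off32: imm32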

Thanks!
Menglong Dong

> thanks
> jirka
>
>
> > - use MAX_BPF_FUNC_ARGS as the maximum argument count
> > [...]

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 1056bbf55b17..0e247bb7d6f6 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1868,7 +1868,7 @@  static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 	 * mov QWORD PTR [rbp-0x10],rdi
 	 * mov QWORD PTR [rbp-0x8],rsi
 	 */
-	for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
+	for (i = 0, j = 0; i < min(nr_regs, MAX_BPF_FUNC_ARGS); i++) {
 		/* The arg_size is at most 16 bytes, enforced by the verifier. */
 		arg_size = m->arg_size[j];
 		if (arg_size > 8) {
@@ -1876,10 +1876,22 @@  static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 			next_same_struct = !next_same_struct;
 		}
 
-		emit_stx(prog, bytes_to_bpf_size(arg_size),
-			 BPF_REG_FP,
-			 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-			 -(stack_size - i * 8));
+		if (i <= 5) {
+			/* store function arguments in regs */
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
+				 -(stack_size - i * 8));
+		} else {
+			/* store function arguments in stack */
+			emit_ldx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_0, BPF_REG_FP,
+				 (i - 6) * 8 + 0x18);
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 BPF_REG_0,
+				 -(stack_size - i * 8));
+		}
 
 		j = next_same_struct ? j : j + 1;
 	}
@@ -1913,6 +1925,41 @@  static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
 	}
 }
 
+static void prepare_origin_stack(const struct btf_func_model *m, u8 **prog,
+				 int nr_regs, int stack_size)
+{
+	int i, j, arg_size;
+	bool next_same_struct = false;
+
+	if (nr_regs <= 6)
+		return;
+
+	/* Prepare the function arguments in stack before call origin
+	 * function. These arguments must be stored in the top of the
+	 * stack.
+	 */
+	for (i = 0, j = 0; i < min(nr_regs, MAX_BPF_FUNC_ARGS); i++) {
+		/* The arg_size is at most 16 bytes, enforced by the verifier. */
+		arg_size = m->arg_size[j];
+		if (arg_size > 8) {
+			arg_size = 8;
+			next_same_struct = !next_same_struct;
+		}
+
+		if (i > 5) {
+			emit_ldx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_0, BPF_REG_FP,
+				 (i - 6) * 8 + 0x18);
+			emit_stx(prog, bytes_to_bpf_size(arg_size),
+				 BPF_REG_FP,
+				 BPF_REG_0,
+				 -(stack_size - (i - 6) * 8));
+		}
+
+		j = next_same_struct ? j : j + 1;
+	}
+}
+
 static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 			   struct bpf_tramp_link *l, int stack_size,
 			   int run_ctx_off, bool save_ret)
@@ -1938,7 +1985,7 @@  static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	/* arg1: mov rdi, progs[i] */
 	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
 	/* arg2: lea rsi, [rbp - ctx_cookie_off] */
-	EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
+	EMIT3_off32(0x48, 0x8D, 0xB5, -run_ctx_off);
 
 	if (emit_rsb_call(&prog, bpf_trampoline_enter(p), prog))
 		return -EINVAL;
@@ -1954,7 +2001,7 @@  static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	emit_nops(&prog, 2);
 
 	/* arg1: lea rdi, [rbp - stack_size] */
-	EMIT4(0x48, 0x8D, 0x7D, -stack_size);
+	EMIT3_off32(0x48, 0x8D, 0xBD, -stack_size);
 	/* arg2: progs[i]->insnsi for interpreter */
 	if (!p->jited)
 		emit_mov_imm64(&prog, BPF_REG_2,
@@ -1984,7 +2031,7 @@  static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	/* arg2: mov rsi, rbx <- start time in nsec */
 	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
 	/* arg3: lea rdx, [rbp - run_ctx_off] */
-	EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
+	EMIT3_off32(0x48, 0x8D, 0x95, -run_ctx_off);
 	if (emit_rsb_call(&prog, bpf_trampoline_exit(p), prog))
 		return -EINVAL;
 
@@ -2136,7 +2183,7 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 				void *func_addr)
 {
 	int i, ret, nr_regs = m->nr_args, stack_size = 0;
-	int regs_off, nregs_off, ip_off, run_ctx_off;
+	int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -2150,8 +2197,10 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
 			nr_regs += (m->arg_size[i] + 7) / 8 - 1;
 
-	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
-	if (nr_regs > 6)
+	/* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6
+	 * are passed through regs, the remains are through stack.
+	 */
+	if (nr_regs > MAX_BPF_FUNC_ARGS)
 		return -ENOTSUPP;
 
 	/* Generated trampoline stack layout:
@@ -2170,7 +2219,14 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	 *
 	 * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
 	 *
+	 * RBP - rbx_off   [ rbx value       ]  always
+	 *
 	 * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
+	 *
+	 *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
+	 *                     [ ...        ]
+	 *                     [ stack_arg2 ]
+	 * RBP - arg_stack_off [ stack_arg1 ]
 	 */
 
 	/* room for return value of orig_call or fentry prog */
@@ -2190,9 +2246,17 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 
 	ip_off = stack_size;
 
+	stack_size += 8;
+	rbx_off = stack_size;
+
 	stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
 	run_ctx_off = stack_size;
 
+	if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG))
+		stack_size += (nr_regs - 6) * 8;
+
+	arg_stack_off = stack_size;
+
 	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
 		/* skip patched call instruction and point orig_call to actual
 		 * body of the kernel function.
@@ -2212,8 +2276,9 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	x86_call_depth_emit_accounting(&prog, NULL);
 	EMIT1(0x55);		 /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
-	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
-	EMIT1(0x53);		 /* push rbx */
+	EMIT3_off32(0x48, 0x81, 0xEC, stack_size); /* sub rsp, stack_size */
+	/* mov QWORD PTR [rbp - rbx_off], rbx */
+	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
 	/* Store number of argument registers of the traced function:
 	 *   mov rax, nr_regs
@@ -2262,6 +2327,7 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		restore_regs(m, &prog, nr_regs, regs_off);
+		prepare_origin_stack(m, &prog, nr_regs, arg_stack_off);
 
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
@@ -2321,14 +2387,14 @@  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	if (save_ret)
 		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
 
-	EMIT1(0x5B); /* pop rbx */
+	emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
 	EMIT1(0xC9); /* leave */
 	if (flags & BPF_TRAMP_F_SKIP_FRAME)
 		/* skip our return address and return to parent */
 		EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
 	emit_return(&prog, prog);
 	/* Make sure the trampoline generation logic doesn't overflow */
-	if (WARN_ON_ONCE(prog > (u8 *)image_end - BPF_INSN_SAFETY)) {
+	if (prog > (u8 *)image_end - BPF_INSN_SAFETY) {
 		ret = -EFAULT;
 		goto cleanup;
 	}