
[bpf-next,v3,2/4] libbpf: Access first syscall argument with CO-RE direct read on arm64

Message ID 20240831041934.1629216-3-pulehui@huaweicloud.com (mailing list archive)
State Accepted
Commit ebd8ad4748885dd244100a9fc86b4c612c711518
Delegated to: BPF
Headers show
Series: Fix accessing first syscall argument on RV64

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
netdev/tree_selection success Clearly marked for bpf-next
netdev/apply success Patch already applied to bpf-next-0

Commit Message

Pu Lehui Aug. 31, 2024, 4:19 a.m. UTC
From: Pu Lehui <pulehui@huawei.com>

Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
comes directly from the context, let's use a CO-RE direct read to access the
first system call argument.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 tools/lib/bpf/bpf_tracing.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Xu Kuohai Aug. 31, 2024, 7:26 a.m. UTC | #1
On 8/31/2024 12:19 PM, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
> comes directly from the context, let's use a CO-RE direct read to access the
> first system call argument.
> 
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>   tools/lib/bpf/bpf_tracing.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
> index e7d9382efeb3..051c408e6aed 100644
> --- a/tools/lib/bpf/bpf_tracing.h
> +++ b/tools/lib/bpf/bpf_tracing.h
> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
>   
>   struct pt_regs___arm64 {
>   	unsigned long orig_x0;
> -};
> +} __attribute__((preserve_access_index));
>   
>   /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
>   #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
>   #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
>   #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
>   #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
>   #define PT_REGS_PARM1_CORE_SYSCALL(x) \
>   	BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
>   

Cool!

Acked-by: Xu Kuohai <xukuohai@huawei.com>
Xu Kuohai Aug. 31, 2024, 7:57 a.m. UTC | #2
On 8/31/2024 3:26 PM, Xu Kuohai wrote:
> On 8/31/2024 12:19 PM, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
>> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
>> comes directly from the context, let's use a CO-RE direct read to access the
>> first system call argument.
>>
>> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>>   tools/lib/bpf/bpf_tracing.h | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
>> index e7d9382efeb3..051c408e6aed 100644
>> --- a/tools/lib/bpf/bpf_tracing.h
>> +++ b/tools/lib/bpf/bpf_tracing.h
>> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
>>   struct pt_regs___arm64 {
>>       unsigned long orig_x0;
>> -};
>> +} __attribute__((preserve_access_index));
>>   /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
>>   #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
>> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
>>   #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
>>   #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
>>   #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
>> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
>> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
>>   #define PT_REGS_PARM1_CORE_SYSCALL(x) \
>>       BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
> 
> Cool!
> 
> Acked-by: Xu Kuohai <xukuohai@huawei.com>
> 
> 

Wait, it breaks the following test:

--- a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
+++ b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
@@ -43,7 +43,7 @@ int BPF_KPROBE(handle_sys_prctl)

         /* test for PT_REGS_PARM */

-       bpf_probe_read_kernel(&tmp, sizeof(tmp), &PT_REGS_PARM1_SYSCALL(real_regs));
+       tmp = PT_REGS_PARM1_SYSCALL(real_regs);
         arg1 = tmp;
         bpf_probe_read_kernel(&arg2, sizeof(arg2), &PT_REGS_PARM2_SYSCALL(real_regs));
         bpf_probe_read_kernel(&arg3, sizeof(arg3), &PT_REGS_PARM3_SYSCALL(real_regs));

Failed with verifier rejection:

0: R1=ctx() R10=fp0
; int BPF_KPROBE(handle_sys_prctl) @ bpf_syscall_macro.c:33
0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
; if (pid != filter_pid) @ bpf_syscall_macro.c:39
2: (18) r1 = 0xffff800082e0e000       ; R1_w=map_value(map=bpf_sysc.rodata,ks=4,vs=4)
4: (61) r1 = *(u32 *)(r1 +0)          ; R1_w=607
; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
5: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
; if (pid != filter_pid) @ bpf_syscall_macro.c:39
6: (5e) if w1 != w0 goto pc+98        ; R0_w=607 R1_w=607
; real_regs = PT_REGS_SYSCALL_REGS(ctx); @ bpf_syscall_macro.c:42
7: (79) r8 = *(u64 *)(r6 +0)          ; R6_w=ctx() R8_w=scalar()
; tmp = PT_REGS_PARM1_SYSCALL(real_regs); @ bpf_syscall_macro.c:46
8: (79) r1 = *(u64 *)(r8 +272)
R8 invalid mem access 'scalar'
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Andrii Nakryiko Sept. 4, 2024, 8:06 p.m. UTC | #3
On Sat, Aug 31, 2024 at 12:57 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>
> On 8/31/2024 3:26 PM, Xu Kuohai wrote:
> > On 8/31/2024 12:19 PM, Pu Lehui wrote:
> >> From: Pu Lehui <pulehui@huawei.com>
> >>
> >> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
> >> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
> >> comes directly from the context, let's use a CO-RE direct read to access the
> >> first system call argument.
> >>
> >> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> >> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> >> ---
> >>   tools/lib/bpf/bpf_tracing.h | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
> >> index e7d9382efeb3..051c408e6aed 100644
> >> --- a/tools/lib/bpf/bpf_tracing.h
> >> +++ b/tools/lib/bpf/bpf_tracing.h
> >> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
> >>   struct pt_regs___arm64 {
> >>       unsigned long orig_x0;
> >> -};
> >> +} __attribute__((preserve_access_index));
> >>   /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
> >>   #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
> >> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
> >>   #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
> >>   #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
> >>   #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
> >> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
> >> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
> >>   #define PT_REGS_PARM1_CORE_SYSCALL(x) \
> >>       BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
> >
> > Cool!
> >
> > Acked-by: Xu Kuohai <xukuohai@huawei.com>
> >
> >
>
> Wait, it breaks the following test:
>

You mean, *if you change the existing test like below*, it will break,
right? And that's expected, because arm64 has
ARCH_HAS_SYSCALL_WRAPPER, which means syscall pt_regs are actually not
the kprobe's ctx, so you can't directly access it. Which is why we
have PT_REGS_PARM1_CORE_SYSCALL() variants.

See how the BPF_KSYSCALL macro is implemented; there are two cases:
___bpf_syswrap_args(), which uses BPF_CORE_READ()-based macros to fetch
arguments, and ___bpf_syscall_args(), which uses direct ctx reads.


> --- a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
> @@ -43,7 +43,7 @@ int BPF_KPROBE(handle_sys_prctl)
>
>          /* test for PT_REGS_PARM */
>
> -       bpf_probe_read_kernel(&tmp, sizeof(tmp), &PT_REGS_PARM1_SYSCALL(real_regs));
> +       tmp = PT_REGS_PARM1_SYSCALL(real_regs);
>          arg1 = tmp;
>          bpf_probe_read_kernel(&arg2, sizeof(arg2), &PT_REGS_PARM2_SYSCALL(real_regs));
>          bpf_probe_read_kernel(&arg3, sizeof(arg3), &PT_REGS_PARM3_SYSCALL(real_regs));
>
> Failed with verifier rejection:
>
> 0: R1=ctx() R10=fp0
> ; int BPF_KPROBE(handle_sys_prctl) @ bpf_syscall_macro.c:33
> 0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
> 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
> 2: (18) r1 = 0xffff800082e0e000       ; R1_w=map_value(map=bpf_sysc.rodata,ks=4,vs=4)
> 4: (61) r1 = *(u32 *)(r1 +0)          ; R1_w=607
> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
> 5: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
> 6: (5e) if w1 != w0 goto pc+98        ; R0_w=607 R1_w=607
> ; real_regs = PT_REGS_SYSCALL_REGS(ctx); @ bpf_syscall_macro.c:42
> 7: (79) r8 = *(u64 *)(r6 +0)          ; R6_w=ctx() R8_w=scalar()
> ; tmp = PT_REGS_PARM1_SYSCALL(real_regs); @ bpf_syscall_macro.c:46
> 8: (79) r1 = *(u64 *)(r8 +272)
> R8 invalid mem access 'scalar'
> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
Andrii Nakryiko Sept. 4, 2024, 8:21 p.m. UTC | #4
On Fri, Aug 30, 2024 at 9:17 PM Pu Lehui <pulehui@huaweicloud.com> wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
> comes directly from the context, let's use a CO-RE direct read to access the
> first system call argument.
>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  tools/lib/bpf/bpf_tracing.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
> index e7d9382efeb3..051c408e6aed 100644
> --- a/tools/lib/bpf/bpf_tracing.h
> +++ b/tools/lib/bpf/bpf_tracing.h
> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
>
>  struct pt_regs___arm64 {
>         unsigned long orig_x0;
> -};
> +} __attribute__((preserve_access_index));
>
>  /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
>  #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
>  #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
>  #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
>  #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)

It would probably be best (for consistency) to stick to using
__PT_PARM1_SYSCALL_REG instead of hard-coding orig_x0 here, no? I'll
fix it up while applying. Same for patches #1 and #4.

It would be great if you could double-check that the final patches in
bpf-next/master compile and work well on arm64, s390x, and RV64 (as I
can't really test that much locally).



>  #define PT_REGS_PARM1_CORE_SYSCALL(x) \
>         BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
>
> --
> 2.34.1
>
Xu Kuohai Sept. 5, 2024, 2:39 a.m. UTC | #5
On 9/5/2024 4:06 AM, Andrii Nakryiko wrote:
> On Sat, Aug 31, 2024 at 12:57 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>>
>> On 8/31/2024 3:26 PM, Xu Kuohai wrote:
>>> On 8/31/2024 12:19 PM, Pu Lehui wrote:
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
>>>> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
>>>> comes directly from the context, let's use a CO-RE direct read to access the
>>>> first system call argument.
>>>>
>>>> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
>>>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>>>> ---
>>>>    tools/lib/bpf/bpf_tracing.h | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
>>>> index e7d9382efeb3..051c408e6aed 100644
>>>> --- a/tools/lib/bpf/bpf_tracing.h
>>>> +++ b/tools/lib/bpf/bpf_tracing.h
>>>> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
>>>>    struct pt_regs___arm64 {
>>>>        unsigned long orig_x0;
>>>> -};
>>>> +} __attribute__((preserve_access_index));
>>>>    /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
>>>>    #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
>>>> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
>>>>    #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
>>>>    #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
>>>>    #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
>>>> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
>>>> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
>>>>    #define PT_REGS_PARM1_CORE_SYSCALL(x) \
>>>>        BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
>>>
>>> Cool!
>>>
>>> Acked-by: Xu Kuohai <xukuohai@huawei.com>
>>>
>>>
>>
>> Wait, it breaks the following test:
>>
> 
> You mean, *if you change the existing test like below*, it will break,
> right? And that's expected, because arm64 has
> ARCH_HAS_SYSCALL_WRAPPER, which means syscall pt_regs are actually not
> the kprobe's ctx, so you can't directly access it. Which is why we
> have PT_REGS_PARM1_CORE_SYSCALL() variants.
> 
> See how the BPF_KSYSCALL macro is implemented; there are two cases:
> ___bpf_syswrap_args(), which uses BPF_CORE_READ()-based macros to fetch
> arguments, and ___bpf_syscall_args(), which uses direct ctx reads.
>

Got it, thanks for the explanation.

> 
>> --- a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
>> +++ b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
>> @@ -43,7 +43,7 @@ int BPF_KPROBE(handle_sys_prctl)
>>
>>           /* test for PT_REGS_PARM */
>>
>> -       bpf_probe_read_kernel(&tmp, sizeof(tmp), &PT_REGS_PARM1_SYSCALL(real_regs));
>> +       tmp = PT_REGS_PARM1_SYSCALL(real_regs);
>>           arg1 = tmp;
>>           bpf_probe_read_kernel(&arg2, sizeof(arg2), &PT_REGS_PARM2_SYSCALL(real_regs));
>>           bpf_probe_read_kernel(&arg3, sizeof(arg3), &PT_REGS_PARM3_SYSCALL(real_regs));
>>
>> Failed with verifier rejection:
>>
>> 0: R1=ctx() R10=fp0
>> ; int BPF_KPROBE(handle_sys_prctl) @ bpf_syscall_macro.c:33
>> 0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
>> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
>> 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
>> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
>> 2: (18) r1 = 0xffff800082e0e000       ; R1_w=map_value(map=bpf_sysc.rodata,ks=4,vs=4)
>> 4: (61) r1 = *(u32 *)(r1 +0)          ; R1_w=607
>> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
>> 5: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
>> 6: (5e) if w1 != w0 goto pc+98        ; R0_w=607 R1_w=607
>> ; real_regs = PT_REGS_SYSCALL_REGS(ctx); @ bpf_syscall_macro.c:42
>> 7: (79) r8 = *(u64 *)(r6 +0)          ; R6_w=ctx() R8_w=scalar()
>> ; tmp = PT_REGS_PARM1_SYSCALL(real_regs); @ bpf_syscall_macro.c:46
>> 8: (79) r1 = *(u64 *)(r8 +272)
>> R8 invalid mem access 'scalar'
>> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>>
Pu Lehui Sept. 5, 2024, 6:42 a.m. UTC | #6
On 2024/9/5 4:21, Andrii Nakryiko wrote:
> On Fri, Aug 30, 2024 at 9:17 PM Pu Lehui <pulehui@huaweicloud.com> wrote:
>>
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> Currently PT_REGS_PARM1_SYSCALL(x) is an alias for PT_REGS_PARM1_CORE_SYSCALL(x),
>> which introduces the overhead of BPF_CORE_READ(). Since the pt_regs being read
>> comes directly from the context, let's use a CO-RE direct read to access the
>> first system call argument.
>>
>> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>>   tools/lib/bpf/bpf_tracing.h | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
>> index e7d9382efeb3..051c408e6aed 100644
>> --- a/tools/lib/bpf/bpf_tracing.h
>> +++ b/tools/lib/bpf/bpf_tracing.h
>> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
>>
>>   struct pt_regs___arm64 {
>>          unsigned long orig_x0;
>> -};
>> +} __attribute__((preserve_access_index));
>>
>>   /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
>>   #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
>> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
>>   #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
>>   #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
>>   #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
>> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
>> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
> 
> It would probably be best (for consistency) to stick to using
> __PT_PARM1_SYSCALL_REG instead of hard-coding orig_x0 here, no? I'll
> fix it up while applying. Same for patches #1 and #4.
> 
> It would be great if you could double-check that the final patches in
> bpf-next/master compile and work well on arm64, s390x, and RV64 (as I
> can't really test that much locally).

I checked it locally with cross-platform vmtest on RV64, and it looks good:

Summary: 569/3944 PASSED, 104 SKIPPED, 0 FAILED

and BPF CI is happy on arm64 and s390x.


> 
> 
> 
>>   #define PT_REGS_PARM1_CORE_SYSCALL(x) \
>>          BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
>>
>> --
>> 2.34.1
>>

Patch

diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
index e7d9382efeb3..051c408e6aed 100644
--- a/tools/lib/bpf/bpf_tracing.h
+++ b/tools/lib/bpf/bpf_tracing.h
@@ -222,7 +222,7 @@  struct pt_regs___s390 {
 
 struct pt_regs___arm64 {
 	unsigned long orig_x0;
-};
+} __attribute__((preserve_access_index));
 
 /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
 #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
@@ -241,7 +241,7 @@  struct pt_regs___arm64 {
 #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
 #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
 #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
-#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
+#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
 #define PT_REGS_PARM1_CORE_SYSCALL(x) \
 	BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)