[bpf-next,v2,1/2] bpf: add bpf_get_hw_counter kfunc

Message ID 20241024205113.762622-1-vadfed@meta.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series [bpf-next,v2,1/2] bpf: add bpf_get_hw_counter kfunc

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 fail Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-35 fail Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 fail Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-37 fail Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-28 fail Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 15 this patch: 15
netdev/build_tools success Errors and warnings before: 2 (+1) this patch: 2 (+1)
netdev/cc_maintainers warning 15 maintainers not CCed: dave.hansen@linux.intel.com song@kernel.org udknight@gmail.com haoluo@google.com bp@alien8.de netdev@vger.kernel.org john.fastabend@gmail.com sdf@fomichev.me martin.lau@linux.dev hpa@zytor.com dsahern@kernel.org kpsingh@kernel.org yonghong.song@linux.dev mingo@redhat.com jolsa@kernel.org
netdev/build_clang success Errors and warnings before: 24 this patch: 24
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1364 this patch: 1365
netdev/checkpatch warning WARNING: externs should be avoided in .c files WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 11 this patch: 11
netdev/source_inline success Was 0 now: 0

Commit Message

Vadim Fedorenko Oct. 24, 2024, 8:51 p.m. UTC
New kfunc to return the architecture-specific hardware timecounter. On
x86 the BPF JIT converts it into an ordered rdtsc call. Other
architectures will get JIT implementations too, if supported. The
fallback is __arch_get_hw_counter().

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
v1 -> v2:
* Fix incorrect function return value type to u64
* Introduce bpf_jit_inlines_kfunc_call() and use it in
  mark_fastcall_pattern_for_call() to avoid clobbering in case of
  running programs with no JIT (Eduard)
* Avoid rewriting instruction and check function pointer directly
  in JIT (Alexei)
* Change includes to fix compile issues on non x86 architectures
---
 arch/x86/net/bpf_jit_comp.c   | 30 ++++++++++++++++++++++++++++++
 arch/x86/net/bpf_jit_comp32.c | 16 ++++++++++++++++
 include/linux/filter.h        |  1 +
 kernel/bpf/core.c             | 11 +++++++++++
 kernel/bpf/helpers.c          |  7 +++++++
 kernel/bpf/verifier.c         |  4 +++-
 6 files changed, 68 insertions(+), 1 deletion(-)
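
For orientation, a minimal BPF-side sketch of how a program could call the new
kfunc (illustrative only, not part of this patch; patch 2/2 of the series adds
the real selftest, and the traced function and section names here are
arbitrary):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* kfunc added by this patch, resolved against kernel BTF */
extern u64 bpf_get_hw_counter(void) __ksym;

/* a single global is enough for a sketch; real code would key this per task/CPU */
u64 start_cycles;

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(counter_start)
{
	start_cycles = bpf_get_hw_counter();
	return 0;
}

SEC("fexit/bpf_fentry_test1")
int BPF_PROG(counter_end)
{
	/* raw counter cycles, not nanoseconds */
	u64 delta = bpf_get_hw_counter() - start_cycles;

	bpf_printk("elapsed cycles: %llu", delta);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";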

Comments

Alexei Starovoitov Oct. 24, 2024, 10:14 p.m. UTC | #1
On Thu, Oct 24, 2024 at 1:51 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> New kfunc to return ARCH-specific timecounter. For x86 BPF JIT converts
> it into rdtsc ordered call. Other architectures will get JIT
> implementation too if supported. The fallback is to
> __arch_get_hw_counter().
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
> v1 -> v2:
> * Fix incorrect function return value type to u64
> * Introduce bpf_jit_inlines_kfunc_call() and use it in
>   mark_fastcall_pattern_for_call() to avoid clobbering in case of
>         running programs with no JIT (Eduard)
> * Avoid rewriting instruction and check function pointer directly
>   in JIT (Alexei)
> * Change includes to fix compile issues on non x86 architectures
> ---
>  arch/x86/net/bpf_jit_comp.c   | 30 ++++++++++++++++++++++++++++++
>  arch/x86/net/bpf_jit_comp32.c | 16 ++++++++++++++++
>  include/linux/filter.h        |  1 +
>  kernel/bpf/core.c             | 11 +++++++++++
>  kernel/bpf/helpers.c          |  7 +++++++
>  kernel/bpf/verifier.c         |  4 +++-
>  6 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 06b080b61aa5..a8cffbb19cf2 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1412,6 +1412,8 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>  #define LOAD_TAIL_CALL_CNT_PTR(stack)                          \
>         __LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
>
> +u64 bpf_get_hw_counter(void);

just add it to some .h

>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>                   int oldproglen, struct jit_context *ctx, bool jmp_padding)
>  {
> @@ -2126,6 +2128,26 @@ st:                      if (is_imm8(insn->off))
>                 case BPF_JMP | BPF_CALL: {
>                         u8 *ip = image + addrs[i - 1];
>
> +                       if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> +                           imm32 == BPF_CALL_IMM(bpf_get_hw_counter)) {
> +                               /* Save RDX because RDTSC will use EDX:EAX to return u64 */
> +                               emit_mov_reg(&prog, true, AUX_REG, BPF_REG_3);
> +                               if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC))
> +                                       EMIT_LFENCE();
> +                               EMIT2(0x0F, 0x31);
> +
> +                               /* shl RDX, 32 */
> +                               maybe_emit_1mod(&prog, BPF_REG_3, true);
> +                               EMIT3(0xC1, add_1reg(0xE0, BPF_REG_3), 32);
> +                               /* or RAX, RDX */
> +                               maybe_emit_mod(&prog, BPF_REG_0, BPF_REG_3, true);
> +                               EMIT2(0x09, add_2reg(0xC0, BPF_REG_0, BPF_REG_3));
> +                               /* restore RDX from R11 */
> +                               emit_mov_reg(&prog, true, BPF_REG_3, AUX_REG);
> +
> +                               break;
> +                       }
> +
>                         func = (u8 *) __bpf_call_base + imm32;
>                         if (tail_call_reachable) {
>                                 LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
> @@ -3652,3 +3674,11 @@ u64 bpf_arch_uaddress_limit(void)
>  {
>         return 0;
>  }
> +
> +/* x86-64 JIT can inline kfunc */
> +bool bpf_jit_inlines_helper_call(s32 imm)

kfunc

> +{
> +       if (imm == BPF_CALL_IMM(bpf_get_hw_counter))
> +               return true;
> +       return false;
> +}
> diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
> index de0f9e5f9f73..66525cb1892c 100644
> --- a/arch/x86/net/bpf_jit_comp32.c
> +++ b/arch/x86/net/bpf_jit_comp32.c
> @@ -1656,6 +1656,8 @@ static int emit_kfunc_call(const struct bpf_prog *bpf_prog, u8 *end_addr,
>         return 0;
>  }
>
> +u64 bpf_get_hw_counter(void);
> +
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
>                   int oldproglen, struct jit_context *ctx)
>  {
> @@ -2094,6 +2096,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
>                         if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
>                                 int err;
>
> +                               if (imm32 == BPF_CALL_IMM(bpf_get_hw_counter)) {
> +                                       if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC))
> +                                               EMIT3(0x0F, 0xAE, 0xE8);
> +                                       EMIT2(0x0F, 0x31);
> +                                       break;
> +                               }
> +
>                                 err = emit_kfunc_call(bpf_prog,
>                                                       image + addrs[i],
>                                                       insn, &prog);
> @@ -2621,3 +2630,10 @@ bool bpf_jit_supports_kfunc_call(void)
>  {
>         return true;
>  }
> +
> +bool bpf_jit_inlines_helper_call(s32 imm)

kfunc

> +{
> +       if (imm == BPF_CALL_IMM(bpf_get_hw_counter))
> +               return true;
> +       return false;
> +}
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 7d7578a8eac1..8bdd5e6b2a65 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1111,6 +1111,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
>  void bpf_jit_compile(struct bpf_prog *prog);
>  bool bpf_jit_needs_zext(void);
>  bool bpf_jit_inlines_helper_call(s32 imm);
> +bool bpf_jit_inlines_kfunc_call(s32 imm);
>  bool bpf_jit_supports_subprog_tailcalls(void);
>  bool bpf_jit_supports_percpu_insn(void);
>  bool bpf_jit_supports_kfunc_call(void);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 233ea78f8f1b..ab6a2452ade0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2965,6 +2965,17 @@ bool __weak bpf_jit_inlines_helper_call(s32 imm)
>         return false;
>  }
>
> +/* Return true if the JIT inlines the call to the kfunc corresponding to
> + * the imm.
> + *
> + * The verifier will not patch the insn->imm for the call to the helper if
> + * this returns true.
> + */
> +bool __weak bpf_jit_inlines_kfunc_call(s32 imm)
> +{
> +       return false;
> +}
> +
>  /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
>  bool __weak bpf_jit_supports_subprog_tailcalls(void)
>  {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5c3fdb29c1b1..f7bf3debbcc4 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -23,6 +23,7 @@
>  #include <linux/btf_ids.h>
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
> +#include <vdso/datapage.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -3023,6 +3024,11 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
>         return ret + 1;
>  }
>
> +__bpf_kfunc u64 bpf_get_hw_counter(void)
> +{
> +       return __arch_get_hw_counter(1, NULL);
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  BTF_KFUNCS_START(generic_btf_ids)
> @@ -3112,6 +3118,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
>  BTF_ID_FLAGS(func, bpf_get_kmem_cache)
> +BTF_ID_FLAGS(func, bpf_get_hw_counter, KF_FASTCALL)
>  BTF_KFUNCS_END(common_btf_ids)
>
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f514247ba8ba..428e7b84bb02 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11326,6 +11326,7 @@ BTF_ID(func, bpf_session_cookie)
>  BTF_ID_UNUSED
>  #endif
>  BTF_ID(func, bpf_get_kmem_cache)
> +BTF_ID(func, bpf_get_hw_counter)
>
>  static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
>  {
> @@ -16291,7 +16292,8 @@ static void mark_fastcall_pattern_for_call(struct bpf_verifier_env *env,
>                         return;
>
>                 clobbered_regs_mask = kfunc_fastcall_clobber_mask(&meta);
> -               can_be_inlined = is_fastcall_kfunc_call(&meta);
> +               can_be_inlined = is_fastcall_kfunc_call(&meta) && !call->off &&

what call->off check is for?

See errors in BPF CI.

pw-bot: cr
Eduard Zingerman Oct. 24, 2024, 10:17 p.m. UTC | #2
On Thu, 2024-10-24 at 15:14 -0700, Alexei Starovoitov wrote:

[...]

> > @@ -16291,7 +16292,8 @@ static void mark_fastcall_pattern_for_call(struct bpf_verifier_env *env,
> >                         return;
> > 
> >                 clobbered_regs_mask = kfunc_fastcall_clobber_mask(&meta);
> > -               can_be_inlined = is_fastcall_kfunc_call(&meta);
> > +               can_be_inlined = is_fastcall_kfunc_call(&meta) && !call->off &&
> 
> what call->off check is for?

call->imm is BTF id, call->off is ID of the BTF itself.
I asked Vadim to add this check to make sure that imm points to the
kernel BTF.
Alexei Starovoitov Oct. 24, 2024, 10:28 p.m. UTC | #3
On Thu, Oct 24, 2024 at 3:17 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-10-24 at 15:14 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > @@ -16291,7 +16292,8 @@ static void mark_fastcall_pattern_for_call(struct bpf_verifier_env *env,
> > >                         return;
> > >
> > >                 clobbered_regs_mask = kfunc_fastcall_clobber_mask(&meta);
> > > -               can_be_inlined = is_fastcall_kfunc_call(&meta);
> > > +               can_be_inlined = is_fastcall_kfunc_call(&meta) && !call->off &&
> >
> > what call->off check is for?
>
> call->imm is BTF id, call->off is ID of the BTF itself.

it's actually offset in fd_array

> I asked Vadim to add this check to make sure that imm points to the
> kernel BTF.

makes sense.

is_fastcall_kfunc_call(&meta) && meta.btf == btf_vmlinux && ..

would have been much more obvious.
Eduard Zingerman Oct. 24, 2024, 10:34 p.m. UTC | #4
On Thu, 2024-10-24 at 15:28 -0700, Alexei Starovoitov wrote:

[...]

> > call->imm is BTF id, call->off is ID of the BTF itself.
> 
> it's actually offset in fd_array

Sure.

> > I asked Vadim to add this check to make sure that imm points to the
> > kernel BTF.
> 
> makes sense.
> 
> is_fastcall_kfunc_call(&meta) && meta.btf == btf_vmlinux && ..
> 
> would have been much more obvious.

Yes, this one looks better.
Andrii Nakryiko Oct. 24, 2024, 11:17 p.m. UTC | #5
On Thu, Oct 24, 2024 at 1:51 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> New kfunc to return ARCH-specific timecounter. For x86 BPF JIT converts
> it into rdtsc ordered call. Other architectures will get JIT
> implementation too if supported. The fallback is to
> __arch_get_hw_counter().
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
> v1 -> v2:
> * Fix incorrect function return value type to u64
> * Introduce bpf_jit_inlines_kfunc_call() and use it in
>   mark_fastcall_pattern_for_call() to avoid clobbering in case of
>         running programs with no JIT (Eduard)
> * Avoid rewriting instruction and check function pointer directly
>   in JIT (Alexei)
> * Change includes to fix compile issues on non x86 architectures
> ---
>  arch/x86/net/bpf_jit_comp.c   | 30 ++++++++++++++++++++++++++++++
>  arch/x86/net/bpf_jit_comp32.c | 16 ++++++++++++++++
>  include/linux/filter.h        |  1 +
>  kernel/bpf/core.c             | 11 +++++++++++
>  kernel/bpf/helpers.c          |  7 +++++++
>  kernel/bpf/verifier.c         |  4 +++-
>  6 files changed, 68 insertions(+), 1 deletion(-)
>

[...]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5c3fdb29c1b1..f7bf3debbcc4 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -23,6 +23,7 @@
>  #include <linux/btf_ids.h>
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
> +#include <vdso/datapage.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -3023,6 +3024,11 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
>         return ret + 1;
>  }
>
> +__bpf_kfunc u64 bpf_get_hw_counter(void)

Hm... so the main idea behind this helper is to measure latency (i.e.,
time), right? So, first of all, the name itself doesn't make it clear
that this is **time stamp** counter, so maybe let's mention
"timestamp" somehow?

But then also, if I understand correctly, it will return the number of
cycles, right? And users would need to somehow convert that to
nanoseconds to make it useful. Is it trivial to do that from the BPF
side? If not, can we specify this helper to return nanoseconds instead
of cycles, maybe?

It would be great if a selftest demonstrated the intended use case of
measuring some kernel function latency (or BPF helper latency, doesn't
matter much).

[...]
Vadim Fedorenko Oct. 25, 2024, 2:01 p.m. UTC | #6
On 25/10/2024 00:17, Andrii Nakryiko wrote:
> On Thu, Oct 24, 2024 at 1:51 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>
>> New kfunc to return ARCH-specific timecounter. For x86 BPF JIT converts
>> it into rdtsc ordered call. Other architectures will get JIT
>> implementation too if supported. The fallback is to
>> __arch_get_hw_counter().
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
>> v1 -> v2:
>> * Fix incorrect function return value type to u64
>> * Introduce bpf_jit_inlines_kfunc_call() and use it in
>>    mark_fastcall_pattern_for_call() to avoid clobbering in case of
>>          running programs with no JIT (Eduard)
>> * Avoid rewriting instruction and check function pointer directly
>>    in JIT (Alexei)
>> * Change includes to fix compile issues on non x86 architectures
>> ---
>>   arch/x86/net/bpf_jit_comp.c   | 30 ++++++++++++++++++++++++++++++
>>   arch/x86/net/bpf_jit_comp32.c | 16 ++++++++++++++++
>>   include/linux/filter.h        |  1 +
>>   kernel/bpf/core.c             | 11 +++++++++++
>>   kernel/bpf/helpers.c          |  7 +++++++
>>   kernel/bpf/verifier.c         |  4 +++-
>>   6 files changed, 68 insertions(+), 1 deletion(-)
>>
> 
> [...]
> 
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 5c3fdb29c1b1..f7bf3debbcc4 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -23,6 +23,7 @@
>>   #include <linux/btf_ids.h>
>>   #include <linux/bpf_mem_alloc.h>
>>   #include <linux/kasan.h>
>> +#include <vdso/datapage.h>
>>
>>   #include "../../lib/kstrtox.h"
>>
>> @@ -3023,6 +3024,11 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
>>          return ret + 1;
>>   }
>>
>> +__bpf_kfunc u64 bpf_get_hw_counter(void)
> 
> Hm... so the main idea behind this helper is to measure latency (i.e.,
> time), right? So, first of all, the name itself doesn't make it clear
> that this is **time stamp** counter, so maybe let's mention
> "timestamp" somehow?

Well, it's a time stamp counter only on x86. Other architectures use cycle
or time counter naming. We might think of changing it to
bpf_get_hw_cycle_counter() if that name is more informative.

> But then also, if I understand correctly, it will return the number of
> cycles, right? 

Yes, it will return the number of cycles elapsed since the last CPU reset.

> And users would need to somehow convert that to
> nanoseconds to make it useful.

That's questionable. If you think about comparing the performance of the
same kernel function or BPF program on machines with the same
architecture but a different generation or slightly different base
frequency, it's much more meaningful to compare CPU cycles than
nanoseconds. And with current CPU base frequencies, cycles will be more
precise than nanoseconds.

> Is it trivial to do that from the BPF side?

Unfortunately, it is not. The program has to have access to the cycle
counter configuration/specification to convert cycles to any time value.

> If not, can we specify this helper to return nanoseconds instead
> of cycles, maybe?

If we change the specification of the helper to return nanoseconds,
there will be no actual difference between this helper and
bpf_ktime_get_ns(), which ends up in read_tsc() if tsc is set up as
the system clock source.
At the same time I agree that it might be useful to have an option to
convert cycles into nanoseconds. I can introduce another helper to do
the actual conversion of cycles into nanoseconds using the same
mechanics as the timekeeping or vDSO implementation of gettimeofday().
The use case I see here is that the program can save a start point in
cycles, then execute the function to check the latency, get the
cycles right after the function ends, and then use another kfunc to
convert the cycles spent into nanoseconds. There will be no need to
have this additional kfunc inlined because it won't be on the hot
path. WDYT?
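
For reference, the timekeeping/vDSO mechanics mentioned above boil down to
scaling the cycle delta by the clocksource's mult/shift pair; a rough sketch
of what such a conversion kfunc could compute (hypothetical, the mult and
shift values would come from the active clocksource):

static inline u64 cycles_to_ns(u64 cycles, u32 mult, u32 shift)
{
	/* same math as clocksource_cyc2ns(): ns = (cycles * mult) >> shift */
	return (cycles * mult) >> shift;
}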

> It would be great if a selftest demonstrated the intended use case of
> measuring some kernel function latency (or BPF helper latency, doesn't
> matter much).

I can implement a use case described above if it's OK.

> 
> [...]
Andrii Nakryiko Oct. 25, 2024, 6:31 p.m. UTC | #7
On Fri, Oct 25, 2024 at 7:01 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 25/10/2024 00:17, Andrii Nakryiko wrote:
> > On Thu, Oct 24, 2024 at 1:51 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >>
> >> New kfunc to return ARCH-specific timecounter. For x86 BPF JIT converts
> >> it into rdtsc ordered call. Other architectures will get JIT
> >> implementation too if supported. The fallback is to
> >> __arch_get_hw_counter().
> >>
> >> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >> ---
> >> v1 -> v2:
> >> * Fix incorrect function return value type to u64
> >> * Introduce bpf_jit_inlines_kfunc_call() and use it in
> >>    mark_fastcall_pattern_for_call() to avoid clobbering in case of
> >>          running programs with no JIT (Eduard)
> >> * Avoid rewriting instruction and check function pointer directly
> >>    in JIT (Alexei)
> >> * Change includes to fix compile issues on non x86 architectures
> >> ---
> >>   arch/x86/net/bpf_jit_comp.c   | 30 ++++++++++++++++++++++++++++++
> >>   arch/x86/net/bpf_jit_comp32.c | 16 ++++++++++++++++
> >>   include/linux/filter.h        |  1 +
> >>   kernel/bpf/core.c             | 11 +++++++++++
> >>   kernel/bpf/helpers.c          |  7 +++++++
> >>   kernel/bpf/verifier.c         |  4 +++-
> >>   6 files changed, 68 insertions(+), 1 deletion(-)
> >>
> >
> > [...]
> >
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 5c3fdb29c1b1..f7bf3debbcc4 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -23,6 +23,7 @@
> >>   #include <linux/btf_ids.h>
> >>   #include <linux/bpf_mem_alloc.h>
> >>   #include <linux/kasan.h>
> >> +#include <vdso/datapage.h>
> >>
> >>   #include "../../lib/kstrtox.h"
> >>
> >> @@ -3023,6 +3024,11 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
> >>          return ret + 1;
> >>   }
> >>
> >> +__bpf_kfunc u64 bpf_get_hw_counter(void)
> >
> > Hm... so the main idea behind this helper is to measure latency (i.e.,
> > time), right? So, first of all, the name itself doesn't make it clear
> > that this is **time stamp** counter, so maybe let's mention
> > "timestamp" somehow?
>
> Well, it's time stamp counter only on x86. Other architectures use cycle
> or time counter naming. We might think of changing it to
> bpf_get_hw_cycle_counter() if it gives more information.

bpf_get_cpu_cycles_counter()? or just bpf_get_cpu_cycles()?

>
> > But then also, if I understand correctly, it will return the number of
> > cycles, right?
>
> Yes, it will return the amount of cycles passed from the last CPU reset.
>
> > And users would need to somehow convert that to
> > nanoseconds to make it useful.
>
> That's questionable. If you think about comparing the performance of the
> same kernel function or bpf program on machines with the same
> architecture but different generation or slightly different base
> frequency. It's much more meaningful to compare CPU cycles instead of
> nanoseconds. And with current CPU base frequencies cycles will be more
> precise than nanoseconds.

I'm thinking not about narrow micro-benchmarking use cases, but
generic tracing and observability cases where in addition to
everything else, users almost always want to capture the duration of
whatever they are tracing. In human-relatable (and comparable across
various hosts) time units, not in cycles.

So in practice we'll have to show users how to convert this into
nanoseconds anyways. So let's at least have a test demonstrating how
to do it? (and an extra kfunc might be a solution, yep)

>
> > Is it trivial to do that from the BPF side?
>
> Unfortunately, it is not. The program has to have an access to the cycle
> counter configuration/specification to convert cycles to any time value.
>
> > If not, can we specify this helper to return nanoseconds instead
> > of cycles, maybe?
>
> If we change the specification of the helper to return nanoseconds,
> there will be no actual difference between this helper and
> bpf_ktime_get_ns() which ends up in read_tsc() if tsc is setup as
> system clock source.
> At the same time I agree that it might be useful to have an option to
> convert cycles into nanoseconds. I can introduce another helper to do
> the actual conversion of cycles into nanoseconds using the same
> mechanics as in timekeeping or vDSO implementation of gettimeofday().
> The usecase I see here is that the program can save start point in
> cycles, then execute the function to check the latency, get the
> cycles right after function ends and then use another kfunc to convert
> cycles spent into nanoseconds. There will be no need to have this
> additional kfunc inlined because it won't be on hot-path. WDYT?

Sounds good to me. My main ask and the goal here is to *eventually*
have time units, because that's the only thing that can be compared
across hosts.

>
> > It would be great if a selftest demonstrated the intended use case of
> > measuring some kernel function latency (or BPF helper latency, doesn't
> > matter much).
>
> I can implement a use case described above if it's OK.

Great, thanks.

>
> >
> > [...]
>
kernel test robot Oct. 26, 2024, 12:32 a.m. UTC | #8
Hi Vadim,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-bpf-add-selftest-to-check-rdtsc-jit/20241025-045340
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20241024205113.762622-1-vadfed%40meta.com
patch subject: [PATCH bpf-next v2 1/2] bpf: add bpf_get_hw_counter kfunc
config: m68k-sun3_defconfig (https://download.01.org/0day-ci/archive/20241026/202410260829.tdMd3ywG-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241026/202410260829.tdMd3ywG-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410260829.tdMd3ywG-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/vdso/datapage.h:17,
                    from kernel/bpf/helpers.c:26:
>> include/vdso/processor.h:10:10: fatal error: asm/vdso/processor.h: No such file or directory
      10 | #include <asm/vdso/processor.h>
         |          ^~~~~~~~~~~~~~~~~~~~~~
   compilation terminated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for GET_FREE_REGION
   Depends on [n]: SPARSEMEM [=n]
   Selected by [m]:
   - RESOURCE_KUNIT_TEST [=m] && RUNTIME_TESTING_MENU [=y] && KUNIT [=m]


vim +10 include/vdso/processor.h

d8bb6993d871f5 Vincenzo Frascino 2020-03-20   9  
d8bb6993d871f5 Vincenzo Frascino 2020-03-20 @10  #include <asm/vdso/processor.h>
d8bb6993d871f5 Vincenzo Frascino 2020-03-20  11
kernel test robot Oct. 26, 2024, 1:24 a.m. UTC | #9
Hi Vadim,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-bpf-add-selftest-to-check-rdtsc-jit/20241025-045340
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20241024205113.762622-1-vadfed%40meta.com
patch subject: [PATCH bpf-next v2 1/2] bpf: add bpf_get_hw_counter kfunc
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20241026/202410260919.mccgFynd-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241026/202410260919.mccgFynd-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410260919.mccgFynd-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/vdso/processor.h:10,
                    from include/vdso/datapage.h:17,
                    from kernel/bpf/helpers.c:26:
>> arch/x86/include/asm/vdso/processor.h:11:29: error: redefinition of 'rep_nop'
      11 | static __always_inline void rep_nop(void)
         |                             ^~~~~~~
   In file included from include/linux/spinlock_up.h:8,
                    from include/linux/spinlock.h:97,
                    from include/linux/debugobjects.h:6,
                    from include/linux/timer.h:8,
                    from include/linux/workqueue.h:9,
                    from include/linux/bpf.h:10,
                    from kernel/bpf/helpers.c:4:
   arch/x86/um/asm/processor.h:25:29: note: previous definition of 'rep_nop' with type 'void(void)'
      25 | static __always_inline void rep_nop(void)
         |                             ^~~~~~~
>> arch/x86/include/asm/vdso/processor.h:16:29: error: redefinition of 'cpu_relax'
      16 | static __always_inline void cpu_relax(void)
         |                             ^~~~~~~~~
   arch/x86/um/asm/processor.h:30:29: note: previous definition of 'cpu_relax' with type 'void(void)'
      30 | static __always_inline void cpu_relax(void)
         |                             ^~~~~~~~~
   In file included from include/uapi/linux/filter.h:9,
                    from include/linux/bpf.h:8:
   arch/x86/include/asm/vdso/gettimeofday.h: In function '__arch_get_hw_counter':
   arch/x86/include/asm/vdso/gettimeofday.h:253:34: error: 'VDSO_CLOCKMODE_TSC' undeclared (first use in this function); did you mean 'VDSO_CLOCKMODE_MAX'?
     253 |         if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
         |                                  ^~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:76:45: note: in definition of macro 'likely'
      76 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   arch/x86/include/asm/vdso/gettimeofday.h:253:34: note: each undeclared identifier is reported only once for each function it appears in
     253 |         if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
         |                                  ^~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:76:45: note: in definition of macro 'likely'
      76 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   arch/x86/include/asm/vdso/gettimeofday.h: In function 'vdso_calc_ns':
>> arch/x86/include/asm/vdso/gettimeofday.h:334:32: error: 'const struct vdso_data' has no member named 'max_cycles'
     334 |         if (unlikely(delta > vd->max_cycles)) {
         |                                ^~
   include/linux/compiler.h:77:45: note: in definition of macro 'unlikely'
      77 | # define unlikely(x)    __builtin_expect(!!(x), 0)
         |                                             ^

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=n] || GCC_PLUGINS [=y]) && MODULES [=y]
   WARNING: unmet direct dependencies detected for GET_FREE_REGION
   Depends on [n]: SPARSEMEM [=n]
   Selected by [y]:
   - RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]


vim +/rep_nop +11 arch/x86/include/asm/vdso/processor.h

abc22418db02b9 Vincenzo Frascino 2020-03-20   9  
abc22418db02b9 Vincenzo Frascino 2020-03-20  10  /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
abc22418db02b9 Vincenzo Frascino 2020-03-20 @11  static __always_inline void rep_nop(void)
abc22418db02b9 Vincenzo Frascino 2020-03-20  12  {
abc22418db02b9 Vincenzo Frascino 2020-03-20  13  	asm volatile("rep; nop" ::: "memory");
abc22418db02b9 Vincenzo Frascino 2020-03-20  14  }
abc22418db02b9 Vincenzo Frascino 2020-03-20  15  
abc22418db02b9 Vincenzo Frascino 2020-03-20 @16  static __always_inline void cpu_relax(void)
abc22418db02b9 Vincenzo Frascino 2020-03-20  17  {
abc22418db02b9 Vincenzo Frascino 2020-03-20  18  	rep_nop();
abc22418db02b9 Vincenzo Frascino 2020-03-20  19  }
abc22418db02b9 Vincenzo Frascino 2020-03-20  20

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 06b080b61aa5..a8cffbb19cf2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1412,6 +1412,8 @@  static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 #define LOAD_TAIL_CALL_CNT_PTR(stack)				\
 	__LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
 
+u64 bpf_get_hw_counter(void);
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
@@ -2126,6 +2128,26 @@  st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_CALL: {
 			u8 *ip = image + addrs[i - 1];
 
+			if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+			    imm32 == BPF_CALL_IMM(bpf_get_hw_counter)) {
+				/* Save RDX because RDTSC will use EDX:EAX to return u64 */
+				emit_mov_reg(&prog, true, AUX_REG, BPF_REG_3);
+				if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC))
+					EMIT_LFENCE();
+				EMIT2(0x0F, 0x31);
+
+				/* shl RDX, 32 */
+				maybe_emit_1mod(&prog, BPF_REG_3, true);
+				EMIT3(0xC1, add_1reg(0xE0, BPF_REG_3), 32);
+				/* or RAX, RDX */
+				maybe_emit_mod(&prog, BPF_REG_0, BPF_REG_3, true);
+				EMIT2(0x09, add_2reg(0xC0, BPF_REG_0, BPF_REG_3));
+				/* restore RDX from R11 */
+				emit_mov_reg(&prog, true, BPF_REG_3, AUX_REG);
+
+				break;
+			}
+
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
 				LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
@@ -3652,3 +3674,11 @@  u64 bpf_arch_uaddress_limit(void)
 {
 	return 0;
 }
+
+/* x86-64 JIT can inline kfunc */
+bool bpf_jit_inlines_helper_call(s32 imm)
+{
+	if (imm == BPF_CALL_IMM(bpf_get_hw_counter))
+		return true;
+	return false;
+}
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index de0f9e5f9f73..66525cb1892c 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -1656,6 +1656,8 @@  static int emit_kfunc_call(const struct bpf_prog *bpf_prog, u8 *end_addr,
 	return 0;
 }
 
+u64 bpf_get_hw_counter(void);
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		  int oldproglen, struct jit_context *ctx)
 {
@@ -2094,6 +2096,13 @@  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 				int err;
 
+				if (imm32 == BPF_CALL_IMM(bpf_get_hw_counter)) {
+					if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC))
+						EMIT3(0x0F, 0xAE, 0xE8);
+					EMIT2(0x0F, 0x31);
+					break;
+				}
+
 				err = emit_kfunc_call(bpf_prog,
 						      image + addrs[i],
 						      insn, &prog);
@@ -2621,3 +2630,10 @@  bool bpf_jit_supports_kfunc_call(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_helper_call(s32 imm)
+{
+	if (imm == BPF_CALL_IMM(bpf_get_hw_counter))
+		return true;
+	return false;
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7d7578a8eac1..8bdd5e6b2a65 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1111,6 +1111,7 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
 bool bpf_jit_inlines_helper_call(s32 imm);
+bool bpf_jit_inlines_kfunc_call(s32 imm);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_percpu_insn(void);
 bool bpf_jit_supports_kfunc_call(void);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 233ea78f8f1b..ab6a2452ade0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2965,6 +2965,17 @@  bool __weak bpf_jit_inlines_helper_call(s32 imm)
 	return false;
 }
 
+/* Return true if the JIT inlines the call to the kfunc corresponding to
+ * the imm.
+ *
+ * The verifier will not patch the insn->imm for the call to the helper if
+ * this returns true.
+ */
+bool __weak bpf_jit_inlines_kfunc_call(s32 imm)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool __weak bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5c3fdb29c1b1..f7bf3debbcc4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -23,6 +23,7 @@ 
 #include <linux/btf_ids.h>
 #include <linux/bpf_mem_alloc.h>
 #include <linux/kasan.h>
+#include <vdso/datapage.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -3023,6 +3024,11 @@  __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
 	return ret + 1;
 }
 
+__bpf_kfunc u64 bpf_get_hw_counter(void)
+{
+	return __arch_get_hw_counter(1, NULL);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -3112,6 +3118,7 @@  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_get_kmem_cache)
+BTF_ID_FLAGS(func, bpf_get_hw_counter, KF_FASTCALL)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f514247ba8ba..428e7b84bb02 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11326,6 +11326,7 @@  BTF_ID(func, bpf_session_cookie)
 BTF_ID_UNUSED
 #endif
 BTF_ID(func, bpf_get_kmem_cache)
+BTF_ID(func, bpf_get_hw_counter)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -16291,7 +16292,8 @@  static void mark_fastcall_pattern_for_call(struct bpf_verifier_env *env,
 			return;
 
 		clobbered_regs_mask = kfunc_fastcall_clobber_mask(&meta);
-		can_be_inlined = is_fastcall_kfunc_call(&meta);
+		can_be_inlined = is_fastcall_kfunc_call(&meta) && !call->off &&
+				 bpf_jit_inlines_kfunc_call(call->imm);
 	}
 
 	if (clobbered_regs_mask == ALL_CALLER_SAVED_REGS)