[bpf-next] bpf: Optimize emit_mov_imm64().

Message ID: 20240401233800.42737-1-alexei.starovoitov@gmail.com (mailing list archive)
State: Changes Requested
Delegated to: BPF
Series: [bpf-next] bpf: Optimize emit_mov_imm64().

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 955 this patch: 955
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 16 maintainers not CCed: john.fastabend@gmail.com kpsingh@kernel.org mingo@redhat.com martin.lau@linux.dev tglx@linutronix.de dsahern@kernel.org sdf@google.com bp@alien8.de netdev@vger.kernel.org x86@kernel.org yonghong.song@linux.dev dave.hansen@linux.intel.com hpa@zytor.com haoluo@google.com jolsa@kernel.org song@kernel.org
netdev/build_clang success Errors and warnings before: 955 this patch: 955
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 966 this patch: 966
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 41 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc

Commit Message

Alexei Starovoitov April 1, 2024, 11:38 p.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

It turns out that bpf prog callback addresses, bpf prog addresses
used in bpf_trampoline, and 64-bit addresses in other cases can
be represented as sign-extended 32-bit values.
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339,
"Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for mov r32, imm32."
So use the shorter encoding and the faster instruction when possible.
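
For illustration, a minimal sketch of the three cases the new
emit_mov_imm64() distinguishes (not part of the patch; byte counts
assume dst_reg maps to rax, as BPF_REG_0 does in the x86 JIT):

	/* hi:lo fits in u32 -> 5-byte 'mov %eax, imm32' (zero-extends) */
	emit_mov_imm64(&prog, BPF_REG_0, 0, 0x12345678);

	/* hi:lo is a sign-extended s32, e.g. 0xffffffff80001000 ->
	 * new 7-byte 'mov %rax, imm32' with REX.W (sign-extends)
	 */
	emit_mov_imm64(&prog, BPF_REG_0, 0xffffffff, 0x80001000);

	/* anything else -> 10-byte 'movabs %rax, imm64' */
	emit_mov_imm64(&prog, BPF_REG_0, 0x12345678, 0x9abcdef0);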

Special care is needed in jit_subprogs(), since a bpf_pseudo_func()
instruction cannot change its size during the last step of JIT.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c |  5 ++++-
 kernel/bpf/verifier.c       | 13 ++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

Comments

Daniel Borkmann April 2, 2024, 3:48 p.m. UTC | #1
On 4/2/24 1:38 AM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> It turns out that bpf prog callback addresses, bpf prog addresses
> used in bpf_trampoline, and 64-bit addresses in other cases can
> be represented as sign-extended 32-bit values.
> According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339,
> "Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for mov r32, imm32."
> So use the shorter encoding and the faster instruction when possible.
> 
> Special care is needed in jit_subprogs(), since a bpf_pseudo_func()
> instruction cannot change its size during the last step of JIT.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>   arch/x86/net/bpf_jit_comp.c |  5 ++++-
>   kernel/bpf/verifier.c       | 13 ++++++++++---
>   2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 3b639d6f2f54..47abddac6dc3 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -816,9 +816,10 @@ static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
>   static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
>   			   const u32 imm32_hi, const u32 imm32_lo)
>   {
> +	u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
>   	u8 *prog = *pprog;
>   
> -	if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
> +	if (is_uimm32(imm64)) {
>   		/*
>   		 * For emitting plain u32, where sign bit must not be
>   		 * propagated LLVM tends to load imm64 over mov32
> @@ -826,6 +827,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
>   		 * 'mov %eax, imm32' instead.
>   		 */
>   		emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
> +	} else if (is_simm32(imm64)) {
> +		emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
>   	} else {
>   		/* movabsq rax, imm64 */
>   		EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index edb650667f44..d4a338e7b5e7 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19145,12 +19145,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>   		env->insn_aux_data[i].call_imm = insn->imm;
>   		/* point imm to __bpf_call_base+1 from JITs point of view */
>   		insn->imm = 1;
> -		if (bpf_pseudo_func(insn))
> +		if (bpf_pseudo_func(insn)) {
> +#if defined(MODULES_VADDR)
> +			u64 addr = MODULES_VADDR;
> +#else
> +			u64 addr = VMALLOC_START;
> +#endif

Is this beneficial for all archs? It seems this patch is mainly targeting x86.
Why not have a weak function like u64 bpf_jit_alloc_exec_start() that returns
MODULES_VADDR for x86 but leaves the rest as-is?

For example, arm64 has MODULES_VADDR defined, but its allocator uses the
vmalloc range instead (see bpf_jit_alloc_exec() there), so this is a different
pool, and it's also not clear whether this is better or worse wrt its imm
encoding.
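
A quick sketch of that suggestion (hypothetical: the helper's name and
semantics exist only in this review):

	/* kernel/bpf/core.c: default for JITs that allocate images
	 * from the vmalloc area
	 */
	u64 __weak bpf_jit_alloc_exec_start(void)
	{
		return VMALLOC_START;
	}

	/* arch/x86/net/bpf_jit_comp.c: x86 allocates from the module area */
	u64 bpf_jit_alloc_exec_start(void)
	{
		return MODULES_VADDR;
	}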

>   			/* jit (e.g. x86_64) may emit fewer instructions
>   			 * if it learns a u32 imm is the same as a u64 imm.
> -			 * Force a non zero here.
> +			 * Set close enough to possible prog address.
>   			 */
> -			insn[1].imm = 1;
> +			insn[0].imm = (u32)addr;
> +			insn[1].imm = addr >> 32;
> +		}
>   	}
>   
>   	err = bpf_prog_alloc_jited_linfo(prog);
>
Alexei Starovoitov April 3, 2024, 2:34 a.m. UTC | #2
On Tue, Apr 2, 2024 at 8:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/2/24 1:38 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > It turns out that bpf prog callback addresses, bpf prog addresses
> > used in bpf_trampoline, and 64-bit addresses in other cases can
> > be represented as sign-extended 32-bit values.
> > According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339,
> > "Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for mov r32, imm32."
> > So use the shorter encoding and the faster instruction when possible.
> >
> > Special care is needed in jit_subprogs(), since a bpf_pseudo_func()
> > instruction cannot change its size during the last step of JIT.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >   arch/x86/net/bpf_jit_comp.c |  5 ++++-
> >   kernel/bpf/verifier.c       | 13 ++++++++++---
> >   2 files changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 3b639d6f2f54..47abddac6dc3 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -816,9 +816,10 @@ static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
> >   static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
> >                          const u32 imm32_hi, const u32 imm32_lo)
> >   {
> > +     u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
> >       u8 *prog = *pprog;
> >
> > -     if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
> > +     if (is_uimm32(imm64)) {
> >               /*
> >                * For emitting plain u32, where sign bit must not be
> >                * propagated LLVM tends to load imm64 over mov32
> > @@ -826,6 +827,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
> >                * 'mov %eax, imm32' instead.
> >                */
> >               emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
> > +     } else if (is_simm32(imm64)) {
> > +             emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
> >       } else {
> >               /* movabsq rax, imm64 */
> >               EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index edb650667f44..d4a338e7b5e7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -19145,12 +19145,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> >               env->insn_aux_data[i].call_imm = insn->imm;
> >               /* point imm to __bpf_call_base+1 from JITs point of view */
> >               insn->imm = 1;
> > -             if (bpf_pseudo_func(insn))
> > +             if (bpf_pseudo_func(insn)) {
> > +#if defined(MODULES_VADDR)
> > +                     u64 addr = MODULES_VADDR;
> > +#else
> > +                     u64 addr = VMALLOC_START;
> > +#endif
>
> Is this beneficial for all archs? It seems this patch is mainly targeting x86.
> Why not have a weak function like u64 bpf_jit_alloc_exec_start() that returns
> MODULES_VADDR for x86 but leaves the rest as-is?
>
> For example, arm64 has MODULES_VADDR defined, but its allocator uses the
> vmalloc range instead (see bpf_jit_alloc_exec() there), so this is a different
> pool, and it's also not clear whether this is better or worse wrt its imm
> encoding.

This part makes no difference for any JIT except x86.
Back when commit 3990ed4c4266 ("bpf: Stop caching subprog index in the
bpf_pseudo_func insn") added the comment below ("jit (e.g. x86_64) may
emit fewer instructions"), pseudo_funcs were supported only by x86,
and only the x86 JIT has this behavior.
Since then other JITs have added support for pseudo_funcs, but none
of them relies on this part of the verifier.
So the comment still applies to x86 only (afaics).
s390, riscv, and arm64 went a different way: on "if (bpf_pseudo_func(insn))"
they process ld_imm64 differently regardless of the values of
insn[0].imm and insn[1].imm.
I think that's a bit wrong.
I considered removing this if (bpf_pseudo_func(insn)) from verifier.c
and doing a similar hack in the x86 JIT, but decided against that.
The previous insn[1].imm = 1 was a hack targeted at x86.
It served its purpose for 3 years.
A hack, but imo cleaner than if (bpf_pseudo_func(insn)) in JITs.
Since I'm making emit_mov_imm64() smarter, this part of verifier.c
needs to be a bit more accurate about the value it represents.
MODULES_VADDR vs VMALLOC_START doesn't make a difference: it's a
kernel text address. It could be (long)&_text, fwiw.
I believe all JITs could eventually generalize the
if (bpf_pseudo_func(insn)) check into if (kernel_addr(imm64)),
but that's a follow-up for somebody.
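
A sketch of what that generalized check could look like (kernel_addr()
is hypothetical; it exists only in this discussion):

	/* hypothetical: key off the immediate itself rather than
	 * bpf_pseudo_func(): anything at or above the start of the
	 * kernel/module mapping is treated as a kernel text address
	 */
	static bool kernel_addr(u64 imm64)
	{
	#if defined(MODULES_VADDR)
		return imm64 >= MODULES_VADDR;
	#else
		return imm64 >= VMALLOC_START;
	#endif
	}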

A weak helper bpf_jit_alloc_exec_start() is certainly overkill.
A pseudo_func callback doesn't have to be a JITed bpf prog;
it's the address of a function.
If there is ever an arch where kernel and JITed code need different
insns to represent an address, we will tackle that issue then.

Notice that we have similar #if defined(MODULES_VADDR) logic in
bpf_jit_alloc_exec_limit(), added 6 years ago, and it's still fine.
No need to over-design this one either.
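
For reference, that existing helper looks roughly like this (paraphrased
from kernel/bpf/core.c; the exact body may differ):

	static unsigned long bpf_jit_alloc_exec_limit(void)
	{
	#if defined(MODULES_VADDR)
		return MODULES_END - MODULES_VADDR;
	#else
		return VMALLOC_END - VMALLOC_START;
	#endif
	}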

>
> >                       /* jit (e.g. x86_64) may emit fewer instructions
> >                        * if it learns a u32 imm is the same as a u64 imm.
> > -                      * Force a non zero here.
> > +                      * Set close enough to possible prog address.
> >                        */
> > -                     insn[1].imm = 1;
> > +                     insn[0].imm = (u32)addr;
> > +                     insn[1].imm = addr >> 32;
> > +             }

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3b639d6f2f54..47abddac6dc3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -816,9 +816,10 @@  static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
 static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
 			   const u32 imm32_hi, const u32 imm32_lo)
 {
+	u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
 	u8 *prog = *pprog;
 
-	if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
+	if (is_uimm32(imm64)) {
 		/*
 		 * For emitting plain u32, where sign bit must not be
 		 * propagated LLVM tends to load imm64 over mov32
@@ -826,6 +827,8 @@  static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
 		 * 'mov %eax, imm32' instead.
 		 */
 		emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
+	} else if (is_simm32(imm64)) {
+		emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
 	} else {
 		/* movabsq rax, imm64 */
 		EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index edb650667f44..d4a338e7b5e7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19145,12 +19145,19 @@  static int jit_subprogs(struct bpf_verifier_env *env)
 		env->insn_aux_data[i].call_imm = insn->imm;
 		/* point imm to __bpf_call_base+1 from JITs point of view */
 		insn->imm = 1;
-		if (bpf_pseudo_func(insn))
+		if (bpf_pseudo_func(insn)) {
+#if defined(MODULES_VADDR)
+			u64 addr = MODULES_VADDR;
+#else
+			u64 addr = VMALLOC_START;
+#endif
 			/* jit (e.g. x86_64) may emit fewer instructions
 			 * if it learns a u32 imm is the same as a u64 imm.
-			 * Force a non zero here.
+			 * Set close enough to possible prog address.
 			 */
-			insn[1].imm = 1;
+			insn[0].imm = (u32)addr;
+			insn[1].imm = addr >> 32;
+		}
 	}
 
 	err = bpf_prog_alloc_jited_linfo(prog);