
[bpf] arm32, bpf: Fix sign-extension mov instruction

Message ID: 20240409095038.26356-1-puranjay@kernel.org (mailing list archive)
State: Changes Requested
Delegated to: BPF
Series: [bpf] arm32, bpf: Fix sign-extension mov instruction

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 17 of 17 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 98 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc

Commit Message

Puranjay Mohan April 9, 2024, 9:50 a.m. UTC
The current implementation of the mov instruction with sign extension
clobbers the source register because it sign extends the source and then
moves it to the destination.

Fix this by moving the src to a temporary register before doing the sign
extension only if src is not an emulated register (on the scratch stack).

Also fix the emit_a32_movsx_r64() to put the register back on scratch
stack if that register is emulated on stack.
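
As a rough illustration (assuming the source is one of the BPF registers the
arm32 JIT keeps in ARM core registers rather than on the scratch stack), a
32-bit sign-extending move such as w1 = (s8)w0 is currently lowered to
in-place shifts on the source register:

	lsl	src_lo, src_lo, #24	@ 32 - off, with off = 8 for (s8)
	asr	src_lo, src_lo, #24	@ source low word is now clobbered

before the result is written to the destination. With this patch, a
non-stacked source is first copied into a scratch register from TMP_REG_1
(shown symbolically as tmp) and the shifts operate on the copy:

	mov	tmp, src_lo
	lsl	tmp, tmp, #24
	asr	tmp, tmp, #24		@ tmp is written to dst; src is preserved

The exact ARM registers depend on the JIT's register mapping.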

Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/arm/net/bpf_jit_32.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Russell King (Oracle) April 9, 2024, 3:40 p.m. UTC | #1
On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote:
> The current implementation of the mov instruction with sign extension
> clobbers the source register because it sign extends the source and then
> moves it to the destination.
> 
> Fix this by moving the src to a temporary register before doing the sign
> extension only if src is not an emulated register (on the scratch stack).
> 
> Also fix the emit_a32_movsx_r64() to put the register back on scratch
> stack if that register is emulated on stack.

It would be good to include in the commit message an example or two of
the resulting assembly code so that it's clear what the expected
generation is. Instead, I'm going to have to work it out myself, but
I'm quite sure this is information you already have.

> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
> Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  arch/arm/net/bpf_jit_32.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index 1d672457d02f..8fde6ab66cb4 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
>  
>  	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
>  	if (off && off != 32) {
> +		/* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by
> +		 * the shift operations.
> +		 */
> +		if (rt == src) {
> +			emit(ARM_MOV_R(tmp[0], rt), ctx);
> +			rt = tmp[0];
> +		}

This change is adding inefficiency, don't we want to have the JIT
creating as efficient code as possible within the bounds of
reasonableness?

>  		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
>  		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);

LSL and ASR can very easily take a different source register to the
destination register. All this needs to be is:

		emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx);
		emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx);
		rt = tmp[0];

This will generate:

		lsl	tmp[0], src, #32-off
		asr	tmp[0], tmp[0], #32-off

and then the store to the output register will occur.

What about the high-32 bits of the register pair - should that be
taking any value?

>  	}

I notice in passing that the comments are out of sync with the
code - please update the comments along with code changes.

> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
>  	const s8 *tmp = bpf2a32[TMP_REG_1];
>  	const s8 *rt;
>  
> -	rt = arm_bpf_get_reg64(dst, tmp, ctx);
> -
>  	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
>  	if (!is64) {
>  		if (!ctx->prog->aux->verifier_zext)
>  			/* Zero out high 4 bytes */
>  			emit_a32_mov_i(dst_hi, 0, ctx);
>  	} else {
> +		rt = arm_bpf_get_reg64(dst, tmp, ctx);
>  		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
> +		arm_bpf_put_reg64(dst, rt, ctx);
>  	}
>  }

Why oh why oh why are we emitting code to read the source register
(which may be a load), then write it to the destination (which may
be a store) to only then immediately reload from the destination
to then do the sign extension? This is madness.

Please... apply some thought to the code generation from the JIT...
or I will remove you from being a maintainer of this code. I spent
time crafting some parts of the JIT to generate efficient code and
I'm seeing that a lot of that work is now being thrown away by
someone who seemingly doesn't care about generating "good" code.

Why not read the source 32-bit register (potentially into a temporary
register), store it to the destination low register, then do the
sign extension into the destination high register or zero the high
register. We _could_ be a bit more optimal here by checking whether
dst_hi is a stacked register and use that directly for the ASR
instruction, omitting the need to move it there afterwards - whether
that's worth it or not depends on the performance we expect from this
eBPF opcode.

	rt = arm_bpf_get_reg32(src_lo, tmp[1], ctx);
	/* rt may be either src[1] or tmp[1] */

	/* write dst_lo */
	arm_bpf_put_reg32(dst_lo, rt, ctx);

	if (is64) {
		emit(ARM_ASR_I(tmp[0], rt, 31), ctx);
		arm_bpf_put_reg32(dst_hi, tmp[0], ctx);
	} else if (!ctx->prog->aux->verifier_zext) {
		emit_a32_mov_i(dst_hi, 0, ctx);
	}
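
Assuming both the source and the destination are emulated on the scratch
stack (frame-pointer offsets shown symbolically), the 64-bit case of this
sketch would expand to something like:

	ldr	tmp1, [fp, #src_lo]	@ load the source low word
	str	tmp1, [fp, #dst_lo]	@ store it as the destination low word
	asr	tmp0, tmp1, #31		@ replicate the sign bit into the high word
	str	tmp0, [fp, #dst_hi]	@ store it as the destination high word

with no store to and reload from the destination just to do the sign
extension.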
Puranjay Mohan April 19, 2024, 6:52 p.m. UTC | #2
"Russell King (Oracle)" <linux@armlinux.org.uk> writes:

> On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote:
>> The current implementation of the mov instruction with sign extension
>> clobbers the source register because it sign extends the source and then
>> moves it to the destination.
>> 
>> Fix this by moving the src to a temporary register before doing the sign
>> extension only if src is not an emulated register (on the scratch stack).
>> 
>> Also fix the emit_a32_movsx_r64() to put the register back on scratch
>> stack if that register is emulated on stack.
>
> It would be good to include in the commit message an example or two of
> the resulting assembly code so that it's clear what the expected
> generation is. Instead, I'm going to have to work it out myself, but
> I'm quite sure this is information you already have.
>
>> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
>> Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> ---
>>  arch/arm/net/bpf_jit_32.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 1d672457d02f..8fde6ab66cb4 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
>>  
>>  	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
>>  	if (off && off != 32) {
>> +		/* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by
>> +		 * the shift operations.
>> +		 */
>> +		if (rt == src) {
>> +			emit(ARM_MOV_R(tmp[0], rt), ctx);
>> +			rt = tmp[0];
>> +		}
>
> This change is adding inefficiency, don't we want to have the JIT
> creating as efficient code as possible within the bounds of
> reasonableness?
>
>>  		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
>>  		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
>
> LSL and ASR can very easily take a different source register to the
> destination register. All this needs to be is:
>
> 		emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx);
> 		emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx);
> 		rt = tmp[0];
>
> This will generate:
>
> 		lsl	tmp[0], src, #32-off
> 		asr	tmp[0], tmp[0], #32-off
>
> and then the store to the output register will occur.
>
> What about the high-32 bits of the register pair - should that be
> taking any value?
>
>>  	}
>
> I notice in passing that the comments are out of sync with the
> code - please update the comments along with code changes.
>
>> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
>>  	const s8 *tmp = bpf2a32[TMP_REG_1];
>>  	const s8 *rt;
>>  
>> -	rt = arm_bpf_get_reg64(dst, tmp, ctx);
>> -
>>  	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
>>  	if (!is64) {
>>  		if (!ctx->prog->aux->verifier_zext)
>>  			/* Zero out high 4 bytes */
>>  			emit_a32_mov_i(dst_hi, 0, ctx);
>>  	} else {
>> +		rt = arm_bpf_get_reg64(dst, tmp, ctx);
>>  		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
>> +		arm_bpf_put_reg64(dst, rt, ctx);
>>  	}
>>  }
>
> Why oh why oh why are we emitting code to read the source register
> (which may be a load), then write it to the destination (which may
> be a store) to only then immediately reload from the destination
> to then do the sign extension? This is madness.
>
> Please... apply some thought to the code generation from the JIT...
> or I will remove you from being a maintainer of this code. I spent
> time crafting some parts of the JIT to generate efficient code and
> I'm seeing that a lot of that work is now being thrown away by
> someone who seemingly doesn't care about generating "good" code.
>

Sorry for this; I also like to make sure the JITs are as efficient as
possible. I was too focused on fixing this as fast as possible and
didn't pay attention that day.

I have reimplemented the whole thing again to make sure all bugs are
fixed. The commit message has the generated assembly for all cases:

https://lore.kernel.org/all/20240419182832.27707-1-puranjay@kernel.org/

Thanks,
Puranjay

Patch

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 1d672457d02f..8fde6ab66cb4 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -878,6 +878,13 @@  static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
 
 	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
 	if (off && off != 32) {
+		/* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by
+		 * the shift operations.
+		 */
+		if (rt == src) {
+			emit(ARM_MOV_R(tmp[0], rt), ctx);
+			rt = tmp[0];
+		}
 		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
 		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
 	}
@@ -919,15 +926,15 @@  static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
 	const s8 *tmp = bpf2a32[TMP_REG_1];
 	const s8 *rt;
 
-	rt = arm_bpf_get_reg64(dst, tmp, ctx);
-
 	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
 	if (!is64) {
 		if (!ctx->prog->aux->verifier_zext)
 			/* Zero out high 4 bytes */
 			emit_a32_mov_i(dst_hi, 0, ctx);
 	} else {
+		rt = arm_bpf_get_reg64(dst, tmp, ctx);
 		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
+		arm_bpf_put_reg64(dst, rt, ctx);
 	}
 }