Message ID: 20220321152852.2334294-1-xukuohai@huawei.com (mailing list archive)
Series: bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate)
On 2022/3/21 23:28, Xu Kuohai wrote:
> The current BPF store/load instruction is translated by the JIT into two
> instructions. The first instruction moves the immediate offset into a
> temporary register. The second instruction uses this temporary register
> to do the real store/load.
>
> In fact, arm64 supports addressing with immediate offsets. So this series
> introduces an optimization that uses the arm64 str/ldr instruction with an
> immediate offset when the offset fits.
>
> Example of the generated instruction for r2 = *(u64 *)(r1 + 0):
>
> Without optimization:
> mov x10, 0
> ldr x1, [x0, x10]
>
> With optimization:
> ldr x1, [x0, 0]
>
> For the following bpftrace command:
>
> bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
>
> Without this series, jited code (fragment):
>
>    0: bti  c
>    4: stp  x29, x30, [sp, #-16]!
>    8: mov  x29, sp
>    c: stp  x19, x20, [sp, #-16]!
>   10: stp  x21, x22, [sp, #-16]!
>   14: stp  x25, x26, [sp, #-16]!
>   18: mov  x25, sp
>   1c: mov  x26, #0x0                   // #0
>   20: bti  j
>   24: sub  sp, sp, #0x90
>   28: add  x19, x0, #0x0
>   2c: mov  x0, #0x0                    // #0
>   30: mov  x10, #0xffffffffffffff78    // #-136
>   34: str  x0, [x25, x10]
>   38: mov  x10, #0xffffffffffffff80    // #-128
>   3c: str  x0, [x25, x10]
>   40: mov  x10, #0xffffffffffffff88    // #-120
>   44: str  x0, [x25, x10]
>   48: mov  x10, #0xffffffffffffff90    // #-112
>   4c: str  x0, [x25, x10]
>   50: mov  x10, #0xffffffffffffff98    // #-104
>   54: str  x0, [x25, x10]
>   58: mov  x10, #0xffffffffffffffa0    // #-96
>   5c: str  x0, [x25, x10]
>   60: mov  x10, #0xffffffffffffffa8    // #-88
>   64: str  x0, [x25, x10]
>   68: mov  x10, #0xffffffffffffffb0    // #-80
>   6c: str  x0, [x25, x10]
>   70: mov  x10, #0xffffffffffffffb8    // #-72
>   74: str  x0, [x25, x10]
>   78: mov  x10, #0xffffffffffffffc0    // #-64
>   7c: str  x0, [x25, x10]
>   80: mov  x10, #0xffffffffffffffc8    // #-56
>   84: str  x0, [x25, x10]
>   88: mov  x10, #0xffffffffffffffd0    // #-48
>   8c: str  x0, [x25, x10]
>   90: mov  x10, #0xffffffffffffffd8    // #-40
>   94: str  x0, [x25, x10]
>   98: mov  x10, #0xffffffffffffffe0    // #-32
>   9c: str  x0, [x25, x10]
>   a0: mov  x10, #0xffffffffffffffe8    // #-24
>   a4: str  x0, [x25, x10]
>   a8: mov  x10, #0xfffffffffffffff0    // #-16
>   ac: str  x0, [x25, x10]
>   b0: mov  x10, #0xfffffffffffffff8    // #-8
>   b4: str  x0, [x25, x10]
>   b8: mov  x10, #0x8                   // #8
>   bc: ldr  x2, [x19, x10]
>   [...]
>
> With this series, jited code (fragment):
>
>    0: bti  c
>    4: stp  x29, x30, [sp, #-16]!
>    8: mov  x29, sp
>    c: stp  x19, x20, [sp, #-16]!
>   10: stp  x21, x22, [sp, #-16]!
>   14: stp  x25, x26, [sp, #-16]!
>   18: stp  x27, x28, [sp, #-16]!
>   1c: mov  x25, sp
>   20: sub  x27, x25, #0x88
>   24: mov  x26, #0x0                   // #0
>   28: bti  j
>   2c: sub  sp, sp, #0x90
>   30: add  x19, x0, #0x0
>   34: mov  x0, #0x0                    // #0
>   38: str  x0, [x27]
>   3c: str  x0, [x27, #8]
>   40: str  x0, [x27, #16]
>   44: str  x0, [x27, #24]
>   48: str  x0, [x27, #32]
>   4c: str  x0, [x27, #40]
>   50: str  x0, [x27, #48]
>   54: str  x0, [x27, #56]
>   58: str  x0, [x27, #64]
>   5c: str  x0, [x27, #72]
>   60: str  x0, [x27, #80]
>   64: str  x0, [x27, #88]
>   68: str  x0, [x27, #96]
>   6c: str  x0, [x27, #104]
>   70: str  x0, [x27, #112]
>   74: str  x0, [x27, #120]
>   78: str  x0, [x27, #128]
>   7c: ldr  x2, [x19, #8]
>   [...]
>
> Tested with test_bpf on both big-endian and little-endian arm64 qemu:
>
> test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
> test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
> test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
>
> v4 -> v5:
> 1. Fix the incorrect FP offset in the tail call scenario pointed out by
>    Daniel, and add a tail call test case for this issue
> 2. Align fpb_offset down to 8 bytes to avoid unaligned offsets
> 3. Style and spelling fixes
>
> v3 -> v4:
> 1. Fix compile error reported by the kernel test robot
> 2. Add one more test case for load/store at different offsets, and move
>    the test cases to the last patch
> 3. Fix some obvious bugs
>
> v2 -> v3:
> 1. Split the v2 patch into 2 patches, one for the arm64 instruction
>    encoder, the other for the BPF JIT
> 2. Add tests for BPF_LDX/BPF_STX with different offsets
> 3. Adjust the offset of str/ldr(immediate) to a positive number
>
> v1 -> v2:
> 1. Remove a macro definition that causes checkpatch to fail
> 2. Append results to the commit message
>
> Xu Kuohai (5):
>   arm64: insn: add ldr/str with immediate offset
>   bpf, arm64: Optimize BPF store/load using str/ldr with immediate
>     offset
>   bpf, arm64: adjust the offset of str/ldr(immediate) to positive number
>   bpf/tests: Add tests for BPF_LDX/BPF_STX with different offsets
>   bpf, arm64: add load store test case for tail call
>
>  arch/arm64/include/asm/insn.h |   9 +
>  arch/arm64/lib/insn.c         |  67 ++++++--
>  arch/arm64/net/bpf_jit.h      |  14 ++
>  arch/arm64/net/bpf_jit_comp.c | 243 ++++++++++++++++++++++++--
>  lib/test_bpf.c                | 315 +++++++++++++++++++++++++++++++++-
>  5 files changed, 613 insertions(+), 35 deletions(-)

ping ;)
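The "when the offset fits" condition in the cover letter comes down to the two immediate addressing forms the A64 ISA offers: ldur/stur take a signed 9-bit byte offset, while ldr/str (unsigned offset) take a 12-bit field scaled by the access size. The following sketch is only an illustration of that encoding check, not the kernel's actual JIT code; `fits_immediate` is a hypothetical helper:

```python
def fits_immediate(offset: int, size_log2: int) -> bool:
    """Return True if `offset` can be encoded directly in an arm64
    load/store, so no temporary register move is needed.

    Per the A64 ISA there are two immediate forms:
      - ldur/stur: signed 9-bit byte offset, -256..255
      - ldr/str (unsigned offset): 12-bit field scaled by the access
        size (1 << size_log2 bytes), i.e. a non-negative multiple of
        the access size up to 4095 * size
    """
    if -256 <= offset <= 255:
        return True  # fits the signed 9-bit ldur/stur form
    scale = 1 << size_log2
    return offset >= 0 and offset % scale == 0 and offset // scale <= 4095
```

For the u64 load in the dump above, an offset of 8 with an 8-byte access passes this check, which is why the optimized code can emit `ldr x2, [x19, #8]` directly instead of materializing the offset in x10 first.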
Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Mon, 21 Mar 2022 11:28:47 -0400 you wrote:
> The current BPF store/load instruction is translated by the JIT into two
> instructions. The first instruction moves the immediate offset into a
> temporary register. The second instruction uses this temporary register
> to do the real store/load.
>
> In fact, arm64 supports addressing with immediate offsets. So this series
> introduces an optimization that uses the arm64 str/ldr instruction with an
> immediate offset when the offset fits.
>
> [...]

Here is the summary with links:
  - [bpf-next,v5,1/5] arm64: insn: add ldr/str with immediate offset
    https://git.kernel.org/bpf/bpf-next/c/30c90f6757a7
  - [bpf-next,v5,2/5] bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate offset)
    https://git.kernel.org/bpf/bpf-next/c/7db6c0f1d8ee
  - [bpf-next,v5,3/5] bpf, arm64: adjust the offset of str/ldr(immediate) to positive number
    https://git.kernel.org/bpf/bpf-next/c/5b3d19b9bd40
  - [bpf-next,v5,4/5] bpf/tests: Add tests for BPF_LDX/BPF_STX with different offsets
    https://git.kernel.org/bpf/bpf-next/c/f516420f683d
  - [bpf-next,v5,5/5] bpf, arm64: add load store test case for tail call
    https://git.kernel.org/bpf/bpf-next/c/38608ee7b690

You are awesome, thank you!