From patchwork Mon Mar 21 15:28:47 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787344
From: Xu Kuohai
CC: Catalin Marinas, Will Deacon, Daniel Borkmann, Alexei Starovoitov,
    Zi Shen Lim, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
    Yonghong Song, John Fastabend, KP Singh, Julien Thierry,
    Mark Rutland, Hou Tao, Fuad Tabba, James Morse
Subject: [PATCH -next v5 0/5] bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate)
Date: Mon, 21 Mar 2022 11:28:47 -0400
Message-ID: <20220321152852.2334294-1-xukuohai@huawei.com>

Currently, the arm64 JIT translates each BPF store/load instruction into
two instructions. The first moves the immediate offset into a temporary
register. The second uses that temporary register to do the actual
store/load.

However, arm64 supports addressing with immediate offsets, so this series
introduces an optimization that uses the arm64 str/ldr instruction with
an immediate offset when the offset fits.
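
As a rough illustration of what "the offset fits" means, here is a minimal
sketch that assumes the unsigned-offset form of ldr/str (immediate), whose
12-bit immediate is scaled by the access size. The helper name and
signature are hypothetical and are not taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * off:   byte offset from the base register
 * scale: log2 of the access size in bytes (0 = byte ... 3 = doubleword)
 *
 * Returns true if off can be encoded as the scaled, unsigned 12-bit
 * immediate of ldr/str (immediate, unsigned offset).
 */
static bool fits_ldst_uimm12(int32_t off, unsigned int scale)
{
	assert(scale <= 3);

	/* The unsigned-offset form cannot express negative offsets. */
	if (off < 0)
		return false;

	/* The offset must be a multiple of the access size. */
	if (off & ((1 << scale) - 1))
		return false;

	/* After scaling, the value must fit in 12 bits. */
	return (off >> scale) <= 0xfff;
}
```

Under this assumption, a 64-bit access at offset 0 or 8 fits, while a
negative FP-relative offset such as -136 does not, which is presumably why
the series also adjusts such offsets to positive numbers.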

Example of generated instructions for r2 = *(u64 *)(r1 + 0):

Without optimization:
mov x10, 0
ldr x1, [x0, x10]

With optimization:
ldr x1, [x0, 0]

For the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this series, jited code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                       // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                        // #0
  30:   mov     x10, #0xffffffffffffff78        // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80        // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88        // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90        // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98        // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0        // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8        // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0        // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8        // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0        // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8        // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0        // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8        // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0        // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8        // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0        // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8        // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                       // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this series, jited code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                       // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                        // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]

Tested with test_bpf on both big-endian and little-endian arm64 qemu:

 test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
 test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
 test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

v4 -> v5:
1. Fix incorrect FP offset in tail call scenario pointed out by Daniel,
   and add a tail call test case for this issue
2. Align down fpb_offset to 8 bytes to avoid unaligned offsets
3. Style and spelling fix

v3 -> v4:
1. Fix compile error reported by kernel test robot
2. Add one more test case for load/store in different offsets, and move
   test case to last patch
3. Fix some obvious bugs

v2 -> v3:
1. Split the v2 patch into 2 patches, one for arm64 instruction encoder,
   the other for BPF JIT
2. Add tests for BPF_LDX/BPF_STX with different offsets
3. Adjust the offset of str/ldr(immediate) to positive number

v1 -> v2:
1. Remove macro definition that causes checkpatch to fail
2. Append result to commit message

Xu Kuohai (5):
  arm64: insn: add ldr/str with immediate offset
  bpf, arm64: Optimize BPF store/load using str/ldr with immediate offset
  bpf, arm64: adjust the offset of str/ldr(immediate) to positive number
  bpf/tests: Add tests for BPF_LDX/BPF_STX with different offsets
  bpf, arm64: add load store test case for tail call

 arch/arm64/include/asm/insn.h |   9 +
 arch/arm64/lib/insn.c         |  67 ++++++--
 arch/arm64/net/bpf_jit.h      |  14 ++
 arch/arm64/net/bpf_jit_comp.c | 243 ++++++++++++++++++++++++--
 lib/test_bpf.c                | 315 +++++++++++++++++++++++++++++++++-
 5 files changed, 613 insertions(+), 35 deletions(-)
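
For readers following the "adjust the offset of str/ldr(immediate) to
positive number" patch, the rebasing visible in the "With this series"
fragment (sub x27, x25, #0x88 followed by stores at non-negative offsets
from x27) can be sketched roughly as below. This is an illustration under
assumptions, not the code from the series: the function names are
hypothetical, and the input is assumed to be the list of FP-relative
offsets used by the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: given the FP-relative offsets a program uses,
 * return how far below FP a second base register should point so that
 * every access can use a non-negative offset from that base.
 */
static int32_t fpb_distance(const int16_t *offs, size_t n)
{
	int32_t min_off = 0;

	assert(offs != NULL || n == 0);

	/* Find the lowest (most negative) FP-relative offset. */
	for (size_t i = 0; i < n; i++)
		if (offs[i] < min_off)
			min_off = offs[i];

	/* Round the distance up to 8 bytes so the base stays 8-byte aligned. */
	return (-min_off + 7) & ~7;
}

/* Offset of the same stack slot, relative to the new base (FP - dist). */
static int32_t rebase_offset(int16_t off, int32_t dist)
{
	return off + dist;
}
```

With the offsets from the fragment above (-136 through -8), this sketch
yields a distance of 136 (0x88), and the slots at FP-136 and FP-8 become
offsets 0 and 128 from the new base, matching the listing.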