From patchwork Thu Mar 17 14:02:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xu Kuohai X-Patchwork-Id: 12784043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9F108C433EF for ; Thu, 17 Mar 2022 13:52:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:Message-ID:Date:Subject:CC :To:From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References: List-Owner; bh=p4ZUVNfxSeMPAssKwqNUMyhKX2mdhhJEudyupDAOgLo=; b=iYIqGXh1VMwjBL 11PuIVtPwb1eB/lUsx1ToSc5PrrPYuXB5I9i0rrAZddOGZCs6FG9lM8K66sdKLJVE6O24AmpzeB+X 6FLDg/LKc+Kg/9F7DE0SOLyU4AGU0ntprH6sSQ7yX8eCDUBEuTIXfEhuJmNFIhxOGJtmC/iuGqbTJ /83e/e/s7Z6XFeNol1YLDWoSbi+R10/XURBz9RrLwP0RQo7uml2SXGEl4lvt2DNMLi0shrE9gSVnQ K1nbH+tsF+UHjXXXxJtYFhpeG0ybe9eDZpONGUz2n9Gk8/aGb1A0uSYsYXlVHQ6FnmTmxK4j7Uevc qLprW5UHafaaZk2gkYPg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUqXP-00GIgg-3s; Thu, 17 Mar 2022 13:51:35 +0000 Received: from szxga01-in.huawei.com ([45.249.212.187]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUqXL-00GIeu-Dk for linux-arm-kernel@lists.infradead.org; Thu, 17 Mar 2022 13:51:33 +0000 Received: from kwepemi500013.china.huawei.com (unknown [172.30.72.54]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4KK7n06yCNzfYw4; Thu, 17 Mar 2022 21:49:56 +0800 (CST) Received: from huawei.com (10.67.174.197) by kwepemi500013.china.huawei.com (7.221.188.120) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Thu, 17 Mar 2022 21:51:24 +0800 From: Xu Kuohai To: , CC: Catalin Marinas , Will Deacon , Daniel Borkmann , Alexei Starovoitov , Zi Shen Lim , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Julien Thierry , Mark Rutland , Hou Tao , Fuad Tabba , James Morse Subject: [PATCH bpf-next v4 0/4] bpf, arm64: Optimize BPF store/load using Date: Thu, 17 Mar 2022 10:02:39 -0400 Message-ID: <20220317140243.212509-1-xukuohai@huawei.com> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 X-Originating-IP: [10.67.174.197] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemi500013.china.huawei.com (7.221.188.120) X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220317_065131_915117_E505D279 X-CRM114-Status: UNSURE ( 7.28 ) X-CRM114-Notice: Please train this message. X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org The current BPF store/load instruction is translated by the JIT into two instructions. The first instruction moves the immediate offset into a temporary register. The second instruction uses this temporary register to do the real store/load. In fact, arm64 supports addressing with immediate offsets. So This series introduces optimization that uses arm64 str/ldr instruction with immediate offset when the offset fits. Example of generated instuction for r2 = *(u64 *)(r1 + 0): Without optimization: mov x10, 0 ldr x1, [x0, x10] With optimization: ldr x1, [x0, 0] For the following bpftrace command: bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }' Without this series, jited code(fragment): 0: bti c 4: stp x29, x30, [sp, #-16]! 8: mov x29, sp c: stp x19, x20, [sp, #-16]! 10: stp x21, x22, [sp, #-16]! 14: stp x25, x26, [sp, #-16]! 18: mov x25, sp 1c: mov x26, #0x0 // #0 20: bti j 24: sub sp, sp, #0x90 28: add x19, x0, #0x0 2c: mov x0, #0x0 // #0 30: mov x10, #0xffffffffffffff78 // #-136 34: str x0, [x25, x10] 38: mov x10, #0xffffffffffffff80 // #-128 3c: str x0, [x25, x10] 40: mov x10, #0xffffffffffffff88 // #-120 44: str x0, [x25, x10] 48: mov x10, #0xffffffffffffff90 // #-112 4c: str x0, [x25, x10] 50: mov x10, #0xffffffffffffff98 // #-104 54: str x0, [x25, x10] 58: mov x10, #0xffffffffffffffa0 // #-96 5c: str x0, [x25, x10] 60: mov x10, #0xffffffffffffffa8 // #-88 64: str x0, [x25, x10] 68: mov x10, #0xffffffffffffffb0 // #-80 6c: str x0, [x25, x10] 70: mov x10, #0xffffffffffffffb8 // #-72 74: str x0, [x25, x10] 78: mov x10, #0xffffffffffffffc0 // #-64 7c: str x0, [x25, x10] 80: mov x10, #0xffffffffffffffc8 // #-56 84: str x0, [x25, x10] 88: mov x10, #0xffffffffffffffd0 // #-48 8c: str x0, [x25, x10] 90: mov x10, #0xffffffffffffffd8 // #-40 94: str x0, [x25, x10] 98: mov x10, #0xffffffffffffffe0 // #-32 9c: str x0, [x25, x10] a0: mov x10, #0xffffffffffffffe8 // #-24 a4: str x0, [x25, x10] a8: mov x10, #0xfffffffffffffff0 // #-16 ac: str x0, [x25, x10] b0: mov x10, #0xfffffffffffffff8 // #-8 b4: str x0, [x25, x10] b8: mov x10, #0x8 // #8 bc: ldr x2, [x19, x10] [...] With this series, jited code(fragment): 0: bti c 4: stp x29, x30, [sp, #-16]! 8: mov x29, sp c: stp x19, x20, [sp, #-16]! 10: stp x21, x22, [sp, #-16]! 14: stp x25, x26, [sp, #-16]! 18: stp x27, x28, [sp, #-16]! 1c: mov x25, sp 20: sub x27, x25, #0x88 24: mov x26, #0x0 // #0 28: bti j 2c: sub sp, sp, #0x90 30: add x19, x0, #0x0 34: mov x0, #0x0 // #0 38: str x0, [x27] 3c: str x0, [x27, #8] 40: str x0, [x27, #16] 44: str x0, [x27, #24] 48: str x0, [x27, #32] 4c: str x0, [x27, #40] 50: str x0, [x27, #48] 54: str x0, [x27, #56] 58: str x0, [x27, #64] 5c: str x0, [x27, #72] 60: str x0, [x27, #80] 64: str x0, [x27, #88] 68: str x0, [x27, #96] 6c: str x0, [x27, #104] 70: str x0, [x27, #112] 74: str x0, [x27, #120] 78: str x0, [x27, #128] 7c: ldr x2, [x19, #8] [...] Tested with test_bpf on both big-endian and little-endian arm64 qemu: test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed] test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED Also tested with bpftrace, no failures encountered. v3->v4: 1. Fix compile error reported by kernel test robot 2. Bug fix, and move test case to last patch v2 -> v3: 1. Split the v2 patch into 2 patches, one for arm64 instruction encoder, the other for BPF JIT 2. Add tests for BPF_LDX/BPF_STX with different offsets 3. Adjust the offset of str/ldr(immediate) to positive number v1 -> v2: 1. Remove macro definition that causes checkpatch to fail 2. Append result to commit message Xu Kuohai (4): arm64: insn: add ldr/str with immediate offset bpf, arm64: Optimize BPF store/load using str/ldr with immediate offset bpf, arm64: adjust the offset of str/ldr(immediate) to positive number bpf/tests: Add tests for BPF_LDX/BPF_STX with different offsets arch/arm64/include/asm/insn.h | 9 ++ arch/arm64/lib/insn.c | 67 ++++++-- arch/arm64/net/bpf_jit.h | 14 ++ arch/arm64/net/bpf_jit_comp.c | 233 +++++++++++++++++++++++++-- lib/test_bpf.c | 285 +++++++++++++++++++++++++++++++++- 5 files changed, 573 insertions(+), 35 deletions(-)