From patchwork Sun Oct 2 05:11:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jie Meng X-Patchwork-Id: 12996749 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D09B4C433F5 for ; Sun, 2 Oct 2022 05:12:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229529AbiJBFMB (ORCPT ); Sun, 2 Oct 2022 01:12:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229488AbiJBFMA (ORCPT ); Sun, 2 Oct 2022 01:12:00 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4BFBD5072C for ; Sat, 1 Oct 2022 22:11:58 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2922UMmn010951 for ; Sat, 1 Oct 2022 22:11:58 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=blWgJUKo2fEjrg8otb1YMeoqVIacBbtSVSu/vrW5KRY=; b=VL8UJ207lbQVjsKMxI69ei39riuuf/CB3tQBVj2KCK147GGgdcFUixDwQltD2xMXhHAf lyvLVhOHaIEXTyThB5S70mwscS+nkD5tia6f2uoBbcrFjzvuRlHWFqry7jOV25Y8HuDT sXsXaVP2CNGcGglGc/IMa71QFjxpkPHmo+0= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jxk8qudvm-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Sat, 01 Oct 2022 22:11:57 -0700 Received: from twshared15978.04.prn5.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 1 Oct 2022 22:11:56 -0700 Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975) id 6740410F44C84; Sat, 1 Oct 2022 22:11:51 -0700 (PDT) From: Jie Meng To: , , , CC: Jie Meng Subject: [PATCH bpf-next v4 1/3] bpf,x64: avoid unnecessary instructions when shift dest is ecx Date: Sat, 1 Oct 2022 22:11:41 -0700 Message-ID: <20221002051143.831029-2-jmeng@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221002051143.831029-1-jmeng@fb.com> References: <20220927185801.1824838-1-jmeng@fb.com> <20221002051143.831029-1-jmeng@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: C2XZXrfTFuuV6VaKKHm0_6KqAIAlpq7q X-Proofpoint-ORIG-GUID: C2XZXrfTFuuV6VaKKHm0_6KqAIAlpq7q X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-10-01_15,2022-09-29_03,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net x64 JIT produces redundant instructions when a shift operation's destination register is BPF_REG_4/ecx and this patch removes them. Specifically, when dest reg is BPF_REG_4 but the src isn't, we needn't push and pop ecx around shift only to get it overwritten by r11 immediately afterwards. In the rare case when both dest and src registers are BPF_REG_4, a single shift instruction is sufficient and we don't need the two MOV instructions around the shift. To summarize using shift left as an example, without patch: ------------------------------------------------- | dst == ecx | dst != ecx ================================================= src == ecx | mov r11, ecx | shl dst, cl | shl r11, ecx | | mov ecx, r11 | ------------------------------------------------- src != ecx | mov r11, ecx | push ecx | push ecx | mov ecx, src | mov ecx, src | shl dst, cl | shl r11, cl | pop ecx | pop ecx | | mov ecx, r11 | ------------------------------------------------- With patch: ------------------------------------------------- | dst == ecx | dst != ecx ================================================= src == ecx | shl ecx, cl | shl dst, cl ------------------------------------------------- src != ecx | mov r11, ecx | push ecx | mov ecx, src | mov ecx, src | shl r11, cl | shl dst, cl | mov ecx, r11 | pop ecx ------------------------------------------------- Signed-off-by: Jie Meng --- arch/x86/net/bpf_jit_comp.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 5b6230779cf3..d9ba997c5891 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1136,16 +1136,15 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image case BPF_ALU64 | BPF_RSH | BPF_X: case BPF_ALU64 | BPF_ARSH | BPF_X: - /* Check for bad case when dst_reg == rcx */ - if (dst_reg == BPF_REG_4) { - /* mov r11, dst_reg */ - EMIT_mov(AUX_REG, dst_reg); - dst_reg = AUX_REG; - } - if (src_reg != BPF_REG_4) { /* common case */ - EMIT1(0x51); /* push rcx */ - + /* Check for bad case when dst_reg == rcx */ + if (dst_reg == BPF_REG_4) { + /* mov r11, dst_reg */ + EMIT_mov(AUX_REG, dst_reg); + dst_reg = AUX_REG; + } else { + EMIT1(0x51); /* push rcx */ + } /* mov rcx, src_reg */ EMIT_mov(BPF_REG_4, src_reg); } @@ -1157,12 +1156,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image b3 = simple_alu_opcodes[BPF_OP(insn->code)]; EMIT2(0xD3, add_1reg(b3, dst_reg)); - if (src_reg != BPF_REG_4) - EMIT1(0x59); /* pop rcx */ + if (src_reg != BPF_REG_4) { + if (insn->dst_reg == BPF_REG_4) + /* mov dst_reg, r11 */ + EMIT_mov(insn->dst_reg, AUX_REG); + else + EMIT1(0x59); /* pop rcx */ + } - if (insn->dst_reg == BPF_REG_4) - /* mov dst_reg, r11 */ - EMIT_mov(insn->dst_reg, AUX_REG); break; case BPF_ALU | BPF_END | BPF_FROM_BE: From patchwork Sun Oct 2 05:11:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jie Meng X-Patchwork-Id: 12996751 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F8CBC433F5 for ; Sun, 2 Oct 2022 05:12:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229436AbiJBFMF (ORCPT ); Sun, 2 Oct 2022 01:12:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229524AbiJBFME (ORCPT ); Sun, 2 Oct 2022 01:12:04 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 043885072D for ; Sat, 1 Oct 2022 22:12:04 -0700 (PDT) Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2920D9Yp021992 for ; Sat, 1 Oct 2022 22:12:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=3/ZGgeFXZS8ms9eFfk+hjm0fN02/xg1a0CvXNbNibjg=; b=e2FJS4AWzgSzSTpQREuQTevomqkLZlRRjZ9YnzIGfhcayiPhsTwLkfwLz8v0Hi4/njgG fYpb78bsXZUUkmAfhQYETU9MIZCcUsT1U34BHbO+muyE6y7o/NC3GHUovouktqBpFSDI dupVwTkcoOt3NFOcY0KIYkY64UPGOLxJzkw= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jxmv4b45t-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Sat, 01 Oct 2022 22:12:03 -0700 Received: from twshared22593.02.prn5.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 1 Oct 2022 22:12:02 -0700 Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975) id EB04510F44C89; Sat, 1 Oct 2022 22:11:51 -0700 (PDT) From: Jie Meng To: , , , CC: Jie Meng Subject: [PATCH bpf-next v4 2/3] bpf,x64: use shrx/sarx/shlx when available Date: Sat, 1 Oct 2022 22:11:42 -0700 Message-ID: <20221002051143.831029-3-jmeng@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221002051143.831029-1-jmeng@fb.com> References: <20220927185801.1824838-1-jmeng@fb.com> <20221002051143.831029-1-jmeng@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: a8csbyW6-hQZAg2ygzFbVoxKj9-TgwzB X-Proofpoint-ORIG-GUID: a8csbyW6-hQZAg2ygzFbVoxKj9-TgwzB X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-10-01_15,2022-09-29_03,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Instead of shr/sar/shl that implicitly use %cl, emit their more flexible alternatives provided in BMI2 when advantageous; keep using the non BMI2 instructions when shift count is already in BPF_REG_4/rcx as non BMI2 instructions are shorter. To summarize, when BMI2 is available: ------------------------------------------------- | arbitrary dst ================================================= src == ecx | shl dst, cl ------------------------------------------------- src != ecx | shlx dst, dst, src ------------------------------------------------- A concrete example between non BMI2 and BMI2 codegen. To shift %rsi by %rdi: Without BMI2: ef3: push %rcx 51 ef4: mov %rdi,%rcx 48 89 f9 ef7: shl %cl,%rsi 48 d3 e6 efa: pop %rcx 59 With BMI2: f0b: shlx %rdi,%rsi,%rsi c4 e2 c1 f7 f6 Signed-off-by: Jie Meng --- arch/x86/net/bpf_jit_comp.c | 64 +++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index d9ba997c5891..d09c54f3d2e0 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -889,6 +889,48 @@ static void emit_nops(u8 **pprog, int len) *pprog = prog; } +/* emit the 3-byte VEX prefix */ +static void emit_3vex(u8 **pprog, bool r, bool x, bool b, u8 m, + bool w, u8 src_reg2, bool l, u8 p) +{ + u8 *prog = *pprog; + u8 b0 = 0xc4, b1, b2; + u8 src2 = reg2hex[src_reg2]; + + if (is_ereg(src_reg2)) + src2 |= 1 << 3; + + /* + * 7 0 + * +---+---+---+---+---+---+---+---+ + * |~R |~X |~B | m | + * +---+---+---+---+---+---+---+---+ + */ + b1 = (!r << 7) | (!x << 6) | (!b << 5) | (m & 0x1f); + /* + * 7 0 + * +---+---+---+---+---+---+---+---+ + * | W | ~vvvv | L | pp | + * +---+---+---+---+---+---+---+---+ + */ + b2 = (w << 7) | ((~src2 & 0xf) << 3) | (l << 2) | (p & 3); + + EMIT3(b0, b1, b2); + *pprog = prog; +} + +/* emit BMI2 shift instruction */ +static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op) +{ + u8 *prog = *pprog; + bool r = is_ereg(dst_reg); + u8 m = 2; /* escape code 0f38 */ + + emit_3vex(&prog, r, false, r, m, is64, src_reg, false, op); + EMIT2(0xf7, add_2reg(0xC0, dst_reg, dst_reg)); + *pprog = prog; +} + #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp))) static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image, @@ -1135,6 +1177,28 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image case BPF_ALU64 | BPF_LSH | BPF_X: case BPF_ALU64 | BPF_RSH | BPF_X: case BPF_ALU64 | BPF_ARSH | BPF_X: + /* BMI2 shifts aren't better when shift count is already in rcx */ + if (boot_cpu_has(X86_FEATURE_BMI2) && src_reg != BPF_REG_4) { + /* shrx/sarx/shlx dst_reg, dst_reg, src_reg */ + bool w = (BPF_CLASS(insn->code) == BPF_ALU64); + u8 op; + + switch (BPF_OP(insn->code)) { + case BPF_LSH: + op = 1; /* prefix 0x66 */ + break; + case BPF_RSH: + op = 3; /* prefix 0xf2 */ + break; + case BPF_ARSH: + op = 2; /* prefix 0xf3 */ + break; + } + + emit_shiftx(&prog, dst_reg, src_reg, w, op); + + break; + } if (src_reg != BPF_REG_4) { /* common case */ /* Check for bad case when dst_reg == rcx */ From patchwork Sun Oct 2 05:11:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jie Meng X-Patchwork-Id: 12996752 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F85AC433FE for ; Sun, 2 Oct 2022 05:12:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229537AbiJBFMP (ORCPT ); Sun, 2 Oct 2022 01:12:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229524AbiJBFMO (ORCPT ); Sun, 2 Oct 2022 01:12:14 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F2305072C for ; Sat, 1 Oct 2022 22:12:14 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 291Nnfwu004662 for ; Sat, 1 Oct 2022 22:12:13 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=i01NmLPIFOUlWCPYtyI7PCkDUgAVJGEMBgBQ2uTJNrw=; b=D7cdmg4l+8+Ja8ZoOIXWyUsIov6G3SFH3rWP6+A1+2d5qpDGTW3AbzyBbEkHmrFZFas5 VT3qN0wpTaGPsWlqItyZP+hi9DzbR4RonFjepUoccKf3f0FUiB+qJpdVCxNGnPghYnck DnBJOeSmXSpa6CA9+ZEO5AngHoxga8WBgCs= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jxk303daq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Sat, 01 Oct 2022 22:12:13 -0700 Received: from twshared15978.04.prn5.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 1 Oct 2022 22:12:09 -0700 Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975) id 6140110F44C8D; Sat, 1 Oct 2022 22:11:52 -0700 (PDT) From: Jie Meng To: , , , CC: Jie Meng Subject: [PATCH bpf-next v4 3/3] bpf: add selftests for lsh, rsh, arsh with reg operand Date: Sat, 1 Oct 2022 22:11:43 -0700 Message-ID: <20221002051143.831029-4-jmeng@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221002051143.831029-1-jmeng@fb.com> References: <20220927185801.1824838-1-jmeng@fb.com> <20221002051143.831029-1-jmeng@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: ziu9hQ_iA2KifGIjkX1WR8i04wR2Jb5t X-Proofpoint-ORIG-GUID: ziu9hQ_iA2KifGIjkX1WR8i04wR2Jb5t X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-10-01_15,2022-09-29_03,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Current tests cover only shifts with an immediate as the source operand/shift counts; add a new test case to cover register operand. Signed-off-by: Jie Meng --- tools/testing/selftests/bpf/verifier/jit.c | 24 ++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/tools/testing/selftests/bpf/verifier/jit.c b/tools/testing/selftests/bpf/verifier/jit.c index 79021c30e51e..8bf37e5207f1 100644 --- a/tools/testing/selftests/bpf/verifier/jit.c +++ b/tools/testing/selftests/bpf/verifier/jit.c @@ -20,6 +20,30 @@ .result = ACCEPT, .retval = 2, }, +{ + "jit: lsh, rsh, arsh by reg", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_MOV64_IMM(BPF_REG_4, 1), + BPF_MOV64_IMM(BPF_REG_1, 0xff), + BPF_ALU64_REG(BPF_LSH, BPF_REG_1, BPF_REG_0), + BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_4), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x3fc, 1), + BPF_EXIT_INSN(), + BPF_ALU64_REG(BPF_RSH, BPF_REG_1, BPF_REG_4), + BPF_MOV64_REG(BPF_REG_4, BPF_REG_1), + BPF_ALU32_REG(BPF_RSH, BPF_REG_4, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0xff, 1), + BPF_EXIT_INSN(), + BPF_ALU64_REG(BPF_ARSH, BPF_REG_4, BPF_REG_4), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_IMM(BPF_REG_0, 2), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 2, +}, { "jit: mov32 for ldimm64, 1", .insns = {