From patchwork Tue Sep 27 18:57:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jie Meng
X-Patchwork-Id: 12991110
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6CD30C6FA83
	for ; Tue, 27 Sep 2022 18:58:45 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230118AbiI0S6o (ORCPT ); Tue, 27 Sep 2022 14:58:44 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48158 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229567AbiI0S6n (ORCPT ); Tue, 27 Sep 2022 14:58:43 -0400
Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01C211DB54C
	for ; Tue, 27 Sep 2022 11:58:42 -0700 (PDT)
Received: from pps.filterd (m0148461.ppops.net [127.0.0.1])
	by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28RD5oPS015938
	for ; Tue, 27 Sep 2022 11:58:42 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com;
	h=from : to : cc : subject : date : message-id : in-reply-to :
	references : mime-version : content-transfer-encoding : content-type;
	s=facebook; bh=Tw59ClHvl/v9nyocTsrkM162WBmFqvGAp8VweSRoggw=;
	b=WzDiHbGBFer3mI+jp1TDp0Q18HXc+vAwyvXSCW2PX+qGJGXNyUspqe8VuLBtOfAy8wM0
	qBgU533QvpPnbkW0sJgmsH8DAykFb5ndgLL4ikKG591CmTOubS1SsKqYjN93A3OuVdaU
	TRs0XOfC4KWp5YtcKLf9ACNGz1yyDSD92+k=
Received: from mail.thefacebook.com ([163.114.132.120])
	by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jumv5ew5p-5
	(version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT)
	for ; Tue, 27 Sep 2022 11:58:42 -0700
Received: from twshared13579.04.prn5.facebook.com (2620:10d:c085:108::4)
	by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2375.31; Tue, 27 Sep 2022 11:58:41 -0700
Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975)
	id C16ED10B4B674; Tue, 27 Sep 2022 11:58:34 -0700 (PDT)
From: Jie Meng
To: , , ,
CC: Jie Meng
Subject: [PATCH bpf-next v3 1/3] bpf,x64: avoid unnecessary instructions when shift dest is ecx
Date: Tue, 27 Sep 2022 11:57:59 -0700
Message-ID: <20220927185801.1824838-2-jmeng@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220927185801.1824838-1-jmeng@fb.com>
References: <7437e1cb-325c-fc86-37f6-3422c085007d@iogearbox.net>
	<20220927185801.1824838-1-jmeng@fb.com>
MIME-Version: 1.0
X-FB-Internal: Safe
X-Proofpoint-GUID: gHMJSBqUpOIXjbHRVcP3btdoEaDKjBaO
X-Proofpoint-ORIG-GUID: gHMJSBqUpOIXjbHRVcP3btdoEaDKjBaO
X-Proofpoint-Virus-Version: vendor=baseguard
	engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1
	definitions=2022-09-27_09,2022-09-27_01,2022-06-22_01
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

x64 JIT produces redundant instructions when a shift operation's
destination register is BPF_REG_4/ecx and this patch removes them.

Specifically, when dest reg is BPF_REG_4 but the src isn't, we
needn't push and pop ecx around shift only to get it overwritten
by r11 immediately afterwards.

In the rare case when both dest and src registers are BPF_REG_4,
a single shift instruction is sufficient and we don't need the
two MOV instructions around the shift.
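
As an illustration only, the selection described above can be modelled in a few lines of plain C. This is a sketch, not code from the patch: `shl_sequence()` and its mnemonic strings are hypothetical, it writes rcx where the summary tables in this commit message write ecx, and it follows those tables rather than the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative model only (not kernel code): lists the x86 mnemonics
 * emitted for "dst <<= src" before (patched == false) and after
 * (patched == true) this change. Returns the number of instructions.
 */
static size_t shl_sequence(bool patched, bool dst_is_rcx, bool src_is_rcx,
			   const char *seq[6])
{
	size_t n = 0;
	/* dst is shifted through r11 only when it was copied there first */
	bool via_r11 = dst_is_rcx && (!patched || !src_is_rcx);

	assert(seq != NULL);

	if (!patched) {
		if (dst_is_rcx)
			seq[n++] = "mov r11, rcx";
		if (!src_is_rcx) {
			seq[n++] = "push rcx";
			seq[n++] = "mov rcx, src";
		}
	} else if (!src_is_rcx) {
		/* rcx is preserved either in r11 (it is dst) or on the stack */
		seq[n++] = dst_is_rcx ? "mov r11, rcx" : "push rcx";
		seq[n++] = "mov rcx, src";
	}

	if (via_r11)
		seq[n++] = "shl r11, cl";
	else
		seq[n++] = dst_is_rcx ? "shl rcx, cl" : "shl dst, cl";

	if (!patched) {
		if (!src_is_rcx)
			seq[n++] = "pop rcx";
		if (dst_is_rcx)
			seq[n++] = "mov rcx, r11";
	} else if (!src_is_rcx) {
		seq[n++] = dst_is_rcx ? "mov rcx, r11" : "pop rcx";
	}

	return n;
}
```

In this model the dst == rcx cases shrink from 3 to 1 instruction (src == rcx) and from 6 to 4 (src != rcx), while the dst != rcx cases are unchanged.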

To summarize using shift left as an example, without patch:
-------------------------------------------------
            |   dst == ecx     |    dst != ecx
=================================================
src == ecx  |   mov r11, ecx   |    shl dst, cl
            |   shl r11, cl    |
            |   mov ecx, r11   |
-------------------------------------------------
src != ecx  |   mov r11, ecx   |    push ecx
            |   push ecx       |    mov ecx, src
            |   mov ecx, src   |    shl dst, cl
            |   shl r11, cl    |    pop ecx
            |   pop ecx        |
            |   mov ecx, r11   |
-------------------------------------------------

With patch:
-------------------------------------------------
            |   dst == ecx     |    dst != ecx
=================================================
src == ecx  |   shl ecx, cl    |    shl dst, cl
-------------------------------------------------
src != ecx  |   mov r11, ecx   |    push ecx
            |   mov ecx, src   |    mov ecx, src
            |   shl r11, cl    |    shl dst, cl
            |   mov ecx, r11   |    pop ecx
-------------------------------------------------

Signed-off-by: Jie Meng
---
 arch/x86/net/bpf_jit_comp.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 35796db58116..6a5c59f1e6f9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1136,18 +1136,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		case BPF_ALU64 | BPF_RSH | BPF_X:
 		case BPF_ALU64 | BPF_ARSH | BPF_X:
 
-			/* Check for bad case when dst_reg == rcx */
-			if (dst_reg == BPF_REG_4) {
-				/* mov r11, dst_reg */
-				EMIT_mov(AUX_REG, dst_reg);
-				dst_reg = AUX_REG;
-			}
-
 			if (src_reg != BPF_REG_4) { /* common case */
-				EMIT1(0x51); /* push rcx */
-
-				/* mov rcx, src_reg */
-				EMIT_mov(BPF_REG_4, src_reg);
+				/* Check for bad case when dst_reg == rcx */
+				if (dst_reg == BPF_REG_4) {
+					/* mov r11, dst_reg */
+					EMIT_mov(AUX_REG, dst_reg);
+					dst_reg = AUX_REG;
+				} else {
+					EMIT1(0x51); /* push rcx */
+
+					/* mov rcx, src_reg */
+					EMIT_mov(BPF_REG_4, src_reg);
+				}
 			}
 
 			/* shl %rax, %cl | shr %rax, %cl | sar %rax, %cl */
@@ -1157,12 +1157,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			b3 = simple_alu_opcodes[BPF_OP(insn->code)];
 			EMIT2(0xD3, add_1reg(b3, dst_reg));
 
-			if (src_reg != BPF_REG_4)
-				EMIT1(0x59); /* pop rcx */
+			if (src_reg != BPF_REG_4) {
+				if (insn->dst_reg == BPF_REG_4)
+					/* mov dst_reg, r11 */
+					EMIT_mov(insn->dst_reg, AUX_REG);
+				else
+					EMIT1(0x59); /* pop rcx */
+			}
 
-			if (insn->dst_reg == BPF_REG_4)
-				/* mov dst_reg, r11 */
-				EMIT_mov(insn->dst_reg, AUX_REG);
 			break;
 
 		case BPF_ALU | BPF_END | BPF_FROM_BE:

From patchwork Tue Sep 27 18:58:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jie Meng
X-Patchwork-Id: 12991111
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9D00FC07E9D
	for ; Tue, 27 Sep 2022 18:58:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229838AbiI0S6p (ORCPT ); Tue, 27 Sep 2022 14:58:45 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48164 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229567AbiI0S6p (ORCPT ); Tue, 27 Sep 2022 14:58:45 -0400
Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 532731DB548
	for ; Tue, 27 Sep 2022 11:58:44 -0700 (PDT)
Received: from pps.filterd (m0148461.ppops.net [127.0.0.1])
	by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28RD5oPW015938
	for ; Tue, 27 Sep 2022 11:58:44 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com;
	h=from : to : cc : subject : date : message-id : in-reply-to :
	references : mime-version : content-transfer-encoding : content-type;
	s=facebook; bh=g0GyE0YNtLCaf+o/OcF75OlaLwk3uYstodM8y0DIv5A=;
	b=ICZhdGQ3x7igV9qoNTN7bVP25tQN/I5SNudrChjznzYCRMwytIk8+M+qR4ACeqsG88a4
	IE7oeKGXjdmLXdjWrpgVyM873Q2QC1s7CL68xrbWdR5gzn9pTXEwwTvmn9Z+WXlPl5b2
	HYjIj3LMOPEOcLXm3XLwmzFH4uf2kDH+ShQ=
Received: from mail.thefacebook.com ([163.114.132.120])
	by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jumv5ew5p-9
	(version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT)
	for ; Tue, 27 Sep 2022 11:58:44 -0700
Received: from twshared13579.04.prn5.facebook.com (2620:10d:c085:108::4)
	by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2375.31; Tue, 27 Sep 2022 11:58:42 -0700
Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975)
	id D0A1D10B4B676; Tue, 27 Sep 2022 11:58:34 -0700 (PDT)
From: Jie Meng
To: , , ,
CC: Jie Meng
Subject: [PATCH bpf-next v3 2/3] bpf,x64: use shrx/sarx/shlx when available
Date: Tue, 27 Sep 2022 11:58:00 -0700
Message-ID: <20220927185801.1824838-3-jmeng@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220927185801.1824838-1-jmeng@fb.com>
References: <7437e1cb-325c-fc86-37f6-3422c085007d@iogearbox.net>
	<20220927185801.1824838-1-jmeng@fb.com>
MIME-Version: 1.0
X-FB-Internal: Safe
X-Proofpoint-GUID: NKbsq47OCTQuknaES8VbE2hRVUMCF8ng
X-Proofpoint-ORIG-GUID: NKbsq47OCTQuknaES8VbE2hRVUMCF8ng
X-Proofpoint-Virus-Version: vendor=baseguard
	engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1
	definitions=2022-09-27_09,2022-09-27_01,2022-06-22_01
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Instead of shr/sar/shl, which implicitly use %cl, emit their more
flexible BMI2 alternatives when advantageous; keep using the non-BMI2
instructions when the shift count is already in BPF_REG_4/rcx, as they
are shorter.
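
For readers unfamiliar with the VEX encoding these instructions need, here is a minimal standalone sketch, under stated assumptions, of how the five bytes of a register-to-register shlx/sarx/shrx are put together. It is not the kernel helper: `encode_shiftx()` and `enum shiftx_op` are hypothetical names, and registers are given as hardware numbers (rsi = 6, rdi = 7, r8..r15 = 8..15) rather than BPF registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The VEX.pp field is what tells the three BMI2 shifts apart. */
enum shiftx_op {
	SHLX = 1,	/* implied prefix 0x66 */
	SARX = 2,	/* implied prefix 0xf3 */
	SHRX = 3,	/* implied prefix 0xf2 */
};

/*
 * Illustrative encoder (not kernel code): writes "op dst, dst, cnt"
 * into out[] and returns the number of bytes written (always 5).
 */
static size_t encode_shiftx(uint8_t out[5], enum shiftx_op op,
			    unsigned int dst, unsigned int cnt, bool is64)
{
	assert(dst < 16 && cnt < 16);

	/* dst fills both ModRM.reg and ModRM.rm, so R and B are equal */
	bool ext = dst >= 8;

	out[0] = 0xc4;	/* 3-byte VEX escape */
	/* ~R ~X ~B mmmmm, with mmmmm = 2 selecting the 0f 38 opcode map */
	out[1] = (uint8_t)((!ext << 7) | (1 << 6) | (!ext << 5) | 0x02);
	/* W ~vvvv L pp: vvvv is the inverted count register, L = 0 */
	out[2] = (uint8_t)((is64 << 7) | ((~cnt & 0xf) << 3) | op);
	out[3] = 0xf7;	/* opcode shared by shlx/sarx/shrx */
	/* ModRM with mod = 11 (register direct) */
	out[4] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (dst & 7));

	return 5;
}
```

Calling `encode_shiftx(out, SHLX, 6, 7, true)` yields c4 e2 c1 f7 f6, the same bytes this commit message lists for shlx %rdi,%rsi,%rsi.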

To summarize, when BMI2 is available:
-------------------------------------------------
            |   arbitrary dst
=================================================
src == ecx  |   shl dst, cl
-------------------------------------------------
src != ecx  |   shlx dst, dst, src
-------------------------------------------------

A concrete example comparing non-BMI2 and BMI2 codegen. To shift %rsi
by %rdi:

Without BMI2:

 ef3:   push   %rcx
        51
 ef4:   mov    %rdi,%rcx
        48 89 f9
 ef7:   shl    %cl,%rsi
        48 d3 e6
 efa:   pop    %rcx
        59

With BMI2:

 f0b:   shlx   %rdi,%rsi,%rsi
        c4 e2 c1 f7 f6

Signed-off-by: Jie Meng
---
 arch/x86/net/bpf_jit_comp.c | 64 +++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 6a5c59f1e6f9..f91eac901c32 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -889,6 +889,48 @@ static void emit_nops(u8 **pprog, int len)
 	*pprog = prog;
 }
 
+/* emit the 3-byte VEX prefix */
+static void emit_3vex(u8 **pprog, bool r, bool x, bool b, u8 m,
+		      bool w, u8 src_reg2, bool l, u8 p)
+{
+	u8 *prog = *pprog;
+	u8 b0 = 0xc4, b1, b2;
+	u8 src2 = reg2hex[src_reg2];
+
+	if (is_ereg(src_reg2))
+		src2 |= 1 << 3;
+
+	/*
+	 *   7                           0
+	 * +---+---+---+---+---+---+---+---+
+	 * |~R |~X |~B |         m         |
+	 * +---+---+---+---+---+---+---+---+
+	 */
+	b1 = (!r << 7) | (!x << 6) | (!b << 5) | (m & 0x1f);
+	/*
+	 *   7                           0
+	 * +---+---+---+---+---+---+---+---+
+	 * | W |     ~vvvv     | L |   pp  |
+	 * +---+---+---+---+---+---+---+---+
+	 */
+	b2 = (w << 7) | ((~src2 & 0xf) << 3) | (l << 2) | (p & 3);
+
+	EMIT3(b0, b1, b2);
+	*pprog = prog;
+}
+
+/* emit BMI2 shift instruction */
+static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
+{
+	u8 *prog = *pprog;
+	bool r = is_ereg(dst_reg);
+	u8 m = 2; /* escape code 0f38 */
+
+	emit_3vex(&prog, r, false, r, m, is64, src_reg, false, op);
+	EMIT2(0xf7, add_2reg(0xC0, dst_reg, dst_reg));
+	*pprog = prog;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
@@ -1135,6 +1177,28 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		case BPF_ALU64 | BPF_LSH | BPF_X:
 		case BPF_ALU64 | BPF_RSH | BPF_X:
 		case BPF_ALU64 | BPF_ARSH | BPF_X:
+			/* BMI2 shifts aren't better when shift count is already in rcx */
+			if (boot_cpu_has(X86_FEATURE_BMI2) && src_reg != BPF_REG_4) {
+				/* shrx/sarx/shlx dst_reg, dst_reg, src_reg */
+				bool w = (BPF_CLASS(insn->code) == BPF_ALU64);
+				u8 op;
+
+				switch (BPF_OP(insn->code)) {
+				case BPF_LSH:
+					op = 1; /* prefix 0x66 */
+					break;
+				case BPF_RSH:
+					op = 3; /* prefix 0xf2 */
+					break;
+				case BPF_ARSH:
+					op = 2; /* prefix 0xf3 */
+					break;
+				}
+
+				emit_shiftx(&prog, dst_reg, src_reg, w, op);
+
+				break;
+			}
 
 			if (src_reg != BPF_REG_4) { /* common case */
 				/* Check for bad case when dst_reg == rcx */

From patchwork Tue Sep 27 18:58:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jie Meng
X-Patchwork-Id: 12991109
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5D333C6FA82
	for ; Tue, 27 Sep 2022 18:58:45 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231218AbiI0S6n (ORCPT ); Tue, 27 Sep 2022 14:58:43 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48156 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S230118AbiI0S6n (ORCPT ); Tue, 27 Sep 2022 14:58:43 -0400
Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94E381DB548
	for ; Tue, 27 Sep 2022 11:58:42 -0700 (PDT)
Received: from pps.filterd (m0109334.ppops.net [127.0.0.1])
	by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28RI0GAm015702
	for ; Tue, 27 Sep 2022 11:58:42 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com;
	h=from : to : cc : subject : date : message-id : in-reply-to :
	references : mime-version : content-transfer-encoding : content-type;
	s=facebook; bh=i01NmLPIFOUlWCPYtyI7PCkDUgAVJGEMBgBQ2uTJNrw=;
	b=NZAPFBMa1R6bXp9+W0WlMcffWSpI2r9yo6PILK5/EsW6t2hOSsDhFLpUqt4LiUjL0IJT
	ytAA0sjQ+G5QDXlGX+9PvlPr41nWNdq8naKzFFUQHBoe5txLDpSv39o9Ncfhb/HaLuUa
	fhaIlx1dGLPx/Wu62i+Hdx6uFrswRT3TU9k=
Received: from maileast.thefacebook.com ([163.114.130.16])
	by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jv62ngf5x-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT)
	for ; Tue, 27 Sep 2022 11:58:42 -0700
Received: from twshared13315.14.prn3.facebook.com (2620:10d:c0a8:1b::d)
	by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2375.31; Tue, 27 Sep 2022 11:58:40 -0700
Received: by devbig150.prn5.facebook.com (Postfix, from userid 187975)
	id DB0FE10B4B678; Tue, 27 Sep 2022 11:58:34 -0700 (PDT)
From: Jie Meng
To: , , ,
CC: Jie Meng
Subject: [PATCH bpf-next v3 3/3] bpf: add selftests for lsh, rsh, arsh with reg operand
Date: Tue, 27 Sep 2022 11:58:01 -0700
Message-ID: <20220927185801.1824838-4-jmeng@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220927185801.1824838-1-jmeng@fb.com>
References: <7437e1cb-325c-fc86-37f6-3422c085007d@iogearbox.net>
	<20220927185801.1824838-1-jmeng@fb.com>
MIME-Version: 1.0
X-FB-Internal: Safe
X-Proofpoint-ORIG-GUID: 3W0DzlH9ZrMF-TczappRIs52ff6dBSCC
X-Proofpoint-GUID: 3W0DzlH9ZrMF-TczappRIs52ff6dBSCC
X-Proofpoint-Virus-Version: vendor=baseguard
	engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1
	definitions=2022-09-27_09,2022-09-27_01,2022-06-22_01
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Current tests cover only shifts with an immediate as the source
operand (the shift count); add a new test case to cover a register
operand.

Signed-off-by: Jie Meng
---
 tools/testing/selftests/bpf/verifier/jit.c | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/jit.c b/tools/testing/selftests/bpf/verifier/jit.c
index 79021c30e51e..8bf37e5207f1 100644
--- a/tools/testing/selftests/bpf/verifier/jit.c
+++ b/tools/testing/selftests/bpf/verifier/jit.c
@@ -20,6 +20,30 @@
 	.result = ACCEPT,
 	.retval = 2,
 },
+{
+	"jit: lsh, rsh, arsh by reg",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_MOV64_IMM(BPF_REG_4, 1),
+	BPF_MOV64_IMM(BPF_REG_1, 0xff),
+	BPF_ALU64_REG(BPF_LSH, BPF_REG_1, BPF_REG_0),
+	BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_4),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x3fc, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_RSH, BPF_REG_1, BPF_REG_4),
+	BPF_MOV64_REG(BPF_REG_4, BPF_REG_1),
+	BPF_ALU32_REG(BPF_RSH, BPF_REG_4, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0xff, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ARSH, BPF_REG_4, BPF_REG_4),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_0, 2),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 2,
+},
 {
 	"jit: mov32 for ldimm64, 1",
 	.insns = {