From patchwork Wed Oct 19 12:50:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9D292C433FE for ; Wed, 19 Oct 2022 13:02:26 +0000 (UTC) Received: from localhost ([::1]:60304 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8iH-0000qv-3O for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:02:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57968) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8YB-00051m-Tc; Wed, 19 Oct 2022 08:52:00 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8Y8-00025u-FF; Wed, 19 Oct 2022 08:51:59 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:42 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 74CC8800278; Wed, 19 Oct 2022 09:50:42 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 01/12] target/ppc: Moved VMLADDUHM to decodetree and use gvec Date: Wed, 19 Oct 2022 09:50:29 -0300 Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:42.0781 (UTC) FILETIME=[6609E8D0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VMLADDUHM to decodetree a creates a gvec implementation using mul_vec and add_vec. rept loop master patch 8 12500 0,01810500 0,00903100 (-50.1%) 25 4000 0,01739400 0,00747700 (-57.0%) 100 1000 0,01843600 0,00901400 (-51.1%) 500 200 0,02574600 0,01971000 (-23.4%) 2500 40 0,05921600 0,07121800 (+20.3%) 8000 12 0,15326700 0,21725200 (+41.7%) The significant difference in performance when REPT is low and LOOP is high I think is due to the fact that the new implementation has a higher translation time, as when using a helper only 5 TCGop are used but with the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec equivalent so this instruction is implemented with the help of 5 others, vmuleu, vmulou, vmrgh, vmrgl and vpkum). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 +- target/ppc/insn32.decode | 2 ++ target/ppc/int_helper.c | 3 +- target/ppc/translate.c | 1 - target/ppc/translate/vmx-impl.c.inc | 48 ++++++++++++++++++----------- 5 files changed, 35 insertions(+), 21 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 57eee07256..9c562ab00e 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -264,7 +264,7 @@ DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMUHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMSHS, void, env, avr, avr, avr, avr) -DEF_HELPER_FLAGS_4(vmladduhm, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) +DEF_HELPER_FLAGS_5(VMLADDUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32) DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env) DEF_HELPER_3(lvebx, void, env, avr, tl) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index a5249ee32c..7445455a12 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -693,6 +693,8 @@ VMSUMUHS 000100 ..... ..... ..... ..... 100111 @VA VMSUMCUD 000100 ..... ..... ..... ..... 010111 @VA VMSUMUDM 000100 ..... ..... ..... ..... 100011 @VA +VMLADDUHM 000100 ..... ..... ..... ..... 100010 @VA + ## Vector String Instructions VSTRIBL 000100 ..... 00000 ..... . 0000001101 @VX_tb_rc diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 696096100b..0d25000b2a 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -974,7 +974,8 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, } } -void helper_vmladduhm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) +void helper_VMLADDUHM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c, + uint32_t v) { int i; diff --git a/target/ppc/translate.c b/target/ppc/translate.c index e810842925..11f729c60c 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -6921,7 +6921,6 @@ GEN_HANDLER(lvsl, 0x1f, 0x06, 0x00, 0x00000001, PPC_ALTIVEC), GEN_HANDLER(lvsr, 0x1f, 0x06, 0x01, 0x00000001, PPC_ALTIVEC), GEN_HANDLER(mfvscr, 0x04, 0x2, 0x18, 0x001ff800, PPC_ALTIVEC), GEN_HANDLER(mtvscr, 0x04, 0x2, 0x19, 0x03ff0000, PPC_ALTIVEC), -GEN_HANDLER(vmladduhm, 0x04, 0x11, 0xFF, 0x00000000, PPC_ALTIVEC), #if defined(TARGET_PPC64) GEN_HANDLER_E(maddhd_maddhdu, 0x04, 0x18, 0xFF, 0x00000000, PPC_NONE, PPC2_ISA300), diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index e644ad3236..9f18c6d4f2 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2523,24 +2523,6 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \ GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16) -static void gen_vmladduhm(DisasContext *ctx) -{ - TCGv_ptr ra, rb, rc, rd; - if (unlikely(!ctx->altivec_enabled)) { - gen_exception(ctx, POWERPC_EXCP_VPU); - return; - } - ra = gen_avr_ptr(rA(ctx->opcode)); - rb = gen_avr_ptr(rB(ctx->opcode)); - rc = gen_avr_ptr(rC(ctx->opcode)); - rd = gen_avr_ptr(rD(ctx->opcode)); - gen_helper_vmladduhm(rd, ra, rb, rc); - tcg_temp_free_ptr(ra); - tcg_temp_free_ptr(rb); - tcg_temp_free_ptr(rc); - tcg_temp_free_ptr(rd); -} - static bool do_va_helper(DisasContext *ctx, arg_VA *a, void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr)) { @@ -2569,6 +2551,36 @@ TRANS_FLAGS2(ALTIVEC_207, VSUBECUQ, do_va_helper, gen_helper_VSUBECUQ) TRANS_FLAGS(ALTIVEC, VPERM, do_va_helper, gen_helper_VPERM) TRANS_FLAGS2(ISA300, VPERMR, do_va_helper, gen_helper_VPERMR) +static void gen_vmladduhm_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b, + TCGv_vec c) +{ + tcg_gen_mul_vec(vece, t, a, b); + tcg_gen_add_vec(vece, t, t, c); +} + +static bool trans_VMLADDUHM(DisasContext *ctx, arg_VA *a) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_add_vec, INDEX_op_mul_vec, 0 + }; + + static const GVecGen4 op = { + .fno = gen_helper_VMLADDUHM, + .fniv = gen_vmladduhm_vec, + .opt_opc = vecop_list, + .vece = MO_16 + }; + + REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_4(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), avr_full_offset(a->rc), + 16, 16, &op); + + return true; +} + static bool trans_VSEL(DisasContext *ctx, arg_VA *a) { REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); From patchwork Wed Oct 19 12:50:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011714 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3863AC433FE for ; Wed, 19 Oct 2022 13:03:34 +0000 (UTC) Received: from localhost ([::1]:36122 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8jL-0002KW-OL for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:03:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57970) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8YG-00055J-Ma; Wed, 19 Oct 2022 08:52:07 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8YE-00025u-N9; Wed, 19 Oct 2022 08:52:04 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:43 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id B25EB800150; Wed, 19 Oct 2022 09:50:42 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 02/12] target/ppc: Move VMH[R]ADDSHS instruction to decodetree Date: Wed, 19 Oct 2022 09:50:30 -0300 Message-Id: <20221019125040.48028-3-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:43.0046 (UTC) FILETIME=[66325860:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find a satisfactory implementation with TCG inline. vmhaddshs: rept loop master patch 8 12500 0,02983400 0,02648500 (-11.2%) 25 4000 0,02946000 0,02518000 (-14.5%) 100 1000 0,03104300 0,02638000 (-15.0%) 500 200 0,04002000 0,03502500 (-12.5%) 2500 40 0,08090100 0,07562200 (-6.5%) 8000 12 0,19242600 0,18626800 (-3.2%) vmhraddshs: rept loop master patch 8 12500 0,03078600 0,02851000 (-7.4%) 25 4000 0,02793200 0,02746900 (-1.7%) 100 1000 0,02886000 0,02839900 (-1.6%) 500 200 0,03714700 0,03799200 (+2.3%) 2500 40 0,07948000 0,07852200 (-1.2%) 8000 12 0,19049800 0,18813900 (-1.2%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 4 ++-- target/ppc/insn32.decode | 2 ++ target/ppc/int_helper.c | 4 ++-- target/ppc/translate/vmx-impl.c.inc | 5 +++-- target/ppc/translate/vmx-ops.c.inc | 1 - 5 files changed, 9 insertions(+), 7 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 9c562ab00e..f02a9497b7 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -258,8 +258,8 @@ DEF_HELPER_4(vpkuhum, void, env, avr, avr, avr) DEF_HELPER_4(vpkuwum, void, env, avr, avr, avr) DEF_HELPER_4(vpkudum, void, env, avr, avr, avr) DEF_HELPER_FLAGS_3(vpkpx, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr) -DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr) +DEF_HELPER_5(VMHADDSHS, void, env, avr, avr, avr, avr) +DEF_HELPER_5(VMHRADDSHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) DEF_HELPER_5(VMSUMUHS, void, env, avr, avr, avr, avr) DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 7445455a12..9a509e84df 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -694,6 +694,8 @@ VMSUMCUD 000100 ..... ..... ..... ..... 010111 @VA VMSUMUDM 000100 ..... ..... ..... ..... 100011 @VA VMLADDUHM 000100 ..... ..... ..... ..... 100010 @VA +VMHADDSHS 000100 ..... ..... ..... ..... 100000 @VA +VMHRADDSHS 000100 ..... ..... ..... ..... 100001 @VA ## Vector String Instructions diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 0d25000b2a..ae1ba8084d 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -939,7 +939,7 @@ target_ulong helper_vctzlsbb(ppc_avr_t *r) return count; } -void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, +void helper_VMHADDSHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) { int sat = 0; @@ -957,7 +957,7 @@ void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, } } -void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, +void helper_VMHRADDSHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) { int sat = 0; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 9f18c6d4f2..3acd585a2f 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2521,7 +2521,7 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \ tcg_temp_free_ptr(rd); \ } -GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16) +GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23) static bool do_va_helper(DisasContext *ctx, arg_VA *a, void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr)) @@ -2620,7 +2620,8 @@ static bool do_va_env_helper(DisasContext *ctx, arg_VA *a, TRANS_FLAGS(ALTIVEC, VMSUMUHS, do_va_env_helper, gen_helper_VMSUMUHS) TRANS_FLAGS(ALTIVEC, VMSUMSHS, do_va_env_helper, gen_helper_VMSUMSHS) -GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23) +TRANS_FLAGS(ALTIVEC, VMHADDSHS, do_va_env_helper, gen_helper_VMHADDSHS) +TRANS_FLAGS(ALTIVEC, VMHRADDSHS, do_va_env_helper, gen_helper_VMHRADDSHS) GEN_VXFORM_NOA(vclzb, 1, 28) GEN_VXFORM_NOA(vclzh, 1, 29) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index a3a0fd0650..7cd9d40e06 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -219,7 +219,6 @@ GEN_VXFORM_UIMM(vctsxs, 5, 15), #define GEN_VAFORM_PAIRED(name0, name1, opc2) \ GEN_HANDLER(name0##_##name1, 0x04, opc2, 0xFF, 0x00000000, PPC_ALTIVEC) -GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16), GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23), GEN_VXFORM_DUAL(vclzb, vpopcntb, 1, 28, PPC_NONE, PPC2_ALTIVEC_207), From patchwork Wed Oct 19 12:50:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7701AC433FE for ; Wed, 19 Oct 2022 13:20:01 +0000 (UTC) Received: from localhost ([::1]:39146 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8zI-000081-6u for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:20:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45560) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8YT-0005An-3q; Wed, 19 Oct 2022 08:52:17 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8YQ-00025u-Jr; Wed, 19 Oct 2022 08:52:16 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:43 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 01E3F800150; Wed, 19 Oct 2022 09:50:43 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 03/12] target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec Date: Wed, 19 Oct 2022 09:50:31 -0300 Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:43.0312 (UTC) FILETIME=[665AEF00:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being changing the -1 (aka all bits set to 1) result returned by cmp when true to +1. It also implemented a .fni4 version of those instructions and dropped the helper. vaddcuw: rept loop master patch 8 12500 0,01008200 0,00612400 (-39.3%) 25 4000 0,01091500 0,00471600 (-56.8%) 100 1000 0,01332500 0,00593700 (-55.4%) 500 200 0,01998500 0,01275700 (-36.2%) 2500 40 0,04704300 0,04364300 (-7.2%) 8000 12 0,10748200 0,11241000 (+4.6%) vsubcuw: rept loop master patch 8 12500 0,01226200 0,00571600 (-53.4%) 25 4000 0,01493500 0,00462100 (-69.1%) 100 1000 0,01522700 0,00455100 (-70.1%) 500 200 0,02384600 0,01133500 (-52.5%) 2500 40 0,04935200 0,03178100 (-35.6%) 8000 12 0,09039900 0,09440600 (+4.4%) Overall there was a gain in performance, but the TCGop code was still slightly bigger in the new version (it went from 4 to 5). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc/insn32.decode | 2 + target/ppc/int_helper.c | 18 --------- target/ppc/translate/vmx-impl.c.inc | 61 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 +- 5 files changed, 60 insertions(+), 26 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index f02a9497b7..f7047ed2aa 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -193,11 +193,9 @@ DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vaddcuw, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_3(vsubcuw, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 9a509e84df..aebc7b73c8 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -608,12 +608,14 @@ VRLQNM 000100 ..... ..... ..... 00101000101 @VX ## Vector Integer Arithmetic Instructions +VADDCUW 000100 ..... ..... ..... 00110000000 @VX VADDCUQ 000100 ..... ..... ..... 00101000000 @VX VADDUQM 000100 ..... ..... ..... 00100000000 @VX VADDEUQM 000100 ..... ..... ..... ..... 111100 @VA VADDECUQ 000100 ..... ..... ..... ..... 111101 @VA +VSUBCUW 000100 ..... ..... ..... 10110000000 @VX VSUBCUQ 000100 ..... ..... ..... 10101000000 @VX VSUBUQM 000100 ..... ..... ..... 10100000000 @VX diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index ae1ba8084d..f8dd12e8ae 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -492,15 +492,6 @@ static inline void set_vscr_sat(CPUPPCState *env) env->vscr_sat.u32[0] = 1; } -void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - r->u32[i] = ~a->u32[i] < b->u32[i]; - } -} - /* vprtybw */ void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b) { @@ -1962,15 +1953,6 @@ void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) #endif } -void helper_vsubcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - r->u32[i] = a->u32[i] >= b->u32[i]; - } -} - void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) { int64_t t; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 3acd585a2f..f52485a5f1 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -803,8 +803,6 @@ GEN_VXFORM(vsrv, 2, 28); GEN_VXFORM(vslv, 2, 29); GEN_VXFORM(vslo, 6, 16); GEN_VXFORM(vsro, 6, 17); -GEN_VXFORM(vaddcuw, 0, 6); -GEN_VXFORM(vsubcuw, 0, 22); static bool do_vector_gvec3_VX(DisasContext *ctx, arg_VX *a, int vece, void (*gen_gvec)(unsigned, uint32_t, uint32_t, @@ -2847,8 +2845,6 @@ static void gen_xpnd04_2(DisasContext *ctx) } -GEN_VXFORM_DUAL(vsubcuw, PPC_ALTIVEC, PPC_NONE, \ - xpnd04_1, PPC_NONE, PPC2_ISA300) GEN_VXFORM_DUAL(vsubsws, PPC_ALTIVEC, PPC_NONE, \ xpnd04_2, PPC_NONE, PPC2_ISA300) @@ -3110,6 +3106,63 @@ TRANS_FLAGS2(ALTIVEC_207, VPMSUMD, do_vx_helper, gen_helper_VPMSUMD) TRANS_FLAGS2(ALTIVEC_207, VSUBCUQ, do_vx_helper, gen_helper_VSUBCUQ) TRANS_FLAGS2(ALTIVEC_207, VSUBUQM, do_vx_helper, gen_helper_VSUBUQM) +static void gen_VADDCUW_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_not_vec(vece, a, a); + tcg_gen_cmp_vec(TCG_COND_LTU, vece, t, a, b); + tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(t, vece, 1)); +} + +static void gen_VADDCUW_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_not_i32(a, a); + tcg_gen_setcond_i32(TCG_COND_LTU, t, a, b); +} + +static void gen_VSUBCUW_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_cmp_vec(TCG_COND_GEU, vece, t, a, b); + tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(t, vece, 1)); +} + +static void gen_VSUBCUW_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_setcond_i32(TCG_COND_GEU, t, a, b); +} + +static bool do_vx_vaddsubcuw(DisasContext *ctx, arg_VX *a, int add) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_cmp_vec, 0 + }; + + static const GVecGen3 op[] = { + { + .fniv = gen_VSUBCUW_vec, + .fni4 = gen_VSUBCUW_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fniv = gen_VADDCUW_vec, + .fni4 = gen_VADDCUW_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + }; + + REQUIRE_INSNS_FLAGS(ctx, ALTIVEC); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[add]); + + return true; +} + +TRANS(VSUBCUW, do_vx_vaddsubcuw, 0) +TRANS(VADDCUW, do_vx_vaddsubcuw, 1) + static bool do_vx_vmuleo(DisasContext *ctx, arg_VX *a, bool even, void (*gen_mul)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 7cd9d40e06..ded0234123 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -106,12 +106,11 @@ GEN_VXFORM_300(vsrv, 2, 28), GEN_VXFORM_300(vslv, 2, 29), GEN_VXFORM(vslo, 6, 16), GEN_VXFORM(vsro, 6, 17), -GEN_VXFORM(vaddcuw, 0, 6), GEN_HANDLER_E_2(vprtybw, 0x4, 0x1, 0x18, 8, 0, PPC_NONE, PPC2_ISA300), GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, PPC2_ISA300), GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300), -GEN_VXFORM_DUAL(vsubcuw, xpnd04_1, 0, 22, PPC_ALTIVEC, PPC_NONE), +GEN_VXFORM(xpnd04_1, 0, 22), GEN_VXFORM_300(bcdsr, 0, 23), GEN_VXFORM_300(bcdsr, 0, 31), GEN_VXFORM_DUAL(vaddubs, vmul10uq, 0, 8, PPC_ALTIVEC, PPC_NONE), From patchwork Wed Oct 19 12:50:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011715 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5E0F9C433FE for ; Wed, 19 Oct 2022 13:03:39 +0000 (UTC) Received: from localhost ([::1]:36132 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8jR-0002R0-Pv for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:03:38 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45562) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8YW-0005LK-4n; Wed, 19 Oct 2022 08:52:20 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8YU-00025u-6m; Wed, 19 Oct 2022 08:52:19 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:43 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 38839800278; Wed, 19 Oct 2022 09:50:43 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 04/12] target/ppc: Move VNEG[WD] to decodtree and use gvec Date: Wed, 19 Oct 2022 09:50:32 -0300 Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:43.0562 (UTC) FILETIME=[668114A0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it. vnegw: rept loop master patch 8 12500 0,01053200 0,00548400 (-47.9%) 25 4000 0,01030500 0,00390000 (-62.2%) 100 1000 0,01096300 0,00395400 (-63.9%) 500 200 0,01472000 0,00712300 (-51.6%) 2500 40 0,03809000 0,02147700 (-43.6%) 8000 12 0,09957100 0,06202100 (-37.7%) vnegd: rept loop master patch 8 12500 0,00594600 0,00543800 (-8.5%) 25 4000 0,00575200 0,00396400 (-31.1%) 100 1000 0,00676100 0,00394800 (-41.6%) 500 200 0,01149300 0,00709400 (-38.3%) 2500 40 0,03441500 0,02169600 (-37.0%) 8000 12 0,09516900 0,06337000 (-33.4%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 -- target/ppc/insn32.decode | 3 +++ target/ppc/int_helper.c | 12 ------------ target/ppc/translate/vmx-impl.c.inc | 15 +++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 2 -- 5 files changed, 16 insertions(+), 18 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index f7047ed2aa..b2e910b089 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -229,8 +229,6 @@ DEF_HELPER_FLAGS_2(VSTRIBL, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIBR, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIHL, TCG_CALL_NO_RWG, i32, avr, avr) DEF_HELPER_FLAGS_2(VSTRIHR, TCG_CALL_NO_RWG, i32, avr, avr) -DEF_HELPER_FLAGS_2(vnegw, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vnegd, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupkhpx, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupklpx, TCG_CALL_NO_RWG, void, avr, avr) DEF_HELPER_FLAGS_2(vupkhsb, TCG_CALL_NO_RWG, void, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index aebc7b73c8..2658dd3395 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -629,6 +629,9 @@ VEXTSH2D 000100 ..... 11001 ..... 11000000010 @VX_tb VEXTSW2D 000100 ..... 11010 ..... 11000000010 @VX_tb VEXTSD2Q 000100 ..... 11011 ..... 11000000010 @VX_tb +VNEGD 000100 ..... 00111 ..... 11000000010 @VX_tb +VNEGW 000100 ..... 00110 ..... 11000000010 @VX_tb + ## Vector Mask Manipulation Instructions MTVSRBM 000100 ..... 10000 ..... 11001000010 @VX_tb diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index f8dd12e8ae..c7fd0d1faa 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1928,18 +1928,6 @@ XXBLEND(W, 32) XXBLEND(D, 64) #undef XXBLEND -#define VNEG(name, element) \ -void helper_##name(ppc_avr_t *r, ppc_avr_t *b) \ -{ \ - int i; \ - for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ - r->element[i] = -b->element[i]; \ - } \ -} -VNEG(vnegw, s32) -VNEG(vnegd, s64) -#undef VNEG - void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) { int sh = (b->VsrB(0xf) >> 3) & 0xf; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index f52485a5f1..b9a9e83ab3 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -2625,8 +2625,19 @@ GEN_VXFORM_NOA(vclzb, 1, 28) GEN_VXFORM_NOA(vclzh, 1, 29) GEN_VXFORM_TRANS(vclzw, 1, 30) GEN_VXFORM_TRANS(vclzd, 1, 31) -GEN_VXFORM_NOA_2(vnegw, 1, 24, 6) -GEN_VXFORM_NOA_2(vnegd, 1, 24, 7) + +static bool do_vneg(DisasContext *ctx, arg_VX_tb *a, unsigned vece) +{ + REQUIRE_INSNS_FLAGS2(ctx, ISA300); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_neg(vece, avr_full_offset(a->vrt), avr_full_offset(a->vrb), + 16, 16); + return true; +} + +TRANS(VNEGW, do_vneg, MO_32) +TRANS(VNEGD, do_vneg, MO_64) static void gen_vexts_i64(TCGv_i64 t, TCGv_i64 b, int64_t s) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index ded0234123..27908533dd 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -181,8 +181,6 @@ GEN_VXFORM_300_EXT(vextractd, 6, 11, 0x100000), GEN_VXFORM(vspltisb, 6, 12), GEN_VXFORM(vspltish, 6, 13), GEN_VXFORM(vspltisw, 6, 14), -GEN_VXFORM_300_EO(vnegw, 0x01, 0x18, 0x06), -GEN_VXFORM_300_EO(vnegd, 0x01, 0x18, 0x07), GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C), GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D), GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E), From patchwork Wed Oct 19 12:50:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 110B1C43217 for ; Wed, 19 Oct 2022 13:04:17 +0000 (UTC) Received: from localhost ([::1]:36164 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8k3-0002ku-N5 for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:04:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45564) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8Yb-0005cf-GN; Wed, 19 Oct 2022 08:52:25 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8YX-00025u-9N; Wed, 19 Oct 2022 08:52:25 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:43 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 6E78F800150; Wed, 19 Oct 2022 09:50:43 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 05/12] target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec Date: Wed, 19 Oct 2022 09:50:33 -0300 Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:43.0796 (UTC) FILETIME=[66A4C940:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respectively. vprtybw: rept loop master patch 8 12500 0,01198900 0,00703100 (-41.4%) 25 4000 0,01070100 0,00571400 (-46.6%) 100 1000 0,01123300 0,00678200 (-39.6%) 500 200 0,01601500 0,01535600 (-4.1%) 2500 40 0,03872900 0,05562100 (43.6%) 8000 12 0,10047000 0,16643000 (65.7%) vprtybd: rept loop master patch 8 12500 0,00757700 0,00788100 (4.0%) 25 4000 0,00652500 0,00669600 (2.6%) 100 1000 0,00714400 0,00825400 (15.5%) 500 200 0,01211000 0,01903700 (57.2%) 2500 40 0,03483800 0,07021200 (101.5%) 8000 12 0,09591800 0,21036200 (119.3%) vprtybq: rept loop master patch 8 12500 0,00675600 0,00667200 (-1.2%) 25 4000 0,00619400 0,00643200 (3.8%) 100 1000 0,00707100 0,00751100 (6.2%) 500 200 0,01199300 0,01342000 (11.9%) 2500 40 0,03490900 0,04092900 (17.2%) 8000 12 0,09588200 0,11465100 (19.6%) I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ, I'm not sure if it's worth to move those instructions. Comparing the assembly of the helper with the TCGop they are pretty similar, so I'm not sure why vprtybd took so much more time. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 4 +- target/ppc/insn32.decode | 4 ++ target/ppc/int_helper.c | 25 +---------- target/ppc/translate/vmx-impl.c.inc | 68 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 -- 5 files changed, 71 insertions(+), 33 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index b2e910b089..a06193bc67 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -193,9 +193,7 @@ DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr) -DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr) +DEF_HELPER_FLAGS_3(VPRTYBQ, TCG_CALL_NO_RWG, void, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 2658dd3395..aa4968e6b9 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -529,6 +529,10 @@ VCTZDM 000100 ..... ..... ..... 11111000100 @VX VPDEPD 000100 ..... ..... ..... 10111001101 @VX VPEXTD 000100 ..... ..... ..... 10110001101 @VX +VPRTYBD 000100 ..... 01001 ..... 11000000010 @VX_tb +VPRTYBQ 000100 ..... 01010 ..... 11000000010 @VX_tb +VPRTYBW 000100 ..... 01000 ..... 11000000010 @VX_tb + ## Vector Permute and Formatting Instruction VEXTDUBVLX 000100 ..... ..... ..... ..... 011000 @VA diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index c7fd0d1faa..c6ce4665fa 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -492,31 +492,8 @@ static inline void set_vscr_sat(CPUPPCState *env) env->vscr_sat.u32[0] = 1; } -/* vprtybw */ -void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b) -{ - int i; - for (i = 0; i < ARRAY_SIZE(r->u32); i++) { - uint64_t res = b->u32[i] ^ (b->u32[i] >> 16); - res ^= res >> 8; - r->u32[i] = res & 1; - } -} - -/* vprtybd */ -void helper_vprtybd(ppc_avr_t *r, ppc_avr_t *b) -{ - int i; - for (i = 0; i < ARRAY_SIZE(r->u64); i++) { - uint64_t res = b->u64[i] ^ (b->u64[i] >> 32); - res ^= res >> 16; - res ^= res >> 8; - r->u64[i] = res & 1; - } -} - /* vprtybq */ -void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b) +void helper_VPRTYBQ(ppc_avr_t *r, ppc_avr_t *b, uint32_t v) { uint64_t res = b->u64[0] ^ b->u64[1]; res ^= res >> 32; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index b9a9e83ab3..cbb2a3ebe7 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -1659,9 +1659,71 @@ GEN_VXFORM_NOA_ENV(vrfim, 5, 11); GEN_VXFORM_NOA_ENV(vrfin, 5, 8); GEN_VXFORM_NOA_ENV(vrfip, 5, 10); GEN_VXFORM_NOA_ENV(vrfiz, 5, 9); -GEN_VXFORM_NOA(vprtybw, 1, 24); -GEN_VXFORM_NOA(vprtybd, 1, 24); -GEN_VXFORM_NOA(vprtybq, 1, 24); + +static void gen_vprtyb_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + int i; + TCGv_vec tmp = tcg_temp_new_vec_matching(b); + /* MO_32 is 2, so 2 iteractions for MO_32 and 3 for MO_64 */ + for (i = 0; i < vece; i++) { + tcg_gen_shri_vec(vece, tmp, b, (4 << (vece - i))); + tcg_gen_xor_vec(vece, b, tmp, b); + } + tcg_gen_and_vec(vece, t, b, tcg_constant_vec_matching(t, vece, 1)); + tcg_temp_free_vec(tmp); +} + +/* vprtybw */ +static void gen_vprtyb_i32(TCGv_i32 t, TCGv_i32 b) +{ + tcg_gen_ctpop_i32(t, b); + tcg_gen_and_i32(t, t, tcg_constant_i32(1)); +} + +/* vprtybd */ +static void gen_vprtyb_i64(TCGv_i64 t, TCGv_i64 b) +{ + tcg_gen_ctpop_i64(t, b); + tcg_gen_and_i64(t, t, tcg_constant_i64(1)); +} + +static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned vece) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_shri_vec, 0 + }; + + static const GVecGen2 op[] = { + { + .fniv = gen_vprtyb_vec, + .fni4 = gen_vprtyb_i32, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fniv = gen_vprtyb_vec, + .fni8 = gen_vprtyb_i64, + .opt_opc = vecop_list, + .vece = MO_64 + }, + { + .fno = gen_helper_VPRTYBQ, + .vece = MO_128 + }, + }; + + REQUIRE_INSNS_FLAGS2(ctx, ISA300); + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_2(avr_full_offset(a->vrt), avr_full_offset(a->vrb), + 16, 16, &op[vece - MO_32]); + + return true; +} + +TRANS(VPRTYBW, do_vx_vprtyb, MO_32) +TRANS(VPRTYBD, do_vx_vprtyb, MO_64) +TRANS(VPRTYBQ, do_vx_vprtyb, MO_128) static void gen_vsplt(DisasContext *ctx, int vece) { diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 27908533dd..46a620a232 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -106,9 +106,6 @@ GEN_VXFORM_300(vsrv, 2, 28), GEN_VXFORM_300(vslv, 2, 29), GEN_VXFORM(vslo, 6, 16), GEN_VXFORM(vsro, 6, 17), -GEN_HANDLER_E_2(vprtybw, 0x4, 0x1, 0x18, 8, 0, PPC_NONE, PPC2_ISA300), -GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, PPC2_ISA300), -GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300), GEN_VXFORM(xpnd04_1, 0, 22), GEN_VXFORM_300(bcdsr, 0, 23), From patchwork Wed Oct 19 12:50:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011719 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 91C14C433FE for ; Wed, 19 Oct 2022 13:05:26 +0000 (UTC) Received: from localhost ([::1]:51084 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8lA-0003di-6x for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:05:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57272) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8Yn-0006Jq-TR; Wed, 19 Oct 2022 08:52:38 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8Yc-00025u-If; Wed, 19 Oct 2022 08:52:37 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:44 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id B1FFA800278; Wed, 19 Oct 2022 09:50:43 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 06/12] target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec Date: Wed, 19 Oct 2022 09:50:34 -0300 Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:44.0062 (UTC) FILETIME=[66CD5FE0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift had to be made before the sum as to avoid an overflow, so add 1 at the end if any of the entries had 1 in its LSB as to replicate the "+ 1" before the shift described by the ISA. vavgub: rept loop master patch 8 12500 0,02616600 0,00754200 (-71.2%) 25 4000 0,02530000 0,00637700 (-74.8%) 100 1000 0,02604600 0,00790100 (-69.7%) 500 200 0,03189300 0,01838400 (-42.4%) 2500 40 0,06006900 0,06851000 (+14.1%) 8000 12 0,13941000 0,20548500 (+47.4%) vavguh: rept loop master patch 8 12500 0,01818200 0,00780600 (-57.1%) 25 4000 0,01789300 0,00641600 (-64.1%) 100 1000 0,01899100 0,00787200 (-58.5%) 500 200 0,02527200 0,01828400 (-27.7%) 2500 40 0,05361800 0,06773000 (+26.3%) 8000 12 0,12886600 0,20291400 (+57.5%) vavguw: rept loop master patch 8 12500 0,01423100 0,00776600 (-45.4%) 25 4000 0,01780800 0,00638600 (-64.1%) 100 1000 0,02085500 0,00787000 (-62.3%) 500 200 0,02737100 0,01828800 (-33.2%) 2500 40 0,05572600 0,06774200 (+21.6%) 8000 12 0,13101700 0,20311600 (+55.0%) vavgsb: rept loop master patch 8 12500 0,03006000 0,00788600 (-73.8%) 25 4000 0,02882200 0,00637800 (-77.9%) 100 1000 0,02958000 0,00791400 (-73.2%) 500 200 0,03548800 0,01860400 (-47.6%) 2500 40 0,06360000 0,06850800 (+7.7%) 8000 12 0,13816500 0,20550300 (+48.7%) vavgsh: rept loop master patch 8 12500 0,01965900 0,00776600 (-60.5%) 25 4000 0,01875400 0,00638700 (-65.9%) 100 1000 0,01952200 0,00786900 (-59.7%) 500 200 0,02562000 0,01760300 (-31.3%) 2500 40 0,05384300 0,06742800 (+25.2%) 8000 12 0,13240800 0,20330000 (+53.5%) vavgsw: rept loop master patch 8 12500 0,01407700 0,00775600 (-44.9%) 25 4000 0,01762300 0,00640000 (-63.7%) 100 1000 0,02046500 0,00788500 (-61.5%) 500 200 0,02745600 0,01843000 (-32.9%) 2500 40 0,05375500 0,06820500 (+26.9%) 8000 12 0,13068300 0,20304900 (+55.4%) These results to me seems to indicate that with gvec the results have a slower translation but faster execution. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 12 ++-- target/ppc/insn32.decode | 9 +++ target/ppc/int_helper.c | 32 ++++----- target/ppc/translate/vmx-impl.c.inc | 106 ++++++++++++++++++++++++---- target/ppc/translate/vmx-ops.c.inc | 9 +-- 5 files changed, 127 insertions(+), 41 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index a06193bc67..71c22efc2e 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -143,15 +143,15 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64) #define dh_ctype_acc ppc_acc_t * #define dh_typecode_acc dh_typecode_ptr -DEF_HELPER_FLAGS_3(vavgub, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavguh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavguw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VAVGUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_3(vabsdub, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vabsduh, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(vabsduw, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsb, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vavgsw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VAVGSB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGSH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VAVGSW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_4(vcmpeqfp, void, env, avr, avr, avr) DEF_HELPER_4(vcmpgefp, void, env, avr, avr, avr) DEF_HELPER_4(vcmpgtfp, void, env, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index aa4968e6b9..38458c01de 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -519,6 +519,15 @@ VCMPNEZW 000100 ..... ..... ..... . 0110000111 @VC VCMPSQ 000100 ... -- ..... ..... 00101000001 @VX_bf VCMPUQ 000100 ... -- ..... ..... 00100000001 @VX_bf +## Vector Integer Average Instructions + +VAVGSB 000100 ..... ..... ..... 10100000010 @VX +VAVGSH 000100 ..... ..... ..... 10101000010 @VX +VAVGSW 000100 ..... ..... ..... 10110000010 @VX +VAVGUB 000100 ..... ..... ..... 10000000010 @VX +VAVGUH 000100 ..... ..... ..... 10001000010 @VX +VAVGUW 000100 ..... ..... ..... 10010000010 @VX + ## Vector Bit Manipulation Instruction VGNB 000100 ..... -- ... ..... 10011001100 @VX_n diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index c6ce4665fa..bda76e54d4 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -570,25 +570,23 @@ VARITHSAT_UNSIGNED(w, u32, uint64_t, cvtsduw) #undef VARITHSAT_SIGNED #undef VARITHSAT_UNSIGNED -#define VAVG_DO(name, element, etype) \ - void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ - { \ - int i; \ - \ - for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ - etype x = (etype)a->element[i] + (etype)b->element[i] + 1; \ - r->element[i] = x >> 1; \ - } \ +#define VAVG(name, element, etype) \ + void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t v)\ + { \ + int i; \ + \ + for (i = 0; i < ARRAY_SIZE(r->element); i++) { \ + etype x = (etype)a->element[i] + (etype)b->element[i] + 1; \ + r->element[i] = x >> 1; \ + } \ } -#define VAVG(type, signed_element, signed_type, unsigned_element, \ - unsigned_type) \ - VAVG_DO(avgs##type, signed_element, signed_type) \ - VAVG_DO(avgu##type, unsigned_element, unsigned_type) -VAVG(b, s8, int16_t, u8, uint16_t) -VAVG(h, s16, int32_t, u16, uint32_t) -VAVG(w, s32, int64_t, u32, uint64_t) -#undef VAVG_DO +VAVG(VAVGSB, s8, int16_t) +VAVG(VAVGUB, u8, uint16_t) +VAVG(VAVGSH, s16, int32_t) +VAVG(VAVGUH, u16, uint32_t) +VAVG(VAVGSW, s32, int64_t) +VAVG(VAVGUW, u32, uint64_t) #undef VAVG #define VABSDU_DO(name, element) \ diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index cbb2a3ebe7..195c601f7a 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -431,21 +431,9 @@ GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12); GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13); GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14); GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15); -GEN_VXFORM(vavgub, 1, 16); GEN_VXFORM(vabsdub, 1, 16); -GEN_VXFORM_DUAL(vavgub, PPC_ALTIVEC, PPC_NONE, \ - vabsdub, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavguh, 1, 17); GEN_VXFORM(vabsduh, 1, 17); -GEN_VXFORM_DUAL(vavguh, PPC_ALTIVEC, PPC_NONE, \ - vabsduh, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavguw, 1, 18); GEN_VXFORM(vabsduw, 1, 18); -GEN_VXFORM_DUAL(vavguw, PPC_ALTIVEC, PPC_NONE, \ - vabsduw, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vavgsb, 1, 20); -GEN_VXFORM(vavgsh, 1, 21); -GEN_VXFORM(vavgsw, 1, 22); GEN_VXFORM(vmrghb, 6, 0); GEN_VXFORM(vmrghh, 6, 1); GEN_VXFORM(vmrghw, 6, 2); @@ -3373,6 +3361,100 @@ TRANS(VMULHSD, do_vx_mulh, true , do_vx_vmulhd_i64) TRANS(VMULHUW, do_vx_mulh, false, do_vx_vmulhw_i64) TRANS(VMULHUD, do_vx_mulh, false, do_vx_vmulhd_i64) +static void do_vavg(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b, + void (*gen_shr_vec)(unsigned, TCGv_vec, TCGv_vec, int64_t)) +{ + TCGv_vec tmp = tcg_temp_new_vec_matching(t); + tcg_gen_or_vec(vece, tmp, a, b); + tcg_gen_and_vec(vece, tmp, tmp, tcg_constant_vec_matching(t, vece, 1)); + gen_shr_vec(vece, a, a, 1); + gen_shr_vec(vece, b, b, 1); + tcg_gen_add_vec(vece, t, a, b); + tcg_gen_add_vec(vece, t, t, tmp); + tcg_temp_free_vec(tmp); +} + +QEMU_FLATTEN +static void gen_vavgu(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + do_vavg(vece, t, a, b, tcg_gen_shri_vec); +} + +QEMU_FLATTEN +static void gen_vavgs(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + do_vavg(vece, t, a, b, tcg_gen_sari_vec); +} + +static bool do_vx_vavg(DisasContext *ctx, arg_VX *a, int sign, int vece) +{ + static const TCGOpcode vecop_list_s[] = { + INDEX_op_add_vec, INDEX_op_sari_vec, 0 + }; + static const TCGOpcode vecop_list_u[] = { + INDEX_op_add_vec, INDEX_op_shri_vec, 0 + }; + + static const GVecGen3 op[2][3] = { + { + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUB, + .opt_opc = vecop_list_u, + .vece = MO_8 + }, + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUH, + .opt_opc = vecop_list_u, + .vece = MO_16 + }, + { + .fniv = gen_vavgu, + .fno = gen_helper_VAVGUW, + .opt_opc = vecop_list_u, + .vece = MO_32 + }, + }, + { + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSB, + .opt_opc = vecop_list_s, + .vece = MO_8 + }, + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSH, + .opt_opc = vecop_list_s, + .vece = MO_16 + }, + { + .fniv = gen_vavgs, + .fno = gen_helper_VAVGSW, + .opt_opc = vecop_list_s, + .vece = MO_32 + }, + }, + }; + + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[sign][vece]); + + + return true; +} + + +TRANS_FLAGS(ALTIVEC, VAVGSB, do_vx_vavg, 1, MO_8) +TRANS_FLAGS(ALTIVEC, VAVGSH, do_vx_vavg, 1, MO_16) +TRANS_FLAGS(ALTIVEC, VAVGSW, do_vx_vavg, 1, MO_32) +TRANS_FLAGS(ALTIVEC, VAVGUB, do_vx_vavg, 0, MO_8) +TRANS_FLAGS(ALTIVEC, VAVGUH, do_vx_vavg, 0, MO_16) +TRANS_FLAGS(ALTIVEC, VAVGUW, do_vx_vavg, 0, MO_32) + static bool do_vdiv_vmod(DisasContext *ctx, arg_VX *a, const int vece, void (*func_32)(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b), void (*func_64)(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 46a620a232..02db51def0 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -83,12 +83,9 @@ GEN_VXFORM(vminsb, 1, 12), GEN_VXFORM(vminsh, 1, 13), GEN_VXFORM(vminsw, 1, 14), GEN_VXFORM_207(vminsd, 1, 15), -GEN_VXFORM_DUAL(vavgub, vabsdub, 1, 16, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM_DUAL(vavguh, vabsduh, 1, 17, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM_DUAL(vavguw, vabsduw, 1, 18, PPC_ALTIVEC, PPC_NONE), -GEN_VXFORM(vavgsb, 1, 20), -GEN_VXFORM(vavgsh, 1, 21), -GEN_VXFORM(vavgsw, 1, 22), +GEN_VXFORM(vabsdub, 1, 16), +GEN_VXFORM(vabsduh, 1, 17), +GEN_VXFORM(vabsduw, 1, 18), GEN_VXFORM(vmrghb, 6, 0), GEN_VXFORM(vmrghh, 6, 1), GEN_VXFORM(vmrghw, 6, 2), From patchwork Wed Oct 19 12:50:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011720 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A20C6C4332F for ; Wed, 19 Oct 2022 13:06:15 +0000 (UTC) Received: from localhost ([::1]:51100 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8ly-0003mc-IV for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:06:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57278) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8Yr-0006MF-2A; Wed, 19 Oct 2022 08:52:42 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8Yp-00025u-3S; Wed, 19 Oct 2022 08:52:40 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:44 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id EC61A800150; Wed, 19 Oct 2022 09:50:43 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 07/12] target/ppc: Move VABSDU[BHW] to decodetree and use gvec Date: Wed, 19 Oct 2022 09:50:35 -0300 Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:44.0250 (UTC) FILETIME=[66EA0FA0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them. vabsdub: rept loop master patch 8 12500 0,03601600 0,00688500 (-80.9%) 25 4000 0,03651000 0,00532100 (-85.4%) 100 1000 0,03666900 0,00595300 (-83.8%) 500 200 0,04305800 0,01244600 (-71.1%) 2500 40 0,06893300 0,04273700 (-38.0%) 8000 12 0,14633200 0,12660300 (-13.5%) vabsduh: rept loop master patch 8 12500 0,02172400 0,00687500 (-68.4%) 25 4000 0,02154100 0,00531500 (-75.3%) 100 1000 0,02235400 0,00596300 (-73.3%) 500 200 0,02827500 0,01245100 (-56.0%) 2500 40 0,05638400 0,04285500 (-24.0%) 8000 12 0,13166000 0,12641400 (-4.0%) vabsduw: rept loop master patch 8 12500 0,01646400 0,00688300 (-58.2%) 25 4000 0,01454500 0,00475500 (-67.3%) 100 1000 0,01545800 0,00511800 (-66.9%) 500 200 0,02168200 0,01114300 (-48.6%) 2500 40 0,04571300 0,04138800 (-9.5%) 8000 12 0,12209500 0,12178500 (-0.3%) Same as VADDCUW and VSUBCUW, overall performance gain but it uses more TCGop (4 before the patch, 6 after). Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/helper.h | 6 ++-- target/ppc/insn32.decode | 6 ++++ target/ppc/int_helper.c | 13 +++----- target/ppc/translate/vmx-impl.c.inc | 49 +++++++++++++++++++++++++++-- target/ppc/translate/vmx-ops.c.inc | 3 -- 5 files changed, 60 insertions(+), 17 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 71c22efc2e..fd8280dfa7 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -146,9 +146,9 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64) DEF_HELPER_FLAGS_4(VAVGUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) -DEF_HELPER_FLAGS_3(vabsdub, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vabsduh, TCG_CALL_NO_RWG, void, avr, avr, avr) -DEF_HELPER_FLAGS_3(vabsduw, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_4(VABSDUB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VABSDUH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) +DEF_HELPER_FLAGS_4(VABSDUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSB, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSH, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) DEF_HELPER_FLAGS_4(VAVGSW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 38458c01de..ae151c4b62 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -528,6 +528,12 @@ VAVGUB 000100 ..... ..... ..... 10000000010 @VX VAVGUH 000100 ..... ..... ..... 10001000010 @VX VAVGUW 000100 ..... ..... ..... 10010000010 @VX +## Vector Integer Absolute Difference Instructions + +VABSDUB 000100 ..... ..... ..... 10000000011 @VX +VABSDUH 000100 ..... ..... ..... 10001000011 @VX +VABSDUW 000100 ..... ..... ..... 10010000011 @VX + ## Vector Bit Manipulation Instruction VGNB 000100 ..... -- ... ..... 10011001100 @VX_n diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index bda76e54d4..d97a7f1f28 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -589,8 +589,8 @@ VAVG(VAVGSW, s32, int64_t) VAVG(VAVGUW, u32, uint64_t) #undef VAVG -#define VABSDU_DO(name, element) \ -void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ +#define VABSDU(name, element) \ +void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t v)\ { \ int i; \ \ @@ -606,12 +606,9 @@ void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \ * name - instruction mnemonic suffix (b: byte, h: halfword, w: word) * element - element type to access from vector */ -#define VABSDU(type, element) \ - VABSDU_DO(absdu##type, element) -VABSDU(b, u8) -VABSDU(h, u16) -VABSDU(w, u32) -#undef VABSDU_DO +VABSDU(VABSDUB, u8) +VABSDU(VABSDUH, u16) +VABSDU(VABSDUW, u32) #undef VABSDU #define VCF(suffix, cvt, element) \ diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 195c601f7a..7741f2eb49 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -431,9 +431,6 @@ GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12); GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13); GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14); GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15); -GEN_VXFORM(vabsdub, 1, 16); -GEN_VXFORM(vabsduh, 1, 17); -GEN_VXFORM(vabsduw, 1, 18); GEN_VXFORM(vmrghb, 6, 0); GEN_VXFORM(vmrghh, 6, 1); GEN_VXFORM(vmrghw, 6, 2); @@ -3455,6 +3452,52 @@ TRANS_FLAGS(ALTIVEC, VAVGUB, do_vx_vavg, 0, MO_8) TRANS_FLAGS(ALTIVEC, VAVGUH, do_vx_vavg, 0, MO_16) TRANS_FLAGS(ALTIVEC, VAVGUW, do_vx_vavg, 0, MO_32) +static void gen_vabsdu(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_umax_vec(vece, t, a, b); + tcg_gen_umin_vec(vece, a, a, b); + tcg_gen_sub_vec(vece, t, t, a); +} + +static bool do_vabsdu(DisasContext *ctx, arg_VX *a, const int vece) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_umax_vec, INDEX_op_umin_vec, INDEX_op_sub_vec, 0 + }; + + static const GVecGen3 op[] = { + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUB, + .opt_opc = vecop_list, + .vece = MO_8 + }, + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUH, + .opt_opc = vecop_list, + .vece = MO_16 + }, + { + .fniv = gen_vabsdu, + .fno = gen_helper_VABSDUW, + .opt_opc = vecop_list, + .vece = MO_32 + }, + }; + + REQUIRE_VECTOR(ctx); + + tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra), + avr_full_offset(a->vrb), 16, 16, &op[vece]); + + return true; +} + +TRANS_FLAGS2(ISA300, VABSDUB, do_vabsdu, MO_8) +TRANS_FLAGS2(ISA300, VABSDUH, do_vabsdu, MO_16) +TRANS_FLAGS2(ISA300, VABSDUW, do_vabsdu, MO_32) + static bool do_vdiv_vmod(DisasContext *ctx, arg_VX *a, const int vece, void (*func_32)(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b), void (*func_64)(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)) diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc index 02db51def0..33fec8aca4 100644 --- a/target/ppc/translate/vmx-ops.c.inc +++ b/target/ppc/translate/vmx-ops.c.inc @@ -83,9 +83,6 @@ GEN_VXFORM(vminsb, 1, 12), GEN_VXFORM(vminsh, 1, 13), GEN_VXFORM(vminsw, 1, 14), GEN_VXFORM_207(vminsd, 1, 15), -GEN_VXFORM(vabsdub, 1, 16), -GEN_VXFORM(vabsduh, 1, 17), -GEN_VXFORM(vabsduw, 1, 18), GEN_VXFORM(vmrghb, 6, 0), GEN_VXFORM(vmrghh, 6, 1), GEN_VXFORM(vmrghw, 6, 2), From patchwork Wed Oct 19 12:50:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6EA0EC4332F for ; Wed, 19 Oct 2022 13:07:58 +0000 (UTC) Received: from localhost ([::1]:52868 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8nc-0007AH-6I for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:07:56 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57282) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8Yu-0006Qs-Q3; Wed, 19 Oct 2022 08:52:46 -0400 Received: from [200.168.210.66] (port=3639 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8Ys-00025u-47; Wed, 19 Oct 2022 08:52:44 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:44 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 29B35800278; Wed, 19 Oct 2022 09:50:44 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 08/12] target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]P Date: Wed, 19 Oct 2022 09:50:36 -0300 Message-Id: <20221019125040.48028-9-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:44.0531 (UTC) FILETIME=[6714F030:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to decodetree and used gvec to translate them. xvabssp: rept loop master patch 8 12500 0,00477900 0,00476000 (-0.4%) 25 4000 0,00442800 0,00353300 (-20.2%) 100 1000 0,00478700 0,00366100 (-23.5%) 500 200 0,00973200 0,00649400 (-33.3%) 2500 40 0,03165200 0,02226700 (-29.7%) 8000 12 0,09315900 0,06674900 (-28.3%) xvabsdp: rept loop master patch 8 12500 0,00475000 0,00474400 (-0.1%) 25 4000 0,00355600 0,00367500 (+3.3%) 100 1000 0,00444200 0,00366000 (-17.6%) 500 200 0,00942700 0,00732400 (-22.3%) 2500 40 0,02990000 0,02308500 (-22.8%) 8000 12 0,08770300 0,06683800 (-23.8%) xvnabssp: rept loop master patch 8 12500 0,00494500 0,00492900 (-0.3%) 25 4000 0,00397700 0,00338600 (-14.9%) 100 1000 0,00421400 0,00353500 (-16.1%) 500 200 0,01048000 0,00707100 (-32.5%) 2500 40 0,03251500 0,02238300 (-31.2%) 8000 12 0,08889100 0,06469800 (-27.2%) xvnabsdp: rept loop master patch 8 12500 0,00511000 0,00492700 (-3.6%) 25 4000 0,00398800 0,00381500 (-4.3%) 100 1000 0,00390500 0,00365900 (-6.3%) 500 200 0,00924800 0,00784600 (-15.2%) 2500 40 0,03138900 0,02391600 (-23.8%) 8000 12 0,09654200 0,05684600 (-41.1%) xvnegsp: rept loop master patch 8 12500 0,00493900 0,00452800 (-8.3%) 25 4000 0,00369100 0,00366800 (-0.6%) 100 1000 0,00371100 0,00380000 (+2.4%) 500 200 0,00991100 0,00652300 (-34.2%) 2500 40 0,03025800 0,02422300 (-19.9%) 8000 12 0,09251100 0,06457600 (-30.2%) xvnegdp: rept loop master patch 8 12500 0,00474900 0,00454400 (-4.3%) 25 4000 0,00353100 0,00325600 (-7.8%) 100 1000 0,00398600 0,00366800 (-8.0%) 500 200 0,01032300 0,00702400 (-32.0%) 2500 40 0,03125000 0,02422400 (-22.5%) 8000 12 0,09475100 0,06173000 (-34.9%) This one to me seemed the opposite of the previous instructions, as it looks like there was an improvement in the translation time (itself not a surprise as operations were done twice before so there was the need to translate twice as many TCGop) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/insn32.decode | 9 ++++ target/ppc/translate/vsx-impl.c.inc | 73 ++++++++++++++++++++++++++--- target/ppc/translate/vsx-ops.c.inc | 6 --- 3 files changed, 76 insertions(+), 12 deletions(-) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index ae151c4b62..5b687078be 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -754,6 +754,15 @@ STXVRHX 011111 ..... ..... ..... 0010101101 . @X_TSX STXVRWX 011111 ..... ..... ..... 0011001101 . @X_TSX STXVRDX 011111 ..... ..... ..... 0011101101 . @X_TSX +## VSX Vector Binary Floating-Point Sign Manipulation Instructions + +XVABSDP 111100 ..... 00000 ..... 111011001 .. @XX2 +XVABSSP 111100 ..... 00000 ..... 110011001 .. @XX2 +XVNABSDP 111100 ..... 00000 ..... 111101001 .. @XX2 +XVNABSSP 111100 ..... 00000 ..... 110101001 .. @XX2 +XVNEGDP 111100 ..... 00000 ..... 111111001 .. @XX2 +XVNEGSP 111100 ..... 00000 ..... 110111001 .. @XX2 + ## VSX Scalar Multiply-Add Instructions XSMADDADP 111100 ..... ..... ..... 00100001 . . . @XX3 diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index e6e5c45ffd..8717e20d08 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -782,15 +782,76 @@ static void glue(gen_, name)(DisasContext *ctx) \ tcg_temp_free_i64(sgm); \ } -VSX_VECTOR_MOVE(xvabsdp, OP_ABS, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvnabsdp, OP_NABS, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvnegdp, OP_NEG, SGN_MASK_DP) VSX_VECTOR_MOVE(xvcpsgndp, OP_CPSGN, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvabssp, OP_ABS, SGN_MASK_SP) -VSX_VECTOR_MOVE(xvnabssp, OP_NABS, SGN_MASK_SP) -VSX_VECTOR_MOVE(xvnegsp, OP_NEG, SGN_MASK_SP) VSX_VECTOR_MOVE(xvcpsgnsp, OP_CPSGN, SGN_MASK_SP) +#define TCG_OP_IMM_i64(FUNC, OP, IMM) \ + static void FUNC(TCGv_i64 t, TCGv_i64 b) \ + { \ + OP(t, b, IMM); \ + } + +TCG_OP_IMM_i64(do_xvabssp_i64, tcg_gen_andi_i64, ~SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvnabssp_i64, tcg_gen_ori_i64, SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvnegsp_i64, tcg_gen_xori_i64, SGN_MASK_SP) +TCG_OP_IMM_i64(do_xvabsdp_i64, tcg_gen_andi_i64, ~SGN_MASK_DP) +TCG_OP_IMM_i64(do_xvnabsdp_i64, tcg_gen_ori_i64, SGN_MASK_DP) +TCG_OP_IMM_i64(do_xvnegdp_i64, tcg_gen_xori_i64, SGN_MASK_DP) +#undef TCG_OP_IMM_i64 + +static void xv_msb_op1(unsigned vece, TCGv_vec t, TCGv_vec b, + void (*tcg_gen_op_vec)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec)) +{ + uint64_t msb = (vece == MO_32) ? SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_op_vec(vece, t, b, tcg_constant_vec_matching(t, vece, msb)); +} + +static void do_xvabs_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_andc_vec); +} + +static void do_xvnabs_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_or_vec); +} + +static void do_xvneg_vec(unsigned vece, TCGv_vec t, TCGv_vec b) +{ + xv_msb_op1(vece, t, b, tcg_gen_xor_vec); +} + +static bool do_vsx_msb_op(DisasContext *ctx, arg_XX2 *a, unsigned vece, + void (*vec)(unsigned, TCGv_vec, TCGv_vec), + void (*i64)(TCGv_i64, TCGv_i64)) +{ + static const TCGOpcode vecop_list[] = { + 0 + }; + + const GVecGen2 op = { + .fni8 = i64, + .fniv = vec, + .opt_opc = vecop_list, + .vece = vece + }; + + REQUIRE_INSNS_FLAGS2(ctx, VSX); + REQUIRE_VSX(ctx); + + tcg_gen_gvec_2(vsr_full_offset(a->xt), vsr_full_offset(a->xb), + 16, 16, &op); + + return true; +} + +TRANS(XVABSDP, do_vsx_msb_op, MO_64, do_xvabs_vec, do_xvabsdp_i64) +TRANS(XVNABSDP, do_vsx_msb_op, MO_64, do_xvnabs_vec, do_xvnabsdp_i64) +TRANS(XVNEGDP, do_vsx_msb_op, MO_64, do_xvneg_vec, do_xvnegdp_i64) +TRANS(XVABSSP, do_vsx_msb_op, MO_32, do_xvabs_vec, do_xvabssp_i64) +TRANS(XVNABSSP, do_vsx_msb_op, MO_32, do_xvnabs_vec, do_xvnabssp_i64) +TRANS(XVNEGSP, do_vsx_msb_op, MO_32, do_xvneg_vec, do_xvnegsp_i64) + #define VSX_CMP(name, op1, op2, inval, type) \ static void gen_##name(DisasContext *ctx) \ { \ diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index bff14bbece..b77324e0a8 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -165,13 +165,7 @@ GEN_XX3FORM(name, opc2, opc3 | 1, fl2) GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), -GEN_XX2FORM(xvabsdp, 0x12, 0x1D, PPC2_VSX), -GEN_XX2FORM(xvnabsdp, 0x12, 0x1E, PPC2_VSX), -GEN_XX2FORM(xvnegdp, 0x12, 0x1F, PPC2_VSX), GEN_XX3FORM(xvcpsgndp, 0x00, 0x1E, PPC2_VSX), -GEN_XX2FORM(xvabssp, 0x12, 0x19, PPC2_VSX), -GEN_XX2FORM(xvnabssp, 0x12, 0x1A, PPC2_VSX), -GEN_XX2FORM(xvnegsp, 0x12, 0x1B, PPC2_VSX), GEN_XX3FORM(xvcpsgnsp, 0x00, 0x1A, PPC2_VSX), GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), From patchwork Wed Oct 19 12:50:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011726 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DD221C4332F for ; Wed, 19 Oct 2022 13:09:05 +0000 (UTC) Received: from localhost ([::1]:42272 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8og-0003R3-IN for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:09:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:38200) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8a0-0000Uv-9d; Wed, 19 Oct 2022 08:53:54 -0400 Received: from [200.168.210.66] (port=54022 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8Zw-0002Qr-GE; Wed, 19 Oct 2022 08:53:51 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:44 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 6ADC8800150; Wed, 19 Oct 2022 09:50:44 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 09/12] target/ppc: Use gvec to decode XVCPSGN[SD]P Date: Wed, 19 Oct 2022 09:50:37 -0300 Message-Id: <20221019125040.48028-10-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:44.0734 (UTC) FILETIME=[6733E9E0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate them. xvcpsgnsp: rept loop master patch 8 12500 0,00561400 0,00537900 (-4.2%) 25 4000 0,00562100 0,00400000 (-28.8%) 100 1000 0,00696900 0,00416300 (-40.3%) 500 200 0,02211900 0,00840700 (-62.0%) 2500 40 0,09328600 0,02728300 (-70.8%) 8000 12 0,27295300 0,06867800 (-74.8%) xvcpsgndp: rept loop master patch 8 12500 0,00556300 0,00584200 (+5.0%) 25 4000 0,00482700 0,00431700 (-10.6%) 100 1000 0,00585800 0,00464400 (-20.7%) 500 200 0,01565300 0,00839700 (-46.4%) 2500 40 0,05766500 0,02430600 (-57.8%) 8000 12 0,19875300 0,07947100 (-60.0%) Like the previous instructions there seemed to be a improvement on translation time. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/insn32.decode | 2 + target/ppc/translate/vsx-impl.c.inc | 109 ++++++++++++++-------------- target/ppc/translate/vsx-ops.c.inc | 3 - 3 files changed, 55 insertions(+), 59 deletions(-) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 5b687078be..6549c4040e 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -762,6 +762,8 @@ XVNABSDP 111100 ..... 00000 ..... 111101001 .. @XX2 XVNABSSP 111100 ..... 00000 ..... 110101001 .. @XX2 XVNEGDP 111100 ..... 00000 ..... 111111001 .. @XX2 XVNEGSP 111100 ..... 00000 ..... 110111001 .. @XX2 +XVCPSGNDP 111100 ..... ..... ..... 11110000 ... @XX3 +XVCPSGNSP 111100 ..... ..... ..... 11010000 ... @XX3 ## VSX Scalar Multiply-Add Instructions diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 8717e20d08..1c289238ec 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -729,62 +729,6 @@ VSX_SCALAR_MOVE_QP(xsnabsqp, OP_NABS, SGN_MASK_DP) VSX_SCALAR_MOVE_QP(xsnegqp, OP_NEG, SGN_MASK_DP) VSX_SCALAR_MOVE_QP(xscpsgnqp, OP_CPSGN, SGN_MASK_DP) -#define VSX_VECTOR_MOVE(name, op, sgn_mask) \ -static void glue(gen_, name)(DisasContext *ctx) \ - { \ - TCGv_i64 xbh, xbl, sgm; \ - if (unlikely(!ctx->vsx_enabled)) { \ - gen_exception(ctx, POWERPC_EXCP_VSXU); \ - return; \ - } \ - xbh = tcg_temp_new_i64(); \ - xbl = tcg_temp_new_i64(); \ - sgm = tcg_temp_new_i64(); \ - get_cpu_vsr(xbh, xB(ctx->opcode), true); \ - get_cpu_vsr(xbl, xB(ctx->opcode), false); \ - tcg_gen_movi_i64(sgm, sgn_mask); \ - switch (op) { \ - case OP_ABS: { \ - tcg_gen_andc_i64(xbh, xbh, sgm); \ - tcg_gen_andc_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_NABS: { \ - tcg_gen_or_i64(xbh, xbh, sgm); \ - tcg_gen_or_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_NEG: { \ - tcg_gen_xor_i64(xbh, xbh, sgm); \ - tcg_gen_xor_i64(xbl, xbl, sgm); \ - break; \ - } \ - case OP_CPSGN: { \ - TCGv_i64 xah = tcg_temp_new_i64(); \ - TCGv_i64 xal = tcg_temp_new_i64(); \ - get_cpu_vsr(xah, xA(ctx->opcode), true); \ - get_cpu_vsr(xal, xA(ctx->opcode), false); \ - tcg_gen_and_i64(xah, xah, sgm); \ - tcg_gen_and_i64(xal, xal, sgm); \ - tcg_gen_andc_i64(xbh, xbh, sgm); \ - tcg_gen_andc_i64(xbl, xbl, sgm); \ - tcg_gen_or_i64(xbh, xbh, xah); \ - tcg_gen_or_i64(xbl, xbl, xal); \ - tcg_temp_free_i64(xah); \ - tcg_temp_free_i64(xal); \ - break; \ - } \ - } \ - set_cpu_vsr(xT(ctx->opcode), xbh, true); \ - set_cpu_vsr(xT(ctx->opcode), xbl, false); \ - tcg_temp_free_i64(xbh); \ - tcg_temp_free_i64(xbl); \ - tcg_temp_free_i64(sgm); \ - } - -VSX_VECTOR_MOVE(xvcpsgndp, OP_CPSGN, SGN_MASK_DP) -VSX_VECTOR_MOVE(xvcpsgnsp, OP_CPSGN, SGN_MASK_SP) - #define TCG_OP_IMM_i64(FUNC, OP, IMM) \ static void FUNC(TCGv_i64 t, TCGv_i64 b) \ { \ @@ -852,6 +796,59 @@ TRANS(XVABSSP, do_vsx_msb_op, MO_32, do_xvabs_vec, do_xvabssp_i64) TRANS(XVNABSSP, do_vsx_msb_op, MO_32, do_xvnabs_vec, do_xvnabssp_i64) TRANS(XVNEGSP, do_vsx_msb_op, MO_32, do_xvneg_vec, do_xvnegsp_i64) +static void do_xvcpsgndp_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_andi_i64(a, a, SGN_MASK_DP); + tcg_gen_andi_i64(b, b, ~SGN_MASK_DP); + tcg_gen_or_i64(t, a, b); +} + +static void do_xvcpsgnsp_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_andi_i64(a, a, SGN_MASK_SP); + tcg_gen_andi_i64(b, b, ~SGN_MASK_SP); + tcg_gen_or_i64(t, a, b); +} + +static void do_xvcpsgn_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b) +{ + uint64_t msb = (vece == MO_32) ? SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_bitsel_vec(vece, t, tcg_constant_vec_matching(t, vece, msb), a, b); +} + +static bool do_xvcpsgn(DisasContext *ctx, arg_XX3 *a, unsigned vece) +{ + static const TCGOpcode vecop_list[] = { + 0 + }; + + static const GVecGen3 op[] = { + { + .fni8 = do_xvcpsgnsp_i64, + .fniv = do_xvcpsgn_vec, + .opt_opc = vecop_list, + .vece = MO_32 + }, + { + .fni8 = do_xvcpsgndp_i64, + .fniv = do_xvcpsgn_vec, + .opt_opc = vecop_list, + .vece = MO_64 + }, + }; + + REQUIRE_INSNS_FLAGS2(ctx, VSX); + REQUIRE_VSX(ctx); + + tcg_gen_gvec_3(vsr_full_offset(a->xt), vsr_full_offset(a->xa), + vsr_full_offset(a->xb), 16, 16, &op[vece - MO_32]); + + return true; +} + +TRANS(XVCPSGNSP, do_xvcpsgn, MO_32) +TRANS(XVCPSGNDP, do_xvcpsgn, MO_64) + #define VSX_CMP(name, op1, op2, inval, type) \ static void gen_##name(DisasContext *ctx) \ { \ diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index b77324e0a8..f7d7377379 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -165,9 +165,6 @@ GEN_XX3FORM(name, opc2, opc3 | 1, fl2) GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), -GEN_XX3FORM(xvcpsgndp, 0x00, 0x1E, PPC2_VSX), -GEN_XX3FORM(xvcpsgnsp, 0x00, 0x1A, PPC2_VSX), - GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), GEN_VSX_XFORM_300(xsaddqp, 0x04, 0x00, 0x0), GEN_XX3FORM(xssubdp, 0x00, 0x05, PPC2_VSX), From patchwork Wed Oct 19 12:50:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011724 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7A112C4332F for ; Wed, 19 Oct 2022 13:07:15 +0000 (UTC) Received: from localhost ([::1]:40380 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8mw-0006bq-Ak for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:07:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40396) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8aI-00010I-3o; Wed, 19 Oct 2022 08:54:10 -0400 Received: from [200.168.210.66] (port=54022 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8a1-0002Qr-DZ; Wed, 19 Oct 2022 08:54:09 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:44 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id A3CB6800278; Wed, 19 Oct 2022 09:50:44 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 10/12] target/ppc: Moved XVTSTDC[DS]P to decodetree Date: Wed, 19 Oct 2022 09:50:38 -0300 Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:44.0953 (UTC) FILETIME=[67555490:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper to be simpler and do all decoding in the decodetree (so XB, XT and DCMX are all calculated outside the helper). Obs: The tests in this one are slightly different, these are the sum of these instructions with all possible immediate and those instructions are repeated 10 times. xvtstdcsp: rept loop master patch 8 12500 2,76402100 2,70699100 (-2.1%) 25 4000 2,64867100 2,67884100 (+1.1%) 100 1000 2,73806300 2,78701000 (+1.8%) 500 200 3,44666500 3,61027600 (+4.7%) 2500 40 5,85790200 6,47475500 (+10.5%) 8000 12 15,22102100 17,46062900 (+14.7%) xvtstdcdp: rept loop master patch 8 12500 2,11818000 1,61065300 (-24.0%) 25 4000 2,04573400 1,60132200 (-21.7%) 100 1000 2,13834100 1,69988100 (-20.5%) 500 200 2,73977000 2,48631700 (-9.3%) 2500 40 5,05067000 5,25914100 (+4.1%) 8000 12 14,60507800 15,93704900 (+9.1%) Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/fpu_helper.c | 39 +++++++++++++++++++++++++++-- target/ppc/helper.h | 4 +-- target/ppc/insn32.decode | 5 ++++ target/ppc/translate/vsx-impl.c.inc | 28 +++++++++++++++++++-- target/ppc/translate/vsx-ops.c.inc | 8 ------ 5 files changed, 70 insertions(+), 14 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index ae25f32d6e..960a76a8a5 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -3295,11 +3295,46 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ } \ } -VSX_TEST_DC(xvtstdcdp, 2, xB(opcode), float64, VsrD(i), VsrD(i), UINT64_MAX, 0) -VSX_TEST_DC(xvtstdcsp, 4, xB(opcode), float32, VsrW(i), VsrW(i), UINT32_MAX, 0) VSX_TEST_DC(xststdcdp, 1, xB(opcode), float64, VsrD(0), VsrD(0), 0, 1) VSX_TEST_DC(xststdcqp, 1, (rB(opcode) + 32), float128, f128, VsrD(0), 0, 1) +#define VSX_TSTDC(tp) \ +static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ +{ \ + uint32_t match = 0; \ + uint32_t sign = tp##_is_neg(b); \ + if (tp##_is_any_nan(b)) { \ + match = extract32(dcmx, 6, 1); \ + } else if (tp##_is_infinity(b)) { \ + match = extract32(dcmx, 4 + !sign, 1); \ + } else if (tp##_is_zero(b)) { \ + match = extract32(dcmx, 2 + !sign, 1); \ + } else if (tp##_is_zero_or_denormal(b)) { \ + match = extract32(dcmx, 0 + !sign, 1); \ + } \ + return (match != 0); \ +} + +VSX_TSTDC(float32) +VSX_TSTDC(float64) +#undef VSX_TSTDC + +void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +{ + int i; + for (i = 0; i < 2; i++) { + t->s64[i] = (int64_t)-float64_tstdc(b->f64[i], dcmx); + } +} + +void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) +{ + int i; + for (i = 0; i < 4; i++) { + t->s32[i] = (int32_t)-float32_tstdc(b->f32[i], dcmx); + } +} + void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb) { uint32_t dcmx, sign, exp; diff --git a/target/ppc/helper.h b/target/ppc/helper.h index fd8280dfa7..9e5d11939b 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -517,8 +517,8 @@ DEF_HELPER_3(xvcvsxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxdsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvsxwsp, void, env, vsr, vsr) DEF_HELPER_3(xvcvuxwsp, void, env, vsr, vsr) -DEF_HELPER_2(xvtstdcsp, void, env, i32) -DEF_HELPER_2(xvtstdcdp, void, env, i32) +DEF_HELPER_FLAGS_4(XVTSTDCSP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) +DEF_HELPER_FLAGS_4(XVTSTDCDP, TCG_CALL_NO_RWG, void, vsr, vsr, i64, i32) DEF_HELPER_3(xvrspi, void, env, vsr, vsr) DEF_HELPER_3(xvrspic, void, env, vsr, vsr) DEF_HELPER_3(xvrspim, void, env, vsr, vsr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 6549c4040e..c0a531be5c 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -199,6 +199,9 @@ @XX2_uim4 ...... ..... . uim:4 ..... ......... .. &XX2_uim xt=%xx_xt xb=%xx_xb +%xx_uim7 6:1 2:1 16:5 +@XX2_uim7 ...... ..... ..... ..... .... . ... . .. &XX2_uim xt=%xx_xt xb=%xx_xb uim=%xx_uim7 + &XX2_bf_xb bf xb @XX2_bf_xb ...... bf:3 .. ..... ..... ......... . . &XX2_bf_xb xb=%xx_xb @@ -848,6 +851,8 @@ XSCVSPDPN 111100 ..... ----- ..... 101001011 .. @XX2 ## VSX Binary Floating-Point Math Support Instructions XVXSIGSP 111100 ..... 01001 ..... 111011011 .. @XX2 +XVTSTDCDP 111100 ..... ..... ..... 1111 . 101 ... @XX2_uim7 +XVTSTDCSP 111100 ..... ..... ..... 1101 . 101 ... @XX2_uim7 ## VSX Vector Test Least-Significant Bit by Byte Instruction diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 1c289238ec..287ea8e2ce 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -630,6 +630,8 @@ static void gen_mtvsrws(DisasContext *ctx) #define OP_CPSGN 4 #define SGN_MASK_DP 0x8000000000000000ull #define SGN_MASK_SP 0x8000000080000000ull +#define EXP_MASK_DP 0x7FF0000000000000ull +#define EXP_MASK_SP 0x7F8000007F800000ull #define VSX_SCALAR_MOVE(name, op, sgn_mask) \ static void glue(gen_, name)(DisasContext *ctx) \ @@ -1110,6 +1112,30 @@ GEN_VSX_HELPER_X2(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300) GEN_VSX_HELPER_R2(xscvsdqp, 0x04, 0x1A, 0x0A, PPC2_ISA300) GEN_VSX_HELPER_X2(xscvspdp, 0x12, 0x14, 0, PPC2_VSX) +static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) +{ + static const GVecGen2i op[] = { + { + .fnoi = gen_helper_XVTSTDCSP, + .vece = MO_32 + }, + { + .fnoi = gen_helper_XVTSTDCDP, + .vece = MO_64 + }, + }; + + REQUIRE_VSX(ctx); + + tcg_gen_gvec_2i(vsr_full_offset(a->xt), vsr_full_offset(a->xb), + 16, 16, (int32_t)(a->uim), &op[vece - MO_32]); + + return true; +} + +TRANS_FLAGS2(VSX, XVTSTDCSP, do_xvtstdc, MO_32) +TRANS_FLAGS2(VSX, XVTSTDCDP, do_xvtstdc, MO_64) + bool trans_XSCVSPDPN(DisasContext *ctx, arg_XX2 *a) { TCGv_i64 tmp; @@ -1213,8 +1239,6 @@ GEN_VSX_HELPER_X2(xvrspic, 0x16, 0x0A, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspim, 0x12, 0x0B, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspip, 0x12, 0x0A, 0, PPC2_VSX) GEN_VSX_HELPER_X2(xvrspiz, 0x12, 0x09, 0, PPC2_VSX) -GEN_VSX_HELPER_2(xvtstdcsp, 0x14, 0x1A, 0, PPC2_VSX) -GEN_VSX_HELPER_2(xvtstdcdp, 0x14, 0x1E, 0, PPC2_VSX) static bool trans_XXPERM(DisasContext *ctx, arg_XX3 *a) { diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index f7d7377379..4b317d4b06 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -157,14 +157,6 @@ GEN_XX2FORM_EO(xvxexpdp, 0x16, 0x1D, 0x00, PPC2_ISA300), GEN_XX2FORM_EO(xvxsigdp, 0x16, 0x1D, 0x01, PPC2_ISA300), GEN_XX2FORM_EO(xvxexpsp, 0x16, 0x1D, 0x08, PPC2_ISA300), -/* DCMX = bit[25] << 6 | bit[29] << 5 | bit[11:15] */ -#define GEN_XX2FORM_DCMX(name, opc2, opc3, fl2) \ -GEN_XX3FORM(name, opc2, opc3 | 0, fl2), \ -GEN_XX3FORM(name, opc2, opc3 | 1, fl2) - -GEN_XX2FORM_DCMX(xvtstdcdp, 0x14, 0x1E, PPC2_ISA300), -GEN_XX2FORM_DCMX(xvtstdcsp, 0x14, 0x1A, PPC2_ISA300), - GEN_XX3FORM(xsadddp, 0x00, 0x04, PPC2_VSX), GEN_VSX_XFORM_300(xsaddqp, 0x04, 0x00, 0x0), GEN_XX3FORM(xssubdp, 0x00, 0x05, PPC2_VSX), From patchwork Wed Oct 19 12:50:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011748 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6118BC433FE for ; Wed, 19 Oct 2022 13:23:28 +0000 (UTC) Received: from localhost ([::1]:57946 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol92c-0004sN-9c for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:23:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:35054) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8be-00045e-Cf; Wed, 19 Oct 2022 08:55:34 -0400 Received: from [200.168.210.66] (port=42674 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8bP-0002nO-2L; Wed, 19 Oct 2022 08:55:33 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:45 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id E0754800150; Wed, 19 Oct 2022 09:50:44 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 11/12] target/ppc: Moved XSTSTDC[QDS]P to decodetree Date: Wed, 19 Oct 2022 09:50:39 -0300 Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:45.0203 (UTC) FILETIME=[677B7A30:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of its decoding away from the helper as previously the DCMX, XB and BF were calculated in the helper with the help of cpu_env, now that part was moved to the decodetree with the rest. xvtstdcsp: rept loop master patch 8 12500 1,85393600 1,94683600 (+5.0%) 25 4000 1,78779800 1,92479000 (+7.7%) 100 1000 2,12775000 2,28895500 (+7.6%) 500 200 2,99655300 3,23102900 (+7.8%) 2500 40 6,89082200 7,44827500 (+8.1%) 8000 12 17,50585500 18,95152100 (+8.3%) xvtstdcdp: rept loop master patch 8 12500 1,39043100 1,33539800 (-4.0%) 25 4000 1,35731800 1,37347800 (+1.2%) 100 1000 1,51514800 1,56053000 (+3.0%) 500 200 2,21014400 2,47906000 (+12.2%) 2500 40 5,39488200 6,68766700 (+24.0%) 8000 12 13,98623900 18,17661900 (+30.0%) xvtstdcdp: rept loop master patch 8 12500 1,35123800 1,34455800 (-0.5%) 25 4000 1,36441200 1,36759600 (+0.2%) 100 1000 1,49763500 1,54138400 (+2.9%) 500 200 2,19020200 2,46196400 (+12.4%) 2500 40 5,39265700 6,68147900 (+23.9%) 8000 12 14,04163600 18,19669600 (+29.6%) As some values are now decoded outside the helper and passed to it as an argument the number of arguments of the helper increased, the number of TCGop needed to load the arguments increased. I suspect that's why the slow-down in the tests with a high REPT but low LOOP. Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/fpu_helper.c | 114 +++++++++------------------- target/ppc/helper.h | 6 +- target/ppc/insn32.decode | 6 ++ target/ppc/translate/vsx-impl.c.inc | 20 ++++- target/ppc/translate/vsx-ops.c.inc | 4 - 5 files changed, 60 insertions(+), 90 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index 960a76a8a5..a66e16c212 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -3241,63 +3241,6 @@ void helper_XVXSIGSP(ppc_vsr_t *xt, ppc_vsr_t *xb) *xt = t; } -/* - * VSX_TEST_DC - VSX floating point test data class - * op - instruction mnemonic - * nels - number of elements (1, 2 or 4) - * xbn - VSR register number - * tp - type (float32 or float64) - * fld - vsr_t field (VsrD(*) or VsrW(*)) - * tfld - target vsr_t field (VsrD(*) or VsrW(*)) - * fld_max - target field max - * scrf - set result in CR and FPCC - */ -#define VSX_TEST_DC(op, nels, xbn, tp, fld, tfld, fld_max, scrf) \ -void helper_##op(CPUPPCState *env, uint32_t opcode) \ -{ \ - ppc_vsr_t *xt = &env->vsr[xT(opcode)]; \ - ppc_vsr_t *xb = &env->vsr[xbn]; \ - ppc_vsr_t t = { }; \ - uint32_t i, sign, dcmx; \ - uint32_t cc, match = 0; \ - \ - if (!scrf) { \ - dcmx = DCMX_XV(opcode); \ - } else { \ - t = *xt; \ - dcmx = DCMX(opcode); \ - } \ - \ - for (i = 0; i < nels; i++) { \ - sign = tp##_is_neg(xb->fld); \ - if (tp##_is_any_nan(xb->fld)) { \ - match = extract32(dcmx, 6, 1); \ - } else if (tp##_is_infinity(xb->fld)) { \ - match = extract32(dcmx, 4 + !sign, 1); \ - } else if (tp##_is_zero(xb->fld)) { \ - match = extract32(dcmx, 2 + !sign, 1); \ - } else if (tp##_is_zero_or_denormal(xb->fld)) { \ - match = extract32(dcmx, 0 + !sign, 1); \ - } \ - \ - if (scrf) { \ - cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT; \ - env->fpscr &= ~FP_FPCC; \ - env->fpscr |= cc << FPSCR_FPCC; \ - env->crf[BF(opcode)] = cc; \ - } else { \ - t.tfld = match ? fld_max : 0; \ - } \ - match = 0; \ - } \ - if (!scrf) { \ - *xt = t; \ - } \ -} - -VSX_TEST_DC(xststdcdp, 1, xB(opcode), float64, VsrD(0), VsrD(0), 0, 1) -VSX_TEST_DC(xststdcqp, 1, (rB(opcode) + 32), float128, f128, VsrD(0), 0, 1) - #define VSX_TSTDC(tp) \ static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ { \ @@ -3317,6 +3260,7 @@ static int32_t tp##_tstdc(tp b, uint32_t dcmx) \ VSX_TSTDC(float32) VSX_TSTDC(float64) +VSX_TSTDC(float128) #undef VSX_TSTDC void helper_XVTSTDCDP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) @@ -3335,34 +3279,44 @@ void helper_XVTSTDCSP(ppc_vsr_t *t, ppc_vsr_t *b, uint64_t dcmx, uint32_t v) } } -void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb) +static bool not_SP_value(float64 val) { - uint32_t dcmx, sign, exp; - uint32_t cc, match = 0, not_sp = 0; - float64 arg = xb->VsrD(0); - float64 arg_sp; - - dcmx = DCMX(opcode); - exp = (arg >> 52) & 0x7FF; - sign = float64_is_neg(arg); - - if (float64_is_any_nan(arg)) { - match = extract32(dcmx, 6, 1); - } else if (float64_is_infinity(arg)) { - match = extract32(dcmx, 4 + !sign, 1); - } else if (float64_is_zero(arg)) { - match = extract32(dcmx, 2 + !sign, 1); - } else if (float64_is_zero_or_denormal(arg) || (exp > 0 && exp < 0x381)) { - match = extract32(dcmx, 0 + !sign, 1); - } - - arg_sp = helper_todouble(helper_tosingle(arg)); - not_sp = arg != arg_sp; + return val != helper_todouble(helper_tosingle(val)); +} +/* + * VSX_XS_TSTDC - VSX Scalar Test Data Class + * NAME - instruction name + * FLD - vsr_t field (VsrD(0) or f128) + * TP - type (float64 or float128) + */ +#define VSX_XS_TSTDC(NAME, FLD, TP) \ + void helper_##NAME(CPUPPCState *env, uint32_t bf, \ + uint32_t dcmx, ppc_vsr_t *b) \ + { \ + uint32_t cc, match, sign = TP##_is_neg(b->FLD); \ + match = TP##_tstdc(b->FLD, dcmx); \ + cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT; \ + env->fpscr &= ~FP_FPCC; \ + env->fpscr |= cc << FPSCR_FPCC; \ + env->crf[bf] = cc; \ + } + +VSX_XS_TSTDC(XSTSTDCDP, VsrD(0), float64) +VSX_XS_TSTDC(XSTSTDCQP, f128, float128) +#undef VSX_XS_TSTDC + +void helper_XSTSTDCSP(CPUPPCState *env, uint32_t bf, + uint32_t dcmx, ppc_vsr_t *b) +{ + uint32_t cc, match, sign = float64_is_neg(b->VsrD(0)); + uint32_t exp = (b->VsrD(0) >> 52) & 0x7FF; + int not_sp = (int)not_SP_value(b->VsrD(0)); + match = float64_tstdc(b->VsrD(0), dcmx) || (exp > 0 && exp < 0x381); cc = sign << CRF_LT_BIT | match << CRF_EQ_BIT | not_sp << CRF_SO_BIT; env->fpscr &= ~FP_FPCC; env->fpscr |= cc << FPSCR_FPCC; - env->crf[BF(opcode)] = cc; + env->crf[bf] = cc; } void helper_xsrqpi(CPUPPCState *env, uint32_t opcode, diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 9e5d11939b..8344fe39c6 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -417,9 +417,9 @@ DEF_HELPER_3(xscvuxdsp, void, env, vsr, vsr) DEF_HELPER_3(xscvsxdsp, void, env, vsr, vsr) DEF_HELPER_4(xscvudqp, void, env, i32, vsr, vsr) DEF_HELPER_3(xscvuxddp, void, env, vsr, vsr) -DEF_HELPER_3(xststdcsp, void, env, i32, vsr) -DEF_HELPER_2(xststdcdp, void, env, i32) -DEF_HELPER_2(xststdcqp, void, env, i32) +DEF_HELPER_4(XSTSTDCSP, void, env, i32, i32, vsr) +DEF_HELPER_4(XSTSTDCDP, void, env, i32, i32, vsr) +DEF_HELPER_4(XSTSTDCQP, void, env, i32, i32, vsr) DEF_HELPER_3(xsrdpi, void, env, vsr, vsr) DEF_HELPER_3(xsrdpic, void, env, vsr, vsr) DEF_HELPER_3(xsrdpim, void, env, vsr, vsr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index c0a531be5c..334eb1beca 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -202,6 +202,9 @@ %xx_uim7 6:1 2:1 16:5 @XX2_uim7 ...... ..... ..... ..... .... . ... . .. &XX2_uim xt=%xx_xt xb=%xx_xb uim=%xx_uim7 +&XX2_bf_uim bf xb uim +@XX2_bf_uim ...... bf:3 uim:7 ..... ......... . . &XX2_bf_uim + &XX2_bf_xb bf xb @XX2_bf_xb ...... bf:3 .. ..... ..... ......... . . &XX2_bf_xb xb=%xx_xb @@ -853,6 +856,9 @@ XSCVSPDPN 111100 ..... ----- ..... 101001011 .. @XX2 XVXSIGSP 111100 ..... 01001 ..... 111011011 .. @XX2 XVTSTDCDP 111100 ..... ..... ..... 1111 . 101 ... @XX2_uim7 XVTSTDCSP 111100 ..... ..... ..... 1101 . 101 ... @XX2_uim7 +XSTSTDCSP 111100 ... ....... ..... 100101010 . - @XX2_bf_uim xb=%xx_xb +XSTSTDCDP 111100 ... ....... ..... 101101010 . - @XX2_bf_uim xb=%xx_xb +XSTSTDCQP 111111 ... ....... xb:5 1011000100 - @XX2_bf_uim ## VSX Vector Test Least-Significant Bit by Byte Instruction diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index 287ea8e2ce..af410cbf1b 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -1136,6 +1136,23 @@ static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) TRANS_FLAGS2(VSX, XVTSTDCSP, do_xvtstdc, MO_32) TRANS_FLAGS2(VSX, XVTSTDCDP, do_xvtstdc, MO_64) +static bool do_XX2_bf_uim(DisasContext *ctx, arg_XX2_bf_uim *a, bool vsr, + void (*gen_helper)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_ptr)) +{ + TCGv_ptr xb; + + REQUIRE_VSX(ctx); + xb = vsr ? gen_vsr_ptr(a->xb) : gen_avr_ptr(a->xb); + gen_helper(cpu_env, tcg_constant_i32(a->bf), tcg_constant_i32(a->uim), xb); + tcg_temp_free_ptr(xb); + + return true; +} + +TRANS_FLAGS2(ISA300, XSTSTDCSP, do_XX2_bf_uim, true, gen_helper_XSTSTDCSP) +TRANS_FLAGS2(ISA300, XSTSTDCDP, do_XX2_bf_uim, true, gen_helper_XSTSTDCDP) +TRANS_FLAGS2(ISA300, XSTSTDCQP, do_XX2_bf_uim, false, gen_helper_XSTSTDCQP) + bool trans_XSCVSPDPN(DisasContext *ctx, arg_XX2 *a) { TCGv_i64 tmp; @@ -1182,9 +1199,6 @@ GEN_VSX_HELPER_X2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xscvsxdsp, 0x10, 0x13, 0, PPC2_VSX207) GEN_VSX_HELPER_X2(xscvuxdsp, 0x10, 0x12, 0, PPC2_VSX207) -GEN_VSX_HELPER_X1(xststdcsp, 0x14, 0x12, 0, PPC2_ISA300) -GEN_VSX_HELPER_2(xststdcdp, 0x14, 0x16, 0, PPC2_ISA300) -GEN_VSX_HELPER_2(xststdcqp, 0x04, 0x16, 0, PPC2_ISA300) GEN_VSX_HELPER_X3(xvadddp, 0x00, 0x0C, 0, PPC2_VSX) GEN_VSX_HELPER_X3(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX) diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc index 4b317d4b06..a3ba094d62 100644 --- a/target/ppc/translate/vsx-ops.c.inc +++ b/target/ppc/translate/vsx-ops.c.inc @@ -147,10 +147,6 @@ GEN_HANDLER_E(xsiexpdp, 0x3C, 0x16, 0x1C, 0, PPC_NONE, PPC2_ISA300), GEN_VSX_XFORM_300(xsiexpqp, 0x4, 0x1B, 0x00000001), #endif -GEN_XX2FORM(xststdcdp, 0x14, 0x16, PPC2_ISA300), -GEN_XX2FORM(xststdcsp, 0x14, 0x12, PPC2_ISA300), -GEN_VSX_XFORM_300(xststdcqp, 0x04, 0x16, 0x00000001), - GEN_XX3FORM(xviexpsp, 0x00, 0x1B, PPC2_ISA300), GEN_XX3FORM(xviexpdp, 0x00, 0x1F, PPC2_ISA300), GEN_XX2FORM_EO(xvxexpdp, 0x16, 0x1D, 0x00, PPC2_ISA300), From patchwork Wed Oct 19 12:50:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas Mateus Martins Araujo e Castro X-Patchwork-Id: 13011749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 79909C433FE for ; Wed, 19 Oct 2022 13:23:47 +0000 (UTC) Received: from localhost ([::1]:46544 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol92w-0005up-D5 for qemu-devel@archiver.kernel.org; Wed, 19 Oct 2022 09:23:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59850) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ol8bh-00049q-S9; Wed, 19 Oct 2022 08:55:38 -0400 Received: from [200.168.210.66] (port=42674 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ol8bf-0002nO-HS; Wed, 19 Oct 2022 08:55:37 -0400 Received: from p9ibm ([10.10.71.235]) by outlook.eldorado.org.br over TLS secured channel with Microsoft SMTPSVC(8.5.9600.16384); Wed, 19 Oct 2022 09:50:45 -0300 Received: from eldorado.org.br (unknown [10.10.70.45]) by p9ibm (Postfix) with ESMTP id 1F3DA800278; Wed, 19 Oct 2022 09:50:45 -0300 (-03) From: "Lucas Mateus Castro(alqotel)" To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org Cc: richard.henderson@linaro.org, Daniel Henrique Barboza , "Lucas Mateus Castro (alqotel)" , =?utf-8?q?C?= =?utf-8?q?=C3=A9dric_Le_Goater?= , David Gibson , Greg Kurz Subject: [PATCH v3 12/12] target/ppc: Use gvec to decode XVTSTDC[DS]P Date: Wed, 19 Oct 2022 09:50:40 -0300 Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> References: <20221019125040.48028-1-lucas.araujo@eldorado.org.br> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2022 12:50:45.0406 (UTC) FILETIME=[679A73E0:01D8E3B9] X-Host-Lookup-Failed: Reverse DNS lookup failed for 200.168.210.66 (failed) Received-SPF: pass client-ip=200.168.210.66; envelope-from=lucas.araujo@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: "Lucas Mateus Castro (alqotel)" Used gvec to translate XVTSTDCSP and XVTSTDCDP. xvtstdcsp: rept loop imm master version prev version current version 25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%) 25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%) 25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%) 25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%) 25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%) 8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%) 8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%) 8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%) 8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%) 8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%) xvtstdcdp: rept loop imm master version prev version current version 25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%) 25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%) 25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%) 25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%) 25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%) 8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%) 8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%) 8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%) 8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%) 8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%) Overall, these instructions are the hardest ones to measure performance as the gvec implementation is affected by the immediate. Above there are 5 different scenarios when it comes to immediate and 2 when it comes to rept/loop combination. The immediates scenarios are: all bits are 0 therefore the target register should just be changed to 0, with 1 bit set, with 2 bits set in a combination the new implementation can deal with using gvec, 4 bits set and the new implementation can't deal with it using gvec and all bits set. The rept/loop scenarios are high loop and low rept (so it should spend more time executing it than translating it) and high rept low loop (so it should spend more time translating it than executing this code). These comparisons are between the upstream version, a previous similar implementation and a one with a cleaner code(this one). For a comparison with o previous different implementation: <20221010191356.83659-13-lucas.araujo@eldorado.org.br> Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- target/ppc/translate/vsx-impl.c.inc | 164 ++++++++++++++++++++++++++-- 1 file changed, 154 insertions(+), 10 deletions(-) diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc index af410cbf1b..7099e7823d 100644 --- a/target/ppc/translate/vsx-impl.c.inc +++ b/target/ppc/translate/vsx-impl.c.inc @@ -632,6 +632,8 @@ static void gen_mtvsrws(DisasContext *ctx) #define SGN_MASK_SP 0x8000000080000000ull #define EXP_MASK_DP 0x7FF0000000000000ull #define EXP_MASK_SP 0x7F8000007F800000ull +#define FRC_MASK_DP (~(SGN_MASK_DP | EXP_MASK_DP)) +#define FRC_MASK_SP (~(SGN_MASK_SP | EXP_MASK_SP)) #define VSX_SCALAR_MOVE(name, op, sgn_mask) \ static void glue(gen_, name)(DisasContext *ctx) \ @@ -1112,23 +1114,165 @@ GEN_VSX_HELPER_X2(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300) GEN_VSX_HELPER_R2(xscvsdqp, 0x04, 0x1A, 0x0A, PPC2_ISA300) GEN_VSX_HELPER_X2(xscvspdp, 0x12, 0x14, 0, PPC2_VSX) +/* test if +Inf */ +static void gen_is_pos_inf(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, exp_msk)); +} + +/* test if -Inf */ +static void gen_is_neg_inf(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk | exp_msk)); +} + +/* test if +Inf or -Inf */ +static void gen_is_any_inf(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_andc_vec(vece, b, b, tcg_constant_vec_matching(t, vece, sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, exp_msk)); +} + +/* test if +0 */ +static void gen_is_pos_zero(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, 0)); +} + +/* test if -0 */ +static void gen_is_neg_zero(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk)); +} + +/* test if +0 or -0 */ +static void gen_is_any_zero(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_andc_vec(vece, b, b, tcg_constant_vec_matching(t, vece, sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_EQ, vece, t, b, + tcg_constant_vec_matching(t, vece, 0)); +} + +/* test if +Denormal */ +static void gen_is_pos_denormal(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_LEU, vece, t, b, + tcg_constant_vec_matching(t, vece, frc_msk)); + tcg_gen_cmp_vec(TCG_COND_NE, vece, b, b, + tcg_constant_vec_matching(t, vece, 0)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if -Denormal */ +static void gen_is_neg_denormal(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_cmp_vec(TCG_COND_LEU, vece, t, b, + tcg_constant_vec_matching(t, vece, sgn_msk | frc_msk)); + tcg_gen_cmp_vec(TCG_COND_GTU, vece, b, b, + tcg_constant_vec_matching(t, vece, sgn_msk)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if +Denormal or -Denormal */ +static void gen_is_any_denormal(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + uint64_t frc_msk = (vece == MO_32) ? (uint32_t)FRC_MASK_SP : FRC_MASK_DP; + tcg_gen_andc_vec(vece, b, b, tcg_constant_vec_matching(t, vece, sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_LE, vece, t, b, + tcg_constant_vec_matching(t, vece, frc_msk)); + tcg_gen_cmp_vec(TCG_COND_NE, vece, b, b, + tcg_constant_vec_matching(t, vece, 0)); + tcg_gen_and_vec(vece, t, t, b); +} + +/* test if NaN */ +static void gen_is_nan(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t v) +{ + uint64_t exp_msk = (vece == MO_32) ? (uint32_t)EXP_MASK_SP : EXP_MASK_DP; + uint64_t sgn_msk = (vece == MO_32) ? (uint32_t)SGN_MASK_SP : SGN_MASK_DP; + tcg_gen_and_vec(vece, b, b, tcg_constant_vec_matching(t, vece, ~sgn_msk)); + tcg_gen_cmp_vec(TCG_COND_GT, vece, t, b, + tcg_constant_vec_matching(t, vece, exp_msk)); +} + static bool do_xvtstdc(DisasContext *ctx, arg_XX2_uim *a, unsigned vece) { - static const GVecGen2i op[] = { - { - .fnoi = gen_helper_XVTSTDCSP, - .vece = MO_32 - }, - { - .fnoi = gen_helper_XVTSTDCDP, - .vece = MO_64 - }, + static const TCGOpcode vecop_list[] = { + INDEX_op_cmp_vec, 0 + }; + + GVecGen2i op = { + .fnoi = (vece == MO_32) ? gen_helper_XVTSTDCSP : gen_helper_XVTSTDCDP, + .vece = vece, + .opt_opc = vecop_list }; REQUIRE_VSX(ctx); + switch (a->uim) { + case 0: + set_cpu_vsr(a->xt, tcg_constant_i64(0), true); + set_cpu_vsr(a->xt, tcg_constant_i64(0), false); + return true; + case ((1 << 0) | (1 << 1)): + /* test if +Denormal or -Denormal */ + op.fniv = gen_is_any_denormal; + break; + case (1 << 0): + /* test if -Denormal */ + op.fniv = gen_is_neg_denormal; + break; + case (1 << 1): + /* test if +Denormal */ + op.fniv = gen_is_pos_denormal; + break; + case ((1 << 2) | (1 << 3)): + /* test if +0 or -0 */ + op.fniv = gen_is_any_zero; + break; + case (1 << 2): + /* test if -0 */ + op.fniv = gen_is_neg_zero; + break; + case (1 << 3): + /* test if +0 */ + op.fniv = gen_is_pos_zero; + break; + case ((1 << 4) | (1 << 5)): + /* test if +Inf or -Inf */ + op.fniv = gen_is_any_inf; + break; + case (1 << 4): + /* test if -Inf */ + op.fniv = gen_is_neg_inf; + break; + case (1 << 5): + /* test if +Inf */ + op.fniv = gen_is_pos_inf; + break; + case (1 << 6): + /* test if NaN */ + op.fniv = gen_is_nan; + break; + } tcg_gen_gvec_2i(vsr_full_offset(a->xt), vsr_full_offset(a->xb), - 16, 16, (int32_t)(a->uim), &op[vece - MO_32]); + 16, 16, a->uim, &op); return true; }