From patchwork Thu Aug 25 22:13:54 2022
X-Patchwork-Id: 12955289
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 01/18] i386: Rework sse_op_table1
Date: Fri, 26 Aug 2022 00:13:54 +0200
Message-Id: <20220825221411.35122-2-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Add a flags field to each row in sse_op_table1.

Initially this is only used as a replacement for the magic SSE_SPECIAL
and SSE_DUMMY pointers; the other flags will become relevant as the rest
of the AVX implementation is built out.
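The change from sentinel-pointer rows to flag-carrying rows can be shown with a small, self-contained C model. This is a sketch, not QEMU code. `struct OpRow`, `demo_table1`, `classify()`, the `OPF_*` macros and the lane-adding helpers are hypothetical stand-ins for `struct SSEOpHelper_table1`, `sse_op_table1` and the TCG helper generators. The model covers only the SPECIAL, 3DNow! and MMX bits, at the same bit positions the patch uses. Its decode test mirrors the new check in gen_sse(): an opcode is unknown only if its row has neither a "handled elsewhere" flag nor a helper for the current prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, same positions as the patch's SSE_OPF_* values. */
#define OPF_SPECIAL (1 << 3) /* decoded by hand-written code */
#define OPF_3DNOW   (1 << 4) /* decoded through a separate 3DNow! table */
#define OPF_MMX     (1 << 5) /* has an MMX (no-prefix) form */

/* Hypothetical helper signature: update dst in place using src. */
typedef void (*HelperFn)(int *dst, const int *src);

/* One row: flags plus one helper per prefix (none, 66, F3, F2). */
struct OpRow {
    int flags;
    HelperFn op[4];
};

static void paddl_mmx(int *dst, const int *src) /* 2 x 32-bit lanes */
{
    for (int i = 0; i < 2; i++) {
        dst[i] += src[i];
    }
}

static void paddl_xmm(int *dst, const int *src) /* 4 x 32-bit lanes */
{
    for (int i = 0; i < 4; i++) {
        dst[i] += src[i];
    }
}

/* Unlisted rows are zero-initialized: no flags, all helpers NULL. */
static const struct OpRow demo_table1[256] = {
    [0x0e] = { OPF_SPECIAL },                /* like femms */
    [0x0f] = { OPF_3DNOW },                  /* like the 3DNow! escape */
    [0xfe] = { OPF_MMX, { paddl_mmx, paddl_xmm, NULL, NULL } },
};

enum Decode { DECODE_UNKNOWN, DECODE_SPECIAL, DECODE_3DNOW, DECODE_HELPER };

/* b: opcode byte; b1: prefix index (0 = none, 1 = 66, 2 = F3, 3 = F2). */
static enum Decode classify(int b, int b1)
{
    const struct OpRow *row = &demo_table1[b];
    HelperFn fn = row->op[b1];

    /* Unknown only if no "handled elsewhere" flag and no helper. */
    if ((row->flags & (OPF_SPECIAL | OPF_3DNOW)) == 0 && !fn) {
        return DECODE_UNKNOWN;
    }
    if (row->flags & OPF_SPECIAL) {
        return DECODE_SPECIAL;
    }
    if (row->flags & OPF_3DNOW) {
        return DECODE_3DNOW;
    }
    return DECODE_HELPER;
}
```

Because the marker lives in `flags`, every `op[]` slot holds either NULL or a real function. No slot carries a fake value such as `(void *)1`, and later bits (scalar, V0, MMX) can be added to the same row without changing the helper array.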
Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-5-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 314 +++++++++++++++++++++---------------
 1 file changed, 185 insertions(+), 129 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b7972f0ff5..7fec582358 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2788,146 +2788,196 @@ typedef void (*SSEFunc_0_ppi)(TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_i32 val);
 typedef void (*SSEFunc_0_eppt)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b,
                                TCGv val);
 
-#define SSE_SPECIAL ((void *)1)
-#define SSE_DUMMY ((void *)2)
+#define SSE_OPF_V0 (1 << 0) /* vex.v must be 1111b (only 2 operands) */
+#define SSE_OPF_CMP (1 << 1) /* does not write for first operand */
+#define SSE_OPF_BLENDV (1 << 2) /* blendv* instruction */
+#define SSE_OPF_SPECIAL (1 << 3) /* magic */
+#define SSE_OPF_3DNOW (1 << 4) /* 3DNow! instruction */
+#define SSE_OPF_MMX (1 << 5) /* MMX/integer/AVX2 instruction */
+#define SSE_OPF_SCALAR (1 << 6) /* Has SSE scalar variants */
+#define SSE_OPF_AVX2 (1 << 7) /* AVX2 instruction */
+#define SSE_OPF_SHUF (1 << 9) /* pshufx/shufpx */
 
-#define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm }
-#define SSE_FOP(x) { gen_helper_ ## x ## ps, gen_helper_ ## x ## pd, \
-                     gen_helper_ ## x ## ss, gen_helper_ ## x ## sd, }
+#define OP(op, flags, a, b, c, d) \
+    {flags, {a, b, c, d} }
 
-static const SSEFunc_0_epp sse_op_table1[256][4] = {
+#define MMX_OP(x) OP(op2, SSE_OPF_MMX, \
+        gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm, NULL, NULL)
+
+#define SSE_FOP(name) OP(op2, SSE_OPF_SCALAR, \
+        gen_helper_##name##ps, gen_helper_##name##pd, \
+        gen_helper_##name##ss, gen_helper_##name##sd)
+#define SSE_OP(sname, dname, op, flags) OP(op, flags, \
+        gen_helper_##sname##_xmm, gen_helper_##dname##_xmm, NULL, NULL)
+
+struct SSEOpHelper_table1 {
+    int flags;
+    SSEFunc_0_epp op[4];
+};
+
+#define SSE_3DNOW { SSE_OPF_3DNOW }
+#define SSE_SPECIAL { SSE_OPF_SPECIAL }
+
+static const struct SSEOpHelper_table1 sse_op_table1[256] = {
     /* 3DNow! extensions */
-    [0x0e] = { SSE_DUMMY }, /* femms */
-    [0x0f] = { SSE_DUMMY }, /* pf... */
+    [0x0e] = SSE_SPECIAL, /* femms */
+    [0x0f] = SSE_3DNOW, /* pf... (sse_op_table5) */
     /* pure SSE operations */
-    [0x10] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movups, movupd, movss, movsd */
-    [0x11] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movups, movupd, movss, movsd */
-    [0x12] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movlps, movlpd, movsldup, movddup */
-    [0x13] = { SSE_SPECIAL, SSE_SPECIAL }, /* movlps, movlpd */
-    [0x14] = { gen_helper_punpckldq_xmm, gen_helper_punpcklqdq_xmm },
-    [0x15] = { gen_helper_punpckhdq_xmm, gen_helper_punpckhqdq_xmm },
-    [0x16] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movhps, movhpd, movshdup */
-    [0x17] = { SSE_SPECIAL, SSE_SPECIAL }, /* movhps, movhpd */
+    [0x10] = SSE_SPECIAL, /* movups, movupd, movss, movsd */
+    [0x11] = SSE_SPECIAL, /* movups, movupd, movss, movsd */
+    [0x12] = SSE_SPECIAL, /* movlps, movlpd, movsldup, movddup */
+    [0x13] = SSE_SPECIAL, /* movlps, movlpd */
+    [0x14] = SSE_OP(punpckldq, punpcklqdq, op2, 0), /* unpcklps, unpcklpd */
+    [0x15] = SSE_OP(punpckhdq, punpckhqdq, op2, 0), /* unpckhps, unpckhpd */
+    [0x16] = SSE_SPECIAL, /* movhps, movhpd, movshdup */
+    [0x17] = SSE_SPECIAL, /* movhps, movhpd */
 
-    [0x28] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
-    [0x29] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
-    [0x2a] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtpi2ps, cvtpi2pd, cvtsi2ss, cvtsi2sd */
-    [0x2b] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd, movntss, movntsd */
-    [0x2c] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */
-    [0x2d] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */
-    [0x2e] = { gen_helper_ucomiss, gen_helper_ucomisd },
-    [0x2f] = { gen_helper_comiss, gen_helper_comisd },
-    [0x50] = { SSE_SPECIAL, SSE_SPECIAL }, /* movmskps, movmskpd */
-    [0x51] = SSE_FOP(sqrt),
-    [0x52] = { gen_helper_rsqrtps, NULL, gen_helper_rsqrtss, NULL },
-    [0x53] = { gen_helper_rcpps, NULL, gen_helper_rcpss, NULL },
-    [0x54] = { gen_helper_pand_xmm, gen_helper_pand_xmm }, /* andps, andpd */
-    [0x55] = { gen_helper_pandn_xmm, gen_helper_pandn_xmm }, /* andnps, andnpd */
-    [0x56] = { gen_helper_por_xmm, gen_helper_por_xmm }, /* orps, orpd */
-    [0x57] = { gen_helper_pxor_xmm, gen_helper_pxor_xmm }, /* xorps, xorpd */
+    [0x28] = SSE_SPECIAL, /* movaps, movapd */
+    [0x29] = SSE_SPECIAL, /* movaps, movapd */
+    [0x2a] = SSE_SPECIAL, /* cvtpi2ps, cvtpi2pd, cvtsi2ss, cvtsi2sd */
+    [0x2b] = SSE_SPECIAL, /* movntps, movntpd, movntss, movntsd */
+    [0x2c] = SSE_SPECIAL, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */
+    [0x2d] = SSE_SPECIAL, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */
+    [0x2e] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_ucomiss, gen_helper_ucomisd, NULL, NULL),
+    [0x2f] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_comiss, gen_helper_comisd, NULL, NULL),
+    [0x50] = SSE_SPECIAL, /* movmskps, movmskpd */
+    [0x51] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_sqrtps, gen_helper_sqrtpd,
+            gen_helper_sqrtss, gen_helper_sqrtsd),
+    [0x52] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_rsqrtps, NULL, gen_helper_rsqrtss, NULL),
+    [0x53] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_rcpps, NULL, gen_helper_rcpss, NULL),
+    [0x54] = SSE_OP(pand, pand, op2, 0), /* andps, andpd */
+    [0x55] = SSE_OP(pandn, pandn, op2, 0), /* andnps, andnpd */
+    [0x56] = SSE_OP(por, por, op2, 0), /* orps, orpd */
+    [0x57] = SSE_OP(pxor, pxor, op2, 0), /* xorps, xorpd */
     [0x58] = SSE_FOP(add),
     [0x59] = SSE_FOP(mul),
-    [0x5a] = { gen_helper_cvtps2pd, gen_helper_cvtpd2ps,
-               gen_helper_cvtss2sd, gen_helper_cvtsd2ss },
-    [0x5b] = { gen_helper_cvtdq2ps, gen_helper_cvtps2dq, gen_helper_cvttps2dq },
+    [0x5a] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+            gen_helper_cvtps2pd, gen_helper_cvtpd2ps,
+            gen_helper_cvtss2sd, gen_helper_cvtsd2ss),
+    [0x5b] = OP(op1, SSE_OPF_V0,
+            gen_helper_cvtdq2ps, gen_helper_cvtps2dq,
+            gen_helper_cvttps2dq, NULL),
     [0x5c] = SSE_FOP(sub),
     [0x5d] = SSE_FOP(min),
     [0x5e] = SSE_FOP(div),
     [0x5f] = SSE_FOP(max),
 
-    [0xc2] = SSE_FOP(cmpeq),
-    [0xc6] = { (SSEFunc_0_epp)gen_helper_shufps,
-               (SSEFunc_0_epp)gen_helper_shufpd }, /* XXX: casts */
+    [0xc2] = SSE_FOP(cmpeq), /* sse_op_table4 */
+    [0xc6] = OP(dummy, SSE_OPF_SHUF, (SSEFunc_0_epp)gen_helper_shufps,
+            (SSEFunc_0_epp)gen_helper_shufpd, NULL, NULL),
 
     /* SSSE3, SSE4, MOVBE, CRC32, BMI1, BMI2, ADX. */
-    [0x38] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
-    [0x3a] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
+    [0x38] = SSE_SPECIAL,
+    [0x3a] = SSE_SPECIAL,
 
     /* MMX ops and their SSE extensions */
-    [0x60] = MMX_OP2(punpcklbw),
-    [0x61] = MMX_OP2(punpcklwd),
-    [0x62] = MMX_OP2(punpckldq),
-    [0x63] = MMX_OP2(packsswb),
-    [0x64] = MMX_OP2(pcmpgtb),
-    [0x65] = MMX_OP2(pcmpgtw),
-    [0x66] = MMX_OP2(pcmpgtl),
-    [0x67] = MMX_OP2(packuswb),
-    [0x68] = MMX_OP2(punpckhbw),
-    [0x69] = MMX_OP2(punpckhwd),
-    [0x6a] = MMX_OP2(punpckhdq),
-    [0x6b] = MMX_OP2(packssdw),
-    [0x6c] = { NULL, gen_helper_punpcklqdq_xmm },
-    [0x6d] = { NULL, gen_helper_punpckhqdq_xmm },
-    [0x6e] = { SSE_SPECIAL, SSE_SPECIAL }, /* movd mm, ea */
-    [0x6f] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movq, movdqa, , movqdu */
-    [0x70] = { (SSEFunc_0_epp)gen_helper_pshufw_mmx,
-               (SSEFunc_0_epp)gen_helper_pshufd_xmm,
-               (SSEFunc_0_epp)gen_helper_pshufhw_xmm,
-               (SSEFunc_0_epp)gen_helper_pshuflw_xmm }, /* XXX: casts */
-    [0x71] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftw */
-    [0x72] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftd */
-    [0x73] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftq */
-    [0x74] = MMX_OP2(pcmpeqb),
-    [0x75] = MMX_OP2(pcmpeqw),
-    [0x76] = MMX_OP2(pcmpeql),
-    [0x77] = { SSE_DUMMY }, /* emms */
-    [0x78] = { NULL, SSE_SPECIAL, NULL, SSE_SPECIAL }, /* extrq_i, insertq_i */
-    [0x79] = { NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r },
-    [0x7c] = { NULL, gen_helper_haddpd, NULL, gen_helper_haddps },
-    [0x7d] = { NULL, gen_helper_hsubpd, NULL, gen_helper_hsubps },
-    [0x7e] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movd, movd, , movq */
-    [0x7f] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movq, movdqa, movdqu */
-    [0xc4] = { SSE_SPECIAL, SSE_SPECIAL }, /* pinsrw */
-    [0xc5] = { SSE_SPECIAL, SSE_SPECIAL }, /* pextrw */
-    [0xd0] = { NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps },
-    [0xd1] = MMX_OP2(psrlw),
-    [0xd2] = MMX_OP2(psrld),
-    [0xd3] = MMX_OP2(psrlq),
-    [0xd4] = MMX_OP2(paddq),
-    [0xd5] = MMX_OP2(pmullw),
-    [0xd6] = { NULL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
-    [0xd7] = { SSE_SPECIAL, SSE_SPECIAL }, /* pmovmskb */
-    [0xd8] = MMX_OP2(psubusb),
-    [0xd9] = MMX_OP2(psubusw),
-    [0xda] = MMX_OP2(pminub),
-    [0xdb] = MMX_OP2(pand),
-    [0xdc] = MMX_OP2(paddusb),
-    [0xdd] = MMX_OP2(paddusw),
-    [0xde] = MMX_OP2(pmaxub),
-    [0xdf] = MMX_OP2(pandn),
-    [0xe0] = MMX_OP2(pavgb),
-    [0xe1] = MMX_OP2(psraw),
-    [0xe2] = MMX_OP2(psrad),
-    [0xe3] = MMX_OP2(pavgw),
-    [0xe4] = MMX_OP2(pmulhuw),
-    [0xe5] = MMX_OP2(pmulhw),
-    [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
-    [0xe7] = { SSE_SPECIAL , SSE_SPECIAL }, /* movntq, movntq */
-    [0xe8] = MMX_OP2(psubsb),
-    [0xe9] = MMX_OP2(psubsw),
-    [0xea] = MMX_OP2(pminsw),
-    [0xeb] = MMX_OP2(por),
-    [0xec] = MMX_OP2(paddsb),
-    [0xed] = MMX_OP2(paddsw),
-    [0xee] = MMX_OP2(pmaxsw),
-    [0xef] = MMX_OP2(pxor),
-    [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
-    [0xf1] = MMX_OP2(psllw),
-    [0xf2] = MMX_OP2(pslld),
-    [0xf3] = MMX_OP2(psllq),
-    [0xf4] = MMX_OP2(pmuludq),
-    [0xf5] = MMX_OP2(pmaddwd),
-    [0xf6] = MMX_OP2(psadbw),
-    [0xf7] = { (SSEFunc_0_epp)gen_helper_maskmov_mmx,
-               (SSEFunc_0_epp)gen_helper_maskmov_xmm }, /* XXX: casts */
-    [0xf8] = MMX_OP2(psubb),
-    [0xf9] = MMX_OP2(psubw),
-    [0xfa] = MMX_OP2(psubl),
-    [0xfb] = MMX_OP2(psubq),
-    [0xfc] = MMX_OP2(paddb),
-    [0xfd] = MMX_OP2(paddw),
-    [0xfe] = MMX_OP2(paddl),
+    [0x60] = MMX_OP(punpcklbw),
+    [0x61] = MMX_OP(punpcklwd),
+    [0x62] = MMX_OP(punpckldq),
+    [0x63] = MMX_OP(packsswb),
+    [0x64] = MMX_OP(pcmpgtb),
+    [0x65] = MMX_OP(pcmpgtw),
+    [0x66] = MMX_OP(pcmpgtl),
+    [0x67] = MMX_OP(packuswb),
+    [0x68] = MMX_OP(punpckhbw),
+    [0x69] = MMX_OP(punpckhwd),
+    [0x6a] = MMX_OP(punpckhdq),
+    [0x6b] = MMX_OP(packssdw),
+    [0x6c] = OP(op2, SSE_OPF_MMX,
+            NULL, gen_helper_punpcklqdq_xmm, NULL, NULL),
+    [0x6d] = OP(op2, SSE_OPF_MMX,
+            NULL, gen_helper_punpckhqdq_xmm, NULL, NULL),
+    [0x6e] = SSE_SPECIAL, /* movd mm, ea */
+    [0x6f] = SSE_SPECIAL, /* movq, movdqa, , movqdu */
+    [0x70] = OP(op1i, SSE_OPF_SHUF | SSE_OPF_MMX | SSE_OPF_V0,
+            (SSEFunc_0_epp)gen_helper_pshufw_mmx,
+            (SSEFunc_0_epp)gen_helper_pshufd_xmm,
+            (SSEFunc_0_epp)gen_helper_pshufhw_xmm,
+            (SSEFunc_0_epp)gen_helper_pshuflw_xmm),
+    [0x71] = SSE_SPECIAL, /* shiftw */
+    [0x72] = SSE_SPECIAL, /* shiftd */
+    [0x73] = SSE_SPECIAL, /* shiftq */
+    [0x74] = MMX_OP(pcmpeqb),
+    [0x75] = MMX_OP(pcmpeqw),
+    [0x76] = MMX_OP(pcmpeql),
+    [0x77] = SSE_SPECIAL, /* emms */
+    [0x78] = SSE_SPECIAL, /* extrq_i, insertq_i (sse4a) */
+    [0x79] = OP(op1, SSE_OPF_V0,
+            NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r),
+    [0x7c] = OP(op2, 0,
+            NULL, gen_helper_haddpd, NULL, gen_helper_haddps),
+    [0x7d] = OP(op2, 0,
+            NULL, gen_helper_hsubpd, NULL, gen_helper_hsubps),
+    [0x7e] = SSE_SPECIAL, /* movd, movd, , movq */
+    [0x7f] = SSE_SPECIAL, /* movq, movdqa, movdqu */
+    [0xc4] = SSE_SPECIAL, /* pinsrw */
+    [0xc5] = SSE_SPECIAL, /* pextrw */
+    [0xd0] = OP(op2, 0,
+            NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps),
+    [0xd1] = MMX_OP(psrlw),
+    [0xd2] = MMX_OP(psrld),
+    [0xd3] = MMX_OP(psrlq),
+    [0xd4] = MMX_OP(paddq),
+    [0xd5] = MMX_OP(pmullw),
+    [0xd6] = SSE_SPECIAL,
+    [0xd7] = SSE_SPECIAL, /* pmovmskb */
+    [0xd8] = MMX_OP(psubusb),
+    [0xd9] = MMX_OP(psubusw),
+    [0xda] = MMX_OP(pminub),
+    [0xdb] = MMX_OP(pand),
+    [0xdc] = MMX_OP(paddusb),
+    [0xdd] = MMX_OP(paddusw),
+    [0xde] = MMX_OP(pmaxub),
+    [0xdf] = MMX_OP(pandn),
+    [0xe0] = MMX_OP(pavgb),
+    [0xe1] = MMX_OP(psraw),
+    [0xe2] = MMX_OP(psrad),
+    [0xe3] = MMX_OP(pavgw),
+    [0xe4] = MMX_OP(pmulhuw),
+    [0xe5] = MMX_OP(pmulhw),
+    [0xe6] = OP(op1, SSE_OPF_V0,
+            NULL, gen_helper_cvttpd2dq,
+            gen_helper_cvtdq2pd, gen_helper_cvtpd2dq),
+    [0xe7] = SSE_SPECIAL, /* movntq, movntq */
+    [0xe8] = MMX_OP(psubsb),
+    [0xe9] = MMX_OP(psubsw),
+    [0xea] = MMX_OP(pminsw),
+    [0xeb] = MMX_OP(por),
+    [0xec] = MMX_OP(paddsb),
+    [0xed] = MMX_OP(paddsw),
+    [0xee] = MMX_OP(pmaxsw),
+    [0xef] = MMX_OP(pxor),
+    [0xf0] = SSE_SPECIAL, /* lddqu */
+    [0xf1] = MMX_OP(psllw),
+    [0xf2] = MMX_OP(pslld),
+    [0xf3] = MMX_OP(psllq),
+    [0xf4] = MMX_OP(pmuludq),
+    [0xf5] = MMX_OP(pmaddwd),
+    [0xf6] = MMX_OP(psadbw),
+    [0xf7] = OP(op1t, SSE_OPF_MMX | SSE_OPF_V0,
+            (SSEFunc_0_epp)gen_helper_maskmov_mmx,
+            (SSEFunc_0_epp)gen_helper_maskmov_xmm, NULL, NULL),
+    [0xf8] = MMX_OP(psubb),
+    [0xf9] = MMX_OP(psubw),
+    [0xfa] = MMX_OP(psubl),
+    [0xfb] = MMX_OP(psubq),
+    [0xfc] = MMX_OP(paddb),
+    [0xfd] = MMX_OP(paddw),
+    [0xfe] = MMX_OP(paddl),
 };
+#undef MMX_OP
+#undef OP
+#undef SSE_FOP
+#undef SSE_OP
+#undef SSE_SPECIAL
+
+#define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm }
+#define SSE_SPECIAL_FN ((void *)1)
 
 static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = {
     [0 + 2] = MMX_OP2(psrlw),
@@ -2970,6 +3020,8 @@ static const SSEFunc_l_ep sse_op_table3bq[] = {
 };
 #endif
 
+#define SSE_FOP(x) { gen_helper_ ## x ## ps, gen_helper_ ## x ## pd, \
+                     gen_helper_ ## x ## ss, gen_helper_ ## x ## sd, }
 static const SSEFunc_0_epp sse_op_table4[8][4] = {
     SSE_FOP(cmpeq),
     SSE_FOP(cmplt),
@@ -2980,6 +3032,7 @@ static const SSEFu
define SSE42_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_SSE42 }
-#define SSE41_SPECIAL { { NULL, SSE_SPECIAL }, CPUID_EXT_SSE41 }
+#define SSE41_SPECIAL { { NULL, SSE_SPECIAL_FN }, CPUID_EXT_SSE41 }
 #define PCLMULQDQ_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, \
                           CPUID_EXT_PCLMULQDQ }
 #define AESNI_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_AES }
@@ -3112,6 +3165,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 {
     int b1, op1_offset, op2_offset, is_xmm, val;
     int modrm, mod, rm, reg;
+    struct SSEOpHelper_table1 sse_op;
     SSEFunc_0_epp sse_fn_epp;
     SSEFunc_0_eppi sse_fn_eppi;
     SSEFunc_0_ppi sse_fn_ppi;
@@ -3127,8 +3181,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         b1 = 3;
     else
         b1 = 0;
-    sse_fn_epp = sse_op_table1[b][b1];
-    if (!sse_fn_epp) {
+    sse_op = sse_op_table1[b];
+    sse_fn_epp = sse_op.op[b1];
+    if ((sse_op.flags & (SSE_OPF_SPECIAL | SSE_OPF_3DNOW)) == 0
+            && !sse_fn_epp) {
         goto unknown_op;
     }
     if ((b <= 0x5f && b >= 0x10) || b == 0xc6 || b == 0xc2) {
@@ -3182,7 +3238,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         reg |= REX_R(s);
     }
     mod = (modrm >> 6) & 3;
-    if (sse_fn_epp == SSE_SPECIAL) {
+    if (sse_op.flags & SSE_OPF_SPECIAL) {
         b |= (b1 << 8);
         switch(b) {
         case 0x0e7: /* movntq */
@@ -3823,7 +3879,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     gen_ldq_env_A0(s, op2_offset);
                 }
             }
-            if (sse_fn_epp == SSE_SPECIAL) {
+            if (sse_fn_epp == SSE_SPECIAL_FN) {
                 goto unknown_op;
             }
 
@@ -4209,7 +4265,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             s->rip_offset = 1;
 
-            if (sse_fn_eppi == SSE_SPECIAL) {
+            if (sse_fn_eppi == SSE_SPECIAL_FN) {
                 ot = mo_64_32(s->dflag);
                 rm = (modrm & 7) | REX_B(s);
                 if (mod != 3)

From patchwork Thu Aug 25 22:13:55 2022
X-Patchwork-Id: 12955288
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 02/18] i386: Rework sse_op_table6/7
Date: Fri, 26 Aug 2022 00:13:55 +0200
Message-Id: <20220825221411.35122-3-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Add a flags field to each row in sse_op_table6 and sse_op_table7.

Initially this is only used as a replacement for the magic SSE41_SPECIAL
pointer. The other flags will become relevant as the rest of the AVX
implementation is built out.
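The order of checks in the new `op6` decoding path can also be modelled on its own. This is another hypothetical sketch. `struct Row6`, `demo_table6`, `classify6()`, the `EXT_*` feature bits and the empty helper stubs are invented for illustration; QEMU uses its own `CPUID_EXT_*` constants and TCG helper generators. The model follows the patch's sequence of checks:

- An all-zero `ext_mask` means there is no instruction at that opcode (unknown).
- A missing CPU feature is illegal.
- A no-prefix (MMX) encoding of a row without the MMX flag is unknown.
- A NULL helper slot is illegal.

The hand-decoded special cases that gen_sse() handles separately are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPUID feature bits (not QEMU's CPUID_EXT_* values). */
#define EXT_SSSE3 (1u << 0)
#define EXT_SSE41 (1u << 1)

/* Flag bits at the same positions as the patch's SSE_OPF_* values. */
#define OPF_V0  (1 << 0)
#define OPF_MMX (1 << 5)

typedef void (*HelperFn)(void);

/* Empty stand-ins for the real helper generators. */
static void pshufb_mmx(void) {}
static void pshufb_xmm(void) {}
static void pmovsxbw_xmm(void) {}
static void phminposuw_xmm(void) {}

struct Row6 {
    HelperFn op[2];     /* [0] = no prefix (MMX), [1] = 66 prefix (XMM) */
    uint32_t ext_mask;  /* required feature; 0 marks an empty row */
    int flags;
};

static const struct Row6 demo_table6[256] = {
    [0x00] = { { pshufb_mmx, pshufb_xmm }, EXT_SSSE3, OPF_MMX },
    [0x20] = { { NULL, pmovsxbw_xmm }, EXT_SSE41, OPF_V0 | OPF_MMX },
    [0x41] = { { NULL, phminposuw_xmm }, EXT_SSE41, OPF_V0 },
};

enum Result { RES_UNKNOWN, RES_ILLEGAL, RES_CALL };

/* b: opcode byte after 0f 38; b1: 0 = no prefix, 1 = 66 prefix. */
static enum Result classify6(int b, int b1, uint32_t cpu_features)
{
    struct Row6 row = demo_table6[b];

    if (row.ext_mask == 0) {
        return RES_UNKNOWN;   /* nothing defined at this opcode */
    }
    if (!(cpu_features & row.ext_mask)) {
        return RES_ILLEGAL;   /* CPU model lacks the extension */
    }
    if (b1 == 0 && (row.flags & OPF_MMX) == 0) {
        return RES_UNKNOWN;   /* row has no MMX encoding */
    }
    if (!row.op[b1]) {
        return RES_ILLEGAL;   /* no helper for this prefix */
    }
    return RES_CALL;
}
```

In this model, an entry such as `[0x20]` carries the MMX flag but has no MMX helper, so its no-prefix form passes the flag check and is then rejected as illegal by the NULL-slot check.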
Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-6-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 230 ++++++++++++++++++++----------------
 1 file changed, 131 insertions(+), 99 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 7fec582358..5335b86c01 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2977,7 +2977,6 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = {
 #undef SSE_SPECIAL
 
 #define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm }
-#define SSE_SPECIAL_FN ((void *)1)
 
 static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = {
     [0 + 2] = MMX_OP2(psrlw),
@@ -3061,113 +3060,134 @@ static const SSEFunc_0_epp sse_op_table5[256] = {
     [0xbf] = gen_helper_pavgb_mmx /* pavgusb */
 };
 
-struct SSEOpHelper_epp {
+struct SSEOpHelper_table6 {
     SSEFunc_0_epp op[2];
     uint32_t ext_mask;
+    int flags;
 };
 
-struct SSEOpHelper_eppi {
+struct SSEOpHelper_table7 {
     SSEFunc_0_eppi op[2];
     uint32_t ext_mask;
+    int flags;
 };
 
-#define SSSE3_OP(x) { MMX_OP2(x), CPUID_EXT_SSSE3 }
-#define SSE41_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_SSE41 }
-#define SSE42_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_SSE42 }
-#define SSE41_SPECIAL { { NULL, SSE_SPECIAL_FN }, CPUID_EXT_SSE41 }
-#define PCLMULQDQ_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, \
-                          CPUID_EXT_PCLMULQDQ }
-#define AESNI_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_AES }
+#define gen_helper_special_xmm NULL
 
-static const struct SSEOpHelper_epp sse_op_table6[256] = {
-    [0x00] = SSSE3_OP(pshufb),
-    [0x01] = SSSE3_OP(phaddw),
-    [0x02] = SSSE3_OP(phaddd),
-    [0x03] = SSSE3_OP(phaddsw),
-    [0x04] = SSSE3_OP(pmaddubsw),
-    [0x05] = SSSE3_OP(phsubw),
-    [0x06] = SSSE3_OP(phsubd),
-    [0x07] = SSSE3_OP(phsubsw),
-    [0x08] = SSSE3_OP(psignb),
-    [0x09] = SSSE3_OP(psignw),
-    [0x0a] = SSSE3_OP(psignd),
-    [0x0b] = SSSE3_OP(pmulhrsw),
-    [0x10] = SSE41_OP(pblendvb),
-    [0x14] = SSE41_OP(blendvps),
-    [0x15] = SSE41_OP(blendvpd),
-    [0x17] = SSE41_OP(ptest),
-    [0x1c] = SSSE3_OP(pabsb),
-    [0x1d] = SSSE3_OP(pabsw),
-    [0x1e] = SSSE3_OP(pabsd),
-    [0x20] = SSE41_OP(pmovsxbw),
-    [0x21] = SSE41_OP(pmovsxbd),
-    [0x22] = SSE41_OP(pmovsxbq),
-    [0x23] = SSE41_OP(pmovsxwd),
-    [0x24] = SSE41_OP(pmovsxwq),
-    [0x25] = SSE41_OP(pmovsxdq),
-    [0x28] = SSE41_OP(pmuldq),
-    [0x29] = SSE41_OP(pcmpeqq),
-    [0x2a] = SSE41_SPECIAL, /* movntqda */
-    [0x2b] = SSE41_OP(packusdw),
-    [0x30] = SSE41_OP(pmovzxbw),
-    [0x31] = SSE41_OP(pmovzxbd),
-    [0x32] = SSE41_OP(pmovzxbq),
-    [0x33] = SSE41_OP(pmovzxwd),
-    [0x34] = SSE41_OP(pmovzxwq),
-    [0x35] = SSE41_OP(pmovzxdq),
-    [0x37] = SSE42_OP(pcmpgtq),
-    [0x38] = SSE41_OP(pminsb),
-    [0x39] = SSE41_OP(pminsd),
-    [0x3a] = SSE41_OP(pminuw),
-    [0x3b] = SSE41_OP(pminud),
-    [0x3c] = SSE41_OP(pmaxsb),
-    [0x3d] = SSE41_OP(pmaxsd),
-    [0x3e] = SSE41_OP(pmaxuw),
-    [0x3f] = SSE41_OP(pmaxud),
-    [0x40] = SSE41_OP(pmulld),
-    [0x41] = SSE41_OP(phminposuw),
-    [0xdb] = AESNI_OP(aesimc),
-    [0xdc] = AESNI_OP(aesenc),
-    [0xdd] = AESNI_OP(aesenclast),
-    [0xde] = AESNI_OP(aesdec),
-    [0xdf] = AESNI_OP(aesdeclast),
+#define OP(name, op, flags, ext, mmx_name) \
+    {{mmx_name, gen_helper_ ## name ## _xmm}, CPUID_EXT_ ## ext, flags}
+#define BINARY_OP_MMX(name, ext) \
+    OP(name, op2, SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx)
+#define BINARY_OP(name, ext, flags) \
+    OP(name, op2, flags, ext, NULL)
+#define UNARY_OP_MMX(name, ext) \
+    OP(name, op1, SSE_OPF_V0 | SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx)
+#define UNARY_OP(name, ext, flags) \
+    OP(name, op1, SSE_OPF_V0 | flags, ext, NULL)
+#define BLENDV_OP(name, ext, flags) OP(name, op3, SSE_OPF_BLENDV, ext, NULL)
+#define CMP_OP(name, ext) OP(name, op1, SSE_OPF_CMP | SSE_OPF_V0, ext, NULL)
+#define SPECIAL_OP(ext) OP(special, op1, SSE_OPF_SPECIAL, ext, NULL)
+
+/* prefix [66] 0f 38 */
+static const struct SSEOpHelper_table6 sse_op_table6[256] = {
+    [0x00] = BINARY_OP_MMX(pshufb, SSSE3),
+    [0x01] = BINARY_OP_MMX(phaddw, SSSE3),
+    [0x02] = BINARY_OP_MMX(phaddd, SSSE3),
+    [0x03] = BINARY_OP_MMX(phaddsw, SSSE3),
+    [0x04] = BINARY_OP_MMX(pmaddubsw, SSSE3),
+    [0x05] = BINARY_OP_MMX(phsubw, SSSE3),
+    [0x06] = BINARY_OP_MMX(phsubd, SSSE3),
+    [0x07] = BINARY_OP_MMX(phsubsw, SSSE3),
+    [0x08] = BINARY_OP_MMX(psignb, SSSE3),
+    [0x09] = BINARY_OP_MMX(psignw, SSSE3),
+    [0x0a] = BINARY_OP_MMX(psignd, SSSE3),
+    [0x0b] = BINARY_OP_MMX(pmulhrsw, SSSE3),
+    [0x10] = BLENDV_OP(pblendvb, SSE41, SSE_OPF_MMX),
+    [0x14] = BLENDV_OP(blendvps, SSE41, 0),
+    [0x15] = BLENDV_OP(blendvpd, SSE41, 0),
+    [0x17] = CMP_OP(ptest, SSE41),
+    [0x1c] = UNARY_OP_MMX(pabsb, SSSE3),
+    [0x1d] = UNARY_OP_MMX(pabsw, SSSE3),
+    [0x1e] = UNARY_OP_MMX(pabsd, SSSE3),
+    [0x20] = UNARY_OP(pmovsxbw, SSE41, SSE_OPF_MMX),
+    [0x21] = UNARY_OP(pmovsxbd, SSE41, SSE_OPF_MMX),
+    [0x22] = UNARY_OP(pmovsxbq, SSE41, SSE_OPF_MMX),
+    [0x23] = UNARY_OP(pmovsxwd, SSE41, SSE_OPF_MMX),
+    [0x24] = UNARY_OP(pmovsxwq, SSE41, SSE_OPF_MMX),
+    [0x25] = UNARY_OP(pmovsxdq, SSE41, SSE_OPF_MMX),
+    [0x28] = BINARY_OP(pmuldq, SSE41, SSE_OPF_MMX),
+    [0x29] = BINARY_OP(pcmpeqq, SSE41, SSE_OPF_MMX),
+    [0x2a] = SPECIAL_OP(SSE41), /* movntqda */
+    [0x2b] = BINARY_OP(packusdw, SSE41, SSE_OPF_MMX),
+    [0x30] = UNARY_OP(pmovzxbw, SSE41, SSE_OPF_MMX),
+    [0x31] = UNARY_OP(pmovzxbd, SSE41, SSE_OPF_MMX),
+    [0x32] = UNARY_OP(pmovzxbq, SSE41, SSE_OPF_MMX),
+    [0x33] = UNARY_OP(pmovzxwd, SSE41, SSE_OPF_MMX),
+    [0x34] = UNARY_OP(pmovzxwq, SSE41, SSE_OPF_MMX),
+    [0x35] = UNARY_OP(pmovzxdq, SSE41, SSE_OPF_MMX),
+    [0x37] = BINARY_OP(pcmpgtq, SSE41, SSE_OPF_MMX),
+    [0x38] = BINARY_OP(pminsb, SSE41, SSE_OPF_MMX),
+    [0x39] = BINARY_OP(pminsd, SSE41, SSE_OPF_MMX),
+    [0x3a] = BINARY_OP(pminuw, SSE41, SSE_OPF_MMX),
+    [0x3b] = BINARY_OP(pminud, SSE41, SSE_OPF_MMX),
+    [0x3c] = BINARY_OP(pmaxsb, SSE41, SSE_OPF_MMX),
+    [0x3d] = BINARY_OP(pmaxsd, SSE41, SSE_OPF_MMX),
+    [0x3e] = BINARY_OP(pmaxuw, SSE41, SSE_OPF_MMX),
+    [0x3f] = BINARY_OP(pmaxud, SSE41, SSE_OPF_MMX),
+    [0x40] = BINARY_OP(pmulld, SSE41, SSE_OPF_MMX),
+    [0x41] = UNARY_OP(phminposuw, SSE41, 0),
+    [0xdb] = UNARY_OP(aesimc, AES, 0),
+    [0xdc] = BINARY_OP(aesenc, AES, 0),
+    [0xdd] = BINARY_OP(aesenclast, AES, 0),
+    [0xde] = BINARY_OP(aesdec, AES, 0),
+    [0xdf] = BINARY_OP(aesdeclast, AES, 0),
 };
 
-static const struct SSEOpHelper_eppi sse_op_table7[256] = {
-    [0x08] = SSE41_OP(roundps),
-    [0x09] = SSE41_OP(roundpd),
-    [0x0a] = SSE41_OP(roundss),
-    [0x0b] = SSE41_OP(roundsd),
-    [0x0c] = SSE41_OP(blendps),
-    [0x0d] = SSE41_OP(blendpd),
-    [0x0e] = SSE41_OP(pblendw),
-    [0x0f] = SSSE3_OP(palignr),
-    [0x14] = SSE41_SPECIAL, /* pextrb */
-    [0x15] = SSE41_SPECIAL, /* pextrw */
-    [0x16] = SSE41_SPECIAL, /* pextrd/pextrq */
-    [0x17] = SSE41_SPECIAL, /* extractps */
-    [0x20] = SSE41_SPECIAL, /* pinsrb */
-    [0x21] = SSE41_SPECIAL, /* insertps */
-    [0x22] = SSE41_SPECIAL, /* pinsrd/pinsrq */
-    [0x40] = SSE41_OP(dpps),
-    [0x41] = SSE41_OP(dppd),
-    [0x42] = SSE41_OP(mpsadbw),
-    [0x44] = PCLMULQDQ_OP(pclmulqdq),
-    [0x60] = SSE42_OP(pcmpestrm),
-    [0x61] = SSE42_OP(pcmpestri),
-    [0x62] = SSE42_OP(pcmpistrm),
-    [0x63] = SSE42_OP(pcmpistri),
-    [0xdf] = AESNI_OP(aeskeygenassist),
+/* prefix [66] 0f 3a */
+static const struct SSEOpHelper_table7 sse_op_table7[256] = {
+    [0x08] = UNARY_OP(roundps, SSE41, 0),
+    [0x09] = UNARY_OP(roundpd, SSE41, 0),
+    [0x0a] = UNARY_OP(roundss, SSE41, SSE_OPF_SCALAR),
+    [0x0b] = UNARY_OP(roundsd, SSE41, SSE_OPF_SCALAR),
+    [0x0c] = BINARY_OP(blendps, SSE41, 0),
+    [0x0d] = BINARY_OP(blendpd, SSE41, 0),
+    [0x0e] = BINARY_OP(pblendw, SSE41, SSE_OPF_MMX),
+    [0x0f] = BINARY_OP_MMX(palignr, SSSE3),
+    [0x14] = SPECIAL_OP(SSE41), /* pextrb */
+    [0x15] = SPECIAL_OP(SSE41), /* pextrw */
+    [0x16] = SPECIAL_OP(SSE41), /* pextrd/pextrq */
+    [0x17] = SPECIAL_OP(SSE41), /* extractps */
+    [0x20] = SPECIAL_OP(SSE41), /* pinsrb */
+    [0x21] = SPECIAL_OP(SSE41), /* insertps */
+    [0x22] = SPECIAL_OP(SSE41), /* pinsrd/pinsrq */
+    [0x40] = BINARY_OP(dpps, SSE41, 0),
+    [0x41] = BINARY_OP(dppd, SSE41, 0),
+    [0x42] = BINARY_OP(mpsadbw, SSE41, SSE_OPF_MMX),
+    [0x44] = BINARY_OP(pclmulqdq, PCLMULQDQ, 0),
+    [0x60] = CMP_OP(pcmpestrm, SSE42),
+    [0x61] = CMP_OP(pcmpestri, SSE42),
+    [0x62] = CMP_OP(pcmpistrm, SSE42),
+    [0x63] = CMP_OP(pcmpistri, SSE42),
+    [0xdf] = UNARY_OP(aeskeygenassist, AES, 0),
 };
 
+#undef OP
+#undef BINARY_OP_MMX
+#undef BINARY_OP
+#undef UNARY_OP_MMX
+#undef UNARY_OP
+#undef BLENDV_OP
+#undef SPECIAL_OP
+
 static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     target_ulong pc_start)
 {
     int b1, op1_offset, op2_offset, is_xmm, val;
     int modrm, mod, rm, reg;
     struct SSEOpHelper_table1 sse_op;
+    struct SSEOpHelper_table6 op6;
+    struct SSEOpHelper_table7 op7;
     SSEFunc_0_epp sse_fn_epp;
-    SSEFunc_0_eppi sse_fn_eppi;
     SSEFunc_0_ppi sse_fn_ppi;
     SSEFunc_0_eppt sse_fn_eppt;
     MemOp ot;
@@ -3828,12 +3848,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             mod = (modrm >> 6) & 3;
 
             assert(b1 < 2);
-            sse_fn_epp = sse_op_table6[b].op[b1];
-            if (!sse_fn_epp) {
+            op6 = sse_op_table6[b];
+            if (op6.ext_mask == 0) {
                 goto unknown_op;
             }
-            if (!(s->cpuid_ext_features & sse_op_table6[b].ext_mask))
+            if (!(s->cpuid_ext_features & op6.ext_mask)) {
                 goto illegal_op;
+            }
 
             if (b1) {
                 op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
@@ -3870,6 +3891,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     }
                 }
             } else {
+                if ((op6.flags & SSE_OPF_MMX) == 0) {
+                    goto unknown_op;
+                }
                 op1_offset = offsetof(CPUX86State,fpregs[reg].mmx);
                 if (mod == 3) {
                     op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
@@ -3879,13 +3903,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     gen_ldq_env_A0(s, op2_offset);
                 }
             }
-            if (sse_fn_epp == SSE_SPECIAL_FN) {
-                goto unknown_op;
+            if (!op6.op[b1]) {
+                goto illegal_op;
             }
 
             tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
             tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
-            sse_fn_epp(cpu_env, s->ptr0, s->ptr1);
+            op6.op[b1](cpu_env, s->ptr0, s->ptr1);
 
             if (b == 
0x17) { set_cc_op(s, CC_OP_EFLAGS); @@ -4256,16 +4280,21 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, mod = (modrm >> 6) & 3; assert(b1 < 2); - sse_fn_eppi = sse_op_table7[b].op[b1]; - if (!sse_fn_eppi) { + op7 = sse_op_table7[b]; + if (op7.ext_mask == 0) { goto unknown_op; } - if (!(s->cpuid_ext_features & sse_op_table7[b].ext_mask)) + if (!(s->cpuid_ext_features & op7.ext_mask)) { goto illegal_op; + } s->rip_offset = 1; - if (sse_fn_eppi == SSE_SPECIAL_FN) { + if (op7.flags & SSE_OPF_SPECIAL) { + /* None of the "special" ops are valid on mmx registers */ + if (b1 == 0) { + goto illegal_op; + } ot = mo_64_32(s->dflag); rm = (modrm & 7) | REX_B(s); if (mod != 3) @@ -4410,6 +4439,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, gen_ldo_env_A0(s, op2_offset); } } else { + if ((op7.flags & SSE_OPF_MMX) == 0) { + goto illegal_op; + } op1_offset = offsetof(CPUX86State,fpregs[reg].mmx); if (mod == 3) { op2_offset = offsetof(CPUX86State,fpregs[rm].mmx); @@ -4432,7 +4464,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); - sse_fn_eppi(cpu_env, s->ptr0, s->ptr1, tcg_const_i32(val)); + op7.op[b1](cpu_env, s->ptr0, s->ptr1, tcg_const_i32(val)); break; case 0x33a: From patchwork Thu Aug 25 22:13:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955297 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3C904ECAAA2 for ; Thu, 25 Aug 2022 22:26:00 +0000 (UTC) Received: from localhost ([::1]:51628 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 03/18] i386: Add CHECK_NO_VEX
Date: Fri, 26 Aug 2022 00:13:56 +0200
Message-Id: <20220825221411.35122-4-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Reject invalid VEX encodings on MMX instructions.

Signed-off-by: Paul Brook
Reviewed-by: Richard Henderson
Message-Id: <20220424220204.2493824-7-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 5335b86c01..66ba690b7d 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3179,6 +3179,12 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = {
 #undef BLENDV_OP
 #undef SPECIAL_OP
 
+/* VEX prefix not allowed */
+#define CHECK_NO_VEX(s) do { \
+    if (s->prefix & PREFIX_VEX) \
+        goto illegal_op; \
+    } while (0)
+
 static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     target_ulong pc_start)
 {
@@ -3262,6 +3268,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         b |= (b1 << 8);
         switch(b) {
         case 0x0e7: /* movntq */
+            CHECK_NO_VEX(s);
             if (mod == 3) {
                 goto illegal_op;
             }
@@ -3297,6 +3304,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             break;
         case 0x6e: /* movd mm, ea */
+            CHECK_NO_VEX(s);
 #ifdef TARGET_X86_64
             if (s->dflag == MO_64) {
                 gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0);
@@ -3330,6 +3338,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             break;
         case 0x6f: /* movq mm, ea */
+            CHECK_NO_VEX(s);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
                 gen_ldq_env_A0(s, offsetof(CPUX86State, fpregs[reg].mmx));
@@ -3464,6 +3473,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         case 0x178:
         case 0x378:
+            CHECK_NO_VEX(s);
             {
                 int bit_index, field_length;
 
@@ -3484,6 +3494,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             break;
         case 0x7e: /* movd ea, mm */
+            CHECK_NO_VEX(s);
 #ifdef TARGET_X86_64
             if (s->dflag == MO_64) {
                 tcg_gen_ld_i64(s->T0, cpu_env,
@@ -3524,6 +3535,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(1)));
             break;
         case 0x7f: /* movq ea, mm */
+            CHECK_NO_VEX(s);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
                 gen_stq_env_A0(s, offsetof(CPUX86State, fpregs[reg].mmx));
@@ -3607,6 +3619,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                                 offsetof(CPUX86State, xmm_t0.ZMM_L(1)));
                 op1_offset = offsetof(CPUX86State,xmm_t0);
             } else {
+                CHECK_NO_VEX(s);
                 tcg_gen_movi_tl(s->T0, val);
                 tcg_gen_st32_tl(s->T0, cpu_env,
                                 offsetof(CPUX86State, mmx_t0.MMX_L(0)));
@@ -3648,6 +3661,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         case 0x02a: /* cvtpi2ps */
         case 0x12a: /* cvtpi2pd */
+            CHECK_NO_VEX(s);
             gen_helper_enter_mmx(cpu_env);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
@@ -3693,6 +3707,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x12c: /* cvttpd2pi */
         case 0x02d: /* cvtps2pi */
         case 0x12d: /* cvtpd2pi */
+            CHECK_NO_VEX(s);
             gen_helper_enter_mmx(cpu_env);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
@@ -3766,6 +3781,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 tcg_gen_st16_tl(s->T0, cpu_env,
                                 offsetof(CPUX86State,xmm_regs[reg].ZMM_W(val)));
             } else {
+                CHECK_NO_VEX(s);
                 val &= 3;
                 tcg_gen_st16_tl(s->T0, cpu_env,
                                 offsetof(CPUX86State,fpregs[reg].mmx.MMX_W(val)));
@@ -3805,6 +3821,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             break;
         case 0x2d6: /* movq2dq */
+            CHECK_NO_VEX(s);
             gen_helper_enter_mmx(cpu_env);
             rm = (modrm & 7);
             gen_op_movq(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(0)),
@@ -3812,6 +3829,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(1)));
             break;
         case 0x3d6: /* movdq2q */
+            CHECK_NO_VEX(s);
             gen_helper_enter_mmx(cpu_env);
             rm = (modrm & 7) | REX_B(s);
             gen_op_movq(s, offsetof(CPUX86State, fpregs[reg & 7].mmx),
@@ -3827,6 +3845,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                                  offsetof(CPUX86State, xmm_regs[rm]));
                 gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr0);
             } else {
+                CHECK_NO_VEX(s);
                 rm = (modrm & 7);
                 tcg_gen_addi_ptr(s->ptr0, cpu_env,
                                  offsetof(CPUX86State, fpregs[rm].mmx));
@@ -3891,6 +3910,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     }
                 }
             } else {
+                CHECK_NO_VEX(s);
                 if ((op6.flags & SSE_OPF_MMX) == 0) {
                     goto unknown_op;
                 }
@@ -3928,6 +3948,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             case 0x3f0: /* crc32 Gd,Eb */
             case 0x3f1: /* crc32 Gd,Ey */
             do_crc32:
+                CHECK_NO_VEX(s);
                 if (!(s->cpuid_ext_features & CPUID_EXT_SSE42)) {
                     goto illegal_op;
                 }
@@ -3950,6 +3971,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
             case 0x1f0: /* crc32 or movbe */
             case 0x1f1:
+                CHECK_NO_VEX(s);
                 /* For these insns, the f3 prefix is supposed to have priority
                    over the 66 prefix, but that's not what we implement above
                    setting b1. */
@@ -3959,6 +3981,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 /* FALLTHRU */
             case 0x0f0: /* movbe Gy,My */
             case 0x0f1: /* movbe My,Gy */
+                CHECK_NO_VEX(s);
                 if (!(s->cpuid_ext_features & CPUID_EXT_MOVBE)) {
                     goto illegal_op;
                 }
@@ -4125,6 +4148,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
             case 0x1f6: /* adcx Gy, Ey */
             case 0x2f6: /* adox Gy, Ey */
+                CHECK_NO_VEX(s);
                 if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_ADX)) {
                     goto illegal_op;
                 } else {
@@ -4439,6 +4463,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     gen_ldo_env_A0(s, op2_offset);
                 }
             } else {
+                CHECK_NO_VEX(s);
                 if ((op7.flags & SSE_OPF_MMX) == 0) {
                     goto illegal_op;
                 }
@@ -4565,6 +4590,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
             }
         } else {
+            CHECK_NO_VEX(s);
             op1_offset = offsetof(CPUX86State,fpregs[reg].mmx);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);

From patchwork Thu Aug 25 22:13:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955294
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 04/18] i386: Move 3DNOW decoder
Date: Fri, 26 Aug 2022 00:13:57 +0200
Message-Id: <20220825221411.35122-5-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Handle 3DNOW instructions early to avoid complicating the AVX logic.

Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-25-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 66ba690b7d..a51a5daff9 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3223,6 +3223,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             is_xmm = 1;
         }
     }
+    if (sse_op.flags & SSE_OPF_3DNOW) {
+        if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
+            goto illegal_op;
+        }
+    }
     /* simple MMX/SSE operation */
     if (s->flags & HF_TS_MASK) {
         gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
@@ -4600,21 +4605,20 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 rm = (modrm & 7);
                 op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
             }
+            if (sse_op.flags & SSE_OPF_3DNOW) {
+                /* 3DNow! data insns */
+                val = x86_ldub_code(env, s);
+                SSEFunc_0_epp op_3dnow = sse_op_table5[val];
+                if (!op_3dnow) {
+                    goto unknown_op;
+                }
+                tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
+                tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
+                op_3dnow(cpu_env, s->ptr0, s->ptr1);
+                return;
+            }
         }
         switch(b) {
-        case 0x0f: /* 3DNow! data insns */
-            val = x86_ldub_code(env, s);
-            sse_fn_epp = sse_op_table5[val];
-            if (!sse_fn_epp) {
-                goto unknown_op;
-            }
-            if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
-                goto illegal_op;
-            }
-            tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
-            tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
-            sse_fn_epp(cpu_env, s->ptr0, s->ptr1);
-            break;
         case 0x70: /* pshufx insn */
         case 0xc6: /* pshufx insn */
             val = x86_ldub_code(env, s);

From patchwork Thu Aug 25 22:13:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955310
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 05/18] i386: Add ZMM_OFFSET macro
Date: Fri, 26 Aug 2022 00:13:58 +0200
Message-Id: <20220825221411.35122-6-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Add a convenience macro to get the address of an xmm_regs element within
CPUX86State.

This was originally going to be the basis of an implementation that broke
operations into 128-bit chunks. I scrapped that idea, so this is now a purely
cosmetic change, but I think a worthwhile one: it reduces the number of
function calls that need to be split over multiple lines.

No functional changes.
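
For illustration, here is a minimal, self-contained sketch of what the macro computes. It assumes heavily simplified, hypothetical `ZMMReg` and `CPUX86State` layouts; they are not QEMU's real definitions from target/i386/cpu.h. `ZMM_OFFSET(reg)` is the byte offset of the `xmm_regs` array plus `reg` times the size of one register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for QEMU's ZMMReg:
 * one 512-bit register viewed as 64-bit or 32-bit lanes. */
typedef union {
    uint64_t q[8];
    uint32_t l[16];
} ZMMReg;

/* Hypothetical, simplified stand-in for CPUX86State; the real
 * structure has many more fields around xmm_regs. */
typedef struct {
    uint64_t regs[16];      /* unrelated leading fields */
    ZMMReg xmm_regs[32];
    ZMMReg xmm_t0;
} CPUX86State;

/* Same shape as the macro added by this patch */
#define ZMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg])

/* The same offset written out by hand: array base plus reg strides */
static size_t zmm_offset_by_hand(int reg)
{
    return offsetof(CPUX86State, xmm_regs) + (size_t)reg * sizeof(ZMMReg);
}
```

Because `xmm_regs` is an ordinary array, consecutive registers sit exactly `sizeof(ZMMReg)` bytes apart. The macro therefore only shortens the call sites, which is consistent with the "No functional changes" note.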

Signed-off-by: Paul Brook
Reviewed-by: Richard Henderson
Message-Id: <20220424220204.2493824-9-paul@nowt.org>
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/translate.c | 60 +++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 33 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a51a5daff9..57e2f8acdb 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2777,6 +2777,8 @@ static inline void gen_op_movq_env_0(DisasContext *s, int d_offset)
     tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset);
 }
 
+#define ZMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg])
+
 typedef void (*SSEFunc_i_ep)(TCGv_i32 val, TCGv_ptr env, TCGv_ptr reg);
 typedef void (*SSEFunc_l_ep)(TCGv_i64 val, TCGv_ptr env, TCGv_ptr reg);
 typedef void (*SSEFunc_0_epi)(TCGv_ptr env, TCGv_ptr reg, TCGv_i32 val);
@@ -3286,13 +3288,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             if (mod == 3)
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+            gen_sto_env_A0(s, ZMM_OFFSET(reg));
             break;
         case 0x3f0: /* lddqu */
             if (mod == 3)
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+            gen_ldo_env_A0(s, ZMM_OFFSET(reg));
             break;
         case 0x22b: /* movntss */
         case 0x32b: /* movntsd */
@@ -3329,15 +3331,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 #ifdef TARGET_X86_64
             if (s->dflag == MO_64) {
                 gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State,xmm_regs[reg]));
+                tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg));
                 gen_helper_movq_mm_T0_xmm(s->ptr0, s->T0);
             } else
 #endif
             {
                 gen_ldst_modrm(env, s, modrm, MO_32, OR_TMP0, 0);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State,xmm_regs[reg]));
+                tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg));
                 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
                 gen_helper_movl_mm_T0_xmm(s->ptr0, s->tmp2_i32);
             }
@@ -3363,11 +3363,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x26f: /* movdqu xmm, ea */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                gen_op_movo(s, offsetof(CPUX86State, xmm_regs[reg]),
-                            offsetof(CPUX86State,xmm_regs[rm]));
+                gen_op_movo(s, ZMM_OFFSET(reg), ZMM_OFFSET(rm));
             }
             break;
         case 0x210: /* movss xmm, ea */
@@ -3421,7 +3420,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x212: /* movsldup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
                 gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(0)),
@@ -3463,7 +3462,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x216: /* movshdup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
                 gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(1)),
@@ -3486,8 +3485,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     goto illegal_op;
                 field_length = x86_ldub_code(env, s) & 0x3F;
                 bit_index = x86_ldub_code(env, s) & 0x3F;
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                    offsetof(CPUX86State,xmm_regs[reg]));
+                tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg));
                 if (b1 == 1)
                     gen_helper_extrq_i(cpu_env, s->ptr0,
                                        tcg_const_i32(bit_index),
@@ -3558,11 +3556,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x27f: /* movdqu ea, xmm */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
+                gen_sto_env_A0(s, ZMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                gen_op_movo(s, offsetof(CPUX86State, xmm_regs[rm]),
-                            offsetof(CPUX86State,xmm_regs[reg]));
+                gen_op_movo(s, ZMM_OFFSET(rm), ZMM_OFFSET(reg));
             }
             break;
         case 0x211: /* movss ea, xmm */
@@ -3641,7 +3638,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             if (is_xmm) {
                 rm = (modrm & 7) | REX_B(s);
-                op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
+                op2_offset = ZMM_OFFSET(rm);
             } else {
                 rm = (modrm & 7);
                 op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
@@ -3652,15 +3649,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         case 0x050: /* movmskps */
             rm = (modrm & 7) | REX_B(s);
-            tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                             offsetof(CPUX86State,xmm_regs[rm]));
+            tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(rm));
             gen_helper_movmskps(s->tmp2_i32, cpu_env, s->ptr0);
             tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
             break;
         case 0x150: /* movmskpd */
             rm = (modrm & 7) | REX_B(s);
-            tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                             offsetof(CPUX86State,xmm_regs[rm]));
+            tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(rm));
             gen_helper_movmskpd(s->tmp2_i32, cpu_env, s->ptr0);
             tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
             break;
@@ -3676,7 +3671,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 rm = (modrm & 7);
                 op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
             }
-            op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
+            op1_offset = ZMM_OFFSET(reg);
             tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
             tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
             switch(b >> 8) {
@@ -3693,7 +3688,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x32a: /* cvtsi2sd */
             ot = mo_64_32(s->dflag);
             gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
-            op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
+            op1_offset = ZMM_OFFSET(reg);
             tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
             if (ot == MO_32) {
                 SSEFunc_0_epi sse_fn_epi = sse_op_table3ai[(b >> 8) & 1];
@@ -3720,7 +3715,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 gen_ldo_env_A0(s, op2_offset);
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
+                op2_offset = ZMM_OFFSET(rm);
             }
             op1_offset = offsetof(CPUX86State,fpregs[reg & 7].mmx);
             tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
@@ -3757,7 +3752,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 op2_offset = offsetof(CPUX86State,xmm_t0);
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
+                op2_offset = ZMM_OFFSET(rm);
             }
             tcg_gen_addi_ptr(s->ptr0, cpu_env, op2_offset);
             if (ot == MO_32) {
@@ -3846,8 +3841,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 goto illegal_op;
             if (b1) {
                 rm = (modrm & 7) | REX_B(s);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State, xmm_regs[rm]));
+                tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(rm));
                 gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr0);
             } else {
                 CHECK_NO_VEX(s);
@@ -3881,9 +3875,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
 
             if (b1) {
-                op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
+                op1_offset = ZMM_OFFSET(reg);
                 if (mod == 3) {
-                    op2_offset = offsetof(CPUX86State,xmm_regs[rm | REX_B(s)]);
+                    op2_offset = ZMM_OFFSET(rm | REX_B(s));
                 } else {
                     op2_offset = offsetof(CPUX86State,xmm_t0);
                     gen_lea_modrm(env, s, modrm);
@@ -4459,9 +4453,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
 
             if (b1) {
-                op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
+                op1_offset = ZMM_OFFSET(reg);
                 if (mod == 3) {
-                    op2_offset = offsetof(CPUX86State,xmm_regs[rm | REX_B(s)]);
+                    op2_offset = ZMM_OFFSET(rm | REX_B(s));
                 } else {
                     op2_offset = offsetof(CPUX86State,xmm_t0);
                     gen_lea_modrm(env, s, modrm);
@@ -4545,7 +4539,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         }
         if (is_xmm) {
-            op1_offset = offsetof(CPUX86State,xmm_regs[reg]);
+            op1_offset = ZMM_OFFSET(reg);
             if (mod != 3) {
                 int sz = 4;
 
@@ -4592,7 +4586,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 }
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
+ op2_offset = ZMM_OFFSET(rm); } } else { CHECK_NO_VEX(s);
From patchwork Thu Aug 25 22:13:59 2022
X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955306
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 06/18] i386: Rewrite vector shift helper Date: Fri, 26 Aug 2022 00:13:59 +0200 Message-Id: <20220825221411.35122-7-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
From: Paul Brook

Rewrite the vector shift helpers in preparation for AVX support (3-operand form and 256-bit vectors). For now, keep the existing two-operand interface. No functional changes to existing helpers.
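The shift semantics these helpers implement can be illustrated with a standalone sketch. It assumes a hypothetical `Vec128` union in place of QEMU's `Reg` type. QEMU's real `B()`/`W()`/`Q()` accessors also correct for host endianness, and the sketch ignores that. The function names `psrlw_sketch` and `psraw_sketch` are illustrative only. Destination, source, and count are separate parameters, which mirrors the 3-operand form the patch prepares for. A logical shift clears every lane once the count exceeds 15. An arithmetic shift clamps the count to 15, which fills each lane with its sign bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit XMM register (not QEMU's Reg). */
typedef union {
    uint8_t  B[16];
    uint16_t W[8];
    uint32_t L[4];
    uint64_t Q[2];
} Vec128;

/* PSRLW-style logical right shift of each 16-bit lane.  The count comes
 * from the low quadword of c; any count above 15 zeroes d.  d may alias s
 * because each lane is read before it is written. */
static void psrlw_sketch(Vec128 *d, const Vec128 *s, const Vec128 *c)
{
    if (c->Q[0] > 15) {
        memset(d, 0, sizeof(*d));
        return;
    }
    int shift = (int)c->Q[0];
    for (int i = 0; i < 8; i++) {
        d->W[i] = (uint16_t)(s->W[i] >> shift);
    }
}

/* PSRAW-style arithmetic right shift: counts above 15 are clamped to 15,
 * so each lane becomes 0x0000 or 0xffff.  Assumes >> on a negative int is
 * an arithmetic shift, as it is with GCC and Clang. */
static void psraw_sketch(Vec128 *d, const Vec128 *s, const Vec128 *c)
{
    int shift = c->Q[0] > 15 ? 15 : (int)c->Q[0];
    for (int i = 0; i < 8; i++) {
        d->W[i] = (uint16_t)((int16_t)s->W[i] >> shift);
    }
}
```

Take a lane holding 0x8000 and a count of 4. `psrlw_sketch` produces 0x0800, and `psraw_sketch` produces 0xf800. With a count of 100, `psrlw_sketch` clears the whole register, while `psraw_sketch` turns 0x8000 into 0xffff.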
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-11-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 221 ++++++++++++++++++------------------------ 1 file changed, 96 insertions(+), 125 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index b12b271fcd..a1d3fbc482 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -56,195 +56,166 @@ #define MOVE(d, r) memcpy(&(d).B(0), &(r).B(0), SIZE) #endif -void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - int shift; +#if SHIFT == 0 +#define FPSRL(x, c) ((x) >> shift) +#define FPSRAW(x, c) ((int16_t)(x) >> shift) +#define FPSRAL(x, c) ((int32_t)(x) >> shift) +#define FPSLL(x, c) ((x) << shift) +#endif - if (s->Q(0) > 15) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif +void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +{ + Reg *s = d; + int shift; + if (c->Q(0) > 15) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } else { - shift = s->B(0); - d->W(0) >>= shift; - d->W(1) >>= shift; - d->W(2) >>= shift; - d->W(3) >>= shift; -#if SHIFT == 1 - d->W(4) >>= shift; - d->W(5) >>= shift; - d->W(6) >>= shift; - d->W(7) >>= shift; -#endif + shift = c->B(0); + for (int i = 0; i < 4 << SHIFT; i++) { + d->W(i) = FPSRL(s->W(i), shift); + } } } -void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; + if (c->Q(0) > 15) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } + } else { + shift = c->B(0); + for (int i = 0; i < 4 << SHIFT; i++) { + d->W(i) = FPSLL(s->W(i), shift); + } + } +} - if (s->Q(0) > 15) { +void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +{ + Reg *s = d; + int shift; + if (c->Q(0) > 15) { shift = 15; } else { - shift = s->B(0); + shift = c->B(0); + } + for (int i = 0; i < 4 << SHIFT; i++) { + d->W(i) = FPSRAW(s->W(i), shift); } - 
d->W(0) = (int16_t)d->W(0) >> shift; - d->W(1) = (int16_t)d->W(1) >> shift; - d->W(2) = (int16_t)d->W(2) >> shift; - d->W(3) = (int16_t)d->W(3) >> shift; -#if SHIFT == 1 - d->W(4) = (int16_t)d->W(4) >> shift; - d->W(5) = (int16_t)d->W(5) >> shift; - d->W(6) = (int16_t)d->W(6) >> shift; - d->W(7) = (int16_t)d->W(7) >> shift; -#endif } -void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; - - if (s->Q(0) > 15) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + if (c->Q(0) > 31) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } else { - shift = s->B(0); - d->W(0) <<= shift; - d->W(1) <<= shift; - d->W(2) <<= shift; - d->W(3) <<= shift; -#if SHIFT == 1 - d->W(4) <<= shift; - d->W(5) <<= shift; - d->W(6) <<= shift; - d->W(7) <<= shift; -#endif + shift = c->B(0); + for (int i = 0; i < 2 << SHIFT; i++) { + d->L(i) = FPSRL(s->L(i), shift); + } } } -void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; - - if (s->Q(0) > 31) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + if (c->Q(0) > 31) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } else { - shift = s->B(0); - d->L(0) >>= shift; - d->L(1) >>= shift; -#if SHIFT == 1 - d->L(2) >>= shift; - d->L(3) >>= shift; -#endif + shift = c->B(0); + for (int i = 0; i < 2 << SHIFT; i++) { + d->L(i) = FPSLL(s->L(i), shift); + } } } -void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; - - if (s->Q(0) > 31) { + if (c->Q(0) > 31) { shift = 31; } else { - shift = s->B(0); + shift = c->B(0); + } + for (int i = 0; i < 2 << SHIFT; i++) { + d->L(i) = FPSRAL(s->L(i), shift); } - d->L(0) = (int32_t)d->L(0) >> shift; - d->L(1) = (int32_t)d->L(1) >> shift; -#if SHIFT == 1 - d->L(2) = 
(int32_t)d->L(2) >> shift; - d->L(3) = (int32_t)d->L(3) >> shift; -#endif } -void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; - - if (s->Q(0) > 31) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + if (c->Q(0) > 63) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } else { - shift = s->B(0); - d->L(0) <<= shift; - d->L(1) <<= shift; -#if SHIFT == 1 - d->L(2) <<= shift; - d->L(3) <<= shift; -#endif + shift = c->B(0); + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = FPSRL(s->Q(i), shift); + } } } -void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift; - - if (s->Q(0) > 63) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + if (c->Q(0) > 63) { + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } else { - shift = s->B(0); - d->Q(0) >>= shift; -#if SHIFT == 1 - d->Q(1) >>= shift; -#endif + shift = c->B(0); + for (int i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = FPSLL(s->Q(i), shift); + } } } -void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - int shift; - - if (s->Q(0) > 63) { - d->Q(0) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif - } else { - shift = s->B(0); - d->Q(0) <<= shift; -#if SHIFT == 1 - d->Q(1) <<= shift; -#endif - } -} - -#if SHIFT == 1 -void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +#if SHIFT >= 1 +void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift, i; - shift = s->L(0); + shift = c->L(0); if (shift > 16) { shift = 16; } for (i = 0; i < 16 - shift; i++) { - d->B(i) = d->B(i + shift); + d->B(i) = s->B(i + shift); } for (i = 16 - shift; i < 16; i++) { d->B(i) = 0; } } -void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { + Reg *s = d; int shift, 
i; - shift = s->L(0); + shift = c->L(0); if (shift > 16) { shift = 16; } for (i = 15; i >= shift; i--) { - d->B(i) = d->B(i - shift); + d->B(i) = s->B(i - shift); } for (i = 0; i < shift; i++) { d->B(i) = 0;
From patchwork Thu Aug 25 22:14:00 2022
X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955292
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 07/18] i386: Rewrite simple integer vector helpers Date: Fri, 26 Aug 2022 00:14:00 +0200 Message-Id: <20220825221411.35122-8-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
From: Paul Brook

Rewrite the "simple" vector integer helpers in preparation for AVX support. While the current code can use the same prototype for unary (a = F(b)) and binary (a = F(b, c)) operations, future changes will cause them to diverge.
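The split between unary and binary helper generators can be sketched outside QEMU. The sketch assumes a hypothetical `Vec128` union and two hypothetical macros, `VEC_HELPER_1` and `VEC_HELPER_2`, which play roughly the roles of the patch's `SSE_HELPER_1` and `SSE_HELPER_2`. QEMU's macros instead use `glue(name, SUFFIX)` with lane counts derived from `SHIFT`. The unary form has no second source, so its prototype takes one input. The binary form takes a separate first source `v`, as a 3-operand encoding would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef union {
    uint8_t  B[16];
    uint16_t W[8];
    uint32_t L[4];
    uint64_t Q[2];
} Vec128;

/* Unary generator: d[i] = F(s[i]) for every lane of type elem. */
#define VEC_HELPER_1(name, elem, F)                                     \
    static void name(Vec128 *d, const Vec128 *s)                        \
    {                                                                   \
        for (size_t i = 0; i < sizeof(d->elem) / sizeof(d->elem[0]); i++) { \
            d->elem[i] = F(s->elem[i]);                                 \
        }                                                               \
    }

/* Binary generator: d[i] = F(v[i], s[i]); v may alias d. */
#define VEC_HELPER_2(name, elem, F)                                     \
    static void name(Vec128 *d, const Vec128 *v, const Vec128 *s)       \
    {                                                                   \
        for (size_t i = 0; i < sizeof(d->elem) / sizeof(d->elem[0]); i++) { \
            d->elem[i] = F(v->elem[i], s->elem[i]);                     \
        }                                                               \
    }

/* Absolute value of a signed byte held in a uint8_t (0x80 stays 0x80). */
#define FABSB(x) ((x) > INT8_MAX ? (uint8_t)-(int8_t)(x) : (x))
/* Wrapping 16-bit addition. */
#define FADDW(a, b) ((uint16_t)((a) + (b)))

VEC_HELPER_1(pabsb_sketch, B, FABSB)
VEC_HELPER_2(paddw_sketch, W, FADDW)
```

Deriving the lane count from `sizeof` keeps this sketch self-contained. The patch instead passes counts such as `8 << SHIFT`, so the same macro can emit MMX, SSE and, later, AVX variants.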
No functional changes to existing helpers Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-12-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 96 +++++++++++++++++-------------------------- 1 file changed, 38 insertions(+), 58 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index a1d3fbc482..0b5a8a9b34 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -223,63 +223,36 @@ void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #endif -#define SSE_HELPER_B(name, F) \ +#define SSE_HELPER_1(name, elem, num, F) \ void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ { \ - d->B(0) = F(d->B(0), s->B(0)); \ - d->B(1) = F(d->B(1), s->B(1)); \ - d->B(2) = F(d->B(2), s->B(2)); \ - d->B(3) = F(d->B(3), s->B(3)); \ - d->B(4) = F(d->B(4), s->B(4)); \ - d->B(5) = F(d->B(5), s->B(5)); \ - d->B(6) = F(d->B(6), s->B(6)); \ - d->B(7) = F(d->B(7), s->B(7)); \ - XMM_ONLY( \ - d->B(8) = F(d->B(8), s->B(8)); \ - d->B(9) = F(d->B(9), s->B(9)); \ - d->B(10) = F(d->B(10), s->B(10)); \ - d->B(11) = F(d->B(11), s->B(11)); \ - d->B(12) = F(d->B(12), s->B(12)); \ - d->B(13) = F(d->B(13), s->B(13)); \ - d->B(14) = F(d->B(14), s->B(14)); \ - d->B(15) = F(d->B(15), s->B(15)); \ - ) \ - } + int n = num; \ + for (int i = 0; i < n; i++) { \ + d->elem(i) = F(s->elem(i)); \ + } \ + } + +#define SSE_HELPER_2(name, elem, num, F) \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ + { \ + Reg *v = d; \ + int n = num; \ + for (int i = 0; i < n; i++) { \ + d->elem(i) = F(v->elem(i), s->elem(i)); \ + } \ + } + +#define SSE_HELPER_B(name, F) \ + SSE_HELPER_2(name, B, 8 << SHIFT, F) #define SSE_HELPER_W(name, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ - { \ - d->W(0) = F(d->W(0), s->W(0)); \ - d->W(1) = F(d->W(1), s->W(1)); \ - d->W(2) = F(d->W(2), s->W(2)); \ - d->W(3) = F(d->W(3), s->W(3)); \ - XMM_ONLY( \ - d->W(4) = F(d->W(4), s->W(4)); \ - d->W(5) 
= F(d->W(5), s->W(5)); \ - d->W(6) = F(d->W(6), s->W(6)); \ - d->W(7) = F(d->W(7), s->W(7)); \ - ) \ - } + SSE_HELPER_2(name, W, 4 << SHIFT, F) #define SSE_HELPER_L(name, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ - { \ - d->L(0) = F(d->L(0), s->L(0)); \ - d->L(1) = F(d->L(1), s->L(1)); \ - XMM_ONLY( \ - d->L(2) = F(d->L(2), s->L(2)); \ - d->L(3) = F(d->L(3), s->L(3)); \ - ) \ - } + SSE_HELPER_2(name, L, 2 << SHIFT, F) #define SSE_HELPER_Q(name, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ - { \ - d->Q(0) = F(d->Q(0), s->Q(0)); \ - XMM_ONLY( \ - d->Q(1) = F(d->Q(1), s->Q(1)); \ - ) \ - } + SSE_HELPER_2(name, Q, 1 << SHIFT, F) #if SHIFT == 0 static inline int satub(int x) @@ -400,12 +373,19 @@ SSE_HELPER_W(helper_pcmpeqw, FCMPEQ) SSE_HELPER_L(helper_pcmpeql, FCMPEQ) SSE_HELPER_W(helper_pmullw, FMULLW) -#if SHIFT == 0 -SSE_HELPER_W(helper_pmulhrw, FMULHRW) -#endif SSE_HELPER_W(helper_pmulhuw, FMULHUW) SSE_HELPER_W(helper_pmulhw, FMULHW) +#if SHIFT == 0 +void glue(helper_pmulhrw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + d->W(0) = FMULHRW(d->W(0), s->W(0)); + d->W(1) = FMULHRW(d->W(1), s->W(1)); + d->W(2) = FMULHRW(d->W(2), s->W(2)); + d->W(3) = FMULHRW(d->W(3), s->W(3)); +} +#endif + SSE_HELPER_B(helper_pavgb, FAVG) SSE_HELPER_W(helper_pavgw, FAVG) @@ -1538,12 +1518,12 @@ void glue(helper_phsubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) MOVE(*d, r); } -#define FABSB(_, x) (x > INT8_MAX ? -(int8_t)x : x) -#define FABSW(_, x) (x > INT16_MAX ? -(int16_t)x : x) -#define FABSL(_, x) (x > INT32_MAX ? -(int32_t)x : x) -SSE_HELPER_B(helper_pabsb, FABSB) -SSE_HELPER_W(helper_pabsw, FABSW) -SSE_HELPER_L(helper_pabsd, FABSL) +#define FABSB(x) (x > INT8_MAX ? -(int8_t)x : x) +#define FABSW(x) (x > INT16_MAX ? -(int16_t)x : x) +#define FABSL(x) (x > INT32_MAX ? 
-(int32_t)x : x) +SSE_HELPER_1(helper_pabsb, B, 8 << SHIFT, FABSB) +SSE_HELPER_1(helper_pabsw, W, 4 << SHIFT, FABSW) +SSE_HELPER_1(helper_pabsd, L, 2 << SHIFT, FABSL) #define FMULHRSW(d, s) (((int16_t) d * (int16_t)s + 0x4000) >> 15) SSE_HELPER_W(helper_pmulhrsw, FMULHRSW)
From patchwork Thu Aug 25 22:14:01 2022
X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955293
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 08/18] i386: Misc integer AVX helper prep Date: Fri, 26 Aug 2022 00:14:01 +0200 Message-Id: <20220825221411.35122-9-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
From: Paul Brook

More preparatory work for AVX support in various integer vector helpers.

No functional changes to existing helpers.
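Several of these helpers switch from unrolled `#if SHIFT == 1` copies to loops over 64-bit lanes. The loop bound is `1 << SHIFT` in `ops_sse.h`, so a wider vector only changes the bound. The sketch below shows that loop shape for PSADBW and PMOVMSKB, assuming a hypothetical `Vec128` union and a hypothetical lane count `NQ`. The PMOVMSKB variant is a simplified per-byte loop rather than the patch's per-lane chunking.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef union {
    uint8_t  B[16];
    uint16_t W[8];
    uint32_t L[4];
    uint64_t Q[2];
} Vec128;

/* Number of 64-bit lanes (2 here); plays the role of 1 << SHIFT. */
enum { NQ = sizeof(Vec128) / sizeof(uint64_t) };

/* PSADBW: for each 64-bit lane, sum |v - s| over its eight bytes and store
 * the total in that lane.  Lane i reads only bytes 8i..8i+7, which lie
 * inside Q[i], so d may alias v or s. */
static void psadbw_sketch(Vec128 *d, const Vec128 *v, const Vec128 *s)
{
    for (int i = 0; i < NQ; i++) {
        unsigned int val = 0;
        for (int j = 0; j < 8; j++) {
            val += (unsigned int)abs(v->B[8 * i + j] - s->B[8 * i + j]);
        }
        d->Q[i] = val;
    }
}

/* PMOVMSKB: collect the top bit of every byte into an integer mask. */
static uint32_t pmovmskb_sketch(const Vec128 *s)
{
    uint32_t val = 0;
    for (int i = 0; i < 8 * NQ; i++) {
        val |= (uint32_t)(s->B[i] >> 7) << i;
    }
    return val;
}
```

Writing `d->Q[i]` only after all eight bytes of lane `i` have been read follows the same pattern as the patch, which reads from `v` while writing to `d`, even though the two may name the same register.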
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-13-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 164 +++++++++++++++++++++--------------------- 1 file changed, 80 insertions(+), 84 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 0b5a8a9b34..4d1fcbd3ae 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -391,19 +391,22 @@ SSE_HELPER_W(helper_pavgw, FAVG) void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->Q(0) = (uint64_t)s->L(0) * (uint64_t)d->L(0); -#if SHIFT == 1 - d->Q(1) = (uint64_t)s->L(2) * (uint64_t)d->L(2); -#endif + Reg *v = d; + int i; + + for (i = 0; i < (1 << SHIFT); i++) { + d->Q(i) = (uint64_t)s->L(i * 2) * (uint64_t)v->L(i * 2); + } } void glue(helper_pmaddwd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { + Reg *v = d; int i; for (i = 0; i < (2 << SHIFT); i++) { - d->L(i) = (int16_t)s->W(2 * i) * (int16_t)d->W(2 * i) + - (int16_t)s->W(2 * i + 1) * (int16_t)d->W(2 * i + 1); + d->L(i) = (int16_t)s->W(2 * i) * (int16_t)v->W(2 * i) + + (int16_t)s->W(2 * i + 1) * (int16_t)v->W(2 * i + 1); } } @@ -417,32 +420,24 @@ static inline int abs1(int a) } } #endif + void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - unsigned int val; + Reg *v = d; + int i; - val = 0; - val += abs1(d->B(0) - s->B(0)); - val += abs1(d->B(1) - s->B(1)); - val += abs1(d->B(2) - s->B(2)); - val += abs1(d->B(3) - s->B(3)); - val += abs1(d->B(4) - s->B(4)); - val += abs1(d->B(5) - s->B(5)); - val += abs1(d->B(6) - s->B(6)); - val += abs1(d->B(7) - s->B(7)); - d->Q(0) = val; -#if SHIFT == 1 - val = 0; - val += abs1(d->B(8) - s->B(8)); - val += abs1(d->B(9) - s->B(9)); - val += abs1(d->B(10) - s->B(10)); - val += abs1(d->B(11) - s->B(11)); - val += abs1(d->B(12) - s->B(12)); - val += abs1(d->B(13) - s->B(13)); - val += abs1(d->B(14) - s->B(14)); - val += abs1(d->B(15) - s->B(15)); - d->Q(1) = val; -#endif + for (i = 0; i < (1 << SHIFT); 
i++) { + unsigned int val = 0; + val += abs1(v->B(8 * i + 0) - s->B(8 * i + 0)); + val += abs1(v->B(8 * i + 1) - s->B(8 * i + 1)); + val += abs1(v->B(8 * i + 2) - s->B(8 * i + 2)); + val += abs1(v->B(8 * i + 3) - s->B(8 * i + 3)); + val += abs1(v->B(8 * i + 4) - s->B(8 * i + 4)); + val += abs1(v->B(8 * i + 5) - s->B(8 * i + 5)); + val += abs1(v->B(8 * i + 6) - s->B(8 * i + 6)); + val += abs1(v->B(8 * i + 7) - s->B(8 * i + 7)); + d->Q(i) = val; + } } void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, @@ -459,20 +454,24 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val) { + int i; + d->L(0) = val; d->L(1) = 0; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + for (i = 1; i < (1 << SHIFT); i++) { + d->Q(i) = 0; + } } #ifdef TARGET_X86_64 void glue(helper_movq_mm_T0, SUFFIX)(Reg *d, uint64_t val) { + int i; + d->Q(0) = val; -#if SHIFT == 1 - d->Q(1) = 0; -#endif + for (i = 1; i < (1 << SHIFT); i++) { + d->Q(i) = 0; + } } #endif @@ -1075,26 +1074,21 @@ uint32_t helper_movmskpd(CPUX86State *env, Reg *s) uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) { uint32_t val; + int i; val = 0; - val |= (s->B(0) >> 7); - val |= (s->B(1) >> 6) & 0x02; - val |= (s->B(2) >> 5) & 0x04; - val |= (s->B(3) >> 4) & 0x08; - val |= (s->B(4) >> 3) & 0x10; - val |= (s->B(5) >> 2) & 0x20; - val |= (s->B(6) >> 1) & 0x40; - val |= (s->B(7)) & 0x80; -#if SHIFT == 1 - val |= (s->B(8) << 1) & 0x0100; - val |= (s->B(9) << 2) & 0x0200; - val |= (s->B(10) << 3) & 0x0400; - val |= (s->B(11) << 4) & 0x0800; - val |= (s->B(12) << 5) & 0x1000; - val |= (s->B(13) << 6) & 0x2000; - val |= (s->B(14) << 7) & 0x4000; - val |= (s->B(15) << 8) & 0x8000; -#endif + for (i = 0; i < (1 << SHIFT); i++) { + uint8_t byte = 0; + byte |= (s->B(8 * i + 0) >> 7); + byte |= (s->B(8 * i + 1) >> 6) & 0x02; + byte |= (s->B(8 * i + 2) >> 5) & 0x04; + byte |= (s->B(8 * i + 3) >> 4) & 0x08; + byte |= (s->B(8 * i + 4) >> 3) & 
0x10; + byte |= (s->B(8 * i + 5) >> 2) & 0x20; + byte |= (s->B(8 * i + 6) >> 1) & 0x40; + byte |= (s->B(8 * i + 7)) & 0x80; + val |= byte << (8 * i); + } return val; } @@ -1639,46 +1633,48 @@ SSE_HELPER_V(helper_blendvpd, Q, 2, FBLENDVPD) void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - uint64_t zf = (s->Q(0) & d->Q(0)) | (s->Q(1) & d->Q(1)); - uint64_t cf = (s->Q(0) & ~d->Q(0)) | (s->Q(1) & ~d->Q(1)); + uint64_t zf = 0, cf = 0; + int i; + for (i = 0; i < 1 << SHIFT; i++) { + zf |= (s->Q(i) & d->Q(i)); + cf |= (s->Q(i) & ~d->Q(i)); + } CC_SRC = (zf ? 0 : CC_Z) | (cf ? 0 : CC_C); } -#define SSE_HELPER_F(name, elem, num, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ - { \ - if (num > 2) { \ - if (num > 4) { \ - d->elem(7) = F(7); \ - d->elem(6) = F(6); \ - d->elem(5) = F(5); \ - d->elem(4) = F(4); \ - } \ - d->elem(3) = F(3); \ - d->elem(2) = F(2); \ - } \ - d->elem(1) = F(1); \ - d->elem(0) = F(0); \ +#define SSE_HELPER_F(name, elem, num, F) \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ + { \ + int n = num; \ + for (int i = n; --i >= 0; ) { \ + d->elem(i) = F(i); \ + } \ } -SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B) -SSE_HELPER_F(helper_pmovsxbd, L, 4, (int8_t) s->B) -SSE_HELPER_F(helper_pmovsxbq, Q, 2, (int8_t) s->B) -SSE_HELPER_F(helper_pmovsxwd, L, 4, (int16_t) s->W) -SSE_HELPER_F(helper_pmovsxwq, Q, 2, (int16_t) s->W) -SSE_HELPER_F(helper_pmovsxdq, Q, 2, (int32_t) s->L) -SSE_HELPER_F(helper_pmovzxbw, W, 8, s->B) -SSE_HELPER_F(helper_pmovzxbd, L, 4, s->B) -SSE_HELPER_F(helper_pmovzxbq, Q, 2, s->B) -SSE_HELPER_F(helper_pmovzxwd, L, 4, s->W) -SSE_HELPER_F(helper_pmovzxwq, Q, 2, s->W) -SSE_HELPER_F(helper_pmovzxdq, Q, 2, s->L) +#if SHIFT > 0 +SSE_HELPER_F(helper_pmovsxbw, W, 4 << SHIFT, (int8_t) s->B) +SSE_HELPER_F(helper_pmovsxbd, L, 2 << SHIFT, (int8_t) s->B) +SSE_HELPER_F(helper_pmovsxbq, Q, 1 << SHIFT, (int8_t) s->B) +SSE_HELPER_F(helper_pmovsxwd, L, 2 << SHIFT, (int16_t) s->W) 
+SSE_HELPER_F(helper_pmovsxwq, Q, 1 << SHIFT, (int16_t) s->W) +SSE_HELPER_F(helper_pmovsxdq, Q, 1 << SHIFT, (int32_t) s->L) +SSE_HELPER_F(helper_pmovzxbw, W, 4 << SHIFT, s->B) +SSE_HELPER_F(helper_pmovzxbd, L, 2 << SHIFT, s->B) +SSE_HELPER_F(helper_pmovzxbq, Q, 1 << SHIFT, s->B) +SSE_HELPER_F(helper_pmovzxwd, L, 2 << SHIFT, s->W) +SSE_HELPER_F(helper_pmovzxwq, Q, 1 << SHIFT, s->W) +SSE_HELPER_F(helper_pmovzxdq, Q, 1 << SHIFT, s->L) +#endif void glue(helper_pmuldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->Q(0) = (int64_t)(int32_t) d->L(0) * (int32_t) s->L(0); - d->Q(1) = (int64_t)(int32_t) d->L(2) * (int32_t) s->L(2); + Reg *v = d; + int i; + + for (i = 0; i < 1 << SHIFT; i++) { + d->Q(i) = (int64_t)(int32_t) v->L(2 * i) * (int32_t) s->L(2 * i); + } } #define FCMPEQQ(d, s) (d == s ? -1 : 0)

From patchwork Thu Aug 25 22:14:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955314
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 09/18] i386: Destructive vector helpers for AVX
Date: Fri, 26 Aug 2022 00:14:02 +0200
Message-Id: <20220825221411.35122-10-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

These helpers need to take special care to avoid overwriting source values before the whole result has been calculated.
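
A minimal standalone sketch of that hazard, not QEMU code: `VecReg`, `phaddw_naive` and `phaddw_safe` are hypothetical names, and the horizontal add follows the non-AVX `phaddw` layout (pairwise sums of the first operand, then pairwise sums of the second). When an instruction names the same register twice, writing results straight into `d` lets later lanes read inputs that were already overwritten; collecting results in temporaries first, and storing back only the lanes the operation produces, avoids both that and clobbering the rest of a wider register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Reg: up to eight 16-bit lanes. */
#define MAX_WORDS 8
typedef struct {
    int16_t w[MAX_WORDS];
} VecReg;

/*
 * Naive horizontal add: stores straight into d. If s aliases d, the
 * second loop reads lanes that the first loop (or its own earlier
 * iterations) already overwrote.
 */
static void phaddw_naive(VecReg *d, const VecReg *s, int words)
{
    int half = words / 2;

    for (int i = 0; i < half; i++) {
        d->w[i] = (int16_t)(d->w[2 * i] + d->w[2 * i + 1]);
    }
    for (int i = 0; i < half; i++) {
        d->w[half + i] = (int16_t)(s->w[2 * i] + s->w[2 * i + 1]);
    }
}

/*
 * Safe variant in the style of the patch: read every input into
 * temporaries first, then write back only the lanes the op produces.
 */
static void phaddw_safe(VecReg *d, const VecReg *s, int words)
{
    const VecReg *v = d;   /* first source operand, aliased with d */
    int16_t r[MAX_WORDS];
    int half = words / 2;

    assert(words > 0 && words % 2 == 0 && words <= MAX_WORDS);
    for (int i = 0; i < half; i++) {
        r[i] = (int16_t)(v->w[2 * i] + v->w[2 * i + 1]);
        r[half + i] = (int16_t)(s->w[2 * i] + s->w[2 * i + 1]);
    }
    for (int i = 0; i < words; i++) {
        d->w[i] = r[i];   /* lanes at index >= words are left alone */
    }
}
```

With `d` and `s` both pointing at `{1, 2, 3, 4}` and `words == 4`, the naive version produces `{3, 7, 10, 14}` instead of `{3, 7, 3, 7}`. The temporary-based version also leaves lanes past `words` untouched, which is the property a whole-register `MOVE(*d, r)` copy loses once `Reg` is wider than the operation.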
Currently they use a dummy Reg typed variable to store the result then assign the whole register. This will cause 128 bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments. Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-14-paul@nowt.org> Signed-off-by: Paolo Bonzini --- target/i386/ops_sse.h | 565 +++++++++++++++++++++--------------------- 1 file changed, 284 insertions(+), 281 deletions(-) - Reg r; +#if SHIFT == 0 + uint8_t r[8]; - for (i = 0; i < (8 << SHIFT); i++) { - r.B(i) = (s->B(i) & 0x80) ? 0 : (d->B(s->B(i) & ((8 << SHIFT) - 1))); + for (i = 0; i < 8; i++) { + r[i] = (s->B(i) & 0x80) ? 0 : (v->B(s->B(i) & 7)); } + for (i = 0; i < 8; i++) { + d->B(i) = r[i]; + } +#else + uint8_t r[8 << SHIFT]; - MOVE(*d, r); -} - -void glue(helper_phaddw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - - Reg r; - - r.W(0) = (int16_t)d->W(0) + (int16_t)d->W(1); - r.W(1) = (int16_t)d->W(2) + (int16_t)d->W(3); - XMM_ONLY(r.W(2) = (int16_t)d->W(4) + (int16_t)d->W(5)); - XMM_ONLY(r.W(3) = (int16_t)d->W(6) + (int16_t)d->W(7)); - r.W((2 << SHIFT) + 0) = (int16_t)s->W(0) + (int16_t)s->W(1); - r.W((2 << SHIFT) + 1) = (int16_t)s->W(2) + (int16_t)s->W(3); - XMM_ONLY(r.W(6) = (int16_t)s->W(4) + (int16_t)s->W(5)); - XMM_ONLY(r.W(7) = (int16_t)s->W(6) + (int16_t)s->W(7)); - - MOVE(*d, r); -} - -void glue(helper_phaddd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; - - r.L(0) = (int32_t)d->L(0) + (int32_t)d->L(1); - XMM_ONLY(r.L(1) = (int32_t)d->L(2) + (int32_t)d->L(3)); - r.L((1 << SHIFT) + 0) = (int32_t)s->L(0) + (int32_t)s->L(1); - XMM_ONLY(r.L(3) = (int32_t)s->L(2) + (int32_t)s->L(3)); - - MOVE(*d, r); -} - -void glue(helper_phaddsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; - - r.W(0) = satsw((int16_t)d->W(0) + (int16_t)d->W(1)); - r.W(1) = satsw((int16_t)d->W(2) + (int16_t)d->W(3)); - XMM_ONLY(r.W(2) = satsw((int16_t)d->W(4) + (int16_t)d->W(5))); - XMM_ONLY(r.W(3) = 
satsw((int16_t)d->W(6) + (int16_t)d->W(7))); - r.W((2 << SHIFT) + 0) = satsw((int16_t)s->W(0) + (int16_t)s->W(1)); - r.W((2 << SHIFT) + 1) = satsw((int16_t)s->W(2) + (int16_t)s->W(3)); - XMM_ONLY(r.W(6) = satsw((int16_t)s->W(4) + (int16_t)s->W(5))); - XMM_ONLY(r.W(7) = satsw((int16_t)s->W(6) + (int16_t)s->W(7))); - - MOVE(*d, r); -} - -void glue(helper_pmaddubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - d->W(0) = satsw((int8_t)s->B(0) * (uint8_t)d->B(0) + - (int8_t)s->B(1) * (uint8_t)d->B(1)); - d->W(1) = satsw((int8_t)s->B(2) * (uint8_t)d->B(2) + - (int8_t)s->B(3) * (uint8_t)d->B(3)); - d->W(2) = satsw((int8_t)s->B(4) * (uint8_t)d->B(4) + - (int8_t)s->B(5) * (uint8_t)d->B(5)); - d->W(3) = satsw((int8_t)s->B(6) * (uint8_t)d->B(6) + - (int8_t)s->B(7) * (uint8_t)d->B(7)); -#if SHIFT == 1 - d->W(4) = satsw((int8_t)s->B(8) * (uint8_t)d->B(8) + - (int8_t)s->B(9) * (uint8_t)d->B(9)); - d->W(5) = satsw((int8_t)s->B(10) * (uint8_t)d->B(10) + - (int8_t)s->B(11) * (uint8_t)d->B(11)); - d->W(6) = satsw((int8_t)s->B(12) * (uint8_t)d->B(12) + - (int8_t)s->B(13) * (uint8_t)d->B(13)); - d->W(7) = satsw((int8_t)s->B(14) * (uint8_t)d->B(14) + - (int8_t)s->B(15) * (uint8_t)d->B(15)); + for (i = 0; i < 8 << SHIFT; i++) { + int j = i & ~0xf; + r[i] = (s->B(i) & 0x80) ? 
0 : v->B(j | (s->B(i) & 0xf)); + } + for (i = 0; i < 8 << SHIFT; i++) { + d->B(i) = r[i]; + } #endif } -void glue(helper_phsubw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; +#if SHIFT == 0 - r.W(0) = (int16_t)d->W(0) - (int16_t)d->W(1); - r.W(1) = (int16_t)d->W(2) - (int16_t)d->W(3); - XMM_ONLY(r.W(2) = (int16_t)d->W(4) - (int16_t)d->W(5)); - XMM_ONLY(r.W(3) = (int16_t)d->W(6) - (int16_t)d->W(7)); - r.W((2 << SHIFT) + 0) = (int16_t)s->W(0) - (int16_t)s->W(1); - r.W((2 << SHIFT) + 1) = (int16_t)s->W(2) - (int16_t)s->W(3); - XMM_ONLY(r.W(6) = (int16_t)s->W(4) - (int16_t)s->W(5)); - XMM_ONLY(r.W(7) = (int16_t)s->W(6) - (int16_t)s->W(7)); - MOVE(*d, r); +#define SSE_HELPER_HW(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + uint16_t r[4]; \ + r[0] = F(v->W(0), v->W(1)); \ + r[1] = F(v->W(2), v->W(3)); \ + r[2] = F(s->W(0), s->W(1)); \ + r[3] = F(s->W(2), s->W(3)); \ + d->W(0) = r[0]; \ + d->W(1) = r[1]; \ + d->W(2) = r[2]; \ + d->W(3) = r[3]; \ } -void glue(helper_phsubd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; - - r.L(0) = (int32_t)d->L(0) - (int32_t)d->L(1); - XMM_ONLY(r.L(1) = (int32_t)d->L(2) - (int32_t)d->L(3)); - r.L((1 << SHIFT) + 0) = (int32_t)s->L(0) - (int32_t)s->L(1); - XMM_ONLY(r.L(3) = (int32_t)s->L(2) - (int32_t)s->L(3)); - MOVE(*d, r); +#define SSE_HELPER_HL(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + uint32_t r0, r1; \ + r0 = F(v->L(0), v->L(1)); \ + r1 = F(s->L(0), s->L(1)); \ + d->L(0) = r0; \ + d->L(1) = r1; \ } -void glue(helper_phsubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; +#else - r.W(0) = satsw((int16_t)d->W(0) - (int16_t)d->W(1)); - r.W(1) = satsw((int16_t)d->W(2) - (int16_t)d->W(3)); - XMM_ONLY(r.W(2) = satsw((int16_t)d->W(4) - (int16_t)d->W(5))); - XMM_ONLY(r.W(3) = satsw((int16_t)d->W(6) - (int16_t)d->W(7))); - r.W((2 << SHIFT) + 0) = satsw((int16_t)s->W(0) - (int16_t)s->W(1)); -
r.W((2 << SHIFT) + 1) = satsw((int16_t)s->W(2) - (int16_t)s->W(3)); - XMM_ONLY(r.W(6) = satsw((int16_t)s->W(4) - (int16_t)s->W(5))); - XMM_ONLY(r.W(7) = satsw((int16_t)s->W(6) - (int16_t)s->W(7))); - MOVE(*d, r); +#define SSE_HELPER_HW(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + int16_t r[4 << SHIFT]; \ + int i, j; \ + for (i = j = 0; j < 8; i++, j += 2) { \ + r[i] = F(v->W(j), v->W(j + 1)); \ + } \ + for (j = 0; j < 8; i++, j += 2) { \ + r[i] = F(s->W(j), s->W(j + 1)); \ + } \ + for (i = 0; i < 4 << SHIFT; i++) { \ + d->W(i) = r[i]; \ + } \ +} + +#define SSE_HELPER_HL(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + int32_t r[2 << SHIFT]; \ + int i, j; \ + for (i = j = 0; j < 4; i++, j += 2) { \ + r[i] = F(v->L(j), v->L(j + 1)); \ + } \ + for (j = 0; j < 4; i++, j += 2) { \ + r[i] = F(s->L(j), s->L(j + 1)); \ + } \ + for (i = 0; i < 2 << SHIFT; i++) { \ + d->L(i) = r[i]; \ + } \ +} +#endif + +SSE_HELPER_HW(phaddw, FADD) +SSE_HELPER_HW(phsubw, FSUB) +SSE_HELPER_HW(phaddsw, FADDSW) +SSE_HELPER_HW(phsubsw, FSUBSW) +SSE_HELPER_HL(phaddd, FADD) +SSE_HELPER_HL(phsubd, FSUB) + +#undef SSE_HELPER_HW +#undef SSE_HELPER_HL + +void glue(helper_pmaddubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + Reg *v = d; + int i; + for (i = 0; i < 4 << SHIFT; i++) { + d->W(i) = satsw((int8_t)s->B(i * 2) * (uint8_t)v->B(i * 2) + + (int8_t)s->B(i * 2 + 1) * (uint8_t)v->B(i * 2 + 1)); + } } #define FABSB(x) (x > INT8_MAX ? -(int8_t)x : x) @@ -1532,32 +1521,38 @@ SSE_HELPER_L(helper_psignd, FSIGNL) void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, int32_t shift) { - Reg r; + Reg *v = d; + int i; /* XXX could be checked during translation */ - if (shift >= (16 << SHIFT)) { - r.Q(0) = 0; - XMM_ONLY(r.Q(1) = 0); + if (shift >= (SHIFT ? 
32 : 16)) { + for (i = 0; i < (1 << SHIFT); i++) { + d->Q(i) = 0; + } } else { shift <<= 3; #define SHR(v, i) (i < 64 && i > -64 ? i > 0 ? v >> (i) : (v << -(i)) : 0) #if SHIFT == 0 - r.Q(0) = SHR(s->Q(0), shift - 0) | - SHR(d->Q(0), shift - 64); + d->Q(0) = SHR(s->Q(0), shift - 0) | + SHR(v->Q(0), shift - 64); #else - r.Q(0) = SHR(s->Q(0), shift - 0) | - SHR(s->Q(1), shift - 64) | - SHR(d->Q(0), shift - 128) | - SHR(d->Q(1), shift - 192); - r.Q(1) = SHR(s->Q(0), shift + 64) | - SHR(s->Q(1), shift - 0) | - SHR(d->Q(0), shift - 64) | - SHR(d->Q(1), shift - 128); + for (i = 0; i < (1 << SHIFT); i += 2) { + uint64_t r0, r1; + + r0 = SHR(s->Q(i), shift - 0) | + SHR(s->Q(i + 1), shift - 64) | + SHR(v->Q(i), shift - 128) | + SHR(v->Q(i + 1), shift - 192); + r1 = SHR(s->Q(i), shift + 64) | + SHR(s->Q(i + 1), shift - 0) | + SHR(v->Q(i), shift - 64) | + SHR(v->Q(i + 1), shift - 128); + d->Q(i) = r0; + d->Q(i + 1) = r1; + } #endif #undef SHR } - - MOVE(*d, r); } #define XMM0 (env->xmm_regs[0]) @@ -1682,17 +1677,23 @@ SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ) void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - Reg r; + Reg *v = d; + uint16_t r[8]; + int i, j, k; - r.W(0) = satuw((int32_t) d->L(0)); - r.W(1) = satuw((int32_t) d->L(1)); - r.W(2) = satuw((int32_t) d->L(2)); - r.W(3) = satuw((int32_t) d->L(3)); - r.W(4) = satuw((int32_t) s->L(0)); - r.W(5) = satuw((int32_t) s->L(1)); - r.W(6) = satuw((int32_t) s->L(2)); - r.W(7) = satuw((int32_t) s->L(3)); - MOVE(*d, r); + for (i = 0, j = 0; i <= 2 << SHIFT; i += 8, j += 4) { + r[0] = satuw(v->L(j)); + r[1] = satuw(v->L(j + 1)); + r[2] = satuw(v->L(j + 2)); + r[3] = satuw(v->L(j + 3)); + r[4] = satuw(s->L(j)); + r[5] = satuw(s->L(j + 1)); + r[6] = satuw(s->L(j + 2)); + r[7] = satuw(s->L(j + 3)); + for (k = 0; k < 8; k++) { + d->W(i + k) = r[k]; + } + } } #define FMINSB(d, s) MIN((int8_t)d, (int8_t)s) @@ -1948,20 +1949,22 @@ void glue(helper_dppd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) void 
glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t offset) { + Reg *v = d; int s0 = (offset & 3) << 2; int d0 = (offset & 4) << 0; int i; - Reg r; + uint16_t r[8]; for (i = 0; i < 8; i++, d0++) { - r.W(i) = 0; - r.W(i) += abs1(d->B(d0 + 0) - s->B(s0 + 0)); - r.W(i) += abs1(d->B(d0 + 1) - s->B(s0 + 1)); - r.W(i) += abs1(d->B(d0 + 2) - s->B(s0 + 2)); - r.W(i) += abs1(d->B(d0 + 3) - s->B(s0 + 3)); + r[i] = 0; + r[i] += abs1(v->B(d0 + 0) - s->B(s0 + 0)); + r[i] += abs1(v->B(d0 + 1) - s->B(s0 + 1)); + r[i] += abs1(v->B(d0 + 2) - s->B(s0 + 2)); + r[i] += abs1(v->B(d0 + 3) - s->B(s0 + 3)); + } + for (i = 0; i < 8; i++) { + d->W(i) = r[i]; } - - MOVE(*d, r); } /* SSE4.2 op helpers */ diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 4d1fcbd3ae..5265005f1e 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -475,71 +475,68 @@ void glue(helper_movq_mm_T0, SUFFIX)(Reg *d, uint64_t val) } #endif +#define SHUFFLE4(F, a, b, offset) do { \ + r0 = a->F((order & 3) + offset); \ + r1 = a->F(((order >> 2) & 3) + offset); \ + r2 = b->F(((order >> 4) & 3) + offset); \ + r3 = b->F(((order >> 6) & 3) + offset); \ + d->F(offset) = r0; \ + d->F(offset + 1) = r1; \ + d->F(offset + 2) = r2; \ + d->F(offset + 3) = r3; \ + } while (0) + #if SHIFT == 0 void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order) { - Reg r; + uint16_t r0, r1, r2, r3; - r.W(0) = s->W(order & 3); - r.W(1) = s->W((order >> 2) & 3); - r.W(2) = s->W((order >> 4) & 3); - r.W(3) = s->W((order >> 6) & 3); - MOVE(*d, r); + SHUFFLE4(W, s, s, 0); } #else void helper_shufps(Reg *d, Reg *s, int order) { - Reg r; + Reg *v = d; + uint32_t r0, r1, r2, r3; - r.L(0) = d->L(order & 3); - r.L(1) = d->L((order >> 2) & 3); - r.L(2) = s->L((order >> 4) & 3); - r.L(3) = s->L((order >> 6) & 3); - MOVE(*d, r); + SHUFFLE4(L, v, s, 0); } void helper_shufpd(Reg *d, Reg *s, int order) { - Reg r; + Reg *v = d; + uint64_t r0, r1; - r.Q(0) = d->Q(order & 1); - r.Q(1) = s->Q((order >> 1) & 1); - 
MOVE(*d, r); + r0 = v->Q(order & 1); + r1 = s->Q((order >> 1) & 1); + d->Q(0) = r0; + d->Q(1) = r1; } void glue(helper_pshufd, SUFFIX)(Reg *d, Reg *s, int order) { - Reg r; + uint32_t r0, r1, r2, r3; - r.L(0) = s->L(order & 3); - r.L(1) = s->L((order >> 2) & 3); - r.L(2) = s->L((order >> 4) & 3); - r.L(3) = s->L((order >> 6) & 3); - MOVE(*d, r); + SHUFFLE4(L, s, s, 0); +#if SHIFT == 2 + SHUFFLE4(L, s, s, 4); +#endif } void glue(helper_pshuflw, SUFFIX)(Reg *d, Reg *s, int order) { - Reg r; + uint16_t r0, r1, r2, r3; - r.W(0) = s->W(order & 3); - r.W(1) = s->W((order >> 2) & 3); - r.W(2) = s->W((order >> 4) & 3); - r.W(3) = s->W((order >> 6) & 3); - r.Q(1) = s->Q(1); - MOVE(*d, r); + SHUFFLE4(W, s, s, 0); + d->Q(1) = s->Q(1); } void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) { - Reg r; + uint16_t r0, r1, r2, r3; - r.Q(0) = s->Q(0); - r.W(4) = s->W(4 + (order & 3)); - r.W(5) = s->W(4 + ((order >> 2) & 3)); - r.W(6) = s->W(4 + ((order >> 4) & 3)); - r.W(7) = s->W(4 + ((order >> 6) & 3)); - MOVE(*d, r); + d->Q(0) = s->Q(0); + SHUFFLE4(W, s, s, 4); } #endif @@ -1092,156 +1089,157 @@ uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) return val; } -void glue(helper_packsswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; +#if SHIFT == 0 +#define PACK_WIDTH 4 +#else +#define PACK_WIDTH 8 +#endif - r.B(0) = satsb((int16_t)d->W(0)); - r.B(1) = satsb((int16_t)d->W(1)); - r.B(2) = satsb((int16_t)d->W(2)); - r.B(3) = satsb((int16_t)d->W(3)); -#if SHIFT == 1 - r.B(4) = satsb((int16_t)d->W(4)); - r.B(5) = satsb((int16_t)d->W(5)); - r.B(6) = satsb((int16_t)d->W(6)); - r.B(7) = satsb((int16_t)d->W(7)); -#endif - r.B((4 << SHIFT) + 0) = satsb((int16_t)s->W(0)); - r.B((4 << SHIFT) + 1) = satsb((int16_t)s->W(1)); - r.B((4 << SHIFT) + 2) = satsb((int16_t)s->W(2)); - r.B((4 << SHIFT) + 3) = satsb((int16_t)s->W(3)); -#if SHIFT == 1 - r.B(12) = satsb((int16_t)s->W(4)); - r.B(13) = satsb((int16_t)s->W(5)); - r.B(14) = satsb((int16_t)s->W(6)); - r.B(15) = 
satsb((int16_t)s->W(7)); -#endif - MOVE(*d, r); +#define PACK4(F, to, reg, from) do { \ + r[to + 0] = F((int16_t)reg->W(from + 0)); \ + r[to + 1] = F((int16_t)reg->W(from + 1)); \ + r[to + 2] = F((int16_t)reg->W(from + 2)); \ + r[to + 3] = F((int16_t)reg->W(from + 3)); \ + } while (0) + +#define PACK_HELPER_B(name, F) \ +void glue(helper_pack ## name, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + uint8_t r[PACK_WIDTH * 2]; \ + int i; \ + PACK4(F, 0, v, 0); \ + PACK4(F, PACK_WIDTH, s, 0); \ + XMM_ONLY( \ + PACK4(F, 4, v, 4); \ + PACK4(F, 12, s, 4); \ + ) \ + for (i = 0; i < PACK_WIDTH * 2; i++) { \ + d->B(i) = r[i]; \ + } \ } -void glue(helper_packuswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ - Reg r; - - r.B(0) = satub((int16_t)d->W(0)); - r.B(1) = satub((int16_t)d->W(1)); - r.B(2) = satub((int16_t)d->W(2)); - r.B(3) = satub((int16_t)d->W(3)); -#if SHIFT == 1 - r.B(4) = satub((int16_t)d->W(4)); - r.B(5) = satub((int16_t)d->W(5)); - r.B(6) = satub((int16_t)d->W(6)); - r.B(7) = satub((int16_t)d->W(7)); -#endif - r.B((4 << SHIFT) + 0) = satub((int16_t)s->W(0)); - r.B((4 << SHIFT) + 1) = satub((int16_t)s->W(1)); - r.B((4 << SHIFT) + 2) = satub((int16_t)s->W(2)); - r.B((4 << SHIFT) + 3) = satub((int16_t)s->W(3)); -#if SHIFT == 1 - r.B(12) = satub((int16_t)s->W(4)); - r.B(13) = satub((int16_t)s->W(5)); - r.B(14) = satub((int16_t)s->W(6)); - r.B(15) = satub((int16_t)s->W(7)); -#endif - MOVE(*d, r); -} +PACK_HELPER_B(sswb, satsb) +PACK_HELPER_B(uswb, satub) void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - Reg r; + Reg *v = d; + uint16_t r[PACK_WIDTH]; + int i, j, k; - r.W(0) = satsw(d->L(0)); - r.W(1) = satsw(d->L(1)); -#if SHIFT == 1 - r.W(2) = satsw(d->L(2)); - r.W(3) = satsw(d->L(3)); + for (i = 0, j = 0; i <= 2 << SHIFT; i += 8, j += 4) { + r[0] = satsw(v->L(j)); + r[1] = satsw(v->L(j + 1)); + r[PACK_WIDTH / 2 + 0] = satsw(s->L(j)); + r[PACK_WIDTH / 2 + 1] = satsw(s->L(j + 1)); +#if SHIFT >= 1 + r[2] = satsw(v->L(j 
+ 2)); + r[3] = satsw(v->L(j + 3)); + r[6] = satsw(s->L(j + 2)); + r[7] = satsw(s->L(j + 3)); #endif - r.W((2 << SHIFT) + 0) = satsw(s->L(0)); - r.W((2 << SHIFT) + 1) = satsw(s->L(1)); -#if SHIFT == 1 - r.W(6) = satsw(s->L(2)); - r.W(7) = satsw(s->L(3)); -#endif - MOVE(*d, r); + for (k = 0; k < PACK_WIDTH; k++) { + d->W(i + k) = r[k]; + } + } } #define UNPCK_OP(base_name, base) \ \ void glue(helper_punpck ## base_name ## bw, SUFFIX)(CPUX86State *env,\ - Reg *d, Reg *s) \ + Reg *d, Reg *s) \ { \ - Reg r; \ + Reg *v = d; \ + uint8_t r[PACK_WIDTH * 2]; \ + int i; \ \ - r.B(0) = d->B((base << (SHIFT + 2)) + 0); \ - r.B(1) = s->B((base << (SHIFT + 2)) + 0); \ - r.B(2) = d->B((base << (SHIFT + 2)) + 1); \ - r.B(3) = s->B((base << (SHIFT + 2)) + 1); \ - r.B(4) = d->B((base << (SHIFT + 2)) + 2); \ - r.B(5) = s->B((base << (SHIFT + 2)) + 2); \ - r.B(6) = d->B((base << (SHIFT + 2)) + 3); \ - r.B(7) = s->B((base << (SHIFT + 2)) + 3); \ + r[0] = v->B((base * PACK_WIDTH) + 0); \ + r[1] = s->B((base * PACK_WIDTH) + 0); \ + r[2] = v->B((base * PACK_WIDTH) + 1); \ + r[3] = s->B((base * PACK_WIDTH) + 1); \ + r[4] = v->B((base * PACK_WIDTH) + 2); \ + r[5] = s->B((base * PACK_WIDTH) + 2); \ + r[6] = v->B((base * PACK_WIDTH) + 3); \ + r[7] = s->B((base * PACK_WIDTH) + 3); \ XMM_ONLY( \ - r.B(8) = d->B((base << (SHIFT + 2)) + 4); \ - r.B(9) = s->B((base << (SHIFT + 2)) + 4); \ - r.B(10) = d->B((base << (SHIFT + 2)) + 5); \ - r.B(11) = s->B((base << (SHIFT + 2)) + 5); \ - r.B(12) = d->B((base << (SHIFT + 2)) + 6); \ - r.B(13) = s->B((base << (SHIFT + 2)) + 6); \ - r.B(14) = d->B((base << (SHIFT + 2)) + 7); \ - r.B(15) = s->B((base << (SHIFT + 2)) + 7); \ + r[8] = v->B((base * PACK_WIDTH) + 4); \ + r[9] = s->B((base * PACK_WIDTH) + 4); \ + r[10] = v->B((base * PACK_WIDTH) + 5); \ + r[11] = s->B((base * PACK_WIDTH) + 5); \ + r[12] = v->B((base * PACK_WIDTH) + 6); \ + r[13] = s->B((base * PACK_WIDTH) + 6); \ + r[14] = v->B((base * PACK_WIDTH) + 7); \ + r[15] = s->B((base * PACK_WIDTH) + 
7); \ ) \ - MOVE(*d, r); \ + for (i = 0; i < PACK_WIDTH * 2; i++) { \ + d->B(i) = r[i]; \ + } \ } \ \ void glue(helper_punpck ## base_name ## wd, SUFFIX)(CPUX86State *env,\ - Reg *d, Reg *s) \ + Reg *d, Reg *s) \ { \ - Reg r; \ + Reg *v = d; \ + uint16_t r[PACK_WIDTH]; \ + int i; \ \ - r.W(0) = d->W((base << (SHIFT + 1)) + 0); \ - r.W(1) = s->W((base << (SHIFT + 1)) + 0); \ - r.W(2) = d->W((base << (SHIFT + 1)) + 1); \ - r.W(3) = s->W((base << (SHIFT + 1)) + 1); \ + r[0] = v->W((base * (PACK_WIDTH / 2)) + 0); \ + r[1] = s->W((base * (PACK_WIDTH / 2)) + 0); \ + r[2] = v->W((base * (PACK_WIDTH / 2)) + 1); \ + r[3] = s->W((base * (PACK_WIDTH / 2)) + 1); \ XMM_ONLY( \ - r.W(4) = d->W((base << (SHIFT + 1)) + 2); \ - r.W(5) = s->W((base << (SHIFT + 1)) + 2); \ - r.W(6) = d->W((base << (SHIFT + 1)) + 3); \ - r.W(7) = s->W((base << (SHIFT + 1)) + 3); \ + r[4] = v->W((base * 4) + 2); \ + r[5] = s->W((base * 4) + 2); \ + r[6] = v->W((base * 4) + 3); \ + r[7] = s->W((base * 4) + 3); \ ) \ - MOVE(*d, r); \ + for (i = 0; i < PACK_WIDTH; i++) { \ + d->W(i) = r[i]; \ + } \ } \ \ void glue(helper_punpck ## base_name ## dq, SUFFIX)(CPUX86State *env,\ - Reg *d, Reg *s) \ + Reg *d, Reg *s) \ { \ - Reg r; \ + Reg *v = d; \ + uint32_t r[4]; \ \ - r.L(0) = d->L((base << SHIFT) + 0); \ - r.L(1) = s->L((base << SHIFT) + 0); \ + r[0] = v->L((base * (PACK_WIDTH / 4)) + 0); \ + r[1] = s->L((base * (PACK_WIDTH / 4)) + 0); \ XMM_ONLY( \ - r.L(2) = d->L((base << SHIFT) + 1); \ - r.L(3) = s->L((base << SHIFT) + 1); \ + r[2] = v->L((base * 2) + 1); \ + r[3] = s->L((base * 2) + 1); \ + d->L(2) = r[2]; \ + d->L(3) = r[3]; \ ) \ - MOVE(*d, r); \ + d->L(0) = r[0]; \ + d->L(1) = r[1]; \ } \ Reg *s) \ + void glue(helper_punpck ## base_name ## qdq, SUFFIX)( \ + CPUX86State *env, Reg *d, Reg *s) \ { \ - Reg r; \ + Reg *v = d; \ + uint64_t r[2]; \ \ - r.Q(0) = d->Q(base); \ - r.Q(1) = s->Q(base); \ - MOVE(*d, r); \ + r[0] = v->Q(base); \ + r[1] = s->Q(base); \ + d->Q(0) = r[0]; \ + d->Q(1) = r[1]; \ } \ ) 
UNPCK_OP(l, 0) UNPCK_OP(h, 1) +#undef PACK_WIDTH +#undef PACK_HELPER_B +#undef PACK4 + + /* 3DNow! float ops */ #if SHIFT == 0 void helper_pi2fd(CPUX86State *env, MMXReg *d, MMXReg *s) @@ -1394,122 +1392,113 @@ void helper_pswapd(CPUX86State *env, MMXReg *d, MMXReg *s) /* SSSE3 op helpers */ void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { + Reg *v = d; int i;

From patchwork Thu Aug 25 22:14:03 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955290
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 10/18] i386: Add size suffix to vector FP helpers
Date: Fri, 26 Aug 2022 00:14:03 +0200
Message-Id: <20220825221411.35122-11-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

For AVX we're going to need both 128-bit (xmm) and 256-bit (ymm) variants of the floating-point helpers. Add the register type suffix to the existing *PS and *PD helpers (the SS and SD variants are only valid on 128-bit vectors).

No functional changes.
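
A rough single-file sketch of how a width suffix lets one helper body be emitted once per register size. It assumes, as the `glue(..., SUFFIX)` calls in these patches suggest, that ops_sse.h is compiled once per width with a different `SUFFIX`; here that is approximated by redefining an object-like `SUFFIX` macro between expansions instead of re-including a header. `VecReg`, `ADDPS_BODY` and `NFLOATS` are hypothetical names.

```c
/*
 * Two-level paste: glue() expands its arguments (e.g. an object-like
 * SUFFIX macro) before xglue() applies ##. QEMU's glue()/xglue() pair
 * follows the same pattern.
 */
#define xglue(x, y) x ## y
#define glue(x, y) xglue(x, y)

/* Hypothetical register type, wide enough for one 256-bit ymm value. */
typedef struct {
    float f[8];
} VecReg;

/* Hypothetical helper template; SUFFIX and NFLOATS are read at expansion. */
#define ADDPS_BODY                                                     \
    static void glue(helper_addps, SUFFIX)(VecReg *d, const VecReg *s) \
    {                                                                  \
        for (int i = 0; i < NFLOATS; i++) {                            \
            d->f[i] += s->f[i];                                        \
        }                                                              \
    }

#define SUFFIX _xmm
#define NFLOATS 4
ADDPS_BODY              /* defines helper_addps_xmm: 4 floats */
#undef SUFFIX
#undef NFLOATS

#define SUFFIX _ymm
#define NFLOATS 8
ADDPS_BODY              /* defines helper_addps_ymm: 8 floats */
#undef SUFFIX
#undef NFLOATS

/* Scalar forms only exist for 128-bit registers, so no suffix is needed. */
static void helper_addss(VecReg *d, const VecReg *s)
{
    d->f[0] += s->f[0];
}
```

`helper_addps_xmm` updates only `f[0]` to `f[3]`, while `helper_addps_ymm` updates all eight lanes; both come from the same body, which is why the packed helpers need an explicit suffix while the scalar ones do not.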
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-15-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 48 ++++++++++++++++++------------------ target/i386/ops_sse_header.h | 48 ++++++++++++++++++------------------ target/i386/tcg/translate.c | 37 +++++++++++++-------------- 3 files changed, 67 insertions(+), 66 deletions(-) #define SSE_FOP(name) OP(op2, SSE_OPF_SCALAR, \ - gen_helper_##name##ps, gen_helper_##name##pd, \ + gen_helper_##name##ps##_xmm, gen_helper_##name##pd##_xmm, \ gen_helper_##name##ss, gen_helper_##name##sd) #define SSE_OP(sname, dname, op, flags) OP(op, flags, \ gen_helper_##sname##_xmm, gen_helper_##dname##_xmm, NULL, NULL) @@ -2846,12 +2846,12 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { gen_helper_comiss, gen_helper_comisd, NULL, NULL), [0x50] = SSE_SPECIAL, /* movmskps, movmskpd */ [0x51] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, - gen_helper_sqrtps, gen_helper_sqrtpd, + gen_helper_sqrtps_xmm, gen_helper_sqrtpd_xmm, gen_helper_sqrtss, gen_helper_sqrtsd), [0x52] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, - gen_helper_rsqrtps, NULL, gen_helper_rsqrtss, NULL), + gen_helper_rsqrtps_xmm, NULL, gen_helper_rsqrtss, NULL), [0x53] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, - gen_helper_rcpps, NULL, gen_helper_rcpss, NULL), + gen_helper_rcpps_xmm, NULL, gen_helper_rcpss, NULL), [0x54] = SSE_OP(pand, pand, op2, 0), /* andps, andpd */ [0x55] = SSE_OP(pandn, pandn, op2, 0), /* andnps, andnpd */ [0x56] = SSE_OP(por, por, op2, 0), /* orps, orpd */ @@ -2859,19 +2859,19 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x58] = SSE_FOP(add), [0x59] = SSE_FOP(mul), [0x5a] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, - gen_helper_cvtps2pd, gen_helper_cvtpd2ps, + gen_helper_cvtps2pd_xmm, gen_helper_cvtpd2ps_xmm, gen_helper_cvtss2sd, gen_helper_cvtsd2ss), [0x5b] = OP(op1, SSE_OPF_V0, - gen_helper_cvtdq2ps, gen_helper_cvtps2dq, - gen_helper_cvttps2dq, NULL), + 
gen_helper_cvtdq2ps_xmm, gen_helper_cvtps2dq_xmm, + gen_helper_cvttps2dq_xmm, NULL), [0x5c] = SSE_FOP(sub), [0x5d] = SSE_FOP(min), [0x5e] = SSE_FOP(div), [0x5f] = SSE_FOP(max), [0xc2] = SSE_FOP(cmpeq), /* sse_op_table4 */ - [0xc6] = OP(dummy, SSE_OPF_SHUF, (SSEFunc_0_epp)gen_helper_shufps, - (SSEFunc_0_epp)gen_helper_shufpd, NULL, NULL), + [0xc6] = OP(dummy, SSE_OPF_SHUF, (SSEFunc_0_epp)gen_helper_shufps_xmm, + (SSEFunc_0_epp)gen_helper_shufpd_xmm, NULL, NULL), /* SSSE3, SSE4, MOVBE, CRC32, BMI1, BMI2, ADX. */ [0x38] = SSE_SPECIAL, @@ -2912,15 +2912,15 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x79] = OP(op1, SSE_OPF_V0, NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r), [0x7c] = OP(op2, 0, - NULL, gen_helper_haddpd, NULL, gen_helper_haddps), + NULL, gen_helper_haddpd_xmm, NULL, gen_helper_haddps_xmm), [0x7d] = OP(op2, 0, - NULL, gen_helper_hsubpd, NULL, gen_helper_hsubps), + NULL, gen_helper_hsubpd_xmm, NULL, gen_helper_hsubps_xmm), [0x7e] = SSE_SPECIAL, /* movd, movd, , movq */ [0x7f] = SSE_SPECIAL, /* movq, movdqa, movdqu */ [0xc4] = SSE_SPECIAL, /* pinsrw */ [0xc5] = SSE_SPECIAL, /* pextrw */ [0xd0] = OP(op2, 0, - NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps), + NULL, gen_helper_addsubpd_xmm, NULL, gen_helper_addsubps_xmm), [0xd1] = MMX_OP(psrlw), [0xd2] = MMX_OP(psrld), [0xd3] = MMX_OP(psrlq), @@ -2943,8 +2943,8 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0xe4] = MMX_OP(pmulhuw), [0xe5] = MMX_OP(pmulhw), [0xe6] = OP(op1, SSE_OPF_V0, - NULL, gen_helper_cvttpd2dq, - gen_helper_cvtdq2pd, gen_helper_cvtpd2dq), + NULL, gen_helper_cvttpd2dq_xmm, + gen_helper_cvtdq2pd_xmm, gen_helper_cvtpd2dq_xmm), [0xe7] = SSE_SPECIAL, /* movntq, movntq */ [0xe8] = MMX_OP(psubsb), [0xe9] = MMX_OP(psubsw), @@ -3021,8 +3021,9 @@ static const SSEFunc_l_ep sse_op_table3bq[] = { }; #endif -#define SSE_FOP(x) { gen_helper_ ## x ## ps, gen_helper_ ## x ## pd, \ - gen_helper_ ## x ## ss, gen_helper_ ## x ## sd, } +#define SSE_FOP(x) 
{ \ + gen_helper_ ## x ## ps ## _xmm, gen_helper_ ## x ## pd ## _xmm, \ + gen_helper_ ## x ## ss, gen_helper_ ## x ## sd} static const SSEFunc_0_epp sse_op_table4[8][4] = { SSE_FOP(cmpeq), SSE_FOP(cmplt), @@ -3650,13 +3651,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, case 0x050: /* movmskps */ rm = (modrm & 7) | REX_B(s); tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(rm)); - gen_helper_movmskps(s->tmp2_i32, cpu_env, s->ptr0); + gen_helper_movmskps_xmm(s->tmp2_i32, cpu_env, s->ptr0); tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32); break; case 0x150: /* movmskpd */ rm = (modrm & 7) | REX_B(s); tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(rm)); - gen_helper_movmskpd(s->tmp2_i32, cpu_env, s->ptr0); + gen_helper_movmskpd_xmm(s->tmp2_i32, cpu_env, s->ptr0); tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32); break; case 0x02a: /* cvtpi2ps */ diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 5265005f1e..17fdc68f6e 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -494,7 +494,7 @@ void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order) SHUFFLE4(W, s, s, 0); } #else -void helper_shufps(Reg *d, Reg *s, int order) +void glue(helper_shufps, SUFFIX)(Reg *d, Reg *s, int order) { Reg *v = d; uint32_t r0, r1, r2, r3; @@ -502,7 +502,7 @@ void helper_shufps(Reg *d, Reg *s, int order) SHUFFLE4(L, v, s, 0); } -void helper_shufpd(Reg *d, Reg *s, int order) +void glue(helper_shufpd, SUFFIX)(Reg *d, Reg *s, int order) { Reg *v = d; uint64_t r0, r1; @@ -545,7 +545,7 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) /* XXX: not accurate */ #define SSE_HELPER_S(name, F) \ - void helper_ ## name ## ps(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ d->ZMM_S(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ @@ -558,7 +558,7 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) d->ZMM_S(0) = F(32, 
d->ZMM_S(0), s->ZMM_S(0)); \ } \ \ - void helper_ ## name ## pd(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_D(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ d->ZMM_D(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ @@ -594,7 +594,7 @@ SSE_HELPER_S(sqrt, FPU_SQRT) /* float to float conversions */ -void helper_cvtps2pd(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtps2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { float32 s0, s1; @@ -604,7 +604,7 @@ void helper_cvtps2pd(CPUX86State *env, Reg *d, Reg *s) d->ZMM_D(1) = float32_to_float64(s1, &env->sse_status); } -void helper_cvtpd2ps(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), &env->sse_status); d->ZMM_S(1) = float64_to_float32(s->ZMM_D(1), &env->sse_status); @@ -622,7 +622,7 @@ void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *s) } /* integer to float */ -void helper_cvtdq2ps(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtdq2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = int32_to_float32(s->ZMM_L(0), &env->sse_status); d->ZMM_S(1) = int32_to_float32(s->ZMM_L(1), &env->sse_status); @@ -630,7 +630,7 @@ void helper_cvtdq2ps(CPUX86State *env, Reg *d, Reg *s) d->ZMM_S(3) = int32_to_float32(s->ZMM_L(3), &env->sse_status); } -void helper_cvtdq2pd(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtdq2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int32_t l0, l1; @@ -707,7 +707,7 @@ WRAP_FLOATCONV(int64_t, float32_to_int64_round_to_zero, float32, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64, float64, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64_round_to_zero, float64, INT64_MIN) -void helper_cvtps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvtps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float32_to_int32(s->ZMM_S(0), &env->sse_status); d->ZMM_L(1) = 
x86_float32_to_int32(s->ZMM_S(1), &env->sse_status); @@ -715,7 +715,7 @@ void helper_cvtps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_L(3) = x86_float32_to_int32(s->ZMM_S(3), &env->sse_status); } -void helper_cvtpd2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvtpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float64_to_int32(s->ZMM_D(0), &env->sse_status); d->ZMM_L(1) = x86_float64_to_int32(s->ZMM_D(1), &env->sse_status); @@ -757,7 +757,7 @@ int64_t helper_cvtsd2sq(CPUX86State *env, ZMMReg *s) #endif /* float to integer truncated */ -void helper_cvttps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvttps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float32_to_int32_round_to_zero(s->ZMM_S(0), &env->sse_status); d->ZMM_L(1) = x86_float32_to_int32_round_to_zero(s->ZMM_S(1), &env->sse_status); @@ -765,7 +765,7 @@ void helper_cvttps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_L(3) = x86_float32_to_int32_round_to_zero(s->ZMM_S(3), &env->sse_status); } -void helper_cvttpd2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvttpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float64_to_int32_round_to_zero(s->ZMM_D(0), &env->sse_status); d->ZMM_L(1) = x86_float64_to_int32_round_to_zero(s->ZMM_D(1), &env->sse_status); @@ -806,7 +806,7 @@ int64_t helper_cvttsd2sq(CPUX86State *env, ZMMReg *s) } #endif -void helper_rsqrtps(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { uint8_t old_flags = get_float_exception_flags(&env->sse_status); d->ZMM_S(0) = float32_div(float32_one, @@ -833,7 +833,7 @@ void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s) set_float_exception_flags(old_flags, &env->sse_status); } -void helper_rcpps(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { uint8_t old_flags = 
get_float_exception_flags(&env->sse_status); d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status); @@ -894,7 +894,7 @@ void helper_insertq_i(CPUX86State *env, ZMMReg *d, int index, int length) d->ZMM_Q(0) = helper_insertq(d->ZMM_Q(0), index, length); } -void helper_haddps(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_haddps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { ZMMReg r; @@ -905,7 +905,7 @@ void helper_haddps(CPUX86State *env, ZMMReg *d, ZMMReg *s) MOVE(*d, r); } -void helper_haddpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_haddpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { ZMMReg r; @@ -914,7 +914,7 @@ void helper_haddpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) MOVE(*d, r); } -void helper_hsubps(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_hsubps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { ZMMReg r; @@ -925,7 +925,7 @@ void helper_hsubps(CPUX86State *env, ZMMReg *d, ZMMReg *s) MOVE(*d, r); } -void helper_hsubpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_hsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { ZMMReg r; @@ -934,7 +934,7 @@ void helper_hsubpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) MOVE(*d, r); } -void helper_addsubps(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_addsubps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_S(0) = float32_sub(d->ZMM_S(0), s->ZMM_S(0), &env->sse_status); d->ZMM_S(1) = float32_add(d->ZMM_S(1), s->ZMM_S(1), &env->sse_status); @@ -942,7 +942,7 @@ void helper_addsubps(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_S(3) = float32_add(d->ZMM_S(3), s->ZMM_S(3), &env->sse_status); } -void helper_addsubpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_D(0) = float64_sub(d->ZMM_D(0), s->ZMM_D(0), &env->sse_status); d->ZMM_D(1) = float64_add(d->ZMM_D(1), s->ZMM_D(1), &env->sse_status); @@ -950,7 +950,7 @@ void 
helper_addsubpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) /* XXX: unordered */ #define SSE_HELPER_CMP(name, F) \ - void helper_ ## name ## ps(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ d->ZMM_L(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ @@ -963,7 +963,7 @@ void helper_addsubpd(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ } \ \ - void helper_ ## name ## pd(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_Q(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ d->ZMM_Q(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ @@ -1046,7 +1046,7 @@ void helper_comisd(CPUX86State *env, Reg *d, Reg *s) CC_SRC = comis_eflags[ret + 1]; } -uint32_t helper_movmskps(CPUX86State *env, Reg *s) +uint32_t glue(helper_movmskps, SUFFIX)(CPUX86State *env, Reg *s) { int b0, b1, b2, b3; @@ -1057,7 +1057,7 @@ uint32_t helper_movmskps(CPUX86State *env, Reg *s) return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3); } -uint32_t helper_movmskpd(CPUX86State *env, Reg *s) +uint32_t glue(helper_movmskpd, SUFFIX)(CPUX86State *env, Reg *s) { int b0, b1; diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index cef28f2aae..fc697536a0 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -122,8 +122,8 @@ DEF_HELPER_2(glue(movq_mm_T0, SUFFIX), void, Reg, i64) #if SHIFT == 0 DEF_HELPER_3(glue(pshufw, SUFFIX), void, Reg, Reg, int) #else -DEF_HELPER_3(shufps, void, Reg, Reg, int) -DEF_HELPER_3(shufpd, void, Reg, Reg, int) +DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) +DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshufd, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshuflw, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) @@ -134,9 +134,9 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), 
void, Reg, Reg, int) /* XXX: not accurate */ #define SSE_HELPER_S(name, F) \ - DEF_HELPER_3(name ## ps, void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(name ## pd, void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ DEF_HELPER_3(name ## sd, void, env, Reg, Reg) SSE_HELPER_S(add, FPU_ADD) @@ -148,12 +148,12 @@ SSE_HELPER_S(max, FPU_MAX) SSE_HELPER_S(sqrt, FPU_SQRT) -DEF_HELPER_3(cvtps2pd, void, env, Reg, Reg) -DEF_HELPER_3(cvtpd2ps, void, env, Reg, Reg) +DEF_HELPER_3(glue(cvtps2pd, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(cvtpd2ps, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(cvtss2sd, void, env, Reg, Reg) DEF_HELPER_3(cvtsd2ss, void, env, Reg, Reg) -DEF_HELPER_3(cvtdq2ps, void, env, Reg, Reg) -DEF_HELPER_3(cvtdq2pd, void, env, Reg, Reg) +DEF_HELPER_3(glue(cvtdq2ps, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(cvtdq2pd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(cvtpi2ps, void, env, ZMMReg, MMXReg) DEF_HELPER_3(cvtpi2pd, void, env, ZMMReg, MMXReg) DEF_HELPER_3(cvtsi2ss, void, env, ZMMReg, i32) @@ -164,8 +164,8 @@ DEF_HELPER_3(cvtsq2ss, void, env, ZMMReg, i64) DEF_HELPER_3(cvtsq2sd, void, env, ZMMReg, i64) #endif -DEF_HELPER_3(cvtps2dq, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(cvtpd2dq, void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvtps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvtpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(cvtps2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_3(cvtpd2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_2(cvtss2si, s32, env, ZMMReg) @@ -175,8 +175,8 @@ DEF_HELPER_2(cvtss2sq, s64, env, ZMMReg) DEF_HELPER_2(cvtsd2sq, s64, env, ZMMReg) #endif -DEF_HELPER_3(cvttps2dq, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(cvttpd2dq, void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvttps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvttpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) 
DEF_HELPER_3(cvttps2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_3(cvttpd2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_2(cvttss2si, s32, env, ZMMReg) @@ -186,25 +186,25 @@ DEF_HELPER_2(cvttss2sq, s64, env, ZMMReg) DEF_HELPER_2(cvttsd2sq, s64, env, ZMMReg) #endif -DEF_HELPER_3(rsqrtps, void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(rsqrtps, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(rsqrtss, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(rcpps, void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(rcpps, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(rcpss, void, env, ZMMReg, ZMMReg) DEF_HELPER_3(extrq_r, void, env, ZMMReg, ZMMReg) DEF_HELPER_4(extrq_i, void, env, ZMMReg, int, int) DEF_HELPER_3(insertq_r, void, env, ZMMReg, ZMMReg) DEF_HELPER_4(insertq_i, void, env, ZMMReg, int, int) -DEF_HELPER_3(haddps, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(haddpd, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(hsubps, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(hsubpd, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(addsubps, void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(addsubpd, void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(haddps, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(haddpd, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(hsubps, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(hsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(addsubps, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(addsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) #define SSE_HELPER_CMP(name, F) \ - DEF_HELPER_3(name ## ps, void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(name ## pd, void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ DEF_HELPER_3(name ## sd, void, env, Reg, Reg) SSE_HELPER_CMP(cmpeq, FPU_CMPEQ) @@ -220,8 +220,8 @@ DEF_HELPER_3(ucomiss, void, env, Reg, Reg) DEF_HELPER_3(comiss, void, env, Reg, Reg) DEF_HELPER_3(ucomisd, void, env, Reg, Reg) 
DEF_HELPER_3(comisd, void, env, Reg, Reg) -DEF_HELPER_2(movmskps, i32, env, Reg) -DEF_HELPER_2(movmskpd, i32, env, Reg) +DEF_HELPER_2(glue(movmskps, SUFFIX), i32, env, Reg) +DEF_HELPER_2(glue(movmskpd, SUFFIX), i32, env, Reg) #endif

From patchwork Thu Aug 25 22:14:04 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955296
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 11/18] i386: Floating point arithmetic helper AVX prep
Date: Fri, 26 Aug 2022 00:14:04 +0200
Message-Id: <20220825221411.35122-12-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Prepare the "easy" floating point vector helpers for AVX.

No functional changes to existing helpers.
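The loops this patch introduces derive the lane count from `SHIFT` instead of unrolling four float (or two double) assignments. A minimal standalone sketch of that idea follows, assuming a hypothetical `Vec` union in place of QEMU's `Reg` and plain C arithmetic in place of softfloat, so rounding modes and exception flags are ignored. ADDSUBPS/ADDSUBPD serve as the example because they step two lanes at a time.

```c
/* Hypothetical flattened 256-bit vector viewed as floats or doubles. */
typedef union {
    float s[8];
    double d[4];
} Vec;

/* Lane counts used by the loops in the patch: shift 1 is a 128-bit
 * xmm register, shift 2 a 256-bit ymm register. */
static int float_lanes(int shift)  { return 2 << shift; }
static int double_lanes(int shift) { return 1 << shift; }

/* ADDSUBPS: even lanes subtract, odd lanes add.  As in the patch, the
 * first source is read through v, which here simply aliases d. */
static void addsubps(Vec *d, const Vec *s, int shift)
{
    const Vec *v = d;
    for (int i = 0; i < float_lanes(shift); i += 2) {
        d->s[i] = v->s[i] - s->s[i];
        d->s[i + 1] = v->s[i + 1] + s->s[i + 1];
    }
}

/* ADDSUBPD: the same pattern over (1 << shift) doubles. */
static void addsubpd(Vec *d, const Vec *s, int shift)
{
    const Vec *v = d;
    for (int i = 0; i < double_lanes(shift); i += 2) {
        d->d[i] = v->d[i] - s->d[i];
        d->d[i + 1] = v->d[i + 1] + s->d[i + 1];
    }
}
```

Because the bound is computed from `shift`, one body serves both widths; with `shift == 1` the float loop touches exactly the four lanes the old unrolled code did, which is why the patch can claim no functional change for the existing xmm helpers.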
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-16-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 138 ++++++++++++++++++++++++++++-------------- 1 file changed, 92 insertions(+), 46 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 17fdc68f6e..08359b8433 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -540,40 +540,58 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) } #endif -#if SHIFT == 1 +#if SHIFT >= 1 /* FPU ops */ /* XXX: not accurate */ -#define SSE_HELPER_S(name, F) \ - void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ +#define SSE_HELPER_P(name, F) \ + void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s) \ { \ - d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ - d->ZMM_S(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ - d->ZMM_S(2) = F(32, d->ZMM_S(2), s->ZMM_S(2)); \ - d->ZMM_S(3) = F(32, d->ZMM_S(3), s->ZMM_S(3)); \ + Reg *v = d; \ + int i; \ + for (i = 0; i < 2 << SHIFT; i++) { \ + d->ZMM_S(i) = F(32, v->ZMM_S(i), s->ZMM_S(i)); \ + } \ } \ \ - void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s) \ { \ - d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ - } \ - \ - void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ - { \ - d->ZMM_D(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ - d->ZMM_D(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ - } \ - \ - void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s) \ - { \ - d->ZMM_D(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ + Reg *v = d; \ + int i; \ + for (i = 0; i < 1 << SHIFT; i++) { \ + d->ZMM_D(i) = F(64, v->ZMM_D(i), s->ZMM_D(i)); \ + } \ } +#if SHIFT == 1 + +#define SSE_HELPER_S(name, F) \ + SSE_HELPER_P(name, F) \ + \ + void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s)\ + { \ + Reg *v = d; \ + d->ZMM_S(0) = F(32, v->ZMM_S(0), 
s->ZMM_S(0)); \ + } \ + \ + void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s)\ + { \ + Reg *v = d; \ + d->ZMM_D(0) = F(64, v->ZMM_D(0), s->ZMM_D(0)); \ + } + +#else + +#define SSE_HELPER_S(name, F) SSE_HELPER_P(name, F) + +#endif + #define FPU_ADD(size, a, b) float ## size ## _add(a, b, &env->sse_status) #define FPU_SUB(size, a, b) float ## size ## _sub(a, b, &env->sse_status) #define FPU_MUL(size, a, b) float ## size ## _mul(a, b, &env->sse_status) #define FPU_DIV(size, a, b) float ## size ## _div(a, b, &env->sse_status) -#define FPU_SQRT(size, a, b) float ## size ## _sqrt(b, &env->sse_status) /* Note that the choice of comparison op here is important to get the * special cases right: for min and max Intel specifies that (-0,0), @@ -590,8 +608,34 @@ SSE_HELPER_S(mul, FPU_MUL) SSE_HELPER_S(div, FPU_DIV) SSE_HELPER_S(min, FPU_MIN) SSE_HELPER_S(max, FPU_MAX) -SSE_HELPER_S(sqrt, FPU_SQRT) +void glue(helper_sqrtps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + int i; + for (i = 0; i < 2 << SHIFT; i++) { + d->ZMM_S(i) = float32_sqrt(s->ZMM_S(i), &env->sse_status); + } +} + +void glue(helper_sqrtpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + int i; + for (i = 0; i < 1 << SHIFT; i++) { + d->ZMM_D(i) = float64_sqrt(s->ZMM_D(i), &env->sse_status); + } +} + +#if SHIFT == 1 +void helper_sqrtss(CPUX86State *env, Reg *d, Reg *s) +{ + d->ZMM_S(0) = float32_sqrt(s->ZMM_S(0), &env->sse_status); +} + +void helper_sqrtsd(CPUX86State *env, Reg *d, Reg *s) +{ + d->ZMM_D(0) = float64_sqrt(s->ZMM_D(0), &env->sse_status); +} +#endif /* float to float conversions */ void glue(helper_cvtps2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) @@ -809,18 +853,12 @@ int64_t helper_cvttsd2sq(CPUX86State *env, ZMMReg *s) void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { uint8_t old_flags = get_float_exception_flags(&env->sse_status); - d->ZMM_S(0) = float32_div(float32_one, - float32_sqrt(s->ZMM_S(0), &env->sse_status), - &env->sse_status); - d->ZMM_S(1) = 
float32_div(float32_one, - float32_sqrt(s->ZMM_S(1), &env->sse_status), - &env->sse_status); - d->ZMM_S(2) = float32_div(float32_one, - float32_sqrt(s->ZMM_S(2), &env->sse_status), - &env->sse_status); - d->ZMM_S(3) = float32_div(float32_one, - float32_sqrt(s->ZMM_S(3), &env->sse_status), - &env->sse_status); + int i; + for (i = 0; i < 2 << SHIFT; i++) { + d->ZMM_S(i) = float32_div(float32_one, + float32_sqrt(s->ZMM_S(i), &env->sse_status), + &env->sse_status); + } set_float_exception_flags(old_flags, &env->sse_status); } @@ -836,10 +874,10 @@ void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s) void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { uint8_t old_flags = get_float_exception_flags(&env->sse_status); - d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status); - d->ZMM_S(1) = float32_div(float32_one, s->ZMM_S(1), &env->sse_status); - d->ZMM_S(2) = float32_div(float32_one, s->ZMM_S(2), &env->sse_status); - d->ZMM_S(3) = float32_div(float32_one, s->ZMM_S(3), &env->sse_status); + int i; + for (i = 0; i < 2 << SHIFT; i++) { + d->ZMM_S(i) = float32_div(float32_one, s->ZMM_S(i), &env->sse_status); + } set_float_exception_flags(old_flags, &env->sse_status); } @@ -934,18 +972,24 @@ void glue(helper_hsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) MOVE(*d, r); } -void glue(helper_addsubps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->ZMM_S(0) = float32_sub(d->ZMM_S(0), s->ZMM_S(0), &env->sse_status); - d->ZMM_S(1) = float32_add(d->ZMM_S(1), s->ZMM_S(1), &env->sse_status); - d->ZMM_S(2) = float32_sub(d->ZMM_S(2), s->ZMM_S(2), &env->sse_status); - d->ZMM_S(3) = float32_add(d->ZMM_S(3), s->ZMM_S(3), &env->sse_status); + Reg *v = d; + int i; + for (i = 0; i < 2 << SHIFT; i += 2) { + d->ZMM_S(i) = float32_sub(v->ZMM_S(i), s->ZMM_S(i), &env->sse_status); + d->ZMM_S(i+1) = float32_add(v->ZMM_S(i+1), s->ZMM_S(i+1), &env->sse_status); + } } -void 
glue(helper_addsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->ZMM_D(0) = float64_sub(d->ZMM_D(0), s->ZMM_D(0), &env->sse_status); - d->ZMM_D(1) = float64_add(d->ZMM_D(1), s->ZMM_D(1), &env->sse_status); + Reg *v = d; + int i; + for (i = 0; i < 1 << SHIFT; i += 2) { + d->ZMM_D(i) = float64_sub(v->ZMM_D(i), s->ZMM_D(i), &env->sse_status); + d->ZMM_D(i+1) = float64_add(v->ZMM_D(i+1), s->ZMM_D(i+1), &env->sse_status); + } } /* XXX: unordered */ @@ -2294,6 +2338,8 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } #endif +#undef SSE_HELPER_S + #undef SHIFT #undef XMM_ONLY #undef Reg From patchwork Thu Aug 25 22:14:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DF0DFECAAA2 for ; Thu, 25 Aug 2022 22:40:28 +0000 (UTC) Received: from localhost ([::1]:55708 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oRLWV-0004j1-Sb for qemu-devel@archiver.kernel.org; Thu, 25 Aug 2022 18:40:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60488) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oRL7Y-0008D2-ST for qemu-devel@nongnu.org; Thu, 25 Aug 2022 18:14:40 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:40975) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oRL7W-0002kH-Vv for qemu-devel@nongnu.org; Thu, 25 Aug 2022 18:14:40 -0400 DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1661465676; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j0hzub3oFLEtUwwRTIdqW7g0Z8ZFMh+2g43PJvMaUq0=; b=RZqGKpbe8d5Ty+9oufpwisrrb3kejJg7h0zN2y97XBQJl45QQ8xmIfkFnqk9sKKbiFcDs9 RuQprpRpZv1vRMF/Dat2yqhEsUbtw0TeKIKkCCOlTWS5Ps9OtKPczAC+EcIgts72VREq1a rBuJIISQemZeKS5i9WpEk+oDhEcHi2I= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-256-lWHBbdJtOlCTSGmlt2yQVw-1; Thu, 25 Aug 2022 18:14:35 -0400 X-MC-Unique: lWHBbdJtOlCTSGmlt2yQVw-1 Received: by mail-wm1-f70.google.com with SMTP id b4-20020a05600c4e0400b003a5a96f1756so3066730wmq.0 for ; Thu, 25 Aug 2022 15:14:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=j0hzub3oFLEtUwwRTIdqW7g0Z8ZFMh+2g43PJvMaUq0=; b=4t97pK7d7Nushb47uyEjDhMAOqSNyFmjjDb8jRikTvQh0XGiaxAsDkXi23QWMPwiUD +M8eCfhYHJOhU9KKuSyyWPGkd0ciM8tbDnMilS9RD1o9V4UOH+fTc7HTgHmNb4KMNUrP ZUxz7UPOeGKWA+1j7wveq6bhXJdU/h3U+INeccNEGtEUWflr6Z1dJMSsfY0XBj3OUdri b/Wj2JhDgswQ9G64p35HFpOPuX9RFEGkfXhbRk60ExfuVRAk7LyO+BLeMhRHzEF0OjK4 6+ADprcsPhS56pT8veE8SrreTe8975FaT4U4pbbBbjahOQdlFeL3XZdhmxWwFjw3IxNM 1GMQ== X-Gm-Message-State: ACgBeo2LA7uDZmKz3oN4cgc1Dhi2rr5L+oPV2KJf50o2aRjcDNWfHDki 4PZHDAsvt8S1L5V4ofHKM3q+ab2w5FP+qul8UVBH4oExe5bJcGniHUvKrC7RtAhLv1DFXvci0GU kydZda3U2BQj9jNXyMGYK/y6zAbaVxCUv2y8gBa07GQoJtfoNQaXKVCae4Lt5NbeAUwU= X-Received: by 2002:a5d:6445:0:b0:225:1a75:7754 with SMTP id d5-20020a5d6445000000b002251a757754mr3331779wrw.239.1661465673760; Thu, 25 Aug 2022 15:14:33 -0700 (PDT) 
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 12/18] i386: reimplement AVX comparison helpers Date: Fri, 26 Aug 2022 00:14:05 +0200 Message-Id: <20220825221411.35122-13-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com> References: <20220825221411.35122-1-pbonzini@redhat.com> From: Paul Brook AVX includes a more extensive set of comparison predicates, some of which our softfloat implementation does not expose directly. Rewrite the helpers in terms of floatN_compare for future extensibility.
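To illustrate the approach, here is a minimal standalone sketch (not QEMU code) that derives all eight legacy CMPPS/CMPPD predicates from a single compare result. It assumes a hypothetical `Relation` enum that mirrors the ordering of softfloat's `float_relation_*` values (less = -1, equal = 0, greater = 1, unordered = 2), which is what allows "less or equal" to be written as `x <= equal`. Host `double` comparisons stand in for `float64_compare_quiet()`/`float64_compare()`, so the quiet versus signaling exception behaviour of the real helpers is not modelled; `compare_f64`, `P_*`, `CMP_MASK` and `cmppd_lane` are illustrative names only.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical relation codes, ordered like softfloat's float_relation_*. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
} Relation;

/* Stand-in for float64_compare{,_quiet}(); exception flags are not modelled. */
static Relation compare_f64(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return REL_UNORDERED;
    }
    if (a < b) {
        return REL_LESS;
    }
    return a == b ? REL_EQUAL : REL_GREATER;
}

/* Predicates on a relation, in the style of FPU_EQ/FPU_LT/FPU_LE/FPU_UNORD. */
#define P_EQ(x)    ((x) == REL_EQUAL)
#define P_LT(x)    ((x) == REL_LESS)
#define P_LE(x)    ((x) <= REL_EQUAL) /* less (-1) or equal (0) */
#define P_UNORD(x) ((x) == REL_UNORDERED)

/* All-ones lane mask when COND holds, zero otherwise (cf. FPU_CMPQ). */
#define CMP_MASK(COND, a, b) \
    ((COND(compare_f64((a), (b)))) ? UINT64_MAX : (uint64_t)0)

/* One 64-bit lane of CMPPD; predicate order follows sse_op_table4. */
static uint64_t cmppd_lane(unsigned pred, double a, double b)
{
    switch (pred & 7) {
    case 0:  return CMP_MASK(P_EQ, a, b);      /* cmpeq */
    case 1:  return CMP_MASK(P_LT, a, b);      /* cmplt */
    case 2:  return CMP_MASK(P_LE, a, b);      /* cmple */
    case 3:  return CMP_MASK(P_UNORD, a, b);   /* cmpunord */
    case 4:  return CMP_MASK(!P_EQ, a, b);     /* cmpneq */
    case 5:  return CMP_MASK(!P_LT, a, b);     /* cmpnlt */
    case 6:  return CMP_MASK(!P_LE, a, b);     /* cmpnle */
    default: return CMP_MASK(!P_UNORD, a, b);  /* cmpord */
    }
}
```

The negated predicates use the same trick as the patch: passing `!P_EQ` as the macro argument expands to `!P_EQ(...)`, so `cmpneq`, `cmpnlt` and `cmpnle` come out true for unordered (NaN) inputs, as the SSE definitions require.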
Signed-off-by: Paul Brook Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 97 ++++++++++++++++++++---------------- target/i386/ops_sse_header.h | 24 ++++----- target/i386/tcg/translate.c | 20 ++++---- 3 files changed, 75 insertions(+), 66 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 08359b8433..851a05d594 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -992,57 +992,66 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -/* XXX: unordered */ -#define SSE_HELPER_CMP(name, F) \ - void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ +#define SSE_HELPER_CMP_P(name, F, C) \ + void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s) \ { \ - d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ - d->ZMM_L(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ - d->ZMM_L(2) = F(32, d->ZMM_S(2), s->ZMM_S(2)); \ - d->ZMM_L(3) = F(32, d->ZMM_S(3), s->ZMM_S(3)); \ + Reg *v = d; \ + int i; \ + for (i = 0; i < 2 << SHIFT; i++) { \ + d->ZMM_L(i) = F(32, C, v->ZMM_S(i), s->ZMM_S(i)); \ + } \ } \ \ - void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s) \ + void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s) \ { \ - d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ - } \ - \ - void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ - { \ - d->ZMM_Q(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ - d->ZMM_Q(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ - } \ - \ - void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s) \ - { \ - d->ZMM_Q(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ + Reg *v = d; \ + int i; \ + for (i = 0; i < 1 << SHIFT; i++) { \ + d->ZMM_Q(i) = F(64, C, v->ZMM_D(i), s->ZMM_D(i)); \ + } \ } -#define FPU_CMPEQ(size, a, b) \ - (float ## size ## _eq_quiet(a, b, &env->sse_status) ? -1 : 0) -#define FPU_CMPLT(size, a, b) \ - (float ## size ## _lt(a, b, &env->sse_status) ? 
-1 : 0) -#define FPU_CMPLE(size, a, b) \ - (float ## size ## _le(a, b, &env->sse_status) ? -1 : 0) -#define FPU_CMPUNORD(size, a, b) \ - (float ## size ## _unordered_quiet(a, b, &env->sse_status) ? -1 : 0) -#define FPU_CMPNEQ(size, a, b) \ - (float ## size ## _eq_quiet(a, b, &env->sse_status) ? 0 : -1) -#define FPU_CMPNLT(size, a, b) \ - (float ## size ## _lt(a, b, &env->sse_status) ? 0 : -1) -#define FPU_CMPNLE(size, a, b) \ - (float ## size ## _le(a, b, &env->sse_status) ? 0 : -1) -#define FPU_CMPORD(size, a, b) \ - (float ## size ## _unordered_quiet(a, b, &env->sse_status) ? 0 : -1) +#if SHIFT == 1 +#define SSE_HELPER_CMP(name, F, C) \ + SSE_HELPER_CMP_P(name, F, C) \ + void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s) \ + { \ + Reg *v = d; \ + d->ZMM_L(0) = F(32, C, v->ZMM_S(0), s->ZMM_S(0)); \ + } \ + \ + void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s) \ + { \ + Reg *v = d; \ + d->ZMM_Q(0) = F(64, C, v->ZMM_D(0), s->ZMM_D(0)); \ + } -SSE_HELPER_CMP(cmpeq, FPU_CMPEQ) -SSE_HELPER_CMP(cmplt, FPU_CMPLT) -SSE_HELPER_CMP(cmple, FPU_CMPLE) -SSE_HELPER_CMP(cmpunord, FPU_CMPUNORD) -SSE_HELPER_CMP(cmpneq, FPU_CMPNEQ) -SSE_HELPER_CMP(cmpnlt, FPU_CMPNLT) -SSE_HELPER_CMP(cmpnle, FPU_CMPNLE) -SSE_HELPER_CMP(cmpord, FPU_CMPORD) +#define FPU_EQ(x) (x == float_relation_equal) +#define FPU_LT(x) (x == float_relation_less) +#define FPU_LE(x) (x <= float_relation_equal) +#define FPU_UNORD(x) (x == float_relation_unordered) + +#define FPU_CMPQ(size, COND, a, b) \ + (COND(float ## size ## _compare_quiet(a, b, &env->sse_status)) ? -1 : 0) +#define FPU_CMPS(size, COND, a, b) \ + (COND(float ## size ## _compare(a, b, &env->sse_status)) ? 
-1 : 0) + +#else +#define SSE_HELPER_CMP(name, F, C) SSE_HELPER_CMP_P(name, F, C) +#endif + +SSE_HELPER_CMP(cmpeq, FPU_CMPQ, FPU_EQ) +SSE_HELPER_CMP(cmplt, FPU_CMPS, FPU_LT) +SSE_HELPER_CMP(cmple, FPU_CMPS, FPU_LE) +SSE_HELPER_CMP(cmpunord, FPU_CMPQ, FPU_UNORD) +SSE_HELPER_CMP(cmpneq, FPU_CMPQ, !FPU_EQ) +SSE_HELPER_CMP(cmpnlt, FPU_CMPS, !FPU_LT) +SSE_HELPER_CMP(cmpnle, FPU_CMPS, !FPU_LE) +SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD) + +#undef SSE_HELPER_CMP static const int comis_eflags[4] = {CC_C, CC_Z, 0, CC_Z | CC_P | CC_C}; diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index fc697536a0..d99464afb0 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -201,20 +201,20 @@ DEF_HELPER_3(glue(hsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(glue(addsubps, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(glue(addsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) -#define SSE_HELPER_CMP(name, F) \ - DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ +#define SSE_HELPER_CMP(name, F, C) \ + DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ + DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ DEF_HELPER_3(name ## sd, void, env, Reg, Reg) -SSE_HELPER_CMP(cmpeq, FPU_CMPEQ) -SSE_HELPER_CMP(cmplt, FPU_CMPLT) -SSE_HELPER_CMP(cmple, FPU_CMPLE) -SSE_HELPER_CMP(cmpunord, FPU_CMPUNORD) -SSE_HELPER_CMP(cmpneq, FPU_CMPNEQ) -SSE_HELPER_CMP(cmpnlt, FPU_CMPNLT) -SSE_HELPER_CMP(cmpnle, FPU_CMPNLE) -SSE_HELPER_CMP(cmpord, FPU_CMPORD) +SSE_HELPER_CMP(cmpeq, FPU_CMPQ, FPU_EQ) +SSE_HELPER_CMP(cmplt, FPU_CMPS, FPU_LT) +SSE_HELPER_CMP(cmple, FPU_CMPS, FPU_LE) +SSE_HELPER_CMP(cmpunord, FPU_CMPQ, FPU_UNORD) +SSE_HELPER_CMP(cmpneq, FPU_CMPQ, !FPU_EQ) +SSE_HELPER_CMP(cmpnlt, FPU_CMPS, !FPU_LT) +SSE_HELPER_CMP(cmpnle, FPU_CMPS, !FPU_LE) 
+SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD) DEF_HELPER_3(ucomiss, void, env, Reg, Reg) DEF_HELPER_3(comiss, void, env, Reg, Reg) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 1e67607ca3..059e001d82 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3021,20 +3021,20 @@ static const SSEFunc_l_ep sse_op_table3bq[] = { }; #endif -#define SSE_FOP(x) { \ +#define SSE_CMP(x) { \ gen_helper_ ## x ## ps ## _xmm, gen_helper_ ## x ## pd ## _xmm, \ gen_helper_ ## x ## ss, gen_helper_ ## x ## sd} static const SSEFunc_0_epp sse_op_table4[8][4] = { - SSE_FOP(cmpeq), - SSE_FOP(cmplt), - SSE_FOP(cmple), - SSE_FOP(cmpunord), - SSE_FOP(cmpneq), - SSE_FOP(cmpnlt), - SSE_FOP(cmpnle), - SSE_FOP(cmpord), + SSE_CMP(cmpeq), + SSE_CMP(cmplt), + SSE_CMP(cmple), + SSE_CMP(cmpunord), + SSE_CMP(cmpneq), + SSE_CMP(cmpnlt), + SSE_CMP(cmpnle), + SSE_CMP(cmpord), }; -#undef SSE_FOP +#undef SSE_CMP static const SSEFunc_0_epp sse_op_table5[256] = { [0x0c] = gen_helper_pi2fw, From patchwork Thu Aug 25 22:14:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955305
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 13/18] i386: Dot product AVX helper prep Date: Fri, 26 Aug 2022 00:14:06 +0200 Message-Id: <20220825221411.35122-14-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com> References: <20220825221411.35122-1-pbonzini@redhat.com> From: Paul Brook
Make the dpps and dppd helpers AVX-ready. I can't see any obvious reason why dppd shouldn't work on 256-bit ymm registers, but both AMD and Intel agree that it's xmm only. Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-17-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 80 ++++++++++++++++++++++++------------------- 1 file changed, 45 insertions(+), 35 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 851a05d594..0493a26804 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -1942,55 +1942,64 @@ SSE_HELPER_I(helper_blendps, L, 4, FBLENDP) SSE_HELPER_I(helper_blendpd, Q, 2, FBLENDP) SSE_HELPER_I(helper_pblendw, W, 8, FBLENDP) -void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) +void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, + uint32_t mask) { + Reg *v = d; float32 prod1, prod2, temp2, temp3, temp4; + int i; - /* - * We must evaluate (A+B)+(C+D), not ((A+B)+C)+D - * to correctly round the intermediate results - */ - if (mask & (1 << 4)) { - prod1 = float32_mul(d->ZMM_S(0), s->ZMM_S(0), &env->sse_status); - } else { - prod1 = float32_zero; - } - if (mask & (1 << 5)) { - prod2 = float32_mul(d->ZMM_S(1), s->ZMM_S(1), &env->sse_status); - } else { - prod2 = float32_zero; - } - temp2 = float32_add(prod1, prod2, &env->sse_status); - if (mask & (1 << 6)) { - prod1 = float32_mul(d->ZMM_S(2), s->ZMM_S(2), &env->sse_status); - } else { - prod1 = float32_zero; - } - if (mask & (1 << 7)) { - prod2 = float32_mul(d->ZMM_S(3), s->ZMM_S(3), &env->sse_status); - } else { - prod2 = float32_zero; - } - temp3 = float32_add(prod1, prod2, &env->sse_status); - temp4 = float32_add(temp2, temp3, &env->sse_status); + for (i = 0; i < 2 << SHIFT; i += 4) { + /* + * We must evaluate (A+B)+(C+D), not ((A+B)+C)+D + * to correctly round the intermediate results + */ + if (mask & (1 << 4)) { + prod1 = float32_mul(v->ZMM_S(i), s->ZMM_S(i),
&env->sse_status); + } else { + prod1 = float32_zero; + } + if (mask & (1 << 5)) { + prod2 = float32_mul(v->ZMM_S(i+1), s->ZMM_S(i+1), &env->sse_status); + } else { + prod2 = float32_zero; + } + temp2 = float32_add(prod1, prod2, &env->sse_status); + if (mask & (1 << 6)) { + prod1 = float32_mul(v->ZMM_S(i+2), s->ZMM_S(i+2), &env->sse_status); + } else { + prod1 = float32_zero; + } + if (mask & (1 << 7)) { + prod2 = float32_mul(v->ZMM_S(i+3), s->ZMM_S(i+3), &env->sse_status); + } else { + prod2 = float32_zero; + } + temp3 = float32_add(prod1, prod2, &env->sse_status); + temp4 = float32_add(temp2, temp3, &env->sse_status); - d->ZMM_S(0) = (mask & (1 << 0)) ? temp4 : float32_zero; - d->ZMM_S(1) = (mask & (1 << 1)) ? temp4 : float32_zero; - d->ZMM_S(2) = (mask & (1 << 2)) ? temp4 : float32_zero; - d->ZMM_S(3) = (mask & (1 << 3)) ? temp4 : float32_zero; + d->ZMM_S(i) = (mask & (1 << 0)) ? temp4 : float32_zero; + d->ZMM_S(i+1) = (mask & (1 << 1)) ? temp4 : float32_zero; + d->ZMM_S(i+2) = (mask & (1 << 2)) ? temp4 : float32_zero; + d->ZMM_S(i+3) = (mask & (1 << 3)) ? temp4 : float32_zero; + } } -void glue(helper_dppd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) +#if SHIFT == 1 +/* Oddly, there is no ymm version of dppd */ +void glue(helper_dppd, SUFFIX)(CPUX86State *env, + Reg *d, Reg *s, uint32_t mask) { + Reg *v = d; float64 prod1, prod2, temp2; if (mask & (1 << 4)) { - prod1 = float64_mul(d->ZMM_D(0), s->ZMM_D(0), &env->sse_status); + prod1 = float64_mul(v->ZMM_D(0), s->ZMM_D(0), &env->sse_status); } else { prod1 = float64_zero; } if (mask & (1 << 5)) { - prod2 = float64_mul(d->ZMM_D(1), s->ZMM_D(1), &env->sse_status); + prod2 = float64_mul(v->ZMM_D(1), s->ZMM_D(1), &env->sse_status); } else { prod2 = float64_zero; } @@ -1998,6 +2007,7 @@ void glue(helper_dppd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) d->ZMM_D(0) = (mask & (1 << 0)) ? temp2 : float64_zero; d->ZMM_D(1) = (mask & (1 << 1)) ? 
temp2 : float64_zero; } +#endif void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t offset) From patchwork Thu Aug 25 22:14:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955295
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc:
paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 14/18] i386: Destructive FP helpers for AVX Date: Fri, 26 Aug 2022 00:14:07 +0200 Message-Id: <20220825221411.35122-15-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com> References: <20220825221411.35122-1-pbonzini@redhat.com> From: Paul Brook Prepare the horizontal arithmetic vector helpers for AVX. These currently use a dummy Reg-typed variable to store the result, then assign the whole register. This will cause 128-bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments.
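As a rough illustration of the problem and the fix, the following standalone sketch (not QEMU code) models a 256-bit register as eight host `float` lanes, using a hypothetical `Reg256` type and `haddps_xmm` function, with host addition standing in for `float32_add()`. The 128-bit operation gathers its four results in a small temporary array, so it still works when destination and source are the same register, and then writes back only lanes 0..3, leaving the upper half intact. Copying a whole register-sized temporary back, as `MOVE(*d, r)` did, would instead overwrite lanes 4..7 with whatever the temporary held there.

```c
/* Hypothetical 256-bit register model: eight single-precision lanes. */
typedef struct {
    float f[8];
} Reg256;

/*
 * 128-bit HADDPS sketch: result lanes 0..1 are pairwise sums of d's low
 * four lanes, lanes 2..3 are pairwise sums of s's low four lanes.
 * Lanes 4..7 of d must not be touched.
 */
static void haddps_xmm(Reg256 *d, const Reg256 *s)
{
    float r[4]; /* explicit temporaries: d and s may be the same register */
    int i, j;

    for (i = j = 0; j < 4; i++, j += 2) {
        r[i] = d->f[j] + d->f[j + 1];
    }
    for (j = 0; j < 4; i++, j += 2) {
        r[i] = s->f[j] + s->f[j + 1];
    }
    /* Element-wise writeback: only the low 128 bits change. */
    for (i = 0; i < 4; i++) {
        d->f[i] = r[i];
    }
}
```

Calling `haddps_xmm(&d, &d)` is safe because every input lane is read before any result lane is written back.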
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-18-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 68 +++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 0493a26804..7252e03619 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -932,45 +932,45 @@ void helper_insertq_i(CPUX86State *env, ZMMReg *d, int index, int length) d->ZMM_Q(0) = helper_insertq(d->ZMM_Q(0), index, length); } -void glue(helper_haddps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) -{ - ZMMReg r; - - r.ZMM_S(0) = float32_add(d->ZMM_S(0), d->ZMM_S(1), &env->sse_status); - r.ZMM_S(1) = float32_add(d->ZMM_S(2), d->ZMM_S(3), &env->sse_status); - r.ZMM_S(2) = float32_add(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); - r.ZMM_S(3) = float32_add(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); - MOVE(*d, r); +#define SSE_HELPER_HPS(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + float32 r[2 << SHIFT]; \ + int i, j; \ + for (i = j = 0; j < 4; i++, j += 2) { \ + r[i] = F(v->ZMM_S(j), v->ZMM_S(j + 1), &env->sse_status); \ + } \ + for (j = 0; j < 4; i++, j += 2) { \ + r[i] = F(s->ZMM_S(j), s->ZMM_S(j + 1), &env->sse_status); \ + } \ + for (i = 0; i < 2 << SHIFT; i++) { \ + d->ZMM_S(i) = r[i]; \ + } \ } -void glue(helper_haddpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) -{ - ZMMReg r; +SSE_HELPER_HPS(haddps, float32_add) +SSE_HELPER_HPS(hsubps, float32_sub) - r.ZMM_D(0) = float64_add(d->ZMM_D(0), d->ZMM_D(1), &env->sse_status); - r.ZMM_D(1) = float64_add(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); - MOVE(*d, r); +#define SSE_HELPER_HPD(name, F) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ + Reg *v = d; \ + float64 r[2 << SHIFT]; \ + int i, j; \ + for (i = j = 0; j < 2; i++, j += 2) { \ + r[i] = F(v->ZMM_D(j), v->ZMM_D(j + 1), 
&env->sse_status); \ + } \ + for (j = 0; j < 2; i++, j += 2) { \ + r[i] = F(s->ZMM_D(j), s->ZMM_D(j + 1), &env->sse_status); \ + } \ + for (i = 0; i < 1 << SHIFT; i++) { \ + d->ZMM_D(i) = r[i]; \ + } \ } -void glue(helper_hsubps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) -{ - ZMMReg r; - - r.ZMM_S(0) = float32_sub(d->ZMM_S(0), d->ZMM_S(1), &env->sse_status); - r.ZMM_S(1) = float32_sub(d->ZMM_S(2), d->ZMM_S(3), &env->sse_status); - r.ZMM_S(2) = float32_sub(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); - r.ZMM_S(3) = float32_sub(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); - MOVE(*d, r); -} - -void glue(helper_hsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) -{ - ZMMReg r; - - r.ZMM_D(0) = float64_sub(d->ZMM_D(0), d->ZMM_D(1), &env->sse_status); - r.ZMM_D(1) = float64_sub(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); - MOVE(*d, r); -} +SSE_HELPER_HPD(haddpd, float64_add) +SSE_HELPER_HPD(hsubpd, float64_sub) void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { From patchwork Thu Aug 25 22:14:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12955324
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: paul@nowt.org, richard.henderson@linaro.org Subject: [PATCH 15/18] i386: Misc AVX helper prep Date: Fri, 26 Aug 2022 00:14:08 +0200 Message-Id: <20220825221411.35122-16-pbonzini@redhat.com> In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com> References: <20220825221411.35122-1-pbonzini@redhat.com> From: Paul Brook Fixup various vector
helpers that either trivially extend to 256 bits, or don't have 256-bit variants. No functional changes to existing helpers. Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-19-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 143 +++++++++++++++++++++++++++--------------- 1 file changed, 94 insertions(+), 49 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 7252e03619..6d5f9b9323 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -440,6 +440,7 @@ void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } +#if SHIFT < 2 void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, target_ulong a0) { @@ -451,6 +452,7 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } } } +#endif void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val) { @@ -640,21 +642,24 @@ void helper_sqrtsd(CPUX86State *env, Reg *d, Reg *s) /* float to float conversions */ void glue(helper_cvtps2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - float32 s0, s1; - - s0 = s->ZMM_S(0); - s1 = s->ZMM_S(1); - d->ZMM_D(0) = float32_to_float64(s0, &env->sse_status); - d->ZMM_D(1) = float32_to_float64(s1, &env->sse_status); + int i; + for (i = 1 << SHIFT; --i >= 0; ) { + d->ZMM_D(i) = float32_to_float64(s->ZMM_S(i), &env->sse_status); + } } void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), &env->sse_status); - d->ZMM_S(1) = float64_to_float32(s->ZMM_D(1), &env->sse_status); - d->Q(1) = 0; + int i; + for (i = 0; i < 1 << SHIFT; i++) { + d->ZMM_S(i) = float64_to_float32(s->ZMM_D(i), &env->sse_status); + } + for (i >>= 1; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + } } +#if SHIFT == 1 void helper_cvtss2sd(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_D(0) = float32_to_float64(s->ZMM_S(0), &env->sse_status); @@ -664,26 +669,27 @@ void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) =
float64_to_float32(s->ZMM_D(0), &env->sse_status); } +#endif /* integer to float */ void glue(helper_cvtdq2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - d->ZMM_S(0) = int32_to_float32(s->ZMM_L(0), &env->sse_status); - d->ZMM_S(1) = int32_to_float32(s->ZMM_L(1), &env->sse_status); - d->ZMM_S(2) = int32_to_float32(s->ZMM_L(2), &env->sse_status); - d->ZMM_S(3) = int32_to_float32(s->ZMM_L(3), &env->sse_status); + int i; + for (i = 0; i < 2 << SHIFT; i++) { + d->ZMM_S(i) = int32_to_float32(s->ZMM_L(i), &env->sse_status); + } } void glue(helper_cvtdq2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { - int32_t l0, l1; - - l0 = (int32_t)s->ZMM_L(0); - l1 = (int32_t)s->ZMM_L(1); - d->ZMM_D(0) = int32_to_float64(l0, &env->sse_status); - d->ZMM_D(1) = int32_to_float64(l1, &env->sse_status); + int i; + for (i = 1 << SHIFT; --i >= 0; ) { + int32_t l = s->ZMM_L(i); + d->ZMM_D(i) = int32_to_float64(l, &env->sse_status); + } } +#if SHIFT == 1 void helper_cvtpi2ps(CPUX86State *env, ZMMReg *d, MMXReg *s) { d->ZMM_S(0) = int32_to_float32(s->MMX_L(0), &env->sse_status); @@ -718,8 +724,11 @@ void helper_cvtsq2sd(CPUX86State *env, ZMMReg *d, uint64_t val) } #endif +#endif + /* float to integer */ +#if SHIFT == 1 /* * x86 mandates that we return the indefinite integer value for the result * of any float-to-integer conversion that raises the 'invalid' exception. 
@@ -750,22 +759,28 @@ WRAP_FLOATCONV(int64_t, float32_to_int64, float32, INT64_MIN)
 WRAP_FLOATCONV(int64_t, float32_to_int64_round_to_zero, float32, INT64_MIN)
 WRAP_FLOATCONV(int64_t, float64_to_int64, float64, INT64_MIN)
 WRAP_FLOATCONV(int64_t, float64_to_int64_round_to_zero, float64, INT64_MIN)
+#endif
 
 void glue(helper_cvtps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    d->ZMM_L(0) = x86_float32_to_int32(s->ZMM_S(0), &env->sse_status);
-    d->ZMM_L(1) = x86_float32_to_int32(s->ZMM_S(1), &env->sse_status);
-    d->ZMM_L(2) = x86_float32_to_int32(s->ZMM_S(2), &env->sse_status);
-    d->ZMM_L(3) = x86_float32_to_int32(s->ZMM_S(3), &env->sse_status);
+    int i;
+    for (i = 0; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = x86_float32_to_int32(s->ZMM_S(i), &env->sse_status);
+    }
 }
 
 void glue(helper_cvtpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    d->ZMM_L(0) = x86_float64_to_int32(s->ZMM_D(0), &env->sse_status);
-    d->ZMM_L(1) = x86_float64_to_int32(s->ZMM_D(1), &env->sse_status);
-    d->ZMM_Q(1) = 0;
+    int i;
+    for (i = 0; i < 1 << SHIFT; i++) {
+        d->ZMM_L(i) = x86_float64_to_int32(s->ZMM_D(i), &env->sse_status);
+    }
+    for (i >>= 1; i < 1 << SHIFT; i++) {
+        d->Q(i) = 0;
+    }
 }
 
+#if SHIFT == 1
 void helper_cvtps2pi(CPUX86State *env, MMXReg *d, ZMMReg *s)
 {
     d->MMX_L(0) = x86_float32_to_int32(s->ZMM_S(0), &env->sse_status);
@@ -799,23 +814,31 @@ int64_t helper_cvtsd2sq(CPUX86State *env, ZMMReg *s)
     return x86_float64_to_int64(s->ZMM_D(0), &env->sse_status);
 }
 #endif
+#endif
 
 /* float to integer truncated */
 void glue(helper_cvttps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    d->ZMM_L(0) = x86_float32_to_int32_round_to_zero(s->ZMM_S(0), &env->sse_status);
-    d->ZMM_L(1) = x86_float32_to_int32_round_to_zero(s->ZMM_S(1), &env->sse_status);
-    d->ZMM_L(2) = x86_float32_to_int32_round_to_zero(s->ZMM_S(2), &env->sse_status);
-    d->ZMM_L(3) = x86_float32_to_int32_round_to_zero(s->ZMM_S(3), &env->sse_status);
+    int i;
+    for (i = 0; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = x86_float32_to_int32_round_to_zero(s->ZMM_S(i),
+                                                          &env->sse_status);
+    }
 }
 
 void glue(helper_cvttpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
-    d->ZMM_L(0) = x86_float64_to_int32_round_to_zero(s->ZMM_D(0), &env->sse_status);
-    d->ZMM_L(1) = x86_float64_to_int32_round_to_zero(s->ZMM_D(1), &env->sse_status);
-    d->ZMM_Q(1) = 0;
+    int i;
+    for (i = 0; i < 1 << SHIFT; i++) {
+        d->ZMM_L(i) = x86_float64_to_int32_round_to_zero(s->ZMM_D(i),
+                                                          &env->sse_status);
+    }
+    for (i >>= 1; i < 1 << SHIFT; i++) {
+        d->Q(i) = 0;
+    }
 }
 
+#if SHIFT == 1
 void helper_cvttps2pi(CPUX86State *env, MMXReg *d, ZMMReg *s)
 {
     d->MMX_L(0) = x86_float32_to_int32_round_to_zero(s->ZMM_S(0), &env->sse_status);
@@ -849,6 +872,7 @@ int64_t helper_cvttsd2sq(CPUX86State *env, ZMMReg *s)
     return x86_float64_to_int64_round_to_zero(s->ZMM_D(0), &env->sse_status);
 }
 #endif
+#endif
 
 void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
@@ -862,6 +886,7 @@ void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
     set_float_exception_flags(old_flags, &env->sse_status);
 }
 
+#if SHIFT == 1
 void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
@@ -870,6 +895,7 @@ void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
                               &env->sse_status);
     set_float_exception_flags(old_flags, &env->sse_status);
 }
+#endif
 
 void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
@@ -881,13 +907,16 @@ void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
     set_float_exception_flags(old_flags, &env->sse_status);
 }
 
+#if SHIFT == 1
 void helper_rcpss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status);
     set_float_exception_flags(old_flags, &env->sse_status);
 }
+#endif
 
+#if SHIFT == 1
 static inline uint64_t helper_extrq(uint64_t src, int shift, int len)
 {
     uint64_t mask;
@@ -931,6 +960,7 @@ void helper_insertq_i(CPUX86State *env, ZMMReg *d, int index, int length)
 {
     d->ZMM_Q(0) = helper_insertq(d->ZMM_Q(0), index, length);
 }
+#endif
 
 #define SSE_HELPER_HPS(name, F)  \
 void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)  \
@@ -1053,6 +1083,7 @@ SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD)
 
 #undef SSE_HELPER_CMP
 
+#if SHIFT == 1
 static const int comis_eflags[4] = {CC_C, CC_Z, 0, CC_Z | CC_P | CC_C};
 
 void helper_ucomiss(CPUX86State *env, Reg *d, Reg *s)
@@ -1098,25 +1129,30 @@ void helper_comisd(CPUX86State *env, Reg *d, Reg *s)
     ret = float64_compare(d0, d1, &env->sse_status);
     CC_SRC = comis_eflags[ret + 1];
 }
+#endif
 
 uint32_t glue(helper_movmskps, SUFFIX)(CPUX86State *env, Reg *s)
 {
-    int b0, b1, b2, b3;
+    uint32_t mask;
+    int i;
 
-    b0 = s->ZMM_L(0) >> 31;
-    b1 = s->ZMM_L(1) >> 31;
-    b2 = s->ZMM_L(2) >> 31;
-    b3 = s->ZMM_L(3) >> 31;
-    return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3);
+    mask = 0;
+    for (i = 0; i < 2 << SHIFT; i++) {
+        mask |= (s->ZMM_L(i) >> (31 - i)) & (1 << i);
+    }
+    return mask;
 }
 
 uint32_t glue(helper_movmskpd, SUFFIX)(CPUX86State *env, Reg *s)
 {
-    int b0, b1;
+    uint32_t mask;
+    int i;
 
-    b0 = s->ZMM_L(1) >> 31;
-    b1 = s->ZMM_L(3) >> 31;
-    return b0 | (b1 << 1);
+    mask = 0;
+    for (i = 0; i < 1 << SHIFT; i++) {
+        mask |= (s->ZMM_Q(i) >> (63 - i)) & (1 << i);
+    }
+    return mask;
 }
 
 #endif
@@ -1765,6 +1801,7 @@ SSE_HELPER_L(helper_pmaxud, MAX)
 #define FMULLD(d, s) ((int32_t)d * (int32_t)s)
 SSE_HELPER_L(helper_pmulld, FMULLD)
 
+#if SHIFT == 1
 void glue(helper_phminposuw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int idx = 0;
@@ -1796,12 +1833,14 @@ void glue(helper_phminposuw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
     d->L(1) = 0;
     d->Q(1) = 0;
 }
+#endif
 
 void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
+    int i;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
@@ -1821,10 +1860,9 @@ void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
         }
     }
 
-    d->ZMM_S(0) = float32_round_to_int(s->ZMM_S(0), &env->sse_status);
-    d->ZMM_S(1) = float32_round_to_int(s->ZMM_S(1), &env->sse_status);
-    d->ZMM_S(2) = float32_round_to_int(s->ZMM_S(2), &env->sse_status);
-    d->ZMM_S(3) = float32_round_to_int(s->ZMM_S(3), &env->sse_status);
+    for (i = 0; i < 2 << SHIFT; i++) {
+        d->ZMM_S(i) = float32_round_to_int(s->ZMM_S(i), &env->sse_status);
+    }
 
     if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
@@ -1839,6 +1877,7 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
+    int i;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
@@ -1858,8 +1897,9 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
         }
     }
 
-    d->ZMM_D(0) = float64_round_to_int(s->ZMM_D(0), &env->sse_status);
-    d->ZMM_D(1) = float64_round_to_int(s->ZMM_D(1), &env->sse_status);
+    for (i = 0; i < 1 << SHIFT; i++) {
+        d->ZMM_D(i) = float64_round_to_int(s->ZMM_D(i), &env->sse_status);
+    }
 
     if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
@@ -1869,6 +1909,7 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
+#if SHIFT == 1
 void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
@@ -1936,6 +1977,7 @@ void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     }
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
+#endif
 
 #define FBLENDP(d, s, m) (m ? s : d)
 SSE_HELPER_I(helper_blendps, L, 4, FBLENDP)
@@ -2034,6 +2076,7 @@ void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 #define FCMPGTQ(d, s) ((int64_t)d > (int64_t)s ? -1 : 0)
 SSE_HELPER_Q(helper_pcmpgtq, FCMPGTQ)
 
+#if SHIFT == 1
 static inline int pcmp_elen(CPUX86State *env, int reg, uint32_t ctrl)
 {
     target_long val, limit;
@@ -2254,6 +2297,8 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong msg, uint32_t len)
     return crc;
 }
 
+#endif
+
 void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                     uint32_t ctrl)
 {

From patchwork Thu Aug 25 22:14:09 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955309
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 16/18] i386: Rewrite blendv helpers
Date: Fri, 26 Aug 2022 00:14:09 +0200
Message-Id: <20220825221411.35122-17-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Rewrite the blendv helpers so that they can easily be extended to support
the AVX encodings, which make all 4 arguments explicit.
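As a rough illustration of the loop shape the blend helpers move to, here is a standalone sketch (not QEMU code). `Vec256`, `blend_imm_w` and `blend_var_b` are hypothetical names, and plain arrays stand in for QEMU's `Reg` accessors. The immediate form masks the element index with 7, so the same eight immediate bits are reused for each 128-bit lane of 16-bit words, mirroring the `j = i & 7` in the new `SSE_HELPER_I`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Reg: one 256-bit register. */
typedef union {
    uint8_t  b[32];
    uint16_t w[16];
} Vec256;

/*
 * Immediate blend (pblendw-like): word i comes from s when bit (i & 7)
 * of imm is set, otherwise from v.  Masking with 7 reuses the same
 * 8 immediate bits for every 128-bit lane, so one loop serves both the
 * 8-word (128-bit) and the 16-word (256-bit) forms.
 */
static void blend_imm_w(Vec256 *d, const Vec256 *v, const Vec256 *s,
                        uint32_t imm, int num)
{
    assert(num == 8 || num == 16);
    for (int i = 0; i < num; i++) {
        int j = i & 7;
        d->w[i] = ((imm >> j) & 1) ? s->w[i] : v->w[i];
    }
}

/*
 * Variable blend (pblendvb-like): the top bit of each mask byte selects
 * s, otherwise v is kept.
 */
static void blend_var_b(Vec256 *d, const Vec256 *v, const Vec256 *s,
                        const Vec256 *m, int num)
{
    assert(num == 16 || num == 32);
    for (int i = 0; i < num; i++) {
        d->b[i] = (m->b[i] & 0x80) ? s->b[i] : v->b[i];
    }
}
```

Each element of `d` is written only after the matching elements of `v`, `s` and the mask have been read, so the loops remain correct when the destination aliases the first source. That is what lets the patch simply set `Reg *v = d` for the two-operand SSE encodings.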

No functional changes to the existing helpers.

Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-20-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/ops_sse.h | 101 ++++++++++++++++--------------------------
 1 file changed, 39 insertions(+), 62 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 6d5f9b9323..1ff3e92331 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1644,76 +1644,53 @@ void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     }
 }
 
-#define XMM0 (env->xmm_regs[0])
+#if SHIFT >= 1
 
-#if SHIFT == 1
 #define SSE_HELPER_V(name, elem, num, F)                                \
-    void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)          \
+    void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)           \
     {                                                                   \
-        d->elem(0) = F(d->elem(0), s->elem(0), XMM0.elem(0));           \
-        d->elem(1) = F(d->elem(1), s->elem(1), XMM0.elem(1));           \
-        if (num > 2) {                                                  \
-            d->elem(2) = F(d->elem(2), s->elem(2), XMM0.elem(2));       \
-            d->elem(3) = F(d->elem(3), s->elem(3), XMM0.elem(3));       \
-            if (num > 4) {                                              \
-                d->elem(4) = F(d->elem(4), s->elem(4), XMM0.elem(4));   \
-                d->elem(5) = F(d->elem(5), s->elem(5), XMM0.elem(5));   \
-                d->elem(6) = F(d->elem(6), s->elem(6), XMM0.elem(6));   \
-                d->elem(7) = F(d->elem(7), s->elem(7), XMM0.elem(7));   \
-                if (num > 8) {                                          \
-                    d->elem(8) = F(d->elem(8), s->elem(8), XMM0.elem(8)); \
-                    d->elem(9) = F(d->elem(9), s->elem(9), XMM0.elem(9)); \
-                    d->elem(10) = F(d->elem(10), s->elem(10), XMM0.elem(10)); \
-                    d->elem(11) = F(d->elem(11), s->elem(11), XMM0.elem(11)); \
-                    d->elem(12) = F(d->elem(12), s->elem(12), XMM0.elem(12)); \
-                    d->elem(13) = F(d->elem(13), s->elem(13), XMM0.elem(13)); \
-                    d->elem(14) = F(d->elem(14), s->elem(14), XMM0.elem(14)); \
-                    d->elem(15) = F(d->elem(15), s->elem(15), XMM0.elem(15)); \
-                }                                                       \
-            }                                                           \
+        Reg *v = d;                                                     \
+        Reg *m = &env->xmm_regs[0];                                     \
+        int i;                                                          \
+        for (i = 0; i < num; i++) {                                     \
+            d->elem(i) = F(v->elem(i), s->elem(i), m->elem(i));         \
         }                                                               \
     }
 
+#define BLEND_I128(elem, num, F, b) do {                                    \
+    d->elem(b + 0) = F(v->elem(b + 0), s->elem(b + 0), ((imm >> 0) & 1));   \
+    d->elem(b + 1) = F(v->elem(b + 1), s->elem(b + 1), ((imm >> 1) & 1));   \
+    if (num > 2) {                                                          \
+        d->elem(b + 2) = F(v->elem(b + 2), s->elem(b + 2), ((imm >> 2) & 1)); \
+        d->elem(b + 3) = F(v->elem(b + 3), s->elem(b + 3), ((imm >> 3) & 1)); \
+    }                                                                       \
+    if (num > 4) {                                                          \
+        d->elem(b + 4) = F(v->elem(b + 4), s->elem(b + 4), ((imm >> 4) & 1)); \
+        d->elem(b + 5) = F(v->elem(b + 5), s->elem(b + 5), ((imm >> 5) & 1)); \
+        d->elem(b + 6) = F(v->elem(b + 6), s->elem(b + 6), ((imm >> 6) & 1)); \
+        d->elem(b + 7) = F(v->elem(b + 7), s->elem(b + 7), ((imm >> 7) & 1)); \
+    }                                                                       \
+    } while (0)
+
 #define SSE_HELPER_I(name, elem, num, F)                                \
-    void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t imm) \
+    void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,           \
+                            uint32_t imm)                               \
     {                                                                   \
-        d->elem(0) = F(d->elem(0), s->elem(0), ((imm >> 0) & 1));       \
-        d->elem(1) = F(d->elem(1), s->elem(1), ((imm >> 1) & 1));       \
-        if (num > 2) {                                                  \
-            d->elem(2) = F(d->elem(2), s->elem(2), ((imm >> 2) & 1));   \
-            d->elem(3) = F(d->elem(3), s->elem(3), ((imm >> 3) & 1));   \
-            if (num > 4) {                                              \
-                d->elem(4) = F(d->elem(4), s->elem(4), ((imm >> 4) & 1)); \
-                d->elem(5) = F(d->elem(5), s->elem(5), ((imm >> 5) & 1)); \
-                d->elem(6) = F(d->elem(6), s->elem(6), ((imm >> 6) & 1)); \
-                d->elem(7) = F(d->elem(7), s->elem(7), ((imm >> 7) & 1)); \
-                if (num > 8) {                                          \
-                    d->elem(8) = F(d->elem(8), s->elem(8), ((imm >> 8) & 1)); \
-                    d->elem(9) = F(d->elem(9), s->elem(9), ((imm >> 9) & 1)); \
-                    d->elem(10) = F(d->elem(10), s->elem(10),           \
-                                    ((imm >> 10) & 1));                 \
-                    d->elem(11) = F(d->elem(11), s->elem(11),           \
-                                    ((imm >> 11) & 1));                 \
-                    d->elem(12) = F(d->elem(12), s->elem(12),           \
-                                    ((imm >> 12) & 1));                 \
-                    d->elem(13) = F(d->elem(13), s->elem(13),           \
-                                    ((imm >> 13) & 1));                 \
-                    d->elem(14) = F(d->elem(14), s->elem(14),           \
-                                    ((imm >> 14) & 1));                 \
-                    d->elem(15) = F(d->elem(15), s->elem(15),           \
-                                    ((imm >> 15) & 1));                 \
-                }                                                       \
-            }                                                           \
+        Reg *v = d;                                                     \
+        int i;                                                          \
+        for (i = 0; i < num; i++) {                                     \
+            int j = i & 7;                                              \
+            d->elem(i) = F(v->elem(i), s->elem(i), (imm >> j) & 1);     \
         }                                                               \
     }
 
 /* SSE4.1 op helpers */
-#define FBLENDVB(d, s, m) ((m & 0x80) ? s : d)
-#define FBLENDVPS(d, s, m) ((m & 0x80000000) ? s : d)
-#define FBLENDVPD(d, s, m) ((m & 0x8000000000000000LL) ? s : d)
-SSE_HELPER_V(helper_pblendvb, B, 16, FBLENDVB)
-SSE_HELPER_V(helper_blendvps, L, 4, FBLENDVPS)
-SSE_HELPER_V(helper_blendvpd, Q, 2, FBLENDVPD)
+#define FBLENDVB(v, s, m) ((m & 0x80) ? s : v)
+#define FBLENDVPS(v, s, m) ((m & 0x80000000) ? s : v)
+#define FBLENDVPD(v, s, m) ((m & 0x8000000000000000LL) ? s : v)
+SSE_HELPER_V(helper_pblendvb, B, 8 << SHIFT, FBLENDVB)
+SSE_HELPER_V(helper_blendvps, L, 2 << SHIFT, FBLENDVPS)
+SSE_HELPER_V(helper_blendvpd, Q, 1 << SHIFT, FBLENDVPD)
 
 void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
@@ -1979,10 +1956,10 @@ void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 }
 #endif
 
-#define FBLENDP(d, s, m) (m ? s : d)
-SSE_HELPER_I(helper_blendps, L, 4, FBLENDP)
-SSE_HELPER_I(helper_blendpd, Q, 2, FBLENDP)
-SSE_HELPER_I(helper_pblendw, W, 8, FBLENDP)
+#define FBLENDP(v, s, m) (m ? s : v)
+SSE_HELPER_I(helper_blendps, L, 2 << SHIFT, FBLENDP)
+SSE_HELPER_I(helper_blendpd, Q, 1 << SHIFT, FBLENDP)
+SSE_HELPER_I(helper_pblendw, W, 4 << SHIFT, FBLENDP)
 
 void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                uint32_t mask)

From patchwork Thu Aug 25 22:14:10 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955298
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 17/18] i386: AVX pclmulqdq prep
Date: Fri, 26 Aug 2022 00:14:10 +0200
Message-Id: <20220825221411.35122-18-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Make the pclmulqdq helper AVX-ready.

Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-21-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/ops_sse.h | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 1ff3e92331..6b5d076685 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2276,14 +2276,14 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong msg, uint32_t len)
 
 #endif
 
-void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
-                                    uint32_t ctrl)
+#if SHIFT == 1
+static void clmulq(uint64_t *dest_l, uint64_t *dest_h,
+                          uint64_t a, uint64_t b)
 {
-    uint64_t ah, al, b, resh, resl;
+    uint64_t al, ah, resh, resl;
 
     ah = 0;
-    al = d->Q((ctrl & 1) != 0);
-    b = s->Q((ctrl & 16) != 0);
+    al = a;
     resh = resl = 0;
 
     while (b) {
@@ -2296,8 +2296,23 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
         b >>= 1;
     }
 
-    d->Q(0) = resl;
-    d->Q(1) = resh;
+    *dest_l = resl;
+    *dest_h = resh;
+}
+#endif
+
+void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
+                                    uint32_t ctrl)
+{
+    Reg *v = d;
+    uint64_t a, b;
+    int i;
+
+    for (i = 0; i < 1 << SHIFT; i += 2) {
+        a = v->Q(((ctrl & 1) != 0) + i);
+        b = s->Q(((ctrl & 16) != 0) + i);
+        clmulq(&d->Q(i), &d->Q(i + 1), a, b);
+    }
 }
 
 void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)

From patchwork Thu Aug 25 22:14:11 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12955313
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 18/18] i386: AVX+AES helpers prep
Date: Fri, 26 Aug 2022 00:14:11 +0200
Message-Id: <20220825221411.35122-19-pbonzini@redhat.com>
In-Reply-To: <20220825221411.35122-1-pbonzini@redhat.com>
References: <20220825221411.35122-1-pbonzini@redhat.com>

From: Paul Brook

Make the AES vector helpers AVX-ready.

No functional changes to existing helpers.

Signed-off-by: Paul Brook
Message-Id: <20220424220204.2493824-22-paul@nowt.org>
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/ops_sse.h | 49 +++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 6b5d076685..1e8d8e5c15 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2318,64 +2318,66 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int i;
-    Reg st = *d;
+    Reg st = *d; // v
     Reg rk = *s;
 
-    for (i = 0 ; i < 4 ; i++) {
-        d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4*i+0])] ^
-                                    AES_Td1[st.B(AES_ishifts[4*i+1])] ^
-                                    AES_Td2[st.B(AES_ishifts[4*i+2])] ^
-                                    AES_Td3[st.B(AES_ishifts[4*i+3])]);
+    for (i = 0 ; i < 2 << SHIFT ; i++) {
+        int j = i & 3;
+        d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4 * j + 0])] ^
+                                    AES_Td1[st.B(AES_ishifts[4 * j + 1])] ^
+                                    AES_Td2[st.B(AES_ishifts[4 * j + 2])] ^
+                                    AES_Td3[st.B(AES_ishifts[4 * j + 3])]);
     }
 }
 
 void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int i;
-    Reg st = *d;
+    Reg st = *d; // v
     Reg rk = *s;
 
-    for (i = 0; i < 16; i++) {
-        d->B(i) = rk.B(i) ^ (AES_isbox[st.B(AES_ishifts[i])]);
+    for (i = 0; i < 8 << SHIFT; i++) {
+        d->B(i) = rk.B(i) ^ (AES_isbox[st.B(AES_ishifts[i & 15] + (i & ~15))]);
     }
 }
 
 void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int i;
-    Reg st = *d;
+    Reg st = *d; // v
     Reg rk = *s;
 
-    for (i = 0 ; i < 4 ; i++) {
-        d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4*i+0])] ^
-                                    AES_Te1[st.B(AES_shifts[4*i+1])] ^
-                                    AES_Te2[st.B(AES_shifts[4*i+2])] ^
-                                    AES_Te3[st.B(AES_shifts[4*i+3])]);
+    for (i = 0 ; i < 2 << SHIFT ; i++) {
+        int j = i & 3;
+        d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4 * j + 0])] ^
+                                    AES_Te1[st.B(AES_shifts[4 * j + 1])] ^
+                                    AES_Te2[st.B(AES_shifts[4 * j + 2])] ^
+                                    AES_Te3[st.B(AES_shifts[4 * j + 3])]);
     }
 }
 
 void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int i;
-    Reg st = *d;
+    Reg st = *d; // v
     Reg rk = *s;
 
-    for (i = 0; i < 16; i++) {
-        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i])]);
+    for (i = 0; i < 8 << SHIFT; i++) {
+        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
     }
-
 }
 
+#if SHIFT == 1
 void glue(helper_aesimc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     int i;
     Reg tmp = *s;
 
     for (i = 0 ; i < 4 ; i++) {
-        d->L(i) = bswap32(AES_imc[tmp.B(4*i+0)][0] ^
-                          AES_imc[tmp.B(4*i+1)][1] ^
-                          AES_imc[tmp.B(4*i+2)][2] ^
-                          AES_imc[tmp.B(4*i+3)][3]);
+        d->L(i) = bswap32(AES_imc[tmp.B(4 * i + 0)][0] ^
+                          AES_imc[tmp.B(4 * i + 1)][1] ^
+                          AES_imc[tmp.B(4 * i + 2)][2] ^
+                          AES_imc[tmp.B(4 * i + 3)][3]);
     }
 }
 
@@ -2393,6 +2395,7 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     d->L(3) = (d->L(2) << 24 | d->L(2) >> 8) ^ ctrl;
 }
 #endif
+#endif
 
 #undef SSE_HELPER_S
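The byte-wise `aesenclast`/`aesdeclast` loops in this patch index the state as `AES_shifts[i & 15] + (i & ~15)`: `i & 15` is the position inside a 128-bit lane and `i & ~15` is that lane's base offset. This applies the 16-byte ShiftRows permutation separately inside each lane. Below is a standalone sketch of just that indexing. `permute_lanes`, `shift_rows_idx` and `inv_shift_rows_idx` are hypothetical names. The tables hold the standard AES ShiftRows permutation and its inverse; they are assumed, not verified here, to match the values of QEMU's `AES_shifts`/`AES_ishifts`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standard AES ShiftRows permutation for a column-major 16-byte state:
 * output byte i is taken from input byte shift_rows_idx[i].
 */
static const uint8_t shift_rows_idx[16] = {
    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
};

/* Inverse permutation (InvShiftRows), as used by decryption rounds. */
static const uint8_t inv_shift_rows_idx[16] = {
    0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
};

/*
 * Apply a 16-byte permutation independently to each 128-bit lane of an
 * nbytes-long vector (16 for an xmm-sized register, 32 for ymm).
 * (i & 15) selects the position within the lane and (i & ~15) adds the
 * lane's base offset, so no byte ever moves across a lane boundary.
 */
static void permute_lanes(uint8_t *dst, const uint8_t *src,
                          const uint8_t idx[16], int nbytes)
{
    assert(nbytes % 16 == 0);
    for (int i = 0; i < nbytes; i++) {
        dst[i] = src[idx[i & 15] + (i & ~15)];
    }
}
```

Applying the inverse table to the forward result restores the input in both lanes. This makes a quick check that a per-lane index expression never crosses a 128-bit boundary.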