From patchwork Sun Sep 11 23:03:41 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973108
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: Richard Henderson
Subject: [PATCH 01/37] target/i386: Define XMMReg and access macros, align ZMM registers
Date: Mon, 12 Sep 2022 01:03:41 +0200
Message-Id: <20220911230418.340941-2-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

From: Richard Henderson

This will be used for emission and endian adjustments of gvec operations.
Signed-off-by: Richard Henderson
Message-Id: <20220822223722.1697758-2-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini
---
 target/i386/cpu.h | 56 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b9..8311b69c88 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1233,18 +1233,34 @@ typedef struct SegmentCache {
     uint32_t flags;
 } SegmentCache;
 
-#define MMREG_UNION(n, bits) \
-    union n { \
-        uint8_t _b_##n[(bits)/8]; \
-        uint16_t _w_##n[(bits)/16]; \
-        uint32_t _l_##n[(bits)/32]; \
-        uint64_t _q_##n[(bits)/64]; \
-        float32 _s_##n[(bits)/32]; \
-        float64 _d_##n[(bits)/64]; \
-    }
+typedef union MMXReg {
+    uint8_t _b_MMXReg[64 / 8];
+    uint16_t _w_MMXReg[64 / 16];
+    uint32_t _l_MMXReg[64 / 32];
+    uint64_t _q_MMXReg[64 / 64];
+    float32 _s_MMXReg[64 / 32];
+    float64 _d_MMXReg[64 / 64];
+} MMXReg;
 
-typedef MMREG_UNION(ZMMReg, 512) ZMMReg;
-typedef MMREG_UNION(MMXReg, 64) MMXReg;
+typedef union XMMReg {
+    uint64_t _q_XMMReg[128 / 64];
+} XMMReg;
+
+typedef union YMMReg {
+    uint64_t _q_YMMReg[256 / 64];
+    XMMReg _x_YMMReg[256 / 128];
+} YMMReg;
+
+typedef union ZMMReg {
+    uint8_t _b_ZMMReg[512 / 8];
+    uint16_t _w_ZMMReg[512 / 16];
+    uint32_t _l_ZMMReg[512 / 32];
+    uint64_t _q_ZMMReg[512 / 64];
+    float32 _s_ZMMReg[512 / 32];
+    float64 _d_ZMMReg[512 / 64];
+    XMMReg _x_ZMMReg[512 / 128];
+    YMMReg _y_ZMMReg[512 / 256];
+} ZMMReg;
 
 typedef struct BNDReg {
     uint64_t lb;
@@ -1267,6 +1283,13 @@ typedef struct BNDCSReg {
 #define ZMM_S(n) _s_ZMMReg[15 - (n)]
 #define ZMM_Q(n) _q_ZMMReg[7 - (n)]
 #define ZMM_D(n) _d_ZMMReg[7 - (n)]
+#define ZMM_X(n) _x_ZMMReg[3 - (n)]
+#define ZMM_Y(n) _y_ZMMReg[1 - (n)]
+
+#define XMM_Q(n) _q_XMMReg[1 - (n)]
+
+#define YMM_Q(n) _q_YMMReg[3 - (n)]
+#define YMM_X(n) _x_YMMReg[1 - (n)]
 
 #define MMX_B(n) _b_MMXReg[7 - (n)]
 #define MMX_W(n) _w_MMXReg[3 - (n)]
@@ -1279,6 +1302,13 @@ typedef struct BNDCSReg {
 #define ZMM_S(n) _s_ZMMReg[n]
 #define ZMM_Q(n) _q_ZMMReg[n]
 #define ZMM_D(n) _d_ZMMReg[n]
+#define ZMM_X(n) _x_ZMMReg[n]
+#define ZMM_Y(n) _y_ZMMReg[n]
+
+#define XMM_Q(n) _q_XMMReg[n]
+
+#define YMM_Q(n) _q_YMMReg[n]
+#define YMM_X(n) _x_YMMReg[n]
 
 #define MMX_B(n) _b_MMXReg[n]
 #define MMX_W(n) _w_MMXReg[n]
@@ -1556,8 +1586,8 @@ typedef struct CPUArchState {
     float_status mmx_status; /* for 3DNow! float ops */
     float_status sse_status;
     uint32_t mxcsr;
-    ZMMReg xmm_regs[CPU_NB_REGS == 8 ? 8 : 32];
-    ZMMReg xmm_t0;
+    ZMMReg xmm_regs[CPU_NB_REGS == 8 ? 8 : 32] QEMU_ALIGNED(16);
+    ZMMReg xmm_t0 QEMU_ALIGNED(16);
     MMXReg mmx_t0;
 
     uint64_t opmask_regs[NB_OPMASK_REGS];

From patchwork Sun Sep 11 23:03:42 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973114
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 02/37] target/i386: make ldo/sto operations consistent with ldq
Date: Mon, 12 Sep 2022 01:03:42 +0200
Message-Id: <20220911230418.340941-3-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

ldq takes a pointer to the first byte into which the 64-bit word is
loaded; ldo takes a pointer to the first byte of the ZMMReg. Make them
consistent, which will be useful in the new SSE decoder's
load/writeback routines.
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/translate.c | 44 +++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 001af76663..9a85010dcd 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2761,28 +2761,29 @@ static inline void gen_ldo_env_A0(DisasContext *s, int offset)
 {
     int mem_index = s->mem_index;
     tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, mem_index, MO_LEUQ);
-    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(0)));
     tcg_gen_addi_tl(s->tmp0, s->A0, 8);
     tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
-    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(1)));
 }
 
 static inline void gen_sto_env_A0(DisasContext *s, int offset)
 {
     int mem_index = s->mem_index;
-    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(0)));
+    offset -= offsetof(ZMMReg, ZMM_Q(0));
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(0)));
     tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, mem_index, MO_LEUQ);
     tcg_gen_addi_tl(s->tmp0, s->A0, 8);
-    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(1)));
     tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
 }
 
 static inline void gen_op_movo(DisasContext *s, int d_offset, int s_offset)
 {
-    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(0)));
-    tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(0)));
-    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(1)));
-    tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(0)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, XMM_Q(0)));
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(1)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, XMM_Q(1)));
 }
 
 static inline void gen_op_movq(DisasContext *s, int d_offset, int s_offset)
@@ -2804,6 +2805,7 @@ static inline void gen_op_movq_env_0(DisasContext *s, int d_offset)
 }
 
 #define ZMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg])
+#define XMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg].ZMM_X(0))
 
 typedef void (*SSEFunc_i_ep)(TCGv_i32 val, TCGv_ptr env, TCGv_ptr reg);
 typedef void (*SSEFunc_l_ep)(TCGv_i64 val, TCGv_ptr env, TCGv_ptr reg);
@@ -3317,13 +3319,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             if (mod == 3)
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_sto_env_A0(s, ZMM_OFFSET(reg));
+            gen_sto_env_A0(s, XMM_OFFSET(reg));
             break;
         case 0x3f0: /* lddqu */
             if (mod == 3)
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            gen_ldo_env_A0(s, ZMM_OFFSET(reg));
+            gen_ldo_env_A0(s, XMM_OFFSET(reg));
             break;
         case 0x22b: /* movntss */
         case 0x32b: /* movntsd */
@@ -3392,10 +3394,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x26f: /* movdqu xmm, ea */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
+                gen_ldo_env_A0(s, XMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                gen_op_movo(s, ZMM_OFFSET(reg), ZMM_OFFSET(rm));
+                gen_op_movo(s, XMM_OFFSET(reg), XMM_OFFSET(rm));
             }
             break;
         case 0x210: /* movss xmm, ea */
@@ -3451,7 +3453,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x212: /* movsldup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
+                gen_ldo_env_A0(s, XMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
                 gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(0)),
@@ -3493,7 +3495,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x216: /* movshdup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_ldo_env_A0(s, ZMM_OFFSET(reg));
+                gen_ldo_env_A0(s, XMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
                 gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(1)),
@@ -3587,10 +3589,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x27f: /* movdqu ea, xmm */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                gen_sto_env_A0(s, ZMM_OFFSET(reg));
+                gen_sto_env_A0(s, XMM_OFFSET(reg));
             } else {
                 rm = (modrm & 7) | REX_B(s);
-                gen_op_movo(s, ZMM_OFFSET(rm), ZMM_OFFSET(reg));
+                gen_op_movo(s, XMM_OFFSET(rm), XMM_OFFSET(reg));
             }
             break;
         case 0x211: /* movss ea, xmm */
@@ -3742,7 +3744,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             gen_helper_enter_mmx(cpu_env);
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-                op2_offset = offsetof(CPUX86State,xmm_t0);
+                op2_offset = offsetof(CPUX86State,xmm_t0.ZMM_X(0));
                 gen_ldo_env_A0(s, op2_offset);
             } else {
                 rm = (modrm & 7) | REX_B(s);
@@ -3906,9 +3908,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
 
             if (b1) {
-                op1_offset = ZMM_OFFSET(reg);
+                op1_offset = XMM_OFFSET(reg);
                 if (mod == 3) {
-                    op2_offset = ZMM_OFFSET(rm | REX_B(s));
+                    op2_offset = XMM_OFFSET(rm | REX_B(s));
                 } else {
                     op2_offset = offsetof(CPUX86State,xmm_t0);
                     gen_lea_modrm(env, s, modrm);
@@ -4516,7 +4518,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 if (mod == 3) {
                     op2_offset = ZMM_OFFSET(rm | REX_B(s));
                 } else {
-                    op2_offset = offsetof(CPUX86State, xmm_t0);
+                    op2_offset = offsetof(CPUX86State, xmm_t0.ZMM_X(0));
                     gen_lea_modrm(env, s, modrm);
                     gen_ldo_env_A0(s, op2_offset);
                 }
@@ -4625,7 +4627,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     break;
                 default:
                     /* 128 bit access */
-                    gen_ldo_env_A0(s, op2_offset);
+                    gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_t0.ZMM_X(0)));
                     break;
                 }
             } else {

From patchwork Sun Sep 11 23:03:43 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973107
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 03/37] target/i386: REPZ and REPNZ are mutually exclusive
Date: Mon, 12 Sep 2022 01:03:43 +0200
Message-Id: <20220911230418.340941-4-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

The later prefix wins if both are present; make it show in s->prefix too.
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 9a85010dcd..f8fd93dae0 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4737,9 +4737,11 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     switch (b) {
     case 0xf3:
         prefixes |= PREFIX_REPZ;
+        prefixes &= ~PREFIX_REPNZ;
         goto next_byte;
     case 0xf2:
         prefixes |= PREFIX_REPNZ;
+        prefixes &= ~PREFIX_REPZ;
         goto next_byte;
     case 0xf0:
         prefixes |= PREFIX_LOCK;

From patchwork Sun Sep 11 23:03:44 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973109
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 04/37] target/i386: introduce insn_get_addr
Date: Mon, 12 Sep 2022 01:03:44 +0200
Message-Id: <20220911230418.340941-5-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

The "O" operand type in the Intel SDM needs to load an 8- to 64-bit
unsigned value, while insn_get is limited to 32 bits. Extract the code
out of disas_insn and into a separate function.
Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/translate.c | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index f8fd93dae0..f1aa830fcc 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2308,6 +2308,31 @@ static void gen_ldst_modrm(CPUX86State *env, DisasContext *s, int modrm, } } +static inline target_ulong insn_get_addr(CPUX86State *env, DisasContext *s, MemOp ot) +{ + target_ulong ret; + + switch (ot) { + case MO_8: + ret = x86_ldub_code(env, s); + break; + case MO_16: + ret = x86_lduw_code(env, s); + break; + case MO_32: + ret = x86_ldl_code(env, s); + break; +#ifdef TARGET_X86_64 + case MO_64: + ret = x86_ldq_code(env, s); + break; +#endif + default: + tcg_abort(); + } + return ret; +} + static inline uint32_t insn_get(CPUX86State *env, DisasContext *s, MemOp ot) { uint32_t ret; @@ -5867,16 +5892,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) target_ulong offset_addr; ot = mo_b_d(b, dflag); - switch (s->aflag) { -#ifdef TARGET_X86_64 - case MO_64: - offset_addr = x86_ldq_code(env, s); - break; -#endif - default: - offset_addr = insn_get(env, s, s->aflag); - break; - } + offset_addr = insn_get_addr(env, s, s->aflag); tcg_gen_movi_tl(s->A0, offset_addr); gen_add_A0_ds_seg(s); if ((b & 2) == 0) { From patchwork Sun Sep 11 23:03:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E6030ECAAD3 for ; Sun, 11 Sep 2022 23:07:24 +0000 (UTC) 
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 05/37] target/i386: add core of new i386 decoder
Date: Mon, 12 Sep 2022 01:03:45 +0200
Message-Id: <20220911230418.340941-6-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

The new decoder is based on three principles:

- use mostly table-driven decoding, using tables derived as much as
  possible from the Intel manual.  Centralizing the decoding of the
  operands makes it more homogeneous; for example, all immediates are
  signed.  All modrm handling is in one function, and can be shared
  between SSE and ALU instructions (including XMM<->GPR instructions).
  The SSE/AVX decoder will also not have duplicated code between the
  0F, 0F38 and 0F3A tables.

- keep the code as "non-branchy" as possible.  Generally, the code for
  the new decoder is more verbose, but the control flow is simpler.
  Conditionals are not nested and have small bodies.  All instruction
  groups are resolved even before operands are decoded, and code
  generation is separated as much as possible into small functions
  that only handle one instruction each.

- keep address generation and (for ALU operands) memory loads and
  writeback in common code as much as possible.  All ALU operations,
  for example, are implemented as T0=f(T0,T1).  For non-ALU
  instructions, read-modify-write memory operations are rare, but
  registers do not have TCGv equivalents: therefore, the common logic
  sets up pointer temporaries with the operands, while load and
  writeback are handled by gvec or by helpers.

These principles make future code review and extension simpler, at
the cost of a relatively large amount of code in this patch.  Even
EVEX should not be _too_ hard to implement (it's just a crazy large
number of possibilities).

This patch introduces the main decoder flow, and integrates the old
decoder with the new one.  The old decoder takes care of parsing
prefixes and then optionally drops to the new one.  The changes to the
old decoder are minimal and allow it to be replaced incrementally with
the new one.

There is a debugging mechanism through a "LIMIT" environment variable.
In user-mode emulation, the variable is the number of instructions
decoded by the new decoder before permanently switching to the old
one.  In system emulation, the variable is the highest opcode that is
decoded by the new decoder (this is less friendly, but it's the best
that can be done without requiring deterministic execution).
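As an illustration of the table-driven principle, the prefix-to-row
selection used for the 0F38 F0..FF table can be sketched in standalone C.
This is only a sketch under stated assumptions: every sk_/SK_/Sketch name,
the prefix bit values and the table contents are hypothetical and do not
exist in QEMU.  Only the row order (no prefix, 66, F3, F2, 66+F2) and the
priority of F3 over 66 are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix bits; the values are illustrative, not QEMU's. */
enum {
    SK_PREFIX_REPZ  = 1 << 0,   /* F3 */
    SK_PREFIX_REPNZ = 1 << 1,   /* F2 */
    SK_PREFIX_DATA  = 1 << 2,   /* 66 */
};

/* Stand-in for a code generation callback; here it just computes a value. */
typedef int (*SketchGenFunc)(int a, int b);

typedef struct SketchEntry {
    SketchGenFunc gen;          /* NULL means "no such instruction" */
} SketchEntry;

static int sk_gen_add(int a, int b) { return a + b; }
static int sk_gen_sub(int a, int b) { return a - b; }

/* Row order: 0 = no prefix, 1 = 66, 2 = F3, 3 = F2, 4 = 66+F2. */
static int sk_select_row(unsigned prefix)
{
    int row = 0;

    if (prefix & SK_PREFIX_REPZ) {
        /* F3 has priority over 66, so 66 is ignored here. */
        return 2;
    }
    row += (prefix & SK_PREFIX_REPNZ) ? 3 : 0;
    row += (prefix & SK_PREFIX_DATA) ? 1 : 0;
    return row;
}

/* Hypothetical table: 16 opcodes (low nibble) by 5 prefix rows. */
static const SketchEntry sk_table[16][5] = {
    [0x0] = { [0] = { .gen = sk_gen_add }, [1] = { .gen = sk_gen_sub } },
    [0x1] = { [3] = { .gen = sk_gen_sub }, [4] = { .gen = sk_gen_add } },
};

/* Look up an entry for opcode byte b; NULL if the slot is empty. */
static const SketchEntry *sk_lookup(uint8_t b, unsigned prefix)
{
    const SketchEntry *e = &sk_table[b & 15][sk_select_row(prefix)];

    return e->gen ? e : NULL;
}
```

A caller would use sk_lookup(0xf0, SK_PREFIX_DATA) to pick the 66-prefixed
variant and treat a NULL result as an unknown opcode.  Keeping the prefix
logic in one small function means adding an instruction only requires a
new table slot, not another conditional.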
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 752 +++++++++++++++++++++++++++++++ target/i386/tcg/decode-new.h | 181 ++++++++ target/i386/tcg/emit.c.inc | 31 ++ target/i386/tcg/translate.c | 64 ++- 4 files changed, 1021 insertions(+), 7 deletions(-) create mode 100644 target/i386/tcg/decode-new.c.inc create mode 100644 target/i386/tcg/decode-new.h create mode 100644 target/i386/tcg/emit.c.inc diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc new file mode 100644 index 0000000000..de8ef51a2d --- /dev/null +++ b/target/i386/tcg/decode-new.c.inc @@ -0,0 +1,752 @@ +/* + * New-style decoder for i386 instructions + * + * Copyright (c) 2022 Red Hat, Inc. + * + * Author: Paolo Bonzini + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +/* + * The decoder is mostly based on tables copied from the Intel SDM. As + * a result, most operand load and writeback is done entirely in common + * table-driven code using the same operand type (X86_TYPE_*) and + * size (X86_SIZE_*) codes used in the manual. + * + * The main difference is that the V, U and W types are extended to + * cover MMX as well; if an instruction is like + * + * por Pq, Qq + * 66 por Vx, Hx, Wx + * + * only the second row is included and the instruction is marked as a + * valid MMX instruction. 
The MMX flag directs the decoder to rewrite + * the V/U/H/W types to P/N/P/Q if there is no prefix, as well as changing + * "x" to "q" if there is no prefix. + * + * In addition, the ss/ps/sd/pd types are sometimes mushed together as "x" + * if the difference is expressed via prefixes. Individual instructions + * are separated by prefix in the generator functions. + * + * There are a couple cases in which instructions (e.g. MOVD) write the + * whole XMM or MM register but are established incorrectly in the manual + * as "d" or "q". These have to be fixed for the decoder to work correctly. + */ + +#define X86_OP_NONE { 0 }, + +#define X86_OP_GROUP3(op, op0_, s0_, op1_, s1_, op2_, s2_, ...) { \ + .decode = glue(decode_, op), \ + .op0 = glue(X86_TYPE_, op0_), \ + .s0 = glue(X86_SIZE_, s0_), \ + .op1 = glue(X86_TYPE_, op1_), \ + .s1 = glue(X86_SIZE_, s1_), \ + .op2 = glue(X86_TYPE_, op2_), \ + .s2 = glue(X86_SIZE_, s2_), \ + .is_decode = true, \ + ## __VA_ARGS__ \ +} + +#define X86_OP_GROUP0(op, ...) \ + X86_OP_GROUP3(op, None, None, None, None, None, None, ## __VA_ARGS__) + +#define X86_OP_ENTRY3(op, op0_, s0_, op1_, s1_, op2_, s2_, ...) { \ + .gen = glue(gen_, op), \ + .op0 = glue(X86_TYPE_, op0_), \ + .s0 = glue(X86_SIZE_, s0_), \ + .op1 = glue(X86_TYPE_, op1_), \ + .s1 = glue(X86_SIZE_, s1_), \ + .op2 = glue(X86_TYPE_, op2_), \ + .s2 = glue(X86_SIZE_, s2_), \ + ## __VA_ARGS__ \ +} + +#define X86_OP_ENTRY4(op, op0_, s0_, op1_, s1_, op2_, s2_, ...) \ + X86_OP_ENTRY3(op, op0_, s0_, op1_, s1_, op2_, s2_, \ + .op3 = X86_TYPE_I, .s3 = X86_SIZE_b, \ + ## __VA_ARGS__) + +#define X86_OP_ENTRY2(op, op0, s0, op1, s1, ...) \ + X86_OP_ENTRY3(op, op0, s0, 2op, s0, op1, s1, ## __VA_ARGS__) +#define X86_OP_ENTRY0(op, ...) 
\ + X86_OP_ENTRY3(op, None, None, None, None, None, None, ## __VA_ARGS__) + +#define i64 .special = X86_SPECIAL_i64, +#define o64 .special = X86_SPECIAL_o64, +#define xchg .special = X86_SPECIAL_Locked, +#define mmx .special = X86_SPECIAL_MMX, +#define zext0 .special = X86_SPECIAL_ZExtOp0, +#define zext2 .special = X86_SPECIAL_ZExtOp2, + +static uint8_t get_modrm(DisasContext *s, CPUX86State *env) +{ + if (!s->has_modrm) { + s->modrm = x86_ldub_code(env, s); + s->has_modrm = true; + } + return s->modrm; +} + +static const X86OpEntry opcodes_0F38_00toEF[240] = { +}; + +/* five rows for no prefix, 66, F3, F2, 66+F2 */ +static X86OpEntry opcodes_0F38_F0toFF[16][5] = { +}; + +static void decode_0F38(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + *b = x86_ldub_code(env, s); + if (*b < 0xf0) { + *entry = opcodes_0F38_00toEF[*b]; + } else { + int row = 0; + if (s->prefix & PREFIX_REPZ) { + /* The REPZ (F3) prefix has priority over 66 */ + row = 2; + } else { + row += s->prefix & PREFIX_REPNZ ? 3 : 0; + row += s->prefix & PREFIX_DATA ? 1 : 0; + } + *entry = opcodes_0F38_F0toFF[*b & 15][row]; + } +} + +static const X86OpEntry opcodes_0F3A[256] = { +}; + +static void decode_0F3A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + *b = x86_ldub_code(env, s); + *entry = opcodes_0F3A[*b]; +} + +static const X86OpEntry opcodes_0F[256] = { + [0x38] = X86_OP_GROUP0(0F38), + [0x3a] = X86_OP_GROUP0(0F3A), +}; + +static void do_decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + *entry = opcodes_0F[*b]; +} + +static void decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + *b = x86_ldub_code(env, s); + do_decode_0F(s, env, entry, b); +} + +static const X86OpEntry opcodes_root[256] = { + [0x0F] = X86_OP_GROUP0(0F), +}; + +#undef mmx + +/* + * Decode the fixed part of the opcode and place the last + * in b. 
+ */ +static void decode_root(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + *entry = opcodes_root[*b]; +} + + +static int decode_modrm(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + X86DecodedOp *op, X86OpType type) +{ + int modrm = get_modrm(s, env); + if ((modrm >> 6) == 3) { + if (s->prefix & PREFIX_LOCK) { + decode->e.gen = gen_illegal; + return 0xff; + } + op->n = (modrm & 7); + if (type != X86_TYPE_Q && type != X86_TYPE_N) { + op->n |= REX_B(s); + } + } else { + op->has_ea = true; + op->n = -1; + decode->mem = gen_lea_modrm_0(env, s, get_modrm(s, env)); + } + return modrm; +} + +static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp *ot) +{ + switch (size) { + case X86_SIZE_b: /* byte */ + *ot = MO_8; + return true; + + case X86_SIZE_d: /* 32-bit */ + case X86_SIZE_ss: /* SSE/AVX scalar single precision */ + *ot = MO_32; + return true; + + case X86_SIZE_p: /* Far pointer, return offset size */ + case X86_SIZE_s: /* Descriptor, return offset size */ + case X86_SIZE_v: /* 16/32/64-bit, based on operand size */ + *ot = s->dflag; + return true; + + case X86_SIZE_pi: /* MMX */ + case X86_SIZE_q: /* 64-bit */ + case X86_SIZE_sd: /* SSE/AVX scalar double precision */ + *ot = MO_64; + return true; + + case X86_SIZE_w: /* 16-bit */ + *ot = MO_16; + return true; + + case X86_SIZE_y: /* 32/64-bit, based on operand size */ + *ot = s->dflag == MO_16 ? MO_32 : s->dflag; + return true; + + case X86_SIZE_z: /* 16-bit for 16-bit operand size, else 32-bit */ + *ot = s->dflag == MO_16 ? 
MO_16 : MO_32; + return true; + + case X86_SIZE_dq: /* SSE/AVX 128-bit */ + if (e->special == X86_SPECIAL_MMX && + !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) { + *ot = MO_64; + return true; + } + if (s->vex_l && e->s0 != X86_SIZE_qq) { + return false; + } + *ot = MO_128; + return true; + + case X86_SIZE_qq: /* AVX 256-bit */ + if (!s->vex_l) { + return false; + } + *ot = MO_256; + return true; + + case X86_SIZE_x: /* 128/256-bit, based on operand size */ + if (e->special == X86_SPECIAL_MMX && + !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) { + *ot = MO_64; + return true; + } + /* fall through */ + case X86_SIZE_ps: /* SSE/AVX packed single precision */ + case X86_SIZE_pd: /* SSE/AVX packed double precision */ + *ot = s->vex_l ? MO_256 : MO_128; + return true; + + case X86_SIZE_d64: /* Default to 64-bit in 64-bit mode */ + *ot = CODE64(s) && s->dflag == MO_32 ? MO_64 : s->dflag; + return true; + + case X86_SIZE_f64: /* Ignore size override prefix in 64-bit mode */ + *ot = CODE64(s) ? 
MO_64 : s->dflag; + return true; + + default: + *ot = -1; + return true; + } +} + +static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + X86DecodedOp *op, X86OpType type, int b) +{ + int modrm; + + switch (type) { + case X86_TYPE_A: /* Implicit */ + case X86_TYPE_F: /* EFLAGS/RFLAGS */ + break; + + case X86_TYPE_B: /* VEX.vvvv selects a GPR */ + op->unit = X86_OP_INT; + op->n = s->vex_v; + break; + + case X86_TYPE_C: /* REG in the modrm byte selects a control register */ + op->unit = X86_OP_CR; + goto get_reg; + + case X86_TYPE_D: /* REG in the modrm byte selects a debug register */ + op->unit = X86_OP_DR; + goto get_reg; + + case X86_TYPE_G: /* REG in the modrm byte selects a GPR */ + op->unit = X86_OP_INT; + goto get_reg; + + case X86_TYPE_S: /* reg selects a segment register */ + op->unit = X86_OP_SEG; + goto get_reg; + + goto get_reg; + + case X86_TYPE_V: /* reg in the modrm byte selects an XMM/YMM register */ + if (decode->e.special == X86_SPECIAL_MMX && + !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) { + case X86_TYPE_P: /* reg in the modrm byte selects an MMX register */ + op->unit = X86_OP_MMX; + } else { + op->unit = X86_OP_SSE; + } + get_reg: + op->n = ((get_modrm(s, env) >> 3) & 7) | REX_R(s); + break; + + case X86_TYPE_E: /* ALU modrm operand */ + op->unit = X86_OP_INT; + goto get_modrm; + + case X86_TYPE_W: /* XMM/YMM modrm operand */ + if (decode->e.special == X86_SPECIAL_MMX && + !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) { + case X86_TYPE_Q: /* MMX modrm operand */ + op->unit = X86_OP_MMX; + } else { + op->unit = X86_OP_SSE; + } + goto get_modrm; + + case X86_TYPE_U: /* R/M in the modrm byte selects an XMM/YMM register */ + if (decode->e.special == X86_SPECIAL_MMX && + !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) { + case X86_TYPE_N: /* R/M in the modrm byte selects an MMX register */ + op->unit = X86_OP_MMX; + } else { + op->unit = X86_OP_SSE; + } + goto get_modrm_reg; + + case 
X86_TYPE_R: /* R/M in the modrm byte selects a register */ + op->unit = X86_OP_INT; + get_modrm_reg: + modrm = get_modrm(s, env); + if ((modrm >> 6) != 3) { + return false; + } + goto get_modrm; + + case X86_TYPE_M: /* modrm byte selects a memory operand */ + modrm = get_modrm(s, env); + if ((modrm >> 6) == 3) { + return false; + } + get_modrm: + decode_modrm(s, env, decode, op, type); + break; + + case X86_TYPE_O: /* Absolute address encoded in the instruction */ + op->unit = X86_OP_INT; + op->has_ea = true; + op->n = -1; + decode->mem = (AddressParts) { + .def_seg = R_DS, + .base = -1, + .index = -1, + .disp = insn_get_addr(env, s, s->aflag) + }; + break; + + case X86_TYPE_H: /* For AVX, VEX.vvvv selects an XMM/YMM register */ + if ((s->prefix & PREFIX_VEX)) { + op->unit = X86_OP_SSE; + op->n = s->vex_v; + break; + e X86_TYPE_J: /* Relative offset for a jump */ + op->unit = X86_OP_IMM; + decode->immediate = insn_get_signed(env, s, op->ot); + decode->immediate += s->pc - s->cs_base; + if (s->dflag == MO_16) { + decode->immediate &= 0xffff; + } else if (!CODE64(s)) { + decode->immediate &= 0xffffffffu; + } + break; + + case X86_TYPE_L: /* The upper 4 bits of the immediate select a 128-bit register */ + op->n = insn_get(env, s, op->ot) >> 4; + break; + + case X86_TYPE_X: /* string source */ + op->n = -1; + decode->mem = (AddressParts) { + .def_seg = R_DS, + .base = R_ESI, + .index = -1, + }; + break; + + case X86_TYPE_Y: /* string destination */ + op->n = -1; + decode->mem = (AddressParts) { + .def_seg = R_ES, + .base = R_EDI, + .index = -1, + }; + break; + + case X86_TYPE_2op: + *op = decode->op[0]; + break; + + case X86_TYPE_LoBits: + op->n = (b & 7) | REX_B(s); + op->unit = X86_OP_INT; + break; + + case X86_TYPE_0 ... X86_TYPE_7: + op->n = type - X86_TYPE_0; + op->unit = X86_OP_INT; + break; + + case X86_TYPE_ES ... 
X86_TYPE_GS: + op->n = type - X86_TYPE_ES; + op->unit = X86_OP_SEG; + break; + + default: + abort(); + } + + return true; +} + +static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_func, + X86DecodedInsn *decode) +{ + X86OpEntry *e = &decode->e; + + decode_func(s, env, e, &decode->b); + while (e->is_decode) { + e->is_decode = false; + e->decode(s, env, e, &decode->b); + } + + /* First compute size of operands in order to initialize s->rip_offset. */ + if (e->op0 != X86_TYPE_None) { + if (!decode_op_size(s, e, e->s0, &decode->op[0].ot)) { + return false; + } + if (e->op0 == X86_TYPE_I) { + s->rip_offset += 1 << decode->op[0].ot; + } + } + if (e->op1 != X86_TYPE_None) { + if (!decode_op_size(s, e, e->s1, &decode->op[1].ot)) { + return false; + } + if (e->op1 == X86_TYPE_I) { + s->rip_offset += 1 << decode->op[1].ot; + } + } + if (e->op2 != X86_TYPE_None) { + if (!decode_op_size(s, e, e->s2, &decode->op[2].ot)) { + return false; + } + if (e->op2 == X86_TYPE_I) { + s->rip_offset += 1 << decode->op[2].ot; + } + } + if (e->op3 != X86_TYPE_None) { + assert(e->op3 == X86_TYPE_I && e->s3 == X86_SIZE_b); + s->rip_offset += 1; + } + + if (e->op0 != X86_TYPE_None && + !decode_op(s, env, decode, &decode->op[0], e->op0, decode->b)) { + return false; + } + + if (e->op1 != X86_TYPE_None && + !decode_op(s, env, decode, &decode->op[1], e->op1, decode->b)) { + return false; + } + + if (e->op2 != X86_TYPE_None && + !decode_op(s, env, decode, &decode->op[2], e->op2, decode->b)) { + return false; + } + + if (e->op3 != X86_TYPE_None) { + decode->immediate = insn_get_signed(env, s, MO_8); + } + + return true; +} + +/* convert one instruction. s->base.is_jmp is set if the translation must + be stopped. 
Return the next pc value */ +static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) +{ + CPUX86State *env = cpu->env_ptr; + bool first = true; + X86DecodedInsn decode; + X86DecodeFunc decode_func = decode_root; + +#ifdef CONFIG_USER_ONLY + if (limit) { --limit; } +#endif + s->has_modrm = false; +#if 0 + s->pc_start = s->pc = s->base.pc_next; + s->override = -1; +#ifdef TARGET_X86_64 + s->rex_w = false; + s->rex_r = 0; + s->rex_x = 0; + s->rex_b = 0; +#endif + s->prefix = 0; + s->rip_offset = 0; /* for relative ip address */ + s->vex_l = 0; + s->vex_v = 0; + if (sigsetjmp(s->jmpbuf, 0) != 0) { + gen_exception_gpf(s); + return s->pc; + } +#endif + + next_byte: + if (first) { + first = false; + } else { + b = x86_ldub_code(env, s); + } + /* Collect prefixes. */ + switch (b) { + case 0xf3: + s->prefix |= PREFIX_REPZ; + s->prefix &= ~PREFIX_REPNZ; + goto next_byte; + case 0xf2: + s->prefix |= PREFIX_REPNZ; + s->prefix &= ~PREFIX_REPZ; + goto next_byte; + case 0xf0: + s->prefix |= PREFIX_LOCK; + goto next_byte; + case 0x2e: + s->override = R_CS; + goto next_byte; + case 0x36: + s->override = R_SS; + goto next_byte; + case 0x3e: + s->override = R_DS; + goto next_byte; + case 0x26: + s->override = R_ES; + goto next_byte; + case 0x64: + s->override = R_FS; + goto next_byte; + case 0x65: + s->override = R_GS; + goto next_byte; + case 0x66: + s->prefix |= PREFIX_DATA; + goto next_byte; + case 0x67: + s->prefix |= PREFIX_ADR; + goto next_byte; +#ifdef TARGET_X86_64 + case 0x40 ... 0x4f: + if (CODE64(s)) { + /* REX prefix */ + s->prefix |= PREFIX_REX; + s->rex_w = (b >> 3) & 1; + s->rex_r = (b & 0x4) << 1; + s->rex_x = (b & 0x2) << 2; + s->rex_b = (b & 0x1) << 3; + goto next_byte; + } + break; +#endif + case 0xc5: /* 2-byte VEX */ + case 0xc4: /* 3-byte VEX */ + /* VEX prefixes cannot be used except in 32-bit mode. + Otherwise the instruction is LES or LDS. 
*/ + if (CODE32(s) && !VM86(s)) { + static const int pp_prefix[4] = { + 0, PREFIX_DATA, PREFIX_REPZ, PREFIX_REPNZ + }; + int vex3, vex2 = x86_ldub_code(env, s); + + if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) { + /* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b, + otherwise the instruction is LES or LDS. */ + s->pc--; /* rewind the advance_pc() x86_ldub_code() did */ + break; + } + + /* 4.1.1-4.1.3: No preceding lock, 66, f2, f3, or rex prefixes. */ + if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ + | PREFIX_LOCK | PREFIX_DATA | PREFIX_REX)) { + goto illegal_op; + } +#ifdef TARGET_X86_64 + s->rex_r = (~vex2 >> 4) & 8; +#endif + if (b == 0xc5) { + /* 2-byte VEX prefix: RVVVVlpp, implied 0f leading opcode byte */ + vex3 = vex2; + decode_func = decode_0F; + } else { + /* 3-byte VEX prefix: RXBmmmmm wVVVVlpp */ + vex3 = x86_ldub_code(env, s); +#ifdef TARGET_X86_64 + s->rex_x = (~vex2 >> 3) & 8; + s->rex_b = (~vex2 >> 2) & 8; + s->rex_w = (vex3 >> 7) & 1; +#endif + switch (vex2 & 0x1f) { + case 0x01: /* Implied 0f leading opcode bytes. */ + decode_func = decode_0F; + break; + case 0x02: /* Implied 0f 38 leading opcode bytes. */ + decode_func = decode_0F38; + break; + case 0x03: /* Implied 0f 3a leading opcode bytes. */ + decode_func = decode_0F3A; + break; + default: /* Reserved for future use. */ + goto unknown_op; + } + } + s->vex_v = (~vex3 >> 3) & 0xf; + s->vex_l = (vex3 >> 2) & 1; + s->prefix |= pp_prefix[vex3 & 3] | PREFIX_VEX; + } + break; + default: + if (b >= 0x100) { + b -= 0x100; + decode_func = do_decode_0F; + } + break; + } + + /* Post-process prefixes. */ + if (CODE64(s)) { + /* In 64-bit mode, the default data size is 32-bit. Select 64-bit + data with rex_w, and 16-bit data with 0x66; rex_w takes precedence + over 0x66 if both are present. */ + s->dflag = (REX_W(s) ? MO_64 : s->prefix & PREFIX_DATA ? MO_16 : MO_32); + /* In 64-bit mode, 0x67 selects 32-bit addressing. */ + s->aflag = (s->prefix & PREFIX_ADR ? 
MO_32 : MO_64); + } else { + /* In 16/32-bit mode, 0x66 selects the opposite data size. */ + if (CODE32(s) ^ ((s->prefix & PREFIX_DATA) != 0)) { + s->dflag = MO_32; + } else { + s->dflag = MO_16; + } + /* In 16/32-bit mode, 0x67 selects the opposite addressing. */ + if (CODE32(s) ^ ((s->prefix & PREFIX_ADR) != 0)) { + s->aflag = MO_32; + } else { + s->aflag = MO_16; + } + } + + memset(&decode, 0, sizeof(decode)); + decode.b = b; + if (!decode_insn(s, env, decode_func, &decode)) { + goto illegal_op; + } + if (!decode.e.gen) { + goto unknown_op; + } + + switch (decode.e.special) { + case X86_SPECIAL_None: + break; + + case X86_SPECIAL_Locked: + if (decode.op[0].has_ea) { + s->prefix |= PREFIX_LOCK; + } + break; + + case X86_SPECIAL_ProtMode: + if (!PE(s) || VM86(s)) { + goto illegal_op; + } + break; + + case X86_SPECIAL_i64: + if (CODE64(s)) { + goto illegal_op; + } + break; + case X86_SPECIAL_o64: + if (!CODE64(s)) { + goto illegal_op; + } + break; + + case X86_SPECIAL_ZExtOp0: + assert(decode.op[0].unit == X86_OP_INT); + if (!decode.op[0].has_ea) { + decode.op[0].ot = MO_32; + } + break; + + case X86_SPECIAL_ZExtOp2: + assert(decode.op[2].unit == X86_OP_INT); + if (!decode.op[2].has_ea) { + decode.op[2].ot = MO_32; + } + break; + + case X86_SPECIAL_MMX: + if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ | PREFIX_DATA))) { + gen_helper_enter_mmx(cpu_env); + } + break; + } + + if (decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea) { + gen_load_ea(s, &decode.mem); + } + decode.e.gen(s, env, &decode); + return s->pc; + illegal_op: + gen_illegal_opcode(s); + return s->pc; + unknown_op: + gen_unknown_opcode(env, s); + return s->pc; +} diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h new file mode 100644 index 0000000000..fb44560aae --- /dev/null +++ b/target/i386/tcg/decode-new.h @@ -0,0 +1,181 @@ +/* + * Decode table flags, mostly based on Intel SDM. + * + * Copyright (c) 2022 Red Hat, Inc. 
+ * + * Author: Paolo Bonzini + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +typedef enum X86OpType { + X86_TYPE_None, + + X86_TYPE_A, /* Implicit */ + X86_TYPE_B, /* VEX.vvvv selects a GPR */ + X86_TYPE_C, /* REG in the modrm byte selects a control register */ + X86_TYPE_D, /* REG in the modrm byte selects a debug register */ + X86_TYPE_E, /* ALU modrm operand */ + X86_TYPE_F, /* EFLAGS/RFLAGS */ + X86_TYPE_G, /* REG in the modrm byte selects a GPR */ + X86_TYPE_H, /* For AVX, VEX.vvvv selects an XMM/YMM register */ + X86_TYPE_I, /* Immediate */ + X86_TYPE_J, /* Relative offset for a jump */ + X86_TYPE_L, /* The upper 4 bits of the immediate select a 128-bit register */ + X86_TYPE_M, /* modrm byte selects a memory operand */ + X86_TYPE_N, /* R/M in the modrm byte selects an MMX register */ + X86_TYPE_O, /* Absolute address encoded in the instruction */ + X86_TYPE_P, /* reg in the modrm byte selects an MMX register */ + X86_TYPE_Q, /* MMX modrm operand */ + X86_TYPE_R, /* R/M in the modrm byte selects a register */ + X86_TYPE_S, /* reg selects a segment register */ + X86_TYPE_U, /* R/M in the modrm byte selects an XMM/YMM register */ + X86_TYPE_V, /* reg in the modrm byte selects an XMM/YMM register */ + X86_TYPE_W, /* XMM/YMM modrm operand */ + X86_TYPE_X, /* string source */ + X86_TYPE_Y, /* string destination */ + + /* Custom */ + X86_TYPE_2op, /* 2-operand 
RMW instruction */ + X86_TYPE_LoBits, /* encoded in bits 0-2 of the operand + REX.B */ + X86_TYPE_0, /* Hard-coded GPRs (RAX..RDI) */ + X86_TYPE_1, + X86_TYPE_2, + X86_TYPE_3, + X86_TYPE_4, + X86_TYPE_5, + X86_TYPE_6, + X86_TYPE_7, + X86_TYPE_ES, /* Hard-coded segment registers */ + X86_TYPE_CS, + X86_TYPE_SS, + X86_TYPE_DS, + X86_TYPE_FS, + X86_TYPE_GS, +} X86OpType; + +typedef enum X86OpSize { + X86_SIZE_None, + + X86_SIZE_a, /* BOUND operand */ + X86_SIZE_b, /* byte */ + X86_SIZE_d, /* 32-bit */ + X86_SIZE_dq, /* SSE/AVX 128-bit */ + X86_SIZE_p, /* Far pointer */ + X86_SIZE_pd, /* SSE/AVX packed double precision */ + X86_SIZE_pi, /* MMX */ + X86_SIZE_ps, /* SSE/AVX packed single precision */ + X86_SIZE_q, /* 64-bit */ + X86_SIZE_qq, /* AVX 256-bit */ + X86_SIZE_s, /* Descriptor */ + X86_SIZE_sd, /* SSE/AVX scalar double precision */ + X86_SIZE_ss, /* SSE/AVX scalar single precision */ + X86_SIZE_si, /* 32-bit GPR */ + X86_SIZE_v, /* 16/32/64-bit, based on operand size */ + X86_SIZE_w, /* 16-bit */ + X86_SIZE_x, /* 128/256-bit, based on operand size */ + X86_SIZE_y, /* 32/64-bit, based on operand size */ + X86_SIZE_z, /* 16-bit for 16-bit operand size, else 32-bit */ + + /* Custom */ + X86_SIZE_d64, + X86_SIZE_f64, +} X86OpSize; + +/* Execution flags */ + +typedef enum X86OpUnit { + X86_OP_SKIP, /* not valid or managed by emission function */ + X86_OP_SEG, /* segment selector */ + X86_OP_CR, /* control register */ + X86_OP_DR, /* debug register */ + X86_OP_INT, /* loaded into/stored from s->T0/T1 */ + X86_OP_IMM, /* immediate */ + X86_OP_SSE, /* address in either s->ptrX or s->A0 depending on has_ea */ + X86_OP_MMX, /* address in either s->ptrX or s->A0 depending on has_ea */ +} X86OpUnit; + +typedef enum X86InsnSpecial { + X86_SPECIAL_None, + + /* Always locked if it has a memory operand (XCHG) */ + X86_SPECIAL_Locked, + + /* Fault outside protected mode */ + X86_SPECIAL_ProtMode, + + /* + * Register operand 0/2 is zero extended to 32 bits. 
Rd/Mb or Rd/Mw + * in the manual. + */ + X86_SPECIAL_ZExtOp0, + X86_SPECIAL_ZExtOp2, + + /* + * MMX instruction exists with no prefix; if there is no prefix, V/H/W/U operands + * become P/P/Q/N, and size "x" becomes "q". + */ + X86_SPECIAL_MMX, + + /* Illegal or exclusive to 64-bit mode */ + X86_SPECIAL_i64, + X86_SPECIAL_o64, +} X86InsnSpecial; + +typedef struct X86OpEntry X86OpEntry; +typedef struct X86DecodedInsn X86DecodedInsn; + +/* Decode function for multibyte opcodes. */ +typedef void (*X86DecodeFunc)(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b); + +/* Code generation function. */ +typedef void (*X86GenFunc)(DisasContext *s, CPUX86State *env,nSpecial special : 8; + bool is_decode : 1; +}; + +typedef struct X86DecodedOp { + int8_t n; + MemOp ot; /* For b/c/d/p/s/q/v/w/y/z */ + X86OpUnit unit; + bool has_ea; +} X86DecodedOp; + +struct X86DecodedInsn { + X86OpEntry e; + X86DecodedOp op[3]; + target_ulong immediate; + AddressParts mem; + + uint8_t b; +}; + diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc new file mode 100644 index 0000000000..e86364ffc1 --- /dev/null +++ b/target/i386/tcg/emit.c.inc @@ -0,0 +1,31 @@ +/* + * New-style TCG opcode generator for i386 instructions + * + * Copyright (c) 2022 Red Hat, Inc. + * + * Author: Paolo Bonzini + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . 
+ */ + +static void gen_illegal(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_illegal_opcode(s); +} + +static void gen_load_ea(DisasContext *s, AddressParts *mem) +{ + TCGv ea = gen_lea_modrm_1(s, *mem); + gen_lea_v_seg(s, s->aflag, ea, mem->def_seg, s->override); +} diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index f1aa830fcc..f66bf2ac79 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -85,6 +85,9 @@ typedef struct DisasContext { int8_t override; /* -1 if no override, else R_CS, R_DS, etc */ uint8_t prefix; + bool has_modrm; + uint8_t modrm; + #ifndef CONFIG_USER_ONLY uint8_t cpl; /* code priv level */ uint8_t iopl; /* i/o priv level */ @@ -2356,6 +2359,31 @@ static inline uint32_t insn_get(CPUX86State *env, DisasContext *s, MemOp ot) return ret; } +static inline target_long insn_get_signed(CPUX86State *env, DisasContext *s, MemOp ot) +{ + target_long ret; + + switch (ot) { + case MO_8: + ret = (int8_t) x86_ldub_code(env, s); + break; + case MO_16: + ret = (int16_t) x86_lduw_code(env, s); + break; + case MO_32: + ret = (int32_t) x86_ldl_code(env, s); + break; +#ifdef TARGET_X86_64 + case MO_64: + ret = x86_ldq_code(env, s); + break; +#endif + default: + tcg_abort(); + } + return ret; +} + static inline int insn_const_size(MemOp ot) { if (ot <= MO_32) { @@ -2845,6 +2873,11 @@ typedef void (*SSEFunc_0_ppi)(TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_i32 val); typedef void (*SSEFunc_0_eppt)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv val); +static bool first = true; static unsigned long limit; +#include "decode-new.h" +#include "emit.c.inc" +#include "decode-new.c.inc" + #define SSE_OPF_CMP (1 << 1) /* does not write for first operand */ #define SSE_OPF_SPECIAL (1 << 3) /* magic */ #define SSE_OPF_3DNOW (1 << 4) /* 3DNow! instruction */ @@ -4756,10 +4789,33 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) prefixes = 0; + if (first) first = false, limit = getenv("LIMIT") ? 
atol(getenv("LIMIT")) : -1; + bool use_new = true; +#ifdef CONFIG_USER_ONLY + use_new &= limit > 0; +#endif next_byte: + s->prefix = prefixes; b = x86_ldub_code(env, s); /* Collect prefixes. */ switch (b) { + default: +#ifndef CONFIG_USER_ONLY + use_new &= b <= limit; +#endif + if (use_new && 0) { + return disas_insn_new(s, cpu, b); + } + break; + case 0x0f: + b = x86_ldub_code(env, s) + 0x100; +#ifndef CONFIG_USER_ONLY + use_new &= b <= limit; +#endif + if (use_new && 0) { + return disas_insn_new(s, cpu, b + 0x100); + } + break; case 0xf3: prefixes |= PREFIX_REPZ; prefixes &= ~PREFIX_REPNZ; @@ -4810,6 +4866,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #endif case 0xc5: /* 2-byte VEX */ case 0xc4: /* 3-byte VEX */ + use_new = false; /* VEX prefixes cannot be used except in 32-bit mode. Otherwise the instruction is LES or LDS. */ if (CODE32(s) && !VM86(s)) { @@ -4894,14 +4951,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) s->dflag = dflag; /* now check op code */ - reswitch: switch(b) { - case 0x0f: - /**************************/ - /* extended op code */ - b = x86_ldub_code(env, s) | 0x100; - goto reswitch; - /**************************/ /* arith & logic */ case 0x00 ... 
0x05: From patchwork Sun Sep 11 23:03:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973117
From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 06/37] target/i386: add ALU load/writeback core Date: Mon, 12 Sep 2022 01:03:46 +0200 Message-Id: <20220911230418.340941-7-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> Add generic code generation that takes care of preparing operands around calls to decode.e.gen in a table-driven manner, so that ALU operations need not take care of that.
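The load / gen / writeback sequence described above can be sketched outside QEMU as follows. This is only an illustration under stated assumptions: Machine, Operand, DecodedInsn, load_operand, writeback_operand and execute are hypothetical stand-ins, not QEMU types or functions, and the sketch interprets values directly instead of emitting TCG ops. It mirrors the patch only in shape: op[1] and op[2] are loaded into two temporaries, the per-instruction callback runs, and the first temporary is written back to op[0].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's X86_OP_* units; not QEMU types. */
typedef enum { OP_SKIP, OP_REG, OP_MEM, OP_IMM } OpUnit;

typedef struct {
    OpUnit unit;
    int n;              /* register number or memory slot */
} Operand;

typedef struct Machine {
    uint32_t regs[8];
    uint32_t mem[8];
    uint32_t t0, t1;    /* temporaries, in the spirit of s->T0 / s->T1 */
} Machine;

typedef void (*GenFunc)(Machine *m);

typedef struct {
    Operand op[3];      /* op[0] is the destination, op[1]/op[2] the sources */
    uint32_t immediate;
    GenFunc gen;        /* per-instruction callback, like decode.e.gen */
} DecodedInsn;

/* Fetch one source operand into *v according to its unit. */
static void load_operand(Machine *m, uint32_t *v, const Operand *op,
                         uint32_t imm)
{
    switch (op->unit) {
    case OP_SKIP:
        return;
    case OP_REG:
        *v = m->regs[op->n];
        break;
    case OP_MEM:
        *v = m->mem[op->n];
        break;
    case OP_IMM:
        *v = imm;
        break;
    }
}

/* Store the result held in t0 to the destination operand. */
static void writeback_operand(Machine *m, const Operand *op)
{
    switch (op->unit) {
    case OP_SKIP:
        break;
    case OP_REG:
        m->regs[op->n] = m->t0;
        break;
    case OP_MEM:
        m->mem[op->n] = m->t0;
        break;
    case OP_IMM:
        assert(!"an immediate cannot be a destination");
        break;
    }
}

/* The operation itself only sees the temporaries. */
static void gen_add(Machine *m)
{
    m->t0 += m->t1;
}

/* Generic core: load sources, run the callback, write back. */
static void execute(Machine *m, const DecodedInsn *d)
{
    load_operand(m, &m->t0, &d->op[1], d->immediate);
    load_operand(m, &m->t1, &d->op[2], d->immediate);
    d->gen(m);
    writeback_operand(m, &d->op[0]);
}

/* Usage: an "ADD r0, imm" style instruction with r0 = 40 and imm = 2. */
static uint32_t demo_add_imm(void)
{
    Machine m = { .regs = { 40 } };
    DecodedInsn d = {
        .op = { { OP_REG, 0 }, { OP_REG, 0 }, { OP_IMM, 0 } },
        .immediate = 2,
        .gen = gen_add,
    };
    execute(&m, &d);
    return m.regs[0];
}
```

The point of the arrangement is that gen_add never needs to know whether its inputs came from a register, memory or an immediate; that knowledge lives in the operand table.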
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 20 +++- target/i386/tcg/decode-new.h | 1 + target/i386/tcg/emit.c.inc | 152 +++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 24 +++++ 4 files changed, 195 insertions(+), 2 deletions(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index de8ef51a2d..7f76051b2d 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -228,7 +228,7 @@ static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp *ot = MO_64; return true; } - if (s->vex_l && e->s0 != X86_SIZE_qq) { + if (s->vex_l && e->s0 != X86_SIZE_qq && e->s1 != X86_SIZE_qq) { return false; } *ot = MO_128; @@ -741,7 +741,23 @@ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) if (decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea) { gen_load_ea(s, &decode.mem); } - decode.e.gen(s, env, &decode); + if (s->prefix & PREFIX_LOCK) { + if (decode.op[0].unit != X86_OP_INT || !decode.op[0].has_ea) { + goto illegal_op; + } + gen_load(s, s->T1, NULL, &decode.op[2], decode.immediate); + decode.e.gen(s, env, &decode); + } else { + if (decode.op[0].unit == X86_OP_MMX) { + gen_mmx_offset(s->ptr0, &decode.op[0]); + } else if (decode.op[0].unit == X86_OP_SSE) { + gen_xmm_offset(s->ptr0, &decode.op[0]); + } + gen_load(s, s->T0, s->ptr1, &decode.op[1], decode.immediate); + gen_load(s, s->T1, s->ptr2, &decode.op[2], decode.immediate); + decode.e.gen(s, env, &decode); + gen_writeback(s, &decode.op[0]); + } return s->pc; illegal_op: gen_illegal_opcode(s); diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index fb44560aae..a2d3c3867f 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -168,6 +168,7 @@ typedef struct X86DecodedOp { MemOp ot; /* For b/c/d/p/s/q/v/w/y/z */ X86OpUnit unit; bool has_ea; + int offset; /* For MMX and SSE */ } X86DecodedOp; struct X86DecodedInsn { diff --git 
a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index e86364ffc1..6fa0062d6a 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -29,3 +29,155 @@ static void gen_load_ea(DisasContext *s, AddressParts *mem) TCGv ea = gen_lea_modrm_1(s, *mem); gen_lea_v_seg(s, s->aflag, ea, mem->def_seg, s->override); } + +static void gen_mmx_offset(TCGv_ptr ptr, X86DecodedOp *op) +{ + if (!op->has_ea) { + op->offset = offsetof(CPUX86State, fpregs[op->n].mmx); + } else { + op->offset = offsetof(CPUX86State, mmx_t0); + } + tcg_gen_addi_ptr(ptr, cpu_env, op->offset); + + /* + * ptr is for passing to helpers, and points to the MMXReg; op->offset + * is for TCG ops and points to the operand. + */ + if (op->ot == MO_32) { + op->offset += offsetof(MMXReg, MMX_L(0)); + } +} + +static int xmm_offset(MemOp ot) +{ + if (ot == MO_8) { + return offsetof(ZMMReg, ZMM_B(0)); + } else if (ot == MO_16) { + return offsetof(ZMMReg, ZMM_W(0)); + } else if (ot == MO_32) { + return offsetof(ZMMReg, ZMM_L(0)); + } else if (ot == MO_64) { + return offsetof(ZMMReg, ZMM_Q(0)); + } else if (ot == MO_128) { + return offsetof(ZMMReg, ZMM_X(0)); + } else if (ot == MO_256) { + return offsetof(ZMMReg, ZMM_Y(0)); + } else { + abort(); + } +} + +static void gen_xmm_offset(TCGv_ptr ptr, X86DecodedOp *op) +{ + if (!op->has_ea) { + op->offset = ZMM_OFFSET(op->n); + } else { + op->offset = offsetof(CPUX86State, xmm_t0); + } + /* + * ptr is for passing to helpers, and points to the ZMMReg; op->offset + * is for TCG ops (especially gvec) and points to the base of the vector. 
+ */ + tcg_gen_addi_ptr(ptr, cpu_env, op->offset); + op->offset += xmm_offset(op->ot); +} + +static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs) +{ + if (ot == MO_8) { + gen_op_ld_v(s, MO_8, temp, s->A0); + tcg_gen_st8_tl(temp, cpu_env, dest_ofs); + } else if (ot == MO_16) { + gen_op_ld_v(s, MO_16, temp, s->A0); + tcg_gen_st16_tl(temp, cpu_env, dest_ofs); + } else if (ot == MO_32) { + gen_op_ld_v(s, MO_32, temp, s->A0); + tcg_gen_st32_tl(temp, cpu_env, dest_ofs); + } else if (ot == MO_64) { + gen_ldq_env_A0(s, dest_ofs); + } else if (ot == MO_128) { + gen_ldo_env_A0(s, dest_ofs); + } else if (ot == MO_256) { + gen_ldy_env_A0(s, dest_ofs); + } +} + +static void gen_load(DisasContext *s, TCGv v, TCGv_ptr ptr, X86DecodedOp *op, uint64_t imm) +{ + switch (op->unit) { + case X86_OP_SKIP: + return; + case X86_OP_SEG: + tcg_gen_ld32u_tl(v, cpu_env, + offsetof(CPUX86State,segs[op->n].selector)); + break; + case X86_OP_CR: + tcg_gen_ld_tl(v, cpu_env, offsetof(CPUX86State, cr[op->n])); + break; + case X86_OP_DR: + tcg_gen_ld_tl(v, cpu_env, offsetof(CPUX86State, dr[op->n])); + break; + case X86_OP_INT: + if (op->has_ea) { + gen_op_ld_v(s, op->ot, v, s->A0); + } else { + gen_op_mov_v_reg(s, op->ot, v, op->n); + } + break; + case X86_OP_IMM: + tcg_gen_movi_tl(v, imm); + break; + + case X86_OP_MMX: + gen_mmx_offset(ptr, op); + goto load_vector; + + case X86_OP_SSE: + gen_xmm_offset(ptr, op); + load_vector: + if (op->has_ea) { + gen_load_sse(s, v, op->ot, op->offset); + } + break; + + default: + abort(); + } +} + +static void gen_writeback(DisasContext *s, X86DecodedOp *op) +{ + switch (op->unit) { + case X86_OP_SKIP: + break; + case X86_OP_SEG: + /* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp. 
*/ + gen_movl_seg_T0(s, op->n); + if (s->base.is_jmp) { + gen_jmp_im(s, s->pc - s->cs_base); + if (op->n == R_SS) { + s->flags &= ~HF_TF_MASK; + gen_eob_inhibit_irq(s, true); + } else { + gen_eob(s); + } + } + break; + case X86_OP_CR: + case X86_OP_DR: + /* TBD */ + break; + case X86_OP_INT: + if (op->has_ea) { + gen_op_st_v(s, op->ot, s->T0, s->A0); + } else { + gen_op_mov_reg_v(s, op->ot, op->n, s->T0); + } + break; + case X86_OP_MMX: + case X86_OP_SSE: + break; + default: + abort(); + } +} diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index f66bf2ac79..7e9920e29c 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2831,6 +2831,30 @@ static inline void gen_sto_env_A0(DisasContext *s, int offset) tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); } +static inline void gen_ldy_env_A0(DisasContext *s, int offset) +{ + int mem_index = s->mem_index; + gen_ldo_env_A0(s, offset); + tcg_gen_addi_tl(s->tmp0, s->A0, 16); + tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); + tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); + tcg_gen_addi_tl(s->tmp0, s->A0, 24); + tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); + tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); +} + +static inline void gen_sty_env_A0(DisasContext *s, int offset) +{ + int mem_index = s->mem_index; + gen_sto_env_A0(s, offset); + tcg_gen_addi_tl(s->tmp0, s->A0, 16); + tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); + tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); + tcg_gen_addi_tl(s->tmp0, s->A0, 24); + tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); + tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +} + static inline void gen_op_movo(DisasContext *s, int d_offset, int s_offset) { tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(0))); From patchwork Sun Sep 11 
23:03:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973113
From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 07/37] target/i386: add CPUID[EAX=7, ECX=0].ECX to DisasContext Date: Mon, 12 Sep 2022 01:03:47 +0200 Message-Id: <20220911230418.340941-8-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> TCG will shortly implement VAES instructions, so add the relevant feature word to the DisasContext.
Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/translate.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 7e9920e29c..a92ef61527 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -115,6 +115,7 @@ typedef struct DisasContext { int cpuid_ext2_features; int cpuid_ext3_features; int cpuid_7_0_ebx_features; + int cpuid_7_0_ecx_features; int cpuid_xsave_features; /* TCG local temps */ @@ -8860,6 +8861,7 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu) dc->cpuid_ext2_features = env->features[FEAT_8000_0001_EDX]; dc->cpuid_ext3_features = env->features[FEAT_8000_0001_ECX]; dc->cpuid_7_0_ebx_features = env->features[FEAT_7_0_EBX]; + dc->cpuid_7_0_ecx_features = env->features[FEAT_7_0_ECX]; dc->cpuid_xsave_features = env->features[FEAT_XSAVE]; dc->jmp_opt = !((cflags & CF_NO_GOTO_TB) || (flags & (HF_TF_MASK | HF_INHIBIT_IRQ_MASK))); From patchwork Sun Sep 11 23:03:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973115
From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 08/37] target/i386: add CPUID feature checks to new decoder Date: Mon, 12 Sep 2022 01:03:48 +0200 Message-Id: <20220911230418.340941-9-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> Signed-off-by:
Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/decode-new.c.inc | 51 ++++++++++++++++++++++++++++++++ target/i386/tcg/decode-new.h | 20 +++++++++++++ 2 files changed, 71 insertions(+) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 7f76051b2d..a9b8b6c05f 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -83,6 +83,7 @@ #define X86_OP_ENTRY0(op, ...) \ X86_OP_ENTRY3(op, None, None, None, None, None, None, ## __VA_ARGS__) +#define cpuid(feat) .cpuid = X86_FEAT_##feat, #define i64 .special = X86_SPECIAL_i64, #define o64 .special = X86_SPECIAL_o64, #define xchg .special = X86_SPECIAL_Locked, @@ -506,6 +507,52 @@ static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_ return true; } +static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid) +{ + switch (cpuid) { + case X86_FEAT_None: + return true; + case X86_FEAT_MOVBE: + return (s->cpuid_ext_features & CPUID_EXT_MOVBE); + case X86_FEAT_PCLMULQDQ: + return (s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ); + case X86_FEAT_SSE: + return (s->cpuid_features & CPUID_SSE); + case X86_FEAT_SSE2: + return (s->cpuid_features & CPUID_SSE2); + case X86_FEAT_SSE3: + return (s->cpuid_ext_features & CPUID_EXT_SSE3); + case X86_FEAT_SSSE3: + return (s->cpuid_ext_features & CPUID_EXT_SSSE3); + case X86_FEAT_SSE41: + return (s->cpuid_ext_features & CPUID_EXT_SSE41); + case X86_FEAT_SSE42: + return (s->cpuid_ext_features & CPUID_EXT_SSE42); + case X86_FEAT_AES: + if (s->vex_l) { + return (s->cpuid_7_0_ecx_features & CPUID_7_0_ECX_VAES); + } else { + return (s->cpuid_ext_features & CPUID_EXT_AES); + } + case X86_FEAT_AVX: + return (s->cpuid_ext_features & CPUID_EXT_AVX); + + case X86_FEAT_SSE4A: + return (s->cpuid_ext3_features & CPUID_EXT3_SSE4A); + + case X86_FEAT_ADX: + return (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_ADX); + case X86_FEAT_BMI1: + return (s->cpuid_7_0_ebx_features &
CPUID_7_0_EBX_BMI1); + case X86_FEAT_BMI2: + return (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2); + case X86_FEAT_AVX2: + return (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2); + default: + abort(); + } +} + /* convert one instruction. s->base.is_jmp is set if the translation must be stopped. Return the next pc value */ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) @@ -690,6 +737,10 @@ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) goto unknown_op; } + if (!has_cpuid_feature(s, decode.e.cpuid)) { + goto illegal_op; + } + switch (decode.e.special) { case X86_SPECIAL_None: break; diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index a2d3c3867f..6fb2d9151e 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -93,6 +93,25 @@ typedef enum X86OpSize { X86_SIZE_f64, } X86OpSize; +typedef enum X86CPUIDFeature { + X86_FEAT_None, + X86_FEAT_ADX, + X86_FEAT_AES, + X86_FEAT_AVX, + X86_FEAT_AVX2, + X86_FEAT_BMI1, + X86_FEAT_BMI2, + X86_FEAT_MOVBE, + X86_FEAT_PCLMULQDQ, + X86_FEAT_SSE, + X86_FEAT_SSE2, + X86_FEAT_SSE3, + X86_FEAT_SSSE3, + X86_FEAT_SSE41, + X86_FEAT_SSE42, + X86_FEAT_SSE4A, +} X86CPUIDFeature; + /* Execution flags */ typedef enum X86OpUnit { @@ -160,6 +179,7 @@ struct X86OpEntry { X86OpSize s3 : 8; X86InsnSpecial special : 8; + X86CPUIDFeature cpuid : 8; bool is_decode : 1; }; From patchwork Sun Sep 11 23:03:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973119
From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: Paul Brook Subject: [PATCH 09/37] target/i386: add AVX_EN hflag Date: Mon, 12 Sep 2022 01:03:49 +0200 Message-Id: <20220911230418.340941-10-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> From: Paul Brook Add a new hflag bit to determine whether AVX instructions are allowed Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-4-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/cpu.h | 3 +++ target/i386/helper.c | 12 ++++++++++++ target/i386/tcg/fpu_helper.c | 1 + 3 files changed, 16 insertions(+) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 8311b69c88..ff1df4ea53 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -169,6 +169,7 @@ typedef enum X86Seg { #define HF_MPX_EN_SHIFT 25 /* MPX Enabled (CR4+XCR0+BNDCFGx) */ #define HF_MPX_IU_SHIFT 26 /* BND registers in-use */ #define
HF_UMIP_SHIFT 27 /* CR4.UMIP */ +#define HF_AVX_EN_SHIFT 28 /* AVX Enabled (CR4+XCR0) */ #define HF_CPL_MASK (3 << HF_CPL_SHIFT) #define HF_INHIBIT_IRQ_MASK (1 << HF_INHIBIT_IRQ_SHIFT) @@ -195,6 +196,7 @@ typedef enum X86Seg { #define HF_MPX_EN_MASK (1 << HF_MPX_EN_SHIFT) #define HF_MPX_IU_MASK (1 << HF_MPX_IU_SHIFT) #define HF_UMIP_MASK (1 << HF_UMIP_SHIFT) +#define HF_AVX_EN_MASK (1 << HF_AVX_EN_SHIFT) /* hflags2 */ @@ -2121,6 +2123,7 @@ void host_cpuid(uint32_t function, uint32_t count, /* helper.c */ void x86_cpu_set_a20(X86CPU *cpu, int a20_state); +void cpu_sync_avx_hflag(CPUX86State *env); #ifndef CONFIG_USER_ONLY static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs) diff --git a/target/i386/helper.c b/target/i386/helper.c index fa409e9c44..30083c9cff 100644 --- a/target/i386/helper.c +++ b/target/i386/helper.c @@ -29,6 +29,17 @@ #endif #include "qemu/log.h" +void cpu_sync_avx_hflag(CPUX86State *env) +{ + if ((env->cr[4] & CR4_OSXSAVE_MASK) + && (env->xcr0 & (XSTATE_SSE_MASK | XSTATE_YMM_MASK)) + == (XSTATE_SSE_MASK | XSTATE_YMM_MASK)) { + env->hflags |= HF_AVX_EN_MASK; + } else { + env->hflags &= ~HF_AVX_EN_MASK; + } +} + void cpu_sync_bndcs_hflags(CPUX86State *env) { uint32_t hflags = env->hflags; @@ -209,6 +220,7 @@ void cpu_x86_update_cr4(CPUX86State *env, uint32_t new_cr4) env->hflags = hflags; cpu_sync_bndcs_hflags(env); + cpu_sync_avx_hflag(env); } #if !defined(CONFIG_USER_ONLY) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 30bc44fcf8..48bf0c5cf8 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2943,6 +2943,7 @@ void helper_xsetbv(CPUX86State *env, uint32_t ecx, uint64_t mask) env->xcr0 = mask; cpu_sync_bndcs_hflags(env); + cpu_sync_avx_hflag(env); return; do_gpf: From patchwork Sun Sep 11 23:03:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973123 Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AC698C6FA83 for ; Sun, 11 Sep 2022 23:23:33 +0000 (UTC) Received: from localhost ([::1]:54546 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWIW-00012Y-Ql for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:23:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46078) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0d-0000Oe-0w for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:03 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:55513) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0X-000715-I2 for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:04:59 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937497; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i7y7zyX5peGZwtpNG4lrcdi7+gttql+7vFP/UAcxIOQ=; b=ATYDswDAol1htd15kEoiChkWnzi7tAS3mnxZ0R9UjlqNE1nG7tyV7USdem6eL0b+Gypje/ lmQIg6IeG6GAS3k6zlsL+Vohc2Wq4yyl8qZRT/fjoRf0YBrZ3R6AU0L/VbF3/6QoPyzOpG 0CQ2fslNgMVLM2RO59Dx1olBQyJ7smo= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-398-EMp8ld9oNR-BWJfg_hr7Nw-1; Sun, 11 Sep 2022 19:04:55 -0400 X-MC-Unique: EMp8ld9oNR-BWJfg_hr7Nw-1 Received: by mail-ed1-f70.google.com with SMTP id 
i17-20020a05640242d100b0044f18a5379aso5053563edc.21 for ; Sun, 11 Sep 2022 16:04:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=i7y7zyX5peGZwtpNG4lrcdi7+gttql+7vFP/UAcxIOQ=; b=GjPZPs6fDIuP85HNsmMF/2DYpZil8/E8L/TvzXBnDZFka9VNmFk3qC7woZEPPuLN3s L6KMKgyap4yFiOj30YwX77NxeZ7QBTOELn4Sh7/r2FuFv+0VAsTbxi2qN3hdRtEhwnk7 D2mOZ5BFtxFfA90CGhTfCfHmO5T70KNz9gArFtcY1lsyVYKAO4qV9a3Sw/5ilsONIcOF sDArX6/lMMHg3pj3/5HFZeTrSkP6t8qz3EtTaUKyWxezlVOx3UwxLqKRxhPHR243nx7m ltrtdqTwNAJWScnxIASSQDbQYL9WZo4J/xOWQuFSTIlEVoqtzWtsJe4PHpmaoC1nw/OJ SNBQ== X-Gm-Message-State: ACgBeo39cbjtg+7+fZ4el6dtLevU2kuIYY9j3C57N7nQz7FwH/ZnfyUZ 2iC/4tGNvl74KfgJ+c2AyfiPIE5MfgUkHQeJ9EuZ8677W8NwkOnbMyX3hC9s/jjWS1YJ6gzHvoN wGaJKphmodcjbv5qX71g7z0oTr/NyFBfNdwQMozDVr0XthHk1SEWehqgL+OGif4Bipzs= X-Received: by 2002:a17:906:ee8e:b0:730:3646:d178 with SMTP id wt14-20020a170906ee8e00b007303646d178mr17033350ejb.426.1662937494124; Sun, 11 Sep 2022 16:04:54 -0700 (PDT) X-Google-Smtp-Source: AA6agR4NJ87nJz2HueVb1gRgjpfwtnSSnBAJkeyII5aC1059d8nuqBrcbgRIIfN0xhBCfzAs7zCXnw== X-Received: by 2002:a17:906:ee8e:b0:730:3646:d178 with SMTP id wt14-20020a170906ee8e00b007303646d178mr17033338ejb.426.1662937493691; Sun, 11 Sep 2022 16:04:53 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. 
[93.44.39.154]) by smtp.gmail.com with ESMTPSA id ed10-20020a056402294a00b0045184540cecsm2177378edb.36.2022.09.11.16.04.52 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:04:53 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 10/37] target/i386: validate VEX prefixes via the instructions' exception classes Date: Mon, 12 Sep 2022 01:03:50 +0200 Message-Id: <20220911230418.340941-11-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 160 ++++++++++++++++++++++++++++++- target/i386/tcg/decode-new.h | 32 +++++++ target/i386/tcg/emit.c.inc | 34 ++++++- target/i386/tcg/translate.c | 17 ++-- 4 files changed, 232 insertions(+), 11 deletions(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index a9b8b6c05f..f6c032c694 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -91,6 +91,23 @@ #define zext0 .special = X86_SPECIAL_ZExtOp0, #define zext2 .special = X86_SPECIAL_ZExtOp2, +#define vex1 .vex_class = 1, +#define vex1_rep3 .vex_class = 
1, .vex_special = X86_VEX_REPScalar, +#define vex2 .vex_class = 2, +#define vex2_rep3 .vex_class = 2, .vex_special = X86_VEX_REPScalar, +#define vex3 .vex_class = 3, +#define vex4 .vex_class = 4, +#define vex4_unal .vex_class = 4, .vex_special = X86_VEX_SSEUnaligned, +#define vex5 .vex_class = 5, +#define vex6 .vex_class = 6, +#define vex7 .vex_class = 7, +#define vex8 .vex_class = 8, +#define vex11 .vex_class = 11, +#define vex12 .vex_class = 12, +#define vex13 .vex_class = 13, + +#define avx2_256 .vex_special = X86_VEX_AVX2_256, + static uint8_t get_modrm(DisasContext *s, CPUX86State *env) { if (!s->has_modrm) { @@ -155,6 +172,18 @@ static const X86OpEntry opcodes_root[256] = { }; #undef mmx +#undef vex1 +#undef vex2 +#undef vex3 +#undef vex4 +#undef vex4_unal +#undef vex5 +#undef vex6 +#undef vex7 +#undef vex8 +#undef vex11 +#undef vex12 +#undef vex13 /* * Decode the fixed part of the opcode and place the last @@ -553,6 +582,132 @@ static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid) } } +static bool validate_vex(DisasContext *s, X86DecodedInsn *decode) +{ + X86OpEntry *e = &decode->e; + + switch (e->vex_special) { + case X86_VEX_REPScalar: + /* + * Instructions which differ between 00/66 and F2/F3 in the + * exception classification and the size of the memory operand. + */ + assert(e->vex_class == 1 || e->vex_class == 2); + if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) { + e->vex_class = 3; + if (s->vex_l) { + goto illegal; + } + assert(decode->e.s2 == X86_SIZE_x); + if (decode->op[2].has_ea) { + decode->op[2].ot = s->prefix & PREFIX_REPZ ? MO_32 : MO_64; + } + } + break; + + case X86_VEX_SSEUnaligned: + /* handled in sse_needs_alignment. 
*/ + break; + + case X86_VEX_AVX2_256: + if ((s->prefix & PREFIX_VEX) && s->vex_l && !has_cpuid_feature(s, X86_FEAT_AVX2)) { + goto illegal; + } + } + + /* TODO: instructions that require VEX.W=0 (Table 2-16) */ + + switch (e->vex_class) { + case 0: + if (s->prefix & PREFIX_VEX) { + goto illegal; + } + return true; + case 1: + case 2: + case 3: + case 4: + case 5: + case 7: + if (s->prefix & PREFIX_VEX) { + if (!(s->flags & HF_AVX_EN_MASK)) { + goto illegal; + } + } else { + if (!(s->flags & HF_OSFXSR_MASK)) { + goto illegal; + } + } + break; + case 12: + assert(s->has_modrm); + /* Must have a VSIB byte and no address prefix. */ + if ((s->modrm & 7) != 4 || s->aflag == MO_16) { + goto illegal; + } + /* Check no overlap between registers. */ + if (decode->op[0].unit == decode->op[1].unit && decode->op[0].n == decode->op[1].n) { + goto illegal; + } + if (decode->op[0].unit == X86_OP_SSE && decode->op[0].n == decode->mem.index) { + goto illegal; + } + if (decode->op[1].unit == X86_OP_SSE && decode->op[1].n == decode->mem.index) { + goto illegal; + } + /* fall through */ + case 6: + case 11: + if (!(s->prefix & PREFIX_VEX)) { + goto illegal; + } + if (!(s->flags & HF_AVX_EN_MASK)) { + goto illegal; + } + break; + case 8: + if (!(s->prefix & PREFIX_VEX)) { + /* EMMS */ + return true; + } + if (!(s->flags & HF_AVX_EN_MASK)) { + goto illegal; + } + break; + case 13: + if (!(s->prefix & PREFIX_VEX)) { + goto illegal; + } + if (s->vex_l) { + goto illegal; + } + /* All integer instructions use VEX.vvvv, so exit. 
*/ + return true; + } + + if (s->vex_v != 0 && + e->op0 != X86_TYPE_H && e->op0 != X86_TYPE_B && + e->op1 != X86_TYPE_H && e->op1 != X86_TYPE_B && + e->op2 != X86_TYPE_H && e->op2 != X86_TYPE_B) { + goto illegal; + } + + if (s->flags & HF_TS_MASK) { + goto nm_exception; + } + if (s->flags & HF_EM_MASK) { + goto illegal; + } + return true; + +nm_exception: + gen_NM_exception(s); + return false; +illegal: + gen_illegal_opcode(s); + return false; +} + /* convert one instruction. s->base.is_jmp is set if the translation must be stopped. Return the next pc value */ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) @@ -789,8 +944,11 @@ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) break; } + if (!validate_vex(s, &decode)) { + return s->pc; + } if (decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea) { - gen_load_ea(s, &decode.mem); + gen_load_ea(s, &decode.mem, decode.e.vex_class == 12); } if (s->prefix & PREFIX_LOCK) { if (decode.op[0].unit != X86_OP_INT || !decode.op[0].has_ea) { diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index 6fb2d9151e..b5299d0dd2 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -152,6 +152,36 @@ typedef enum X86InsnSpecial { X86_SPECIAL_o64, } X86InsnSpecial; +/* + * Special cases for instructions that operate on XMM/YMM registers. Intel + * retconned all of them to have VEX exception classes other than 0 and 13, so + * all these only matter for instructions that have a VEX exception class. + * Based on tables in the "AVX and SSE Instruction Exception Specification" + * section of the manual. + */ +typedef enum X86VEXSpecial { + /* Legacy SSE instructions that allow unaligned operands */ + X86_VEX_SSEUnaligned, + + /* + * Used for instructions that distinguish the XMM operand type with an + * instruction prefix; legacy SSE encodings will allow unaligned operands + * for scalar operands only (identified by a REP prefix). 
In this case, + * the decoding table uses "x" for the vector operands instead of specifying + * pd/ps/sd/ss individually. + */ + X86_VEX_REPScalar, + + /* + * VEX instructions that only support 256-bit operands with AVX2 (Table 2-17 + * column 3). Columns 2 and 4 (instructions limited to 256- and 128-bit + * operands respectively) are implicit in the presence of dq and qq + * operands, and thus handled by decode_op_size. + */ + X86_VEX_AVX2_256, +} X86VEXSpecial; + + typedef struct X86OpEntry X86OpEntry; typedef struct X86DecodedInsn X86DecodedInsn; @@ -180,6 +210,8 @@ struct X86OpEntry { X86InsnSpecial special : 8; X86CPUIDFeature cpuid : 8; + uint8_t vex_class : 8; + X86VEXSpecial vex_special : 8; bool is_decode : 1; }; diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 6fa0062d6a..ce0205e05a 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -19,14 +19,19 @@ * License along with this library; if not, see . */ +static void gen_NM_exception(DisasContext *s) +{ + gen_exception(s, EXCP07_PREX, s->pc_start - s->cs_base); +} + static void gen_illegal(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { gen_illegal_opcode(s); } -static void gen_load_ea(DisasContext *s, AddressParts *mem) +static void gen_load_ea(DisasContext *s, AddressParts *mem, bool is_vsib) { - TCGv ea = gen_lea_modrm_1(s, *mem); + TCGv ea = gen_lea_modrm_1(s, *mem, is_vsib); gen_lea_v_seg(s, s->aflag, ea, mem->def_seg, s->override); } @@ -102,6 +107,25 @@ static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs) } } +static inline bool sse_needs_alignment(DisasContext *s, X86DecodedInsn *decode, X86DecodedOp *op) +{ + switch (decode->e.vex_class) { + case 2: + case 4: + if ((s->prefix & PREFIX_VEX) || + decode->e.vex_special == X86_VEX_SSEUnaligned) { + /* MOST legacy SSE instructions require aligned memory operands, but not all.
*/ + return false; + } + /* fall through */ + case 1: + return op->has_ea && op->ot >= MO_128; + + default: + return false; + } +} + static void gen_load(DisasContext *s, TCGv v, TCGv_ptr ptr, X86DecodedOp *op, uint64_t imm) { switch (op->unit) { @@ -175,7 +199,13 @@ static void gen_writeback(DisasContext *s, X86DecodedOp *op) } break; case X86_OP_MMX: + break; case X86_OP_SSE: + if ((s->prefix & PREFIX_VEX) && op->ot == MO_128) { + tcg_gen_gvec_dup_imm(MO_64, + offsetof(CPUX86State, xmm_regs[op->n].ZMM_X(1)), + 16, 16, 0); + } break; default: abort(); diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index a92ef61527..4ecf75ede3 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2217,11 +2217,11 @@ static AddressParts gen_lea_modrm_0(CPUX86State *env, DisasContext *s, } /* Compute the address, with a minimum number of TCG ops. */ -static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a) +static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a, bool is_vsib) { TCGv ea = NULL; - if (a.index >= 0) { + if (a.index >= 0 && !is_vsib) { if (a.scale == 0) { ea = cpu_regs[a.index]; } else { @@ -2249,7 +2249,7 @@ static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a) static void gen_lea_modrm(CPUX86State *env, DisasContext *s, int modrm) { AddressParts a = gen_lea_modrm_0(env, s, modrm); - TCGv ea = gen_lea_modrm_1(s, a); + TCGv ea = gen_lea_modrm_1(s, a, false); gen_lea_v_seg(s, s->aflag, ea, a.def_seg, s->override); } @@ -2262,7 +2262,8 @@ static void gen_nop_modrm(CPUX86State *env, DisasContext *s, int modrm) static void gen_bndck(CPUX86State *env, DisasContext *s, int modrm, TCGCond cond, TCGv_i64 bndv) { - TCGv ea = gen_lea_modrm_1(s, gen_lea_modrm_0(env, s, modrm)); + AddressParts a = gen_lea_modrm_0(env, s, modrm); + TCGv ea = gen_lea_modrm_1(s, a, false); tcg_gen_extu_tl_i64(s->tmp1_i64, ea); if (!CODE64(s)) { @@ -5953,7 +5954,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) reg = 
((modrm >> 3) & 7) | REX_R(s); { AddressParts a = gen_lea_modrm_0(env, s, modrm); - TCGv ea = gen_lea_modrm_1(s, a); + TCGv ea = gen_lea_modrm_1(s, a, false); gen_lea_v_seg(s, s->aflag, ea, -1, -1); gen_op_mov_reg_v(s, dflag, reg, s->A0); } @@ -6180,7 +6181,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) if (mod != 3) { /* memory op */ AddressParts a = gen_lea_modrm_0(env, s, modrm); - TCGv ea = gen_lea_modrm_1(s, a); + TCGv ea = gen_lea_modrm_1(s, a, false); TCGv last_addr = tcg_temp_new(); bool update_fdp = true; @@ -7210,7 +7211,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) gen_exts(ot, s->T1); tcg_gen_sari_tl(s->tmp0, s->T1, 3 + ot); tcg_gen_shli_tl(s->tmp0, s->tmp0, ot); - tcg_gen_add_tl(s->A0, gen_lea_modrm_1(s, a), s->tmp0); + tcg_gen_add_tl(s->A0, gen_lea_modrm_1(s, a, false), s->tmp0); gen_lea_v_seg(s, s->aflag, s->A0, a.def_seg, s->override); if (!(s->prefix & PREFIX_LOCK)) { gen_op_ld_v(s, ot, s->T0, s->A0); @@ -8281,7 +8282,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) /* rip-relative generates #ud */ goto illegal_op; } - tcg_gen_not_tl(s->A0, gen_lea_modrm_1(s, a)); + tcg_gen_not_tl(s->A0, gen_lea_modrm_1(s, a, false)); if (!CODE64(s)) { tcg_gen_ext32u_tl(s->A0, s->A0); } From patchwork Sun Sep 11 23:03:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973118 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B461FC6FA86 for ; Sun, 11 Sep 2022 23:15:58 +0000 (UTC) Received: from localhost ([::1]:45018 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWBB-0007oF-Or for 
qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:15:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58008) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0g-0000ZZ-UA for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:07 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:45099) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0f-0007C7-8j for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937504; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wP99hXnuPonol6H5+1uykQRwZBY8yy3iSpF/VeYj2vY=; b=BRQJUX28vIXbbwNN/ZLHnvzquFyrxaHiFGqTgWKrfzw4XcU3qw/Zi9/0R2ayO24zBiAG/s 5KTRBmTm+14y6V9jvo+XyH1N8iISEJxjOA2Is72x70o+AyEN3MZqNY9fna70Bs2WDnlAL2 JhXHqI/wQo3Qyp3EUTD2adMe7tHPwN8= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-561-L4_4L0WuPj2GFN9mrWz6Fw-1; Sun, 11 Sep 2022 19:04:57 -0400 X-MC-Unique: L4_4L0WuPj2GFN9mrWz6Fw-1 Received: by mail-ed1-f70.google.com with SMTP id y1-20020a056402358100b00451b144e23eso1180268edc.18 for ; Sun, 11 Sep 2022 16:04:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=wP99hXnuPonol6H5+1uykQRwZBY8yy3iSpF/VeYj2vY=; b=RrObV6llzT4b3lRFpB37HAfAtZQU+RBzyuO1hjbk6y/02ZW8UCnzymZ8ZKQFey/GTl WaHvwLBYCvO3GJ/K+wzHebgRhcOuVlhYUDyub8k+IPivT1XECt8mKOeltxjooQqmH+1u 
5KjELfb8M6HvHPCgsKw0JXWe9hjg7cpOLdry32Kkf+sf6LWZy4g1GFHtcfn4qLG5i46D 02FARTfoFpCJBD6FubCpuq1K3N8ns6t5p40EvQQJexhbn4sSt1OpA3b3ZFnMZ+nYu1fb gjKREZBc4AI/MYn2AyydDGTJ5RZDPRb28mzAl5fhIffeG2B+fUR8H88o9OhKbFXjcKZl TBQw== X-Gm-Message-State: ACgBeo1VLTumA1MxmvcQMGqrp5E2zyt3i2tGgvM62OQ3appW7T+togVE MlXzYDzkryCjgdjmSUOtoLjPmueFNt+8jqJ7Jq6cvm/3b2WOQtQMFKTdjYDuZp5qMIRXXd3yU/s zIj5MZqKhP0sS8H4grzjX94cHuc6kCGCCKIC7eBVgFB3atU7V/q74iqP6iL5pELbUL40= X-Received: by 2002:a05:6402:268d:b0:451:d6e9:5572 with SMTP id w13-20020a056402268d00b00451d6e95572mr1665891edd.390.1662937496576; Sun, 11 Sep 2022 16:04:56 -0700 (PDT) X-Google-Smtp-Source: AA6agR4B4i6YV9/AsHy7yh1pnsjHNXqdhiAukV9jb0W0JwbX29ihd42lCmiBJgZLB5QSE6pk5zzS3A== X-Received: by 2002:a05:6402:268d:b0:451:d6e9:5572 with SMTP id w13-20020a056402268d00b00451d6e95572mr1665872edd.390.1662937496220; Sun, 11 Sep 2022 16:04:56 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id p11-20020a05640243cb00b0043df042bfc6sm4609917edc.47.2022.09.11.16.04.55 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:04:55 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 11/37] target/i386: validate SSE prefixes directly in the decoding table Date: Mon, 12 Sep 2022 01:03:51 +0200 Message-Id: <20220911230418.340941-12-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham 
autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Many SSE and AVX instructions are only valid with specific prefixes (none, 66, F3, F2). Introduce a direct way to encode this in the decoding table to avoid using decode groups too much. Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/decode-new.c.inc | 37 ++++++++++++++++++++++++++++++++ target/i386/tcg/decode-new.h | 1 + 2 files changed, 38 insertions(+) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index f6c032c694..7b4fd9fb54 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -108,6 +108,22 @@ #define avx2_256 .vex_special = X86_VEX_AVX2_256, +#define P_00 1 +#define P_66 (1 << PREFIX_DATA) +#define P_F3 (1 << PREFIX_REPZ) +#define P_F2 (1 << PREFIX_REPNZ) + +#define p_00 .valid_prefix = P_00, +#define p_66 .valid_prefix = P_66, +#define p_f3 .valid_prefix = P_F3, +#define p_f2 .valid_prefix = P_F2, +#define p_00_66 .valid_prefix = P_00|P_66, +#define p_00_f3 .valid_prefix = P_00|P_F3, +#define p_66_f2 .valid_prefix = P_66|P_F2, +#define p_00_66_f3 .valid_prefix = P_00|P_66|P_F3, +#define p_66_f3_f2 .valid_prefix = P_66|P_F3|P_F2, +#define p_00_66_f3_f2 .valid_prefix = P_00|P_66|P_F3|P_F2, + static uint8_t get_modrm(DisasContext *s, CPUX86State *env) { if (!s->has_modrm) { @@ -473,6 +489,23 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, return true; } +static bool validate_sse_prefix(DisasContext *s, X86OpEntry *e) +{ + uint16_t sse_prefixes; + + if (!e->valid_prefix) { + return true; + } + if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) { + /* In SSE instructions, 0xF3 and 0xF2 cancel 0x66. 
*/ + s->prefix &= ~PREFIX_DATA; + } + + /* Now, either zero or one bit is set in sse_prefixes. */ + sse_prefixes = s->prefix & (PREFIX_REPZ | PREFIX_REPNZ | PREFIX_DATA); + return e->valid_prefix & (1 << sse_prefixes); +} + static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_func, X86DecodedInsn *decode) { @@ -484,6 +517,10 @@ static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_ e->decode(s, env, e, &decode->b); } + if (!validate_sse_prefix(s, e)) { + return false; + } + /* First compute size of operands in order to initialize s->rip_offset. */ if (e->op0 != X86_TYPE_None) { if (!decode_op_size(s, e, e->s0, &decode->op[0].ot)) { diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index b5299d0dd2..3db7b82506 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -212,6 +212,7 @@ struct X86OpEntry { X86CPUIDFeature cpuid : 8; uint8_t vex_class : 8; X86VEXSpecial vex_special : 8; + uint16_t valid_prefix : 16; bool is_decode : 1; }; From patchwork Sun Sep 11 23:03:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id ECD0FECAAD3 for ; Sun, 11 Sep 2022 23:27:41 +0000 (UTC) Received: from localhost ([::1]:45588 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWMW-0006o6-Sv for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:27:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58006) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 
1oXW0g-0000XP-6e for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:06 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:47919) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0d-00078r-Sh for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937503; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UNY3k0GK+xrv3mwMKFqtc6c6YsAVl3ncOB7LoSOElDA=; b=PZkMkLszs3VOmQdsP1jNfUGnZnqI/I10BxGFFELBEqajdMP4Oi2nLVGYDEbakdYYeoTP5/ 7Af7Rl81PSe6N41hpFgELzU0Iw34P4C/YXwCO9YJzecCvfXqHPnN80PPzCOhf96EoOPfD4 jv2A4CoM1lmveqBYSO2RCDM7JZcJpVQ= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-621-1prGvlsFPumQEnL1Zlnljg-1; Sun, 11 Sep 2022 19:05:01 -0400 X-MC-Unique: 1prGvlsFPumQEnL1Zlnljg-1 Received: by mail-ej1-f71.google.com with SMTP id dr17-20020a170907721100b00741a1ef8a20so2315286ejc.0 for ; Sun, 11 Sep 2022 16:05:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=UNY3k0GK+xrv3mwMKFqtc6c6YsAVl3ncOB7LoSOElDA=; b=dOBg8y9BXG4p3be2aRnopwDKl4zucYDNaMUvDLvgDht1jBu8EFfxlT8PfSUOvkhrS0 mjC41KkjMSnhJZqsFdzSG9tvobPmMgKz7jObWIBYIKAaY3WMM0sB1SCeISOfjcYRMqjm J5b2YjYH5/AI8TcCElVhOobGukJp4HZvSps8c6+9VaRpSGZoDIlnXmIO3zAYYVCuT2lF 1WlN1jPHeeJ5RA4GE9G856a9nDD/2sfGoBHPA/hdJrlwX6svD7ToJ4mhFfSVNbaOmvDh P007O3qr4XZ9bbDmAcy8aup3nfOcFKUp5/XFsjuI5Y7vqm3hm+Vn+rzxus/gI2SITz/c FhIw== X-Gm-Message-State: 
ACgBeo3rKvJ2Pfp5fdl6N2hyykemhn/19H16sT/1YBge9yLR7enBG665 e0zEZjgUh6LgiLgaloVMTD/5zH26VT5D8OdK7SmtAQRfQqWscnBP4J+zVFWBhUmt4pXgpk/CHC8 cUyWGa7lLUSpgKr3KC1+Tfi/7ALWeI1tuHy9OIpwTNhwAsT/f9v4OWiiZynrvfaDv9EI= X-Received: by 2002:a05:6402:4505:b0:451:1551:7b14 with SMTP id ez5-20020a056402450500b0045115517b14mr10552959edb.300.1662937499999; Sun, 11 Sep 2022 16:04:59 -0700 (PDT) X-Google-Smtp-Source: AA6agR5ugweRP24jlP6Wkd9NMgd8bPEkNmtLuqAjSN+q7H4SDWH0B9yUNSFNKkgxinxCnJKmYCwQBw== X-Received: by 2002:a05:6402:4505:b0:451:1551:7b14 with SMTP id ez5-20020a056402450500b0045115517b14mr10552929edb.300.1662937499536; Sun, 11 Sep 2022 16:04:59 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id p24-20020a056402075800b0045081dc93dfsm4655165edy.78.2022.09.11.16.04.58 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:04:59 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 12/37] target/i386: add scalar 0F 38 and 0F 3A instruction to new decoder Date: Mon, 12 Sep 2022 01:03:52 +0200 Message-Id: <20220911230418.340941-13-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: 
qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Because these are the only VEX instructions that QEMU supports, the new decoder is entered on the first byte of a valid VEX prefix, and VEX decoding only needs to be done in decode-new.c.inc. Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/decode-new.c.inc | 59 +++++++ target/i386/tcg/emit.c.inc | 261 +++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 49 +----- 3 files changed, 323 insertions(+), 46 deletions(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 7b4fd9fb54..b31daecb90 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -133,11 +133,69 @@ static uint8_t get_modrm(DisasContext *s, CPUX86State *env) return s->modrm; } +static void decode_group17(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86GenFunc group17_gen[8] = { + NULL, gen_BLSR, gen_BLSMSK, gen_BLSI, + }; + int op = (get_modrm(s, env) >> 3) & 7; + entry->gen = group17_gen[op]; +} + static const X86OpEntry opcodes_0F38_00toEF[240] = { }; /* five rows for no prefix, 66, F3, F2, 66+F2 */ static X86OpEntry opcodes_0F38_F0toFF[16][5] = { + [0] = { + X86_OP_ENTRY3(MOVBE, G,y, M,y, None,None, cpuid(MOVBE)), + X86_OP_ENTRY3(MOVBE, G,w, M,w, None,None, cpuid(MOVBE)), + {}, + X86_OP_ENTRY2(CRC32, G,d, E,b, cpuid(SSE42)), + X86_OP_ENTRY2(CRC32, G,d, E,b, cpuid(SSE42)), + }, + [1] = { + X86_OP_ENTRY3(MOVBE, M,y, G,y, None,None, cpuid(MOVBE)), + X86_OP_ENTRY3(MOVBE, M,w, G,w, None,None, cpuid(MOVBE)), + {}, + X86_OP_ENTRY2(CRC32, G,d, E,y, cpuid(SSE42)), + X86_OP_ENTRY2(CRC32, G,d, E,w, cpuid(SSE42)), + }, + [2] = { + X86_OP_ENTRY3(ANDN, G,y, B,y, E,y, vex13 cpuid(BMI1)), + {}, + {}, + {}, + {}, + }, + [3] = { + X86_OP_GROUP3(group17, B,y, E,y, None,None, vex13 cpuid(BMI1)), + {}, + {}, + {}, + {}, + }, + [5] = { + X86_OP_ENTRY3(BZHI, G,y, E,y, B,y, vex13 cpuid(BMI1)), + {}, 
+ X86_OP_ENTRY3(PEXT, G,y, B,y, E,y, vex13 cpuid(BMI2)), + X86_OP_ENTRY3(PDEP, G,y, B,y, E,y, vex13 cpuid(BMI2)), + {}, + }, + [6] = { + {}, + X86_OP_ENTRY2(ADCX, G,y, E,y, cpuid(ADX)), + X86_OP_ENTRY2(ADOX, G,y, E,y, cpuid(ADX)), + X86_OP_ENTRY3(MULX, /* B,y, */ G,y, E,y, 2,y, vex13 cpuid(BMI2)), + {}, + }, + [7] = { + X86_OP_ENTRY3(BEXTR, G,y, E,y, B,y, vex13 cpuid(BMI1)), + X86_OP_ENTRY3(SHLX, G,y, E,y, B,y, vex13 cpuid(BMI1)), + X86_OP_ENTRY3(SARX, G,y, E,y, B,y, vex13 cpuid(BMI1)), + X86_OP_ENTRY3(SHRX, G,y, E,y, B,y, vex13 cpuid(BMI1)), + {}, + }, }; static void decode_0F38(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) @@ -159,6 +217,7 @@ static void decode_0F38(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } static const X86OpEntry opcodes_0F3A[256] = { + [0xF0] = X86_OP_ENTRY3(RORX, G,y, E,y, I,b, vex13 cpuid(BMI2) p_f2), }; static void decode_0F3A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index ce0205e05a..36b963a0d3 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -211,3 +211,264 @@ static void gen_writeback(DisasContext *s, X86DecodedOp *op) abort(); } } + +static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op) +{ + TCGv carry_in = NULL; + TCGv carry_out = (cc_op == CC_OP_ADCX ? cpu_cc_dst : cpu_cc_src2); + TCGv zero; + + if (cc_op == s->cc_op || s->cc_op == CC_OP_ADCOX) { + /* Re-use the carry-out from a previous round. */ + carry_in = carry_out; + cc_op = s->cc_op; + } else if (s->cc_op == CC_OP_ADCX || s->cc_op == CC_OP_ADOX) { + /* Merge with the carry-out from the opposite instruction. */ + cc_op = CC_OP_ADCOX; + } + + /* If we don't have a carry-in, get it out of EFLAGS. */ + if (!carry_in) { + if (s->cc_op != CC_OP_ADCX && s->cc_op != CC_OP_ADOX) { + gen_compute_eflags(s); + } + carry_in = s->tmp0; + tcg_gen_extract_tl(carry_in, cpu_cc_src, + ctz32(cc_op == CC_OP_ADCX ? 
CC_C : CC_O), 1); + } + + switch (ot) { +#ifdef TARGET_X86_64 + case MO_32: + /* If TL is 64-bit just do everything in 64-bit arithmetic. */ + tcg_gen_add_i64(s->T0, s->T0, s->T1); + tcg_gen_add_i64(s->T0, s->T0, carry_in); + tcg_gen_shri_i64(carry_out, s->T0, 32); + break; +#endif + default: + zero = tcg_const_tl(0); + tcg_gen_add2_tl(s->T0, carry_out, s->T0, zero, carry_in, zero); + tcg_gen_add2_tl(s->T0, carry_out, s->T0, carry_out, s->T1, zero); + tcg_temp_free(zero); + break; + } + set_cc_op(s, cc_op); +} + +static void gen_ADCX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_ADCOX(s, env, decode->op[0].ot, CC_OP_ADCX); +} + +static void gen_ADOX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_ADCOX(s, env, decode->op[0].ot, CC_OP_ADOX); +} + +static void gen_ANDN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + tcg_gen_andc_tl(s->T0, s->T1, s->T0); + gen_op_update1_cc(s); + set_cc_op(s, CC_OP_LOGICB + ot); +} + +static void gen_BEXTR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + TCGv bound, zero; + + /* + * Extract START, and shift the operand. + * Shifts larger than operand size get zeros. + */ + tcg_gen_ext8u_tl(s->A0, s->T1); + tcg_gen_shr_tl(s->T0, s->T0, s->A0); + + bound = tcg_const_tl(ot == MO_64 ? 63 : 31); + zero = tcg_const_tl(0); + tcg_gen_movcond_tl(TCG_COND_LEU, s->T0, s->A0, bound, s->T0, zero); + tcg_temp_free(zero); + + /* Extract the LEN into a mask. Lengths larger than + * operand size get all ones. 
+ */ + tcg_gen_extract_tl(s->A0, s->T1, 8, 8); + tcg_gen_movcond_tl(TCG_COND_LEU, s->A0, s->A0, bound, s->A0, bound); + tcg_temp_free(bound); + + tcg_gen_movi_tl(s->T1, 1); + tcg_gen_shl_tl(s->T1, s->T1, s->A0); + tcg_gen_subi_tl(s->T1, s->T1, 1); + tcg_gen_and_tl(s->T0, s->T0, s->T1); + + gen_op_update1_cc(s); + set_cc_op(s, CC_OP_LOGICB + ot); +} + +static void gen_BLSI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + tcg_gen_neg_tl(s->T1, s->T0); + tcg_gen_and_tl(s->T0, s->T0, s->T1); + tcg_gen_mov_tl(cpu_cc_dst, s->T0); + set_cc_op(s, CC_OP_BMILGB + ot); +} + +static void gen_BLSMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + tcg_gen_subi_tl(s->T1, s->T0, 1); + tcg_gen_xor_tl(s->T0, s->T0, s->T1); + tcg_gen_mov_tl(cpu_cc_dst, s->T0); + set_cc_op(s, CC_OP_BMILGB + ot); +} + +static void gen_BLSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + tcg_gen_subi_tl(s->T1, s->T0, 1); + tcg_gen_and_tl(s->T0, s->T0, s->T1); + tcg_gen_mov_tl(cpu_cc_dst, s->T0); + set_cc_op(s, CC_OP_BMILGB + ot); +} + +static void gen_BZHI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + TCGv bound; + + tcg_gen_ext8u_tl(s->T1, cpu_regs[s->vex_v]); + bound = tcg_const_tl(ot == MO_64 ? 63 : 31); + + /* + * Note that since we're using BMILG (in order to get O + * cleared) we need to store the inverse into C. 
+ */ + tcg_gen_setcond_tl(TCG_COND_LT, cpu_cc_src, s->T1, bound); + tcg_gen_movcond_tl(TCG_COND_GT, s->T1, s->T1, bound, bound, s->T1); + tcg_temp_free(bound); + + tcg_gen_movi_tl(s->A0, -1); + tcg_gen_shl_tl(s->A0, s->A0, s->T1); + tcg_gen_andc_tl(s->T0, s->T0, s->A0); + + gen_op_update1_cc(s); + set_cc_op(s, CC_OP_BMILGB + ot); +} + +static void gen_CRC32(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[2].ot; + + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); + gen_helper_crc32(s->T0, s->tmp2_i32, s->T1, tcg_const_i32(8 << ot)); +} + +static void gen_MOVBE(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + /* M operand type does not load/store */ + if (decode->e.op0 == X86_TYPE_M) { + tcg_gen_qemu_st_tl(s->T0, s->A0, s->mem_index, ot | MO_BE); + } else { + tcg_gen_qemu_ld_tl(s->T0, s->A0, s->mem_index, ot | MO_BE); + } +} + +static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + + /* low part of result in VEX.vvvv, high in MODRM */ + switch (ot) { + default: + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); + tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1); + tcg_gen_mulu2_i32(s->tmp2_i32, s->tmp3_i32, + s->tmp2_i32, s->tmp3_i32); + tcg_gen_extu_i32_tl(cpu_regs[s->vex_v], s->tmp2_i32); + tcg_gen_extu_i32_tl(s->T0, s->tmp3_i32); + break; +#ifdef TARGET_X86_64 + case MO_64: + tcg_gen_mulu2_i64(cpu_regs[s->vex_v], s->T0, s->T0, s->T1); + break; +#endif + } + +} + +static void gen_PDEP(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[1].ot; + if (ot < MO_64) { + tcg_gen_ext32u_tl(s->T0, s->T0); + } + gen_helper_pdep(s->T0, s->T0, s->T1); +} + +static void gen_PEXT(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[1].ot; + if (ot < MO_64) { + tcg_gen_ext32u_tl(s->T0, s->T0); + } + gen_helper_pext(s->T0, s->T0, s->T1); +} + +static void gen_RORX(DisasContext *s, 
CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + int b = decode->immediate; + + if (ot == MO_64) { + tcg_gen_rotri_tl(s->T0, s->T0, b & 63); + } else { + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); + tcg_gen_rotri_i32(s->tmp2_i32, s->tmp2_i32, b & 31); + tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); + } +} + +static void gen_SARX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + int mask; + + mask = ot == MO_64 ? 63 : 31; + tcg_gen_andi_tl(s->T1, s->T1, mask); + if (ot != MO_64) { + tcg_gen_ext32s_tl(s->T0, s->T0); + } + tcg_gen_sar_tl(s->T0, s->T0, s->T1); +} + +static void gen_SHLX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + int mask; + + mask = ot == MO_64 ? 63 : 31; + tcg_gen_andi_tl(s->T1, s->T1, mask); + tcg_gen_shl_tl(s->T0, s->T0, s->T1); +} + +static void gen_SHRX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + int mask; + + mask = ot == MO_64 ? 63 : 31; + tcg_gen_andi_tl(s->T1, s->T1, mask); + if (ot != MO_64) { + tcg_gen_ext32u_tl(s->T0, s->T0); + } + tcg_gen_shr_tl(s->T0, s->T0, s->T1); +} diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 4ecf75ede3..7eed575f2e 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4892,59 +4892,16 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #endif case 0xc5: /* 2-byte VEX */ case 0xc4: /* 3-byte VEX */ - use_new = false; - /* VEX prefixes cannot be used except in 32-bit mode. - Otherwise the instruction is LES or LDS. 
*/ if (CODE32(s) && !VM86(s)) { - static const int pp_prefix[4] = { - 0, PREFIX_DATA, PREFIX_REPZ, PREFIX_REPNZ - }; - int vex3, vex2 = x86_ldub_code(env, s); + int vex2 = x86_ldub_code(env, s); + s->pc--; /* rewind the advance_pc() x86_ldub_code() did */ if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) { /* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b, otherwise the instruction is LES or LDS. */ - s->pc--; /* rewind the advance_pc() x86_ldub_code() did */ break; } - - /* 4.1.1-4.1.3: No preceding lock, 66, f2, f3, or rex prefixes. */ - if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ - | PREFIX_LOCK | PREFIX_DATA | PREFIX_REX)) { - goto illegal_op; - } -#ifdef TARGET_X86_64 - s->rex_r = (~vex2 >> 4) & 8; -#endif - if (b == 0xc5) { - /* 2-byte VEX prefix: RVVVVlpp, implied 0f leading opcode byte */ - vex3 = vex2; - b = x86_ldub_code(env, s) | 0x100; - } else { - /* 3-byte VEX prefix: RXBmmmmm wVVVVlpp */ - vex3 = x86_ldub_code(env, s); -#ifdef TARGET_X86_64 - s->rex_x = (~vex2 >> 3) & 8; - s->rex_b = (~vex2 >> 2) & 8; - s->rex_w = (vex3 >> 7) & 1; -#endif - switch (vex2 & 0x1f) { - case 0x01: /* Implied 0f leading opcode bytes. */ - b = x86_ldub_code(env, s) | 0x100; - break; - case 0x02: /* Implied 0f 38 leading opcode bytes. */ - b = 0x138; - break; - case 0x03: /* Implied 0f 3a leading opcode bytes. */ - b = 0x13a; - break; - default: /* Reserved for future use. 
*/ - goto unknown_op; - } - } - s->vex_v = (~vex3 >> 3) & 0xf; - s->vex_l = (vex3 >> 2) & 1; - prefixes |= pp_prefix[vex3 & 3] | PREFIX_VEX; + return disas_insn_new(s, cpu, b); } break; }

From patchwork Sun Sep 11 23:03:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973131
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 13/37] target/i386: remove scalar VEX instructions from old decoder
Date: Mon, 12 Sep 2022 01:03:53 +0200
Message-Id: <20220911230418.340941-14-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

This is all dead code, since the VEX prefix goes straight to the new decoder.
Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/translate.c | 243 ------------------------------------ 1 file changed, 243 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 7eed575f2e..240811bd49 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4119,151 +4119,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, s->mem_index, ot | MO_BE); } break; - - case 0x0f2: /* andn Gy, By, Ey */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - tcg_gen_andc_tl(s->T0, s->T0, cpu_regs[s->vex_v]); - gen_op_mov_reg_v(s, ot, reg, s->T0); - gen_op_update1_cc(s); - set_cc_op(s, CC_OP_LOGICB + ot); - break; - - case 0x0f7: /* bextr Gy, Ey, By */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - { - TCGv bound, zero; - - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - /* Extract START, and shift the operand. - Shifts larger than operand size get zeros. */ - tcg_gen_ext8u_tl(s->A0, cpu_regs[s->vex_v]); - tcg_gen_shr_tl(s->T0, s->T0, s->A0); - - bound = tcg_const_tl(ot == MO_64 ? 63 : 31); - zero = tcg_const_tl(0); - tcg_gen_movcond_tl(TCG_COND_LEU, s->T0, s->A0, bound, - s->T0, zero); - tcg_temp_free(zero); - - /* Extract the LEN into a mask. Lengths larger than - operand size get all ones. 
*/ - tcg_gen_extract_tl(s->A0, cpu_regs[s->vex_v], 8, 8); - tcg_gen_movcond_tl(TCG_COND_LEU, s->A0, s->A0, bound, - s->A0, bound); - tcg_temp_free(bound); - tcg_gen_movi_tl(s->T1, 1); - tcg_gen_shl_tl(s->T1, s->T1, s->A0); - tcg_gen_subi_tl(s->T1, s->T1, 1); - tcg_gen_and_tl(s->T0, s->T0, s->T1); - - gen_op_mov_reg_v(s, ot, reg, s->T0); - gen_op_update1_cc(s); - set_cc_op(s, CC_OP_LOGICB + ot); - } - break; - - case 0x0f5: /* bzhi Gy, Ey, By */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - tcg_gen_ext8u_tl(s->T1, cpu_regs[s->vex_v]); - { - TCGv bound = tcg_const_tl(ot == MO_64 ? 63 : 31); - /* Note that since we're using BMILG (in order to get O - cleared) we need to store the inverse into C. */ - tcg_gen_setcond_tl(TCG_COND_LT, cpu_cc_src, - s->T1, bound); - tcg_gen_movcond_tl(TCG_COND_GT, s->T1, s->T1, - bound, bound, s->T1); - tcg_temp_free(bound); - } - tcg_gen_movi_tl(s->A0, -1); - tcg_gen_shl_tl(s->A0, s->A0, s->T1); - tcg_gen_andc_tl(s->T0, s->T0, s->A0); - gen_op_mov_reg_v(s, ot, reg, s->T0); - gen_op_update1_cc(s); - set_cc_op(s, CC_OP_BMILGB + ot); - break; - - case 0x3f6: /* mulx By, Gy, rdx, Ey */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - switch (ot) { - default: - tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); - tcg_gen_trunc_tl_i32(s->tmp3_i32, cpu_regs[R_EDX]); - tcg_gen_mulu2_i32(s->tmp2_i32, s->tmp3_i32, - s->tmp2_i32, s->tmp3_i32); - tcg_gen_extu_i32_tl(cpu_regs[s->vex_v], s->tmp2_i32); - tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp3_i32); - break; -#ifdef TARGET_X86_64 - case MO_64: - tcg_gen_mulu2_i64(s->T0, s->T1, - s->T0, cpu_regs[R_EDX]); - tcg_gen_mov_i64(cpu_regs[s->vex_v], s->T0); - tcg_gen_mov_i64(cpu_regs[reg], s->T1); 
- break; -#endif - } - break; - - case 0x3f5: /* pdep Gy, By, Ey */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - /* Note that by zero-extending the source operand, we - automatically handle zero-extending the result. */ - if (ot == MO_64) { - tcg_gen_mov_tl(s->T1, cpu_regs[s->vex_v]); - } else { - tcg_gen_ext32u_tl(s->T1, cpu_regs[s->vex_v]); - } - gen_helper_pdep(cpu_regs[reg], s->T1, s->T0); - break; - - case 0x2f5: /* pext Gy, By, Ey */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - /* Note that by zero-extending the source operand, we - automatically handle zero-extending the result. */ - if (ot == MO_64) { - tcg_gen_mov_tl(s->T1, cpu_regs[s->vex_v]); - } else { - tcg_gen_ext32u_tl(s->T1, cpu_regs[s->vex_v]); - } - gen_helper_pext(cpu_regs[reg], s->T1, s->T0); - break; - case 0x1f6: /* adcx Gy, Ey */ case 0x2f6: /* adox Gy, Ey */ CHECK_NO_VEX(s); @@ -4343,73 +4198,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; - case 0x1f7: /* shlx Gy, Ey, By */ - case 0x2f7: /* sarx Gy, Ey, By */ - case 0x3f7: /* shrx Gy, Ey, By */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - if (ot == MO_64) { - tcg_gen_andi_tl(s->T1, cpu_regs[s->vex_v], 63); - } else { - tcg_gen_andi_tl(s->T1, cpu_regs[s->vex_v], 31); - } - if (b == 0x1f7) { - tcg_gen_shl_tl(s->T0, s->T0, s->T1); - } else if (b == 0x2f7) { - if (ot != MO_64) { - tcg_gen_ext32s_tl(s->T0, s->T0); - } - tcg_gen_sar_tl(s->T0, s->T0, s->T1); - } else { - if (ot != MO_64) { - tcg_gen_ext32u_tl(s->T0, s->T0); - } - 
tcg_gen_shr_tl(s->T0, s->T0, s->T1); - } - gen_op_mov_reg_v(s, ot, reg, s->T0); - break; - - case 0x0f3: - case 0x1f3: - case 0x2f3: - case 0x3f3: /* Group 17 */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - - tcg_gen_mov_tl(cpu_cc_src, s->T0); - switch (reg & 7) { - case 1: /* blsr By,Ey */ - tcg_gen_subi_tl(s->T1, s->T0, 1); - tcg_gen_and_tl(s->T0, s->T0, s->T1); - break; - case 2: /* blsmsk By,Ey */ - tcg_gen_subi_tl(s->T1, s->T0, 1); - tcg_gen_xor_tl(s->T0, s->T0, s->T1); - break; - case 3: /* blsi By, Ey */ - tcg_gen_neg_tl(s->T1, s->T0); - tcg_gen_and_tl(s->T0, s->T0, s->T1); - break; - default: - goto unknown_op; - } - tcg_gen_mov_tl(cpu_cc_dst, s->T0); - gen_op_mov_reg_v(s, ot, s->vex_v, s->T0); - set_cc_op(s, CC_OP_BMILGB + ot); - break; - - default: - goto unknown_op; } break; @@ -4625,37 +4413,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; - case 0x33a: - /* Various integer extensions at 0f 3a f[0-f]. 
*/ - b = modrm | (b1 << 8); - modrm = x86_ldub_code(env, s); - reg = ((modrm >> 3) & 7) | REX_R(s); - - switch (b) { - case 0x3f0: /* rorx Gy,Ey, Ib */ - if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI2) - || !(s->prefix & PREFIX_VEX) - || s->vex_l != 0) { - goto illegal_op; - } - ot = mo_64_32(s->dflag); - gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); - b = x86_ldub_code(env, s); - if (ot == MO_64) { - tcg_gen_rotri_tl(s->T0, s->T0, b & 63); - } else { - tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); - tcg_gen_rotri_i32(s->tmp2_i32, s->tmp2_i32, b & 31); - tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); - } - gen_op_mov_reg_v(s, ot, reg, s->T0); - break; - - default: - goto unknown_op; - } - break; - default: unknown_op: gen_unknown_opcode(env, s);

From patchwork Sun Sep 11 23:03:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973120
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: Paul Brook
Subject: [PATCH 14/37] target/i386: Prepare ops_sse_header.h for 256 bit AVX
Date: Mon, 12 Sep 2022 01:03:54 +0200
Message-Id: <20220911230418.340941-15-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

From: Paul Brook

Adjust all #ifdefs to match the ones in ops_sse.h.
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-23-paul@nowt.org> Signed-off-by: Paolo Bonzini Acked-by: Richard Henderson --- target/i386/ops_sse_header.h | 114 +++++++++++++++++++++++------------ 1 file changed, 75 insertions(+), 39 deletions(-) diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index d99464afb0..7f57dab496 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -43,7 +43,7 @@ DEF_HELPER_3(glue(pslld, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(psrlq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(psllq, SUFFIX), void, env, Reg, Reg) -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(psrldq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg) #endif @@ -101,7 +101,7 @@ SSE_HELPER_L(pcmpeql, FCMPEQ) SSE_HELPER_W(pmullw, FMULLW) #if SHIFT == 0 -SSE_HELPER_W(pmulhrw, FMULHRW) +DEF_HELPER_3(glue(pmulhrw, SUFFIX), void, env, Reg, Reg) #endif SSE_HELPER_W(pmulhuw, FMULHUW) SSE_HELPER_W(pmulhw, FMULHW) @@ -113,7 +113,9 @@ DEF_HELPER_3(glue(pmuludq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmaddwd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(psadbw, SUFFIX), void, env, Reg, Reg) +#if SHIFT < 2 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl) +#endif DEF_HELPER_2(glue(movl_mm_T0, SUFFIX), void, Reg, i32) #ifdef TARGET_X86_64 DEF_HELPER_2(glue(movq_mm_T0, SUFFIX), void, Reg, i64) @@ -122,38 +124,63 @@ DEF_HELPER_2(glue(movq_mm_T0, SUFFIX), void, Reg, i64) #if SHIFT == 0 DEF_HELPER_3(glue(pshufw, SUFFIX), void, Reg, Reg, int) #else -DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) -DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshufd, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshuflw, SUFFIX), void, Reg, Reg, int) DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) #endif -#if SHIFT == 1 +#if SHIFT >= 1 /* FPU ops */ /* XXX: not accurate */ -#define SSE_HELPER_S(name, F) \ - 
DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ +#define SSE_HELPER_P4(name) \ + DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) + +#define SSE_HELPER_P3(name, ...) \ + DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ + DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) + +#if SHIFT == 1 +#define SSE_HELPER_S4(name) \ + SSE_HELPER_P4(name) \ + DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ DEF_HELPER_3(name ## sd, void, env, Reg, Reg) +#define SSE_HELPER_S3(name) \ + SSE_HELPER_P3(name) \ + DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ + DEF_HELPER_3(name ## sd, void, env, Reg, Reg) +#else +#define SSE_HELPER_S4(name, ...) SSE_HELPER_P4(name) +#define SSE_HELPER_S3(name, ...) SSE_HELPER_P3(name) +#endif -SSE_HELPER_S(add, FPU_ADD) -SSE_HELPER_S(sub, FPU_SUB) -SSE_HELPER_S(mul, FPU_MUL) -SSE_HELPER_S(div, FPU_DIV) -SSE_HELPER_S(min, FPU_MIN) -SSE_HELPER_S(max, FPU_MAX) -SSE_HELPER_S(sqrt, FPU_SQRT) +DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) +DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) +SSE_HELPER_S4(add) +SSE_HELPER_S4(sub) +SSE_HELPER_S4(mul) +SSE_HELPER_S4(div) +SSE_HELPER_S4(min) +SSE_HELPER_S4(max) + +SSE_HELPER_S3(sqrt) DEF_HELPER_3(glue(cvtps2pd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(cvtpd2ps, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(cvtss2sd, void, env, Reg, Reg) -DEF_HELPER_3(cvtsd2ss, void, env, Reg, Reg) DEF_HELPER_3(glue(cvtdq2ps, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(cvtdq2pd, SUFFIX), void, env, Reg, Reg) + +DEF_HELPER_3(glue(cvtps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvtpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) + +DEF_HELPER_3(glue(cvttps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) +DEF_HELPER_3(glue(cvttpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) + +#if SHIFT 
== 1 +DEF_HELPER_3(cvtss2sd, void, env, Reg, Reg) +DEF_HELPER_3(cvtsd2ss, void, env, Reg, Reg) DEF_HELPER_3(cvtpi2ps, void, env, ZMMReg, MMXReg) DEF_HELPER_3(cvtpi2pd, void, env, ZMMReg, MMXReg) DEF_HELPER_3(cvtsi2ss, void, env, ZMMReg, i32) @@ -164,8 +191,6 @@ DEF_HELPER_3(cvtsq2ss, void, env, ZMMReg, i64) DEF_HELPER_3(cvtsq2sd, void, env, ZMMReg, i64) #endif -DEF_HELPER_3(glue(cvtps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(cvtpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(cvtps2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_3(cvtpd2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_2(cvtss2si, s32, env, ZMMReg) @@ -175,8 +200,6 @@ DEF_HELPER_2(cvtss2sq, s64, env, ZMMReg) DEF_HELPER_2(cvtsd2sq, s64, env, ZMMReg) #endif -DEF_HELPER_3(glue(cvttps2dq, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(cvttpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(cvttps2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_3(cvttpd2pi, void, env, MMXReg, ZMMReg) DEF_HELPER_2(cvttss2si, s32, env, ZMMReg) @@ -185,27 +208,24 @@ DEF_HELPER_2(cvttsd2si, s32, env, ZMMReg) DEF_HELPER_2(cvttss2sq, s64, env, ZMMReg) DEF_HELPER_2(cvttsd2sq, s64, env, ZMMReg) #endif +#endif DEF_HELPER_3(glue(rsqrtps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(rsqrtss, void, env, ZMMReg, ZMMReg) DEF_HELPER_3(glue(rcpps, SUFFIX), void, env, ZMMReg, ZMMReg) +#if SHIFT == 1 +DEF_HELPER_3(rsqrtss, void, env, ZMMReg, ZMMReg) DEF_HELPER_3(rcpss, void, env, ZMMReg, ZMMReg) DEF_HELPER_3(extrq_r, void, env, ZMMReg, ZMMReg) DEF_HELPER_4(extrq_i, void, env, ZMMReg, int, int) DEF_HELPER_3(insertq_r, void, env, ZMMReg, ZMMReg) DEF_HELPER_4(insertq_i, void, env, ZMMReg, int, int) -DEF_HELPER_3(glue(haddps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(haddpd, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(hsubps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(hsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(addsubps, SUFFIX), void, env, ZMMReg, ZMMReg) 
-DEF_HELPER_3(glue(addsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) +#endif -#define SSE_HELPER_CMP(name, F, C) \ - DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(name ## sd, void, env, Reg, Reg) +SSE_HELPER_P4(hadd) +SSE_HELPER_P4(hsub) +SSE_HELPER_P4(addsub) + +#define SSE_HELPER_CMP(name, F, C) SSE_HELPER_S4(name) SSE_HELPER_CMP(cmpeq, FPU_CMPQ, FPU_EQ) SSE_HELPER_CMP(cmplt, FPU_CMPS, FPU_LT) @@ -216,10 +236,13 @@ SSE_HELPER_CMP(cmpnlt, FPU_CMPS, !FPU_LT) SSE_HELPER_CMP(cmpnle, FPU_CMPS, !FPU_LE) SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD) +#if SHIFT == 1 DEF_HELPER_3(ucomiss, void, env, Reg, Reg) DEF_HELPER_3(comiss, void, env, Reg, Reg) DEF_HELPER_3(ucomisd, void, env, Reg, Reg) DEF_HELPER_3(comisd, void, env, Reg, Reg) +#endif + DEF_HELPER_2(glue(movmskps, SUFFIX), i32, env, Reg) DEF_HELPER_2(glue(movmskpd, SUFFIX), i32, env, Reg) #endif @@ -236,7 +259,7 @@ DEF_HELPER_3(glue(packssdw, SUFFIX), void, env, Reg, Reg) UNPCK_OP(l, 0) UNPCK_OP(h, 1) -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(punpcklqdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(punpckhqdq, SUFFIX), void, env, Reg, Reg) #endif @@ -283,7 +306,7 @@ DEF_HELPER_3(glue(psignd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_4(glue(palignr, SUFFIX), void, env, Reg, Reg, s32) /* SSE4.1 op helpers */ -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(pblendvb, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(blendvps, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(blendvpd, SUFFIX), void, env, Reg, Reg) @@ -312,22 +335,30 @@ DEF_HELPER_3(glue(pmaxsd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmaxuw, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmaxud, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmulld, SUFFIX), void, env, Reg, Reg) +#if SHIFT == 1 DEF_HELPER_3(glue(phminposuw, SUFFIX), void, env, Reg, Reg) +#endif DEF_HELPER_4(glue(roundps, SUFFIX), 
void, env, Reg, Reg, i32) DEF_HELPER_4(glue(roundpd, SUFFIX), void, env, Reg, Reg, i32) +#if SHIFT == 1 DEF_HELPER_4(glue(roundss, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(roundsd, SUFFIX), void, env, Reg, Reg, i32) +#endif DEF_HELPER_4(glue(blendps, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(blendpd, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(pblendw, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(dpps, SUFFIX), void, env, Reg, Reg, i32) +#if SHIFT == 1 DEF_HELPER_4(glue(dppd, SUFFIX), void, env, Reg, Reg, i32) +#endif DEF_HELPER_4(glue(mpsadbw, SUFFIX), void, env, Reg, Reg, i32) #endif /* SSE4.2 op helpers */ -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(pcmpgtq, SUFFIX), void, env, Reg, Reg) +#endif +#if SHIFT == 1 DEF_HELPER_4(glue(pcmpestri, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(pcmpestrm, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(pcmpistri, SUFFIX), void, env, Reg, Reg, i32) @@ -336,13 +367,15 @@ DEF_HELPER_3(crc32, tl, i32, tl, i32) #endif /* AES-NI op helpers */ -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(aesdec, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(aesdeclast, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(aesenc, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(aesenclast, SUFFIX), void, env, Reg, Reg) +#if SHIFT == 1 DEF_HELPER_3(glue(aesimc, SUFFIX), void, env, Reg, Reg) DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env, Reg, Reg, i32) +#endif DEF_HELPER_4(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, i32) #endif @@ -354,6 +387,9 @@ DEF_HELPER_4(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, i32) #undef SSE_HELPER_W #undef SSE_HELPER_L #undef SSE_HELPER_Q -#undef SSE_HELPER_S +#undef SSE_HELPER_S3 +#undef SSE_HELPER_S4 +#undef SSE_HELPER_P3 +#undef SSE_HELPER_P4 #undef SSE_HELPER_CMP #undef UNPCK_OP From patchwork Sun Sep 11 23:03:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini 
X-Patchwork-Id: 12973121
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 15/37] target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
Date: Mon, 12 Sep 2022 01:03:55 +0200
Message-Id: <20220911230418.340941-16-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook.
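
As an illustration of the convention (not part of the patch itself): each helper gains an explicit first-source operand, so a legacy two-operand SSE instruction can pass its destination twice, while a VEX-encoded instruction would presumably pass the VEX.V register as a separate source. The sketch below is a minimal, self-contained model of that idea. DemoReg, demo_paddl, demo_legacy and demo_vex are hypothetical names used only for this example; they do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Reg type: four 32-bit lanes. */
typedef struct {
    uint32_t l[4];
} DemoReg;

/*
 * Three-operand helper in the style this patch introduces:
 * d = v + s, lane by lane. d and v are allowed to alias, because
 * each lane is read before it is written.
 */
static void demo_paddl(DemoReg *d, const DemoReg *v, const DemoReg *s)
{
    assert(d && v && s);
    for (int i = 0; i < 4; i++) {
        d->l[i] = v->l[i] + s->l[i];
    }
}

/* Legacy SSE form: the destination doubles as the first source. */
static void demo_legacy(DemoReg *d, const DemoReg *s)
{
    demo_paddl(d, d, s);
}

/* VEX form (assumed): the first source is a separate register. */
static void demo_vex(DemoReg *d, const DemoReg *v, const DemoReg *s)
{
    demo_paddl(d, v, s);
}
```

With this shape, the old `Reg *v = d;` line inside each helper becomes unnecessary: the caller decides whether the first source aliases the destination.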
Message-Id: <20220424220204.2493824-26-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 173 +++++++++++++-------------------- target/i386/ops_sse_header.h | 149 ++++++++++++++-------------- target/i386/tcg/translate.c | 181 ++++++++++++++++++++++++----------- 3 files changed, 265 insertions(+), 238 deletions(-) SSE_CMP(cmple), @@ -3149,6 +3163,11 @@ static const SSEFunc_0_epp sse_op_table4[8][4] = { }; #undef SSE_CMP +static void gen_helper_pavgusb(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b) +{ + gen_helper_pavgb_mmx(env, reg_a, reg_a, reg_b); +} + static const SSEFunc_0_epp sse_op_table5[256] = { [0x0c] = gen_helper_pi2fw, [0x0d] = gen_helper_pi2fd, @@ -3173,7 +3192,7 @@ static const SSEFunc_0_epp sse_op_table5[256] = { [0xb6] = gen_helper_movq, /* pfrcpit2 */ [0xb7] = gen_helper_pmulhrw_mmx, [0xbb] = gen_helper_pswapd, - [0xbf] = gen_helper_pavgb_mmx, + [0xbf] = gen_helper_pavgusb, }; struct SSEOpHelper_table6 { @@ -3185,6 +3204,8 @@ struct SSEOpHelper_table6 { struct SSEOpHelper_table7 { union { SSEFunc_0_eppi op1; + SSEFunc_0_epppi op2; + SSEFunc_0_epppp op3; } fn[2]; uint32_t ext_mask; int flags; @@ -3196,15 +3217,15 @@ struct SSEOpHelper_table7 { {{{.op = mmx_name}, {.op = gen_helper_ ## name ## _xmm} }, \ CPUID_EXT_ ## ext, flags} #define BINARY_OP_MMX(name, ext) \ - OP(name, op1, SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) + OP(name, op2, SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) #define BINARY_OP(name, ext, flags) \ - OP(name, op1, flags, ext, NULL) + OP(name, op2, flags, ext, NULL) #define UNARY_OP_MMX(name, ext) \ - OP(name, op1, SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) + OP(name, op1, SSE_OPF_V0 | SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) #define UNARY_OP(name, ext, flags) \ - OP(name, op1, flags, ext, NULL) -#define BLENDV_OP(name, ext, flags) OP(name, op1, 0, ext, NULL) -#define CMP_OP(name, ext) OP(name, op1, SSE_OPF_CMP, ext, NULL) + OP(name, op1, SSE_OPF_V0 | flags, ext, 
NULL) +#define BLENDV_OP(name, ext, flags) OP(name, op3, SSE_OPF_BLENDV, ext, NULL) +#define CMP_OP(name, ext) OP(name, op1, SSE_OPF_CMP | SSE_OPF_V0, ext, NULL) #define SPECIAL_OP(ext) OP(special, op1, SSE_OPF_SPECIAL, ext, NULL) /* prefix [66] 0f 38 */ @@ -3748,7 +3769,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, op1_offset = offsetof(CPUX86State,mmx_t0); } assert(b1 < 2); - SSEFunc_0_epp fn = sse_op_table2[((b - 1) & 3) * 8 + + SSEFunc_0_eppp fn = sse_op_table2[((b - 1) & 3) * 8 + (((modrm >> 3)) & 7)][b1]; if (!fn) { goto unknown_op; @@ -3761,8 +3782,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, op2_offset = offsetof(CPUX86State,fpregs[rm].mmx); } tcg_gen_addi_ptr(s->ptr0, cpu_env, op2_offset); - tcg_gen_addi_ptr(s->ptr1, cpu_env, op1_offset); - fn(cpu_env, s->ptr0, s->ptr1); + tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); + tcg_gen_addi_ptr(s->ptr2, cpu_env, op1_offset); + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2); break; case 0x050: /* movmskps */ rm = (modrm & 7) | REX_B(s); @@ -4030,7 +4052,21 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); - op6->fn[b1].op1(cpu_env, s->ptr0, s->ptr1); + if (op6->flags & SSE_OPF_V0) { + op6->fn[b1].op1(cpu_env, s->ptr0, s->ptr1); + } else { + tcg_gen_addi_ptr(s->ptr2, cpu_env, op1_offset); + if (op6->flags & SSE_OPF_BLENDV) { + TCGv_ptr mask = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(mask, cpu_env, ZMM_OFFSET(0)); + op6->fn[b1].op3(cpu_env, s->ptr0, s->ptr2, s->ptr1, + mask); + tcg_temp_free_ptr(mask); + } else { + SSEFunc_0_eppp fn = op6->fn[b1].op2; + fn(cpu_env, s->ptr0, s->ptr2, s->ptr1); + } + } } else { CHECK_NO_VEX(s); if ((op6->flags & SSE_OPF_MMX) == 0) { @@ -4046,7 +4082,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); - op6->fn[0].op1(cpu_env, 
s->ptr0, s->ptr1); + if (op6->flags & SSE_OPF_V0) { + op6->fn[0].op1(cpu_env, s->ptr0, s->ptr1); + } else { + op6->fn[0].op2(cpu_env, s->ptr0, s->ptr0, s->ptr1); + } } if (op6->flags & SSE_OPF_CMP) { @@ -4380,7 +4420,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, /* We only actually have one MMX instuction (palignr) */ assert(b == 0x0f); - op7->fn[0].op1(cpu_env, s->ptr0, s->ptr1, + op7->fn[0].op2(cpu_env, s->ptr0, s->ptr0, s->ptr1, tcg_const_i32(val)); break; } @@ -4407,7 +4447,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); - op7->fn[b1].op1(cpu_env, s->ptr0, s->ptr1, tcg_const_i32(val)); + if (op7->flags & SSE_OPF_V0) { + op7->fn[b1].op1(cpu_env, s->ptr0, s->ptr1, tcg_const_i32(val)); + } else { + tcg_gen_addi_ptr(s->ptr2, cpu_env, op1_offset); + op7->fn[b1].op2(cpu_env, s->ptr0, s->ptr2, s->ptr1, + tcg_const_i32(val)); + } if (op7->flags & SSE_OPF_CMP) { set_cc_op(s, CC_OP_EFLAGS); } @@ -4499,26 +4545,46 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, return; } } + + tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); - if (sse_op_flags & SSE_OPF_SHUF) { - val = x86_ldub_code(env, s); - sse_op_fn.op1i(s->ptr0, s->ptr1, tcg_const_i32(val)); - } else if (b == 0xf7) { - /* maskmov : we must prepare A0 */ - if (mod != 3) { - goto illegal_op; + if (sse_op_flags & SSE_OPF_V0) { + if (sse_op_flags & SSE_OPF_SHUF) { + val = x86_ldub_code(env, s); + sse_op_fn.op1i(s->ptr0, s->ptr1, tcg_const_i32(val)); + } else if (b == 0xf7) { + /* maskmov : we must prepare A0 */ + if (mod != 3) { + goto illegal_op; + } + tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]); + gen_extu(s->aflag, s->A0); + gen_add_A0_ds_seg(s); + + tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); + tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); + sse_op_fn.op1t(cpu_env, s->ptr0, s->ptr1, s->A0); + /* Does not write to the 
first operand */ + return; + } else { + sse_op_fn.op1(cpu_env, s->ptr0, s->ptr1); } - tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]); - gen_extu(s->aflag, s->A0); - gen_add_A0_ds_seg(s); - sse_op_fn.op1t(cpu_env, s->ptr0, s->ptr1, s->A0); - } else if (b == 0xc2) { - /* compare insns, bits 7:3 (7:5 for AVX) are ignored */ - val = x86_ldub_code(env, s) & 7; - sse_op_table4[val][b1](cpu_env, s->ptr0, s->ptr1); } else { - sse_op_fn.op1(cpu_env, s->ptr0, s->ptr1); + tcg_gen_addi_ptr(s->ptr2, cpu_env, op1_offset); + if (sse_op_flags & SSE_OPF_SHUF) { + val = x86_ldub_code(env, s); + sse_op_fn.op2i(s->ptr0, s->ptr2, s->ptr1, + tcg_const_i32(val)); + } else { + SSEFunc_0_eppp fn = sse_op_fn.op2; + if (b == 0xc2) { + /* compare insns */ + val = x86_ldub_code(env, s) & 7; + fn = sse_op_table4[val][b1]; + } + fn(cpu_env, s->ptr0, s->ptr2, s->ptr1); + } } if (sse_op_flags & SSE_OPF_CMP) { @@ -8598,6 +8664,7 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu) dc->tmp4 = tcg_temp_new(); dc->ptr0 = tcg_temp_new_ptr(); dc->ptr1 = tcg_temp_new_ptr(); + dc->ptr2 = tcg_temp_new_ptr(); dc->cc_srcT = tcg_temp_local_new(); } diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index c0766de18d..fb8733f509 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -48,9 +48,8 @@ #define FPSLL(x, c) ((x) << shift) #endif -void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 15) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -64,9 +63,8 @@ void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 15) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -80,9 +78,8 @@ void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void
glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 15) { shift = 15; @@ -94,9 +91,8 @@ void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 31) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -110,9 +106,8 @@ void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 31) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -126,9 +121,8 @@ void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 31) { shift = 31; @@ -140,9 +134,8 @@ void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 63) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -156,9 +149,8 @@ void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift; if (c->Q(0) > 63) { for (int i = 0; i < 1 << SHIFT; i++) { @@ -173,9 +165,8 @@ void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #if SHIFT >= 1 -void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg 
*c) { - Reg *s = d; int shift, i, j; shift = c->L(0); @@ -192,9 +183,8 @@ void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { - Reg *s = d; int shift, i, j; shift = c->L(0); @@ -222,9 +212,8 @@ void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #define SSE_HELPER_2(name, elem, num, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ int n = num; \ for (int i = 0; i < n; i++) { \ d->elem(i) = F(v->elem(i), s->elem(i)); \ @@ -362,18 +351,24 @@ SSE_HELPER_W(helper_pcmpeqw, FCMPEQ) SSE_HELPER_L(helper_pcmpeql, FCMPEQ) SSE_HELPER_W(helper_pmullw, FMULLW) -#if SHIFT == 0 -SSE_HELPER_W(helper_pmulhrw, FMULHRW) -#endif SSE_HELPER_W(helper_pmulhuw, FMULHUW) SSE_HELPER_W(helper_pmulhw, FMULHW) +#if SHIFT == 0 +void glue(helper_pmulhrw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + d->W(0) = FMULHRW(d->W(0), s->W(0)); + d->W(1) = FMULHRW(d->W(1), s->W(1)); + d->W(2) = FMULHRW(d->W(2), s->W(2)); + d->W(3) = FMULHRW(d->W(3), s->W(3)); +} +#endif + SSE_HELPER_B(helper_pavgb, FAVG) SSE_HELPER_W(helper_pavgw, FAVG) -void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < (1 << SHIFT); i++) { @@ -381,9 +376,8 @@ void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -void glue(helper_pmaddwd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pmaddwd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < (2 << SHIFT); i++) { @@ -402,10 +396,8 @@ static inline int abs1(int a) } } #endif - -void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, 
Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < (1 << SHIFT); i++) { @@ -478,9 +470,8 @@ void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order) SHUFFLE4(W, s, s, 0); } #else -void glue(helper_shufps, SUFFIX)(Reg *d, Reg *s, int order) +void glue(helper_shufps, SUFFIX)(Reg *d, Reg *v, Reg *s, int order) { - Reg *v = d; uint32_t r0, r1, r2, r3; int i; @@ -489,9 +480,8 @@ void glue(helper_shufps, SUFFIX)(Reg *d, Reg *s, int order) } } -void glue(helper_shufpd, SUFFIX)(Reg *d, Reg *s, int order) +void glue(helper_shufpd, SUFFIX)(Reg *d, Reg *v, Reg *s, int order) { - Reg *v = d; uint64_t r0, r1; int i; @@ -543,9 +533,8 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) #define SSE_HELPER_P(name, F) \ void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ int i; \ for (i = 0; i < 2 << SHIFT; i++) { \ d->ZMM_S(i) = F(32, v->ZMM_S(i), s->ZMM_S(i)); \ @@ -553,9 +542,8 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) } \ \ void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ int i; \ for (i = 0; i < 1 << SHIFT; i++) { \ d->ZMM_D(i) = F(64, v->ZMM_D(i), s->ZMM_D(i)); \ @@ -567,15 +555,13 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) #define SSE_HELPER_S(name, F) \ SSE_HELPER_P(name, F) \ \ - void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s)\ + void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *v, Reg *s)\ { \ - Reg *v = d; \ d->ZMM_S(0) = F(32, v->ZMM_S(0), s->ZMM_S(0)); \ } \ \ - void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s)\ + void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *v, Reg *s)\ { \ - Reg *v = d; \ d->ZMM_D(0) = F(64, v->ZMM_D(0), s->ZMM_D(0)); \ } @@ -958,9 +944,8 @@ void helper_insertq_i(CPUX86State *env, ZMMReg *d, int index, int length) #endif #define SSE_HELPER_HPS(name, F) \ -void glue(helper_ ## name, 
SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ float32 r[2 << SHIFT]; \ int i, j, k; \ for (k = 0; k < 2 << SHIFT; k += LANE_WIDTH / 4) { \ @@ -980,9 +965,8 @@ SSE_HELPER_HPS(haddps, float32_add) SSE_HELPER_HPS(hsubps, float32_sub) #define SSE_HELPER_HPD(name, F) \ -void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ float64 r[1 << SHIFT]; \ int i, j, k; \ for (k = 0; k < 1 << SHIFT; k += LANE_WIDTH / 8) { \ @@ -1001,9 +985,8 @@ void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ SSE_HELPER_HPD(haddpd, float64_add) SSE_HELPER_HPD(hsubpd, float64_sub) -void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < 2 << SHIFT; i += 2) { d->ZMM_S(i) = float32_sub(v->ZMM_S(i), s->ZMM_S(i), &env->sse_status); @@ -1011,9 +994,8 @@ void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < 1 << SHIFT; i += 2) { d->ZMM_D(i) = float64_sub(v->ZMM_D(i), s->ZMM_D(i), &env->sse_status); @@ -1023,9 +1005,8 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #define SSE_HELPER_CMP_P(name, F, C) \ void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ int i; \ for (i = 0; i < 2 << SHIFT; i++) { \ d->ZMM_L(i) = C(F(32, v->ZMM_S(i), s->ZMM_S(i))) ? 
-1 : 0; \ @@ -1033,9 +1014,8 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } \ \ void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ int i; \ for (i = 0; i < 1 << SHIFT; i++) { \ d->ZMM_Q(i) = C(F(64, v->ZMM_D(i), s->ZMM_D(i))) ? -1 : 0; \ @@ -1045,15 +1025,13 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #if SHIFT == 1 #define SSE_HELPER_CMP(name, F, C) \ SSE_HELPER_CMP_P(name, F, C) \ - void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s) \ + void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ d->ZMM_L(0) = C(F(32, v->ZMM_S(0), s->ZMM_S(0))) ? -1 : 0; \ } \ \ - void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s) \ + void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ d->ZMM_Q(0) = C(F(64, v->ZMM_D(0), s->ZMM_D(0))) ? -1 : 0; \ } @@ -1179,9 +1157,8 @@ uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) #define PACK_HELPER_B(name, F) \ void glue(helper_pack ## name, SUFFIX)(CPUX86State *env, \ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint8_t r[PACK_WIDTH * 2]; \ int j, k; \ for (j = 0; j < 4 << SHIFT; j += PACK_WIDTH) { \ @@ -1200,9 +1177,8 @@ void glue(helper_pack ## name, SUFFIX)(CPUX86State *env, \ PACK_HELPER_B(sswb, satsb) PACK_HELPER_B(uswb, satub) -void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; uint16_t r[PACK_WIDTH]; int j, k; @@ -1222,9 +1198,8 @@ void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #define UNPCK_OP(base_name, base) \ - Reg *v = d; \ uint8_t r[PACK_WIDTH * 2]; \ int j, i; \ \ @@ -1241,9 +1216,8 @@ void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } \ \ void glue(helper_punpck ## base_name ## wd, SUFFIX)(CPUX86State *env,\ - Reg *d, 
Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint16_t r[PACK_WIDTH]; \ int j, i; \ \ @@ -1260,9 +1234,8 @@ void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } \ \ void glue(helper_punpck ## base_name ## dq, SUFFIX)(CPUX86State *env,\ - Reg *d, Reg *s) \ + Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint32_t r[PACK_WIDTH / 2]; \ int j, i; \ \ @@ -1280,9 +1253,8 @@ void glue(helper_packssdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ XMM_ONLY( \ void glue(helper_punpck ## base_name ## qdq, SUFFIX)( \ - CPUX86State *env, Reg *d, Reg *s) \ + CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint64_t r[2]; \ int i; \ \ @@ -1453,9 +1425,8 @@ void helper_pswapd(CPUX86State *env, MMXReg *d, MMXReg *s) #endif /* SSSE3 op helpers */ -void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; #if SHIFT == 0 uint8_t r[8]; @@ -1480,9 +1451,8 @@ void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } #define SSE_HELPER_HW(name, F) \ -void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint16_t r[4 << SHIFT]; \ int i, j, k; \ for (k = 0; k < 4 << SHIFT; k += LANE_WIDTH / 2) { \ @@ -1499,9 +1469,8 @@ void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ } #define SSE_HELPER_HL(name, F) \ -void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void glue(helper_ ## name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ - Reg *v = d; \ uint32_t r[2 << SHIFT]; \ int i, j, k; \ for (k = 0; k < 2 << SHIFT; k += LANE_WIDTH / 4) { \ @@ -1527,9 +1496,8 @@ SSE_HELPER_HL(phsubd, FSUB) #undef SSE_HELPER_HW #undef SSE_HELPER_HL -void glue(helper_pmaddubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pmaddubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg 
*s) { - Reg *v = d; int i; for (i = 0; i < 4 << SHIFT; i++) { d->W(i) = satsw((int8_t)s->B(i * 2) * (uint8_t)v->B(i * 2) + @@ -1554,10 +1522,9 @@ SSE_HELPER_B(helper_psignb, FSIGNB) SSE_HELPER_W(helper_psignw, FSIGNW) SSE_HELPER_L(helper_psignd, FSIGNL) -void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, +void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, int32_t shift) { - Reg *v = d; int i; /* XXX could be checked during translation */ @@ -1594,10 +1561,9 @@ void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #if SHIFT >= 1 #define SSE_HELPER_V(name, elem, num, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, \ + Reg *m) \ { \ - Reg *v = d; \ - Reg *m = &env->xmm_regs[0]; \ int i; \ for (i = 0; i < num; i++) { \ d->elem(i) = F(v->elem(i), s->elem(i), m->elem(i)); \ @@ -1605,10 +1571,9 @@ void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } #define SSE_HELPER_I(name, elem, num, F) \ - void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, \ uint32_t imm) \ { \ - Reg *v = d; \ int i; \ for (i = 0; i < num; i++) { \ int j = i & 7; \ @@ -1660,9 +1625,8 @@ SSE_HELPER_F(helper_pmovzxwq, Q, 1 << SHIFT, s->W) SSE_HELPER_F(helper_pmovzxdq, Q, 1 << SHIFT, s->L) #endif -void glue(helper_pmuldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_pmuldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; int i; for (i = 0; i < 1 << SHIFT; i++) { @@ -1673,9 +1637,8 @@ void glue(helper_pmuldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #define FCMPEQQ(d, s) (d == s ? 
-1 : 0) SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ) -void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { - Reg *v = d; uint16_t r[8]; int i, j, k; @@ -1893,10 +1856,9 @@ SSE_HELPER_I(helper_blendps, L, 2 << SHIFT, FBLENDP) SSE_HELPER_I(helper_blendpd, Q, 1 << SHIFT, FBLENDP) SSE_HELPER_I(helper_pblendw, W, 4 << SHIFT, FBLENDP) -void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, +void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, uint32_t mask) { - Reg *v = d; float32 prod1, prod2, temp2, temp3, temp4; int i; @@ -1939,9 +1901,8 @@ void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #if SHIFT == 1 /* Oddly, there is no ymm version of dppd */ void glue(helper_dppd, SUFFIX)(CPUX86State *env, - Reg *d, Reg *s, uint32_t mask) + Reg *d, Reg *v, Reg *s, uint32_t mask) { - Reg *v = d; float64 prod1, prod2, temp2; if (mask & (1 << 4)) { @@ -1960,10 +1921,9 @@ void glue(helper_dppd, SUFFIX)(CPUX86State *env, } #endif -void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, +void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, uint32_t offset) { - Reg *v = d; int i, j; uint16_t r[8]; @@ -2236,10 +2196,9 @@ static void clmulq(uint64_t *dest_l, uint64_t *dest_h, } #endif -void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, +void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s, uint32_t ctrl) { - Reg *v = d; uint64_t a, b; int i; @@ -2250,10 +2209,10 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } } -void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { int i; - Reg st = *d; + Reg st = *v; Reg rk = *s; for (i = 0 ; i < 2 << SHIFT ; i++) { @@ -2265,10 +2224,10 @@ void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -void 
glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { int i; - Reg st = *d; + Reg st = *v; Reg rk = *s; for (i = 0; i < 8 << SHIFT; i++) { @@ -2276,10 +2235,10 @@ void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { int i; - Reg st = *d; + Reg st = *v; Reg rk = *s; for (i = 0 ; i < 2 << SHIFT ; i++) { @@ -2291,10 +2250,10 @@ void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) } } -void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) { int i; - Reg st = *d; + Reg st = *v; Reg rk = *s; for (i = 0; i < 8 << SHIFT; i++) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 7f57dab496..21fed7fa05 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -34,31 +34,31 @@ #define dh_typecode_ZMMReg dh_typecode_ptr #define dh_typecode_MMXReg dh_typecode_ptr -DEF_HELPER_3(glue(psrlw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psraw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psllw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psrld, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psrad, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pslld, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psrlq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psllq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(psrlw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psraw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psllw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psrld, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psrad, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pslld, SUFFIX), void, env, Reg, Reg, Reg) 
+DEF_HELPER_4(glue(psrlq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psllq, SUFFIX), void, env, Reg, Reg, Reg) #if SHIFT >= 1 -DEF_HELPER_3(glue(psrldq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(psrldq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pslldq, SUFFIX), void, env, Reg, Reg, Reg) #endif #define SSE_HELPER_B(name, F)\ - DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg) + DEF_HELPER_4(glue(name, SUFFIX), void, env, Reg, Reg, Reg) #define SSE_HELPER_W(name, F)\ - DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg) + DEF_HELPER_4(glue(name, SUFFIX), void, env, Reg, Reg, Reg) #define SSE_HELPER_L(name, F)\ - DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg) + DEF_HELPER_4(glue(name, SUFFIX), void, env, Reg, Reg, Reg) #define SSE_HELPER_Q(name, F)\ - DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg) + DEF_HELPER_4(glue(name, SUFFIX), void, env, Reg, Reg, Reg) SSE_HELPER_B(paddb, FADD) SSE_HELPER_W(paddw, FADD) @@ -109,10 +109,10 @@ SSE_HELPER_W(pmulhw, FMULHW) SSE_HELPER_B(pavgb, FAVG) SSE_HELPER_W(pavgw, FAVG) -DEF_HELPER_3(glue(pmuludq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaddwd, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(pmuludq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmaddwd, SUFFIX), void, env, Reg, Reg, Reg) -DEF_HELPER_3(glue(psadbw, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(psadbw, SUFFIX), void, env, Reg, Reg, Reg) #if SHIFT < 2 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl) #endif @@ -134,8 +134,8 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) /* XXX: not accurate */ #define SSE_HELPER_P4(name) \ - DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) + DEF_HELPER_4(glue(name ## ps, SUFFIX), void, env, Reg, Reg, Reg) \ + DEF_HELPER_4(glue(name ## pd, SUFFIX), void, env, Reg, Reg, Reg) #define SSE_HELPER_P3(name, ...) 
\ DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ @@ -144,8 +144,8 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) #if SHIFT == 1 #define SSE_HELPER_S4(name) \ SSE_HELPER_P4(name) \ - DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ - DEF_HELPER_3(name ## sd, void, env, Reg, Reg) + DEF_HELPER_4(name ## ss, void, env, Reg, Reg, Reg) \ + DEF_HELPER_4(name ## sd, void, env, Reg, Reg, Reg) #define SSE_HELPER_S3(name) \ SSE_HELPER_P3(name) \ DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ @@ -155,8 +155,8 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) #define SSE_HELPER_S3(name, ...) SSE_HELPER_P3(name) #endif -DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) -DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) +DEF_HELPER_4(glue(shufps, SUFFIX), void, Reg, Reg, Reg, int) +DEF_HELPER_4(glue(shufpd, SUFFIX), void, Reg, Reg, Reg, int) SSE_HELPER_S4(add) SSE_HELPER_S4(sub) @@ -212,6 +212,7 @@ DEF_HELPER_2(cvttsd2sq, s64, env, ZMMReg) DEF_HELPER_3(glue(rsqrtps, SUFFIX), void, env, ZMMReg, ZMMReg) DEF_HELPER_3(glue(rcpps, SUFFIX), void, env, ZMMReg, ZMMReg) + #if SHIFT == 1 DEF_HELPER_3(rsqrtss, void, env, ZMMReg, ZMMReg) DEF_HELPER_3(rcpss, void, env, ZMMReg, ZMMReg) @@ -248,20 +249,20 @@ DEF_HELPER_2(glue(movmskpd, SUFFIX), i32, env, Reg) #endif DEF_HELPER_2(glue(pmovmskb, SUFFIX), i32, env, Reg) -DEF_HELPER_3(glue(packsswb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(packuswb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(packssdw, SUFFIX), void, env, Reg, Reg) -#define UNPCK_OP(base_name, base) \ - DEF_HELPER_3(glue(punpck ## base_name ## bw, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(glue(punpck ## base_name ## wd, SUFFIX), void, env, Reg, Reg) \ - DEF_HELPER_3(glue(punpck ## base_name ## dq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(packsswb, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(packuswb, SUFFIX)# name ## dq, SUFFIX), void, env, Reg, Reg, Reg) UNPCK_OP(l, 0) UNPCK_OP(h, 1) #if SHIFT >= 
1 -DEF_HELPER_3(glue(punpcklqdq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(punpckhqdq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(punpcklqdq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(punpckhqdq, SUFFIX), void, env, Reg, Reg, Reg) #endif /* 3DNow! float ops */ @@ -288,28 +289,28 @@ DEF_HELPER_3(pswapd, void, env, MMXReg, MMXReg) #endif /* SSSE3 op helpers */ -DEF_HELPER_3(glue(phaddw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(phaddd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(phaddsw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(phsubw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(phsubd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(phsubsw, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(phaddw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(phaddd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(phaddsw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(phsubw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(phsubd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(phsubsw, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_3(glue(pabsb, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pabsw, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pabsd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaddubsw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmulhrsw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pshufb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psignb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psignw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(psignd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_4(glue(palignr, SUFFIX), void, env, Reg, Reg, s32) +DEF_HELPER_4(glue(pmaddubsw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmulhrsw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pshufb, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psignb, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(psignw, SUFFIX), void, env, Reg, Reg, Reg) 
+DEF_HELPER_4(glue(psignd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_5(glue(palignr, SUFFIX), void, env, Reg, Reg, Reg, s32) /* SSE4.1 op helpers */ #if SHIFT >= 1 -DEF_HELPER_3(glue(pblendvb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(blendvps, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(blendvpd, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_5(glue(pblendvb, SUFFIX), void, env, Reg, Reg, Reg, Reg) +DEF_HELPER_5(glue(blendvps, SUFFIX), void, env, Reg, Reg, Reg, Reg) +DEF_HELPER_5(glue(blendvpd, SUFFIX), void, env, Reg, Reg, Reg, Reg) DEF_HELPER_3(glue(ptest, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovsxbw, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovsxbd, SUFFIX), void, env, Reg, Reg) @@ -323,40 +324,40 @@ DEF_HELPER_3(glue(pmovzxbq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxwd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxwq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxdq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmuldq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pcmpeqq, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(packusdw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pminsb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pminsd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pminuw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pminud, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaxsb, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaxsd, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaxuw, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmaxud, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(pmulld, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(pmuldq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pcmpeqq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(packusdw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pminsb, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pminsd, SUFFIX), void, env, Reg, Reg, Reg) 
+DEF_HELPER_4(glue(pminuw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pminud, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmaxsb, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmaxsd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmaxuw, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmaxud, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(pmulld, SUFFIX), void, env, Reg, Reg, Reg) #if SHIFT == 1 DEF_HELPER_3(glue(phminposuw, SUFFIX), void, env, Reg, Reg) #endif DEF_HELPER_4(glue(roundps, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(roundpd, SUFFIX), void, env, Reg, Reg, i32) #if SHIFT == 1 -DEF_HELPER_4(glue(roundss, SUFFIX), void, env, Reg, Reg, i32) -DEF_HELPER_4(glue(roundsd, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_4(roundss_xmm, void, env, Reg, Reg, i32) +DEF_HELPER_4(roundsd_xmm, void, env, Reg, Reg, i32) #endif -DEF_HELPER_4(glue(blendps, SUFFIX), void, env, Reg, Reg, i32) -DEF_HELPER_4(glue(blendpd, SUFFIX), void, env, Reg, Reg, i32) -DEF_HELPER_4(glue(pblendw, SUFFIX), void, env, Reg, Reg, i32) -DEF_HELPER_4(glue(dpps, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_5(glue(blendps, SUFFIX), void, env, Reg, Reg, Reg, i32) +DEF_HELPER_5(glue(blendpd, SUFFIX), void, env, Reg, Reg, Reg, i32) +DEF_HELPER_5(glue(pblendw, SUFFIX), void, env, Reg, Reg, Reg, i32) +DEF_HELPER_5(glue(dpps, SUFFIX), void, env, Reg, Reg, Reg, i32) #if SHIFT == 1 -DEF_HELPER_4(glue(dppd, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_5(glue(dppd, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif -DEF_HELPER_4(glue(mpsadbw, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_5(glue(mpsadbw, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif /* SSE4.2 op helpers */ #if SHIFT >= 1 -DEF_HELPER_3(glue(pcmpgtq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(pcmpgtq, SUFFIX), void, env, Reg, Reg, Reg) #endif #if SHIFT == 1 DEF_HELPER_4(glue(pcmpestri, SUFFIX), void, env, Reg, Reg, i32) @@ -368,15 +369,15 @@ DEF_HELPER_3(crc32, tl, i32, tl, 
i32) /* AES-NI op helpers */ #if SHIFT >= 1 -DEF_HELPER_3(glue(aesdec, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(aesdeclast, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(aesenc, SUFFIX), void, env, Reg, Reg) -DEF_HELPER_3(glue(aesenclast, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(aesdec, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(aesdeclast, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(aesenc, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(aesenclast, SUFFIX), void, env, Reg, Reg, Reg) #if SHIFT == 1 DEF_HELPER_3(glue(aesimc, SUFFIX), void, env, Reg, Reg) DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env, Reg, Reg, i32) #endif -DEF_HELPER_4(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif #undef SHIFT diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 240811bd49..e996aab541 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -129,6 +129,7 @@ typedef struct DisasContext { TCGv tmp4; TCGv_ptr ptr0; TCGv_ptr ptr1; + TCGv_ptr ptr2; TCGv_i32 tmp2_i32; TCGv_i32 tmp3_i32; TCGv_i64 tmp1_i64; @@ -2893,18 +2894,28 @@ typedef void (*SSEFunc_0_epl)(TCGv_ptr env, TCGv_ptr reg, TCGv_i64 val); typedef void (*SSEFunc_0_epp)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b); typedef void (*SSEFunc_0_eppp)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_ptr reg_c); +typedef void (*SSEFunc_0_epppp)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv_ptr reg_d); typedef void (*SSEFunc_0_eppi)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_i32 val); +typedef void (*SSEFunc_0_epppi)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv_i32 val); typedef void (*SSEFunc_0_ppi)(TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_i32 val); +typedef void (*SSEFunc_0_pppi)(TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_ptr reg_c, + TCGv_i32 val); typedef void (*SSEFunc_0_eppt)(TCGv_ptr env, TCGv_ptr reg_a, 
TCGv_ptr reg_b, TCGv val); +typedef void (*SSEFunc_0_epppt)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv val); static bool first = true; static unsigned long limit; #include "decode-new.h" #include "emit.c.inc" #include "decode-new.c.inc" +#define SSE_OPF_V0 (1 << 0) /* vex.v must be 1111b (only 2 operands) */ #define SSE_OPF_CMP (1 << 1) /* does not write for first operand */ +#define SSE_OPF_BLENDV (1 << 2) /* blendv* instruction */ #define SSE_OPF_SPECIAL (1 << 3) /* magic */ #define SSE_OPF_3DNOW (1 << 4) /* 3DNow! instruction */ #define SSE_OPF_MMX (1 << 5) /* MMX/integer/AVX2 instruction */ @@ -2914,10 +2925,10 @@ static bool first = true; static unsigned long limit; #define OP(op, flags, a, b, c, d) \ {flags, {{.op = a}, {.op = b}, {.op = c}, {.op = d} } } -#define MMX_OP(x) OP(op1, SSE_OPF_MMX, \ +#define MMX_OP(x) OP(op2, SSE_OPF_MMX, \ gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm, NULL, NULL) -#define SSE_FOP(name) OP(op1, SSE_OPF_SCALAR, \ +#define SSE_FOP(name) OP(op2, SSE_OPF_SCALAR, \ gen_helper_##name##ps##_xmm, gen_helper_##name##pd##_xmm, \ gen_helper_##name##ss, gen_helper_##name##sd) #define SSE_OP(sname, dname, op, flags) OP(op, flags, \ @@ -2927,6 +2938,9 @@ typedef union SSEFuncs { SSEFunc_0_epp op1; SSEFunc_0_ppi op1i; SSEFunc_0_eppt op1t; + SSEFunc_0_eppp op2; + SSEFunc_0_pppi op2i; + SSEFunc_0_epppp op3; } SSEFuncs; struct SSEOpHelper_table1 { @@ -2946,8 +2960,8 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x11] = SSE_SPECIAL, /* movups, movupd, movss, movsd */ [0x12] = SSE_SPECIAL, /* movlps, movlpd, movsldup, movddup */ [0x13] = SSE_SPECIAL, /* movlps, movlpd */ - [0x14] = SSE_OP(punpckldq, punpcklqdq, op1, 0), /* unpcklps, unpcklpd */ - [0x15] = SSE_OP(punpckhdq, punpckhqdq, op1, 0), /* unpckhps, unpckhpd */ + [0x14] = SSE_OP(punpckldq, punpcklqdq, op2, 0), /* unpcklps, unpcklpd */ + [0x15] = SSE_OP(punpckhdq, punpckhqdq, op2, 0), /* unpckhps, unpckhpd */ [0x16] = SSE_SPECIAL, /* movhps, 
movhpd, movshdup */ [0x17] = SSE_SPECIAL, /* movhps, movhpd */ @@ -2957,28 +2971,28 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x2b] = SSE_SPECIAL, /* movntps, movntpd, movntss, movntsd */ [0x2c] = SSE_SPECIAL, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */ [0x2d] = SSE_SPECIAL, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */ - [0x2e] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR, + [0x2e] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_ucomiss, gen_helper_ucomisd, NULL, NULL), - [0x2f] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR, + [0x2f] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_comiss, gen_helper_comisd, NULL, NULL), [0x50] = SSE_SPECIAL, /* movmskps, movmskpd */ - [0x51] = OP(op1, SSE_OPF_SCALAR, + [0x51] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_sqrtps_xmm, gen_helper_sqrtpd_xmm, gen_helper_sqrtss, gen_helper_sqrtsd), - [0x52] = OP(op1, SSE_OPF_SCALAR, + [0x52] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_rsqrtps_xmm, NULL, gen_helper_rsqrtss, NULL), - [0x53] = OP(op1, SSE_OPF_SCALAR, + [0x53] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_rcpps_xmm, NULL, gen_helper_rcpss, NULL), - [0x54] = SSE_OP(pand, pand, op1, 0), /* andps, andpd */ - [0x55] = SSE_OP(pandn, pandn, op1, 0), /* andnps, andnpd */ - [0x56] = SSE_OP(por, por, op1, 0), /* orps, orpd */ - [0x57] = SSE_OP(pxor, pxor, op1, 0), /* xorps, xorpd */ + [0x54] = SSE_OP(pand, pand, op2, 0), /* andps, andpd */ + [0x55] = SSE_OP(pandn, pandn, op2, 0), /* andnps, andnpd */ + [0x56] = SSE_OP(por, por, op2, 0), /* orps, orpd */ + [0x57] = SSE_OP(pxor, pxor, op2, 0), /* xorps, xorpd */ [0x58] = SSE_FOP(add), [0x59] = SSE_FOP(mul), - [0x5a] = OP(op1, SSE_OPF_SCALAR, + [0x5a] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0, gen_helper_cvtps2pd_xmm, gen_helper_cvtpd2ps_xmm, gen_helper_cvtss2sd, gen_helper_cvtsd2ss), - [0x5b] = OP(op1, 0, + [0x5b] = OP(op1, SSE_OPF_V0, gen_helper_cvtdq2ps_xmm, gen_helper_cvtps2dq_xmm, gen_helper_cvttps2dq_xmm, NULL), [0x5c] = 
SSE_FOP(sub), @@ -2987,7 +3001,7 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x5f] = SSE_FOP(max), [0xc2] = SSE_FOP(cmpeq), /* sse_op_table4 */ - [0xc6] = SSE_OP(shufps, shufpd, op1i, SSE_OPF_SHUF), + [0xc6] = SSE_OP(shufps, shufpd, op2i, SSE_OPF_SHUF), /* SSSE3, SSE4, MOVBE, CRC32, BMI1, BMI2, ADX. */ [0x38] = SSE_SPECIAL, @@ -3006,13 +3020,13 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x69] = MMX_OP(punpckhwd), [0x6a] = MMX_OP(punpckhdq), [0x6b] = MMX_OP(packssdw), - [0x6c] = OP(op1, SSE_OPF_MMX, + [0x6c] = OP(op2, SSE_OPF_MMX, NULL, gen_helper_punpcklqdq_xmm, NULL, NULL), - [0x6d] = OP(op1, SSE_OPF_MMX, + [0x6d] = OP(op2, SSE_OPF_MMX, NULL, gen_helper_punpckhqdq_xmm, NULL, NULL), [0x6e] = SSE_SPECIAL, /* movd mm, ea */ [0x6f] = SSE_SPECIAL, /* movq, movdqa, , movqdu */ - [0x70] = OP(op1i, SSE_OPF_SHUF | SSE_OPF_MMX, + [0x70] = OP(op1i, SSE_OPF_SHUF | SSE_OPF_MMX | SSE_OPF_V0, gen_helper_pshufw_mmx, gen_helper_pshufd_xmm, gen_helper_pshufhw_xmm, gen_helper_pshuflw_xmm), [0x71] = SSE_SPECIAL, /* shiftw */ @@ -3023,17 +3037,17 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0x76] = MMX_OP(pcmpeql), [0x77] = SSE_SPECIAL, /* emms */ [0x78] = SSE_SPECIAL, /* extrq_i, insertq_i (sse4a) */ - [0x79] = OP(op1, 0, + [0x79] = OP(op1, SSE_OPF_V0, NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r), - [0x7c] = OP(op1, 0, + [0x7c] = OP(op2, 0, NULL, gen_helper_haddpd_xmm, NULL, gen_helper_haddps_xmm), - [0x7d] = OP(op1, 0, + [0x7d] = OP(op2, 0, NULL, gen_helper_hsubpd_xmm, NULL, gen_helper_hsubps_xmm), [0x7e] = SSE_SPECIAL, /* movd, movd, , movq */ [0x7f] = SSE_SPECIAL, /* movq, movdqa, movdqu */ [0xc4] = SSE_SPECIAL, /* pinsrw */ [0xc5] = SSE_SPECIAL, /* pextrw */ - [0xd0] = OP(op1, 0, + [0xd0] = OP(op2, 0, NULL, gen_helper_addsubpd_xmm, NULL, gen_helper_addsubps_xmm), [0xd1] = MMX_OP(psrlw), [0xd2] = MMX_OP(psrld), @@ -3056,7 +3070,7 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0xe3] = 
MMX_OP(pavgw), [0xe4] = MMX_OP(pmulhuw), [0xe5] = MMX_OP(pmulhw), - [0xe6] = OP(op1, 0, + [0xe6] = OP(op1, SSE_OPF_V0, NULL, gen_helper_cvttpd2dq_xmm, gen_helper_cvtdq2pd_xmm, gen_helper_cvtpd2dq_xmm), [0xe7] = SSE_SPECIAL, /* movntq, movntq */ @@ -3075,7 +3089,7 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { [0xf4] = MMX_OP(pmuludq), [0xf5] = MMX_OP(pmaddwd), [0xf6] = MMX_OP(psadbw), - [0xf7] = OP(op1t, SSE_OPF_MMX, + [0xf7] = OP(op1t, SSE_OPF_MMX | SSE_OPF_V0, gen_helper_maskmov_mmx, gen_helper_maskmov_xmm, NULL, NULL), [0xf8] = MMX_OP(psubb), [0xf9] = MMX_OP(psubw), @@ -3093,7 +3107,7 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { #define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm } -static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = { +static const SSEFunc_0_eppp sse_op_table2[3 * 8][2] = { [0 + 2] = MMX_OP2(psrlw), = { +static const SSEFunc_0_eppp sse_op_table4[8][4] = { SSE_CMP(cmpeq), SSE_CMP(cmplt), From patchwork Sun Sep 11 23:03:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973124 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CD278C6FA83 for ; Sun, 11 Sep 2022 23:24:41 +0000 (UTC) Received: from localhost ([::1]:47606 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWJb-0002UB-UW for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:24:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40818) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0w-0000oD-6R for qemu-devel@nongnu.org; Sun, 11 Sep 2022 
19:05:27 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:38257) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0s-0007Do-0v for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937515; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YmKHaO/r+hE0N0Rf5uQ/ZkbpDUu41y7LeqCjf/WSlvI=; b=cE6HO8yFU7Ktthmo6D6B1q/LKohNtgwY24ZsGzeCUxJeZ1xdjO/fNl0ymPE1j5pEYm6A3g mHCYTAi6aCOvVcvPhR32mQK/XagfGTabSh8u8zihqcAvRe0Vt499F6SniYj/rOaEOMM86M hOp8RebpOdV0sUeZKPiwgGE6+5VDpn0= Received: from mail-ed1-f71.google.com (mail-ed1-f71.google.com [209.85.208.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-307-FH9NFhIyPeSGdmU4vCpf0g-1; Sun, 11 Sep 2022 19:05:14 -0400 X-MC-Unique: FH9NFhIyPeSGdmU4vCpf0g-1 Received: by mail-ed1-f71.google.com with SMTP id i17-20020a05640242d100b0044f18a5379aso5053814edc.21 for ; Sun, 11 Sep 2022 16:05:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=YmKHaO/r+hE0N0Rf5uQ/ZkbpDUu41y7LeqCjf/WSlvI=; b=Vp3enogvInCal7wIjfR+IBTTNPwu8DzSERv3AYBE2AmW3qyV6pX6Xek8BYibczweO/ BU68PkfwcA0a8knJiuKIid0eNse06LvX4AXhkqPNMSiXNEUeden98HhXZLUjRp86VRLF amO/bLXchhP6ZjLLHjOlQdzc6QSQMArHuZerTwhN9wJgJh8WSwSokTy54h2VJZn5sUQM OheXewSPTQFJ9UxSUxP7cZw/6QD/pVXcjqmXh82p3r6fPbctLZ7K9OJSFV5+iGuldolm 7xLNBESa1ztMcHO6ZJSqMsz+Jxxn/sCoHjuhBPy91B+FuC8r4SzekNmnIcP54fsC6Uum ClZA== X-Gm-Message-State: ACgBeo0qANMBWMJaW/r0Zy8lH94WgXSgzN+zuhOshVIGvUJyXI9CbMEE 
2ppnRfDqtCKR+AJjgBrU1lRJRLD2vwHGXrVvRIX9EUC4RToHZvoDMcyu+SFGA94wXsSf8GeazV4 pKAaFxniWd0psDWReSf0+OaNCtCtNxzPVSW2KCIPmqLNRARsU0nJrkbzKJw2dMxBOWvw= X-Received: by 2002:a17:907:7b9d:b0:77d:8cd3:ea3a with SMTP id ne29-20020a1709077b9d00b0077d8cd3ea3amr1332795ejc.746.1662937512648; Sun, 11 Sep 2022 16:05:12 -0700 (PDT) X-Google-Smtp-Source: AA6agR5NEgq/d7P7zcOQwkpEDh6yZhs208ZnXo5J91zkcFp5vOR4B8TChK3HsLyjy6bay3uZcfpO1w== X-Received: by 2002:a17:907:7b9d:b0:77d:8cd3:ea3a with SMTP id ne29-20020a1709077b9d00b0077d8cd3ea3amr1332784ejc.746.1662937512343; Sun, 11 Sep 2022 16:05:12 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id f10-20020a170906048a00b0073d83f80b05sm3512566eja.94.2022.09.11.16.05.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:11 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 16/37] target/i386: support operand merging in binary scalar helpers Date: Mon, 12 Sep 2022 01:03:56 +0200 Message-Id: <20220911230418.340941-17-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" 
Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the helpers to provide this functionality. Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/ops_sse.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index fb8733f509..527da59299 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -557,12 +557,20 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) \ void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *v, Reg *s)\ { \ + int i; \ d->ZMM_S(0) = F(32, v->ZMM_S(0), s->ZMM_S(0)); \ + for (i = 1; i < 2 << SHIFT; i++) { \ + d->ZMM_L(i) = v->ZMM_L(i); \ + } \ } \ \ void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *v, Reg *s)\ { \ + int i; \ d->ZMM_D(0) = F(64, v->ZMM_D(0), s->ZMM_D(0)); \ + for (i = 1; i < 1 << SHIFT; i++) { \ + d->ZMM_Q(i) = v->ZMM_Q(i); \ + } \ } #else @@ -1027,12 +1035,20 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) SSE_HELPER_CMP_P(name, F, C) \ void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ + int i; \ d->ZMM_L(0) = C(F(32, v->ZMM_S(0), s->ZMM_S(0))) ? -1 : 0; \ + for (i = 1; i < 2 << SHIFT; i++) { \ + d->ZMM_L(i) = v->ZMM_L(i); \ + } \ } \ \ void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *v, Reg *s) \ { \ + int i; \ d->ZMM_Q(0) = C(F(64, v->ZMM_D(0), s->ZMM_D(0))) ? 
-1 : 0; \ + for (i = 1; i < 1 << SHIFT; i++) { \ + d->ZMM_Q(i) = v->ZMM_Q(i); \ + } \ } #define FPU_EQ(x) (x == float_relation_equal) From patchwork Sun Sep 11 23:03:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973128 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AA016ECAAD3 for ; Sun, 11 Sep 2022 23:28:08 +0000 (UTC) Received: from localhost ([::1]:58918 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWMx-0008Dd-Qj for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:28:07 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40822) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0y-0000oY-Bu for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:27 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:33317) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW0v-0007GN-Gu for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937518; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QwFcav4CIlFUIW5RddyQog5+RBVD8ZgkmZmuUdbZRBs=; b=VwXTHubWHahrpM8NQ7MEGXwnw2o+2ha+NOfJ/HIky5r9eyeokx5JJ4aBtlwGSwP9m0dFZx w6vc0DxEZqKFqLHStwjOrw6zwAgEgnalOr9VYVmUoGRJAKR7zFnLZGjCr+miMqXHMJt88n OrfvQXfQUKOS+vnAU8D35C3qehJ/d1A= Received: from 
mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-100-rViXsvwLMiO6CljFkOokUQ-1; Sun, 11 Sep 2022 19:05:17 -0400 X-MC-Unique: rViXsvwLMiO6CljFkOokUQ-1 Received: by mail-ed1-f69.google.com with SMTP id y14-20020a056402440e00b0044301c7ccd9so4951697eda.19 for ; Sun, 11 Sep 2022 16:05:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=QwFcav4CIlFUIW5RddyQog5+RBVD8ZgkmZmuUdbZRBs=; b=Nay74O3xw5rQ9MeNyuNbYTRSb9DtVNVxdhe48EHkSsmcw0Y6vxgEf+IoU7TYrtwDXe KWczuSTO9Sw8G2uoR8Pewy3ziI6OMzEakRiINyJg9pSaEkXAln2t20ZMM3jStR4zP2se vsmxvs1GgETAkAiMyC3T3WFrwa/DLciiLnCcL+7eaazMMfp1RBM0ZhLu8eZJWWiaRuxC l/mZq4BXqcZc72SyJPjKoix6WOBIlXQunRC/GK9wSPglTGLP05B7Z2cH0lIM1qt2iiJb czhigb1p66K3DZ9d4ITR7FLwWXnaN3ArDSFYn4vKYSljkIF9rFgnNpfgesRaboC//gr0 mxzA== X-Gm-Message-State: ACgBeo0l0gLgYOxpnJActII60dq2fyBeO4/z4/0Lwa2A6x5uYLVO9bqv tdrRNtvyJcOfxdgz15osU6yCe0c3nzb+7FTJkVCUmrb4X/IYmpF3cXd8aIk7f+OgjAAjC0KQl4Z +1ZIQOwPDNgctZm+SAw53S+Jr4SNSCY7m2zNdhlPONuG3ZJpPuCJ//ox/9SgkIzKuoSQ= X-Received: by 2002:a17:907:b09:b0:76f:99cc:81cd with SMTP id h9-20020a1709070b0900b0076f99cc81cdmr16145581ejl.530.1662937515712; Sun, 11 Sep 2022 16:05:15 -0700 (PDT) X-Google-Smtp-Source: AA6agR6ZHRFq9ZMsZkRKDITyjMJ94vqFrb/FHXbpAneuZ53FROcyaryOR8tQcQwJKoPim1fAfZL6Aw== X-Received: by 2002:a17:907:b09:b0:76f:99cc:81cd with SMTP id h9-20020a1709070b0900b0076f99cc81cdmr16145567ejl.530.1662937515284; Sun, 11 Sep 2022 16:05:15 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. 
[93.44.39.154]) by smtp.gmail.com with ESMTPSA id g11-20020a170906538b00b0073ddd36ba8csm3505505ejo.145.2022.09.11.16.05.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:14 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 17/37] target/i386: provide 3-operand versions of unary scalar helpers Date: Mon, 12 Sep 2022 01:03:57 +0200 Message-Id: <20220911230418.340941-18-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx. 
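The merging behaviour described above can be sketched in isolation as follows. This is only a minimal illustration, not code from this series: DemoReg and demo_rcpss are hypothetical stand-ins for ZMMReg and helper_rcpss, and plain float division replaces the softfloat calls. It shows a unary scalar helper taking three operands: element 0 is computed from s, and the remaining elements are copied from v.

```c
#include <assert.h>

/* Hypothetical stand-in for one 128-bit XMM register viewed as 4 floats. */
typedef struct {
    float lane[4];
} DemoReg;

/*
 * Sketch of a 3-operand unary scalar helper (modelled loosely on RCPSS):
 *   d[0]    = 1 / s[0]   (the scalar result comes from s only)
 *   d[1..3] = v[1..3]    (the upper elements are merged from v)
 * Element 0 is written before the copy loop, and the loop never reads s,
 * so the function also works when d aliases v or s.
 */
static void demo_rcpss(DemoReg *d, const DemoReg *v, const DemoReg *s)
{
    int i;

    d->lane[0] = 1.0f / s->lane[0];
    for (i = 1; i < 4; i++) {
        d->lane[i] = v->lane[i];
    }
}

/* Small self-check covering the distinct-register and d == v cases. */
static void demo_check(void)
{
    DemoReg d = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    DemoReg v = { { 1.0f, 2.0f, 3.0f, 4.0f } };
    DemoReg s = { { 4.0f, 9.0f, 9.0f, 9.0f } };

    /* Three distinct registers: upper elements come from v, not s. */
    demo_rcpss(&d, &v, &s);
    assert(d.lane[0] == 0.25f);
    assert(d.lane[1] == 2.0f && d.lane[2] == 3.0f && d.lane[3] == 4.0f);

    /* d == v: the copy loop degenerates to a no-op. */
    demo_rcpss(&v, &v, &s);
    assert(v.lane[0] == 0.25f);
    assert(v.lane[1] == 2.0f && v.lane[2] == 3.0f && v.lane[3] == 4.0f);
}
```

When the same register is passed as both d and v, which is presumably what a two-operand SSE caller would do, the copy loop changes nothing and the result matches the old two-operand behaviour.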
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/ops_sse.h        | 48 ++++++++++++++++++++++++++++++------
 target/i386/ops_sse_header.h | 16 ++++++------
 target/i386/tcg/translate.c  | 22 ++++++++++-------
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 527da59299..0d56f0949b 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -617,14 +617,22 @@ void glue(helper_sqrtpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 }
 
 #if SHIFT == 1
-void helper_sqrtss(CPUX86State *env, Reg *d, Reg *s)
+void helper_sqrtss(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
+    int i;
     d->ZMM_S(0) = float32_sqrt(s->ZMM_S(0), &env->sse_status);
+    for (i = 1; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = v->ZMM_L(i);
+    }
 }
 
-void helper_sqrtsd(CPUX86State *env, Reg *d, Reg *s)
+void helper_sqrtsd(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
+    int i;
     d->ZMM_D(0) = float64_sqrt(s->ZMM_D(0), &env->sse_status);
+    for (i = 1; i < 1 << SHIFT; i++) {
+        d->ZMM_Q(i) = v->ZMM_Q(i);
+    }
 }
 #endif
 
@@ -649,14 +657,22 @@ void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 }
 
 #if SHIFT == 1
-void helper_cvtss2sd(CPUX86State *env, Reg *d, Reg *s)
+void helper_cvtss2sd(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
+    int i;
     d->ZMM_D(0) = float32_to_float64(s->ZMM_S(0), &env->sse_status);
+    for (i = 1; i < 1 << SHIFT; i++) {
+        d->ZMM_Q(i) = v->ZMM_Q(i);
+    }
 }
 
-void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *s)
+void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
+    int i;
     d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), &env->sse_status);
+    for (i = 1; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = v->ZMM_L(i);
+    }
 }
 #endif
 
@@ -876,13 +892,17 @@ void glue(helper_rsqrtps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 }
 
 #if SHIFT == 1
-void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
+void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *v, ZMMReg *s)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int i;
     d->ZMM_S(0) = float32_div(float32_one,
                               float32_sqrt(s->ZMM_S(0), &env->sse_status),
                               &env->sse_status);
     set_float_exception_flags(old_flags, &env->sse_status);
+    for (i = 1; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = v->ZMM_L(i);
+    }
 }
 #endif
 
@@ -897,10 +917,14 @@ void glue(helper_rcpps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 }
 
 #if SHIFT == 1
-void helper_rcpss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
+void helper_rcpss(CPUX86State *env, ZMMReg *d, ZMMReg *v, ZMMReg *s)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
+    int i;
     d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status);
+    for (i = 1; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = v->ZMM_L(i);
+    }
     set_float_exception_flags(old_flags, &env->sse_status);
 }
 #endif
@@ -1798,11 +1822,12 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 }
 
 #if SHIFT == 1
-void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
+void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
                                   uint32_t mode)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
+    int i;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
@@ -1823,6 +1848,9 @@ void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     }
 
     d->ZMM_S(0) = float32_round_to_int(s->ZMM_S(0), &env->sse_status);
+    for (i = 1; i < 2 << SHIFT; i++) {
+        d->ZMM_L(i) = v->ZMM_L(i);
+    }
 
     if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
@@ -1832,11 +1860,12 @@ void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
-void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
+void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
                                   uint32_t mode)
 {
     uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
+    int i;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
@@ -1857,6 +1886,9 @@ void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     }
 
     d->ZMM_D(0) = float64_round_to_int(s->ZMM_D(0), &env->sse_status);
+    for (i = 1; i < 1 << SHIFT; i++) {
+        d->ZMM_Q(i) = v->ZMM_Q(i);
+    }
 
     if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 21fed7fa05..5d17146049 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -148,8 +148,8 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int)
     DEF_HELPER_4(name ## sd, void, env, Reg, Reg, Reg)
 #define SSE_HELPER_S3(name) \
     SSE_HELPER_P3(name) \
-    DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \
-    DEF_HELPER_3(name ## sd, void, env, Reg, Reg)
+    DEF_HELPER_4(name ## ss, void, env, Reg, Reg, Reg) \
+    DEF_HELPER_4(name ## sd, void, env, Reg, Reg, Reg)
 #else
 #define SSE_HELPER_S4(name, ...) SSE_HELPER_P4(name)
 #define SSE_HELPER_S3(name, ...) SSE_HELPER_P3(name)
@@ -179,8 +179,8 @@ DEF_HELPER_3(glue(cvttps2dq, SUFFIX), void, env, ZMMReg, ZMMReg)
 DEF_HELPER_3(glue(cvttpd2dq, SUFFIX), void, env, ZMMReg, ZMMReg)
 
 #if SHIFT == 1
-DEF_HELPER_3(cvtss2sd, void, env, Reg, Reg)
-DEF_HELPER_3(cvtsd2ss, void, env, Reg, Reg)
+DEF_HELPER_4(cvtss2sd, void, env, Reg, Reg, Reg)
+DEF_HELPER_4(cvtsd2ss, void, env, Reg, Reg, Reg)
 DEF_HELPER_3(cvtpi2ps, void, env, ZMMReg, MMXReg)
 DEF_HELPER_3(cvtpi2pd, void, env, ZMMReg, MMXReg)
 DEF_HELPER_3(cvtsi2ss, void, env, ZMMReg, i32)
@@ -214,8 +214,8 @@ DEF_HELPER_3(glue(rsqrtps, SUFFIX), void, env, ZMMReg, ZMMReg)
 DEF_HELPER_3(glue(rcpps, SUFFIX), void, env, ZMMReg, ZMMReg)
 
 #if SHIFT == 1
-DEF_HELPER_3(rsqrtss, void, env, ZMMReg, ZMMReg)
-DEF_HELPER_3(rcpss, void, env, ZMMReg, ZMMReg)
+DEF_HELPER_4(rsqrtss, void, env, ZMMReg, ZMMReg, ZMMReg)
+DEF_HELPER_4(rcpss, void, env, ZMMReg, ZMMReg, ZMMReg)
 DEF_HELPER_3(extrq_r, void, env, ZMMReg, ZMMReg)
 DEF_HELPER_4(extrq_i, void, env, ZMMReg, int, int)
 DEF_HELPER_3(insertq_r, void, env, ZMMReg, ZMMReg)
@@ -342,8 +342,8 @@ DEF_HELPER_3(glue(phminposuw, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_4(glue(roundps, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_4(glue(roundpd, SUFFIX), void, env, Reg, Reg, i32)
 #if SHIFT == 1
-DEF_HELPER_4(roundss_xmm, void, env, Reg, Reg, i32)
-DEF_HELPER_4(roundsd_xmm, void, env, Reg, Reg, i32)
+DEF_HELPER_5(roundss_xmm, void, env, Reg, Reg, Reg, i32)
+DEF_HELPER_5(roundsd_xmm, void, env, Reg, Reg, Reg, i32)
 #endif
 DEF_HELPER_5(glue(blendps, SUFFIX), void, env, Reg, Reg, Reg, i32)
 DEF_HELPER_5(glue(blendpd, SUFFIX), void, env, Reg, Reg, Reg, i32)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e996aab541..e147a95c5f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2934,6 +2934,9 @@ static bool first = true; static unsigned long limit;
 #define SSE_OP(sname, dname, op, flags) OP(op, flags, \
         gen_helper_##sname##_xmm, gen_helper_##dname##_xmm, NULL, NULL)
 
+#define SSE_OP_UNARY(a, b, c, d) \
+    {SSE_OPF_SCALAR | SSE_OPF_V0, {{.op1 = a}, {.op1 = b}, {.op2 = c}, {.op2 = d} } }
+
 typedef union SSEFuncs {
     SSEFunc_0_epp op1;
     SSEFunc_0_ppi op1i;
@@ -2976,12 +2979,12 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = {
     [0x2f] = OP(op1, SSE_OPF_CMP | SSE_OPF_SCALAR | SSE_OPF_V0,
             gen_helper_comiss, gen_helper_comisd, NULL, NULL),
     [0x50] = SSE_SPECIAL, /* movmskps, movmskpd */
-    [0x51] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+    [0x51] = SSE_OP_UNARY(
                 gen_helper_sqrtps_xmm, gen_helper_sqrtpd_xmm,
                 gen_helper_sqrtss, gen_helper_sqrtsd),
-    [0x52] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+    [0x52] = SSE_OP_UNARY(
                 gen_helper_rsqrtps_xmm, NULL, gen_helper_rsqrtss, NULL),
-    [0x53] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
+    [0x53] = SSE_OP_UNARY(
                 gen_helper_rcpps_xmm, NULL, gen_helper_rcpss, NULL),
     [0x54] = SSE_OP(pand, pand, op2, 0), /* andps, andpd */
     [0x55] = SSE_OP(pandn, pandn, op2, 0), /* andnps, andnpd */
@@ -2989,9 +2992,9 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = {
     [0x57] = SSE_OP(pxor, pxor, op2, 0), /* xorps, xorpd */
     [0x58] = SSE_FOP(add),
     [0x59] = SSE_FOP(mul),
-    [0x5a] = OP(op1, SSE_OPF_SCALAR | SSE_OPF_V0,
-                gen_helper_cvtps2pd_xmm, gen_helper_cvtpd2ps_xmm,
-                gen_helper_cvtss2sd, gen_helper_cvtsd2ss),
+    [0x5a] = SSE_OP_UNARY(
+                gen_helper_cvtps2pd_xmm, gen_helper_cvtpd2ps_xmm,
+                gen_helper_cvtss2sd, gen_helper_cvtsd2ss),
     [0x5b] = OP(op1, SSE_OPF_V0,
             gen_helper_cvtdq2ps_xmm, gen_helper_cvtps2dq_xmm,
             gen_helper_cvttps2dq_xmm, NULL),
@@ -3287,8 +3290,8 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = {
 static const struct SSEOpHelper_table7 sse_op_table7[256] = {
     [0x08] = UNARY_OP(roundps, SSE41, 0),
     [0x09] = UNARY_OP(roundpd, SSE41, 0),
-    [0x0a] = UNARY_OP(roundss, SSE41, SSE_OPF_SCALAR),
-    [0x0b] = UNARY_OP(roundsd, SSE41, SSE_OPF_SCALAR),
+    [0x0a] = BINARY_OP(roundss, SSE41, SSE_OPF_SCALAR),
+    [0x0b] = BINARY_OP(roundsd, SSE41, SSE_OPF_SCALAR),
     [0x0c] = BINARY_OP(blendps, SSE41, 0),
     [0x0d] = BINARY_OP(blendpd, SSE41, 0),
     [0x0e] = BINARY_OP(pblendw, SSE41, SSE_OPF_MMX),
@@ -4549,7 +4552,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
         tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
         tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
-        if (sse_op_flags & SSE_OPF_V0) {
+        if ((sse_op_flags & SSE_OPF_V0) &&
+            !((sse_op_flags & SSE_OPF_SCALAR) && b1 >= 2)) {
             if (sse_op_flags & SSE_OPF_SHUF) {
                 val = x86_ldub_code(env, s);
                 sse_op_fn.op1i(s->ptr0, s->ptr1, tcg_const_i32(val));

From patchwork Sun Sep 11 23:03:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973125
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 18/37] target/i386: implement additional AVX comparison operators
Date: Mon, 12 Sep 2022 01:03:58 +0200
Message-Id: <20220911230418.340941-19-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

The new implementation of SSE will cover AVX from the get go, so include
the 24 extra comparison operators that are only available with the VEX
prefix.

Based on a patch by Paul Brook.
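
To make the idea of predicates over a four-way comparison result more
concrete, here is a small sketch. It is only an illustration under stated
assumptions: `Relation`, `sketch_compare` and the `pred_*` names are invented
for this example, and the sketch ignores the quiet versus signalling
distinction (that is, which comparisons raise an exception on a quiet NaN).

```c
#include <stdbool.h>

/* Hypothetical relation codes for this sketch only. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2
} Relation;

/* Classify a pair of floats; any NaN operand makes the pair unordered. */
Relation sketch_compare(float a, float b)
{
    if (a != a || b != b) {
        return REL_UNORDERED;
    }
    if (a < b) {
        return REL_LESS;
    }
    if (a > b) {
        return REL_GREATER;
    }
    return REL_EQUAL;
}

/* "Equal or unordered": true for equal operands and for NaN operands. */
bool pred_equ(Relation r)
{
    return r == REL_EQUAL || r == REL_UNORDERED;
}

/* "Greater or equal": false when the operands are unordered. */
bool pred_ge(Relation r)
{
    return r == REL_EQUAL || r == REL_GREATER;
}

/* "Not greater-or-equal": the negation, so true for unordered operands. */
bool pred_nge(Relation r)
{
    return !pred_ge(r);
}
```

The negated predicates differ from their "opposite" ordered predicates only
in the unordered case, which is why both families are needed.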
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/ops_sse.h        | 38 ++++++++++++++++++++++++++++++++++++
 target/i386/ops_sse_header.h | 27 +++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 0d56f0949b..93cee330d2 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1075,10 +1075,21 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
         } \
     }
 
+static inline bool FPU_EQU(FloatRelation x)
+{
+    return (x == float_relation_equal || x == float_relation_unordered);
+}
+static inline bool FPU_GE(FloatRelation x)
+{
+    return (x == float_relation_equal || x == float_relation_greater);
+}
 #define FPU_EQ(x) (x == float_relation_equal)
 #define FPU_LT(x) (x == float_relation_less)
 #define FPU_LE(x) (x <= float_relation_equal)
+#define FPU_GT(x) (x == float_relation_greater)
 #define FPU_UNORD(x) (x == float_relation_unordered)
+// We must make sure we evaluate the argument in case it is a signalling NAN
+#define FPU_FALSE(x) (x == float_relation_equal && 0)
 
 #define FPU_CMPQ(size, a, b) \
     float ## size ## _compare_quiet(a, b, &env->sse_status)
@@ -1098,6 +1109,33 @@ SSE_HELPER_CMP(cmpnlt, FPU_CMPS, !FPU_LT)
 SSE_HELPER_CMP(cmpnle, FPU_CMPS, !FPU_LE)
 SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD)
 
+SSE_HELPER_CMP(cmpequ, FPU_CMPQ, FPU_EQU)
+SSE_HELPER_CMP(cmpnge, FPU_CMPS, !FPU_GE)
+SSE_HELPER_CMP(cmpngt, FPU_CMPS, !FPU_GT)
+SSE_HELPER_CMP(cmpfalse, FPU_CMPQ, FPU_FALSE)
+SSE_HELPER_CMP(cmpnequ, FPU_CMPQ, !FPU_EQU)
+SSE_HELPER_CMP(cmpge, FPU_CMPS, FPU_GE)
+SSE_HELPER_CMP(cmpgt, FPU_CMPS, FPU_GT)
+SSE_HELPER_CMP(cmptrue, FPU_CMPQ, !FPU_FALSE)
+
+SSE_HELPER_CMP(cmpeqs, FPU_CMPS, FPU_EQ)
+SSE_HELPER_CMP(cmpltq, FPU_CMPQ, FPU_LT)
+SSE_HELPER_CMP(cmpleq, FPU_CMPQ, FPU_LE)
+SSE_HELPER_CMP(cmpunords, FPU_CMPS, FPU_UNORD)
+SSE_HELPER_CMP(cmpneqq, FPU_CMPS, !FPU_EQ)
+SSE_HELPER_CMP(cmpnltq, FPU_CMPQ, !FPU_LT)
+SSE_HELPER_CMP(cmpnleq, FPU_CMPQ, !FPU_LE)
+SSE_HELPER_CMP(cmpords, FPU_CMPS, !FPU_UNORD)
+
+SSE_HELPER_CMP(cmpequs, FPU_CMPS, FPU_EQU)
+SSE_HELPER_CMP(cmpngeq, FPU_CMPQ, !FPU_GE)
+SSE_HELPER_CMP(cmpngtq, FPU_CMPQ, !FPU_GT)
+SSE_HELPER_CMP(cmpfalses, FPU_CMPS, FPU_FALSE)
+SSE_HELPER_CMP(cmpnequs, FPU_CMPS, !FPU_EQU)
+SSE_HELPER_CMP(cmpgeq, FPU_CMPQ, FPU_GE)
+SSE_HELPER_CMP(cmpgtq, FPU_CMPQ, FPU_GT)
+SSE_HELPER_CMP(cmptrues, FPU_CMPS, !FPU_FALSE)
+
 #undef SSE_HELPER_CMP
 
 #if SHIFT == 1
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 5d17146049..4bef536edb 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -237,6 +237,33 @@ SSE_HELPER_CMP(cmpnlt, FPU_CMPS, !FPU_LT)
 SSE_HELPER_CMP(cmpnle, FPU_CMPS, !FPU_LE)
 SSE_HELPER_CMP(cmpord, FPU_CMPQ, !FPU_UNORD)
 
+SSE_HELPER_CMP(cmpequ, FPU_CMPQ, FPU_EQU)
+SSE_HELPER_CMP(cmpnge, FPU_CMPS, !FPU_GE)
+SSE_HELPER_CMP(cmpngt, FPU_CMPS, !FPU_GT)
+SSE_HELPER_CMP(cmpfalse, FPU_CMPQ, FPU_FALSE)
+SSE_HELPER_CMP(cmpnequ, FPU_CMPQ, !FPU_EQU)
+SSE_HELPER_CMP(cmpge, FPU_CMPS, FPU_GE)
+SSE_HELPER_CMP(cmpgt, FPU_CMPS, FPU_GT)
+SSE_HELPER_CMP(cmptrue, FPU_CMPQ, !FPU_FALSE)
+
+SSE_HELPER_CMP(cmpeqs, FPU_CMPS, FPU_EQ)
+SSE_HELPER_CMP(cmpltq, FPU_CMPQ, FPU_LT)
+SSE_HELPER_CMP(cmpleq, FPU_CMPQ, FPU_LE)
+SSE_HELPER_CMP(cmpunords, FPU_CMPS, FPU_UNORD)
+SSE_HELPER_CMP(cmpneqq, FPU_CMPS, !FPU_EQ)
+SSE_HELPER_CMP(cmpnltq, FPU_CMPQ, !FPU_LT)
+SSE_HELPER_CMP(cmpnleq, FPU_CMPQ, !FPU_LE)
+SSE_HELPER_CMP(cmpords, FPU_CMPS, !FPU_UNORD)
+
+SSE_HELPER_CMP(cmpequs, FPU_CMPS, FPU_EQU)
+SSE_HELPER_CMP(cmpngeq, FPU_CMPQ, !FPU_GE)
+SSE_HELPER_CMP(cmpngtq, FPU_CMPQ, !FPU_GT)
+SSE_HELPER_CMP(cmpfalses, FPU_CMPS, FPU_FALSE)
+SSE_HELPER_CMP(cmpnequs, FPU_CMPS, !FPU_EQU)
+SSE_HELPER_CMP(cmpgeq, FPU_CMPQ, FPU_GE)
+SSE_HELPER_CMP(cmpgtq, FPU_CMPQ, FPU_GT)
+SSE_HELPER_CMP(cmptrues, FPU_CMPS, !FPU_FALSE)
+
 #if SHIFT == 1
 DEF_HELPER_3(ucomiss, void, env, Reg, Reg)
 DEF_HELPER_3(comiss, void, env, Reg, Reg)

From patchwork Sun Sep 11 23:03:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973122
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 19/37] target/i386: Introduce 256-bit vector helpers
Date: Mon, 12 Sep 2022 01:03:59 +0200
Message-Id: <20220911230418.340941-20-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

The new implementation of SSE will cover AVX from the get go, because
all the work for the helper functions is already done. We just need to
build them.
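
The "build them once more" idea can be sketched as follows. This is an
assumption-laden illustration, not the QEMU mechanism: the patch re-includes
a template header with a different SHIFT, whereas a single self-contained
file cannot do that, so the sketch swaps in a function-like macro. `VecReg`,
`DEFINE_PADDL` and `sketch_paddl` are hypothetical names.

```c
#include <stdint.h>

#define glue_(a, b) a ## b
#define glue(a, b) glue_(a, b)

/* Hypothetical register, wide enough for the 256-bit variant. */
typedef struct {
    uint32_t l[8];
} VecReg;

/*
 * One body, several instantiations.  SHIFT = 1 covers 4 lanes of 32 bits
 * (128 bits), SHIFT = 2 covers 8 lanes (256 bits); the loop bound follows
 * the "2 << SHIFT" convention used by the helpers.
 */
#define DEFINE_PADDL(SHIFT, SUFFIX)                                  \
    void glue(sketch_paddl, SUFFIX)(VecReg *d, const VecReg *v,      \
                                    const VecReg *s)                 \
    {                                                                \
        int i;                                                       \
        for (i = 0; i < (2 << (SHIFT)); i++) {                       \
            d->l[i] = v->l[i] + s->l[i];                             \
        }                                                            \
    }

DEFINE_PADDL(1, _xmm)   /* defines sketch_paddl_xmm */
DEFINE_PADDL(2, _ymm)   /* defines sketch_paddl_ymm */
```

The function body is written once; only the suffix and the lane count change
between the two instantiations.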
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/helper.h         | 2 ++
 target/i386/ops_sse.h        | 5 +++++
 target/i386/ops_sse_header.h | 4 ++++
 target/i386/tcg/fpu_helper.c | 3 +++
 4 files changed, 14 insertions(+)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index ac3b4d1ee3..3da5df98b9 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -218,6 +218,8 @@ DEF_HELPER_3(movq, void, env, ptr, ptr)
 #include "ops_sse_header.h"
 #define SHIFT 1
 #include "ops_sse_header.h"
+#define SHIFT 2
+#include "ops_sse_header.h"
 
 DEF_HELPER_3(rclb, tl, env, tl, tl)
 DEF_HELPER_3(rclw, tl, env, tl, tl)
diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 93cee330d2..4f72164c0f 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -35,7 +35,11 @@
 #define W(n) ZMM_W(n)
 #define L(n) ZMM_L(n)
 #define Q(n) ZMM_Q(n)
+#if SHIFT == 1
 #define SUFFIX _xmm
+#else
+#define SUFFIX _ymm
+#endif
 #endif
 
 #define LANE_WIDTH (SHIFT ? 16 : 8)
@@ -2379,6 +2383,7 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
 #undef SSE_HELPER_S
 
+#undef LANE_WIDTH
 #undef SHIFT
 #undef XMM_ONLY
 #undef Reg
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 4bef536edb..4041816945 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -21,7 +21,11 @@
 #define SUFFIX _mmx
 #else
 #define Reg ZMMReg
+#if SHIFT == 1
 #define SUFFIX _xmm
+#else
+#define SUFFIX _ymm
+#endif
 #endif
 
 #define dh_alias_Reg ptr
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 48bf0c5cf8..819e920ec6 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -3053,3 +3053,6 @@ void helper_movq(CPUX86State *env, void *d, void *s)
 
 #define SHIFT 1
 #include "ops_sse.h"
+
+#define SHIFT 2
+#include "ops_sse.h"

From patchwork Sun Sep 11 23:04:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973137
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 20/37] target/i386: reimplement 0x0f 0x60-0x6f, add AVX
Date: Mon, 12 Sep 2022 01:04:00 +0200
Message-Id: <20220911230418.340941-21-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

These are both MMX and SSE/AVX instructions, except for vmovdqu. In both
cases the inputs and output are in s->ptr{0,1,2}, so the only difference
between MMX, SSE, and AVX is which helper to call.

PCMPGT, MOVD and MOVQ are implemented using gvec.

The amount of macro magic for generating functions is kept to the minimum.
In particular, the gvec cases are easy enough and have no duplication
within each function, so they are spelled out one by one.
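
As a rough illustration of how one decoded instruction can map to the MMX,
SSE or AVX variant, the sketch below picks the operand length in bytes from
the prefix state. The names (`SK_PREFIX_*`, `sketch_vec_len`) and the flat
argument list are assumptions made for this example; the real code works on
the decoder's own structures.

```c
#include <stdbool.h>

/* Hypothetical prefix bits, for this sketch only. */
enum {
    SK_PREFIX_DATA  = 1 << 0,   /* 0x66 operand-size prefix */
    SK_PREFIX_REPZ  = 1 << 1,   /* 0xF3 prefix */
    SK_PREFIX_REPNZ = 1 << 2    /* 0xF2 prefix */
};

/*
 * Return the vector length in bytes: 8 for the MMX form (only when the
 * opcode has one and no 66/F3/F2 prefix is present), otherwise 32 when
 * VEX.L is set and 16 when it is not.
 */
int sketch_vec_len(bool has_mmx_form, unsigned prefix, bool vex_l)
{
    if (has_mmx_form &&
        !(prefix & (SK_PREFIX_DATA | SK_PREFIX_REPZ | SK_PREFIX_REPNZ))) {
        return 8;
    }
    return vex_l ? 32 : 16;
}
```

A caller could then use the returned length (8, 16 or 32) to select the
matching helper or to size a generic vector operation.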
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/decode-new.c.inc |  35 ++++++++
 target/i386/tcg/emit.c.inc       | 148 +++++++++++++++++++++++++++++++
 target/i386/tcg/translate.c      |   3 +-
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index b31daecb90..f20587c096 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -142,6 +142,23 @@ static void decode_group17(DisasContext *s, CPUX86State *env, X86OpEntry *entry,
     entry->gen = group17_gen[op];
 }
 
+static void decode_0F6F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    if (s->prefix & PREFIX_REPNZ) {
+        entry->gen = NULL;
+    } else if (s->prefix & PREFIX_REPZ) {
+        /* movdqu */
+        entry->gen = gen_MOVDQ;
+        entry->vex_class = 4;
+        entry->vex_special = X86_VEX_SSEUnaligned;
+    } else {
+        /* MMX movq, movdqa */
+        entry->gen = gen_MOVDQ;
+        entry->vex_class = 1;
+        entry->special = X86_SPECIAL_MMX;
+    }
+}
+
 static const X86OpEntry opcodes_0F38_00toEF[240] = {
 };
 
@@ -227,8 +244,26 @@ static void decode_0F3A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui
 }
 
 static const X86OpEntry opcodes_0F[256] = {
+    [0x60] = X86_OP_ENTRY3(PUNPCKLBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x61] = X86_OP_ENTRY3(PUNPCKLWD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x62] = X86_OP_ENTRY3(PUNPCKLDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x63] = X86_OP_ENTRY3(PACKSSWB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x64] = X86_OP_ENTRY3(PCMPGTB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x65] = X86_OP_ENTRY3(PCMPGTW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x66] = X86_OP_ENTRY3(PCMPGTD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x67] = X86_OP_ENTRY3(PACKUSWB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+
     [0x38] = X86_OP_GROUP0(0F38),
     [0x3a] = X86_OP_GROUP0(0F3A),
+
+    [0x68] = X86_OP_ENTRY3(PUNPCKHBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x69] = X86_OP_ENTRY3(PUNPCKHWD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x6a] = X86_OP_ENTRY3(PUNPCKHDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x6b] = X86_OP_ENTRY3(PACKSSDW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
+    [0x6c] = X86_OP_ENTRY3(PUNPCKLQDQ, V,x, H,x, W,x, vex4 p_66 avx2_256),
+    [0x6d] = X86_OP_ENTRY3(PUNPCKHQDQ, V,x, H,x, W,x, vex4 p_66 avx2_256),
+    [0x6e] = X86_OP_ENTRY3(MOVD_to, V,x, None,None, E,y, vex5 mmx p_00_66), /* wrong dest Vy on SDM! */
+    [0x6f] = X86_OP_GROUP3(0F6F, V,x, None,None, W,x, vex5 mmx p_00_66_f3),
 };
 
 static void do_decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 36b963a0d3..3f89d3cf50 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -212,6 +212,97 @@ static void gen_writeback(DisasContext *s, X86DecodedOp *op)
     }
 }
 
+static inline int sse_vec_len(DisasContext *s, X86DecodedInsn *decode)
+{
+    if (decode->e.special == X86_SPECIAL_MMX &&
+        !(s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
+        return 8;
+    }
+    return s->vex_l ? 32 : 16;
+}
+
+static void gen_store_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, int src_ofs)
+{
+    MemOp ot = decode->op[0].ot;
+    int vec_len = sse_vec_len(s, decode);
+
+    if (!decode->op[0].has_ea) {
+        tcg_gen_gvec_mov(MO_64, decode->op[0].offset, src_ofs, vec_len, vec_len);
+        return;
+    }
+
+    switch (ot) {
+    case MO_64:
+        gen_stq_env_A0(s, src_ofs);
+        break;
+    case MO_128:
+        gen_sto_env_A0(s, src_ofs);
+        break;
+    case MO_256:
+        gen_sty_env_A0(s, src_ofs);
+        break;
+    default:
+        abort();
+    }
+}
+
+/*
+ * 00 = p* Pq, Qq (if mmx not NULL; no VEX)
+ * 66 = vp* Vx, Hx, Wx
+ *
+ * These are really the same encoding, because 1) V is the same as P when VEX.V
+ * is not present 2) P and Q are the same as H and W apart from MM/XMM
+ */
+static inline void gen_binary_int_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
+                                      SSEFunc_0_eppp mmx, SSEFunc_0_eppp xmm, SSEFunc_0_eppp ymm)
+{
+    assert (!!mmx == !!(decode->e.special == X86_SPECIAL_MMX));
+
+    if (mmx && (s->prefix & PREFIX_VEX) && !(s->prefix & PREFIX_DATA)) {
+        /* VEX encoding is not applicable to MMX instructions.
*/ + gen_illegal_opcode(s); + return; + } + if (!(s->prefix & PREFIX_DATA)) { + mmx(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } else if (!s->vex_l) { + xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } else { + ymm(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } +} + +#define BINARY_INT_MMX(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_binary_int_sse(s, env, decode, \ + gen_helper_##lname##_mmx, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} +BINARY_INT_MMX(PUNPCKLBW, punpcklbw) +BINARY_INT_MMX(PUNPCKLWD, punpcklwd) +BINARY_INT_MMX(PUNPCKLDQ, punpckldq) +BINARY_INT_MMX(PACKSSWB, packsswb) +BINARY_INT_MMX(PACKUSWB, packuswb) +BINARY_INT_MMX(PUNPCKHBW, punpckhbw) +BINARY_INT_MMX(PUNPCKHWD, punpckhwd) +BINARY_INT_MMX(PUNPCKHDQ, punpckhdq) +BINARY_INT_MMX(PACKSSDW, packssdw) + +/* Instructions with no MMX equivalent. */ +#define BINARY_INT_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_binary_int_sse(s, env, decode, \ + NULL, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} + +BINARY_INT_SSE(PUNPCKLQDQ, punpcklqdq) +BINARY_INT_SSE(PUNPCKHQDQ, punpckhqdq) + static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op) { TCGv carry_in = NULL; @@ -382,6 +473,36 @@ static void gen_MOVBE(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } } +static void gen_MOVD_to(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[2].ot; + int vec_len = sse_vec_len(s, decode); + int lo_ofs = decode->op[0].offset + - xmm_offset(decode->op[0].ot) + + xmm_offset(ot); + + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + + switch (ot) { + case MO_32: +#ifdef TARGET_X86_64 + tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1); + tcg_gen_st_i32(s->tmp3_i32, cpu_env, lo_ofs); + break; + case MO_64: +#endif + tcg_gen_st_tl(s->T1, cpu_env, lo_ofs); + break; + default: + 
abort(); + } +} + +static void gen_MOVDQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_store_sse(s, env, decode, decode->op[2].offset); +} + static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; @@ -405,6 +526,33 @@ static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } +static void gen_PCMPGTB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_GT, MO_8, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PCMPGTW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_GT, MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PCMPGTD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_GT, MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + static void gen_PDEP(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[1].ot; diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index e147a95c5f..cf18e12d38 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -23,6 +23,7 @@ #include "disas/disas.h" #include "exec/exec-all.h" #include "tcg/tcg-op.h" +#include "tcg/tcg-op-gvec.h" #include "exec/cpu_ldst.h" #include "exec/translator.h" @@ -4665,7 +4666,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #ifndef CONFIG_USER_ONLY use_new &= b <= limit; #endif - if (use_new && 0) { + if (use_new && (b >= 0x160 && b <= 0x16f)) { return disas_insn_new(s, cpu, b + 0x100); } break; From patchwork Sun Sep 11 23:04:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973149
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 21/37] target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
Date: Mon, 12 Sep 2022 01:04:01 +0200
Message-Id: <20220911230418.340941-22-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

These are more of the simple integer instructions present in both MMX and SSE/AVX; these ranges have no holes that were later occupied by newer instructions. Simple, non-saturating operations are implemented using gvec; beyond that, there is little to say.
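To illustrate the distinction (this is not part of the patch), here is a small standalone C model of a wrapping lane add (PADDB-like), which is a plain element-wise add, next to a saturating one (PADDUSB-like), which needs extra clamping and stays on the helper path in this patch. The function names are hypothetical. The last function models PANDN, which inverts its first source; that is why the patch passes op[2] before op[1] to the and-with-complement operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PADDB-style lane add: each byte wraps modulo 256. */
void model_add_wrap_u8(uint8_t *d, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(d && a && b);
    for (size_t i = 0; i < n; i++) {
        d[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* PADDUSB-style lane add: each byte clamps at 255 instead of wrapping. */
void model_add_sat_u8(uint8_t *d, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(d && a && b);
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        d[i] = sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
    }
}

/* PANDN: (~first) & second, i.e. "second and-with-complement first". */
uint64_t model_pandn64(uint64_t first, uint64_t second)
{
    return ~first & second;
}
```

For lanes 200 and 100 the wrapping add yields 44 while the saturating add yields 255; lanes that do not overflow give the same result in both.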
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 28 ++++++++ target/i386/tcg/emit.c.inc | 113 +++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 4 +- 3 files changed, 144 insertions(+), 1 deletion(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index f20587c096..59f5637583 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -264,6 +264,34 @@ static const X86OpEntry opcodes_0F[256] = { [0x6d] = X86_OP_ENTRY3(PUNPCKHQDQ, V,x, H,x, W,x, vex4 p_66 avx2_256), [0x6e] = X86_OP_ENTRY3(MOVD_to, V,x, None,None, E,y, vex5 mmx p_00_66), /* wrong dest Vy on SDM! */ [0x6f] = X86_OP_GROUP3(0F6F, V,x, None,None, W,x, vex5 mmx p_00_66_f3), + + /* Incorrectly missing from 2-17 */ + [0xd8] = X86_OP_ENTRY3(PSUBUSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd9] = X86_OP_ENTRY3(PSUBUSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xda] = X86_OP_ENTRY3(PMINUB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xdb] = X86_OP_ENTRY3(PAND, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xdc] = X86_OP_ENTRY3(PADDUSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xdd] = X86_OP_ENTRY3(PADDUSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xde] = X86_OP_ENTRY3(PMAXUB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xdf] = X86_OP_ENTRY3(PANDN, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + + [0xe8] = X86_OP_ENTRY3(PSUBSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xe9] = X86_OP_ENTRY3(PSUBSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xea] = X86_OP_ENTRY3(PMINSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xeb] = X86_OP_ENTRY3(POR, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xec] = X86_OP_ENTRY3(PADDSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xed] = X86_OP_ENTRY3(PADDSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xee] = X86_OP_ENTRY3(PMAXSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xef] = X86_OP_ENTRY3(PXOR, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + + [0xf8] = 
X86_OP_ENTRY3(PSUBB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xf9] = X86_OP_ENTRY3(PSUBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xfa] = X86_OP_ENTRY3(PSUBD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xfb] = X86_OP_ENTRY3(PSUBQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xfc] = X86_OP_ENTRY3(PADDB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xfd] = X86_OP_ENTRY3(PADDW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xfe] = X86_OP_ENTRY3(PADDD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + /* 0xff = UD0 */ }; static void do_decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 3f89d3cf50..1ba7a45668 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -290,6 +290,20 @@ BINARY_INT_MMX(PUNPCKHWD, punpckhwd) BINARY_INT_MMX(PUNPCKHDQ, punpckhdq) BINARY_INT_MMX(PACKSSDW, packssdw) +BINARY_INT_MMX(PSUBUSB, psubusb) +BINARY_INT_MMX(PSUBUSW, psubusw) +BINARY_INT_MMX(PMINUB, pminub) +BINARY_INT_MMX(PADDUSB, paddusb) +BINARY_INT_MMX(PADDUSW, paddusw) +BINARY_INT_MMX(PMAXUB, pmaxub) + +BINARY_INT_MMX(PSUBSB, psubsb) +BINARY_INT_MMX(PSUBSW, psubsw) +BINARY_INT_MMX(PMINSW, pminsw) +BINARY_INT_MMX(PADDSB, paddsb) +BINARY_INT_MMX(PADDSW, paddsw) +BINARY_INT_MMX(PMAXSW, pmaxsw) + /* Instructions with no MMX equivalent. 
*/ #define BINARY_INT_SSE(uname, lname) \ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ @@ -526,6 +540,51 @@ static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } +static void gen_PADDB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_add(MO_8, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PADDW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_add(MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PADDD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_add(MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PAND(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_and(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PANDN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_andc(MO_64, + decode->op[0].offset, decode->op[2].offset, + decode->op[1].offset, vec_len, vec_len); +} + static void gen_PCMPGTB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -571,6 +630,60 @@ static void gen_PEXT(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) gen_helper_pext(s->T0, s->T0, s->T1); } +static void gen_POR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_or(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void 
gen_PXOR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_xor(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PSUBB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_sub(MO_8, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PSUBW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_sub(MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PSUBD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_sub(MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PSUBQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_sub(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + static void gen_RORX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index cf18e12d38..11c17258eb 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4666,7 +4666,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #ifndef CONFIG_USER_ONLY use_new &= b <= limit; #endif - if (use_new && (b >= 0x160 && b <= 0x16f)) { + if (use_new && + ((b >= 0x160 && b <= 0x16f) || + (b >= 0x1d8 && b <= 0x1ff && (b & 8)))) { return disas_insn_new(s, cpu, b + 0x100); } break; From patchwork Sun Sep 11 23:04:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo 
Bonzini
X-Patchwork-Id: 12973129
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 22/37] target/i386: reimplement 0x0f 0x50-0x5f, add AVX
Date: Mon, 12 Sep 2022 01:04:02 +0200
Message-Id: <20220911230418.340941-23-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

These are mostly floating-point SSE operations. The odd ones out are MOVMSK and CVTxx2yy; the others are straightforward.

Unary operations are a bit special in AVX because they take two operands in the packed PD/PS forms (VEX.vvvv must be 1111b) and three operands in the scalar SD/SS forms. They are handled using X86_OP_GROUP3 for compactness.
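The operand-count rule above can be sketched in standalone C (not part of the patch). The enum and function names are hypothetical; the prefix bytes are the mandatory prefixes that sse_prefix() in this patch returns. The scalar forms need the extra source because the untouched upper elements of the destination are copied from it.

```c
#include <assert.h>

/* Floating-point form selected by the mandatory prefix. */
typedef enum { FORM_PS, FORM_PD, FORM_SS, FORM_SD } ModelFpForm;

/* Map a mandatory-prefix byte (0x00 meaning "none") to the form. */
ModelFpForm model_fp_form(int prefix_byte)
{
    switch (prefix_byte) {
    case 0x00: return FORM_PS;
    case 0x66: return FORM_PD;
    case 0xf3: return FORM_SS;
    case 0xf2: return FORM_SD;
    default:
        assert(!"not a mandatory SSE prefix");
        return FORM_PS;
    }
}

/* Register operands of a VEX-encoded unary operation such as VSQRT. */
int model_unary_operand_count(ModelFpForm form)
{
    return (form == FORM_SS || form == FORM_SD) ? 3 : 2;
}
```

In this model VSQRTPS and VSQRTPD count two operands, while VSQRTSS and VSQRTSD count three.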
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 32 ++++++ target/i386/tcg/emit.c.inc | 175 +++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 2 +- 3 files changed, 208 insertions(+), 1 deletion(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 59f5637583..5a94e05d71 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -243,7 +243,30 @@ static void decode_0F3A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui *entry = opcodes_0F3A[*b]; } +static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) { + entry->op1 = X86_TYPE_None; + entry->s1 = X86_SIZE_None; + } + switch (*b) { + case 0x51: entry->gen = gen_VSQRT; break; + case 0x52: entry->gen = gen_VRSQRT; break; + case 0x53: entry->gen = gen_VRCP; break; + case 0x5A: entry->gen = gen_VCVTfp2fp; break; + } +} + static const X86OpEntry opcodes_0F[256] = { + [0x50] = X86_OP_ENTRY3(MOVMSK, G,y, None,None, U,x, vex7 p_00_66), + [0x51] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x52] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex5 p_00_f3), + [0x53] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex5 p_00_f3), + [0x54] = X86_OP_ENTRY3(VAND, V,x, H,x, W,x, vex4 p_00_66), + [0x55] = X86_OP_ENTRY3(VANDN, V,x, H,x, W,x, vex4 p_00_66), + [0x56] = X86_OP_ENTRY3(VOR, V,x, H,x, W,x, vex4 p_00_66), + [0x57] = X86_OP_ENTRY3(VXOR, V,x, H,x, W,x, vex4 p_00_66), + [0x60] = X86_OP_ENTRY3(PUNPCKLBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0x61] = X86_OP_ENTRY3(PUNPCKLWD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0x62] = X86_OP_ENTRY3(PUNPCKLDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), @@ -256,6 +279,15 @@ static const X86OpEntry opcodes_0F[256] = { [0x38] = X86_OP_GROUP0(0F38), [0x3a] = X86_OP_GROUP0(0F3A), + [0x58] = X86_OP_ENTRY3(VADD, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x59] = 
X86_OP_ENTRY3(VMUL, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x5a] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex3 p_00_66_f3_f2), + [0x5b] = X86_OP_ENTRY2(VCVTps_dq, V,x, W,x, vex2 p_00_66_f3), + [0x5c] = X86_OP_ENTRY3(VSUB, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x5d] = X86_OP_ENTRY3(VMIN, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x5e] = X86_OP_ENTRY3(VDIV, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x5f] = X86_OP_ENTRY3(VMAX, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0x68] = X86_OP_ENTRY3(PUNPCKHBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0x69] = X86_OP_ENTRY3(PUNPCKHWD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0x6a] = X86_OP_ENTRY3(PUNPCKHDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 1ba7a45668..5feb50efdb 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -246,6 +246,140 @@ static void gen_store_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec } } +static inline int sse_prefix(DisasContext *s) +{ + if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) { + return s->prefix & PREFIX_REPZ ? 0xf3 : 0xf2; + } else { + return s->prefix & PREFIX_DATA ? 0x66 : 0x00; + } +} + +/* + * 00 = v*ps Vps, Hps, Wpd + * 66 = v*pd Vpd, Hpd, Wps + * f3 = v*ss Vss, Hss, Wps + * f2 = v*sd Vsd, Hsd, Wps + */ +static inline void gen_unary_fp_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_epp pd_xmm, SSEFunc_0_epp ps_xmm, + SSEFunc_0_epp pd_ymm, SSEFunc_0_epp ps_ymm, + SSEFunc_0_eppp sd, SSEFunc_0_eppp ss) +{ + if ((s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) != 0) { + SSEFunc_0_eppp fn = s->prefix & PREFIX_REPZ ? ss : sd; + if (!fn) { + gen_illegal_opcode(s); + return; + } + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } else { + SSEFunc_0_epp ps, pd, fn; + ps = s->vex_l ? ps_ymm : ps_xmm; + pd = s->vex_l ? pd_ymm : pd_xmm; + fn = s->prefix & PREFIX_DATA ? 
pd : ps; + if (!fn) { + gen_illegal_opcode(s); + return; + } + fn(cpu_env, s->ptr0, s->ptr2); + } +} +#define UNARY_FP_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_unary_fp_sse(s, env, decode, \ + gen_helper_##lname##pd_xmm, \ + gen_helper_##lname##ps_xmm, \ + gen_helper_##lname##pd_ymm, \ + gen_helper_##lname##ps_ymm, \ + gen_helper_##lname##sd, \ + gen_helper_##lname##ss); \ +} +UNARY_FP_SSE(VSQRT, sqrt) + +/* + * 00 = v*ps Vps, Hps, Wpd + * 66 = v*pd Vpd, Hpd, Wps + * f3 = v*ss Vss, Hss, Wps + * f2 = v*sd Vsd, Hsd, Wps + */ +static inline void gen_fp_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_eppp pd_xmm, SSEFunc_0_eppp ps_xmm, + SSEFunc_0_eppp pd_ymm, SSEFunc_0_eppp ps_ymm, + SSEFunc_0_eppp sd, SSEFunc_0_eppp ss) +{ + SSEFunc_0_eppp ps, pd, fn; + if ((s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) != 0) { + fn = s->prefix & PREFIX_REPZ ? ss : sd; + } else { + ps = s->vex_l ? ps_ymm : ps_xmm; + pd = s->vex_l ? pd_ymm : pd_xmm; + fn = s->prefix & PREFIX_DATA ? 
pd : ps; + } + if (fn) { + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } else { + gen_illegal_opcode(s); + } +} +#define FP_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_fp_sse(s, env, decode, \ + gen_helper_##lname##pd_xmm, \ + gen_helper_##lname##ps_xmm, \ + gen_helper_##lname##pd_ymm, \ + gen_helper_##lname##ps_ymm, \ + gen_helper_##lname##sd, \ + gen_helper_##lname##ss); \ +} +FP_SSE(VADD, add) +FP_SSE(VMUL, mul) +FP_SSE(VSUB, sub) +FP_SSE(VMIN, min) +FP_SSE(VDIV, div) +FP_SSE(VMAX, max) + +/* + * 00 = v*ps Vps, Wpd + * f3 = v*ss Vss, Wps + */ +static inline void gen_unary_fp32_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_epp ps_xmm, + SSEFunc_0_epp ps_ymm, + SSEFunc_0_eppp ss) +{ + if ((s->prefix & (PREFIX_DATA | PREFIX_REPNZ)) != 0) { + goto illegal_op; + } else if (s->prefix & PREFIX_REPZ) { + if (!ss) { + goto illegal_op; + } + ss(cpu_env, s->ptr0, s->ptr1, s->ptr2); + } else { + SSEFunc_0_epp fn = s->vex_l ? ps_ymm : ps_xmm; + if (!fn) { + goto illegal_op; + } + fn(cpu_env, s->ptr0, s->ptr2); + } + return; + +illegal_op: + gen_illegal_opcode(s); +} +#define UNARY_FP32_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_unary_fp32_sse(s, env, decode, \ + gen_helper_##lname##ps_xmm, \ + gen_helper_##lname##ps_ymm, \ + gen_helper_##lname##ss); \ +} +UNARY_FP32_SSE(VRSQRT, rsqrt) +UNARY_FP32_SSE(VRCP, rcp) + /* * 00 = p* Pq, Qq (if mmx not NULL; no VEX) * 66 = vp* Vx, Hx, Wx @@ -517,6 +651,16 @@ static void gen_MOVDQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) gen_store_sse(s, env, decode, decode->op[2].offset); } +static void gen_MOVMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + typeof(gen_helper_movmskps_ymm) *ps, *pd, *fn; + ps = s->vex_l ? gen_helper_movmskps_ymm : gen_helper_movmskps_xmm; + pd = s->vex_l ? 
gen_helper_movmskpd_ymm : gen_helper_movmskpd_xmm; + fn = s->prefix & PREFIX_DATA ? pd : ps; + fn(s->tmp2_i32, cpu_env, s->ptr2); + tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); +} + static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; @@ -733,3 +877,34 @@ static void gen_SHRX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } tcg_gen_shr_tl(s->T0, s->T0, s->T1); } + +#define gen_VAND gen_PAND +#define gen_VANDN gen_PANDN + +static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_unary_fp_sse(s, env, decode, + gen_helper_cvtpd2ps_xmm, gen_helper_cvtps2pd_xmm, + gen_helper_cvtpd2ps_ymm, gen_helper_cvtps2pd_ymm, + gen_helper_cvtsd2ss, gen_helper_cvtss2sd); +} + +static void gen_VCVTps_dq(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + SSEFunc_0_epp fn = NULL; + switch (sse_prefix(s)) { + case 0x00: + fn = s->vex_l ? gen_helper_cvtdq2ps_ymm : gen_helper_cvtdq2ps_xmm; + break; + case 0x66: + fn = s->vex_l ? gen_helper_cvtps2dq_ymm : gen_helper_cvtps2dq_xmm; + break; + case 0xf3: + fn = s->vex_l ? 
gen_helper_cvttps2dq_ymm : gen_helper_cvttps2dq_xmm; + break; + } + fn(cpu_env, s->ptr0, s->ptr2); +} + +#define gen_VOR gen_POR +#define gen_VXOR gen_PXOR diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 11c17258eb..8ef419dd59 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4667,7 +4667,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) use_new &= b <= limit; #endif if (use_new && - ((b >= 0x160 && b <= 0x16f) || + ((b >= 0x150 && b <= 0x16f) || (b >= 0x1d8 && b <= 0x1ff && (b & 8)))) { return disas_insn_new(s, cpu, b + 0x100); } From patchwork Sun Sep 11 23:04:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 78E58ECAAD3 for ; Sun, 11 Sep 2022 23:31:37 +0000 (UTC) Received: from localhost ([::1]:35842 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWQK-0005en-FR for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:31:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34146) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1G-00012n-90 for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:29431) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1D-0007LS-OT for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937535; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DBAah+7oggi8iL2POsLEQkKSu7EEFicyPdj/k6QJj1s=; b=UsQR0ikvHT/WFUNzCXWq1eiStbYgKdPghmBf5TfsLbff8Zu+F+nvrzThUwgmDIhTq9ovlY jAqf2WjnmVY7BbF86RNBfwV3IY6ifbdRhUrGRkiltPjEZNq62hzAiCvj0gMMM9TdoUcrUi EY3+W8X5tT8JdOf6f4BLoG2exFvyeeU= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-483-V04ELDmCPCSNZqRZpU47wA-1; Sun, 11 Sep 2022 19:05:33 -0400 X-MC-Unique: V04ELDmCPCSNZqRZpU47wA-1 Received: by mail-ej1-f72.google.com with SMTP id hs4-20020a1709073e8400b0073d66965277so2275946ejc.6 for ; Sun, 11 Sep 2022 16:05:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=DBAah+7oggi8iL2POsLEQkKSu7EEFicyPdj/k6QJj1s=; b=qDd5nSaop+JTkGVgkKeI4XmX6wxDbg5vS0nrpUBApTl33gdnmFw2vC4nlsowLIPHRw mXzhMtNWwUWvgVqRFrHVOGLVOrslrYNWAFqJ0tkafiIwZNQBcUpXLieWDtMjayUd/dtr yuFin7DUHwjiX93wTo9YFifQq8q9O5LGGnDZEhe7PTn6s3N8CUZaA+ZsCG3lFVA6Qoen eeaKDPtZztoDkYsBWDmVUB8boFXvXpLx4O7o2vGPNZzX+Xh6yc/1O6cHCPCabrpQFXVx ihFLlzLhxH2VOKgsVZBJlY0LwREfRmWxvLfMoRqXBqROG/HKXyg5l/hJmCj65S09zln9 jQSA== X-Gm-Message-State: ACgBeo0f6v/zoYdlZ8uBmIx7FNdYlr5xIy3btU0e4/5wsRVsz4NQrcZW IZm3zMLkJKzynegkaTXzH9RKS0gqcXiLmBLdRhBD6bWBho2JOJ9ynGgxYPIhk6dJf5FGk6JwKNK Ey8FS1p/8QZHf+2WYXzcXkXg4dLo1+xAu0nScIL5D2MWE4WYyaK91VunpBb4UpSuvjD0= X-Received: by 2002:a17:907:1612:b0:770:86e3:2f1f with SMTP id hb18-20020a170907161200b0077086e32f1fmr15768281ejc.403.1662937532378; Sun, 11 Sep 2022 16:05:32 -0700 (PDT) X-Google-Smtp-Source: 
AA6agR4FOTiAUBG/8q/jVNZN/7aAYxlH8N+MPgLUb2YEbtAwDX/MLNsYb2vrnF4s6kst+t8mnF2QPg== X-Received: by 2002:a17:907:1612:b0:770:86e3:2f1f with SMTP id hb18-20020a170907161200b0077086e32f1fmr15768269ejc.403.1662937532048; Sun, 11 Sep 2022 16:05:32 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id l18-20020a1709060cd200b0077085fdd613sm3508935ejh.44.2022.09.11.16.05.31 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:31 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 23/37] target/i386: reimplement 0x0f 0x78-0x7f, add AVX Date: Mon, 12 Sep 2022 01:04:03 +0200 Message-Id: <20220911230418.340941-24-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" These are a mixed bag, including the first two horizontal (66 and F2 only) operations, more moves, and SSE4a extract/insert. Because SSE4a is pretty rare, I chose to leave the helpers as they are, but it is possible to unify them by loading index and length from the source XMM register.
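As a rough illustration of why the immediate and register forms of the SSE4a extract could share one code path, here is a minimal standalone sketch. It is not QEMU code: extract_field, extrq_imm and extrq_reg are hypothetical names, and the 6-bit masking and the "length 0 selects all 64 bits" rule are assumptions based on a reading of the AMD manual. The point is only that the 16-bit immediate (low byte = length, high byte = index, as in gen_SSE4a_I) has the same layout as bytes 0 and 1 of the control register in the register form.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the bit-field extraction core; this is an
 * illustrative sketch, not QEMU's helper.  Assumption: index and length
 * are 6-bit values and a length of 0 selects all 64 bits.
 */
uint64_t extract_field(uint64_t src, unsigned index, unsigned length)
{
    uint64_t mask;

    index &= 63;
    length &= 63;
    mask = length ? (UINT64_C(1) << length) - 1 : ~UINT64_C(0);
    return (src >> index) & mask;
}

/* Immediate form: low byte of imm is the length, high byte is the index. */
uint64_t extrq_imm(uint64_t src, uint16_t imm)
{
    return extract_field(src, imm >> 8, imm & 255);
}

/* Register form: byte 0 of the control qword is the length, byte 1 the index. */
uint64_t extrq_reg(uint64_t src, uint64_t control)
{
    return extract_field(src, (control >> 8) & 255, control & 255);
}
```

Under these assumptions, storing the immediate into the low 16 bits of a temporary vector register and calling the register-form path would give the same result as the immediate-form path.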
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 23 +++++++++ target/i386/tcg/emit.c.inc | 81 ++++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 1 + 3 files changed, 105 insertions(+) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 5a94e05d71..6aa8bac74f 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -159,6 +159,22 @@ static void decode_0F6F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } } +static void decode_0F7E(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86OpEntry movd_from_vec = + X86_OP_ENTRY3(MOVD_from, E,y, None,None, V,y, vex5 mmx); + static const X86OpEntry movq = + X86_OP_ENTRY3(MOVQ, V,x, None,None, W,q, vex5); /* wrong dest Vy on SDM! */ + + if (s->prefix & PREFIX_REPNZ) { + entry->gen = NULL; + } else if (s->prefix & PREFIX_REPZ) { + *entry = movq; + } else { + *entry = movd_from_vec; + } +} + static const X86OpEntry opcodes_0F38_00toEF[240] = { }; @@ -297,6 +313,13 @@ static const X86OpEntry opcodes_0F[256] = { [0x6e] = X86_OP_ENTRY3(MOVD_to, V,x, None,None, E,y, vex5 mmx p_00_66), /* wrong dest Vy on SDM! 
*/ [0x6f] = X86_OP_GROUP3(0F6F, V,x, None,None, W,x, vex5 mmx p_00_66_f3), + [0x78] = X86_OP_ENTRY2(SSE4a_I, V,x, I,w, cpuid(SSE4A) p_66_f2), + [0x79] = X86_OP_ENTRY2(SSE4a_R, V,x, W,x, cpuid(SSE4A) p_66_f2), + [0x7c] = X86_OP_ENTRY3(VHADD, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2), + [0x7d] = X86_OP_ENTRY3(VHSUB, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2), + [0x7e] = X86_OP_GROUP0(0F7E), + [0x7f] = X86_OP_GROUP3(0F6F, W,x, None,None, V,x, vex5 mmx p_00_66_f3), + /* Incorrectly missing from 2-17 */ [0xd8] = X86_OP_ENTRY3(PSUBUSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0xd9] = X86_OP_ENTRY3(PSUBUSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 5feb50efdb..2053c9d8fb 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -380,6 +380,30 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod UNARY_FP32_SSE(VRSQRT, rsqrt) UNARY_FP32_SSE(VRCP, rcp) +/* + * 66 = v*pd Vpd, Hpd, Wpd + * f2 = v*ps Vps, Hps, Wps + */ +static inline void gen_horizontal_fp_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_eppp pd_xmm, SSEFunc_0_eppp ps_xmm, + SSEFunc_0_eppp pd_ymm, SSEFunc_0_eppp ps_ymm) +{ + SSEFunc_0_eppp ps, pd, fn; + ps = s->vex_l ? ps_ymm : ps_xmm; + pd = s->vex_l ? pd_ymm : pd_xmm; + fn = s->prefix & PREFIX_DATA ? 
pd : ps; + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2); +} +#define HORIZONTAL_FP_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_horizontal_fp_sse(s, env, decode, \ + gen_helper_##lname##pd_xmm, gen_helper_##lname##ps_xmm, \ + gen_helper_##lname##pd_ymm, gen_helper_##lname##ps_ymm); \ +} +HORIZONTAL_FP_SSE(VHADD, hadd) +HORIZONTAL_FP_SSE(VHSUB, hsub) + /* * 00 = p* Pq, Qq (if mmx not NULL; no VEX) * 66 = vp* Vx, Hx, Wx @@ -621,6 +645,28 @@ static void gen_MOVBE(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } } +static void gen_MOVD_from(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[2].ot; + int lo_ofs = decode->op[2].offset + - xmm_offset(decode->op[2].ot) + + xmm_offset(ot); + + switch (ot) { + case MO_32: +#ifdef TARGET_X86_64 + tcg_gen_ld_i32(s->tmp2_i32, cpu_env, lo_ofs); + tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); + break; + case MO_64: +#endif + tcg_gen_ld_tl(s->T0, cpu_env, lo_ofs); + break; + default: + abort(); + } +} + static void gen_MOVD_to(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[2].ot; @@ -661,6 +707,18 @@ static void gen_MOVMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); } +static void gen_MOVQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + int lo_ofs = decode->op[0].offset + - xmm_offset(decode->op[0].ot) + + xmm_offset(MO_64); + + tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset); + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + tcg_gen_st_i64(s->tmp1_i64, cpu_env, lo_ofs); +} + static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; @@ -878,6 +936,29 @@ static void gen_SHRX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) tcg_gen_shr_tl(s->T0, s->T0, s->T1); } +static void 
gen_SSE4a_I(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 length = tcg_const_i32(decode->immediate & 255); + TCGv_i32 index = tcg_const_i32(decode->immediate >> 8); + + if (s->prefix & PREFIX_DATA) { + gen_helper_extrq_i(cpu_env, s->ptr0, index, length); + } else { + gen_helper_insertq_i(cpu_env, s->ptr0, index, length); + } + tcg_temp_free_i32(length); + tcg_temp_free_i32(index); +} + +static void gen_SSE4a_R(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if (s->prefix & PREFIX_DATA) { + gen_helper_extrq_r(cpu_env, s->ptr0, s->ptr2); + } else { + gen_helper_insertq_r(cpu_env, s->ptr0, s->ptr2); + } +} + #define gen_VAND gen_PAND #define gen_VANDN gen_PANDN diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 8ef419dd59..53d693279a 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4668,6 +4668,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #endif if (use_new && ((b >= 0x150 && b <= 0x16f) || + (b >= 0x178 && b <= 0x17f) || (b >= 0x1d8 && b <= 0x1ff && (b & 8)))) { return disas_insn_new(s, cpu, b + 0x100); } From patchwork Sun Sep 11 23:04:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973136 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CCBFAECAAD3 for ; Sun, 11 Sep 2022 23:34:53 +0000 (UTC) Received: from localhost ([::1]:48600 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWTU-0002rI-QS for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:34:52 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34148) by 
lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1G-00014Z-Iu for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:20793) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1E-0007Ln-70 for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937538; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2qPRXUOvERwr+r3H8Hg4XWGUZlIxjCBm8Pacyfe2giU=; b=IZCnhZtCK7S8ExnW2Z2kUJmI4aDze7b4kFPXF5PJUBOCmNaZVrs8cZc9MAO5K/saz1/6aW 9rPPTM3L1WJW1CTCuNfeo065kUfI5ABQZ3ZBXv8jPofCp9ra37/lVq3Rflft4mz4qyDi32 C2/zPvD224TFq383GB6EddSGi2k7jYY= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-158-yiduVVCaME6Z0SS8OjqI_g-1; Sun, 11 Sep 2022 19:05:36 -0400 X-MC-Unique: yiduVVCaME6Z0SS8OjqI_g-1 Received: by mail-ed1-f69.google.com with SMTP id m3-20020a056402430300b004512f6268dbso3996491edc.23 for ; Sun, 11 Sep 2022 16:05:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=2qPRXUOvERwr+r3H8Hg4XWGUZlIxjCBm8Pacyfe2giU=; b=HzSrVo+5cKhOfHCIgng1Ma5Gw7i9HVK/07jsFzpulSkEFldVP6DddrZg05JpsJiAX/ 8LTKTNEQFT7liWd3RGv1MG0cQ2K2/WDdcCgX5XWdFRrz6qUzEtSyftcJGoy0KhboYGNL yd/xErBgdxP+rg1+sUwXFg+CnGjpRKb6awW4cnfhlIuVCUxqwMV0AYKSwDmUQF8m/dHO 2DXLD/mWKFbeqW9rqWWFTkkwA8Q1juUOCmUJFtBWqBMsgEEYsF1aJmaX+JTurCPhfvRQ 
lT/ta+TwcqENrfVnZIvnt01nfGw7b2l+cYLuXwKHLn20poHb9huGY8FZNK5LgbCnposB o2mg== X-Gm-Message-State: ACgBeo38RuMy5xB/RvcJfQk9KrT//vQBkWTqaOFZsALOzoOCWQ4qoh/k THPutwmsIVCpt1VG7+YkhdoMOjkVGxPP6doRFU1gTwrgrb1Jgs9FJgvAZGwJdYmzdtXTt8uAEgF wVJ1iEG0Hzd1d8e0RXUxkRoVlcmb9pJDpqfNPI++K7FbRzvlDeSTavKmkIByyuUGp4wA= X-Received: by 2002:a05:6402:ca9:b0:44e:d8f3:3d0e with SMTP id cn9-20020a0564020ca900b0044ed8f33d0emr19807157edb.397.1662937535104; Sun, 11 Sep 2022 16:05:35 -0700 (PDT) X-Google-Smtp-Source: AA6agR4llx2fwDzKvLda/2vbSdO483+OVEm7RYrVZqYV4PNSXTu6T4u5Ka4iWbDkx+M6Md5owWYDBw== X-Received: by 2002:a05:6402:ca9:b0:44e:d8f3:3d0e with SMTP id cn9-20020a0564020ca900b0044ed8f33d0emr19807129edb.397.1662937534669; Sun, 11 Sep 2022 16:05:34 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id g21-20020aa7c855000000b0044e983132c3sm4612988edt.60.2022.09.11.16.05.33 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:34 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 24/37] target/i386: reimplement 0x0f 0x70-0x77, add AVX Date: Mon, 12 Sep 2022 01:04:04 +0200 Message-Id: <20220911230418.340941-25-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This includes shifts by immediate, which use bits 3-5 of the ModRM byte as an opcode extension. With the exception of 128-bit shifts, they are implemented using gvec. This also covers VZEROALL and VZEROUPPER, which use the same opcode as EMMS. If we were wanting to optimize out gen_clear_ymmh then this would be one of the starting points. The implementation of the VZEROALL and VZEROUPPER helpers is by Paul Brook. Signed-off-by: Paolo Bonzini --- target/i386/helper.h | 7 + target/i386/tcg/decode-new.c.inc | 76 ++++++++++ target/i386/tcg/emit.c.inc | 232 +++++++++++++++++++++++++++++++ target/i386/tcg/fpu_helper.c | 46 ++++++ target/i386/tcg/translate.c | 3 +- 5 files changed, 362 insertions(+), 2 deletions(-) diff --git a/target/i386/helper.h b/target/i386/helper.h index 3da5df98b9..d7e6878263 100644 --- a/target/i386/helper.h +++ b/target/i386/helper.h @@ -221,6 +221,13 @@ DEF_HELPER_3(movq, void, env, ptr, ptr) #define SHIFT 2 #include "ops_sse_header.h" +DEF_HELPER_1(vzeroall, void, env) +DEF_HELPER_1(vzeroupper, void, env) +#ifdef TARGET_X86_64 +DEF_HELPER_1(vzeroall_hi8, void, env) +DEF_HELPER_1(vzeroupper_hi8, void, env) +#endif + DEF_HELPER_3(rclb, tl, env, tl, tl) DEF_HELPER_3(rclw, tl, env, tl, tl) DEF_HELPER_3(rcll, tl, env, tl, tl) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 6aa8bac74f..0e2da85934 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -133,6 +133,19 @@ static uint8_t get_modrm(DisasContext *s, CPUX86State *env) return s->modrm; } +static inline const X86OpEntry *decode_by_prefix(DisasContext *s, const X86OpEntry entries[4]) +{ + if (s->prefix & PREFIX_REPNZ) { + return &entries[3]; + } else if (s->prefix & PREFIX_REPZ) { + return &entries[2]; + } else if (s->prefix & PREFIX_DATA) { + return &entries[1]; + } else { + return 
&entries[0]; + } +} + static void decode_group17(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) { static const X86GenFunc group17_gen[8] = { @@ -142,6 +155,48 @@ static void decode_group17(DisasContext *s, CPUX86State *env, X86OpEntry *entry, entry->gen = group17_gen[op]; } +static void decode_group12_13_14(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86OpEntry group[3][8] = { + { + /* grp12 */ + {}, + {}, + X86_OP_ENTRY3(PSRLW_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + X86_OP_ENTRY3(PSRAW_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + X86_OP_ENTRY3(PSLLW_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + }, + { + /* grp13 */ + {}, + {}, + X86_OP_ENTRY3(PSRLD_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + X86_OP_ENTRY3(PSRAD_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + X86_OP_ENTRY3(PSLLD_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + {}, + }, + { + /* grp14 */ + {}, + {}, + X86_OP_ENTRY3(PSRLQ_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + X86_OP_ENTRY3(PSRLDQ_i, H,x, U,x, I,b, vex7 avx2_256 p_66), + {}, + {}, + X86_OP_ENTRY3(PSLLQ_i, H,x, U,x, I,b, vex7 mmx avx2_256 p_00_66), + X86_OP_ENTRY3(PSLLDQ_i, H,x, U,x, I,b, vex7 avx2_256 p_66), + } + }; + + int op = (get_modrm(s, env) >> 3) & 7; + *entry = group[*b - 0x71][op]; +} + static void decode_0F6F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) { if (s->prefix & PREFIX_REPNZ) { @@ -159,6 +214,18 @@ static void decode_0F6F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } } +static void decode_0F70(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86OpEntry pshufw[4] = { + X86_OP_ENTRY3(PSHUFW, P,q, Q,q, I,b, vex4), + X86_OP_ENTRY3(PSHUFD, V,x, W,x, I,b, vex4 avx2_256), + X86_OP_ENTRY3(PSHUFHW, V,x, W,x, I,b, vex4 avx2_256), + X86_OP_ENTRY3(PSHUFLW, V,x, W,x, I,b, vex4 avx2_256), + }; + + *entry = *decode_by_prefix(s, pshufw); +} + static void 
decode_0F7E(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) { static const X86OpEntry movd_from_vec = @@ -292,6 +359,15 @@ static const X86OpEntry opcodes_0F[256] = { [0x66] = X86_OP_ENTRY3(PCMPGTD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0x67] = X86_OP_ENTRY3(PACKUSWB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0x70] = X86_OP_GROUP0(0F70), + [0x71] = X86_OP_GROUP0(group12_13_14), + [0x72] = X86_OP_GROUP0(group12_13_14), + [0x73] = X86_OP_GROUP0(group12_13_14), + [0x74] = X86_OP_ENTRY3(PCMPEQB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0x75] = X86_OP_ENTRY3(PCMPEQW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0x76] = X86_OP_ENTRY3(PCMPEQD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0x77] = X86_OP_ENTRY0(EMMS_VZERO, vex8), + [0x38] = X86_OP_GROUP0(0F38), [0x3a] = X86_OP_GROUP0(0F3A), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 2053c9d8fb..fb01035d06 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -475,6 +475,30 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod BINARY_INT_SSE(PUNPCKLQDQ, punpcklqdq) BINARY_INT_SSE(PUNPCKHQDQ, punpckhqdq) +static inline void gen_unary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_ppi xmm, SSEFunc_0_ppi ymm) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + if (!s->vex_l) { + xmm(s->ptr0, s->ptr1, imm); + } else { + ymm(s->ptr0, s->ptr1, imm); + } + tcg_temp_free_i32(imm); +} + +#define UNARY_IMM_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_unary_imm_sse(s, env, decode, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} + +UNARY_IMM_SSE(PSHUFD, pshufd) +UNARY_IMM_SSE(PSHUFHW, pshufhw) +UNARY_IMM_SSE(PSHUFLW, pshuflw) + static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op) { TCGv carry_in = NULL; @@ -633,6 +657,29 @@ static void gen_CRC32(DisasContext *s, 
CPUX86State *env, X86DecodedInsn *decode) gen_helper_crc32(s->T0, s->tmp2_i32, s->T1, tcg_const_i32(8 << ot)); } +static void gen_EMMS_VZERO(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if (!(s->prefix & PREFIX_VEX)) { + gen_helper_emms(cpu_env); + return; + } + if (s->vex_l) { + gen_helper_vzeroall(cpu_env); +#ifdef TARGET_X86_64 + if (CODE64(s)) { + gen_helper_vzeroall_hi8(cpu_env); + } +#endif + } else { + gen_helper_vzeroupper(cpu_env); +#ifdef TARGET_X86_64 + if (CODE64(s)) { + gen_helper_vzeroupper_hi8(cpu_env); + } +#endif + } +} + static void gen_MOVBE(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; @@ -787,6 +834,33 @@ static void gen_PANDN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) decode->op[1].offset, vec_len, vec_len); } +static void gen_PCMPEQB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_EQ, MO_8, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PCMPEQW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_EQ, MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PCMPEQD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_EQ, MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + static void gen_PCMPGTB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -841,6 +915,164 @@ static void gen_POR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) decode->op[2].offset, vec_len, vec_len); } +static void gen_PSHUFW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = 
tcg_const_i32(decode->immediate); + gen_helper_pshufw_mmx(s->ptr0, s->ptr1, imm); + tcg_temp_free_i32(imm); +} + +static void gen_PSRLW_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 16) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shri(MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static void gen_PSLLW_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 16) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shli(MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static void gen_PSRAW_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 16) { + decode->immediate = 15; + } + tcg_gen_gvec_sari(MO_16, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); +} + +static void gen_PSRLD_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 32) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shri(MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static void gen_PSLLD_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 32) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shli(MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static void gen_PSRAD_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len 
= sse_vec_len(s, decode); + + if (decode->immediate >= 32) { + decode->immediate = 31; + } + tcg_gen_gvec_sari(MO_32, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); +} + +static void gen_PSRLQ_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 64) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shri(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static void gen_PSLLQ_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + if (decode->immediate >= 64) { + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else { + tcg_gen_gvec_shli(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->immediate, vec_len, vec_len); + } +} + +static inline TCGv_ptr make_imm_mmx_vec(uint32_t imm) +{ + TCGv_i64 imm_v = tcg_const_i64(imm); + TCGv_ptr ptr = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(ptr, cpu_env, offsetof(CPUX86State, mmx_t0)); + tcg_gen_st_i64(imm_v, ptr, offsetof(MMXReg, MMX_Q(0))); + return ptr; +} + +static inline TCGv_ptr make_imm_xmm_vec(uint32_t imm, int vec_len) +{ + MemOp ot = vec_len == 16 ? 
MO_128 : MO_256; + TCGv_i32 imm_v = tcg_const_i32(imm); + TCGv_ptr ptr = tcg_temp_new_ptr(); + + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUX86State, xmm_t0) + xmm_offset(ot), + vec_len, vec_len, 0); + + tcg_gen_addi_ptr(ptr, cpu_env, offsetof(CPUX86State, xmm_t0)); + tcg_gen_st_i32(imm_v, ptr, offsetof(ZMMReg, ZMM_L(0))); + return ptr; +} + +static void gen_PSRLDQ_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + TCGv_ptr imm_vec = make_imm_xmm_vec(decode->immediate, vec_len); + + if (s->vex_l) { + gen_helper_psrldq_ymm(cpu_env, s->ptr0, s->ptr1, imm_vec); + } else { + gen_helper_psrldq_xmm(cpu_env, s->ptr0, s->ptr1, imm_vec); + } + tcg_temp_free_ptr(imm_vec); +} + +static void gen_PSLLDQ_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + TCGv_ptr imm_vec = make_imm_xmm_vec(decode->immediate, vec_len); + + if (s->vex_l) { + gen_helper_pslldq_ymm(cpu_env, s->ptr0, s->ptr1, imm_vec); + } else { + gen_helper_pslldq_xmm(cpu_env, s->ptr0, s->ptr1, imm_vec); + } + tcg_temp_free_ptr(imm_vec); +} + static void gen_PXOR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 819e920ec6..230907bc5c 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -3056,3 +3056,49 @@ void helper_movq(CPUX86State *env, void *d, void *s) #define SHIFT 2 #include "ops_sse.h" + +void helper_vzeroall(CPUX86State *env) +{ + int i; + + for (i = 0; i < 8; i++) { + env->xmm_regs[i].ZMM_Q(0) = 0; + env->xmm_regs[i].ZMM_Q(1) = 0; + env->xmm_regs[i].ZMM_Q(2) = 0; + env->xmm_regs[i].ZMM_Q(3) = 0; + } +} + +void helper_vzeroupper(CPUX86State *env) +{ + int i; + + for (i = 0; i < 8; i++) { + env->xmm_regs[i].ZMM_Q(2) = 0; + env->xmm_regs[i].ZMM_Q(3) = 0; + } +} + +#ifdef TARGET_X86_64 +void helper_vzeroall_hi8(CPUX86State *env) +{ + int i; + + for (i 
= 8; i < 16; i++) { + env->xmm_regs[i].ZMM_Q(0) = 0; + env->xmm_regs[i].ZMM_Q(1) = 0; + env->xmm_regs[i].ZMM_Q(2) = 0; + env->xmm_regs[i].ZMM_Q(3) = 0; + } +} + +void helper_vzeroupper_hi8(CPUX86State *ense_new && - ((b >= 0x150 && b <= 0x16f) || - (b >= 0x178 && b <= 0x17f) || + ((b >= 0x150 && b <= 0x17f) || (b >= 0x1d8 && b <= 0x1ff && (b & 8)))) { return disas_insn_new(s, cpu, b + 0x100); } -- 2.37.2 From patchwork Sun Sep 11 23:04:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973126 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 67261C6FA83 for ; Sun, 11 Sep 2022 23:25:27 +0000 (UTC) Received: from localhost ([::1]:35114 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWKM-0003aP-GW for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:25:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34150) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1H-00019D-PO for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:43 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:55935) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1F-0007Lx-LM for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937541; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=N3NZ7MC0CPDroZn4Uo/o+l3CtgTMCvB3BnxcXynte+4=; b=IP38LEUdbDxBkJ7gYRgbp0XmfJEy4KlJm18gJANJvO4bWNKBfLnS4YXqv6dI6PbZaJvP37 AvHPVc/sRdEWSQNy4Ow52uaRWrmmZjcNDn2h4HH56ErZFJANRDc034+HH5aCKmzi7oZz7P Ge4xDHofCUEkpDUHIDaletI9HLTXLVE= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-137-Es3xfaMiOW-CxAfxIR2hhA-1; Sun, 11 Sep 2022 19:05:39 -0400 X-MC-Unique: Es3xfaMiOW-CxAfxIR2hhA-1 Received: by mail-ed1-f69.google.com with SMTP id t13-20020a056402524d00b0043db1fbefdeso4955908edd.2 for ; Sun, 11 Sep 2022 16:05:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=N3NZ7MC0CPDroZn4Uo/o+l3CtgTMCvB3BnxcXynte+4=; b=24XRk69M3XBZB81zgVNpnCP6b2SHAChFZoCRSrg/EudWaE04Jvo1/eJTZv/4vQBJ4Y 0piAT8lpL8Zq+5m5dIKMr96zBPHsrTn27wokgCYDhz4oLom1Kz15u/Ql3ucB3izFq1+V JiM48qfAzBAPgcbw9+zLcFwQOJtechInGgFMRERa5drwJXfTP1oa/diQZ8LmE6mkng5m 097PUaTid5HOiR0tlKTVD6/jPAdimntfVUboAlwDrTbuu8UBi6SX7mgorTNiDEQ67ipX 9TsrPtcbrIr0BYVAhaohHZjHp6hqmHWaIRBQbhqDTAJl3gLP4SqoZ4RQnvbRP1caOMvi 8MYQ== X-Gm-Message-State: ACgBeo1XhUhTWzLoP/KPC+h6hT5NQNR65ApD+btcB/Ry7ogacpah7qYg 2frkgkw8Wi7msPMUIB8V/hEvkT7Eb7O5O4/mjG6zhAc+WQRaXdYhr+X34AFMR7DmK3MUfxZkScH 3uuCshiQx1WsgKjzB4n0Usv30ES+Czogc/+swBtYPLvpVjb5fNrnTBzIQOlrbtVR6XHk= X-Received: by 2002:a05:6402:2709:b0:451:d665:e787 with SMTP id y9-20020a056402270900b00451d665e787mr1701356edd.317.1662937537635; Sun, 11 Sep 2022 16:05:37 -0700 (PDT) X-Google-Smtp-Source: AA6agR4TG3wrLMeV5w+Uh9xHH7oxi/9QiI4jgNnl+vMhz7MDB9/L4US/dzga+xeH5kuQSHUBWb1uVQ== X-Received: by 2002:a05:6402:2709:b0:451:d665:e787 with SMTP id y9-20020a056402270900b00451d665e787mr1701329edd.317.1662937537179; Sun, 11 Sep 2022 16:05:37 -0700 (PDT) 
Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id 13-20020a170906308d00b00742a4debae1sm3539878ejv.6.2022.09.11.16.05.36 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:36 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 25/37] target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX Date: Mon, 12 Sep 2022 01:04:05 +0200 Message-Id: <20220911230418.340941-26-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial. 
Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 39 +++++++++++++ target/i386/tcg/emit.c.inc | 99 +++++++++++++++++++++++++++++--- target/i386/tcg/translate.c | 4 +- 3 files changed, 133 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 0e2da85934..e9a9981a7f 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -242,6 +242,18 @@ static void decode_0F7E(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } } +static void decode_0FD6(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86OpEntry movq[4] = { + {}, + X86_OP_ENTRY3(MOVQ, W,x, None, None, V,q, vex5), + X86_OP_ENTRY3(MOVq_dq, V,dq, None, None, N,q), + X86_OP_ENTRY3(MOVq_dq, P,q, None, None, U,q), + }; + + *entry = *decode_by_prefix(s, movq); +} + static const X86OpEntry opcodes_0F38_00toEF[240] = { }; @@ -396,6 +408,33 @@ static const X86OpEntry opcodes_0F[256] = { [0x7e] = X86_OP_GROUP0(0F7E), [0x7f] = X86_OP_GROUP3(0F6F, W,x, None,None, V,x, vex5 mmx p_00_66_f3), + [0xd0] = X86_OP_ENTRY3(VADDSUB, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2), + [0xd1] = X86_OP_ENTRY3(PSRLW_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd2] = X86_OP_ENTRY3(PSRLD_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd3] = X86_OP_ENTRY3(PSRLQ_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd4] = X86_OP_ENTRY3(PADDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd5] = X86_OP_ENTRY3(PMULLW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xd6] = X86_OP_GROUP0(0FD6), + [0xd7] = X86_OP_ENTRY3(PMOVMSKB, G,d, None,None, U,x, vex7 mmx avx2_256 p_00_66), + + [0xe0] = X86_OP_ENTRY3(PAVGB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xe1] = X86_OP_ENTRY3(PSRAW_r, V,x, H,x, W,x, vex7 mmx avx2_256 p_00_66), + [0xe2] = X86_OP_ENTRY3(PSRAD_r, V,x, H,x, W,x, vex7 mmx avx2_256 p_00_66), + [0xe3] = X86_OP_ENTRY3(PAVGW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + 
[0xe4] = X86_OP_ENTRY3(PMULHUW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xe5] = X86_OP_ENTRY3(PMULHW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xe6] = X86_OP_ENTRY2(VCVTpd_dq, V,x, W,x, vex2 p_66_f3_f2), + [0xe7] = X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex1 mmx p_00_66), /* MOVNTQ/MOVNTDQ */ + + [0xf0] = X86_OP_ENTRY3(LDDQU, V,x, None,None, M,x, vex4_unal cpuid(SSE3) p_f2), + [0xf1] = X86_OP_ENTRY3(PSLLW_r, V,x, H,x, W,x, vex7 mmx avx2_256 p_00_66), + [0xf2] = X86_OP_ENTRY3(PSLLD_r, V,x, H,x, W,x, vex7 mmx avx2_256 p_00_66), + [0xf3] = X86_OP_ENTRY3(PSLLQ_r, V,x, H,x, W,x, vex7 mmx avx2_256 p_00_66), + [0xf4] = X86_OP_ENTRY3(PMULUDQ, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xf5] = X86_OP_ENTRY3(PMADDWD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xf6] = X86_OP_ENTRY3(PSADBW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), + [0xf7] = X86_OP_ENTRY3(MASKMOV, None,None, V,dq, U,dq, vex4_unal avx2_256 mmx p_00_66), + /* Incorrectly missing from 2-17 */ [0xd8] = X86_OP_ENTRY3(PSUBUSB, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0xd9] = X86_OP_ENTRY3(PSUBUSW, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index fb01035d06..c90f933093 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -403,6 +403,7 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod } HORIZONTAL_FP_SSE(VHADD, hadd) HORIZONTAL_FP_SSE(VHSUB, hsub) +HORIZONTAL_FP_SSE(VADDSUB, addsub) /* * 00 = p* Pq, Qq (if mmx not NULL; no VEX) @@ -462,6 +463,24 @@ BINARY_INT_MMX(PADDSB, paddsb) BINARY_INT_MMX(PADDSW, paddsw) BINARY_INT_MMX(PMAXSW, pmaxsw) +BINARY_INT_MMX(PAVGB, pavgb) +BINARY_INT_MMX(PAVGW, pavgw) +BINARY_INT_MMX(PMADDWD, pmaddwd) +BINARY_INT_MMX(PMULHUW, pmulhuw) +BINARY_INT_MMX(PMULHW, pmulhw) +BINARY_INT_MMX(PMULLW, pmullw) +BINARY_INT_MMX(PMULUDQ, pmuludq) +BINARY_INT_MMX(PSADBW, psadbw) + +BINARY_INT_MMX(PSLLW_r, psllw) +BINARY_INT_MMX(PSLLD_r, pslld) 
+BINARY_INT_MMX(PSLLQ_r, psllq) +BINARY_INT_MMX(PSRLW_r, psrlw) +BINARY_INT_MMX(PSRLD_r, psrld) +BINARY_INT_MMX(PSRLQ_r, psrlq) +BINARY_INT_MMX(PSRAW_r, psraw) +BINARY_INT_MMX(PSRAD_r, psrad) + /* Instructions with no MMX equivalent. */ #define BINARY_INT_SSE(uname, lname) \ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ @@ -680,6 +699,24 @@ static void gen_EMMS_VZERO(DisasContext *s, CPUX86State *env, X86DecodedInsn *de } } +static void gen_LDDQU(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_load_sse(s, s->T0, decode->op[0].ot, decode->op[0].offset); +} + +static void gen_MASKMOV(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]); + gen_extu(s->aflag, s->A0); + gen_add_A0_ds_seg(s); + + if (s->prefix & PREFIX_DATA) { + gen_helper_maskmov_xmm(cpu_env, s->ptr1, s->ptr2, s->A0); + } else { + gen_helper_maskmov_mmx(cpu_env, s->ptr1, s->ptr2, s->A0); + } +} + static void gen_MOVBE(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { MemOp ot = decode->op[0].ot; @@ -756,14 +793,26 @@ static void gen_MOVMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode static void gen_MOVQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { - int vec_len = sse_vec_len(s, decode); - int lo_ofs = decode->op[0].offset - - xmm_offset(decode->op[0].ot) - + xmm_offset(MO_64); - tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset); - tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); - tcg_gen_st_i64(s->tmp1_i64, cpu_env, lo_ofs); + + if (decode->op[0].has_ea) { + gen_op_st_v(s, MO_64, s->tmp1_i64, s->A0); + } else { + int vec_len = sse_vec_len(s, decode); + int lo_ofs = decode->op[0].offset + - xmm_offset(decode->op[0].ot) + + xmm_offset(MO_64); + + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + tcg_gen_st_i64(s->tmp1_i64, cpu_env, lo_ofs); + } +} + +static void gen_MOVq_dq(DisasContext *s, CPUX86State 
*env, X86DecodedInsn *decode) +{ + gen_helper_enter_mmx(cpu_env); + /* Otherwise the same as any other movq. */ + return gen_MOVQ(s, env, decode); } static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) @@ -816,6 +865,15 @@ static void gen_PADDD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) decode->op[2].offset, vec_len, vec_len); } +static void gen_PADDQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_add(MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + static void gen_PAND(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -906,6 +964,16 @@ static void gen_PEXT(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) gen_helper_pext(s->T0, s->T0, s->T1); } +static void gen_PMOVMSKB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if (s->prefix & PREFIX_DATA) { + gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr2); + } else { + gen_helper_pmovmskb_mmx(s->tmp2_i32, cpu_env, s->ptr2); + } + tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); +} + static void gen_POR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -1202,6 +1270,23 @@ static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec gen_helper_cvtsd2ss, gen_helper_cvtss2sd); } +static void gen_VCVTpd_dq(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + SSEFunc_0_epp fn = NULL; + switch (sse_prefix(s)) { + case 0x66: + fn = s->vex_l ? gen_helper_cvttpd2dq_ymm : gen_helper_cvttpd2dq_xmm; + break; + case 0xf3: + fn = s->vex_l ? gen_helper_cvtdq2pd_ymm : gen_helper_cvtdq2pd_xmm; + break; + case 0xf2: + fn = s->vex_l ? 
gen_helper_cvtpd2dq_ymm : gen_helper_cvtpd2dq_xmm; + break; + } + fn(cpu_env, s->ptr0, s->ptr2); +} + static void gen_VCVTps_dq(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { SSEFunc_0_epp fn = NULL; diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 45287dfea2..d15e988891 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4668,8 +4668,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) #endif if (use_new && ((b >= 0x150 && b <= 0x17f) || - (b >= 0x1d8 && b <= 0x1ff && (b & 8)))) { - return disas_insn_new(s, cpu, b + 0x100); + (b >= 0x1d0 && b <= 0x1ff))) { + return disas_insn_new(s, cpu, b); } break; case 0xf3: From patchwork Sun Sep 11 23:04:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973154 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 76912ECAAD3 for ; Sun, 11 Sep 2022 23:43:00 +0000 (UTC) Received: from localhost ([::1]:56330 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWbL-0005GQ-Ix for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:42:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40454) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1L-0001Ki-4S for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:47 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:54679) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1I-0007MO-8M for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:46 -0400 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937543; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RVLfn8FkZ0VcBJC3suTAwrnb6C5HFPcZMGzf+IzVQYI=; b=V3unoRwKdJ2A9sl942eALSRrsJQfW82kxAfIqj3rNfrzJFJyMNkoxoDhz8PQPVmabI3g+C fAmsgXHKpKbj009ZqILRiZG1yR51uSkQ/f8prb2+m7FdyzYm2etpeVUFklyS+xNnULrsQa gtuh31TLQr6wNbnJT9I+nKVXs3dMFlg= Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-115-NmEniMP0P6-C9IUCEcslVA-1; Sun, 11 Sep 2022 19:05:42 -0400 X-MC-Unique: NmEniMP0P6-C9IUCEcslVA-1 Received: by mail-ed1-f72.google.com with SMTP id i17-20020a05640242d100b0044f18a5379aso5054244edc.21 for ; Sun, 11 Sep 2022 16:05:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=RVLfn8FkZ0VcBJC3suTAwrnb6C5HFPcZMGzf+IzVQYI=; b=GaU1Z/wA39EGi8ss359OdglgMPT34mAmksS5Owd/A9BSZgEr6e8Oy+UFxuz5WJcIo4 PXwYg7wLV+uZvh5uUEBYkQkFaJDUlQPjBdXB6HPH1Sk8E8Yfwk0Ww+RozCfkaTZxSCwM wiX/o97FZoDZ25khF8ncLwTO2ahqIGvO1oie+BF2unvEXfOw1OqwM1blCx5J2Y3zBPBe U1QwXhg9t1YbTDhkrdqZCprovTpN/SZzusqeC55XHrTZagR9dGD1aa7JzB2BvEgGlE/H wD6UXWKIAY2hE4Mo0Wjkh1Lq9fBwA1DATE7uc9K3STFS8ShW7C5WP19MHU1GBLo2eF2B EIKg== X-Gm-Message-State: ACgBeo2kN+xcYyx4LRYvVDpQSiFvCVX1+tNe/rKDfo9ivhLn9sUOnKKd x3BwwLybaF/KQmYaunSLbUvH0fXc3WqtFoFDrB9jxSdTR7AWiIPKX0sNLwZ0S4MQwR8CNeHhRjC sx773CDJ76tKFHsLSb0B0dIy6/m4zWeC5ybuHAtWs/hU1TdzNvdU725RoFtnHat2sXh4= X-Received: by 2002:a05:6402:5168:b0:44e:9ca8:bf6 with SMTP id d8-20020a056402516800b0044e9ca80bf6mr20081283ede.384.1662937540553; Sun, 11 Sep 2022 
16:05:40 -0700 (PDT) X-Google-Smtp-Source: AA6agR7gXN8VfvbfwLrKcqiVtB05W67P3/gQC8sd+DSf7BbFdStYmGfWxE3Yibl8FIg2SdUvW9ckkQ== X-Received: by 2002:a05:6402:5168:b0:44e:9ca8:bf6 with SMTP id d8-20020a056402516800b0044e9ca80bf6mr20081262ede.384.1662937539902; Sun, 11 Sep 2022 16:05:39 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id kb8-20020a170907924800b00777d41ba812sm3459180ejb.113.2022.09.11.16.05.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:39 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 26/37] target/i386: reimplement 0x0f 0x3a, add AVX Date: Mon, 12 Sep 2022 01:04:06 +0200 Message-Id: <20220911230418.340941-27-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rather than in the prefixes. 
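As a rough illustration of the insertion case, the (V)INSERTPS immediate packs three fields: bits 7:6 select the source element (register form only), bits 5:4 select the destination element, and bits 3:0 form a mask of elements to zero. A plain-C model of that behaviour might look like the sketch below. The name insertps_ref and the array-based interface are assumptions made for illustration only; this is not QEMU code and it ignores the upper bits of a wider register.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative model of (V)INSERTPS on four 32-bit elements; the name is
 * hypothetical.  "elem" is the already-selected source element: for the
 * register form that is src[(imm >> 6) & 3], for the memory form it is
 * the 32-bit value loaded from memory.
 */
static void insertps_ref(uint32_t dst[4], const uint32_t a[4],
                         uint32_t elem, uint8_t imm)
{
    uint32_t tmp[4];
    int dest_word = (imm >> 4) & 3;   /* imm[5:4]: destination element */

    memcpy(tmp, a, sizeof(tmp));      /* start from the first source */
    tmp[dest_word] = elem;            /* insert the selected element */
    for (int i = 0; i < 4; i++) {
        if ((imm >> i) & 1) {         /* imm[3:0]: elements forced to zero */
            tmp[i] = 0;
        }
    }
    memcpy(dst, tmp, sizeof(tmp));    /* dst may alias a */
}
```

In this model, when the zero mask together with the destination element covers all four elements, nothing from the first source survives, which appears to be the case that gen_vinsertps shortcuts with its new_mask == 15 test.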
These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook . Signed-off-by: Paolo Bonzini --- target/i386/ops_sse.h | 95 +++++++++ target/i386/ops_sse_header.h | 10 + target/i386/tcg/decode-new.c.inc | 75 +++++++ target/i386/tcg/emit.c.inc | 323 +++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 3 +- 5 files changed, 505 insertions(+), 1 deletion(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 4f72164c0f..7eba1cf0f1 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2381,6 +2381,101 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #endif #endif +#if SHIFT >= 1 +void glue(helper_vpermilpd_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order) +{ + uint64_t r0, r1; + int i; + + for (i = 0; i < 1 << SHIFT; i += 2) { + r0 = s->Q(i + ((order >> 0) & 1)); + r1 = s->Q(i + ((order >> 1) & 1)); + d->Q(i) = r0; + d->Q(i+1) = r1; + + order >>= 2; + } +} + +void glue(helper_vpermilps_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order) +{ + uint32_t r0, r1, r2, r3; + int i; + + for (i = 0; i < 2 << SHIFT; i += 4) { + r0 = s->L(i + ((order >> 0) & 3)); + r1 = s->L(i + ((order >> 2) & 3)); + r2 = s->L(i + ((order >> 4) & 3)); + r3 = s->L(i + ((order >> 6) & 3)); + d->L(i) = r0; + d->L(i+1) = r1; + d->L(i+2) = r2; + d->L(i+3) = r3; + } +} + +#if SHIFT >= 2 +void helper_vpermdq_ymm(Reg *d, Reg *v, Reg *s, uint32_t order) +{ + uint64_t r0, r1, r2, r3; + + switch (order & 3) { + case 0: + r0 = v->Q(0); + r1 = v->Q(1); + break; + case 1: + r0 = v->Q(2); + r1 = v->Q(3); + break; + case 2: + r0 = s->Q(0); + r1 = s->Q(1); + break; + case 3: + r0 = s->Q(2); + r1 = s->Q(3); + break; + } + switch ((order >> 4) & 3) { + case 0: + r2 = v->Q(0); + r3 = v->Q(1); + break; + case 1: + r2 = v->Q(2); + r3 = v->Q(3); + break; + case 2: + r2 = s->Q(0); + r3 = s->Q(1); + break; + case 3: + r2 = s->Q(2); + r3 = s->Q(3); + break; + } + d->Q(0) = r0; + d->Q(1) = r1; + d->Q(2) 
= r2; + d->Q(3) = r3; +} + +void helper_vpermq_ymm(Reg *d, Reg *s, uint32_t order) +{ + uint64_t r0, r1, r2, r3; + r0 = s->Q(order & 3); + r1 = s->Q((order >> 2) & 3); + r2 = s->Q((order >> 4) & 3); + r3 = s->Q((order >> 6) & 3); + d->Q(0) = r0; + d->Q(1) = r1; + d->Q(2) = r2; + d->Q(3) = r3; +} +#endif +#endif + #undef SSE_HELPER_S #undef LANE_WIDTH diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 4041816945..6b70d90734 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -411,6 +411,16 @@ DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif +/* AVX helpers */ +#if SHIFT >= 1 +DEF_HELPER_3(glue(vpermilpd_imm, SUFFIX), void, Reg, Reg, i32) +DEF_HELPER_3(glue(vpermilps_imm, SUFFIX), void, Reg, Reg, i32) +#if SHIFT == 2 +DEF_HELPER_4(vpermdq_ymm, void, Reg, Reg, Reg, i32) +DEF_HELPER_3(vpermq_ymm, void, Reg, Reg, i32) +#endif +#endif + #undef SHIFT #undef Reg #undef SUFFIX diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index e9a9981a7f..e7b406ff80 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -328,7 +328,78 @@ static void decode_0F38(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } } +static void decode_VINSERTPS(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + static const X86OpEntry + vinsertps_reg = X86_OP_ENTRY4(VINSERTPS_r, V,dq, H,dq, U,dq, vex5 cpuid(SSE41) p_66), + vinsertps_mem = X86_OP_ENTRY4(VINSERTPS_m, V,dq, H,dq, M,d, vex5 cpuid(SSE41) p_66); + + int modrm = get_modrm(s, env); + *entry = (modrm >> 6) == 3 ? vinsertps_reg : vinsertps_mem; +} + static const X86OpEntry opcodes_0F3A[256] = { + /* + * These are VEX-only, but incorrectly listed in the manual as exception type 4. + * Also the "qq" instructions are sometimes omitted by Table 2-17, but are VEX256 + * only. 
+ */ + [0x00] = X86_OP_ENTRY3(VPERMQ, V,qq, W,qq, I,b, vex6 cpuid(AVX2) p_66), + [0x01] = X86_OP_ENTRY3(VPERMQ, V,qq, W,qq, I,b, vex6 cpuid(AVX2) p_66), /* VPERMPD */ + [0x02] = X86_OP_ENTRY4(VBLENDPS, V,x, H,x, W,x, vex6 cpuid(AVX2) p_66), /* VPBLENDD */ + [0x04] = X86_OP_ENTRY3(VPERMILPS_i, V,x, W,x, I,b, vex6 cpuid(AVX) p_66), + [0x05] = X86_OP_ENTRY3(VPERMILPD_i, V,x, W,x, I,b, vex6 cpuid(AVX) p_66), + [0x06] = X86_OP_ENTRY4(VPERM2x128, V,qq, H,qq, W,qq, vex6 cpuid(AVX) p_66), + + [0x14] = X86_OP_ENTRY3(PEXTRB, E,b, V,dq, I,b, vex5 cpuid(SSE41) zext0 p_66), + [0x15] = X86_OP_ENTRY3(PEXTRW, E,w, V,dq, I,b, vex5 cpuid(SSE41) zext0 p_66), + [0x16] = X86_OP_ENTRY3(PEXTR, E,y, V,dq, I,b, vex5 cpuid(SSE41) p_66), + [0x17] = X86_OP_ENTRY3(VEXTRACTPS, E,d, V,dq, I,b, vex5 cpuid(SSE41) p_66), + + [0x20] = X86_OP_ENTRY4(PINSRB, V,dq, H,dq, E,b, vex5 cpuid(SSE41) zext2 p_66), + [0x21] = X86_OP_GROUP0(VINSERTPS), + [0x22] = X86_OP_ENTRY4(PINSR, V,dq, H,dq, E,y, vex5 cpuid(SSE41) p_66), + + [0x40] = X86_OP_ENTRY4(VDDPS, V,x, H,x, W,x, vex2 cpuid(SSE41) p_66), + [0x41] = X86_OP_ENTRY4(VDDPD, V,dq, H,dq, W,dq, vex2 cpuid(SSE41) p_66), + [0x42] = X86_OP_ENTRY4(VMPSADBW, V,x, H,x, W,x, vex2 cpuid(SSE41) avx2_256 p_66), + [0x44] = X86_OP_ENTRY4(PCLMULQDQ, V,dq, H,dq, W,dq, vex4 cpuid(PCLMULQDQ) p_66), + [0x46] = X86_OP_ENTRY4(VPERM2x128, V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66), + + [0x60] = X86_OP_ENTRY4(PCMPESTRM, None,None, V,dq, W,dq, vex4_unal cpuid(SSE42) p_66), + [0x61] = X86_OP_ENTRY4(PCMPESTRI, None,None, V,dq, W,dq, vex4_unal cpuid(SSE42) p_66), + [0x62] = X86_OP_ENTRY4(PCMPISTRM, None,None, V,dq, W,dq, vex4_unal cpuid(SSE42) p_66), + [0x63] = X86_OP_ENTRY4(PCMPISTRI, None,None, V,dq, W,dq, vex4_unal cpuid(SSE42) p_66), + + [0x08] = X86_OP_ENTRY3(VROUNDPS, V,x, W,x, I,b, vex2 cpuid(SSE41) p_66), + [0x09] = X86_OP_ENTRY3(VROUNDPD, V,x, W,x, I,b, vex2 cpuid(SSE41) p_66), + /* + * Not listed as four operand in the manual. 
Also writes and reads 128-bits + * from the first two operands due to the V operand picking higher entries of + * the H operand; the "Vss,Hss,Wss" description from the manual is incorrect. + * For other unary operations such as VSQRTSx this is hidden by the "REPScalar" + * value of vex_special, because the table lists the operand types of VSQRTPx. + */ + [0x0a] = X86_OP_ENTRY4(VROUNDSS, V,x, H,x, W,ss, vex3 cpuid(SSE41) p_66), + [0x0b] = X86_OP_ENTRY4(VROUNDSD, V,x, H,x, W,sd, vex3 cpuid(SSE41) p_66), + [0x0c] = X86_OP_ENTRY4(VBLENDPS, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), + [0x0d] = X86_OP_ENTRY4(VBLENDPD, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), + [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), + [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx p_00_66), + + [0x18] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 cpuid(AVX) p_66), + [0x19] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 cpuid(AVX) p_66), + + [0x38] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66), + [0x39] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 cpuid(AVX2) p_66), + + /* Listed incorrectly as type 4 */ + [0x4a] = X86_OP_ENTRY4(VBLENDVPS, V,x, H,x, W,x, vex6 cpuid(AVX) p_66), + [0x4b] = X86_OP_ENTRY4(VBLENDVPD, V,x, H,x, W,x, vex6 cpuid(AVX) p_66), + [0x4c] = X86_OP_ENTRY4(VPBLENDVB, V,x, H,x, W,x, vex6 cpuid(AVX) p_66 avx2_256), + + [0xdf] = X86_OP_ENTRY3(VAESKEYGEN, V,dq, W,dq, I,b, vex4 cpuid(AES) p_66), + [0xF0] = X86_OP_ENTRY3(RORX, G,y, E,y, I,b, vex13 cpuid(BMI2) p_f2), }; @@ -839,6 +910,10 @@ static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_ } } if (e->op3 != X86_TYPE_None) { + /* + * A couple instructions actually use the extra immediate byte for an Lx + * register operand; those are handled in the gen_* functions as one off. 
+ */ assert(e->op3 == X86_TYPE_I && e->s3 == X86_SIZE_b); s->rip_offset += 1; } diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index c90f933093..dbf2c05e16 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -405,6 +405,56 @@ HORIZONTAL_FP_SSE(VHADD, hadd) HORIZONTAL_FP_SSE(VHSUB, hsub) HORIZONTAL_FP_SSE(VADDSUB, addsub) +static inline void gen_ternary_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + int op3, SSEFunc_0_epppp xmm, SSEFunc_0_epppp ymm) +{ + SSEFunc_0_epppp fn = s->vex_l ? ymm : xmm; + TCGv_ptr ptr3 = tcg_temp_new_ptr(); + + /* The format of the fourth input is Lx */ + tcg_gen_addi_ptr(ptr3, cpu_env, ZMM_OFFSET(op3)); + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2, ptr3); + tcg_temp_free_ptr(ptr3); +} +#define TERNARY_SSE(uvname, lname) \ +static void gen_##uvname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_ternary_sse(s, env, decode, (uint8_t)decode->immediate >> 4, \ + gen_helper_##lname##_xmm, gen_helper_##lname##_ymm); \ +} +TERNARY_SSE(VBLENDVPS, blendvps) +TERNARY_SSE(VBLENDVPD, blendvpd) +TERNARY_SSE(VPBLENDVB, pblendvb) + +static inline void gen_binary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_epppi xmm, SSEFunc_0_epppi ymm) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + if (!s->vex_l) { + xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + } else { + ymm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + } + tcg_temp_free_i32(imm); +} + +#define BINARY_IMM_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_binary_imm_sse(s, env, decode, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} + +BINARY_IMM_SSE(VBLENDPD, blendpd) +BINARY_IMM_SSE(VBLENDPS, blendps) +BINARY_IMM_SSE(VPBLENDW, pblendw) +BINARY_IMM_SSE(VDDPS, dpps) +#define gen_helper_dppd_ymm NULL +BINARY_IMM_SSE(VDDPD, dppd) +BINARY_IMM_SSE(VMPSADBW, mpsadbw) +BINARY_IMM_SSE(PCLMULQDQ, 
pclmulqdq) + /* * 00 = p* Pq, Qq (if mmx not NULL; no VEX) * 66 = vp* Vx, Hx, Wx @@ -517,6 +567,33 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod UNARY_IMM_SSE(PSHUFD, pshufd) UNARY_IMM_SSE(PSHUFHW, pshufhw) UNARY_IMM_SSE(PSHUFLW, pshuflw) +#define gen_helper_vpermq_xmm NULL +UNARY_IMM_SSE(VPERMQ, vpermq) +UNARY_IMM_SSE(VPERMILPS_i, vpermilps_imm) +UNARY_IMM_SSE(VPERMILPD_i, vpermilpd_imm) + +static inline void gen_unary_imm_fp_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_eppi xmm, SSEFunc_0_eppi ymm) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + if (!s->vex_l) { + xmm(cpu_env, s->ptr0, s->ptr1, imm); + } else { + ymm(cpu_env, s->ptr0, s->ptr1, imm); + } + tcg_temp_free_i32(imm); +} + +#define UNARY_IMM_FP_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_unary_imm_fp_sse(s, env, decode, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} + +UNARY_IMM_FP_SSE(VROUNDPS, roundps) +UNARY_IMM_FP_SSE(VROUNDPD, roundpd) static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op) { @@ -874,6 +951,19 @@ static void gen_PADDQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) decode->op[2].offset, vec_len, vec_len); } +static void gen_PALIGNR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + if (!(s->prefix & PREFIX_DATA)) { + gen_helper_palignr_mmx(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + } else if (!s->vex_l) { + gen_helper_palignr_xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + } else { + gen_helper_palignr_ymm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + } + tcg_temp_free_i32(imm); +} + static void gen_PAND(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -919,6 +1009,46 @@ static void gen_PCMPEQD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod 
decode->op[2].offset, vec_len, vec_len); } +static void gen_PCMPESTRI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + gen_helper_pcmpestri_xmm(cpu_env, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); + set_cc_op(s, CC_OP_EFLAGS); +} + +static void gen_PCMPESTRM(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + gen_helper_pcmpestrm_xmm(cpu_env, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); + set_cc_op(s, CC_OP_EFLAGS); + if ((s->prefix & PREFIX_VEX) && !s->vex_l) { + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUX86State, xmm_regs[0].ZMM_X(1)), + 16, 16, 0); + } +} + +static void gen_PCMPISTRI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + gen_helper_pcmpistri_xmm(cpu_env, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); + set_cc_op(s, CC_OP_EFLAGS); +} + +static void gen_PCMPISTRM(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + gen_helper_pcmpistrm_xmm(cpu_env, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); + set_cc_op(s, CC_OP_EFLAGS); + if ((s->prefix & PREFIX_VEX) && !s->vex_l) { + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUX86State, xmm_regs[0].ZMM_X(1)), + 16, 16, 0); + } +} + static void gen_PCMPGTB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); tcg_gen_ld8u_tl(s->T0, s->ptr1, offsetof(ZMMReg, ZMM_B(val))); + break; + case MO_16: + tcg_gen_ld16u_tl(s->T0, s->ptr1, offsetof(ZMMReg, ZMM_W(val))); + break; + case MO_32: + tcg_gen_ld_i32(s->tmp2_i32, s->ptr1, offsetof(ZMMReg, ZMM_L(val))); + tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); + break; +#ifdef TARGET_X86_64 + case MO_64: + tcg_gen_ld_tl(s->T0, s->ptr1, offsetof(ZMMReg, ZMM_Q(val))); + break; +#endif + default: + abort(); + } +} + +static void gen_PEXTRB(DisasContext *s, CPUX86State *env, 
X86DecodedInsn *decode) +{ + gen_pextr(s, env, decode, MO_8); +} + +static void gen_PEXTRW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_pextr(s, env, decode, MO_16); +} + +static void gen_PEXTR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + MemOp ot = decode->op[0].ot; + gen_pextr(s, env, decode, ot); +} + +static inline void gen_pinsr(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, MemOp ot) +{ + int vec_len = sse_vec_len(s, decode); + int mask = (vec_len >> ot) - 1; + int val = decode->immediate & mask; + + if (decode->op[1].offset != decode->op[0].offset) { + assert(vec_len == 16); + gen_store_sse(s, env, decode, decode->op[1].offset); + } + + switch(ot) { + case MO_8: + tcg_gen_st8_tl(s->T1, s->ptr0, offsetof(ZMMReg, ZMM_B(val))); + break; + case MO_16: + tcg_gen_st16_tl(s->T1, s->ptr0, offsetof(ZMMReg, ZMM_W(val))); + break; + case MO_32: + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1); + tcg_gen_st_i32(s->tmp2_i32, s->ptr0, offsetof(ZMMReg, ZMM_L(val))); + break; +#ifdef TARGET_X86_64 + case MO_64: + tcg_gen_st_i64(s->T1, s->ptr0, offsetof(ZMMReg, ZMM_Q(val))); + break; +#endif + default: + abort(); + } +} + +static void gen_PINSRB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_pinsr(s, env, decode, MO_8); +} + +static void gen_PINSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_pinsr(s, env, decode, decode->op[2].ot); +} + static void gen_PMOVMSKB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { if (s->prefix & PREFIX_DATA) { @@ -1259,6 +1474,14 @@ static void gen_SSE4a_R(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod } } +static inline void gen_VAESKEYGEN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + assert(!s->vex_l); + gen_helper_aeskeygenassist_xmm(cpu_env, s->ptr0, s->ptr1, imm); + tcg_temp_free_i32(imm); +} + #define gen_VAND gen_PAND #define gen_VANDN gen_PANDN @@ -1304,5 
+1527,105 @@ static void gen_VCVTps_dq(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec fn(cpu_env, s->ptr0, s->ptr2); } +static void gen_VEXTRACTx128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int mask = decode->immediate & 1; + int src_ofs = decode->op[1].offset + offsetof(YMMReg, YMM_X(mask)); + if (decode->op[0].has_ea) { + gen_sto_env_A0(s, src_ofs); + } else { + tcg_gen_gvec_mov(MO_64, decode->op[0].offset + offsetof(YMMReg, YMM_X(0)), src_ofs, 16, 16); + } +} + +static void gen_VEXTRACTPS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_pextr(s, env, decode, MO_32); +} + +static void gen_vinsertps(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 zero = tcg_const_i32(0); /* float32_zero */ + int val = decode->immediate; + int dest_word = (val >> 4) & 3; + int new_mask = (val & 15) | (1 << dest_word); + int vec_len = 16; + + assert(!s->vex_l); + + if (new_mask == 15) { + /* All zeroes plus possibly from the inserted element */ + tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0); + } else if (decode->op[1].offset != decode->op[0].offset) { + gen_store_sse(s, env, decode, decode->op[1].offset); + } + + if (new_mask != (val & 15)) { + tcg_gen_st_i32(s->tmp2_i32, s->ptr0, offsetof(ZMMReg, ZMM_L(dest_word))); + } + + if (new_mask != 15) { + if ((val >> 0) & 1) + tcg_gen_st_i32(zero, s->ptr0, offsetof(ZMMReg, ZMM_L(0))); + if ((val >> 1) & 1) + tcg_gen_st_i32(zero, s->ptr0, offsetof(ZMMReg, ZMM_L(1))); + if ((val >> 2) & 1) + tcg_gen_st_i32(zero, s->ptr0, offsetof(ZMMReg, ZMM_L(2))); + if ((val >> 3) & 1) + tcg_gen_st_i32(zero, s->ptr0, offsetof(ZMMReg, ZMM_L(3))); + } + + tcg_temp_free_i32(zero); +} + +static void gen_VINSERTPS_r(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int val = decode->immediate; + tcg_gen_ld_i32(s->tmp2_i32, s->ptr2, offsetof(ZMMReg, ZMM_L((val >> 6) & 3))); + gen_vinsertps(s, env, decode); +} + +static void 
gen_VINSERTPS_m(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0, s->mem_index, MO_LEUL); + gen_vinsertps(s, env, decode); +} + +static void gen_VINSERTx128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int mask = decode->immediate & 1; + tcg_gen_gvec_mov(MO_64, + decode->op[0].offset + offsetof(YMMReg, YMM_X(mask)), + decode->op[2].offset + offsetof(YMMReg, YMM_X(0)), 16, 16); + tcg_gen_gvec_mov(MO_64, + decode->op[0].offset + offsetof(YMMReg, YMM_X(!mask)), + decode->op[1].offset + offsetof(YMMReg, YMM_X(!mask)), 16, 16); +} + #define gen_VOR gen_POR + +static inline void gen_VPERM2x128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + assert(s->vex_l); + gen_helper_vpermdq_ymm(s->ptr0, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); +} + +static inline void gen_VROUNDSD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + assert(!s->vex_l); + gen_helper_roundsd_xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); +} + +static inline void gen_VROUNDSS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + assert(!s->vex_l); + gen_helper_roundss_xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); +} + #define gen_VXOR gen_PXOR diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index d15e988891..556087b1e9 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4667,7 +4667,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) use_new &= b <= limit; #endif if (use_new && - ((b >= 0x150 && b <= 0x17f) || + (b == 0x13a || + (b >= 0x150 && b <= 0x17f) || (b >= 0x1d0 && b <= 0x1ff))) { return disas_insn_new(s, cpu, b); } From patchwork Sun Sep 11 23:04:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973132 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B568DC6FA83 for ; Sun, 11 Sep 2022 23:31:21 +0000 (UTC) Received: from localhost ([::1]:35840 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWQ4-0005QI-IK for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:31:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40456) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1L-0001Mr-St for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:47 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:23209) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1K-0007MV-8B for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937545; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JAtiCaR9lQvQR+6tBzPSfnat0AkPUvcguUA8ZEf6XIg=; b=MXQ53lwfuNV1gu15kFkZYWPsI5H7zv0F7preebuyhwPwuRsm75yG51tqQxowTT3qLqTwMW IKjQMe6ZqK5WZMo+AS6aV65AL84LT19qajrtePpxLuXSGlmPP4WaUKtkrkn9TTLJ6UZaXA 569YJ7qCh6Ri30j1GSNpfFrq5AfKliE= Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-651-Ltg7aSRCPQ67asHFUrVLDw-1; Sun, 11 Sep 2022 19:05:44 -0400 
X-MC-Unique: Ltg7aSRCPQ67asHFUrVLDw-1 Received: by mail-ej1-f69.google.com with SMTP id go7-20020a1709070d8700b007793ffa7c44so2316428ejc.1 for ; Sun, 11 Sep 2022 16:05:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=JAtiCaR9lQvQR+6tBzPSfnat0AkPUvcguUA8ZEf6XIg=; b=BSfYZmOeYwXUgbCLAcZyPOmbas5rX6qreO3HtjFyh86Gy7WrtoACo7NkARgTZorFfd zK+dX9ieFPoqlrGiIsSE88jfRUUrAwPmNbi3/3d+3VE10dgXaBYDi6kNZWng5wVA+6Mj mCf9onXGWqnNejegz00TJyQNV2xpp9re2U/7u3M/41zKSDWq2CUFnyeaHhhUZYoGDsrZ Wf32pj2umI3M9Wg21vtXsatU19stlznAVyBAUqN3C41qAU1XI2h/6i4doDm96uHScbJW /fBmSQgBPd5sq2nBDo2NzNEccsCbv6HMb75rcAenIIw7dvZmQX3NmclFVjKL+x6iPkTR EtwA== X-Gm-Message-State: ACgBeo1aqrBOv7Aw0y2+VOZ+2IKPt7ck6X6eOiGPUmQURtSmgN49FP+M D6bMQ5MV+enMHm6YGLnaF+Z7bnDsrmd+uqspRlBs5h9Cppv9aLT2ML2rs8keKMmeOZDSBzPCFOv LqHc2dkm6FJrI8cb5S5/r8c12elN7ishwRfbhy5d9lwkaBWg75LQ034YLEeoyppVJlqA= X-Received: by 2002:a17:906:8a45:b0:77c:dd3:cebd with SMTP id gx5-20020a1709068a4500b0077c0dd3cebdmr3997585ejc.668.1662937542885; Sun, 11 Sep 2022 16:05:42 -0700 (PDT) X-Google-Smtp-Source: AA6agR6xkyJR6LenkBYoVr9eQeRMhfUXOhZqn4UGQQm0b2pqMjql19gmd/kk5SaLP+9oWF41cV5cmw== X-Received: by 2002:a17:906:8a45:b0:77c:dd3:cebd with SMTP id gx5-20020a1709068a4500b0077c0dd3cebdmr3997570ejc.668.1662937542599; Sun, 11 Sep 2022 16:05:42 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. 
[93.44.39.154]) by smtp.gmail.com with ESMTPSA id ka5-20020a170907920500b0073c5192cba6sm3458406ejb.114.2022.09.11.16.05.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:42 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: Richard Henderson Subject: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb Date: Mon, 12 Sep 2022 01:04:07 +0200 Message-Id: <20220911230418.340941-28-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Richard Henderson As pmovmskb is used by strlen et al., this is the third-highest-overhead SSE operation, at 0.8%. Signed-off-by: Richard Henderson [Reorganize to generate code for any vector size. 
- Paolo] Signed-off-by: Paolo Bonzini --- target/i386/tcg/emit.c.inc | 65 +++++++++++++++++++++++++++++++++++--- 1 file changed, 60 insertions(+), 5 deletions(-) diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index dbf2c05e16..52c0a7fbe0 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -1179,14 +1179,69 @@ static void gen_PINSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) gen_pinsr(s, env, decode, decode->op[2].ot); } +static void gen_pmovmskb_i64(TCGv_i64 d, TCGv_i64 s) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_andi_i64(d, s, 0x8080808080808080ull); + + /* + * After each shift+or pair: + * 0: a.......b.......c.......d.......e.......f.......g.......h....... + * 7: ab......bc......cd......de......ef......fg......gh......h....... + * 14: abcd....bcde....cdef....defg....efgh....fgh.....gh......h....... + * 28: abcdefghbcdefgh.cdefgh..defgh...efgh....fgh.....gh......h....... + * The result is left in the high bits of the word. 
+ */ + tcg_gen_shli_i64(t, d, 7); + tcg_gen_or_i64(d, d, t); + tcg_gen_shli_i64(t, d, 14); + tcg_gen_or_i64(d, d, t); + tcg_gen_shli_i64(t, d, 28); + tcg_gen_or_i64(d, d, t); +} + +static void gen_pmovmskb_vec(unsigned vece, TCGv_vec d, TCGv_vec s) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_constant_vec_matching(d, MO_8, 0x80); + + /* See above */ + tcg_gen_and_vec(vece, d, s, m); + tcg_gen_shli_vec(vece, t, d, 7); + tcg_gen_or_vec(vece, d, d, t); + tcg_gen_shli_vec(vece, t, d, 14); + tcg_gen_or_vec(vece, d, d, t); + if (vece == MO_64) { + tcg_gen_shli_vec(vece, t, d, 28); + tcg_gen_or_vec(vece, d, d, t); + } +} + static void gen_PMOVMSKB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { - if (s->prefix & PREFIX_DATA) { - gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr2); - } else { - gen_helper_pmovmskb_mmx(s->tmp2_i32, cpu_env, s->ptr2); + static const TCGOpcode vecop_list[] = { INDEX_op_shli_vec, 0 }; + static const GVecGen2 g = { + .fni8 = gen_pmovmskb_i64, + .fniv = gen_pmovmskb_vec, + .opt_opc = vecop_list, + .vece = MO_64, + .prefer_i64 = TCG_TARGET_REG_BITS == 64 + }; + MemOp ot = decode->op[0].ot; + int vec_len = sse_vec_len(s, decode); + TCGv t = tcg_temp_new(); + + tcg_gen_gvec_2(offsetof(CPUX86State, xmm_t0) + xmm_offset(ot), decode->op[2].offset, + vec_len, vec_len, &g); + tcg_gen_ld8u_tl(s->T0, cpu_env, offsetof(CPUX86State, xmm_t0.ZMM_B(vec_len - 1))); + while (vec_len > 8) { + vec_len -= 8; + tcg_gen_shli_tl(s->T0, s->T0, 8); + tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State, xmm_t0.ZMM_B(vec_len - 1))); + tcg_gen_or_tl(s->T0, s->T0, t); } - tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); + tcg_temp_free(t); } static void gen_POR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) From patchwork Sun Sep 11 23:04:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973135 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D028DC6FA83 for ; Sun, 11 Sep 2022 23:34:49 +0000 (UTC) Received: from localhost ([::1]:39744 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWTQ-0002dZ-Oa for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:34:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:40460) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1T-0001R9-Bp for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:55 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:39465) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1P-0007Mq-Gj for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:05:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937549; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FLCbr+27Fz9V3QaOEkD58czOfvcqsfroDtUfI7SF6ug=; b=UOp5TTWYwg5Afl5dfMdJpPJWGjd6KQj/6UIwzw517i7WAL4XfQWu33r7mFz+BxCAtXJYlY 4kmxamcIcDTZYh3FESGHzVu39hwTQJBvnuFMZD4mviF0lac7KAlviu4iQxSk3I5U8u0rzD FfLYgJ7ukzDYxvY5DDhh8JsscGB1SsM= Received: from mail-ej1-f70.google.com (mail-ej1-f70.google.com [209.85.218.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-74-JojgOZgjOUiz_8dbJbkEnQ-1; Sun, 11 Sep 2022 19:05:48 -0400 X-MC-Unique: JojgOZgjOUiz_8dbJbkEnQ-1 Received: by mail-ej1-f70.google.com with SMTP id xj11-20020a170906db0b00b0077b6ecb23fcso503630ejb.5 
for ; Sun, 11 Sep 2022 16:05:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=FLCbr+27Fz9V3QaOEkD58czOfvcqsfroDtUfI7SF6ug=; b=7uQ4TGcxLDBV1JBMXfhUyG2uCJ7sMFWMPxa2JDGwmG4gz4XoTq+qWIhRUXK+lbOZ2c 96kgp3ZnVy9kT6Rf5HcV0iCsXj/ITo3V1Onbd+CjBL1uMfY0smUoWNFZHo5rXdqqOsVM vJlSDl0Hi+AwYT+BPz3XBF0wHvAgOUj1nLoGNr/WFXrXy+AkBhb8nGUZtMuUqb/s78lZ vwdWL8uSSo/YliSk6ToKfbTeITwWwF3qWpYH3KZBn63q40FHrsp08MCr4ZDIqqDmKNGS IxxaSFZp4OmhOn5aWZm+WFoUZbRZbcWHjobtD6fhL7x1o7WZTucpjfV5QKme2R4nJMwh qIpQ== X-Gm-Message-State: ACgBeo0rO56J9gpkLDgcleO2s8ns/vaJmzB5p1DcXH2vuJzltPCnLXhC gPwzh8qpiQb31ZQk4srP+kgXbZ87jXznKTTVnm9jYKKMUCg+PqLGTtMv1JRInRVvM171lCsOvFB SXlOm9pI4Wv2OcXW3t00zjLa3Anse4BUfAv22Ib0wFkm9t1ZkhLwrX5J7fqQNtOgqqm8= X-Received: by 2002:a17:907:970a:b0:77d:a10c:e089 with SMTP id jg10-20020a170907970a00b0077da10ce089mr1270269ejc.364.1662937546430; Sun, 11 Sep 2022 16:05:46 -0700 (PDT) X-Google-Smtp-Source: AA6agR65uxDpXIdhCJEMJhY5rW0lB1cJZm1zacVP7sJImr9/GoVkel1Y6pdncHQKoGjcCLRKALJOYw== X-Received: by 2002:a17:907:970a:b0:77d:a10c:e089 with SMTP id jg10-20020a170907970a00b0077da10ce089mr1270245ejc.364.1662937545632; Sun, 11 Sep 2022 16:05:45 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. 
[93.44.39.154]) by smtp.gmail.com with ESMTPSA id q16-20020aa7cc10000000b0044e84d05cd8sm4752148edt.0.2022.09.11.16.05.44 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:05:45 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 28/37] target/i386: reimplement 0x0f 0x38, add AVX Date: Mon, 12 Sep 2022 01:04:08 +0200 Message-Id: <20220911230418.340941-29-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" There are several special cases here: 1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov. 2) some instructions, such as variable-width shifts, select the vector element size via REX.W. 3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands. 4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order. 
5) some helpers accept a Reg* but have an M argument (i.e. a value of 11 in the ModRM field causes an undefined opcode exception). For these, it is useful to add a custom X86_TYPE_WM value that, unlike X86_TYPE_M, does call gen_load(). These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook. Signed-off-by: Paolo Bonzini --- target/i386/ops_sse.h | 185 +++++++++++++++++++- target/i386/ops_sse_header.h | 19 ++ target/i386/tcg/decode-new.c.inc | 115 +++++++++++- target/i386/tcg/decode-new.h | 7 + target/i386/tcg/emit.c.inc | 288 ++++++++++++++++++++++++++++++- target/i386/tcg/translate.c | 2 +- 6 files changed, 608 insertions(+), 8 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 7eba1cf0f1..fbbe82c6e7 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2382,6 +2382,36 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #endif #if SHIFT >= 1 +void glue(helper_vpermilpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ + uint64_t r0, r1; + int i; + + for (i = 0; i < 1 << SHIFT; i += 2) { + r0 = v->Q(i + ((s->Q(i) >> 1) & 1)); + r1 = v->Q(i + ((s->Q(i+1) >> 1) & 1)); + d->Q(i) = r0; + d->Q(i+1) = r1; + } +} + +void glue(helper_vpermilps, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ + uint32_t r0, r1, r2, r3; + int i; + + for (i = 0; i < 2 << SHIFT; i += 4) { + r0 = v->L(i + (s->L(i) & 3)); + r1 = v->L(i + (s->L(i+1) & 3)); + r2 = v->L(i + (s->L(i+2) & 3)); + r3 = v->L(i + (s->L(i+3) & 3)); + d->L(i) = r0; + d->L(i+1) = r1; + d->L(i+2) = r2; + d->L(i+3) = r3; + } +} + void glue(helper_vpermilpd_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order) { uint64_t r0, r1; @@ -2414,6 +2444,147 @@ void glue(helper_vpermilps_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order) } } +#if SHIFT == 1 +#define FPSRLVD(x, c) (c < 32 ? ((x) >> c) : 0) +#define FPSRLVQ(x, c) (c < 64 ? ((x) >> c) : 0) +#define FPSRAVD(x, c) ((int32_t)(x) >> (c < 32 ? 
c : 31)) +#define FPSRAVQ(x, c) ((int64_t)(x) >> (c < 64 ? c : 63)) +#define FPSLLVD(x, c) (c < 32 ? ((x) << c) : 0) +#define FPSLLVQ(x, c) (c < 64 ? ((x) << c) : 0) +#endif + +SSE_HELPER_L(helper_vpsrlvd, FPSRLVD) +SSE_HELPER_L(helper_vpsravd, FPSRAVD) +SSE_HELPER_L(helper_vpsllvd, FPSLLVD) + +SSE_HELPER_Q(helper_vpsrlvq, FPSRLVQ) +SSE_HELPER_Q(helper_vpsravq, FPSRAVQ) +SSE_HELPER_Q(helper_vpsllvq, FPSLLVQ) + +void glue(helper_vtestps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + uint64_t zf = 0, cf = 0; + int i; + + for (i = 0; i < 2 << SHIFT; i++) { + zf |= (s->L(i) & d->L(i)); + cf |= (s->L(i) & ~d->L(i)); + } + CC_SRC = ((zf >> 31) ? 0 : CC_Z) | ((cf >> 31) ? 0 : CC_C); +} + +void glue(helper_vtestpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + uint64_t zf = 0, cf = 0; + int i; + + for (i = 0; i < 1 << SHIFT; i++) { + zf |= (s->Q(i) & d->Q(i)); + cf |= (s->Q(i) & ~d->Q(i)); + } + CC_SRC = ((zf >> 63) ? 0 : CC_Z) | ((cf >> 63) ? 0 : CC_C); +} + +void glue(helper_vpmaskmovd_st, SUFFIX)(CPUX86State *env, + Reg *v, Reg *s, target_ulong a0) +{ + int i; + + for (i = 0; i < (2 << SHIFT); i++) { + if (v->L(i) >> 31) { + cpu_stl_data_ra(env, a0 + i * 4, s->L(i), GETPC()); + } + } +} + +void glue(helper_vpmaskmovq_st, SUFFIX)(CPUX86State *env, + Reg *v, Reg *s, target_ulong a0) +{ + int i; + + for (i = 0; i < (1 << SHIFT); i++) { + if (v->Q(i) >> 63) { + cpu_stq_data_ra(env, a0 + i * 8, s->Q(i), GETPC()); + } + } +} + +void glue(helper_vpmaskmovd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ + int i; + + for (i = 0; i < (2 << SHIFT); i++) { + d->L(i) = (v->L(i) >> 31) ? s->L(i) : 0; + } +} + +void glue(helper_vpmaskmovq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ + int i; + + for (i = 0; i < (1 << SHIFT); i++) { + d->Q(i) = (v->Q(i) >> 63) ? 
s->Q(i) : 0; + } +} + +void glue(helper_vpgatherdd, SUFFIX)(CPUX86State *env, + Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale) +{ + int i; + for (i = 0; i < (2 << SHIFT); i++) { + if (v->L(i) >> 31) { + target_ulong addr = a0 + + ((target_ulong)(int32_t)s->L(i) << scale); + d->L(i) = cpu_ldl_data_ra(env, addr, GETPC()); + } + v->L(i) = 0; + } +} +void glue(helper_vpgatherdq, SUFFIX)(CPUX86State *env, + Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale) +{ + int i; + for (i = 0; i < (1 << SHIFT); i++) { + if (v->Q(i) >> 63) { + target_ulong addr = a0 + + ((target_ulong)(int32_t)s->L(i) << scale); + d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC()); + } + v->Q(i) = 0; + } +} +void glue(helper_vpgatherqd, SUFFIX)(CPUX86State *env, + Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale) +{ + int i; + for (i = 0; i < (1 << SHIFT); i++) { + if (v->L(i) >> 31) { + target_ulong addr = a0 + + ((target_ulong)(int64_t)s->Q(i) << scale); + d->L(i) = cpu_ldl_data_ra(env, addr, GETPC()); + } + v->L(i) = 0; + } + for (i /= 2; i < 1 << SHIFT; i++) { + d->Q(i) = 0; + v->Q(i) = 0; + } +} +void glue(helper_vpgatherqq, SUFFIX)(CPUX86State *env, + Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale) +{ + int i; + for (i = 0; i < (1 << SHIFT); i++) { + if (v->Q(i) >> 63) { + target_ulong addr = a0 + + ((target_ulong)(int64_t)s->Q(i) << scale); + d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC()); + } + v->Q(i) = 0; + } +} +#endif + #if SHIFT >= 2 void helper_vpermdq_ymm(Reg *d, Reg *v, Reg *s, uint32_t order) { @@ -2473,7 +2644,19 @@ void helper_vpermq_ymm(Reg *d, Reg *s, uint32_t order) d->Q(2) = r2; d->Q(3) = r3; } -#endif + +void helper_vpermd_ymm(Reg *d, Reg *v, Reg *s) +{ + uint32_t r[8]; + int i; + + for (i = 0; i < 8; i++) { + r[i] = s->L(v->L(i) & 7); + } + for (i = 0; i < 8; i++) { + d->L(i) = r[i]; + } +} #endif #undef SSE_HELPER_S diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 6b70d90734..e188cbd87d 100644 --- 
a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -413,9 +413,28 @@ DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32) /* AVX helpers */ #if SHIFT >= 1 +DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpermilps, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_3(glue(vpermilpd_imm, SUFFIX), void, Reg, Reg, i32) DEF_HELPER_3(glue(vpermilps_imm, SUFFIX), void, Reg, Reg, i32) +DEF_HELPER_4(glue(vpsrlvd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsravd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsllvd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsrlvq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsravq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsllvq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_3(glue(vtestps, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(vtestpd, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(vpmaskmovd_st, SUFFIX), void, env, Reg, Reg, tl) +DEF_HELPER_4(glue(vpmaskmovq_st, SUFFIX), void, env, Reg, Reg, tl) +DEF_HELPER_4(glue(vpmaskmovd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpmaskmovq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_6(glue(vpgatherdd, SUFFIX), void, env, Reg, Reg, Reg, tl, i32) +DEF_HELPER_6(glue(vpgatherdq, SUFFIX), void, env, Reg, Reg, Reg, tl, i32) +DEF_HELPER_6(glue(vpgatherqd, SUFFIX), void, env, Reg, Reg, Reg, tl, i32) +DEF_HELPER_6(glue(vpgatherqq, SUFFIX), void, env, Reg, Reg, Reg, tl, i32) #if SHIFT == 2 +DEF_HELPER_3(vpermd_ymm, void, Reg, Reg, Reg) DEF_HELPER_4(vpermdq_ymm, void, Reg, Reg, Reg, i32) DEF_HELPER_3(vpermq_ymm, void, Reg, Reg, i32) #endif diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index e7b406ff80..7feb0eca4e 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -90,6 +90,7 @@ #define mmx .special = X86_SPECIAL_MMX, #define zext0 .special = X86_SPECIAL_ZExtOp0, #define zext2 .special = 
X86_SPECIAL_ZExtOp2, +#define avx_movx .special = X86_SPECIAL_AVXExtMov, #define vex1 .vex_class = 1, #define vex1_rep3 .vex_class = 1, .vex_special = X86_VEX_REPScalar, @@ -255,6 +256,105 @@ static void decode_0FD6(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui } static const X86OpEntry opcodes_0F38_00toEF[240] = { + [0x00] = X86_OP_ENTRY3(PSHUFB, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x01] = X86_OP_ENTRY3(PHADDW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x02] = X86_OP_ENTRY3(PHADDD, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x03] = X86_OP_ENTRY3(PHADDSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x04] = X86_OP_ENTRY3(PMADDUBSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x05] = X86_OP_ENTRY3(PHSUBW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x06] = X86_OP_ENTRY3(PHSUBD, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x07] = X86_OP_ENTRY3(PHSUBSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + + [0x10] = X86_OP_ENTRY3(PBLENDVB, V,x, None,None, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x14] = X86_OP_ENTRY3(BLENDVPS, V,x, None,None, W,x, vex4 cpuid(SSE41) p_66), + [0x15] = X86_OP_ENTRY3(BLENDVPD, V,x, None,None, W,x, vex4 cpuid(SSE41) p_66), + /* Listed incorrectly as type 4 */ + [0x16] = X86_OP_ENTRY3(VPERMD, V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66), + [0x17] = X86_OP_ENTRY3(VPTEST, None,None, V,x, W,x, vex4 cpuid(SSE41) p_66), + + /* + * Source operand listed as Mq/Ux and similar in the manual; incorrectly listed + * as 128-bit only in 2-17. 
+ */ + [0x20] = X86_OP_ENTRY3(VPMOVSXBW, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x21] = X86_OP_ENTRY3(VPMOVSXBD, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x22] = X86_OP_ENTRY3(VPMOVSXBQ, V,x, None,None, W,w, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x23] = X86_OP_ENTRY3(VPMOVSXWD, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x24] = X86_OP_ENTRY3(VPMOVSXWQ, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x25] = X86_OP_ENTRY3(VPMOVSXDQ, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + + /* Same as PMOVSX. */ + [0x30] = X86_OP_ENTRY3(VPMOVZXBW, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x31] = X86_OP_ENTRY3(VPMOVZXBD, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x32] = X86_OP_ENTRY3(VPMOVZXBQ, V,x, None,None, W,w, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x33] = X86_OP_ENTRY3(VPMOVZXWD, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x34] = X86_OP_ENTRY3(VPMOVZXWQ, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x35] = X86_OP_ENTRY3(VPMOVZXDQ, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66), + [0x36] = X86_OP_ENTRY3(VPERMD, V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66), + [0x37] = X86_OP_ENTRY3(PCMPGTQ, V,x, H,x, W,x, vex4 cpuid(SSE42) avx2_256 p_66), + + [0x40] = X86_OP_ENTRY3(VPMULLD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x41] = X86_OP_ENTRY3(VPHMINPOSUW, V,dq, None,None, W,dq, vex4 cpuid(SSE41) p_66), + /* Listed incorrectly as type 4 */ + [0x45] = X86_OP_ENTRY3(VPSRLV, V,x, H,x, W,x, vex6 cpuid(AVX2) p_66), + [0x46] = X86_OP_ENTRY3(VPSRAV, V,x, H,x, W,x, vex6 cpuid(AVX2) p_66), + [0x47] = X86_OP_ENTRY3(VPSLLV, V,x, H,x, W,x, vex6 cpuid(AVX2) p_66), + + [0x90] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vpgatherdd/q */ + [0x91] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* 
vpgatherqd/q */ + [0x92] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vgatherdps/d */ + [0x93] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vgatherqps/d */ + + [0x08] = X86_OP_ENTRY3(PSIGNB, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x09] = X86_OP_ENTRY3(PSIGNW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x0a] = X86_OP_ENTRY3(PSIGND, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x0b] = X86_OP_ENTRY3(PMULHRSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x0c] = X86_OP_ENTRY3(VPERMILPS, V,x, H,x, W,x, vex4 cpuid(AVX) p_00_66), + [0x0d] = X86_OP_ENTRY3(VPERMILPD, V,x, H,x, W,x, vex4 cpuid(AVX) p_66), + [0x0e] = X86_OP_ENTRY3(VTESTPS, None,None, V,x, W,x, vex4 cpuid(AVX) p_66), + [0x0f] = X86_OP_ENTRY3(VTESTPD, None,None, V,x, W,x, vex4 cpuid(AVX) p_66), + + [0x18] = X86_OP_ENTRY3(VPBROADCASTD, V,x, None,None, W,d, vex6 cpuid(AVX) p_66), /* vbroadcastss */ + [0x19] = X86_OP_ENTRY3(VPBROADCASTQ, V,qq, None,None, W,q, vex6 cpuid(AVX) p_66), /* vbroadcastsd */ + [0x1a] = X86_OP_ENTRY3(VBROADCASTx128, V,qq, None,None, WM,dq,vex6 cpuid(AVX) p_66), + [0x1c] = X86_OP_ENTRY3(PABSB, V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x1d] = X86_OP_ENTRY3(PABSW, V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + [0x1e] = X86_OP_ENTRY3(PABSD, V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), + + [0x28] = X86_OP_ENTRY3(VPMULDQ, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x29] = X86_OP_ENTRY3(PCMPEQQ, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x2a] = X86_OP_ENTRY3(MOVNTDQA, V,x, None,None, M,x, vex1 cpuid(SSE41) avx2_256 p_66), + [0x2b] = X86_OP_ENTRY3(VPACKUSDW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x2c] = X86_OP_ENTRY3(VMASKMOVPS, V,x, H,x, WM,x, vex6 cpuid(AVX) p_66), + [0x2d] = X86_OP_ENTRY3(VMASKMOVPD, V,x, H,x, WM,x, vex6 cpuid(AVX) p_66), + /* Incorrectly listed as Mx,Hx,Vx in the manual */ + 
[0x2e] = X86_OP_ENTRY3(VMASKMOVPS_st, M,x, V,x, H,x, vex6 cpuid(AVX) p_66), + [0x2f] = X86_OP_ENTRY3(,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x3c] = X86_OP_ENTRY3(VPMAXSB, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x3d] = X86_OP_ENTRY3(VPMAXSD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x3e] = X86_OP_ENTRY3(VPMAXUW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + [0x3f] = X86_OP_ENTRY3(VPMAXUD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66), + + [0x58] = X86_OP_ENTRY3(VPBROADCASTD, V,x, None,None, W,d, vex6 cpuid(AVX2) p_66), + [0x59] = X86_OP_ENTRY3(VPBROADCASTQ, V,x, None,None, W,q, vex6 cpuid(AVX2) p_66), + [0x5a] = X86_OP_ENTRY3(VBROADCASTx128, V,qq, None,None, WM,dq,vex6 cpuid(AVX2) p_66), + + [0x78] = X86_OP_ENTRY3(VPBROADCASTB, V,x, None,None, W,b, vex6 cpuid(AVX2) p_66), + [0x79] = X86_OP_ENTRY3(VPBROADCASTW, V,x, None,None, W,w, vex6 cpuid(AVX2) p_66), + + [0x8c] = X86_OP_ENTRY3(VPMASKMOV, V,x, H,x, WM,x, vex6 cpuid(AVX2) p_66), + [0x8e] = X86_OP_ENTRY3(VPMASKMOV_st, M,x, V,x, H,x, vex6 cpuid(AVX2) p_66), + + [0xdb] = X86_OP_ENTRY3(VAESIMC, V,dq, None,None, W,dq, vex4 cpuid(AES) p_66), + [0xdc] = X86_OP_ENTRY3(VAESENC, V,dq, H,dq, W,dq, vex4 cpuid(AES) p_66), + [0xdd] = X86_OP_ENTRY3(VAESENCLAST, V,dq, H,dq, W,dq, vex4 cpuid(AES) p_66), + [0xde] = X86_OP_ENTRY3(VAESDEC, V,dq, H,dq, W,dq, vex4 cpuid(AES) p_66), + [0xdf] = X86_OP_ENTRY3(VAESDECLAST, V,dq, H,dq, W,dq, vex4 cpuid(AES) p_66), }; /* five rows for no prefix, 66, F3, F2, 66+F2 */ @@ -384,8 +484,8 @@ static const X86OpEntry opcodes_0F3A[256] = { [0x0b] = X86_OP_ENTRY4(VROUNDSD, V,x, H,x, W,sd, vex3 cpuid(SSE41) p_66), [0x0c] = X86_OP_ENTRY4(VBLENDPS, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), [0x0d] = X86_OP_ENTRY4(VBLENDPD, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), - [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66), - [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx p_00_66), + [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 
cpuid(SSE41) avx2_256 p_66), + [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), [0x18] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 cpuid(AVX) p_66), [0x19] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 cpuid(AVX) p_66), @@ -754,6 +854,9 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, } goto get_modrm; + case X86_TYPE_WM: /* modrm byte selects an XMM/YMM memory operand */ + op->unit = X86_OP_SSE; + /* fall through */ case X86_TYPE_M: /* modrm byte selects a memory operand */ modrm = get_modrm(s, env); if ((modrm >> 6) == 3) { @@ -1341,6 +1444,14 @@ static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b) } break; + case X86_SPECIAL_AVXExtMov: + if (!decode.op[2].has_ea) { + decode.op[2].ot = s->vex_l ? MO_256 : MO_128; + } else if (s->vex_l) { + decode.op[2].ot++; + } + break; + case X86_SPECIAL_MMX: if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ | PREFIX_DATA))) { gen_helper_enter_mmx(cpu_env); diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index 3db7b82506..e86876b9a9 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -47,6 +47,7 @@ typedef enum X86OpType { X86_TYPE_Y, /* string destination */ /* Custom */ + X86_TYPE_WM, /* modrm byte selects an XMM/YMM memory operand */ X86_TYPE_2op, /* 2-operand RMW instruction */ X86_TYPE_LoBits, /* encoded in bits 0-2 of the operand + REX.B */ X86_TYPE_0, /* Hard-coded GPRs (RAX..RDI) */ @@ -141,6 +142,12 @@ typedef enum X86InsnSpecial { X86_SPECIAL_ZExtOp0, X86_SPECIAL_ZExtOp2, + /* + * Register operand 2 is extended to full width, while a memory operand + * is doubled in size if VEX.L=1. + */ + X86_SPECIAL_AVXExtMov, + /* * MMX instruction exists with no prefix; if there is no prefix, V/H/W/U operands * become P/P/Q/N, and size "x" becomes "q". 
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 52c0a7fbe0..7084875af6 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -19,6 +19,9 @@ * License along with this library; if not, see . */ +typedef void (*SSEFunc_0_epppti)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv a0, TCGv_i32 scale); + static void gen_NM_exception(DisasContext *s) { gen_exception(s, EXCP07_PREX, s->pc_start - s->cs_base); @@ -416,15 +419,21 @@ static inline void gen_ternary_sse(DisasContext *s, CPUX86State *env, X86Decoded fn(cpu_env, s->ptr0, s->ptr1, s->ptr2, ptr3); tcg_temp_free_ptr(ptr3); } -#define TERNARY_SSE(uvname, lname) \ +#define TERNARY_SSE(uname, uvname, lname) \ static void gen_##uvname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ { \ gen_ternary_sse(s, env, decode, (uint8_t)decode->immediate >> 4, \ gen_helper_##lname##_xmm, gen_helper_##lname##_ymm); \ +} \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + tcg_gen_mov_ptr(s->ptr1, s->ptr0); \ + gen_ternary_sse(s, env, decode, 0, \ + gen_helper_##lname##_xmm, gen_helper_##lname##_ymm); \ } -TERNARY_SSE(VBLENDVPS, blendvps) -TERNARY_SSE(VBLENDVPD, blendvpd) -TERNARY_SSE(VPBLENDVB, pblendvb) +TERNARY_SSE(BLENDVPS, VBLENDVPS, blendvps) +TERNARY_SSE(BLENDVPD, VBLENDVPD, blendvpd) +TERNARY_SSE(PBLENDVB, VPBLENDVB, pblendvb) static inline void gen_binary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, SSEFunc_0_epppi xmm, SSEFunc_0_epppi ymm) @@ -531,6 +540,19 @@ BINARY_INT_MMX(PSRLQ_r, psrlq) BINARY_INT_MMX(PSRAW_r, psraw) BINARY_INT_MMX(PSRAD_r, psrad) +BINARY_INT_MMX(PHADDW, phaddw) +BINARY_INT_MMX(PHADDSW, phaddsw) +BINARY_INT_MMX(PHADDD, phaddd) +BINARY_INT_MMX(PHSUBW, phsubw) +BINARY_INT_MMX(PHSUBSW, phsubsw) +BINARY_INT_MMX(PHSUBD, phsubd) +BINARY_INT_MMX(PMADDUBSW, pmaddubsw) +BINARY_INT_MMX(PSHUFB, pshufb) +BINARY_INT_MMX(PSIGNB, psignb) +BINARY_INT_MMX(PSIGNW, psignw) 
+BINARY_INT_MMX(PSIGND, psignd) +BINARY_INT_MMX(PMULHRSW, pmulhrsw) + /* Instructions with no MMX equivalent. */ #define BINARY_INT_SSE(uname, lname) \ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ @@ -541,8 +563,75 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod gen_helper_##lname##_ymm); \ } +/* Instructions with no MMX equivalent. */ BINARY_INT_SSE(PUNPCKLQDQ, punpcklqdq) BINARY_INT_SSE(PUNPCKHQDQ, punpckhqdq) +BINARY_INT_SSE(VPACKUSDW, packusdw) +BINARY_INT_SSE(VPMINSB, pminsb) +BINARY_INT_SSE(VPMINUW, pminuw) +BINARY_INT_SSE(VPMINUD, pminud) +BINARY_INT_SSE(VPMINSD, pminsd) +BINARY_INT_SSE(VPMAXSB, pmaxsb) +BINARY_INT_SSE(VPMAXUW, pmaxuw) +BINARY_INT_SSE(VPMAXUD, pmaxud) +BINARY_INT_SSE(VPMAXSD, pmaxsd) +BINARY_INT_SSE(VPMULLD, pmulld) +BINARY_INT_SSE(VPMULDQ, pmuldq) +BINARY_INT_SSE(VPERMILPS, vpermilps) +BINARY_INT_SSE(VPERMILPD, vpermilpd) +BINARY_INT_SSE(VMASKMOVPS, vpmaskmovd) +BINARY_INT_SSE(VMASKMOVPD, vpmaskmovq) + +BINARY_INT_SSE(VAESDEC, aesdec) +BINARY_INT_SSE(VAESDECLAST, aesdeclast) +BINARY_INT_SSE(VAESENC, aesenc) +BINARY_INT_SSE(VAESENCLAST, aesenclast) + +static inline void gen_unary_int_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_epp xmm, SSEFunc_0_epp ymm) +{ + if (!s->vex_l) { + xmm(cpu_env, s->ptr0, s->ptr2); + } else { + ymm(cpu_env, s->ptr0, s->ptr2); + } +} + +#define UNARY_INT_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_unary_int_sse(s, env, decode, \ + gen_helper_##lname##_xmm, \ + gen_helper_##lname##_ymm); \ +} + +UNARY_INT_SSE(VPMOVSXBW, pmovsxbw) +UNARY_INT_SSE(VPMOVSXBD, pmovsxbd) +UNARY_INT_SSE(VPMOVSXBQ, pmovsxbq) +UNARY_INT_SSE(VPMOVSXWD, pmovsxwd) +UNARY_INT_SSE(VPMOVSXWQ, pmovsxwq) +UNARY_INT_SSE(VPMOVSXDQ, pmovsxdq) + +UNARY_INT_SSE(VPMOVZXBW, pmovzxbw) +UNARY_INT_SSE(VPMOVZXBD, pmovzxbd) +UNARY_INT_SSE(VPMOVZXBQ, pmovzxbq) +UNARY_INT_SSE(VPMOVZXWD, 
pmovzxwd) +UNARY_INT_SSE(VPMOVZXWQ, pmovzxwq) +UNARY_INT_SSE(VPMOVZXDQ, pmovzxdq) + +#define UNARY_CMP_SSE(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + if (!s->vex_l) { \ + gen_helper_##lname##_xmm(cpu_env, s->ptr1, s->ptr2); \ + } else { \ + gen_helper_##lname##_ymm(cpu_env, s->ptr1, s->ptr2); \ + } \ + set_cc_op(s, CC_OP_EFLAGS); \ +} +UNARY_CMP_SSE(VPTEST, ptest) +UNARY_CMP_SSE(VTESTPS, vtestps) +UNARY_CMP_SSE(VTESTPD, vtestpd) static inline void gen_unary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, SSEFunc_0_ppi xmm, SSEFunc_0_ppi ymm) @@ -595,6 +684,66 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod UNARY_IMM_FP_SSE(VROUNDPS, roundps) UNARY_IMM_FP_SSE(VROUNDPD, roundpd) +static inline void gen_rexw_avx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_eppp d_xmm, SSEFunc_0_eppp q_xmm, + SSEFunc_0_eppp d_ymm, SSEFunc_0_eppp q_ymm) +{ + SSEFunc_0_eppp d = s->vex_l ? d_ymm : d_xmm; + SSEFunc_0_eppp q = s->vex_l ? q_ymm : q_xmm; + SSEFunc_0_eppp fn = s->rex_w ? q : d; + fn(cpu_env, s->ptr0, s->ptr1, s->ptr2); +} + +/* REX.W affects whether to operate on 32- or 64-bit elements. */ +#define REXW_AVX(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_rexw_avx(s, env, decode, \ + gen_helper_##lname##d_xmm, gen_helper_##lname##q_xmm, \ + gen_helper_##lname##d_ymm, gen_helper_##lname##q_ymm); \ +} +REXW_AVX(VPSLLV, vpsllv) +REXW_AVX(VPSRLV, vpsrlv) +REXW_AVX(VPSRAV, vpsrav) +REXW_AVX(VPMASKMOV, vpmaskmov) + +/* Same as above, but with extra arguments to the helper. */ +static inline void gen_vsib_avx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_epppti d_xmm, SSEFunc_0_epppti q_xmm, + SSEFunc_0_epppti d_ymm, SSEFunc_0_epppti q_ymm) +{ + SSEFunc_0_epppti d = s->vex_l ? d_ymm : d_xmm; + SSEFunc_0_epppti q = s->vex_l ? 
q_ymm : q_xmm; + SSEFunc_0_epppti fn = s->rex_w ? q : d; + TCGv_i32 scale = tcg_const_i32(decode->mem.scale); + TCGv_ptr index = tcg_temp_new_ptr(); + + /* Pass third input as (index, base, scale) */ + tcg_gen_addi_ptr(index, cpu_env, ZMM_OFFSET(decode->mem.index)); + fn(cpu_env, s->ptr0, s->ptr1, index, s->A0, scale); + + /* + * There are two output operands, so zero OP1's high 128 bits + * in the VEX.128 case. + */ + if (!s->vex_l) { + tcg_gen_gvec_dup_imm(MO_64, + decode->op[1].offset + offsetof(ZMMReg, ZMM_X(1)), + 16, 16, 0); + } + tcg_temp_free_ptr(index); + tcg_temp_free_i32(scale); +} +#define VSIB_AVX(uname, lname) \ +static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + gen_vsib_avx(s, env, decode, \ + gen_helper_##lname##d_xmm, gen_helper_##lname##q_xmm, \ + gen_helper_##lname##d_ymm, gen_helper_##lname##q_ymm); \ +} +VSIB_AVX(VPGATHERD, vpgatherd) +VSIB_AVX(VPGATHERQ, vpgatherq) + static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op) { TCGv carry_in = NULL; @@ -868,6 +1017,11 @@ static void gen_MOVMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32); } +static void gen_MOVNTDQA(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_load_sse(s, s->T0, decode->op[0].ot, decode->op[0].offset); +} + static void gen_MOVQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset); @@ -915,6 +1069,27 @@ static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) } +static void gen_PABSB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_abs(MO_8, decode->op[0].offset, decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PABSW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_abs(MO_16, decode->op[0].offset, 
decode->op[2].offset, vec_len, vec_len); +} + +static void gen_PABSD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_abs(MO_32, decode->op[0].offset, decode->op[2].offset, vec_len, vec_len); +} + static void gen_PADDB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -1009,6 +1184,15 @@ static void gen_PCMPEQD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod decode->op[2].offset, vec_len, vec_len); } +static void gen_PCMPEQQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_EQ, MO_64, + decode->op[0].offset, decode->op[1].offset, + decode->op[2].offset, vec_len, vec_len); +} + static void gen_PCMPESTRI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { TCGv_i32 imm = tcg_const_i32(decode->immediate); @@ -1076,6 +1260,15 @@ static void gen_PCMPGTD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod decode->op[2].offset, vec_len, vec_len); } +static void gen_PCMPGTQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_gvec_cmp(TCG_COND_GT, MO_64, +o tcg_temp_free_ptr(imm_vec); } +static void gen_VPBROADCASTB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_ld8u_i32(s->tmp2_i32, s->ptr2, 0); + tcg_gen_gvec_dup_i32(MO_8, decode->op[0].offset, vec_len, vec_len, s->tmp2_i32); +} + +static void gen_VPBROADCASTW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_ld16u_i32(s->tmp2_i32, s->ptr2, 0); + tcg_gen_gvec_dup_i32(MO_16, decode->op[0].offset, vec_len, vec_len, s->tmp2_i32); +} + +static void gen_VPBROADCASTD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_ld_i32(s->tmp2_i32, s->ptr2, 0); + 
tcg_gen_gvec_dup_i32(MO_32, decode->op[0].offset, vec_len, vec_len, s->tmp2_i32); +} + +static void gen_VPBROADCASTQ(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int vec_len = sse_vec_len(s, decode); + + tcg_gen_ld_i64(s->tmp1_i64, s->ptr2, 0); + tcg_gen_gvec_dup_i64(MO_64, decode->op[0].offset, vec_len, vec_len, s->tmp1_i64); +} + static void gen_PXOR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = sse_vec_len(s, decode); @@ -1529,6 +1754,12 @@ static void gen_SSE4a_R(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod } } +static inline void gen_VAESIMC(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + assert(!s->vex_l); + gen_helper_aesimc_xmm(cpu_env, s->ptr0, s->ptr2); +} + static inline void gen_VAESKEYGEN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { TCGv_i32 imm = tcg_const_i32(decode->immediate); @@ -1540,6 +1771,14 @@ static inline void gen_VAESKEYGEN(DisasContext *s, CPUX86State *env, X86DecodedI #define gen_VAND gen_PAND #define gen_VANDN gen_PANDN +static inline void gen_VBROADCASTx128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + tcg_gen_gvec_mov(MO_64, decode->op[0].offset, + decode->op[2].offset, 16, 16); + tcg_gen_gvec_mov(MO_64, decode->op[0].offset + offsetof(YMMReg, YMM_X(1)), + decode->op[2].offset, 16, 16); +} + static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { gen_unary_fp_sse(s, env, decode, @@ -1657,8 +1896,43 @@ static void gen_VINSERTx128(DisasContext *s, CPUX86State *env, X86DecodedInsn *d decode->op[1].offset + offsetof(YMMReg, YMM_X(!mask)), 16, 16); } +static inline void gen_maskmov(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, + SSEFunc_0_eppt xmm, SSEFunc_0_eppt ymm) +{ + if (!s->vex_l) { + xmm(cpu_env, s->ptr2, s->ptr1, s->A0); + } else { + ymm(cpu_env, s->ptr2, s->ptr1, s->A0); + } +} + +static void gen_VMASKMOVPD_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + 
gen_maskmov(s, env, decode, gen_helper_vpmaskmovq_st_xmm, gen_helper_vpmaskmovq_st_ymm); +} + +static void gen_VMASKMOVPS_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_maskmov(s, env, decode, gen_helper_vpmaskmovd_st_xmm, gen_helper_vpmaskmovd_st_ymm); +} + +static void gen_VPMASKMOV_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if (s->rex_w) { + gen_VMASKMOVPD_st(s, env, decode); + } else { + gen_VMASKMOVPS_st(s, env, decode); + } +} + #define gen_VOR gen_POR +static void gen_VPERMD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + assert(s->vex_l); + gen_helper_vpermd_ymm(s->ptr0, s->ptr1, s->ptr2); +} + static inline void gen_VPERM2x128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { TCGv_i32 imm = tcg_const_i32(decode->immediate); @@ -1667,6 +1941,12 @@ static inline void gen_VPERM2x128(DisasContext *s, CPUX86State *env, X86DecodedI tcg_temp_free_i32(imm); } +static void gen_VPHMINPOSUW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + assert(!s->vex_l); + gen_helper_phminposuw_xmm(cpu_env, s->ptr0, s->ptr2); +} + static inline void gen_VROUNDSD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { TCGv_i32 imm = tcg_const_i32(decode->immediate); diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 556087b1e9..e42cb275a1 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4667,7 +4667,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) use_new &= b <= limit; #endif if (use_new && - (b == 0x13a || + (b == 0x138 || b == 0x13a || (b >= 0x150 && b <= 0x17f) || (b >= 0x1d0 && b <= 0x1ff))) { return disas_insn_new(s, cpu, b);

From patchwork Sun Sep 11 23:04:09 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973156
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 29/37] target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
Date: Mon, 12 Sep 2022 01:04:09 +0200
Message-Id: <20220911230418.340941-30-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

Nothing special going on here, for once.

Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/decode-new.c.inc | 5 +++ target/i386/tcg/emit.c.inc | 76 ++++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 1 + 3 files changed, 82 insertions(+) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 7feb0eca4e..c51b59f721 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -579,6 +579,11 @@ static const X86OpEntry opcodes_0F[256] = { [0x7e] = X86_OP_GROUP0(0F7E), [0x7f] = X86_OP_GROUP3(0F6F, W,x, None,None, V,x, vex5 mmx p_00_66_f3), + [0xc2] = X86_OP_ENTRY4(VCMP, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), + [0xc4] = X86_OP_ENTRY4(PINSRW, V,dq,H,dq,E,w, vex5 mmx p_00_66), + [0xc5] = X86_OP_ENTRY3(PEXTRW, G,d, U,dq,I,b, vex5 mmx p_00_66), + [0xc6] = X86_OP_ENTRY4(VSHUF, V,x, H,x, W,x, vex4 p_00_66), + [0xd0] = X86_OP_ENTRY3(VADDSUB, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2), [0xd1] = X86_OP_ENTRY3(PSRLW_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), [0xd2] = X86_OP_ENTRY3(PSRLD_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 7084875af6..d1819f3581 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -1367,6 +1367,11 @@ static void gen_PINSRB(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode gen_pinsr(s, env, decode, MO_8); } +static void gen_PINSRW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_pinsr(s, env, decode, MO_16); +} + static void gen_PINSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { gen_pinsr(s, env, decode, decode->op[2].ot); @@ -1779,6 +1784,66 @@ static inline void gen_VBROADCASTx128(DisasContext *s, CPUX86State *env, X86Deco decode->op[2].offset, 16, 16); } +/* + * 00 = v*ps Vps, Hps, Wpd + * 66 = v*pd Vpd, Hpd, Wps + * f3 = v*ss Vss, Hss, Wps + * f2 = v*sd Vsd, Hsd, Wps + */ +#define SSE_CMP(x) { \ + gen_helper_ ## x ## ps ## _xmm, gen_helper_ ## x ## pd 
## _xmm, \ + gen_helper_ ## x ## ss, gen_helper_ ## x ## sd, \ + gen_helper_ ## x ## ps ## _ymm, gen_helper_ ## x ## pd ## _ymm} +static const SSEFunc_0_eppp gen_helper_cmp_funcs[32][6] = { + SSE_CMP(cmpeq), + SSE_CMP(cmplt), + SSE_CMP(cmple), + SSE_CMP(cmpunord), + SSE_CMP(cmpneq), + SSE_CMP(cmpnlt), + SSE_CMP(cmpnle), + SSE_CMP(cmpord), + + SSE_CMP(cmpequ), + SSE_CMP(cmpnge), + SSE_CMP(cmpngt), + SSE_CMP(cmpfalse), + SSE_CMP(cmpnequ), + SSE_CMP(cmpge), + SSE_CMP(cmpgt), + SSE_CMP(cmptrue), + + SSE_CMP(cmpeqs), + SSE_CMP(cmpltq), + SSE_CMP(cmpleq), + SSE_CMP(cmpunords), + SSE_CMP(cmpneqq), + SSE_CMP(cmpnltq), + SSE_CMP(cmpnleq), + SSE_CMP(cmpords), + + SSE_CMP(cmpequs), + SSE_CMP(cmpngeq), + SSE_CMP(cmpngtq), + SSE_CMP(cmpfalses), + SSE_CMP(cmpnequs), + SSE_CMP(cmpgeq), + SSE_CMP(cmpgtq), + SSE_CMP(cmptrues), +}; +#undef SSE_CMP + +static inline void gen_VCMP(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + int index = decode->immediate & (s->prefix & PREFIX_VEX ? 31 : 7); + int b = + s->prefix & PREFIX_REPZ ? 2 /* ss */ : + s->prefix & PREFIX_REPNZ ? 3 /* sd */ : + !!(s->prefix & PREFIX_DATA) + (s->vex_l << 2); + + gen_helper_cmp_funcs[index][b](cpu_env, s->ptr0, s->ptr1, s->ptr2); +} + static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { gen_unary_fp_sse(s, env, decode, @@ -1963,4 +2028,15 @@ static inline void gen_VROUNDSS(DisasContext *s, CPUX86State *env, X86DecodedIns tcg_temp_free_i32(imm); } +static inline void gen_VSHUF(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + TCGv_i32 imm = tcg_const_i32(decode->immediate); + SSEFunc_0_pppi ps, pd, fn; + ps = s->vex_l ? gen_helper_shufps_ymm : gen_helper_shufps_xmm; + pd = s->vex_l ? gen_helper_shufpd_ymm : gen_helper_shufpd_xmm; + fn = s->prefix & PREFIX_DATA ?
pd : ps; + fn(s->ptr0, s->ptr1, s->ptr2, imm); + tcg_temp_free_i32(imm); +} + #define gen_VXOR gen_PXOR diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index e42cb275a1..468867afcf 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -4669,6 +4669,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu) if (use_new && (b == 0x138 || b == 0x13a || (b >= 0x150 && b <= 0x17f) || + b == 0x1c2 || (b >= 0x1c4 && b <= 0x1c6) || (b >= 0x1d0 && b <= 0x1ff))) { return disas_insn_new(s, cpu, b); }

From patchwork Sun Sep 11 23:04:10 2022
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973130
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 30/37] target/i386: reimplement 0x0f 0x10-0x17, add AVX
Date: Mon, 12 Sep 2022 01:04:10 +0200
Message-Id: <20220911230418.340941-31-pbonzini@redhat.com>
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

These are mostly moves, and yet they are a total pain. The main issues are:

1) some instructions are selected by mod == 11 (register operand) vs.
   mod == 00/01/10 (memory operand)

2) stores to memory are two-operand operations, while the 3-register
   and load-from-memory versions operate on the entire contents of the
   destination; this makes it easier to separate the gen_* function for
   the store case

3) it's inefficient to load into xmm_T0 only to move the value out
   again, so the gen_* function for the load case is separated too

The manual also has various mistakes in the operands here. For example,
the store case of MOVHPS operates on a 128-bit source (albeit discarding
the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq.
Likewise for the destination and source of MOVHLPS.

VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ,
but are selected by prefixes rather than by separate opcodes. The
helpers can be reused, however.

For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as
helpers. I named the helper for MOVDDUP "movdldup" in preparation for
a possible future introduction of MOVDHDUP and to clarify the similarity
with MOVSLDUP.

Signed-off-by: Paolo Bonzini
---
 target/i386/ops_sse.h            |   7 ++
 target/i386/ops_sse_header.h     |   3 +
 target/i386/tcg/decode-new.c.inc | 121 ++++++++++++++++++++++++++++++
 target/i386/tcg/emit.c.inc       | 123 +++++++++++++++++++++++++++++++
 target/i386/tcg/translate.c      |   1 +
 5 files changed, 255 insertions(+)

             (b == 0x138 || b == 0x13a ||
+             (b >= 0x110 && b <= 0x117) ||
              (b >= 0x150 && b <= 0x17f) ||
              b == 0x1c2 || (b >= 0x1c4 && b <= 0x1c6) ||
              (b >= 0x1d0 && b <= 0x1ff))) {

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index fbbe82c6e7..52cae7ebe7 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1683,6 +1683,10 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
     CC_SRC = (zf ? 0 : CC_Z) | (cf ?
0 : CC_C); } +#define FMOVSLDUP(i) s->L((i) & ~1) +#define FMOVSHDUP(i) s->L((i) | 1) +#define FMOVDLDUP(i) s->Q((i) & ~1) + #define SSE_HELPER_F(name, elem, num, F) \ void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ { \ @@ -1705,6 +1709,9 @@ SSE_HELPER_F(helper_pmovzxbq, Q, 1 << SHIFT, s->B) SSE_HELPER_F(helper_pmovzxwd, L, 2 << SHIFT, s->W) SSE_HELPER_F(helper_pmovzxwq, Q, 1 << SHIFT, s->W) SSE_HELPER_F(helper_pmovzxdq, Q, 1 << SHIFT, s->L) +SSE_HELPER_F(helper_pmovsldup, L, 2 << SHIFT, FMOVSLDUP) +SSE_HELPER_F(helper_pmovshdup, L, 2 << SHIFT, FMOVSHDUP) +SSE_HELPER_F(helper_pmovdldup, Q, 1 << SHIFT, FMOVDLDUP) #endif void glue(helper_pmuldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index e188cbd87d..ed51f10eef 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -355,6 +355,9 @@ DEF_HELPER_3(glue(pmovzxbq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxwd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxwq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pmovzxdq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(pmovsldup, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(pmovshdup, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(pmovdldup, SUFFIX), void, env, Reg, Reg) DEF_HELPER_4(glue(pmuldq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(pcmpeqq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(packusdw, SUFFIX), void, env, Reg, Reg, Reg) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index c51b59f721..268ccb886f 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -509,6 +509,117 @@ static void decode_0F3A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui *entry = opcodes_0F3A[*b]; } +/* + * There are some mistakes in the operands in the manual, and the load/store/register + * cases are easiest to keep separate, so the entries for 10-17 
follow simplicity and
+ * efficiency of implementation rather than copying what the manual says.
+ *
+ * In particular:
+ *
+ * 1) "VMOVSS m32, xmm1" and "VMOVSD m64, xmm1" do not support VEX.vvvv != 1111b,
+ * but this is not mentioned in the tables.
+ *
+ * 2) MOVHLPS, MOVHPS, MOVHPD, MOVLPD, MOVLPS read the high quadword of one of their
+ * operands, which must therefore be dq; MOVLPD and MOVLPS also write the high
+ * quadword of the V operand.
+ */
+static void decode_0F10(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F10_reg[4] = {
+        X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex4_unal), /* MOVUPS */
+        X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex4_unal), /* MOVUPD */
+        X86_OP_ENTRY3(VMOVSS, V,x, H,x, W,x, vex4),
+        X86_OP_ENTRY3(VMOVLPx, V,x, H,x, W,x, vex4), /* MOVSD */
+    };
+
+    static const X86OpEntry opcodes_0F10_mem[4] = {
+        X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex4_unal), /* MOVUPS */
+        X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex4_unal), /* MOVUPD */
+        X86_OP_ENTRY3(VMOVSS_ld, V,x, H,x, M,ss, vex4),
+        X86_OP_ENTRY3(VMOVSD_ld, V,x, H,x, M,sd, vex4),
+    };
+
+    if ((get_modrm(s, env) >> 6) == 3) {
+        *entry = *decode_by_prefix(s, opcodes_0F10_reg);
+    } else {
+        *entry = *decode_by_prefix(s, opcodes_0F10_mem);
+    }
+}
+
+static void decode_0F11(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F11_reg[4] = {
+        X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex4), /* MOVUPS */
+        X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex4), /* MOVUPD */
+        X86_OP_ENTRY3(VMOVSS, W,x, H,x, V,x, vex4),
+        X86_OP_ENTRY3(VMOVLPx, W,x, H,x, V,q, vex4), /* MOVSD */
+    };
+
+    static const X86OpEntry opcodes_0F11_mem[4] = {
+        X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex4), /* MOVUPS */
+        X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex4), /* MOVUPD */
+        X86_OP_ENTRY3(VMOVSS_st, M,ss, None,None, V,x, vex4),
+        X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex4), /* MOVSD */
+    };
+
+    if ((get_modrm(s, env) >> 6) == 3) {
+        *entry = *decode_by_prefix(s, opcodes_0F11_reg);
+    } else {
+        *entry = *decode_by_prefix(s, opcodes_0F11_mem);
+    }
+}
+
+static void decode_0F12(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F12_mem[4] = {
+        /*
+         * Use dq for operand for compatibility with gen_MOVSD and
+         * to allow VEX128 only.
+         */
+        X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq, M,q, vex4), /* MOVLPS */
+        X86_OP_ENTRY3(VMOVLPx_ld, V,dq, H,dq, M,q, vex4), /* MOVLPD */
+        X86_OP_ENTRY3(VMOVSLDUP, V,x, None,None, W,x, vex4 cpuid(SSE3)),
+        X86_OP_ENTRY3(VMOVDDUP, V,x, None,None, WM,q, vex4 cpuid(SSE3)), /* qq if VEX.256 */
+    };
+    static const X86OpEntry opcodes_0F12_reg[4] = {
+        X86_OP_ENTRY3(VMOVHLPS, V,dq, H,dq, U,dq, vex4),
+        X86_OP_ENTRY3(VMOVLPx, W,x, H,x, U,q, vex4), /* MOVLPD */
+        X86_OP_ENTRY3(VMOVSLDUP, V,x, None,None, U,x, vex4 cpuid(SSE3)),
+        X86_OP_ENTRY3(VMOVDDUP, V,x, None,None, U,x, vex4 cpuid(SSE3)),
+    };
+
+    if ((get_modrm(s, env) >> 6) == 3) {
+        *entry = *decode_by_prefix(s, opcodes_0F12_reg);
+    } else {
+        *entry = *decode_by_prefix(s, opcodes_0F12_mem);
+        if ((s->prefix & PREFIX_REPNZ) && s->vex_l) {
+            entry->s2 = X86_SIZE_qq;
+        }
+    }
+}
+
+static void decode_0F16(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F16_mem[4] = {
+        X86_OP_ENTRY3(VMOVHPx_ld, V,dq, H,q, M,q, vex4), /* MOVHPS */
+        X86_OP_ENTRY3(VMOVHPx_ld, V,dq, H,q, M,q, vex4), /* MOVHPD */
+        X86_OP_ENTRY3(VMOVSHDUP, V,x, None,None, W,x, vex4 cpuid(SSE3)),
+        {},
+    };
+    static const X86OpEntry opcodes_0F16_reg[4] = {
+        X86_OP_ENTRY3(VMOVLHPS, V,dq, H,q, U,q, vex4),
+        X86_OP_ENTRY3(VMOVHPx, V,x, H,x, U,x, vex4), /* MOVHPD */
+        X86_OP_ENTRY3(VMOVSHDUP, V,x, None,None, U,x, vex4 cpuid(SSE3)),
+        {},
+    };
+
+    if ((get_modrm(s, env) >> 6) == 3) {
+        *entry = *decode_by_prefix(s, opcodes_0F16_reg);
+    } else {
+        *entry = *decode_by_prefix(s, opcodes_0F16_mem);
+    }
+}
+
 static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
 {
     if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) {
@@ -524,6 +635,16 @@ static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entr
 }
 
 static const X86OpEntry opcodes_0F[256] = {
+    [0x10] = X86_OP_GROUP0(0F10),
+    [0x11] = X86_OP_GROUP0(0F11),
+    [0x12] = X86_OP_GROUP0(0F12),
+    [0x13] = X86_OP_ENTRY3(VMOVLPx_st, M,q, None,None, V,q, vex4 p_00_66),
+    [0x14] = X86_OP_ENTRY3(VUNPCKLPx, V,x, H,x, W,x, vex4 p_00_66),
+    [0x15] = X86_OP_ENTRY3(VUNPCKHPx, V,x, H,x, W,x, vex4 p_00_66),
+    [0x16] = X86_OP_GROUP0(0F16),
+    /* Incorrectly listed as Mq,Vq in the manual */
+    [0x17] = X86_OP_ENTRY3(VMOVHPx_st, M,q, None,None, V,dq, vex4 p_00_66),
+
     [0x50] = X86_OP_ENTRY3(MOVMSK, G,y, None,None, U,x, vex7 p_00_66),
     [0x51] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
     [0x52] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex5 p_00_f3),
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index d1819f3581..2319368cb5 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -326,6 +326,7 @@ static inline void gen_fp_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn
         gen_illegal_opcode(s);
     }
 }
+
 #define FP_SSE(uname, lname) \
 static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
 { \
@@ -344,6 +345,20 @@ FP_SSE(VMIN, min)
 FP_SSE(VDIV, div)
 FP_SSE(VMAX, max)
 
+#define FP_UNPACK_SSE(uname, lname) \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    /* PS maps to the DQ integer instruction, PD maps to QDQ. */ \
+    gen_fp_sse(s, env, decode, \
+               gen_helper_##lname##qdq_xmm, \
+               gen_helper_##lname##dq_xmm, \
+               gen_helper_##lname##qdq_ymm, \
+               gen_helper_##lname##dq_ymm, \
+               NULL, NULL); \
+}
+FP_UNPACK_SSE(VUNPCKLPx, punpckl)
+FP_UNPACK_SSE(VUNPCKHPx, punpckh)
+
 /*
  * 00 = v*ps Vps, Wpd
  * f3 = v*ss Vss, Wps
@@ -619,6 +634,10 @@ UNARY_INT_SSE(VPMOVZXWD, pmovzxwd)
 UNARY_INT_SSE(VPMOVZXWQ, pmovzxwq)
 UNARY_INT_SSE(VPMOVZXDQ, pmovzxdq)
 
+UNARY_INT_SSE(VMOVSLDUP, pmovsldup)
+UNARY_INT_SSE(VMOVSHDUP, pmovshdup)
+UNARY_INT_SSE(VMOVDDUP, pmovdldup)
+
 #define UNARY_CMP_SSE(uname, lname) \
 static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
 { \
@@ -1981,6 +2000,110 @@ static void gen_VMASKMOVPS_st(DisasContext *s, CPUX86State *env, X86DecodedInsn
     gen_maskmov(s, env, decode, gen_helper_vpmaskmovd_st_xmm, gen_helper_vpmaskmovd_st_ymm);
 }
 
+static void gen_VMOVHPx_ld(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    if (decode->op[0].offset != decode->op[1].offset) {
+        tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+        tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+    }
+    gen_ldq_env_A0(s, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+}
+
+static void gen_VMOVHPx_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_stq_env_A0(s, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
+}
+
+static void gen_VMOVHPx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    if (decode->op[0].offset != decode->op[1].offset) {
+        tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+        tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+    }
+    if (decode->op[0].offset != decode->op[2].offset) {
+        tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
+        tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+    }
+}
+
+static void gen_VMOVHLPS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+    if (decode->op[0].offset != decode->op[1].offset) {
+        tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(1)));
+        tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+    }
+}
+
+static void gen_VMOVLHPS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(0)));
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+    if (decode->op[0].offset != decode->op[1].offset) {
+        tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+        tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+    }
+}
+
+static void gen_VMOVLPx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    int vec_len = sse_vec_len(s, decode);
+
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(0)));
+    tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+}
+
+static void gen_VMOVLPx_ld(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    int vec_len = sse_vec_len(s, decode);
+
+    tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
+    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, s->mem_index, MO_64);
+    tcg_gen_st_i64(s->tmp1_i64, s->ptr0, offsetof(ZMMReg, ZMM_Q(0)));
+}
+
+static void gen_VMOVLPx_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    tcg_gen_ld_i64(s->tmp1_i64, s->ptr2, offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, s->mem_index, MO_64);
+}
+
+static void gen_VMOVSD_ld(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    TCGv_i64 zero = tcg_const_i64(0);
+
+    tcg_gen_st_i64(zero, s->ptr0, offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, s->mem_index, MO_64);
+    tcg_gen_st_i64(s->tmp1_i64, s->ptr0, offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_temp_free_i64(zero);
+}
+
+static void gen_VMOVSS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    int vec_len = sse_vec_len(s, decode);
+
+    tcg_gen_ld_i32(s->tmp2_i32, s->ptr2, offsetof(ZMMReg, ZMM_L(0)));
+    tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
+    tcg_gen_st_i32(s->tmp2_i32, s->ptr0, offsetof(ZMMReg, ZMM_L(0)));
+}
+
+static void gen_VMOVSS_ld(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    int vec_len = sse_vec_len(s, decode);
+
+    tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0);
+    tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0, s->mem_index, MO_32);
+    tcg_gen_st_i32(s->tmp2_i32, s->ptr0, offsetof(ZMMReg, ZMM_L(0)));
+}
+
+static void gen_VMOVSS_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    tcg_gen_ld_i32(s->tmp2_i32, s->ptr2, offsetof(ZMMReg, ZMM_L(0)));
+    tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0, s->mem_index, MO_32);
+}
+
 static void gen_VPMASKMOV_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     if (s->rex_w) {
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 468867afcf..bb5f74140c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4668,6 +4668,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 #endif
     if (use_new &&

From patchwork Sun Sep 11 23:04:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973134
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 31/37] target/i386: reimplement 0x0f 0x28-0x2f, add AVX
Date: Mon, 12 Sep 2022 01:04:11 +0200
Message-Id: <20220911230418.340941-32-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

Here the code is a bit uglier due to the truncation and extension
of registers to and from 32-bit. Otherwise there is nothing special
going on.
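
To illustrate the truncation and extension mentioned above, here is a
minimal standalone sketch in plain C. It is not QEMU code: guest_reg,
cvtsi2ss_32 and cvttss2si_32 are hypothetical names used only for this
example, and it assumes a 64-bit guest register with a 32-bit operand.
A 32-bit source uses only the low half of the register, and a 32-bit
result is zero-extended back to the full register width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit guest general-purpose register. */
typedef uint64_t guest_reg;

/*
 * 32-bit integer source: only bits 31..0 of the register take part,
 * read as a signed value (the "truncation" step).
 */
static float cvtsi2ss_32(guest_reg src)
{
    int32_t in = (int32_t)(uint32_t)src;    /* drop bits 63..32 */
    return (float)in;
}

/*
 * 32-bit integer destination: the result is zero-extended into the
 * full register (the "extension" step).  Out-of-range and NaN inputs
 * yield 0x80000000, the x86 "integer indefinite" value.
 */
static guest_reg cvttss2si_32(float src)
{
    int32_t out;

    if (src >= -2147483648.0f && src < 2147483648.0f) {
        out = (int32_t)src;                 /* rounds toward zero */
    } else {
        out = INT32_MIN;
    }
    return (guest_reg)(uint32_t)out;        /* bits 63..32 become zero */
}
```

With these definitions, cvtsi2ss_32(0xFFFFFFFF00000005) ignores the upper
half of the register, and cvttss2si_32(-1.5f) leaves 0x00000000FFFFFFFF in
the register rather than a sign-extended value.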
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/decode-new.c.inc |  54 ++++++++++++++
 target/i386/tcg/emit.c.inc       | 120 +++++++++++++++++++++++++++++++
 target/i386/tcg/translate.c      |   1 +
 3 files changed, 175 insertions(+)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 268ccb886f..383a425ccd 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -620,6 +620,51 @@ static void decode_0F16(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui
     }
 }
 
+static void decode_0F2A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F2A[4] = {
+        X86_OP_ENTRY3(CVTPI2Px, V,x, None,None, Q,q, vex4),
+        X86_OP_ENTRY3(CVTPI2Px, V,x, None,None, Q,q, vex4),
+        X86_OP_ENTRY3(VCVTSI2Sx, V,x, H,x, E,y, vex3),
+        X86_OP_ENTRY3(VCVTSI2Sx, V,x, H,x, E,y, vex3),
+    };
+    *entry = *decode_by_prefix(s, opcodes_0F2A);
+}
+
+static void decode_0F2B(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F2B[4] = {
+        X86_OP_ENTRY3(MOVDQ, M,x, None,None, V,x, vex4), /* MOVNTPS */
+        X86_OP_ENTRY3(MOVDQ, M,x, None,None, V,x, vex4), /* MOVNTPD */
+        X86_OP_ENTRY3(VMOVSS_st, M,ss, None,None, V,x, vex4),
+        X86_OP_ENTRY3(VMOVLPx_st, M,sd, None,None, V,x, vex4), /* MOVSD */
+    };
+
+    *entry = *decode_by_prefix(s, opcodes_0F2B);
+}
+
+static void decode_0F2C(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F2C[4] = {
+        X86_OP_ENTRY3(CVTTPx2PI, P,q, None,None, W,x, vex4),
+        X86_OP_ENTRY3(CVTTPx2PI, P,q, None,None, W,x, vex4),
+        X86_OP_ENTRY3(VCVTTSx2SI, G,y, None,None, W,x, vex3),
+        X86_OP_ENTRY3(VCVTTSx2SI, G,y, None,None, W,x, vex3),
+    };
+    *entry = *decode_by_prefix(s, opcodes_0F2C);
+}
+
+static void decode_0F2D(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+    static const X86OpEntry opcodes_0F2D[4] = {
+        X86_OP_ENTRY3(CVTPx2PI, P,q, None,None, W,x, vex4),
+        X86_OP_ENTRY3(CVTPx2PI, P,q, None,None, W,x, vex4),
+        X86_OP_ENTRY3(VCVTSx2SI, G,y, None,None, W,x, vex3),
+        X86_OP_ENTRY3(VCVTSx2SI, G,y, None,None, W,x, vex3),
+    };
+    *entry = *decode_by_prefix(s, opcodes_0F2D);
+}
+
 static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
 {
     if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) {
@@ -672,6 +717,15 @@ static const X86OpEntry opcodes_0F[256] = {
     [0x76] = X86_OP_ENTRY3(PCMPEQD, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
     [0x77] = X86_OP_ENTRY0(EMMS_VZERO, vex8),
 
+    [0x28] = X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex1 p_00_66), /* MOVAPS */
+    [0x29] = X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex1 p_00_66), /* MOVAPS */
+    [0x2A] = X86_OP_GROUP0(0F2A),
+    [0x2B] = X86_OP_GROUP0(0F2B),
+    [0x2C] = X86_OP_GROUP0(0F2C),
+    [0x2D] = X86_OP_GROUP0(0F2D),
+    [0x2E] = X86_OP_ENTRY3(VUCOMI, None,None, V,x, W,x, vex4 p_00_66),
+    [0x2F] = X86_OP_ENTRY3(VCOMI, None,None, V,x, W,x, vex4 p_00_66),
+
     [0x38] = X86_OP_GROUP0(0F38),
     [0x3a] = X86_OP_GROUP0(0F3A),
 
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 2319368cb5..d61b43f21c 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -921,6 +921,36 @@ static void gen_CRC32(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
     gen_helper_crc32(s->T0, s->tmp2_i32, s->T1, tcg_const_i32(8 << ot));
 }
 
+static void gen_CVTPI2Px(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_helper_enter_mmx(cpu_env);
+    if (s->prefix & PREFIX_DATA) {
+        gen_helper_cvtpi2pd(cpu_env, s->ptr0, s->ptr2);
+    } else {
+        gen_helper_cvtpi2ps(cpu_env, s->ptr0, s->ptr2);
+    }
+}
+
+static void gen_CVTPx2PI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_helper_enter_mmx(cpu_env);
+    if (s->prefix & PREFIX_DATA) {
+        gen_helper_cvtpd2pi(cpu_env, s->ptr0, s->ptr2);
+    } else {
+        gen_helper_cvtps2pi(cpu_env, s->ptr0, s->ptr2);
+    }
+}
+
+static void gen_CVTTPx2PI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_helper_enter_mmx(cpu_env);
+    if (s->prefix & PREFIX_DATA) {
+        gen_helper_cvttpd2pi(cpu_env, s->ptr0, s->ptr2);
+    } else {
+        gen_helper_cvttps2pi(cpu_env, s->ptr0, s->ptr2);
+    }
+}
+
 static void gen_EMMS_VZERO(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     if (!(s->prefix & PREFIX_VEX)) {
@@ -1863,6 +1893,14 @@ static inline void gen_VCMP(DisasContext *s, CPUX86State *env, X86DecodedInsn *d
     gen_helper_cmp_funcs[index][b](cpu_env, s->ptr0, s->ptr1, s->ptr2);
 }
 
+static void gen_VCOMI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    SSEFunc_0_epp fn;
+    fn = s->prefix & PREFIX_DATA ? gen_helper_comisd : gen_helper_comiss;
+    fn(cpu_env, s->ptr1, s->ptr2);
+    set_cc_op(s, CC_OP_EFLAGS);
+}
+
 static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     gen_unary_fp_sse(s, env, decode,
@@ -1871,6 +1909,80 @@ static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec
                      gen_helper_cvtsd2ss, gen_helper_cvtss2sd);
 }
 
+static void gen_VCVTSI2Sx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    int vec_len = sse_vec_len(s, decode);
+    MemOp ot = decode->op[2].ot;
+    TCGv_i32 in;
+
+    tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
+#ifdef TARGET_X86_64
+    if (ot == MO_64) {
+        if (s->prefix & PREFIX_REPNZ) {
+            gen_helper_cvtsq2sd(cpu_env, s->ptr0, s->T1);
+        } else {
+            gen_helper_cvtsq2ss(cpu_env, s->ptr0, s->T1);
+        }
+        return;
+    }
+    in = s->tmp2_i32;
+    tcg_gen_trunc_tl_i32(in, s->T1);
+#else
+    in = s->T1;
+#endif
+
+    if (s->prefix & PREFIX_REPNZ) {
+        gen_helper_cvtsi2sd(cpu_env, s->ptr0, in);
+    } else {
+        gen_helper_cvtsi2ss(cpu_env, s->ptr0, in);
+    }
+}
+
+static inline void gen_VCVTtSx2SI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
+                                  SSEFunc_i_ep ss2si, SSEFunc_l_ep ss2sq,
+                                  SSEFunc_i_ep sd2si, SSEFunc_l_ep sd2sq)
+{
+    MemOp ot = decode->op[0].ot;
+    TCGv_i32 out;
+
+#ifdef TARGET_X86_64
+    if (ot == MO_64) {
+        if (s->prefix & PREFIX_REPNZ) {
+            sd2sq(s->T0, cpu_env, s->ptr2);
+        } else {
+            ss2sq(s->T0, cpu_env, s->ptr2);
+        }
+        return;
+    }
+
+    out = s->tmp2_i32;
+#else
+    out = s->T0;
+#endif
+    if (s->prefix & PREFIX_REPNZ) {
+        sd2si(out, cpu_env, s->ptr2);
+    } else {
+        ss2si(out, cpu_env, s->ptr2);
+    }
+#ifdef TARGET_X86_64
+    tcg_gen_extu_i32_tl(s->T0, out);
+#endif
+}
+
+static void gen_VCVTSx2SI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_VCVTtSx2SI(s, env, decode,
+                   gen_helper_cvtss2si, gen_helper_cvtss2sq,
+                   gen_helper_cvtsd2si, gen_helper_cvtsd2sq);
+}
+
+static void gen_VCVTTSx2SI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_VCVTtSx2SI(s, env, decode,
+                   gen_helper_cvttss2si, gen_helper_cvttss2sq,
+                   gen_helper_cvttsd2si, gen_helper_cvttsd2sq);
+}
+
 static void gen_VCVTpd_dq(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     SSEFunc_0_epp fn = NULL;
@@ -2162,4 +2274,12 @@ static inline void gen_VSHUF(DisasContext *s, CPUX86State *env, X86DecodedInsn *
     tcg_temp_free_i32(imm);
 }
 
+static void gen_VUCOMI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    SSEFunc_0_epp fn;
+    fn = s->prefix & PREFIX_DATA ? gen_helper_ucomisd : gen_helper_ucomiss;
+    fn(cpu_env, s->ptr1, s->ptr2);
+    set_cc_op(s, CC_OP_EFLAGS);
+}
+
 #define gen_VXOR gen_PXOR
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index bb5f74140c..f312663110 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4669,6 +4669,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     if (use_new &&
         (b == 0x138 || b == 0x13a ||
          (b >= 0x110 && b <= 0x117) ||
+         (b >= 0x128 && b <= 0x12f) ||
          (b >= 0x150 && b <= 0x17f) ||
          b == 0x1c2 || (b >= 0x1c4 && b <= 0x1c6) ||
          (b >= 0x1d0 && b <= 0x1ff))) {

From patchwork Sun Sep 11 23:04:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973148
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Subject: [PATCH 32/37] target/i386: implement XSAVE and XRSTOR of AVX registers
Date: Mon, 12 Sep 2022 01:04:12 +0200
Message-Id: <20220911230418.340941-33-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/fpu_helper.c | 84 ++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 230907bc5c..1be620257e 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -2571,6 +2571,25 @@ static void do_xsave_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra)
     }
 }
 
+static void do_xsave_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra)
+{
+    int i, nb_xmm_regs;
+    target_ulong addr;
+
+    if (env->hflags & HF_CS64_MASK) {
+        nb_xmm_regs = 16;
+    } else {
+        nb_xmm_regs = 8;
+    }
+
+    addr = ptr + XO(avx_state);
+    for (i = 0; i < nb_xmm_regs; i++) {
+        cpu_stq_data_ra(env, addr, env->xmm_regs[i].ZMM_Q(2), ra);
+        cpu_stq_data_ra(env, addr + 8, env->xmm_regs[i].ZMM_Q(3), ra);
+        addr += 16;
+    }
+}
+
 static void do_xsave_bndregs(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 {
     target_ulong addr = ptr + offsetof(XSaveBNDREG, bnd_regs);
@@ -2663,6 +2682,9 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm,
     if (opt & XSTATE_SSE_MASK) {
         do_xsave_sse(env, ptr, ra);
     }
+    if (opt & XSTATE_YMM_MASK) {
+        do_xsave_ymmh(env, ptr, ra);
+    }
     if (opt & XSTATE_BNDREGS_MASK) {
         do_xsave_bndregs(env, ptr + XO(bndreg_state), ra);
     }
@@ -2737,6 +2759,57 @@ static void do_xrstor_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra)
     }
 }
 
+static void do_clear_sse(CPUX86State *env)
+{
+    int i, nb_xmm_regs;
+
+    if (env->hflags & HF_CS64_MASK) {
+        nb_xmm_regs = 16;
+    } else {
+        nb_xmm_regs = 8;
+    }
+
+    for (i = 0; i < nb_xmm_regs; i++) {
+        env->xmm_regs[i].ZMM_Q(0) = 0;
+        env->xmm_regs[i].ZMM_Q(1) = 0;
+    }
+}
+
+static void do_xrstor_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra)
+{
+    int i, nb_xmm_regs;
+    target_ulong addr;
+
+    if (env->hflags & HF_CS64_MASK) {
+        nb_xmm_regs = 16;
+    } else {
+        nb_xmm_regs = 8;
+    }
+
+    addr = ptr + XO(avx_state);
+    for (i = 0; i < nb_xmm_regs; i++) {
+        env->xmm_regs[i].ZMM_Q(2) = cpu_ldq_data_ra(env, addr, ra);
+        env->xmm_regs[i].ZMM_Q(3) = cpu_ldq_data_ra(env, addr + 8, ra);
+        addr += 16;
+    }
+}
+
+static void do_clear_ymmh(CPUX86State *env)
+{
+    int i, nb_xmm_regs;
+
+    if (env->hflags & HF_CS64_MASK) {
+        nb_xmm_regs = 16;
+    } else {
+        nb_xmm_regs = 8;
+    }
+
+    for (i = 0; i < nb_xmm_regs; i++) {
+        env->xmm_regs[i].ZMM_Q(2) = 0;
+        env->xmm_regs[i].ZMM_Q(3) = 0;
+    }
+}
+
 static void do_xrstor_bndregs(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 {
     target_ulong addr = ptr + offsetof(XSaveBNDREG, bnd_regs);
@@ -2856,9 +2929,14 @@ void helper_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
         if (xstate_bv & XSTATE_SSE_MASK) {
             do_xrstor_sse(env, ptr, ra);
         } else {
-            /* ??? When AVX is implemented, we may have to be more
-               selective in the clearing. */
-            memset(env->xmm_regs, 0, sizeof(env->xmm_regs));
+            do_clear_sse(env);
+        }
+    }
+    if (rfbm & XSTATE_YMM_MASK) {
+        if (xstate_bv & XSTATE_YMM_MASK) {
+            do_xrstor_ymmh(env, ptr, ra);
+        } else {
+            do_clear_ymmh(env);
         }
     }
     if (rfbm & XSTATE_BNDREGS_MASK) {

From patchwork Sun Sep 11 23:04:13 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12973151
From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: Paul Brook
Subject: [PATCH 33/37] target/i386: Enable AVX cpuid bits when using TCG
Date: Mon, 12 Sep 2022 01:04:13 +0200
Message-Id: <20220911230418.340941-34-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com>
References: <20220911230418.340941-1-pbonzini@redhat.com>

From: Paul Brook

Include AVX, AVX2 and VAES in the guest cpuid features supported by TCG.
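
For reference, a guest would typically consume the newly advertised bits
roughly as sketched below. This is a standalone illustration, not QEMU
code: the sketch_* names and the struct are hypothetical, and the bit
positions are the architectural ones (CPUID.01H:ECX[27] OSXSAVE,
CPUID.01H:ECX[28] AVX, CPUID.(EAX=07H,ECX=0):EBX[5] AVX2,
CPUID.(EAX=07H,ECX=0):ECX[9] VAES, XCR0[2:1] for SSE and AVX state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; the SKETCH_ names are local to this example. */
#define SKETCH_CPUID_1_ECX_OSXSAVE  (1u << 27)
#define SKETCH_CPUID_1_ECX_AVX      (1u << 28)
#define SKETCH_CPUID_7_0_EBX_AVX2   (1u << 5)
#define SKETCH_CPUID_7_0_ECX_VAES   (1u << 9)
#define SKETCH_XCR0_SSE_AVX         0x6u    /* XCR0 bits 1 (SSE) and 2 (AVX) */

/* Hypothetical container for the CPUID outputs a guest would read. */
struct sketch_cpuid_regs {
    uint32_t leaf1_ecx;     /* CPUID.01H:ECX */
    uint32_t leaf7_ebx;     /* CPUID.(EAX=07H,ECX=0):EBX */
    uint32_t leaf7_ecx;     /* CPUID.(EAX=07H,ECX=0):ECX */
};

/* AVX is usable only if the OS enabled XSAVE and the YMM state in XCR0. */
static bool sketch_avx_usable(const struct sketch_cpuid_regs *r, uint64_t xcr0)
{
    bool osxsave = (r->leaf1_ecx & SKETCH_CPUID_1_ECX_OSXSAVE) != 0;
    bool avx = (r->leaf1_ecx & SKETCH_CPUID_1_ECX_AVX) != 0;
    bool ymm_enabled = (xcr0 & SKETCH_XCR0_SSE_AVX) == SKETCH_XCR0_SSE_AVX;

    return osxsave && avx && ymm_enabled;
}

static bool sketch_avx2_usable(const struct sketch_cpuid_regs *r, uint64_t xcr0)
{
    return sketch_avx_usable(r, xcr0) &&
           (r->leaf7_ebx & SKETCH_CPUID_7_0_EBX_AVX2) != 0;
}

/* VEX-encoded VAES needs both the VAES bit and usable AVX state. */
static bool sketch_vaes_usable(const struct sketch_cpuid_regs *r, uint64_t xcr0)
{
    return sketch_avx_usable(r, xcr0) &&
           (r->leaf7_ecx & SKETCH_CPUID_7_0_ECX_VAES) != 0;
}
```

The XCR0 check is the reason the XSAVE/XRSTOR support for the YMM state
has to be in place before these bits are exposed to the guest.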
Signed-off-by: Paul Brook Message-Id: <20220424220204.2493824-40-paul@nowt.org> Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/cpu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 1db1278a59..ec0817a61d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -625,12 +625,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | \ CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */ \ CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \ - CPUID_EXT_RDRAND) + CPUID_EXT_RDRAND | CPUID_EXT_AVX) /* missing: CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX, CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_FMA, CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA, - CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_AVX, + CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_F16C */ #ifdef TARGET_X86_64 @@ -653,14 +653,14 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX | \ CPUID_7_0_EBX_PCOMMIT | CPUID_7_0_EBX_CLFLUSHOPT | \ CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_MPX | CPUID_7_0_EBX_FSGSBASE | \ - CPUID_7_0_EBX_ERMS) + CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_AVX2) /* missing: - CPUID_7_0_EBX_HLE, CPUID_7_0_EBX_AVX2, + CPUID_7_0_EBX_HLE CPUID_7_0_EBX_INVPCID, CPUID_7_0_EBX_RTM, CPUID_7_0_EBX_RDSEED */ #define TCG_7_0_ECX_FEATURES (CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU | \ /* CPUID_7_0_ECX_OSPKE is dynamic */ \ - CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS) + CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS | CPUID_7_0_ECX_VAES) #define TCG_7_0_EDX_FEATURES 0 #define TCG_7_1_EAX_FEATURES 0 #define TCG_APM_FEATURES 0 From patchwork Sun Sep 11 23:04:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973153 
Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 87C21C6FA83 for ; Sun, 11 Sep 2022 23:42:47 +0000 (UTC) Received: from localhost ([::1]:43064 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWb8-0004iu-It for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:42:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59034) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1n-000299-Sm for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:06:15 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:26732) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1k-0007Tl-JT for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:06:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937572; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rMMsdqPj65b1k2EoYmQmlkH+oVmOYz12tl46MzuK+wQ=; b=bnSr7jsOqUx3pVYjSZCABz9ox113dcBDypze5PH4hHYjTuyTFJWHDj7rRY2AmOGm62zXFl EaL7dfUW51/PjRhnW7wePL7BEQ3An3xGLZlfiAy98nfNDx3QNJHIBeko2VBVGd9HWt5UAo 0PAe/qZHuxmDYSnwOxC3xs3IMpz0H7k= Received: from mail-ed1-f71.google.com (mail-ed1-f71.google.com [209.85.208.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-576-yPQWi3j-MhKeR1kf22xUvg-1; Sun, 11 Sep 2022 19:06:04 -0400 X-MC-Unique: yPQWi3j-MhKeR1kf22xUvg-1 Received: by mail-ed1-f71.google.com with SMTP id 
w20-20020a05640234d400b00450f24c8ca6so4946627edc.13 for ; Sun, 11 Sep 2022 16:06:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=rMMsdqPj65b1k2EoYmQmlkH+oVmOYz12tl46MzuK+wQ=; b=rGwVUjqt+0F5gxms5d7NyDXslvl6oV8IeAZRU93F1lJI/mk6DLCRkaJZe+1dJt47Ag j8R4ccjyU4PvaBFD6rTceCLffKJ90ul6xtERe2Yg07j6HDiYWiXDPRKx3Tn4csCPHUDO OKX7vBVwpPN11DF6L2M1OEnT+pwrBJKwGJdads7Zas06377uKnk4atKzglJncX5Q4cvP +ckt+72I5QIfKS/fQ+3lAD13US7Yb2aZGfMTFfKgcW5bi6dvDpEk4hme1L4WQSEgkjWP vc9VZ2DssHf3in0Q+uWq3EE0L8vhmQ1n+zS62Iwo4OV854ofWqUtCp0o20xdWRKmGmZi BRrg== X-Gm-Message-State: ACgBeo2fJXGUbGs3w6blF9cSs4YinAt6pB9Tr0fm0sWFHMjpV0hjHO1a AnQpDhWN4TastpSsiKzj3UtcWBt3qG3ahfyGih34Qq+Ck9S+OPtgU16R3nzbV4ddb3NVMSyFjJu E3vixyp+s0wSNFalkhAOEfnq3yDrHwIEfsMl4i/PQ207LaZOHvXzoexyUxhn+kLsbmVM= X-Received: by 2002:a17:907:2be1:b0:770:8268:ec95 with SMTP id gv33-20020a1709072be100b007708268ec95mr16558922ejc.105.1662937563049; Sun, 11 Sep 2022 16:06:03 -0700 (PDT) X-Google-Smtp-Source: AA6agR51GzMSZz0w9kdB5Xc4FQiGgY9xFM7LInw/9u3dgdBTNmxGmXt8f8zZkEk1d+rdOYjWhNOBvA== X-Received: by 2002:a17:907:2be1:b0:770:8268:ec95 with SMTP id gv33-20020a1709072be100b007708268ec95mr16558914ejc.105.1662937562710; Sun, 11 Sep 2022 16:06:02 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. 
[93.44.39.154]) by smtp.gmail.com with ESMTPSA id v17-20020a170906293100b0077e6be40e4asm237054ejd.175.2022.09.11.16.06.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:06:02 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 34/37] target/i386: implement VLDMXCSR/VSTMXCSR Date: Mon, 12 Sep 2022 01:04:14 +0200 Message-Id: <20220911230418.340941-35-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" These are exactly the same as the non-VEX version, but one has to be careful that only VEX.L=0 is allowed. Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- target/i386/tcg/decode-new.c.inc | 25 +++++++++++++++++++++++++ target/i386/tcg/emit.c.inc | 20 ++++++++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 383a425ccd..e468a32787 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -80,6 +80,10 @@ #define X86_OP_ENTRY2(op, op0, s0, op1, s1, ...) 
\ X86_OP_ENTRY3(op, op0, s0, 2op, s0, op1, s1, ## __VA_ARGS__) +#define X86_OP_ENTRYw(op, op0, s0, ...) \ + X86_OP_ENTRY3(op, op0, s0, None, None, None, None, ## __VA_ARGS__) +#define X86_OP_ENTRYr(op, op0, s0, ...) \ + X86_OP_ENTRY3(op, None, None, None, None, op0, s0, ## __VA_ARGS__) #define X86_OP_ENTRY0(op, ...) \ X86_OP_ENTRY3(op, None, None, None, None, None, None, ## __VA_ARGS__) @@ -147,6 +151,25 @@ static inline const X86OpEntry *decode_by_prefix(DisasContext *s, const X86OpEnt } } +static void decode_group15(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) +{ + /* only includes ldmxcsr and stmxcsr, because they have AVX variants. */ + static const X86OpEntry group15_reg[8] = { + }; + + static const X86OpEntry group15_mem[8] = { + [2] = X86_OP_ENTRYr(LDMXCSR, E,d, vex5), + [3] = X86_OP_ENTRYw(STMXCSR, E,d, vex5), + }; + + uint8_t modrm = get_modrm(s, env); + if ((modrm >> 6) == 3) { + *entry = group15_reg[(modrm >> 3) & 7]; + } else { + *entry = group15_mem[(modrm >> 3) & 7]; + } +} + static void decode_group17(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b) { static const X86GenFunc group17_gen[8] = { @@ -754,6 +777,8 @@ static const X86OpEntry opcodes_0F[256] = { [0x7e] = X86_OP_GROUP0(0F7E), [0x7f] = X86_OP_GROUP3(0F6F, W,x, None,None, V,x, vex5 mmx p_00_66_f3), + [0xae] = X86_OP_GROUP0(group15), + [0xc2] = X86_OP_ENTRY4(VCMP, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), [0xc4] = X86_OP_ENTRY4(PINSRW, V,dq,H,dq,E,w, vex5 mmx p_00_66), [0xc5] = X86_OP_ENTRY3(PEXTRW, G,d, U,dq,I,b, vex5 mmx p_00_66), diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index d61b43f21c..942766de0f 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -979,6 +979,16 @@ static void gen_LDDQU(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) gen_load_sse(s, s->T0, decode->op[0].ot, decode->op[0].offset); } +static void gen_LDMXCSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if 
(s->vex_l) { + gen_illegal_opcode(s); + return; + } + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1); + gen_helper_ldmxcsr(cpu_env, s->tmp2_i32); +} + static void gen_MASKMOV(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]); @@ -1808,6 +1818,16 @@ static void gen_SSE4a_R(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod } } +static void gen_STMXCSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + if (s->vex_l) { + gen_illegal_opcode(s); + return; + } + gen_helper_update_mxcsr(cpu_env); + tcg_gen_ld32u_tl(s->T0, cpu_env, offsetof(CPUX86State, mxcsr)); +} + static inline void gen_VAESIMC(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { assert(!s->vex_l); From patchwork Sun Sep 11 23:04:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973155 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 633FBC6FA83 for ; Sun, 11 Sep 2022 23:45:36 +0000 (UTC) Received: from localhost ([::1]:46740 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWdr-0007zK-Gs for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:45:35 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59036) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1w-0002Ob-MC for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:06:24 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:20906) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1h-0007RU-50 for qemu-devel@nongnu.org; Sun, 11 
Sep 2022 19:06:24 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937568; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FsiERs7teh/UejsYFZ1UuW65KPTiCINIp1xgGzHegLI=; b=PbB3RW+21hAZyU6sw9DKsq3cxj41VyDv4Qyo9QdW9Ie03ek+JoMvkY4HvYrFL+5ko6iQWz QqEurJNy/sGF8hIruXKEgo6cDBCWSJDliXsrdlJyD7Dg3NmS/bB3q9NOYAXZ3Ag8GRXNNv xrJGz+11Q/nQiUe3edh5Q4/IMX6H7gY= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-475-36X4xiakOmyF2SR1JpArHA-1; Sun, 11 Sep 2022 19:06:07 -0400 X-MC-Unique: 36X4xiakOmyF2SR1JpArHA-1 Received: by mail-ed1-f69.google.com with SMTP id b13-20020a056402350d00b0043dfc84c533so4915806edd.5 for ; Sun, 11 Sep 2022 16:06:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=FsiERs7teh/UejsYFZ1UuW65KPTiCINIp1xgGzHegLI=; b=A3NgD5ljINdGXL7qDh1lY0NNuEwzRRGLR2dr5ETKqlnvH7rfayFOXFo8NylTqP8uGY A3+n9T9uhnf+HGyi66NJxbjLb01R9BhsQgVfznuYSjrc2GbFXJiBJFsqBfEtxWzQ+pke OkAYd+2NAL2Z2H880RezfFClk9B6wzHFxrxyfsTiDIgqFyVUtjqE4FRSTOXUtmUNTtZ8 jZLstYOOXlvJdOKdM612mO4Cjz92myNsF7v0JhTQ4TYrtkOiQqeQiBi4cx6jf9mNP5re JFKT8koXOqZm24kYsnNk0/kkxEfl8CRbb6eEKspc/cVp/uNalMh+CCdWYSS4Fb7yjebp +3dA== X-Gm-Message-State: ACgBeo35qCTtKB/LZBBPK/rc14oAdp6WkH6/j713MMdSC7uAf1HLSjou XRHJPv6tzG3M/goz2qBFAwmU2QpLXzxQZpSBoILlpFiG0BZ8BNz+8REf23wFOBbeHhqab03Qtrg 0FUpyDS0zXdhqmgJO+Me9g+EeHMLFfURuI6aTcPYQS6zTBRcTTKn2EAZtzBojSME92HM= X-Received: by 2002:a17:907:320c:b0:77b:6f08:9870 with SMTP id 
xg12-20020a170907320c00b0077b6f089870mr5077956ejb.249.1662937565726; Sun, 11 Sep 2022 16:06:05 -0700 (PDT) X-Google-Smtp-Source: AA6agR7qvlZu+IMMQGk3pTm+rUqz9n5A9d/ujtNI9WZMz82kG3l3QpyXJPqHLKJiT4s3Ls+kmZaWHw== X-Received: by 2002:a17:907:320c:b0:77b:6f08:9870 with SMTP id xg12-20020a170907320c00b0077b6f089870mr5077944ejb.249.1662937565278; Sun, 11 Sep 2022 16:06:05 -0700 (PDT) Received: from goa-sendmail (93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id a18-20020a17090640d200b0073cf8e0355fsm3443128ejk.208.2022.09.11.16.06.04 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:06:04 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 35/37] tests/tcg: extend SSE tests to AVX Date: Mon, 12 Sep 2022 01:04:15 +0200 Message-Id: <20220911230418.340941-36-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Extracted from a patch by Paul Brook . 
Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson --- tests/tcg/i386/Makefile.target | 2 +- tests/tcg/i386/test-avx.c | 201 ++++++++++++++++++--------------- tests/tcg/i386/test-avx.py | 3 +- 3 files changed, 112 insertions(+), 94 deletions(-) diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target index ae71e7f748..4139973255 100644 --- a/tests/tcg/i386/Makefile.target +++ b/tests/tcg/i386/Makefile.target @@ -98,5 +98,5 @@ test-3dnow: test-3dnow.h test-mmx: CFLAGS += -masm=intel -O -I. test-mmx: test-mmx.h -test-avx: CFLAGS += -masm=intel -O -I. +test-avx: CFLAGS += -mavx -masm=intel -O -I. test-avx: test-avx.h diff --git a/tests/tcg/i386/test-avx.c b/tests/tcg/i386/test-avx.c index 23c170dd79..953e2906fe 100644 --- a/tests/tcg/i386/test-avx.c +++ b/tests/tcg/i386/test-avx.c @@ -6,18 +6,18 @@ typedef void (*testfn)(void); typedef struct { - uint64_t q0, q1; -} __attribute__((aligned(16))) v2di; + uint64_t q0, q1, q2, q3; +} __attribute__((aligned(32))) v4di; typedef struct { uint64_t mm[8]; - v2di xmm[16]; + v4di ymm[16]; uint64_t r[16]; uint64_t flags; uint32_t ff; uint64_t pad; - v2di mem[4]; - v2di mem0[4]; + v4di mem[4]; + v4di mem0[4]; } reg_state; typedef struct { @@ -31,20 +31,20 @@ reg_state initI; reg_state initF32; reg_state initF64; -static void dump_xmm(const char *name, int n, const v2di *r, int ff) +static void dump_ymm(const char *name, int n, const v4di *r, int ff) { - printf("%s%d = %016lx %016lx\n", - name, n, r->q1, r->q0); + printf("%s%d = %016lx %016lx %016lx %016lx\n", + name, n, r->q3, r->q2, r->q1, r->q0); if (ff == 64) { - double v[2]; + double v[4]; memcpy(v, r, sizeof(v)); - printf(" %16g %16g\n", - v[1], v[0]); - } else if (ff == 32) { - float v[4]; - memcpy(v, r, sizeof(v)); - printf(" %8g %8g %8g %8g\n", + printf(" %16g %16g %16g %16g\n", v[3], v[2], v[1], v[0]); + } else if (ff == 32) { + float v[8]; + memcpy(v, r, sizeof(v)); + printf(" %8g %8g %8g %8g %8g %8g %8g %8g\n", + v[7], v[6], v[5], v[4], 
v[3], v[2], v[1], v[0]); } } @@ -53,10 +53,10 @@ static void dump_regs(reg_state *s) int i; for (i = 0; i < 16; i++) { - dump_xmm("xmm", i, &s->xmm[i], 0); + dump_ymm("ymm", i, &s->ymm[i], 0); } for (i = 0; i < 4; i++) { - dump_xmm("mem", i, &s->mem0[i], 0); + dump_ymm("mem", i, &s->mem0[i], 0); } } @@ -74,13 +74,13 @@ static void compare_state(const reg_state *a, const reg_state *b) } } for (i = 0; i < 16; i++) { - if (memcmp(&a->xmm[i], &b->xmm[i], 16)) { - dump_xmm("xmm", i, &b->xmm[i], a->ff); + if (memcmp(&a->ymm[i], &b->ymm[i], 32)) { + dump_ymm("ymm", i, &b->ymm[i], a->ff); } } for (i = 0; i < 4; i++) { - if (memcmp(&a->mem0[i], &a->mem[i], 16)) { - dump_xmm("mem", i, &a->mem[i], a->ff); + if (memcmp(&a->mem0[i], &a->mem[i], 32)) { + dump_ymm("mem", i, &a->mem[i], a->ff); } } if (a->flags != b->flags) { @@ -89,9 +89,9 @@ static void compare_state(const reg_state *a, const reg_state *b) } #define LOADMM(r, o) "movq " #r ", " #o "[%0]\n\t" -#define LOADXMM(r, o) "movdqa " #r ", " #o "[%0]\n\t" +#define LOADYMM(r, o) "vmovdqa " #r ", " #o "[%0]\n\t" #define STOREMM(r, o) "movq " #o "[%1], " #r "\n\t" -#define STOREXMM(r, o) "movdqa " #o "[%1], " #r "\n\t" +#define STOREYMM(r, o) "vmovdqa " #o "[%1], " #r "\n\t" #define MMREG(F) \ F(mm0, 0x00) \ F(mm1, 0x08) \ @@ -101,39 +101,39 @@ static void compare_state(const reg_state *a, const reg_state *b) F(mm5, 0x28) \ F(mm6, 0x30) \ F(mm7, 0x38) -#define XMMREG(F) \ - F(xmm0, 0x040) \ - F(xmm1, 0x050) \ - F(xmm2, 0x060) \ - F(xmm3, 0x070) \ - F(xmm4, 0x080) \ - F(xmm5, 0x090) \ - F(xmm6, 0x0a0) \ - F(xmm7, 0x0b0) \ - F(xmm8, 0x0c0) \ - F(xmm9, 0x0d0) \ - F(xmm10, 0x0e0) \ - F(xmm11, 0x0f0) \ - F(xmm12, 0x100) \ - F(xmm13, 0x110) \ - F(xmm14, 0x120) \ - F(xmm15, 0x130) +#define YMMREG(F) \ + F(ymm0, 0x040) \ + F(ymm1, 0x060) \ + F(ymm2, 0x080) \ + F(ymm3, 0x0a0) \ + F(ymm4, 0x0c0) \ + F(ymm5, 0x0e0) \ + F(ymm6, 0x100) \ + F(ymm7, 0x120) \ + F(ymm8, 0x140) \ + F(ymm9, 0x160) \ + F(ymm10, 0x180) \ + F(ymm11, 0x1a0) \ + 
F(ymm12, 0x1c0) \ + F(ymm13, 0x1e0) \ + F(ymm14, 0x200) \ + F(ymm15, 0x220) #define LOADREG(r, o) "mov " #r ", " #o "[rax]\n\t" #define STOREREG(r, o) "mov " #o "[rax], " #r "\n\t" #define REG(F) \ - F(rbx, 0x148) \ - F(rcx, 0x150) \ - F(rdx, 0x158) \ - F(rsi, 0x160) \ - F(rdi, 0x168) \ - F(r8, 0x180) \ - F(r9, 0x188) \ - F(r10, 0x190) \ - F(r11, 0x198) \ - F(r12, 0x1a0) \ - F(r13, 0x1a8) \ - F(r14, 0x1b0) \ - F(r15, 0x1b8) \ + F(rbx, 0x248) \ + F(rcx, 0x250) \ + F(rdx, 0x258) \ + F(rsi, 0x260) \ + F(rdi, 0x268) \ + F(r8, 0x280) \ + F(r9, 0x288) \ + F(r10, 0x290) \ + F(r11, 0x298) \ + F(r12, 0x2a0) \ + F(r13, 0x2a8) \ + F(r14, 0x2b0) \ + F(r15, 0x2b8) \ static void run_test(const TestDef *t) { @@ -143,7 +143,7 @@ static void run_test(const TestDef *t) printf("%5d %s\n", t->n, t->s); asm volatile( MMREG(LOADMM) - XMMREG(LOADXMM) + YMMREG(LOADYMM) "sub rsp, 128\n\t" "push rax\n\t" "push rbx\n\t" @@ -156,26 +156,26 @@ static void run_test(const TestDef *t) "pop rbx\n\t" "shr rbx, 8\n\t" "shl rbx, 8\n\t" - "mov rcx, 0x1c0[rax]\n\t" + "mov rcx, 0x2c0[rax]\n\t" "and rcx, 0xff\n\t" "or rbx, rcx\n\t" "push rbx\n\t" "popf\n\t" REG(LOADREG) - "mov rax, 0x140[rax]\n\t" + "mov rax, 0x240[rax]\n\t" "call [rsp]\n\t" "mov [rsp], rax\n\t" "mov rax, 8[rsp]\n\t" REG(STOREREG) "mov rbx, [rsp]\n\t" - "mov 0x140[rax], rbx\n\t" + "mov 0x240[rax], rbx\n\t" "mov rbx, 0\n\t" - "mov 0x170[rax], rbx\n\t" - "mov 0x178[rax], rbx\n\t" + "mov 0x270[rax], rbx\n\t" + "mov 0x278[rax], rbx\n\t" "pushf\n\t" "pop rbx\n\t" "and rbx, 0xff\n\t" - "mov 0x1c0[rax], rbx\n\t" + "mov 0x2c0[rax], rbx\n\t" "add rsp, 16\n\t" "pop rdx\n\t" "pop rcx\n\t" @@ -183,15 +183,15 @@ static void run_test(const TestDef *t) "pop rax\n\t" "add rsp, 128\n\t" MMREG(STOREMM) - XMMREG(STOREXMM) + YMMREG(STOREYMM) : : "r"(init), "r"(&result), "r"(t->fn) : "memory", "cc", "rsi", "rdi", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7", - "xmm0", "xmm1", "xmm2", "xmm3", 
"xmm4", "xmm5", - "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11", - "xmm12", "xmm13", "xmm14", "xmm15" + "ymm0", "ymm1", "ymm2", "ymm3", "ymm4", "ymm5", + "ymm6", "ymm7", "ymm8", "ymm9", "ymm10", "ymm11", + "ymm12", "ymm13", "ymm14", "ymm15" ); compare_state(init, &result); } @@ -223,22 +223,30 @@ static void run_all(void) float val_f32[] = {2.0, -1.0, 4.8, 0.8, 3, -42.0, 5e6, 7.5, 8.3}; double val_f64[] = {2.0, -1.0, 4.8, 0.8, 3, -42.0, 5e6, 7.5}; -v2di val_i64[] = { - {0x3d6b3b6a9e4118f2lu, 0x355ae76d2774d78clu}, - {0xd851c54a56bf1f29lu, 0x4a84d1d50bf4c4fflu}, - {0x5826475e2c5fd799lu, 0xfd32edc01243f5e9lu}, +v4di val_i64[] = { + {0x3d6b3b6a9e4118f2lu, 0x355ae76d2774d78clu, + 0xac3ff76c4daa4b28lu, 0xe7fabd204cb54083lu}, + {0xd851c54a56bf1f29lu, 0x4a84d1d50bf4c4fflu, + 0x56621e553d52b56clu, 0xd0069553da8f584alu}, + {0x5826475e2c5fd799lu, 0xfd32edc01243f5e9lu, + 0x738ba2c66d3fe126lu, 0x5707219c6e6c26b4lu}, }; -v2di deadbeef = {0xa5a5a5a5deadbeefull, 0xa5a5a5a5deadbeefull}; -v2di indexq = {0x000000000000001full, 0x000000000000008full}; -v2di indexd = {0x00000002000000efull, 0xfffffff500000010ull}; +v4di deadbeef = {0xa5a5a5a5deadbeefull, 0xa5a5a5a5deadbeefull, + 0xa5a5a5a5deadbeefull, 0xa5a5a5a5deadbeefull}; +v4di indexq = {0x000000000000001full, 0x000000000000008full, + 0xffffffffffffffffull, 0xffffffffffffff5full}; +v4di indexd = {0x00000002000000efull, 0xfffffff500000010ull, + 0x0000000afffffff0ull, 0x000000000000000eull}; -void init_f32reg(v2di *r) +v4di gather_mem[0x20]; + +void init_f32reg(v4di *r) { static int n; - float v[4]; + float v[8]; int i; - for (i = 0; i < 4; i++) { + for (i = 0; i < 8; i++) { v[i] = val_f32[n++]; if (n == ARRAY_LEN(val_f32)) { n = 0; @@ -247,12 +255,12 @@ void init_f32reg(v2di *r) memcpy(r, v, sizeof(*r)); } -void init_f64reg(v2di *r) +void init_f64reg(v4di *r) { static int n; - double v[2]; + double v[4]; int i; - for (i = 0; i < 2; i++) { + for (i = 0; i < 4; i++) { v[i] = val_f64[n++]; if (n == ARRAY_LEN(val_f64)) { n = 0; @@ 
-261,13 +269,15 @@ void init_f64reg(v2di *r) memcpy(r, v, sizeof(*r)); } -void init_intreg(v2di *r) +void init_intreg(v4di *r) { static uint64_t mask; static int n; r->q0 = val_i64[n].q0 ^ mask; r->q1 = val_i64[n].q1 ^ mask; + r->q2 = val_i64[n].q2 ^ mask; + r->q3 = val_i64[n].q3 ^ mask; n++; if (n == ARRAY_LEN(val_i64)) { n = 0; @@ -280,46 +290,53 @@ static void init_all(reg_state *s) int i; s->r[3] = (uint64_t)&s->mem[0]; /* rdx */ + s->r[4] = (uint64_t)&gather_mem[ARRAY_LEN(gather_mem) / 2]; /* rsi */ s->r[5] = (uint64_t)&s->mem[2]; /* rdi */ s->flags = 2; - for (i = 0; i < 8; i++) { - s->xmm[i] = deadbeef; + for (i = 0; i < 16; i++) { + s->ymm[i] = deadbeef; } - s->xmm[13] = indexd; - s->xmm[14] = indexq; - for (i = 0; i < 2; i++) { + s->ymm[13] = indexd; + s->ymm[14] = indexq; + for (i = 0; i < 4; i++) { s->mem0[i] = deadbeef; } } int main(int argc, char *argv[]) { + int i; + init_all(&initI); - init_intreg(&initI.xmm[10]); - init_intreg(&initI.xmm[11]); - init_intreg(&initI.xmm[12]); + init_intreg(&initI.ymm[10]); + init_intreg(&initI.ymm[11]); + init_intreg(&initI.ymm[12]); init_intreg(&initI.mem0[1]); printf("Int:\n"); dump_regs(&initI); init_all(&initF32); - init_f32reg(&initF32.xmm[10]); - init_f32reg(&initF32.xmm[11]); - init_f32reg(&initF32.xmm[12]); + init_f32reg(&initF32.ymm[10]); + init_f32reg(&initF32.ymm[11]); + init_f32reg(&initF32.ymm[12]); init_f32reg(&initF32.mem0[1]); initF32.ff = 32; printf("F32:\n"); dump_regs(&initF32); init_all(&initF64); - init_f64reg(&initF64.xmm[10]); - init_f64reg(&initF64.xmm[11]); - init_f64reg(&initF64.xmm[12]); + init_f64reg(&initF64.ymm[10]); + init_f64reg(&initF64.ymm[11]); + init_f64reg(&initF64.ymm[12]); init_f64reg(&initF64.mem0[1]); initF64.ff = 64; printf("F64:\n"); dump_regs(&initF64); + for (i = 0; i < ARRAY_LEN(gather_mem); i++) { + init_intreg(&gather_mem[i]); + } + if (argc > 1) { int n = atoi(argv[1]); run_test(&test_table[n]); diff --git a/tests/tcg/i386/test-avx.py b/tests/tcg/i386/test-avx.py index 
e16a3d8bee..cff3aed138 100755 --- a/tests/tcg/i386/test-avx.py +++ b/tests/tcg/i386/test-avx.py @@ -8,6 +8,7 @@ archs = [ "SSE", "SSE2", "SSE3", "SSSE3", "SSE4_1", "SSE4_2", + "AVX", "AVX2", "AES+AVX", # "VAES+AVX", ] ignore = set(["FISTTP", @@ -85,7 +86,7 @@ def mem_w(w): else: raise Exception() - return t + " PTR 16[rdx]" + return t + " PTR 32[rdx]" class XMMArg(): isxmm = True From patchwork Sun Sep 11 23:04:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 12973150 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7D8AEC6FA83 for ; Sun, 11 Sep 2022 23:38:13 +0000 (UTC) Received: from localhost ([::1]:48176 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oXWWi-0000PS-Lh for qemu-devel@archiver.kernel.org; Sun, 11 Sep 2022 19:38:12 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:35676) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1m-00024O-5k for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:06:14 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:34619) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oXW1k-0007TW-9x for qemu-devel@nongnu.org; Sun, 11 Sep 2022 19:06:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662937571; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=B6bRjieiGzdBvtZIC5lJqenm0fl7JlMwt7v12NnjAxw=; b=dZpHgd+EIf8EB1aADc4isxzCsC0vpv1DSCgzePUa6VOysKrEGFtK3wQFQlf+v2AJkM2uVv tVTjreRvGAjspe9IlrkSsgY6W1/PTU4HAfpWJr3S5h+rHE8aYn1WzN199oLA6R4b5ZYzAz wSgxEotHupO0H/zXmsxS4F/HFzKZD2I= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-120-i3sEuFrLNICbd7T_iA8ukw-1; Sun, 11 Sep 2022 19:06:09 -0400 X-MC-Unique: i3sEuFrLNICbd7T_iA8ukw-1 Received: by mail-ed1-f69.google.com with SMTP id x5-20020a05640226c500b00451ec193793so226746edd.16 for ; Sun, 11 Sep 2022 16:06:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date; bh=B6bRjieiGzdBvtZIC5lJqenm0fl7JlMwt7v12NnjAxw=; b=ZbUJy0xU05Hln0PUcmUg7CDPBejpmAD6sNSZ7cjxdzfkiJegVycTkMfkBhy8BMqdpr Fabb/S9Xq2P8HAlAg2sfNXJWoGZ06HMs7QbP6wOMdXL1kiVRb5WolkOv0ngx6Em2BU+S TW5shCcgj0pyuAFOPCO0MguI+Xi4ndezN3eyelvIDpouPP9Tb60s+LgMPCFtJA7oFxgG 6HllljNtkzM0WbpFdvy92ohVGurGNdY3SP/Ca76MbKOA+4APeQbUOm1p6aAdXJ+yTeGl ApDJ46TvllRtebSHLV3hGfQcRvGtQJIGqBe3RYoXxJSGhHh+hbMt5tcenXxdr3McvKy3 K9hA== X-Gm-Message-State: ACgBeo25u/A/DMT0rhkMB0Baq7f5rSAh9Xn1Z6O+iZN0zAyHF8zLVrW/ vRn8h0anqvfvgMUBoBZGiRi3jxoo1PG2uctXGFosUsN+qE25kEnAZWkRek2S7mwKWf3RJM+h+QZ s71aWFcR/+eNsD2iqn8cOFJankQjClKCdcaMxqJeXN0BtwiYVoBn6rH+P8I/CcMYCplw= X-Received: by 2002:a17:907:d8e:b0:77d:2649:1a78 with SMTP id go14-20020a1709070d8e00b0077d26491a78mr2217327ejc.521.1662937568469; Sun, 11 Sep 2022 16:06:08 -0700 (PDT) X-Google-Smtp-Source: AA6agR6roiL7zEdk1D4MuVptP2cBLyrUajhLkiOpoek689icEn0tQ2nKuHPKb4SWXAvCpyOsJKhWMw== X-Received: by 2002:a17:907:d8e:b0:77d:2649:1a78 with SMTP id go14-20020a1709070d8e00b0077d26491a78mr2217315ejc.521.1662937568173; Sun, 11 Sep 2022 16:06:08 -0700 (PDT) Received: from goa-sendmail 
(93-44-39-154.ip95.fastwebnet.it. [93.44.39.154]) by smtp.gmail.com with ESMTPSA id u3-20020a05640207c300b00451e3160451sm518351edy.89.2022.09.11.16.06.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Sep 2022 16:06:07 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 36/37] target/i386: move 3DNow completely out of gen_sse Date: Mon, 12 Sep 2022 01:04:16 +0200 Message-Id: <20220911230418.340941-37-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220911230418.340941-1-pbonzini@redhat.com> References: <20220911230418.340941-1-pbonzini@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Everything else has been converted to the new decoder, so separate the part that survives. 
Signed-off-by: Paolo Bonzini
Reviewed-by: Richard Henderson
---
 target/i386/tcg/translate.c | 104 +++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 36 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index f312663110..0783b1e7ee 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2918,7 +2918,6 @@ static bool first = true; static unsigned long limit;
 #define SSE_OPF_CMP (1 << 1) /* does not write for first operand */
 #define SSE_OPF_BLENDV (1 << 2) /* blendv* instruction */
 #define SSE_OPF_SPECIAL (1 << 3) /* magic */
-#define SSE_OPF_3DNOW (1 << 4) /* 3DNow! instruction */
 #define SSE_OPF_MMX (1 << 5) /* MMX/integer/AVX2 instruction */
 #define SSE_OPF_SCALAR (1 << 6) /* Has SSE scalar variants */
 #define SSE_OPF_SHUF (1 << 9) /* pshufx/shufpx */
@@ -2952,13 +2951,9 @@ struct SSEOpHelper_table1 {
     SSEFuncs fn[4];
 };

-#define SSE_3DNOW { SSE_OPF_3DNOW }
 #define SSE_SPECIAL { SSE_OPF_SPECIAL }

 static const struct SSEOpHelper_table1 sse_op_table1[256] = {
-    /* 3DNow! extensions */
-    [0x0e] = SSE_SPECIAL, /* femms */
-    [0x0f] = SSE_3DNOW, /* pf... (sse_op_table5) */
     /* pure SSE operations */
     [0x10] = SSE_SPECIAL, /* movups, movupd, movss, movsd */
     [0x11] = SSE_SPECIAL, /* movups, movupd, movss, movsd */
@@ -3172,7 +3167,7 @@ static void gen_helper_pavgusb(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b)
     gen_helper_pavgb_mmx(env, reg_a, reg_a, reg_b);
 }

-static const SSEFunc_0_epp sse_op_table5[256] = {
+static const SSEFunc_0_epp op_3dnow[256] = {
     [0x0c] = gen_helper_pi2fw,
     [0x0d] = gen_helper_pi2fd,
     [0x1c] = gen_helper_pf2iw,
@@ -3351,7 +3346,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         b1 = 0;
     sse_op_flags = sse_op_table1[b].flags;
     sse_op_fn = sse_op_table1[b].fn[b1];
-    if ((sse_op_flags & (SSE_OPF_SPECIAL | SSE_OPF_3DNOW)) == 0
+    if ((sse_op_flags & SSE_OPF_SPECIAL) == 0
             && !sse_op_fn.op1) {
         goto unknown_op;
     }
@@ -3365,11 +3360,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             is_xmm = 1;
         }
     }
-    if (sse_op_flags & SSE_OPF_3DNOW) {
-        if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
-            goto illegal_op;
-        }
-    }
     /* simple MMX/SSE operation */
     if (s->flags & HF_TS_MASK) {
         gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
@@ -3385,15 +3375,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         && (b != 0x38 && b != 0x3a)) {
         goto unknown_op;
     }
-    if (b == 0x0e) {
-        if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
-            /* If we were fully decoding this we might use illegal_op. */
-            goto unknown_op;
-        }
-        /* femms */
-        gen_helper_emms(cpu_env);
-        return;
-    }
     if (b == 0x77) {
         /* emms */
         gen_helper_emms(cpu_env);
@@ -4536,18 +4517,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 rm = (modrm & 7);
                 op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
             }
-            if (sse_op_flags & SSE_OPF_3DNOW) {
-                /* 3DNow! data insns */
-                val = x86_ldub_code(env, s);
-                SSEFunc_0_epp op_3dnow = sse_op_table5[val];
-                if (!op_3dnow) {
-                    goto unknown_op;
-                }
-                tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
-                tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
-                op_3dnow(cpu_env, s->ptr0, s->ptr1);
-                return;
-            }
         }


@@ -4598,6 +4567,70 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
     }
 }

+static void gen_3dnow(CPUX86State *env, DisasContext *s, int b,
+                      target_ulong pc_start)
+{
+    int op1_offset, op2_offset, val;
+    int modrm, mod, rm, reg;
+    SSEFunc_0_epp fn;
+
+    /* simple MMX/SSE operation */
+    if (s->flags & HF_TS_MASK) {
+        gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+        return;
+    }
+    if (s->flags & HF_EM_MASK) {
+        goto illegal_op;
+        return;
+    }
+    if (b == 0x10e) {
+        if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
+            /* If we were fully decoding this we might use illegal_op. */
+            goto unknown_op;
+        }
+        /* femms */
+        gen_helper_emms(cpu_env);
+        return;
+    }
+
+    if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) {
+        goto illegal_op;
+    }
+
+    gen_helper_enter_mmx(cpu_env);
+
+    modrm = x86_ldub_code(env, s);
+    reg = ((modrm >> 3) & 7);
+    mod = (modrm >> 6) & 3;
+
+    op1_offset = offsetof(CPUX86State,fpregs[reg].mmx);
+    if (mod != 3) {
+        gen_lea_modrm(env, s, modrm);
+        op2_offset = offsetof(CPUX86State,mmx_t0);
+        gen_ldq_env_A0(s, op2_offset);
+    } else {
+        rm = (modrm & 7);
+        op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
+    }
+
+    val = x86_ldub_code(env, s);
+    fn = op_3dnow[val];
+    if (!fn) {
+        goto unknown_op;
+    }
+    tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset);
+    tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset);
+    fn(cpu_env, s->ptr0, s->ptr1);
+    return;
+
+illegal_op:
+    gen_illegal_opcode(s);
+    return;
+
+unknown_op:
+    gen_unknown_opcode(env, s);
+}
+
 /* convert one instruction. s->base.is_jmp is set if the translation must
    be stopped. Return the next pc value */
 static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
@@ -8505,9 +8538,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         set_cc_op(s, CC_OP_POPCNT);
         break;
     case 0x10e ... 0x10f:
-        /* 3DNow! instructions, ignore prefixes */
-        s->prefix &= ~(PREFIX_REPZ | PREFIX_REPNZ | PREFIX_DATA);
-        /* fall through */
+        gen_3dnow(env, s, b, pc_start);
+        break;
     case 0x110 ... 0x117:
     case 0x128 ... 0x12f:
     case 0x138 ... 0x13a: