From patchwork Fri Mar 15 10:36:26 2019
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 10854447
Message-Id: <5C8B802A020000780021F116@prv1-mh.provo.novell.com>
Date: Fri, 15 Mar 2019 04:36:26 -0600
From: "Jan Beulich"
To: "xen-devel"
Subject: [Xen-devel] [PATCH v8 01/50] x86emul: no need to set fault_suppression to false for VMOVNT*
Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monne

When evex.opmsk is required to be zero there's no need for this, as it
won't have been set to true in the first place.

Signed-off-by: Jan Beulich
Reviewed-by: Andrew Cooper
---
v8: Add this previously standalone patch into the series.
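[Editor's note] As background for the removal below, here is an illustrative sketch only, with hypothetical names and types (not the emulator's actual code): fault suppression is only ever enabled when an opmask register is in use, and the VMOVNT* paths already raise #UD for a non-zero EVEX.opmsk, so the explicit clearing being dropped was dead code.

/*
 * Illustrative sketch only (hypothetical names/types, not the real
 * x86_emulate() code): the flag can only ever become true when an
 * opmask register is in use.
 */
#include <stdbool.h>

struct evex_bits {
    unsigned int opmsk:3;      /* EVEX.aaa; 0 means "no masking" */
};

static bool fault_suppression_enabled(struct evex_bits evex)
{
    return evex.opmsk != 0;    /* the only way the flag gets set */
}

static bool emulate_vmovnt(struct evex_bits evex, bool mem_operand)
{
    /* Mirrors the checks visible in the hunks below: #UD on any masking. */
    if ( !mem_operand || evex.opmsk )
        return false;                      /* would raise #UD */

    /*
     * Reaching this point implies evex.opmsk == 0, hence
     * fault_suppression_enabled(evex) is already false -- clearing it
     * again (the lines the patch removes) was a no-op.
     */
    return true;
}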
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5911,7 +5911,6 @@ x86_emulate(
     CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x2b): /* vmovntp{s,d} [xyz]mm,mem */
         generate_exception_if(ea.type != OP_MEM || evex.opmsk, EXC_UD);
         sfence = true;
-        fault_suppression = false;
         /* fall through */
     CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x10): /* vmovup{s,d} [xyz]mm/mem,[xyz]mm{k} */
     CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x10): /* vmovs{s,d} mem,xmm{k} */
@@ -6795,7 +6794,6 @@ x86_emulate(
         generate_exception_if(ea.type != OP_MEM || evex.opmsk || evex.w,
                               EXC_UD);
         sfence = true;
-        fault_suppression = false;
         /* fall through */
     case X86EMUL_OPC_EVEX_66(0x0f, 0x6f): /* vmovdqa{32,64} [xyz]mm/mem,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_F3(0x0f, 0x6f): /* vmovdqu{32,64} [xyz]mm/mem,[xyz]mm{k} */

From patchwork Fri Mar 15 10:36:53 2019
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 10854449
Message-Id: <5C8B8045020000780021F119@prv1-mh.provo.novell.com>
Date: Fri, 15 Mar 2019 04:36:53 -0600
From: "Jan Beulich"
To: "xen-devel"
Subject: [Xen-devel] [PATCH v8 02/50] x86emul: support AVX512{F,BW,DQ} extract insns
Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau
Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: Make use of d8s_dq64. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -212,6 +212,7 @@ static const struct test avx512f_all[] = }; static const struct test avx512f_128[] = { + INSN(extractps, 66, 0f3a, 17, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), INSN(movq, f3, 0f, 7e, el, q, el), @@ -221,10 +222,14 @@ static const struct test avx512f_128[] = static const struct test avx512f_no128[] = { INSN(broadcastf32x4, 66, 0f38, 1a, el_4, d, vl), INSN(broadcastsd, 66, 0f38, 19, el, q, el), + INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), + INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), }; static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), + INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), + INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), }; static const struct test avx512bw_all[] = { @@ -280,6 +285,12 @@ static const struct test avx512bw_all[] INSN(ptestnm, f3, 0f38, 26, vl, bw, vl), }; +static const struct test avx512bw_128[] = { + INSN(pextrb, 66, 0f3a, 14, el, b, el), +// pextrw, 66, 0f, c5, w + INSN(pextrw, 66, 0f3a, 15, el, w, el), +}; + static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), @@ -288,13 +299,21 @@ static const struct test avx512dq_all[] INSN_PFP(xor, 0f, 57), }; +static const struct test avx512dq_128[] = { + INSN(pextr, 66, 0f3a, 16, el, dq64, el), +}; + static const struct test avx512dq_no128[] = { INSN(broadcastf32x2, 66, 0f38, 19, el_2, d, vl), INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), + INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), + INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), }; static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), + INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), + INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), }; static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; @@ -632,7 +651,9 @@ void evex_disp8_test(void *instr, struct RUN(avx512f, no128); RUN(avx512f, 512); RUN(avx512bw, all); + RUN(avx512bw, 128); RUN(avx512dq, all); + RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -512,9 +512,13 @@ static const struct ext0f3a_table { [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp }, [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, - [0x14 ... 0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1 }, + [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, + [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, + [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, + [0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 2 }, [0x18] = { .simd_size = simd_128 }, - [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1 }, + [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, [0x1e ... 
0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_none }, @@ -523,7 +527,8 @@ static const struct ext0f3a_table { [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128 }, - [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1 }, + [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, [0x42] = { .simd_size = simd_packed_int }, @@ -2676,6 +2681,8 @@ x86_decode_0f3a( ... X86EMUL_OPC_66(0, 0x17): /* pextr*, extractps */ case X86EMUL_OPC_VEX_66(0, 0x14) ... X86EMUL_OPC_VEX_66(0, 0x17): /* vpextr*, vextractps */ + case X86EMUL_OPC_EVEX_66(0, 0x14) + ... X86EMUL_OPC_EVEX_66(0, 0x17): /* vpextr*, vextractps */ case X86EMUL_OPC_VEX_F2(0, 0xf0): /* rorx */ break; @@ -8878,9 +8885,9 @@ x86_emulate( opc[0] = b; /* Convert memory/GPR operand to (%rAX). */ rex_prefix &= ~REX_B; - vex.b = 1; + evex.b = vex.b = 1; if ( !mode_64bit() ) - vex.w = 0; + evex.w = vex.w = 0; opc[1] = modrm & 0x38; opc[2] = imm1; opc[3] = 0xc3; @@ -8890,7 +8897,10 @@ x86_emulate( --opc; } - copy_REX_VEX(opc, rex_prefix, vex); + if ( evex_encoded() ) + copy_EVEX(opc, evex); + else + copy_REX_VEX(opc, rex_prefix, vex); invoke_stub("", "", "=m" (dst.val) : "a" (&dst.val)); put_stub(stub); @@ -8915,6 +8925,52 @@ x86_emulate( opc = init_prefixes(stub); goto pextr; + case X86EMUL_OPC_EVEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */ + generate_exception_if(ea.type != OP_REG, EXC_UD); + /* Convert to alternative encoding: We want to use a memory operand. 
*/ + evex.opcx = ext_0f3a; + b = 0x15; + modrm <<= 3; + evex.r = evex.b; + evex.R = evex.x; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x14): /* vpextrb $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x15): /* vpextrw $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x16): /* vpextr{d,q} $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x17): /* vextractps $imm8,xmm,r/m */ + generate_exception_if((evex.lr || evex.reg != 0xf || !evex.RX || + evex.opmsk || evex.brs), + EXC_UD); + if ( !(b & 2) ) + host_and_vcpu_must_have(avx512bw); + else if ( !(b & 1) ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + opc = init_evex(stub); + goto pextr; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x19): /* vextractf32x4 $imm8,{y,z}mm,xmm/m128{k} */ + /* vextractf64x2 $imm8,{y,z}mm,xmm/m128{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x39): /* vextracti32x4 $imm8,{y,z}mm,xmm/m128{k} */ + /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + generate_exception_if(!evex.lr || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1b): /* vextractf32x8 $imm8,zmm,ymm/m256{k} */ + /* vextractf64x4 $imm8,zmm,ymm/m256{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3b): /* vextracti32x8 $imm8,zmm,ymm/m256{k} */ + /* vextracti64x4 $imm8,zmm,ymm/m256{k} */ + if ( !evex.w ) + host_and_vcpu_must_have(avx512dq); + generate_exception_if(evex.lr != 2 || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,{x,y}mm,xmm/mem */ { uint32_t mxcsr; From patchwork Fri Mar 15 10:37:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854451 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B5A71575 for ; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E6A82A16A for ; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1FACC2A191; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 661612A16A for ; Fri, 15 Mar 2019 10:39:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kDM-00032j-5g; Fri, 15 Mar 2019 10:37:24 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kDL-00032X-1l for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:37:23 +0000 X-Inumbo-ID: 505686fe-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 505686fe-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:37:21 +0000 
(UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:37:21 -0600 Message-Id: <5C8B8060020000780021F11C@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:37:20 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 03/50] x86emul: support AVX512{F, BW, DQ} insert insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also correct the comment of the AVX form of VINSERTPS. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Don't refuse to emulate VINSERTPS without AVX512VL. v4: Make use of d8s_dq64. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -213,6 +213,7 @@ static const struct test avx512f_all[] = static const struct test avx512f_128[] = { INSN(extractps, 66, 0f3a, 17, el, d, el), + INSN(insertps, 66, 0f3a, 21, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), INSN(movq, f3, 0f, 7e, el, q, el), @@ -224,12 +225,16 @@ static const struct test avx512f_no128[] INSN(broadcastsd, 66, 0f38, 19, el, q, el), INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), + INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), + INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), }; static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), + INSN(insertf64x4, 66, 0f3a, 1a, el_4, q, vl), + INSN(inserti64x4, 66, 0f3a, 3a, el_4, q, vl), }; static const struct test avx512bw_all[] = { @@ -289,6 +294,8 @@ static const struct test avx512bw_128[] INSN(pextrb, 66, 0f3a, 14, el, b, el), // pextrw, 66, 0f, c5, w INSN(pextrw, 66, 0f3a, 15, el, w, el), + INSN(pinsrb, 66, 0f3a, 20, el, b, el), + INSN(pinsrw, 66, 0f, c4, el, w, el), }; static const struct test avx512dq_all[] = { @@ -301,6 +308,7 @@ static const struct test avx512dq_all[] static const struct test avx512dq_128[] = { INSN(pextr, 66, 0f3a, 16, el, dq64, el), + INSN(pinsr, 66, 0f3a, 22, el, dq64, el), }; static const struct test avx512dq_no128[] = { @@ -308,12 +316,16 @@ static const struct test avx512dq_no128[ INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), + INSN(insertf64x2, 66, 0f3a, 18, el_2, q, vl), + INSN(inserti64x2, 66, 0f3a, 38, el_2, q, vl), }; static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), + INSN(insertf32x8, 66, 0f3a, 1a, el_8, d, vl), + INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -360,7 +360,7 @@ static const struct twobyte_table { [0xc1] = { 
DstMem|SrcReg|ModRM }, [0xc2] = { DstImplicit|SrcImmByte|ModRM, simd_any_fp, d8s_vl }, [0xc3] = { DstMem|SrcReg|ModRM|Mov }, - [0xc4] = { DstReg|SrcImmByte|ModRM, simd_packed_int }, + [0xc4] = { DstReg|SrcImmByte|ModRM, simd_packed_int, 1 }, [0xc5] = { DstReg|SrcImmByte|ModRM|Mov }, [0xc6] = { DstImplicit|SrcImmByte|ModRM, simd_packed_fp, d8s_vl }, [0xc7] = { ImplicitOps|ModRM }, @@ -516,17 +516,19 @@ static const struct ext0f3a_table { [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, [0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 2 }, - [0x18] = { .simd_size = simd_128 }, + [0x18] = { .simd_size = simd_128, .d8s = 4 }, [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x1a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, [0x1e ... 0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x20] = { .simd_size = simd_none }, - [0x21] = { .simd_size = simd_other }, - [0x22] = { .simd_size = simd_none }, + [0x20] = { .simd_size = simd_none, .d8s = 0 }, + [0x21] = { .simd_size = simd_other, .d8s = 2 }, + [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, - [0x38] = { .simd_size = simd_128 }, + [0x38] = { .simd_size = simd_128, .d8s = 4 }, + [0x3a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, @@ -2586,6 +2588,7 @@ x86_decode_twobyte( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); /* fall through */ case X86EMUL_OPC_VEX_66(0, 0xc4): /* vpinsrw */ + case X86EMUL_OPC_EVEX_66(0, 0xc4): /* vpinsrw */ state->desc = DstReg | SrcMem16; break; @@ -2688,6 +2691,7 @@ x86_decode_0f3a( case X86EMUL_OPC_66(0, 0x20): /* pinsrb */ case X86EMUL_OPC_VEX_66(0, 0x20): /* vpinsrb */ + case X86EMUL_OPC_EVEX_66(0, 0x20): /* vpinsrb */ state->desc = DstImplicit | SrcMem; if ( modrm_mod != 3 ) state->desc |= ByteOp; @@ -2695,6 +2699,7 @@ x86_decode_0f3a( case X86EMUL_OPC_66(0, 0x22): /* pinsr{d,q} */ case X86EMUL_OPC_VEX_66(0, 0x22): /* vpinsr{d,q} */ + case X86EMUL_OPC_EVEX_66(0, 0x22): /* vpinsr{d,q} */ state->desc = DstImplicit | SrcMem; break; @@ -7735,6 +7740,23 @@ x86_emulate( ea.type = OP_MEM; goto simd_0f_int_imm8; + case X86EMUL_OPC_EVEX_66(0x0f, 0xc4): /* vpinsrw $imm8,r32/m16,xmm,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x20): /* vpinsrb $imm8,r32/m8,xmm,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x22): /* vpinsr{d,q} $imm8,r/m,xmm,xmm */ + generate_exception_if(evex.lr || evex.opmsk || evex.brs, EXC_UD); + if ( b & 2 ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512bw); + if ( !mode_64bit() ) + evex.w = 0; + memcpy(mmvalp, &src.val, op_bytes); + ea.type = OP_MEM; + op_bytes = src.bytes; + d = SrcMem16; /* Fake for the common SIMD code below. 
*/ + state->simd_size = simd_other; + goto avx512f_imm8_no_sae; + CASE_SIMD_PACKED_INT(0x0f, 0xc5): /* pextrw $imm8,{,x}mm,reg */ case X86EMUL_OPC_VEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */ generate_exception_if(vex.l, EXC_UD); @@ -8951,8 +8973,12 @@ x86_emulate( opc = init_evex(stub); goto pextr; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x18): /* vinsertf32x4 $imm8,xmm/m128,{y,z}mm{k} */ + /* vinsertf64x2 $imm8,xmm/m128,{y,z}mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x19): /* vextractf32x4 $imm8,{y,z}mm,xmm/m128{k} */ /* vextractf64x2 $imm8,{y,z}mm,xmm/m128{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x38): /* vinserti32x4 $imm8,xmm/m128,{y,z}mm{k} */ + /* vinserti64x2 $imm8,xmm/m128,{y,z}mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x39): /* vextracti32x4 $imm8,{y,z}mm,xmm/m128{k} */ /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ if ( evex.w ) @@ -8961,8 +8987,12 @@ x86_emulate( fault_suppression = false; goto avx512f_imm8_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1a): /* vinsertf32x4 $imm8,ymm/m256,zmm{k} */ + /* vinsertf64x2 $imm8,ymm/m256,zmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1b): /* vextractf32x8 $imm8,zmm,ymm/m256{k} */ /* vextractf64x4 $imm8,zmm,ymm/m256{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3a): /* vinserti32x4 $imm8,ymm/m256,zmm{k} */ + /* vinserti64x2 $imm8,ymm/m256,zmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3b): /* vextracti32x8 $imm8,zmm,ymm/m256{k} */ /* vextracti64x4 $imm8,zmm,ymm/m256{k} */ if ( !evex.w ) @@ -9055,13 +9085,20 @@ x86_emulate( op_bytes = 4; goto simd_0f3a_common; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m128,xmm,xmm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m32,xmm,xmm */ op_bytes = 4; /* fall through */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x41): /* vdppd $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.l, EXC_UD); goto simd_0f_imm8_avx; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m32,xmm,xmm */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.lr || evex.w || evex.opmsk || evex.brs, + EXC_UD); + op_bytes = 4; + goto simd_imm8_zmm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x30): /* kshiftr{b,w} $imm8,k,k */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x32): /* kshiftl{b,w} $imm8,k,k */ if ( !vex.w ) From patchwork Fri Mar 15 10:38:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854453 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E77961575 for ; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CBE812A934 for ; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BFC832A937; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D7BCD2A934 for ; Fri, 15 Mar 2019 10:40:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by 
lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEb-0003EJ-I9; Fri, 15 Mar 2019 10:38:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEa-0003E9-1H for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:38:40 +0000 X-Inumbo-ID: 7dfca9dc-470e-11e9-8d48-b7475793dce7 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 7dfca9dc-470e-11e9-8d48-b7475793dce7; Fri, 15 Mar 2019 10:38:38 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:38:37 -0600 Message-Id: <5C8B80AB020000780021F11F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:38:35 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 04/50] x86emul: basic AVX512F testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Fix formatting in simd.h. v5: Add VSQRT* tests. v4: Make eq() also work for 4- and 8-byte integer element sizes. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -63,6 +63,9 @@ avx2-sg-flts := 4 8 xop-vecs := $(avx-vecs) xop-ints := 1 2 4 8 xop-flts := $(avx-flts) +avx512f-vecs := 64 +avx512f-ints := 4 8 +avx512f-flts := 4 8 avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 @@ -170,7 +173,7 @@ $(addsuffix .c,$(SG)): $(addsuffix .h,$(SIMD) $(FMA) $(SG)): simd.h -xop.h: simd-fma.c +xop.h avx512f.h: simd-fma.c endif # 32-bit override --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -2,7 +2,41 @@ ENTRY(simd_test); -#if VEC_SIZE == 8 && defined(__SSE__) +#if defined(__AVX512F__) +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# if VEC_SIZE == 4 +# define eq(x, y) ({ \ + float x_ = (x)[0]; \ + float __attribute__((vector_size(16))) y_ = { (y)[0] }; \ + unsigned short r_; \ + asm ( "vcmpss $0, %1, %2, %0" : "=k" (r_) : "m" (x_), "v" (y_) ); \ + r_ == 1; \ +}) +# elif VEC_SIZE == 8 +# define eq(x, y) ({ \ + double x_ = (x)[0]; \ + double __attribute__((vector_size(16))) y_ = { (y)[0] }; \ + unsigned short r_; \ + asm ( "vcmpsd $0, %1, %2, %0" : "=k" (r_) : "m" (x_), "v" (y_) ); \ + r_ == 1; \ +}) +# elif FLOAT_SIZE == 4 +/* + * gcc's (up to at least 8.2) __builtin_ia32_cmpps256_mask() has an anomaly in + * that its return type is QI rather than UQI, and hence the value would get + * sign-extended before comapring to ALL_TRUE. The same oddity does not matter + * for __builtin_ia32_cmppd256_mask(), as there only 4 bits are significant. 
+ * Hence the extra " & ALL_TRUE". + */ +# define eq(x, y) ((BR(cmpps, _mask, x, y, 0, -1) & ALL_TRUE) == ALL_TRUE) +# elif FLOAT_SIZE == 8 +# define eq(x, y) (BR(cmppd, _mask, x, y, 0, -1) == ALL_TRUE) +# elif INT_SIZE == 4 || UINT_SIZE == 4 +# define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) +# elif INT_SIZE == 8 || UINT_SIZE == 8 +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# endif +#elif VEC_SIZE == 8 && defined(__SSE__) # define to_bool(cmp) (__builtin_ia32_pmovmskb(cmp) == 0xff) #elif VEC_SIZE == 16 # if defined(__AVX__) && defined(FLOAT_SIZE) @@ -93,6 +127,56 @@ static inline bool _to_bool(byte_vec_t b touch(x); \ __builtin_ia32_pfrcpit2(__builtin_ia32_pfrsqit1(__builtin_ia32_pfmul(t_, t_), x), t_); \ }) +#elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) +# if FLOAT_SIZE == 4 +# define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") +# elif FLOAT_SIZE == 8 +# define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") +# endif +#elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ + (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if FLOAT_SIZE == 4 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vbroadcastss %1, %0" \ + : "=v" (t_) : "m" (*(float[1]){ x }) ); \ + t_; \ +}) +# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) +# define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) +# define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# endif +# elif FLOAT_SIZE == 8 +# if VEC_SIZE >= 32 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vbroadcastsd %1, %0" : "=v" (t_) \ + : "m" (*(double[1]){ x }) ); \ + t_; \ +}) +# else +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastq %1, %0" \ + : "=v" (t_) : "m" (*(double[1]){ x }) ); \ + t_; \ +}) +# endif +# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) +# define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) +# define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# endif +# endif #elif FLOAT_SIZE == 4 && defined(__SSE__) # if VEC_SIZE == 32 && defined(__AVX__) # if defined(__AVX2__) @@ -191,7 +275,30 @@ static inline bool _to_bool(byte_vec_t b # define sqrt(x) scalar_1op(x, "sqrtsd %[in], %[out]") # endif #endif -#if VEC_SIZE == 16 && defined(__SSE2__) +#if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ + defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if INT_SIZE == 4 || UINT_SIZE == 4 +# define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ + (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# elif INT_SIZE == 8 || UINT_SIZE == 8 +# define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# endif +# if INT_SIZE == 4 +# define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) +# define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) +# define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 4 +# define max(x, y) 
((vec_t)B(pmaxud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) +# elif INT_SIZE == 8 +# define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 8 +# define max(x, y) ((vec_t)B(pmaxuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# endif +#elif VEC_SIZE == 16 && defined(__SSE2__) # if INT_SIZE == 1 || UINT_SIZE == 1 # define interleave_hi(x, y) ((vec_t)__builtin_ia32_punpckhbw128((vqi_t)(x), (vqi_t)(y))) # define interleave_lo(x, y) ((vec_t)__builtin_ia32_punpcklbw128((vqi_t)(x), (vqi_t)(y))) @@ -587,6 +694,10 @@ static inline bool _to_bool(byte_vec_t b # endif #endif +#if defined(__AVX512F__) && defined(FLOAT_SIZE) +# include "simd-fma.c" +#endif + int simd_test(void) { unsigned int i, j; @@ -1034,7 +1145,8 @@ int simd_test(void) # endif #endif -#if defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4) +#if (defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4)) || \ + (defined(__AVX512F__) && defined(FLOAT_SIZE)) return -fma_test(); #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,9 +70,111 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if VEC_SIZE == 16 +# define B(n, s, a...) __builtin_ia32_ ## n ## 128 ## s(a) +# define B_(n, s, a...) __builtin_ia32_ ## n ## s(a) +#elif VEC_SIZE == 32 +# define B(n, s, a...) __builtin_ia32_ ## n ## 256 ## s(a) +#elif VEC_SIZE == 64 +# define B(n, s, a...) __builtin_ia32_ ## n ## 512 ## s(a) +# define BR(n, s, a...) __builtin_ia32_ ## n ## 512 ## s(a, 4) +#endif +#ifndef B_ +# define B_ B +#endif +#ifndef BR +# define BR B +# define BR_ B_ +#endif +#ifndef BR_ +# define BR_ BR +#endif + +#ifdef __AVX512F__ + +/* + * The original plan was to effect use of EVEX encodings for scalar as well as + * 128- and 256-bit insn variants by restricting the compiler to use (on 64-bit + * only of course) XMM16-XMM31 only. All sorts of compiler errors result when + * doing this with gcc 8.2. Therefore resort to injecting {evex} prefixes, + * which has the benefit of also working for 32-bit. Granted, there is a lot of + * escaping to get right here. 
+ */ +asm ( ".macro override insn \n\t" + ".macro $\\insn o:vararg \n\t" + ".purgem \\insn \n\t" + "{evex} \\insn \\(\\)o \n\t" + ".macro \\insn o:vararg \n\t" + "$\\insn \\(\\(\\))o \n\t" + ".endm \n\t" + ".endm \n\t" + ".macro \\insn o:vararg \n\t" + "$\\insn \\(\\)o \n\t" + ".endm \n\t" + ".endm" ); + +# define OVR(n) asm ( "override v" #n ) +# define OVR_SFP(n) OVR(n ## sd); OVR(n ## ss) + +# ifdef __AVX512VL__ +# ifdef __AVX512BW__ +# define OVR_BW(n) OVR(p ## n ## b); OVR(p ## n ## w) +# else +# define OVR_BW(n) +# endif +# define OVR_DQ(n) OVR(p ## n ## d); OVR(p ## n ## q) +# define OVR_VFP(n) OVR(n ## pd); OVR(n ## ps) +# else +# define OVR_BW(n) +# define OVR_DQ(n) +# define OVR_VFP(n) +# endif + +# define OVR_FMA(n, w) OVR_ ## w(n ## 132); OVR_ ## w(n ## 213); \ + OVR_ ## w(n ## 231) +# define OVR_FP(n) OVR_VFP(n); OVR_SFP(n) +# define OVR_INT(n) OVR_BW(n); OVR_DQ(n) + +OVR_SFP(broadcast); +OVR_SFP(comi); +OVR_FP(add); +OVR_FP(div); +OVR(extractps); +OVR_FMA(fmadd, FP); +OVR_FMA(fmsub, FP); +OVR_FMA(fnmadd, FP); +OVR_FMA(fnmsub, FP); +OVR(insertps); +OVR_FP(max); +OVR_FP(min); +OVR(movd); +OVR(movq); +OVR_SFP(mov); +OVR_FP(mul); +OVR_FP(sqrt); +OVR_FP(sub); +OVR_SFP(ucomi); + +# undef OVR_VFP +# undef OVR_SFP +# undef OVR_INT +# undef OVR_FP +# undef OVR_FMA +# undef OVR_DQ +# undef OVR_BW +# undef OVR + +#endif /* __AVX512F__ */ + /* * Suppress value propagation by the compiler, preventing unwanted * optimization. This at once makes the compiler use memory operands * more often, which for our purposes is the more interesting case. */ #define touch(var) asm volatile ( "" : "+m" (var) ) + +static inline vec_t undef(void) +{ + vec_t v = v; + return v; +} --- a/tools/tests/x86_emulator/simd-fma.c +++ b/tools/tests/x86_emulator/simd-fma.c @@ -1,10 +1,9 @@ +#if !defined(__XOP__) && !defined(__AVX512F__) #include "simd.h" - -#ifndef __XOP__ ENTRY(fma_test); #endif -#if VEC_SIZE < 16 +#if VEC_SIZE < 16 && !defined(to_bool) # define to_bool(cmp) (!~(cmp)[0]) #elif VEC_SIZE == 16 # if FLOAT_SIZE == 4 @@ -24,7 +23,13 @@ ENTRY(fma_test); # define eq(x, y) to_bool((x) == (y)) #endif -#if VEC_SIZE == 16 +#if defined(__AVX512F__) && VEC_SIZE > FLOAT_SIZE +# if FLOAT_SIZE == 4 +# define fmaddsub(x, y, z) BR(vfmaddsubps, _mask, x, y, z, ~0) +# elif FLOAT_SIZE == 8 +# define fmaddsub(x, y, z) BR(vfmaddsubpd, _mask, x, y, z, ~0) +# endif +#elif VEC_SIZE == 16 # if FLOAT_SIZE == 4 # define addsub(x, y) __builtin_ia32_addsubps(x, y) # if defined(__FMA4__) || defined(__FMA__) @@ -50,6 +55,10 @@ ENTRY(fma_test); # endif #endif +#if defined(fmaddsub) && !defined(addsub) +# define addsub(x, y) fmaddsub(x, broadcast(1), y) +#endif + int fma_test(void) { unsigned int i; --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -21,6 +21,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-opmask.h" #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" +#include "avx512f.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -248,6 +249,14 @@ static const struct { SIMD(OPMASK/b, avx512dq_opmask, 1), SIMD(OPMASK/d, avx512bw_opmask, 4), SIMD(OPMASK/q, avx512bw_opmask, 8), + SIMD(AVX512F f32 scalar, avx512f, f4), + SIMD(AVX512F f32x16, avx512f, 64f4), + SIMD(AVX512F f64 scalar, avx512f, f8), + SIMD(AVX512F f64x8, avx512f, 64f8), + SIMD(AVX512F s32x16, avx512f, 64i4), + SIMD(AVX512F u32x16, avx512f, 64u4), + SIMD(AVX512F s64x8, avx512f, 64i8), + SIMD(AVX512F u64x8, avx512f, 64u8), #undef SIMD_ #undef SIMD }; From patchwork Fri Mar 15 10:39:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854455 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E85841575 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CD5F72A933 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C0C862A937; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 06AAF2A933 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEz-0003K1-1T; Fri, 15 Mar 2019 10:39:05 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEy-0003Ji-0l for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:39:04 +0000 X-Inumbo-ID: 8c55f094-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 8c55f094-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:39:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:39:01 -0600 Message-Id: <5C8B80C5020000780021F122@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:39:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 05/50] x86emul: support AVX512{F, BW, DQ} integer broadcast insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the pbroadcastw table entry in evex-disp8.c is slightly different from what one would expect, due to it requiring EVEX.W to be zero. Signed-off-by: Jan Beulich --- v7: Use dummy output in invoke_stub(). Re-base. 
v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -164,6 +164,9 @@ static const struct test avx512f_all[] = INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), INSN(pandn, 66, 0f, df, vl, dq, vl), +// pbroadcast, 66, 0f38, 7c, dq64 + INSN(pbroadcastd, 66, 0f38, 58, el, d, el), + INSN(pbroadcastq, 66, 0f38, 59, el, q, el), INSN(pcmp, 66, 0f3a, 1f, vl, dq, vl), INSN(pcmpeqd, 66, 0f, 76, vl, d, vl), INSN(pcmpeqq, 66, 0f38, 29, vl, q, vl), @@ -222,6 +225,7 @@ static const struct test avx512f_128[] = static const struct test avx512f_no128[] = { INSN(broadcastf32x4, 66, 0f38, 1a, el_4, d, vl), + INSN(broadcasti32x4, 66, 0f38, 5a, el_4, d, vl), INSN(broadcastsd, 66, 0f38, 19, el, q, el), INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), @@ -231,6 +235,7 @@ static const struct test avx512f_no128[] static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), + INSN(broadcasti64x4, 66, 0f38, 5b, el_4, q, vl), INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), INSN(insertf64x4, 66, 0f3a, 1a, el_4, q, vl), @@ -250,6 +255,10 @@ static const struct test avx512bw_all[] INSN(paddw, 66, 0f, fd, vl, w, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), + INSN(pbroadcastb, 66, 0f38, 78, el, b, el), +// pbroadcastb, 66, 0f38, 7a, b + INSN(pbroadcastw, 66, 0f38, 79, el_2, b, vl), +// pbroadcastw, 66, 0f38, 7b, b INSN(pcmp, 66, 0f3a, 3f, vl, bw, vl), INSN(pcmpeqb, 66, 0f, 74, vl, b, vl), INSN(pcmpeqw, 66, 0f, 75, vl, w, vl), @@ -301,6 +310,7 @@ static const struct test avx512bw_128[] static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), + INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN_PFP(or, 0f, 56), INSN(pmullq, 66, 0f38, 40, vl, q, vl), INSN_PFP(xor, 0f, 57), @@ -314,6 +324,7 @@ static const struct test avx512dq_128[] static const struct test avx512dq_no128[] = { INSN(broadcastf32x2, 66, 0f38, 19, el_2, d, vl), INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), + INSN(broadcasti64x2, 66, 0f38, 5a, el_2, q, vl), INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), INSN(insertf64x2, 66, 0f3a, 18, el_2, q, vl), @@ -322,6 +333,7 @@ static const struct test avx512dq_no128[ static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), + INSN(broadcasti32x8, 66, 0f38, 5b, el_8, d, vl), INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), INSN(insertf32x8, 66, 0f3a, 1a, el_8, d, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -278,9 +278,33 @@ static inline bool _to_bool(byte_vec_t b #if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) # if INT_SIZE == 4 || UINT_SIZE == 4 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastd %1, %0" \ + : "=v" (t_) : "m" (*(int[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) # elif INT_SIZE == 8 || UINT_SIZE == 8 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastq %1, %0" \ + : "=v" (t_) : "m" (*(long long[1]){ x }) ); \ + t_; \ +}) +# ifdef 
__x86_64__ +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastq %1, %0" : "=v" (t_) : "r" ((x) + 0ULL) ); \ + t_; \ +}) +# endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif # if INT_SIZE == 4 @@ -977,10 +1001,14 @@ int simd_test(void) if ( !eq(swap2(src), inv) ) return __LINE__; #endif -#if defined(broadcast) +#ifdef broadcast if ( !eq(broadcast(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif +#ifdef broadcast2 + if ( !eq(broadcast2(ELEM_COUNT + 1), src + inv) ) return __LINE__; +#endif + #if defined(interleave_lo) && defined(interleave_hi) touch(src); x = interleave_lo(inv, src); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -454,9 +454,13 @@ static const struct ext0f38_table { [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x58 ... 0x59] = { .simd_size = simd_other, .two_op = 1 }, - [0x5a] = { .simd_size = simd_128, .two_op = 1 }, - [0x78 ... 0x79] = { .simd_size = simd_other, .two_op = 1 }, + [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, + [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, + [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, + [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x78] = { .simd_size = simd_other, .two_op = 1 }, + [0x79] = { .simd_size = simd_other, .two_op = 1, .d8s = 1 }, + [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x8c] = { .simd_size = simd_packed_int }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, @@ -2636,6 +2640,11 @@ x86_decode_0f38( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); break; + case X86EMUL_OPC_EVEX_66(0, 0x7a): /* vpbroadcastb */ + case X86EMUL_OPC_EVEX_66(0, 0x7b): /* vpbroadcastw */ + case X86EMUL_OPC_EVEX_66(0, 0x7c): /* vpbroadcast{d,q} */ + break; + case 0xf0: /* movbe / crc32 */ state->desc |= repne_prefix() ? 
ByteOp : Mov; if ( rep_prefix() ) @@ -8233,6 +8242,8 @@ x86_emulate( goto avx512f_no_sae; case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ + op_bytes = elem_bytes; generate_exception_if(evex.w || evex.brs, EXC_UD); avx512_broadcast: /* @@ -8252,17 +8263,27 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ /* vbroadcastf64x4 m256,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ + /* vbroadcasti64x4 m256,zmm{k} */ generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ - generate_exception_if(!evex.lr || evex.brs, EXC_UD); + generate_exception_if(!evex.lr, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ + /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ + if ( b == 0x59 ) + op_bytes = 8; + generate_exception_if(evex.brs, EXC_UD); if ( !evex.w ) host_and_vcpu_must_have(avx512dq); goto avx512_broadcast; case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ /* vbroadcastf64x2 m128,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ + /* vbroadcasti64x2 m128,{y,z}mm{k} */ generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, EXC_UD); if ( evex.w ) @@ -8456,6 +8477,45 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.w || evex.brs, EXC_UD); + op_bytes = elem_bytes = 1 << (b & 1); + /* See the comment at the avx512_broadcast label. */ + op_mask |= !(b & 1 ? !(uint32_t)op_mask : !op_mask); + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7a): /* vpbroadcastb r32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7b): /* vpbroadcastw r32,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7c): /* vpbroadcast{d,q} reg,[xyz]mm{k} */ + generate_exception_if((ea.type != OP_REG || evex.brs || + evex.reg != 0xf || !evex.RX), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + get_fpu(X86EMUL_FPU_zmm); + + opc = init_evex(stub); + opc[0] = b; + /* Convert GPR source to %rAX. 
*/ + evex.b = 1; + if ( !mode_64bit() ) + evex.w = 0; + opc[1] = modrm & 0xf8; + insn_bytes = EVEX_PFX_BYTES + 2; + opc[2] = 0xc3; + + copy_EVEX(opc, evex); + invoke_stub("", "", "=g" (dummy) : "a" (src.val)); + + put_stub(stub); + ASSERT(!state->simd_size); + break; + case X86EMUL_OPC_VEX_66(0x0f38, 0x8c): /* vpmaskmov{d,q} mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x8e): /* vpmaskmov{d,q} {x,y}mm,{x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:39:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854457 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83FAA1575 for ; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 67E0C2A934 for ; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 57E0B2A933; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E2CCF2A933 for ; Fri, 15 Mar 2019 10:41:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kFP-0003PU-CK; Fri, 15 Mar 2019 10:39:31 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kFN-0003PA-Sh for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:39:29 +0000 X-Inumbo-ID: 9ad0a48c-470e-11e9-bd06-fb9f09d34ca8 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 9ad0a48c-470e-11e9-bd06-fb9f09d34ca8; Fri, 15 Mar 2019 10:39:27 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:39:26 -0600 Message-Id: <5C8B80DF020000780021F125@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:39:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 06/50] x86emul: basic AVX512VL testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test the 128- and 256-bit variants of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Don't enable AVX512VL for scalar tests, nor for S/G ones with index wider than data. 
Re-base over changes earlier in the series. v4: Move OVR() additions into __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -63,7 +63,7 @@ avx2-sg-flts := 4 8 xop-vecs := $(avx-vecs) xop-ints := 1 2 4 8 xop-flts := $(avx-flts) -avx512f-vecs := 64 +avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 --- a/tools/tests/x86_emulator/simd-fma.c +++ b/tools/tests/x86_emulator/simd-fma.c @@ -5,13 +5,13 @@ ENTRY(fma_test); #if VEC_SIZE < 16 && !defined(to_bool) # define to_bool(cmp) (!~(cmp)[0]) -#elif VEC_SIZE == 16 +#elif VEC_SIZE == 16 && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define to_bool(cmp) __builtin_ia32_vtestcps(cmp, (vec_t){} == 0) # elif FLOAT_SIZE == 8 # define to_bool(cmp) __builtin_ia32_vtestcpd(cmp, (vec_t){} == 0) # endif -#elif VEC_SIZE == 32 +#elif VEC_SIZE == 32 && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define to_bool(cmp) __builtin_ia32_vtestcps256(cmp, (vec_t){} == 0) # elif FLOAT_SIZE == 8 --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -539,7 +539,7 @@ static inline bool _to_bool(byte_vec_t b # define rotr(x, n) ((vec_t)__builtin_ia32_palignr128((vdi_t)(x), (vdi_t)(x), (n) * 64)) # endif #endif -#if VEC_SIZE == 16 && defined(__SSE4_1__) +#if VEC_SIZE == 16 && defined(__SSE4_1__) && !defined(__AVX512VL__) # if INT_SIZE == 1 # define max(x, y) ((vec_t)__builtin_ia32_pmaxsb128((vqi_t)(x), (vqi_t)(y))) # define min(x, y) ((vec_t)__builtin_ia32_pminsb128((vqi_t)(x), (vqi_t)(y))) @@ -593,7 +593,7 @@ static inline bool _to_bool(byte_vec_t b # define mix(x, y) __builtin_ia32_blendpd(x, y, 0b10) # endif #endif -#if VEC_SIZE == 32 && defined(__AVX__) +#if VEC_SIZE == 32 && defined(__AVX__) && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define dot_product(x, y) ({ \ vec_t t_ = __builtin_ia32_dpps256(x, y, 0b11110001); \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -92,6 +92,15 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ +# if VEC_SIZE > ELEM_SIZE && (defined(VEC_MAX) ? 
VEC_MAX : VEC_SIZE) < 64 +# pragma GCC target ( "avx512vl" ) +# endif + +# define REN(insn, old, new) \ + asm ( ".macro v" #insn #old " o:vararg \n\t" \ + "v" #insn #new " \\o \n\t" \ + ".endm" ) + /* * The original plan was to effect use of EVEX encodings for scalar as well as * 128- and 256-bit insn variants by restricting the compiler to use (on 64-bit @@ -135,25 +144,88 @@ asm ( ".macro override insn \n\t" # define OVR_FP(n) OVR_VFP(n); OVR_SFP(n) # define OVR_INT(n) OVR_BW(n); OVR_DQ(n) +OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); OVR_FP(add); +OVR_INT(add); OVR_FP(div); OVR(extractps); OVR_FMA(fmadd, FP); +OVR_FMA(fmaddsub, VFP); OVR_FMA(fmsub, FP); +OVR_FMA(fmsubadd, VFP); OVR_FMA(fnmadd, FP); OVR_FMA(fnmsub, FP); OVR(insertps); OVR_FP(max); +OVR_INT(maxs); +OVR_INT(maxu); OVR_FP(min); +OVR_INT(mins); +OVR_INT(minu); OVR(movd); OVR(movq); OVR_SFP(mov); +OVR_VFP(mova); +OVR_VFP(movnt); +OVR_VFP(movu); OVR_FP(mul); +OVR_VFP(shuf); +OVR_INT(sll); +OVR_DQ(sllv); OVR_FP(sqrt); +OVR_INT(sra); +OVR_DQ(srav); +OVR_INT(srl); +OVR_DQ(srlv); OVR_FP(sub); +OVR_INT(sub); OVR_SFP(ucomi); +OVR_VFP(unpckh); +OVR_VFP(unpckl); + +# ifdef __AVX512VL__ +# if ELEM_SIZE == 8 && defined(__AVX512DQ__) +REN(extract, f128, f64x2); +REN(extract, i128, i64x2); +REN(insert, f128, f64x2); +REN(insert, i128, i64x2); +# else +REN(extract, f128, f32x4); +REN(extract, i128, i32x4); +REN(insert, f128, f32x4); +REN(insert, i128, i32x4); +# endif +# if ELEM_SIZE == 8 +REN(movdqa, , 64); +REN(movdqu, , 64); +REN(pand, , q); +REN(pandn, , q); +REN(por, , q); +REN(pxor, , q); +# else +# if ELEM_SIZE == 1 && defined(__AVX512BW__) +REN(movdq, a, u8); +REN(movdqu, , 8); +# elif ELEM_SIZE == 2 && defined(__AVX512BW__) +REN(movdq, a, u16); +REN(movdqu, , 16); +# else +REN(movdqa, , 32); +REN(movdqu, , 32); +# endif +REN(pand, , d); +REN(pandn, , d); +REN(por, , d); +REN(pxor, , d); +# endif +OVR(movntdq); +OVR(movntdqa); +OVR(pmulld); +OVR(pmuldq); +OVR(pmuludq); +# endif # undef OVR_VFP # undef OVR_SFP --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -88,6 +88,11 @@ static bool simd_check_avx512f(void) } #define simd_check_avx512f_opmask simd_check_avx512f +static bool simd_check_avx512f_vl(void) +{ + return cpu_has_avx512f && cpu_has_avx512vl; +} + static bool simd_check_avx512dq(void) { return cpu_has_avx512dq; @@ -142,11 +147,21 @@ static const struct { .check_cpu = simd_check_ ## feat, \ .set_regs = simd_set_regs, \ .check_regs = simd_check_regs } +#define AVX512VL_(bits, desc, feat, form) \ + { .code = feat ## _x86_ ## bits ## _D ## _ ## form, \ + .size = sizeof(feat ## _x86_ ## bits ## _D ## _ ## form), \ + .bitness = bits, .name = "AVX512" #desc, \ + .check_cpu = simd_check_ ## feat ## _vl, \ + .set_regs = simd_set_regs, \ + .check_regs = simd_check_regs } #ifdef __x86_64__ # define SIMD(desc, feat, form) SIMD_(64, desc, feat, form), \ SIMD_(32, desc, feat, form) +# define AVX512VL(desc, feat, form) AVX512VL_(64, desc, feat, form), \ + AVX512VL_(32, desc, feat, form) #else # define SIMD(desc, feat, form) SIMD_(32, desc, feat, form) +# define AVX512VL(desc, feat, form) AVX512VL_(32, desc, feat, form) #endif SIMD(3DNow! 
single, _3dnow, 8f4), SIMD(SSE scalar single, sse, f4), @@ -257,6 +272,20 @@ static const struct { SIMD(AVX512F u32x16, avx512f, 64u4), SIMD(AVX512F s64x8, avx512f, 64i8), SIMD(AVX512F u64x8, avx512f, 64u8), + AVX512VL(VL f32x4, avx512f, 16f4), + AVX512VL(VL f64x2, avx512f, 16f8), + AVX512VL(VL f32x8, avx512f, 32f4), + AVX512VL(VL f64x4, avx512f, 32f8), + AVX512VL(VL s32x4, avx512f, 16i4), + AVX512VL(VL u32x4, avx512f, 16u4), + AVX512VL(VL s32x8, avx512f, 32i4), + AVX512VL(VL u32x8, avx512f, 32u4), + AVX512VL(VL s64x2, avx512f, 16i8), + AVX512VL(VL u64x2, avx512f, 16u8), + AVX512VL(VL s64x4, avx512f, 32i8), + AVX512VL(VL u64x4, avx512f, 32u8), +#undef AVX512VL_ +#undef AVX512VL #undef SIMD_ #undef SIMD }; From patchwork Fri Mar 15 10:40:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854459 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7AF61575 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9F3022A933 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9371D2A936; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0BED72A933 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kG6-00044P-OZ; Fri, 15 Mar 2019 10:40:14 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kG4-000448-Vg for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:40:13 +0000 X-Inumbo-ID: b542a0ff-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id b542a0ff-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:40:11 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:40:10 -0600 Message-Id: <5C8B810A020000780021F128@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:40:10 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 07/50] x86emul: support AVX512{F, BW} zero- and sign-extending moves X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using 
ClamSMTP Note that the testing in simd.c doesn't really follow the ISA extension pattern - to fit the scheme, extensions from byte and word granular vectors can (currently) sensibly only happen in the AVX512BW case (and hence respective abstraction macros will be added there rather than here). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Raise #UD when EVEX.b is set. Re-base. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -177,6 +177,16 @@ static const struct test avx512f_all[] = INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), INSN(pminu, 66, 0f38, 3b, vl, dq, vl), + INSN(pmovsxbd, 66, 0f38, 21, vl_4, b, vl), + INSN(pmovsxbq, 66, 0f38, 22, vl_8, b, vl), + INSN(pmovsxwd, 66, 0f38, 23, vl_2, w, vl), + INSN(pmovsxwq, 66, 0f38, 24, vl_4, w, vl), + INSN(pmovsxdq, 66, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovzxbd, 66, 0f38, 31, vl_4, b, vl), + INSN(pmovzxbq, 66, 0f38, 32, vl_8, b, vl), + INSN(pmovzxwd, 66, 0f38, 33, vl_2, w, vl), + INSN(pmovzxwq, 66, 0f38, 34, vl_4, w, vl), + INSN(pmovzxdq, 66, 0f38, 35, vl_2, d_nb, vl), INSN(pmuldq, 66, 0f38, 28, vl, q, vl), INSN(pmulld, 66, 0f38, 40, vl, d, vl), INSN(pmuludq, 66, 0f, f4, vl, q, vl), @@ -274,6 +284,8 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), + INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), + INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -443,13 +443,23 @@ static const struct ext0f38_table { [0x1a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x1b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1c ... 0x1e] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0x20 ... 0x25] = { .simd_size = simd_other, .two_op = 1 }, + [0x20] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x21] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x22] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, + [0x23] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x24] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x25] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x2b] = { .simd_size = simd_packed_int }, [0x2c ... 0x2d] = { .simd_size = simd_packed_fp }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, - [0x30 ... 0x35] = { .simd_size = simd_other, .two_op = 1 }, + [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x31] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x32] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, + [0x33] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x34] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x35] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x36 ... 
0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, @@ -8349,6 +8359,25 @@ x86_emulate( op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); goto simd_0f_int; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x30): /* vpmovzxbw {x,y}mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x23): /* vpmovsxwd {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x25): /* vpmovsxdq {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x33): /* vpmovzxwd {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x35): /* vpmovzxdq {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.brs || (evex.w && (b & 7) == 5), EXC_UD); + op_bytes = 32 >> (pmov_convert_delta[b & 7] + 1 - evex.lr); + elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; + goto avx512f_no_sae; + case X86EMUL_OPC_66(0x0f38, 0x2a): /* movntdqa m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2a): /* vmovntdqa mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -311,10 +311,12 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) # define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovsxdq, _mask, x, (vdi_t)undef(), ~0)) # elif UINT_SIZE == 4 # define max(x, y) ((vec_t)B(pmaxud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxdq, _mask, (vsi_half_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 8 # define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -222,6 +222,16 @@ REN(pxor, , d); # endif OVR(movntdq); OVR(movntdqa); +OVR(pmovsxbd); +OVR(pmovsxbq); +OVR(pmovsxdq); +OVR(pmovsxwd); +OVR(pmovsxwq); +OVR(pmovzxbd); +OVR(pmovzxbq); +OVR(pmovzxdq); +OVR(pmovzxwd); +OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); From patchwork Fri Mar 15 10:40:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854461 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2005A1575 for ; Fri, 15 Mar 2019 10:42:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 02F4B2A933 for 
; Fri, 15 Mar 2019 10:42:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EAFA32A936; Fri, 15 Mar 2019 10:42:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 26BCD2A933 for ; Fri, 15 Mar 2019 10:42:15 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGX-0004B1-8g; Fri, 15 Mar 2019 10:40:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGV-0004Ad-RD for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:40:39 +0000 X-Inumbo-ID: c4517480-470e-11e9-abf2-57ce10c0dfab Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id c4517480-470e-11e9-abf2-57ce10c0dfab; Fri, 15 Mar 2019 10:40:36 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:40:35 -0600 Message-Id: <5C8B8122020000780021F161@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:40:34 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 08/50] x86emul: support AVX512{F, BW} down conversion moves X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the vpmov{,s,us}{d,q}w table entries in evex-disp8.c are slightly different from what one would expect, due to them requiring EVEX.W to be zero. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Adjustment for XSA-289: Use XOR instead of ADD when fiddling with b as an array index. v7: ea.type == OP_* -> ea.type != OP_*. Re-base over change in previous patch. Re-base. v5: Also adjust x86_insn_is_mem_write(). v4: Also #UD when evex.z is set with a memory operand. v3: New. 
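As an aside, purely illustrative and not part of the patch: the vl_2/vl_4/vl_8 designations used in the evex-disp8.c entries below reflect that the memory operand of these down-conversion moves covers only a half, quarter, or eighth of the full vector length, and the EVEX Disp8 compressed displacement is scaled by that (smaller) operand size rather than by the full vector width. A minimal stand-alone C sketch of the scaling arithmetic follows; disp8_scale() is a made-up helper name and not emulator code:

#include <stdio.h>

/* Size in bytes of the memory operand, i.e. the Disp8 scale factor N. */
static unsigned int disp8_scale(unsigned int evex_lr, /* 0: 128, 1: 256, 2: 512 bit */
                                unsigned int shrink)  /* 1: /2, 2: /4, 3: /8 */
{
    return (16u << evex_lr) >> shrink;
}

int main(void)
{
    /* vpmovqb with a 512-bit source writes 64/8 = 8 bytes of memory ... */
    unsigned int n = disp8_scale(2, 3);

    /* ... so an encoded disp8 of 0x10 addresses byte offset 16 * 8 = 128. */
    printf("N=%u, disp8 0x10 -> offset %u\n", n, 0x10 * n);

    return 0;
}
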
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -177,11 +177,26 @@ static const struct test avx512f_all[] = INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), INSN(pminu, 66, 0f38, 3b, vl, dq, vl), + INSN(pmovdb, f3, 0f38, 31, vl_4, b, vl), + INSN(pmovdw, f3, 0f38, 33, vl_2, b, vl), + INSN(pmovqb, f3, 0f38, 32, vl_8, b, vl), + INSN(pmovqd, f3, 0f38, 35, vl_2, d_nb, vl), + INSN(pmovqw, f3, 0f38, 34, vl_4, b, vl), + INSN(pmovsdb, f3, 0f38, 21, vl_4, b, vl), + INSN(pmovsdw, f3, 0f38, 23, vl_2, b, vl), + INSN(pmovsqb, f3, 0f38, 22, vl_8, b, vl), + INSN(pmovsqd, f3, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovsqw, f3, 0f38, 24, vl_4, b, vl), INSN(pmovsxbd, 66, 0f38, 21, vl_4, b, vl), INSN(pmovsxbq, 66, 0f38, 22, vl_8, b, vl), INSN(pmovsxwd, 66, 0f38, 23, vl_2, w, vl), INSN(pmovsxwq, 66, 0f38, 24, vl_4, w, vl), INSN(pmovsxdq, 66, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovusdb, f3, 0f38, 11, vl_4, b, vl), + INSN(pmovusdw, f3, 0f38, 13, vl_2, b, vl), + INSN(pmovusqb, f3, 0f38, 12, vl_8, b, vl), + INSN(pmovusqd, f3, 0f38, 15, vl_2, d_nb, vl), + INSN(pmovusqw, f3, 0f38, 14, vl_4, b, vl), INSN(pmovzxbd, 66, 0f38, 31, vl_4, b, vl), INSN(pmovzxbq, 66, 0f38, 32, vl_8, b, vl), INSN(pmovzxwd, 66, 0f38, 33, vl_2, w, vl), @@ -284,7 +299,10 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), + INSN(pmovswb, f3, 0f38, 20, vl_2, b, vl), INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), + INSN(pmovuswb, f3, 0f38, 10, vl_2, b, vl), + INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -277,6 +277,17 @@ static inline bool _to_bool(byte_vec_t b #endif #if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if ELEM_COUNT == 8 /* vextracti{32,64}x4 */ || \ + (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextracti32x8 */ || \ + (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextracti64x2 */ +# define low_half(x) ({ \ + half_t t_; \ + asm ( "vextracti%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + t_; \ +}) +# endif # if INT_SIZE == 4 || UINT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -291,6 +302,7 @@ static inline bool _to_bool(byte_vec_t b }) # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) # elif INT_SIZE == 8 || UINT_SIZE == 8 # define broadcast(x) ({ \ vec_t t_; \ @@ -720,6 +732,27 @@ static inline bool _to_bool(byte_vec_t b # endif #endif +#if VEC_SIZE >= 16 + +# if !defined(low_half) && defined(HALF_SIZE) +static inline half_t low_half(vec_t x) +{ +# if HALF_SIZE < VEC_SIZE + half_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 2; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + +#endif + #if defined(__AVX512F__) && defined(FLOAT_SIZE) # include "simd-fma.c" #endif @@ -1087,6 +1120,21 @@ int simd_test(void) #endif +#if defined(widen1) && defined(shrink1) + { + half_t aux1 = low_half(src), aux2; + + touch(aux1); + x = widen1(aux1); + touch(x); + aux2 = shrink1(x); + touch(aux2); + 
for ( i = 0; i < ELEM_COUNT / 2; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } +#endif + #ifdef dup_lo touch(src); x = dup_lo(src); --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,6 +70,23 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if VEC_SIZE >= 16 + +# if ELEM_COUNT >= 2 +# if VEC_SIZE > 32 +# define HALF_SIZE (VEC_SIZE / 2) +# else +# define HALF_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(HALF_SIZE))) half_t; +typedef char __attribute__((vector_size(HALF_SIZE))) vqi_half_t; +typedef short __attribute__((vector_size(HALF_SIZE))) vhi_half_t; +typedef int __attribute__((vector_size(HALF_SIZE))) vsi_half_t; +typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; +# endif + +#endif + #if VEC_SIZE == 16 # define B(n, s, a...) __builtin_ia32_ ## n ## 128 ## s(a) # define B_(n, s, a...) __builtin_ia32_ ## n ## s(a) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -3068,7 +3068,22 @@ x86_decode( d |= vSIB; state->simd_size = ext0f38_table[b].simd_size; if ( evex_encoded() ) - disp8scale = decode_disp8scale(ext0f38_table[b].d8s, state); + { + /* + * VPMOVUS* are identical to VPMOVS* Disp8-scaling-wise, but + * their attributes don't match those of the vex_66 encoded + * insns with the same base opcodes. Rather than adding new + * columns to the table, handle this here for now. + */ + if ( evex.pfx != vex_f3 || (b & 0xf8) != 0x10 ) + disp8scale = decode_disp8scale(ext0f38_table[b].d8s, state); + else + { + disp8scale = decode_disp8scale(ext0f38_table[b ^ 0x30].d8s, + state); + state->simd_size = simd_other; + } + } break; case ext_0f3a: @@ -8359,10 +8374,14 @@ x86_emulate( op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10): /* vpmovuswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20): /* vpmovswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x30): /* vpmovzxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x30): /* vpmovwb [xyz]mm,{x,y}mm/mem{k} */ host_and_vcpu_must_have(avx512bw); - /* fall through */ + if ( evex.pfx != vex_f3 ) + { case X86EMUL_OPC_EVEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x23): /* vpmovsxwd {x,y}mm/mem,[xyz]mm{k} */ @@ -8373,7 +8392,29 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x33): /* vpmovzxwd {x,y}mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x35): /* vpmovzxdq {x,y}mm/mem,[xyz]mm{k} */ - generate_exception_if(evex.brs || (evex.w && (b & 7) == 5), EXC_UD); + generate_exception_if(evex.w && (b & 7) == 5, EXC_UD); + } + else + { + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x11): /* vpmovusdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x12): /* vpmovusqb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x13): /* vpmovusdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x14): /* vpmovusqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x15): /* vpmovusqd [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x21): /* vpmovsdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x22): /* vpmovsqb [xyz]mm,xmm/mem{k} */ + case 
X86EMUL_OPC_EVEX_F3(0x0f38, 0x23): /* vpmovsdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x24): /* vpmovsqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x25): /* vpmovsqd [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x31): /* vpmovdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x32): /* vpmovqb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x33): /* vpmovdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x34): /* vpmovqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x35): /* vpmovqd [xyz]mm,{x,y}mm/mem{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.z), EXC_UD); + d = DstMem | SrcReg | TwoOp; + } + generate_exception_if(evex.brs, EXC_UD); op_bytes = 32 >> (pmov_convert_delta[b & 7] + 1 - evex.lr); elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; @@ -10212,6 +10253,12 @@ x86_insn_is_mem_write(const struct x86_e case X86EMUL_OPC(0x0f, 0xab): /* BTS */ case X86EMUL_OPC(0x0f, 0xb3): /* BTR */ case X86EMUL_OPC(0x0f, 0xbb): /* BTC */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x15): /* VPMOVUS* */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x25): /* VPMOVS* */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x30) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x35): /* VPMOV{D,Q,W}* */ return true; case 0xd9: From patchwork Fri Mar 15 10:41:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854463 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB41E1575 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2F092A933 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C711D2A936; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 470F52A933 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGw-0004HN-KW; Fri, 15 Mar 2019 10:41:06 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGv-0004HD-RX for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:41:05 +0000 X-Inumbo-ID: d52b4836-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id d52b4836-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:41:04 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:41:04 -0600 Message-Id: <5C8B813F020000780021F164@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:41:03 -0600 From: "Jan Beulich" To: 
"xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 09/50] x86emul: support AVX512{F, BW} integer unpack insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP There's once again one extra twobyte_table[] entry which gets its Disp8 shift value set right away without getting support implemented just yet, again to avoid needlessly splitting groups of entries. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Re-base. v6: Re-base over changes earlier in the series. v4: Move OVR() additions into __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -229,6 +229,10 @@ static const struct test avx512f_all[] = INSN(pternlog, 66, 0f3a, 25, vl, dq, vl), INSN(ptestm, 66, 0f38, 27, vl, dq, vl), INSN(ptestnm, f3, 0f38, 27, vl, dq, vl), + INSN(punpckhdq, 66, 0f, 6a, vl, d, vl), + INSN(punpckhqdq, 66, 0f, 6d, vl, q, vl), + INSN(punpckldq, 66, 0f, 62, vl, d, vl), + INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), @@ -327,6 +331,10 @@ static const struct test avx512bw_all[] INSN(psubw, 66, 0f, f9, vl, w, vl), INSN(ptestm, 66, 0f38, 26, vl, bw, vl), INSN(ptestnm, f3, 0f38, 26, vl, bw, vl), + INSN(punpckhbw, 66, 0f, 68, vl, b, vl), + INSN(punpckhwd, 66, 0f, 69, vl, w, vl), + INSN(punpcklbw, 66, 0f, 60, vl, b, vl), + INSN(punpcklwd, 66, 0f, 61, vl, w, vl), }; static const struct test avx512bw_128[] = { --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -300,6 +300,10 @@ static inline bool _to_bool(byte_vec_t b asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ t_; \ }) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) # define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) @@ -317,6 +321,10 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif # if INT_SIZE == 4 --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -252,6 +252,10 @@ OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); +OVR(punpckhdq); +OVR(punpckhqdq); +OVR(punpckldq); +OVR(punpcklqdq); # endif # undef OVR_VFP --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -312,10 +312,10 @@ static const struct twobyte_table { [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x5a ... 
0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, - [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other }, + [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, - [0x68 ... 0x6a] = { DstImplicit|SrcMem|ModRM, simd_other }, - [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int }, + [0x68 ... 0x6a] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, + [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x6e] = { DstImplicit|SrcMem|ModRM|Mov, simd_none, d8s_dq64 }, [0x6f] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_int, d8s_vl }, [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other }, @@ -6681,6 +6681,12 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf6): /* vpsadbw [xyz]mm/mem,[xyz]mm,[xyz]mm */ generate_exception_if(evex.opmsk, EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x60): /* vpunpcklbw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x61): /* vpunpcklwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x68): /* vpunpckhbw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x69): /* vpunpckhwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + op_bytes = 16 << evex.lr; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd1): /* vpsrlw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ @@ -6708,6 +6714,13 @@ x86_emulate( elem_bytes = 1 << (b & 1); goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f, 0x62): /* vpunpckldq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x6a): /* vpunpckhdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + fault_suppression = false; + op_bytes = 16 << evex.lr; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x26): /* vptestnm{b,w} [xyz]mm/mem,[xyz]mm,k{k} */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x27): /* vptestnm{d,q} [xyz]mm/mem,[xyz]mm,k{k} */ op_bytes = 16 << evex.lr; @@ -6734,6 +6747,10 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6c): /* vpunpcklqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x6d): /* vpunpckhqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd4): /* vpaddq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf4): /* vpmuludq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x28): /* vpmuldq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ From patchwork Fri Mar 15 10:41:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854465 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 82F6215AC for ; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 66CDF2A933 for ; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 584CA2A936; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 76F242A933 for ; Fri, 15 Mar 2019 10:43:40 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kHf-0004Rc-W6; Fri, 15 Mar 2019 10:41:51 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kHe-0004RF-Hu for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:41:50 +0000 X-Inumbo-ID: ef3f5313-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ef3f5313-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:41:48 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:41:47 -0600 Message-Id: <5C8B816C020000780021F167@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:41:48 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 10/50] x86emul: support AVX512{F, BW, _VBMI} full permute insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Take the liberty and also correct the (public interface) name of the AVX512_VBMI feature flag, on the assumption that no external consumer has actually been using that flag so far. Furthermore make it have AVX512BW instead of AVX512F as a prerequisite, for requiring full 64-bit mask registers (the upper 48 bits of which can't be accessed other than through XSAVE/XRSTOR without AVX512BW support). Signed-off-by: Jan Beulich Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v5: Re-base. v3: New. 
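As an aside, purely illustrative and not part of the patch: the vpermi2* / vpermt2* insns select each destination element from the concatenation of two source vectors, with bit log2(ELEM_COUNT) of the corresponding index choosing between the two halves of that table; the interleave_lo/interleave_hi index vectors built in the simd.c hunk below rely on exactly that. A minimal scalar C model of the selection follows; NELEM and permt2() are made-up names, not emulator or test-harness code:

#include <stdint.h>
#include <stdio.h>

#define NELEM 8 /* e.g. dword elements of a 256-bit vector */

/* Each index selects one element out of the concatenation {a, b}. */
static void permt2(uint32_t dst[NELEM], const uint32_t idx[NELEM],
                   const uint32_t a[NELEM], const uint32_t b[NELEM])
{
    for ( unsigned int i = 0; i < NELEM; ++i )
    {
        unsigned int sel = idx[i] & (2 * NELEM - 1);

        dst[i] = sel < NELEM ? a[sel] : b[sel - NELEM];
    }
}

int main(void)
{
    uint32_t a[NELEM], b[NELEM], idx[NELEM], out[NELEM];

    for ( unsigned int i = 0; i < NELEM; ++i )
    {
        a[i] = i;
        b[i] = 100 + i;
        /* Index pattern matching the test harness's interleave_lo(). */
        idx[i] = ((i & 1) * NELEM) | (i >> 1);
    }

    permt2(out, idx, a, b);

    for ( unsigned int i = 0; i < NELEM; ++i )
        printf("%u ", out[i]); /* prints: 0 100 1 101 2 102 3 103 */
    printf("\n");

    return 0;
}
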
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -173,6 +173,10 @@ static const struct test avx512f_all[] = INSN(pcmpgtd, 66, 0f, 66, vl, d, vl), INSN(pcmpgtq, 66, 0f38, 37, vl, q, vl), INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), + INSN(permi2, 66, 0f38, 76, vl, dq, vl), + INSN(permi2, 66, 0f38, 77, vl, sd, vl), + INSN(permt2, 66, 0f38, 7e, vl, dq, vl), + INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -294,6 +298,8 @@ static const struct test avx512bw_all[] INSN(pcmpgtb, 66, 0f, 64, vl, b, vl), INSN(pcmpgtw, 66, 0f, 65, vl, w, vl), INSN(pcmpu, 66, 0f3a, 3e, vl, bw, vl), + INSN(permi2w, 66, 0f38, 75, vl, w, vl), + INSN(permt2w, 66, 0f38, 7d, vl, w, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), INSN(pmaxsb, 66, 0f38, 3c, vl, b, vl), INSN(pmaxsw, 66, 0f, ee, vl, w, vl), @@ -378,6 +384,11 @@ static const struct test avx512dq_512[] INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; +static const struct test avx512_vbmi_all[] = { + INSN(permi2b, 66, 0f38, 75, vl, b, vl), + INSN(permt2b, 66, 0f38, 7d, vl, b, vl), +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -718,4 +729,5 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); + RUN(avx512_vbmi, all); } --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -150,6 +150,9 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# else +# define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) +# define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -175,6 +178,9 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) # define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# else +# define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) +# define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -303,6 +309,9 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -324,6 +333,9 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2varq, _mask, (vdi_t)(x), interleave_hi, (vdi_t)(y), 
~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varq, _mask, interleave_lo, (vdi_t)(x), (vdi_t)(y), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif @@ -769,6 +781,7 @@ int simd_test(void) { unsigned int i, j; vec_t x, y, z, src, inv, alt, sh; + vint_t interleave_lo, interleave_hi; for ( i = 0, j = ELEM_SIZE << 3; i < ELEM_COUNT; ++i ) { @@ -782,6 +795,9 @@ int simd_test(void) if ( !(i & (i + 1)) ) --j; sh[i] = j; + + interleave_lo[i] = ((i & 1) * ELEM_COUNT) | (i >> 1); + interleave_hi[i] = interleave_lo[i] + (ELEM_COUNT / 2); } touch(src); @@ -1075,7 +1091,7 @@ int simd_test(void) x = src * alt; y = interleave_lo(x, alt < 0); touch(x); - z = widen1(x); + z = widen1(low_half(x)); touch(x); if ( !eq(z, y) ) return __LINE__; @@ -1107,7 +1123,7 @@ int simd_test(void) # ifdef widen1 touch(src); - x = widen1(src); + x = widen1(low_half(src)); touch(src); if ( !eq(x, y) ) return __LINE__; # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,6 +70,16 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if ELEM_SIZE == 1 +typedef vqi_t vint_t; +#elif ELEM_SIZE == 2 +typedef vhi_t vint_t; +#elif ELEM_SIZE == 4 +typedef vsi_t vint_t; +#elif ELEM_SIZE == 8 +typedef vdi_t vint_t; +#endif + #if VEC_SIZE >= 16 # if ELEM_COUNT >= 2 --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -136,6 +136,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) +#define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -468,9 +468,13 @@ static const struct ext0f38_table { [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, [0x79] = { .simd_size = simd_other, .two_op = 1, .d8s = 1 }, [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, + [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x8c] = { .simd_size = simd_packed_int }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 
0x93] = { .simd_size = simd_other, .vsib = 1 }, @@ -1861,6 +1865,7 @@ static bool vcpu_has( #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) +#define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -6043,6 +6048,11 @@ x86_emulate( CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x15): /* vunpckhp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x76): /* vpermi2{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x77): /* vpermi2p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7e): /* vpermt2{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7f): /* vpermt2p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xdb): /* vpand{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -8564,6 +8574,16 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + if ( !evex.w ) + host_and_vcpu_must_have(avx512_vbmi); + else + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -107,6 +107,7 @@ #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) /* CPUID level 0x00000007:0.ecx */ +#define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -224,7 +224,7 @@ XEN_CPUFEATURE(AVX512VL, 5*32+31) / /* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */ XEN_CPUFEATURE(PREFETCHWT1, 6*32+ 0) /*A PREFETCHWT1 instruction */ -XEN_CPUFEATURE(AVX512VBMI, 6*32+ 1) /*A AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_VBMI, 6*32+ 1) /*A AVX-512 Vector Byte Manipulation Instrs */ XEN_CPUFEATURE(UMIP, 6*32+ 2) /*S User Mode Instruction Prevention */ XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -259,12 +259,17 @@ def crunch_numbers(state): AVX2: [AVX512F], # AVX512F is taken to mean hardware support for 512bit registers - # (which in practice depends on the EVEX prefix to encode), and the - # instructions themselves. All further AVX512 features are built on - # top of AVX512F + # (which in practice depends on the EVEX prefix to encode) as well + # as mask registers, and the instructions themselves. 
All further + # AVX512 features are built on top of AVX512F AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD, - AVX512BW, AVX512VL, AVX512VBMI, AVX512_4VNNIW, - AVX512_4FMAPS, AVX512_VPOPCNTDQ], + AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, + AVX512_VPOPCNTDQ], + + # AVX512 extensions acting solely on vectors of bytes/words are made + # dependents of AVX512BW (as to requiring wider than 16-bit mask + # registers), despite the SDM not formally making this connection. + AVX512BW: [AVX512_VBMI], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 10:43:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854467 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25C7A15AC for ; Fri, 15 Mar 2019 10:44:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 06DF72A842 for ; Fri, 15 Mar 2019 10:44:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EB1702A86D; Fri, 15 Mar 2019 10:44:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 31F9B2A842 for ; Fri, 15 Mar 2019 10:44:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kIr-0004cV-GD; Fri, 15 Mar 2019 10:43:05 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kIq-0004cM-Rn for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:43:04 +0000 X-Inumbo-ID: 1b04e870-470f-11e9-9060-ab704e06104d Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 1b04e870-470f-11e9-9060-ab704e06104d; Fri, 15 Mar 2019 10:43:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:01 -0600 Message-Id: <5C8B81B5020000780021F16A@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 11/50] x86emul: support AVX512{F, BW} integer shuffle insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also include vshuff{32x4,64x2} as being very similar to vshufi{32x4,64x2}. 
Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Re-base. v7: Disable fault suppression for VPSHUF{D,{H,L}W}. Re-base. v6: Re-base over changes earlier in the series. v5: Re-base over changes earlier in the series. v4: Move OVR() addition into __AVX512VL__ conditional. Correct comments. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -214,6 +214,7 @@ static const struct test avx512f_all[] = INSN(prolv, 66, 0f38, 15, vl, dq, vl), INSNX(pror, 66, 0f, 72, 0, vl, dq, vl), INSN(prorv, 66, 0f38, 14, vl, dq, vl), + INSN(pshufd, 66, 0f, 70, vl, d, vl), INSN(pslld, 66, 0f, f2, el_4, d, vl), INSNX(pslld, 66, 0f, 72, 6, vl, d, vl), INSN(psllq, 66, 0f, f3, el_2, q, vl), @@ -264,6 +265,10 @@ static const struct test avx512f_no128[] INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), + INSN(shuff32x4, 66, 0f3a, 23, vl, d, vl), + INSN(shuff64x2, 66, 0f3a, 23, vl, q, vl), + INSN(shufi32x4, 66, 0f3a, 43, vl, d, vl), + INSN(shufi64x2, 66, 0f3a, 43, vl, q, vl), }; static const struct test avx512f_512[] = { @@ -318,6 +323,9 @@ static const struct test avx512bw_all[] INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), INSN(psadbw, 66, 0f, f6, vl, b, vl), + INSN(pshufb, 66, 0f38, 00, vl, b, vl), + INSN(pshufhw, f3, 0f, 70, vl, w, vl), + INSN(pshuflw, f2, 0f, 70, vl, w, vl), INSNX(pslldq, 66, 0f, 73, 7, vl, b, vl), INSN(psllvw, 66, 0f38, 12, vl, w, vl), INSN(psllw, 66, 0f, f1, el_8, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -153,6 +153,10 @@ static inline bool _to_bool(byte_vec_t b # else # define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) +# define swap(x) ({ \ + vec_t t_ = B(shuf_f32x4_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ + B(shufps, _mask, t_, t_, 0b00011011, undef(), ~0); \ +}) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -181,6 +185,10 @@ static inline bool _to_bool(byte_vec_t b # else # define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) +# define swap(x) ({ \ + vec_t t_ = B(shuf_f64x2_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ + B(shufpd, _mask, t_, t_, 0b01010101, undef(), ~0); \ +}) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -309,9 +317,14 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b00011011, (vsi_t)undef(), ~0)) # else # define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + B(shuf_i32x4_, _mask, (vsi_t)(x), (vsi_t)(x), \ + VEC_SIZE == 32 ? 
0b01 : 0b00011011, (vsi_t)undef(), ~0), \ + 0b00011011, (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -333,9 +346,14 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b01001110, (vsi_t)undef(), ~0)) # else # define interleave_hi(x, y) ((vec_t)B(vpermi2varq, _mask, (vdi_t)(x), interleave_hi, (vdi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varq, _mask, interleave_lo, (vdi_t)(x), (vdi_t)(y), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + (vsi_t)B(shuf_i64x2_, _mask, (vdi_t)(x), (vdi_t)(x), \ + VEC_SIZE == 32 ? 0b01 : 0b00011011, (vdi_t)undef(), ~0), \ + 0b01001110, (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -119,6 +119,12 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ +/* Sadly there are a few exceptions to the general naming rules. */ +# define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask +# define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask +# define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask +# define __builtin_ia32_shuf_i64x2_512_mask __builtin_ia32_shuf_i64x2_mask + # if VEC_SIZE > ELEM_SIZE && (defined(VEC_MAX) ? VEC_MAX : VEC_SIZE) < 64 # pragma GCC target ( "avx512vl" ) # endif @@ -262,6 +268,7 @@ OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); +OVR(pshufd); OVR(punpckhdq); OVR(punpckhqdq); OVR(punpckldq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -318,7 +318,7 @@ static const struct twobyte_table { [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x6e] = { DstImplicit|SrcMem|ModRM|Mov, simd_none, d8s_dq64 }, [0x6f] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_int, d8s_vl }, - [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other }, + [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other, d8s_vl }, [0x71 ... 0x73] = { DstImplicit|SrcImmByte|ModRM, simd_none, d8s_vl }, [0x74 ... 0x76] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x77] = { DstImplicit|SrcNone }, @@ -432,7 +432,8 @@ static const struct ext0f38_table { uint8_t vsib:1; disp8scale_t d8s:4; } ext0f38_table[256] = { - [0x00 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, [0x0c ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x13] = { .simd_size = simd_other, .two_op = 1 }, @@ -543,6 +544,7 @@ static const struct ext0f3a_table { [0x20] = { .simd_size = simd_none, .d8s = 0 }, [0x21] = { .simd_size = simd_other, .d8s = 2 }, [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, + [0x23] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128, .d8s = 4 }, @@ -552,6 +554,7 @@ static const struct ext0f3a_table { [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 
0x41] = { .simd_size = simd_packed_fp }, [0x42] = { .simd_size = simd_packed_int }, + [0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x44] = { .simd_size = simd_packed_int }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -6701,6 +6704,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf5): /* vpmaddwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x00): /* vpshufb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd5): /* vpmullw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -6955,6 +6959,21 @@ x86_emulate( insn_bytes = PFX_BYTES + 3; break; + case X86EMUL_OPC_EVEX_66(0x0f, 0x70): /* vpshufd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x70): /* vpshufhw $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x70): /* vpshuflw $imm8,[xyz]mm/mem,[xyz]mm{k} */ + if ( evex.pfx == vex_66 ) + generate_exception_if(evex.w, EXC_UD); + else + { + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.brs, EXC_UD); + } + d = (d & ~SrcMask) | SrcMem | TwoOp; + op_bytes = 16 << evex.lr; + fault_suppression = false; + goto avx512f_imm8_no_sae; + CASE_SIMD_PACKED_INT(0x0f, 0x71): /* Grp12 */ case X86EMUL_OPC_VEX_66(0x0f, 0x71): CASE_SIMD_PACKED_INT(0x0f, 0x72): /* Grp13 */ @@ -9150,7 +9169,13 @@ x86_emulate( /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ if ( evex.w ) host_and_vcpu_must_have(avx512dq); - generate_exception_if(!evex.lr || evex.brs, EXC_UD); + generate_exception_if(evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x23): /* vshuff32x4 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + /* vshuff64x2 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x43): /* vshufi32x4 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + /* vshufi64x2 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); fault_suppression = false; goto avx512f_imm8_no_sae; From patchwork Fri Mar 15 10:43:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854469 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8CBF915AC for ; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71C7D2A842 for ; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6077B2A86D; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id EEB5C2A842 for ; Fri, 15 Mar 2019 10:45:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kJJ-0004gY-RJ; Fri, 15 Mar 2019 10:43:33 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJJ-0004gQ-3I for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:43:33 +0000 X-Inumbo-ID: 2d040f67-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 2d040f67-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:43:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:31 -0600 Message-Id: <5C8B81D3020000780021F16D@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:31 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 12/50] x86emul: support AVX512{BW, DQ} mask move insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Entries to the tables in evex-disp8.c are added despite these insns not allowing for memory operands, with the goal of the tables giving a complete picture of the supported EVEX-encoded insns in the end. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -314,9 +314,12 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), +// pmovb2m, f3, 0f38, 29, b +// pmovm2, f3, 0f38, 28, bw INSN(pmovswb, f3, 0f38, 20, vl_2, b, vl), INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), INSN(pmovuswb, f3, 0f38, 10, vl_2, b, vl), +// pmovw2m, f3, 0f38, 29, w INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), @@ -364,6 +367,9 @@ static const struct test avx512dq_all[] INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN_PFP(or, 0f, 56), +// pmovd2m, f3, 0f38, 39, d +// pmovm2, f3, 0f38, 38, dq +// pmovq2m, f3, 0f38, 39, q INSN(pmullq, 66, 0f38, 40, vl, q, vl), INSN_PFP(xor, 0f, 57), }; --- a/tools/tests/x86_emulator/opmask.S +++ b/tools/tests/x86_emulator/opmask.S @@ -12,17 +12,23 @@ #if SIZE == 1 # define _(x) x##b +# define _v(x, t) _v_(x##q, t) #elif SIZE == 2 # define _(x) x##w +# define _v(x, t) _v_(x##d, t) # define WIDEN(x) x##bw #elif SIZE == 4 # define _(x) x##d +# define _v(x, t) _v_(x##w, t) # define WIDEN(x) x##wd #elif SIZE == 8 # define _(x) x##q +# define _v(x, t) _v_(x##b, t) # define WIDEN(x) x##dq #endif +#define _v_(x, t) v##x##t + .macro check res1:req, res2:req, line:req _(kmov) %\res1, DATA(out) #if SIZE < 8 || !defined(__i386__) @@ -131,6 +137,15 @@ _start: #endif +#if SIZE > 2 ? 
defined(__AVX512BW__) : defined(__AVX512DQ__) + + _(kmov) DATA(in1), %k0 + _v(pmovm2,) %k0, %zmm7 + _v(pmov,2m) %zmm7, %k3 + check k0, k3, __LINE__ + +#endif + xor %eax, %eax ret --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -8465,6 +8465,21 @@ x86_emulate( elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x29): /* vpmov{b,w}2m [xyz]mm,k */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x39): /* vpmov{d,q}2m [xyz]mm,k */ + generate_exception_if(!evex.r || !evex.R, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x28): /* vpmovm2{b,w} k,[xyz]mm */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x38): /* vpmovm2{d,q} k,[xyz]mm */ + if ( b & 0x10 ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.opmsk || ea.type != OP_REG, EXC_UD); + d |= TwoOp; + op_bytes = 16 << evex.lr; + goto avx512f_no_sae; + case X86EMUL_OPC_66(0x0f38, 0x2a): /* movntdqa m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2a): /* vmovntdqa mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:43:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854471 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 500C015AC for ; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 336FF2A934 for ; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 257492A937; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4411A2A934 for ; Fri, 15 Mar 2019 10:45:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJp-0004mm-6i; Fri, 15 Mar 2019 10:44:05 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJn-0004mX-Na for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:03 +0000 X-Inumbo-ID: 3d50d2f4-470f-11e9-8a90-0b134121ba9f Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 3d50d2f4-470f-11e9-8a90-0b134121ba9f; Fri, 15 Mar 2019 10:43:59 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:58 -0600 Message-Id: <5C8B81ED020000780021F170@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:57 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 
Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 13/50] x86emul: basic AVX512BW testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Correct PS{R,L}LDQ overrides. v6: Re-base over changes earlier in the series. v4: Add __AVX512VL__ conditional around majority of OVR() additions. Correct eq() for 1- and 2-byte cases. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -66,6 +66,9 @@ xop-flts := $(avx-flts) avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 +avx512bw-vecs := $(avx512f-vecs) +avx512bw-ints := 1 2 +avx512bw-flts := avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -31,6 +31,10 @@ ENTRY(simd_test); # define eq(x, y) ((BR(cmpps, _mask, x, y, 0, -1) & ALL_TRUE) == ALL_TRUE) # elif FLOAT_SIZE == 8 # define eq(x, y) (BR(cmppd, _mask, x, y, 0, -1) == ALL_TRUE) +# elif (INT_SIZE == 1 || UINT_SIZE == 1) && defined(__AVX512BW__) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# elif (INT_SIZE == 2 || UINT_SIZE == 2) && defined(__AVX512BW__) +# define eq(x, y) (B(pcmpeqw, _mask, (vhi_t)(x), (vhi_t)(y), -1) == ALL_TRUE) # elif INT_SIZE == 4 || UINT_SIZE == 4 # define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) # elif INT_SIZE == 8 || UINT_SIZE == 8 @@ -374,6 +378,87 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) ((vec_t)B(pmaxuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # endif +#elif (INT_SIZE == 1 || UINT_SIZE == 1 || INT_SIZE == 2 || UINT_SIZE == 2) && \ + defined(__AVX512BW__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if INT_SIZE == 1 || UINT_SIZE == 1 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastb %1, %0" \ + : "=v" (t_) : "m" (*(char[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastb %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufb, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) +# elif defined(__AVX512VBMI__) +# define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varqi, _mask, interleave_lo, (vqi_t)(x), (vqi_t)(y), ~0)) +# endif +# define mix(x, y) ((vec_t)B(movdquqi, _mask, (vqi_t)(x), (vqi_t)(y), \ + (0b0101010101010101010101010101010101010101010101010101010101010101LL & ALL_TRUE))) +# define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), 
(vqi_half_t){}, ~0)) +# define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) +# define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) +# elif INT_SIZE == 2 || UINT_SIZE == 2 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastw %1, %0" \ + : "=v" (t_) : "m" (*(short[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastw %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + (vsi_t)B(pshufhw, _mask, \ + B(pshuflw, _mask, (vhi_t)(x), 0b00011011, (vhi_t)undef(), ~0), \ + 0b00011011, (vhi_t)undef(), ~0), \ + 0b01001110, (vsi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2varhi, _mask, (vhi_t)(x), interleave_hi, (vhi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varhi, _mask, interleave_lo, (vhi_t)(x), (vhi_t)(y), ~0)) +# endif +# define mix(x, y) ((vec_t)B(movdquhi, _mask, (vhi_t)(x), (vhi_t)(y), \ + (0b01010101010101010101010101010101 & ALL_TRUE))) +# define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) +# define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) +# endif +# if INT_SIZE == 1 +# define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovsxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovsxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) +# define widen3(x) ((vec_t)B(pmovsxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 1 +# define max(x, y) ((vec_t)B(pmaxub, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminub, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovzxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) +# define widen3(x) ((vec_t)B(pmovzxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) +# elif INT_SIZE == 2 +# define max(x, y) B(pmaxsw, _mask, x, y, undef(), ~0) +# define min(x, y) B(pminsw, _mask, x, y, undef(), ~0) +# define mul_hi(x, y) B(pmulhw, _mask, x, y, undef(), ~0) +# define widen1(x) ((vec_t)B(pmovsxwd, _mask, x, (vsi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovsxwq, _mask, x, (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 2 +# define max(x, y) ((vec_t)B(pmaxuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define mul_hi(x, y) ((vec_t)B(pmulhuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxwd, _mask, (vhi_half_t)(x), (vsi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovzxwq, _mask, (vhi_quarter_t)(x), (vdi_t)undef(), ~0)) +# endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if INT_SIZE == 1 || UINT_SIZE == 1 # define interleave_hi(x, y) ((vec_t)__builtin_ia32_punpckhbw128((vqi_t)(x), (vqi_t)(y))) @@ -565,7 +650,7 @@ static inline bool _to_bool(byte_vec_t b # endif # endif #endif -#if VEC_SIZE == 16 && defined(__SSSE3__) +#if VEC_SIZE == 16 && defined(__SSSE3__) && !defined(__AVX512VL__) # if 
INT_SIZE == 1 # define abs(x) ((vec_t)__builtin_ia32_pabsb128((vqi_t)(x))) # elif INT_SIZE == 2 @@ -789,6 +874,40 @@ static inline half_t low_half(vec_t x) } # endif +# if !defined(low_quarter) && defined(QUARTER_SIZE) +static inline quarter_t low_quarter(vec_t x) +{ +# if QUARTER_SIZE < VEC_SIZE + quarter_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + +# if !defined(low_eighth) && defined(EIGHTH_SIZE) +static inline eighth_t low_eighth(vec_t x) +{ +# if EIGHTH_SIZE < VEC_SIZE + eighth_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + #endif #if defined(__AVX512F__) && defined(FLOAT_SIZE) @@ -1117,7 +1236,7 @@ int simd_test(void) y = interleave_lo(alt < 0, alt < 0); y = interleave_lo(z, y); touch(x); - z = widen2(x); + z = widen2(low_quarter(x)); touch(x); if ( !eq(z, y) ) return __LINE__; @@ -1126,7 +1245,7 @@ int simd_test(void) y = interleave_lo(y, y); y = interleave_lo(z, y); touch(x); - z = widen3(x); + z = widen3(low_eighth(x)); touch(x); if ( !eq(z, y) ) return __LINE__; # endif @@ -1148,14 +1267,14 @@ int simd_test(void) # ifdef widen2 touch(src); - x = widen2(src); + x = widen2(low_quarter(src)); touch(src); if ( !eq(x, z) ) return __LINE__; # endif # ifdef widen3 touch(src); - x = widen3(src); + x = widen3(low_eighth(src)); touch(src); if ( !eq(x, interleave_lo(z, (vec_t){})) ) return __LINE__; # endif @@ -1175,6 +1294,36 @@ int simd_test(void) if ( aux2[i] != src[i] ) return __LINE__; } +#endif + +#if defined(widen2) && defined(shrink2) + { + quarter_t aux1 = low_quarter(src), aux2; + + touch(aux1); + x = widen2(aux1); + touch(x); + aux2 = shrink2(x); + touch(aux2); + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } +#endif + +#if defined(widen3) && defined(shrink3) + { + eighth_t aux1 = low_eighth(src), aux2; + + touch(aux1); + x = widen3(aux1); + touch(x); + aux2 = shrink3(x); + touch(aux2); + for ( i = 0; i < ELEM_COUNT / 8; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } #endif #ifdef dup_lo --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -95,6 +95,32 @@ typedef int __attribute__((vector_size(H typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; # endif +# if ELEM_COUNT >= 4 +# if VEC_SIZE > 64 +# define QUARTER_SIZE (VEC_SIZE / 4) +# else +# define QUARTER_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(QUARTER_SIZE))) quarter_t; +typedef char __attribute__((vector_size(QUARTER_SIZE))) vqi_quarter_t; +typedef short __attribute__((vector_size(QUARTER_SIZE))) vhi_quarter_t; +typedef int __attribute__((vector_size(QUARTER_SIZE))) vsi_quarter_t; +typedef long long __attribute__((vector_size(QUARTER_SIZE))) vdi_quarter_t; +# endif + +# if ELEM_COUNT >= 8 +# if VEC_SIZE > 128 +# define EIGHTH_SIZE (VEC_SIZE / 8) +# else +# define EIGHTH_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(EIGHTH_SIZE))) eighth_t; +typedef char __attribute__((vector_size(EIGHTH_SIZE))) vqi_eighth_t; +typedef short __attribute__((vector_size(EIGHTH_SIZE))) vhi_eighth_t; +typedef int __attribute__((vector_size(EIGHTH_SIZE))) vsi_eighth_t; +typedef long long __attribute__((vector_size(EIGHTH_SIZE))) vdi_eighth_t; +# endif + #endif #if VEC_SIZE == 16 @@ -182,6 +208,9 @@ OVR_SFP(broadcast); OVR_SFP(comi); OVR_FP(add); OVR_INT(add); +OVR_BW(adds); +OVR_BW(addus); +OVR_BW(avg); 
OVR_FP(div); OVR(extractps); OVR_FMA(fmadd, FP); @@ -214,6 +243,8 @@ OVR_INT(srl); OVR_DQ(srlv); OVR_FP(sub); OVR_INT(sub); +OVR_BW(subs); +OVR_BW(subus); OVR_SFP(ucomi); OVR_VFP(unpckh); OVR_VFP(unpckl); @@ -275,6 +306,31 @@ OVR(punpckldq); OVR(punpcklqdq); # endif +# ifdef __AVX512BW__ +OVR(pextrb); +OVR(pextrw); +OVR(pinsrb); +OVR(pinsrw); +# ifdef __AVX512VL__ +OVR(pmaddwd); +OVR(pmovsxbw); +OVR(pmovzxbw); +OVR(pmulhuw); +OVR(pmulhw); +OVR(pmullw); +OVR(psadbw); +OVR(pshufb); +OVR(pshufhw); +OVR(pshuflw); +OVR(pslldq); +OVR(psrldq); +OVR(punpckhbw); +OVR(punpckhwd); +OVR(punpcklbw); +OVR(punpcklwd); +# endif +# endif + # undef OVR_VFP # undef OVR_SFP # undef OVR_INT --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -22,6 +22,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" #include "avx512f.h" +#include "avx512bw.h" #define verbose false /* Switch to true for far more logging. */ @@ -105,6 +106,11 @@ static bool simd_check_avx512bw(void) } #define simd_check_avx512bw_opmask simd_check_avx512bw +static bool simd_check_avx512bw_vl(void) +{ + return cpu_has_avx512bw && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -284,6 +290,18 @@ static const struct { AVX512VL(VL u64x2, avx512f, 16u8), AVX512VL(VL s64x4, avx512f, 32i8), AVX512VL(VL u64x4, avx512f, 32u8), + SIMD(AVX512BW s8x64, avx512bw, 64i1), + SIMD(AVX512BW u8x64, avx512bw, 64u1), + SIMD(AVX512BW s16x32, avx512bw, 64i2), + SIMD(AVX512BW u16x32, avx512bw, 64u2), + AVX512VL(BW+VL s8x16, avx512bw, 16i1), + AVX512VL(BW+VL u8x16, avx512bw, 16u1), + AVX512VL(BW+VL s8x32, avx512bw, 32i1), + AVX512VL(BW+VL u8x32, avx512bw, 32u1), + AVX512VL(BW+VL s16x8, avx512bw, 16i2), + AVX512VL(BW+VL u16x8, avx512bw, 16u2), + AVX512VL(BW+VL s16x16, avx512bw, 32i2), + AVX512VL(BW+VL u16x16, avx512bw, 32u2), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ From patchwork Fri Mar 15 10:44:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854473 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BDD713B5 for ; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1F4832A934 for ; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 132502A938; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0362C2A934 for ; Fri, 15 Mar 2019 10:46:27 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKB-0004tU-Mo; Fri, 15 Mar 2019 10:44:27 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKA-0004t5-DN for 
xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:26 +0000 X-Inumbo-ID: 4c4c4af7-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 4c4c4af7-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:44:24 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:44:23 -0600 Message-Id: <5C8B8207020000780021F173@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:44:23 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 14/50] x86emul: basic AVX512DQ testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Re-base. v5: Re-base over changes earlier in the series. v4: Wrap OVR(pmullq) in __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -69,9 +69,12 @@ avx512f-flts := 4 8 avx512bw-vecs := $(avx512f-vecs) avx512bw-ints := 1 2 avx512bw-flts := +avx512dq-vecs := $(avx512f-vecs) +avx512dq-ints := $(avx512f-ints) +avx512dq-flts := $(avx512f-flts) avx512f-opmask-vecs := 2 -avx512dq-opmask-vecs := 1 +avx512dq-opmask-vecs := 1 2 avx512bw-opmask-vecs := 4 8 # Suppress building by default of the harness if the compiler can't deal --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -121,6 +121,34 @@ typedef int __attribute__((vector_size(E typedef long long __attribute__((vector_size(EIGHTH_SIZE))) vdi_eighth_t; # endif +# define DECL_PAIR(w) \ +typedef w ## _t pair_t; \ +typedef vsi_ ## w ## _t vsi_pair_t; \ +typedef vdi_ ## w ## _t vdi_pair_t +# define DECL_QUARTET(w) \ +typedef w ## _t quartet_t; \ +typedef vsi_ ## w ## _t vsi_quartet_t; \ +typedef vdi_ ## w ## _t vdi_quartet_t +# define DECL_OCTET(w) \ +typedef w ## _t octet_t; \ +typedef vsi_ ## w ## _t vsi_octet_t; \ +typedef vdi_ ## w ## _t vdi_octet_t + +# if ELEM_COUNT == 4 +DECL_PAIR(half); +# elif ELEM_COUNT == 8 +DECL_PAIR(quarter); +DECL_QUARTET(half); +# elif ELEM_COUNT == 16 +DECL_PAIR(eighth); +DECL_QUARTET(quarter); +DECL_OCTET(half); +# endif + +# undef DECL_OCTET +# undef DECL_QUARTET +# undef DECL_PAIR + #endif #if VEC_SIZE == 16 @@ -146,6 +174,14 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ /* Sadly there are a few exceptions to the general naming rules. 
*/ +# define __builtin_ia32_broadcastf32x4_512_mask __builtin_ia32_broadcastf32x4_512 +# define __builtin_ia32_broadcasti32x4_512_mask __builtin_ia32_broadcasti32x4_512 +# define __builtin_ia32_insertf32x4_512_mask __builtin_ia32_insertf32x4_mask +# define __builtin_ia32_insertf32x8_512_mask __builtin_ia32_insertf32x8_mask +# define __builtin_ia32_insertf64x4_512_mask __builtin_ia32_insertf64x4_mask +# define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask +# define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask +# define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask @@ -331,6 +367,20 @@ OVR(punpcklwd); # endif # endif +# ifdef __AVX512DQ__ +OVR_VFP(and); +OVR_VFP(andn); +OVR_VFP(or); +OVR(pextrd); +OVR(pextrq); +OVR(pinsrd); +OVR(pinsrq); +# ifdef __AVX512VL__ +OVR(pmullq); +# endif +OVR_VFP(xor); +# endif + # undef OVR_VFP # undef OVR_SFP # undef OVR_INT --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -139,6 +139,27 @@ static inline bool _to_bool(byte_vec_t b # endif #elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if ELEM_COUNT == 8 /* vextractf{32,64}x4 */ || \ + (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextractf32x8 */ || \ + (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ +# define low_half(x) ({ \ + half_t t_; \ + asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + t_; \ +}) +# endif +# if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextractf32x4 */ || \ + (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ +# define low_quarter(x) ({ \ + quarter_t t_; \ + asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 4) ); \ + t_; \ +}) +# endif # if FLOAT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -146,6 +167,17 @@ static inline bool _to_bool(byte_vec_t b : "=v" (t_) : "m" (*(float[1]){ x }) ); \ t_; \ }) +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) ({ \ + vec_t t_; \ + asm ( "vbroadcastf32x2 %1, %0" : "=v" (t_) : "m" (x) ); \ + t_; \ +}) +# endif +# if VEC_SIZE == 64 && defined(__AVX512DQ__) +# define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) +# define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) +# endif # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) @@ -155,6 +187,13 @@ static inline bool _to_bool(byte_vec_t b # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) # else +# define broadcast_quartet(x) B(broadcastf32x4_, _mask, x, undef(), ~0) +# define insert_pair(x, y, p) \ + B(insertf32x4_, _mask, x, \ + /* Cast needed below to work around gcc 7.x quirk. */ \ + (p) & 1 ? 
(typeof(y))__builtin_ia32_shufps(y, y, 0b01000100) : (y), \ + (p) >> 1, x, 3 << ((p) * 2)) +# define insert_quartet(x, y, p) B(insertf32x4_, _mask, x, y, p, undef(), ~0) # define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) # define swap(x) ({ \ @@ -178,6 +217,14 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) B(broadcastf64x2_, _mask, x, undef(), ~0) +# define insert_pair(x, y, p) B(insertf64x2_, _mask, x, y, p, undef(), ~0) +# endif +# if VEC_SIZE == 64 +# define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) +# define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) +# endif # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) @@ -306,6 +353,16 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextracti32x4 */ || \ + (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextracti64x2 */ +# define low_quarter(x) ({ \ + quarter_t t_; \ + asm ( "vextracti%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 4) ); \ + t_; \ +}) +# endif # if INT_SIZE == 4 || UINT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -318,11 +375,30 @@ static inline bool _to_bool(byte_vec_t b asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ t_; \ }) +# ifdef __AVX512DQ__ +# define broadcast_pair(x) ({ \ + vec_t t_; \ + asm ( "vbroadcasti32x2 %1, %0" : "=v" (t_) : "m" (x) ); \ + t_; \ +}) +# endif +# if VEC_SIZE == 64 && defined(__AVX512DQ__) +# define broadcast_octet(x) ((vec_t)B(broadcasti32x8_, _mask, (vsi_octet_t)(x), (vsi_t)undef(), ~0)) +# define insert_octet(x, y, p) ((vec_t)B(inserti32x8_, _mask, (vsi_t)(x), (vsi_octet_t)(y), p, (vsi_t)undef(), ~0)) +# endif # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b00011011, (vsi_t)undef(), ~0)) # else +# define broadcast_quartet(x) ((vec_t)B(broadcasti32x4_, _mask, (vsi_quartet_t)(x), (vsi_t)undef(), ~0)) +# define insert_pair(x, y, p) \ + (vec_t)(B(inserti32x4_, _mask, (vsi_t)(x), \ + /* First cast needed below to work around gcc 7.x quirk. */ \ + (p) & 1 ? 
(vsi_pair_t)__builtin_ia32_pshufd((vsi_pair_t)(y), 0b01000100) \ + : (vsi_pair_t)(y), \ + (p) >> 1, (vsi_t)(x), 3 << ((p) * 2))) +# define insert_quartet(x, y, p) ((vec_t)B(inserti32x4_, _mask, (vsi_t)(x), (vsi_quartet_t)(y), p, (vsi_t)undef(), ~0)) # define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, \ @@ -347,6 +423,14 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) ((vec_t)B(broadcasti64x2_, _mask, (vdi_pair_t)(x), (vdi_t)undef(), ~0)) +# define insert_pair(x, y, p) ((vec_t)B(inserti64x2_, _mask, (vdi_t)(x), (vdi_pair_t)(y), p, (vdi_t)undef(), ~0)) +# endif +# if VEC_SIZE == 64 +# define broadcast_quartet(x) ((vec_t)B(broadcasti64x4_, , (vdi_quartet_t)(x), (vdi_t)undef(), ~0)) +# define insert_quartet(x, y, p) ((vec_t)B(inserti64x4_, _mask, (vdi_t)(x), (vdi_quartet_t)(y), p, (vdi_t)undef(), ~0)) +# endif # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) @@ -898,7 +982,7 @@ static inline eighth_t low_eighth(vec_t eighth_t y; unsigned int i; - for ( i = 0; i < ELEM_COUNT / 4; ++i ) + for ( i = 0; i < ELEM_COUNT / 8; ++i ) y[i] = x[i]; return y; @@ -910,6 +994,50 @@ static inline eighth_t low_eighth(vec_t #endif +#ifdef broadcast_pair +# if ELEM_COUNT == 4 +# define broadcast_half broadcast_pair +# elif ELEM_COUNT == 8 +# define broadcast_quarter broadcast_pair +# elif ELEM_COUNT == 16 +# define broadcast_eighth broadcast_pair +# endif +#endif + +#ifdef insert_pair +# if ELEM_COUNT == 4 +# define insert_half insert_pair +# elif ELEM_COUNT == 8 +# define insert_quarter insert_pair +# elif ELEM_COUNT == 16 +# define insert_eighth insert_pair +# endif +#endif + +#ifdef broadcast_quartet +# if ELEM_COUNT == 8 +# define broadcast_half broadcast_quartet +# elif ELEM_COUNT == 16 +# define broadcast_quarter broadcast_quartet +# endif +#endif + +#ifdef insert_quartet +# if ELEM_COUNT == 8 +# define insert_half insert_quartet +# elif ELEM_COUNT == 16 +# define insert_quarter insert_quartet +# endif +#endif + +#if defined(broadcast_octet) && ELEM_COUNT == 16 +# define broadcast_half broadcast_octet +#endif + +#if defined(insert_octet) && ELEM_COUNT == 16 +# define insert_half insert_octet +#endif + #if defined(__AVX512F__) && defined(FLOAT_SIZE) # include "simd-fma.c" #endif @@ -1205,6 +1333,60 @@ int simd_test(void) if ( !eq(broadcast2(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif +#if defined(broadcast_half) && defined(insert_half) + { + half_t aux = low_half(src); + + touch(aux); + x = broadcast_half(aux); + touch(aux); + y = insert_half(src, aux, 1); + if ( !eq(x, y) ) return __LINE__; + } +#endif + +#if defined(broadcast_quarter) && defined(insert_quarter) + { + quarter_t aux = low_quarter(src); + + touch(aux); + x = broadcast_quarter(aux); + touch(aux); + y = insert_quarter(src, aux, 1); + touch(aux); + y = insert_quarter(y, aux, 2); + touch(aux); + y = insert_quarter(y, aux, 3); + if ( !eq(x, y) ) return __LINE__; + } +#endif + +#if defined(broadcast_eighth) && defined(insert_eighth) && \ + /* At least gcc 7.3 "optimizes" away all insert_eighth() calls below. 
*/ \ + __GNUC__ >= 8 + { + eighth_t aux = low_eighth(src); + + touch(aux); + x = broadcast_eighth(aux); + touch(aux); + y = insert_eighth(src, aux, 1); + touch(aux); + y = insert_eighth(y, aux, 2); + touch(aux); + y = insert_eighth(y, aux, 3); + touch(aux); + y = insert_eighth(y, aux, 4); + touch(aux); + y = insert_eighth(y, aux, 5); + touch(aux); + y = insert_eighth(y, aux, 6); + touch(aux); + y = insert_eighth(y, aux, 7); + if ( !eq(x, y) ) return __LINE__; + } +#endif + #if defined(interleave_lo) && defined(interleave_hi) touch(src); x = interleave_lo(inv, src); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -23,6 +23,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw-opmask.h" #include "avx512f.h" #include "avx512bw.h" +#include "avx512dq.h" #define verbose false /* Switch to true for far more logging. */ @@ -100,6 +101,11 @@ static bool simd_check_avx512dq(void) } #define simd_check_avx512dq_opmask simd_check_avx512dq +static bool simd_check_avx512dq_vl(void) +{ + return cpu_has_avx512dq && cpu_has_avx512vl; +} + static bool simd_check_avx512bw(void) { return cpu_has_avx512bw; @@ -267,9 +273,10 @@ static const struct { SIMD(XOP i32x8, xop, 32i4), SIMD(XOP i64x4, xop, 32i8), SIMD(OPMASK/w, avx512f_opmask, 2), - SIMD(OPMASK/b, avx512dq_opmask, 1), - SIMD(OPMASK/d, avx512bw_opmask, 4), - SIMD(OPMASK/q, avx512bw_opmask, 8), + SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), + SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), + SIMD(OPMASK+BW/d, avx512bw_opmask, 4), + SIMD(OPMASK+BW/q, avx512bw_opmask, 8), SIMD(AVX512F f32 scalar, avx512f, f4), SIMD(AVX512F f32x16, avx512f, 64f4), SIMD(AVX512F f64 scalar, avx512f, f8), @@ -302,6 +309,24 @@ static const struct { AVX512VL(BW+VL u16x8, avx512bw, 16u2), AVX512VL(BW+VL s16x16, avx512bw, 32i2), AVX512VL(BW+VL u16x16, avx512bw, 32u2), + SIMD(AVX512DQ f32x16, avx512dq, 64f4), + SIMD(AVX512DQ f64x8, avx512dq, 64f8), + SIMD(AVX512DQ s32x16, avx512dq, 64i4), + SIMD(AVX512DQ u32x16, avx512dq, 64u4), + SIMD(AVX512DQ s64x8, avx512dq, 64i8), + SIMD(AVX512DQ u64x8, avx512dq, 64u8), + AVX512VL(DQ+VL f32x4, avx512dq, 16f4), + AVX512VL(DQ+VL f64x2, avx512dq, 16f8), + AVX512VL(DQ+VL f32x8, avx512dq, 32f4), + AVX512VL(DQ+VL f64x4, avx512dq, 32f8), + AVX512VL(DQ+VL s32x4, avx512dq, 16i4), + AVX512VL(DQ+VL u32x4, avx512dq, 16u4), + AVX512VL(DQ+VL s32x8, avx512dq, 32i4), + AVX512VL(DQ+VL u32x8, avx512dq, 32u4), + AVX512VL(DQ+VL s64x2, avx512dq, 16i8), + AVX512VL(DQ+VL u64x2, avx512dq, 16u8), + AVX512VL(DQ+VL s64x4, avx512dq, 32i8), + AVX512VL(DQ+VL u64x4, avx512dq, 32u8), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ From patchwork Fri Mar 15 10:44:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854475 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6336E15AC for ; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4A7922A934 for ; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3EC602A937; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, 
RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D0A7B2A934 for ; Fri, 15 Mar 2019 10:46:35 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKh-00052u-Bz; Fri, 15 Mar 2019 10:44:59 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKg-00052d-4f for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:58 +0000 X-Inumbo-ID: 5f0e27b6-470f-11e9-9848-bb533d287e20 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 5f0e27b6-470f-11e9-9848-bb533d287e20; Fri, 15 Mar 2019 10:44:56 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:44:55 -0600 Message-Id: <5C8B8228020000780021F176@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:44:56 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 15/50] x86emul: support AVX512F move high/low insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP No explicit test harness additions other than the overrides, as the compiler already makes use of the insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: No need to set fault_suppression to false. v4: New. 
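[Editorial sketch, not part of the patch.] For readers less familiar with these encodings, the emulator hunk below keys off the ModRM mode: with no mandatory prefix, opcode 0F 12 is VMOVLPS for a memory source but VMOVHLPS for a register source, and 0F 16 is likewise VMOVHPS (memory) vs VMOVLHPS (register). A minimal illustration of that split (the helper name is made up):

#include <stdbool.h>
#include <stdint.h>

static const char *vmovlh_form(uint8_t opc, bool modrm_reg)
{
    /* 0F 12: memory form loads the low half; register form is VMOVHLPS. */
    if ( opc == 0x12 )
        return modrm_reg ? "vmovhlps" : "vmovlps";
    /* 0F 16: memory form loads the high half; register form is VMOVLHPS. */
    if ( opc == 0x16 )
        return modrm_reg ? "vmovlhps" : "vmovhps";
    return "<other>";
}

The 66-prefixed forms (VMOVLPD/VMOVHPD) and the 0F 13/17 stores have no register variant, which is why the handler below raises #UD for them when the operand is not in memory.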
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -253,6 +253,16 @@ static const struct test avx512f_128[] = INSN(insertps, 66, 0f3a, 21, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), +// movhlps, , 0f, 12, d + INSN(movhpd, 66, 0f, 16, el, q, vl), + INSN(movhpd, 66, 0f, 17, el, q, vl), + INSN(movhps, , 0f, 16, el_2, d, vl), + INSN(movhps, , 0f, 17, el_2, d, vl), +// movlhps, , 0f, 16, d + INSN(movlpd, 66, 0f, 12, el, q, vl), + INSN(movlpd, 66, 0f, 13, el, q, vl), + INSN(movlps, , 0f, 12, el_2, d, vl), + INSN(movlps, , 0f, 13, el_2, d, vl), INSN(movq, f3, 0f, 7e, el, q, el), INSN(movq, 66, 0f, d6, el, q, el), }; --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -266,6 +266,12 @@ OVR(movd); OVR(movq); OVR_SFP(mov); OVR_VFP(mova); +OVR(movhlps); +OVR(movhpd); +OVR(movhps); +OVR(movlhps); +OVR(movlpd); +OVR(movlps); OVR_VFP(movnt); OVR_VFP(movu); OVR_FP(mul); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -286,11 +286,11 @@ static const struct twobyte_table { [0x0f] = { ModRM|SrcImmByte }, [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp, d8s_vl }, - [0x12] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, - [0x13] = { DstMem|SrcImplicit|ModRM|Mov, simd_other }, + [0x12] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, 3 }, + [0x13] = { DstMem|SrcImplicit|ModRM|Mov, simd_other, 3 }, [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, - [0x16] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, - [0x17] = { DstMem|SrcImplicit|ModRM|Mov, simd_other }, + [0x16] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, 3 }, + [0x17] = { DstMem|SrcImplicit|ModRM|Mov, simd_other, 3 }, [0x18 ... 0x1f] = { ImplicitOps|ModRM }, [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM }, [0x22 ... 
0x23] = { DstImplicit|SrcMem|ModRM }, @@ -6032,6 +6032,25 @@ x86_emulate( op_bytes = 8; goto simd_0f_fp; + case X86EMUL_OPC_EVEX_66(0x0f, 0x12): /* vmovlpd m64,xmm,xmm */ + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x13): /* vmovlp{s,d} xmm,m64 */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x16): /* vmovhpd m64,xmm,xmm */ + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x17): /* vmovhp{s,d} xmm,m64 */ + generate_exception_if(ea.type != OP_MEM, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX(0x0f, 0x12): /* vmovlps m64,xmm,xmm */ + /* vmovhlps xmm,xmm,xmm */ + case X86EMUL_OPC_EVEX(0x0f, 0x16): /* vmovhps m64,xmm,xmm */ + /* vmovlhps xmm,xmm,xmm */ + generate_exception_if((evex.lr || evex.opmsk || evex.brs || + evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( (d & DstMask) != DstMem ) + d &= ~TwoOp; + op_bytes = 8; + goto simd_zmm; + case X86EMUL_OPC_F3(0x0f, 0x12): /* movsldup xmm/m128,xmm */ case X86EMUL_OPC_VEX_F3(0x0f, 0x12): /* vmovsldup {x,y}mm/mem,{x,y}mm */ case X86EMUL_OPC_F2(0x0f, 0x12): /* movddup xmm/m64,xmm */ From patchwork Fri Mar 15 10:45:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854477 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C37F15AC for ; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64D172A934 for ; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 58BF82A937; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DF84D2A934 for ; Fri, 15 Mar 2019 10:46:56 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kL8-0005B6-P0; Fri, 15 Mar 2019 10:45:26 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kL6-0005A6-Oa for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:45:24 +0000 X-Inumbo-ID: 6f4167f9-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6f4167f9-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:45:23 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:45:22 -0600 Message-Id: <5C8B8241020000780021F1BE@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:45:21 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 16/50] x86emul: support AVX512F move duplicate insns X-BeenThere: 
xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Judging from insn prefixes, these are scalar insns, but their (memory) operands are vector ones (with the exception of 128-bit VMOVDDUP). For this some adjustments to disp8scale calculation code are needed. No explicit test harness additions other than the overrides, as the compiler already makes use of the insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Fix Disp8 test for VMOVDDUP when AVX512VL is unavailable. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -146,6 +146,7 @@ static const struct test avx512f_all[] = INSN_SFP(mov, 0f, 11), INSN_PFP_NB(mova, 0f, 28), INSN_PFP_NB(mova, 0f, 29), + INSN(movddup, f2, 0f, 12, vl, q_nb, vl), INSN(movdqa32, 66, 0f, 6f, vl, d_nb, vl), INSN(movdqa32, 66, 0f, 7f, vl, d_nb, vl), INSN(movdqa64, 66, 0f, 6f, vl, q_nb, vl), @@ -157,6 +158,8 @@ static const struct test avx512f_all[] = INSN(movntdq, 66, 0f, e7, vl, d_nb, vl), INSN(movntdqa, 66, 0f38, 2a, vl, d_nb, vl), INSN_PFP_NB(movnt, 0f, 2b), + INSN(movshdup, f3, 0f, 16, vl, d_nb, vl), + INSN(movsldup, f3, 0f, 12, vl, d_nb, vl), INSN_PFP_NB(movu, 0f, 10), INSN_PFP_NB(movu, 0f, 11), INSN_FP(mul, 0f, 59), @@ -694,6 +697,19 @@ static void test_group(const struct test switch ( tests[i].esz ) { + case ESZ_q_nb: + /* The 128-bit form of VMOVDDUP needs special casing. */ + if ( vl[j] == VL_128 && tests[i].spc == SPC_0f && + tests[i].opc == 0x12 && tests[i].pfx == PFX_f2 ) + { + struct test test = tests[i]; + + test.vsz = VSZ_el; + test.scale = SC_el; + test_one(&test, vl[j], instr, ctxt); + continue; + } + /* fall through */ default: test_one(&tests[i], vl[j], instr, ctxt); break; --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -326,8 +326,11 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(movddup); OVR(movntdq); OVR(movntdqa); +OVR(movshdup); +OVR(movsldup); OVR(pmovsxbd); OVR(pmovsxbq); OVR(pmovsxdq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -3048,6 +3048,15 @@ x86_decode( switch ( b ) { + case 0x12: /* vmovsldup / vmovddup */ + if ( evex.pfx == vex_f2 ) + disp8scale = evex.lr ? 4 + evex.lr : 3; + /* fall through */ + case 0x16: /* vmovshdup */ + if ( evex.pfx == vex_f3 ) + disp8scale = 4 + evex.lr; + break; + case 0x20: /* mov cr,reg */ case 0x21: /* mov dr,reg */ case 0x22: /* mov reg,cr */ @@ -6066,6 +6075,20 @@ x86_emulate( host_and_vcpu_must_have(sse3); goto simd_0f_xmm; + case X86EMUL_OPC_EVEX_F3(0x0f, 0x12): /* vmovsldup [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x12): /* vmovddup [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x16): /* vmovshdup [xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if((evex.brs || + evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = !(evex.pfx & VEX_PREFIX_DOUBLE_MASK) || evex.lr + ? 
16 << evex.lr : 8; + fault_suppression = false; + goto simd_zmm; + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x14): /* vunpcklp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x15): /* vunpckhp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), From patchwork Fri Mar 15 10:46:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854479 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6CF6615AC for ; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 511292A934 for ; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 456A42A937; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 659F62A934 for ; Fri, 15 Mar 2019 10:47:49 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kLp-0005MQ-8n; Fri, 15 Mar 2019 10:46:09 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kLn-0005M6-Ly for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:46:07 +0000 X-Inumbo-ID: 87f83496-470f-11e9-84e6-a7c41e3a5e29 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 87f83496-470f-11e9-84e6-a7c41e3a5e29; Fri, 15 Mar 2019 10:46:04 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:46:03 -0600 Message-Id: <5C8B826C020000780021F1C1@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:46:04 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 17/50] x86emul: support AVX512{F, BW, _VBMI} permute insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v5: Re-base over changes earlier in the series. v4: New. 
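[Editorial sketch, not part of either patch.] As background for the disp8scale (d8s_*) adjustments in the preceding hunk and the d8s_vl table entries below: EVEX encodings compress an 8-bit displacement by an implicit factor N, so the effective displacement is disp8 * N, and the emulator stores log2(N). For full-vector-width accesses N equals the vector size in bytes (16/32/64, i.e. log2(N) = 4 + EVEX.L'L), while the 128-bit VMOVDDUP form only reads 8 bytes, hence its special case. A sketch under those assumptions (helper name made up):

/* log2 of the Disp8 scale factor for VMOVDDUP at a given EVEX.L'L. */
static unsigned int movddup_disp8shift(unsigned int evex_lr)
{
    return evex_lr ? 4 + evex_lr : 3;
}

/*
 * Effective displacement = disp8 * (1 << shift); e.g. disp8 = 2 with a
 * 512-bit operand (L'L == 2) addresses offset 2 * 64 = 128 bytes.
 */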
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -178,6 +178,10 @@ static const struct test avx512f_all[] = INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), INSN(permi2, 66, 0f38, 76, vl, dq, vl), INSN(permi2, 66, 0f38, 77, vl, sd, vl), + INSN(permilpd, 66, 0f38, 0d, vl, q, vl), + INSN(permilpd, 66, 0f3a, 05, vl, q, vl), + INSN(permilps, 66, 0f38, 0c, vl, d, vl), + INSN(permilps, 66, 0f3a, 04, vl, d, vl), INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), @@ -278,6 +282,10 @@ static const struct test avx512f_no128[] INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), + INSN(perm, 66, 0f38, 36, vl, dq, vl), + INSN(perm, 66, 0f38, 16, vl, sd, vl), + INSN(permpd, 66, 0f3a, 01, vl, q, vl), + INSN(permq, 66, 0f3a, 00, vl, q, vl), INSN(shuff32x4, 66, 0f3a, 23, vl, d, vl), INSN(shuff64x2, 66, 0f3a, 23, vl, q, vl), INSN(shufi32x4, 66, 0f3a, 43, vl, d, vl), @@ -316,6 +324,7 @@ static const struct test avx512bw_all[] INSN(pcmpgtb, 66, 0f, 64, vl, b, vl), INSN(pcmpgtw, 66, 0f, 65, vl, w, vl), INSN(pcmpu, 66, 0f3a, 3e, vl, bw, vl), + INSN(permw, 66, 0f38, 8d, vl, w, vl), INSN(permi2w, 66, 0f38, 75, vl, w, vl), INSN(permt2w, 66, 0f38, 7d, vl, w, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), @@ -412,6 +421,7 @@ static const struct test avx512dq_512[] }; static const struct test avx512_vbmi_all[] = { + INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), INSN(permt2b, 66, 0f38, 7d, vl, b, vl), }; --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -186,6 +186,7 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# define swap2(x) B_(vpermilps, _mask, x, 0b00011011, undef(), ~0) # else # define broadcast_quartet(x) B(broadcastf32x4_, _mask, x, undef(), ~0) # define insert_pair(x, y, p) \ @@ -200,6 +201,10 @@ static inline bool _to_bool(byte_vec_t b vec_t t_ = B(shuf_f32x4_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ B(shufps, _mask, t_, t_, 0b00011011, undef(), ~0); \ }) +# define swap2(x) B(vpermilps, _mask, \ + B(shuf_f32x4_, _mask, x, x, \ + VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0), \ + 0b00011011, undef(), ~0) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -233,6 +238,7 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) # define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# define swap2(x) B_(vpermilpd, _mask, x, 0b01, undef(), ~0) # else # define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) @@ -240,6 +246,10 @@ static inline bool _to_bool(byte_vec_t b vec_t t_ = B(shuf_f64x2_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ B(shufpd, _mask, t_, t_, 0b01010101, undef(), ~0); \ }) +# define swap2(x) B(vpermilpd, _mask, \ + B(shuf_f64x2_, _mask, x, x, \ + VEC_SIZE == 32 ? 
0b01 : 0b00011011, undef(), ~0), \ + 0b01010101, undef(), ~0) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -405,6 +415,7 @@ static inline bool _to_bool(byte_vec_t b B(shuf_i32x4_, _mask, (vsi_t)(x), (vsi_t)(x), \ VEC_SIZE == 32 ? 0b01 : 0b00011011, (vsi_t)undef(), ~0), \ 0b00011011, (vsi_t)undef(), ~0)) +# define swap2(x) ((vec_t)B_(permvarsi, _mask, (vsi_t)(x), (vsi_t)(inv - 1), (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -442,8 +453,17 @@ static inline bool _to_bool(byte_vec_t b (vsi_t)B(shuf_i64x2_, _mask, (vdi_t)(x), (vdi_t)(x), \ VEC_SIZE == 32 ? 0b01 : 0b00011011, (vdi_t)undef(), ~0), \ 0b01001110, (vsi_t)undef(), ~0)) +# define swap2(x) ((vec_t)B(permvardi, _mask, (vdi_t)(x), (vdi_t)(inv - 1), (vdi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# if VEC_SIZE == 32 +# define swap3(x) ((vec_t)B_(permdi, _mask, (vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0)) +# elif VEC_SIZE == 64 +# define swap3(x) ({ \ + vdi_t t_ = B_(permdi, _mask, (vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0); \ + B(shuf_i64x2_, _mask, t_, t_, 0b01001110, (vdi_t)undef(), ~0); \ +}) +# endif # endif # if INT_SIZE == 4 # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) @@ -489,6 +509,9 @@ static inline bool _to_bool(byte_vec_t b # define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), (vqi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) # define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) +# ifdef __AVX512VBMI__ +# define swap2(x) ((vec_t)B(permvarqi, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) +# endif # elif INT_SIZE == 2 || UINT_SIZE == 2 # define broadcast(x) ({ \ vec_t t_; \ @@ -517,6 +540,7 @@ static inline bool _to_bool(byte_vec_t b (0b01010101010101010101010101010101 & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) +# define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) # endif # if INT_SIZE == 1 # define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) @@ -1325,6 +1349,12 @@ int simd_test(void) if ( !eq(swap2(src), inv) ) return __LINE__; #endif +#ifdef swap3 + touch(src); + if ( !eq(swap3(src), inv) ) return __LINE__; + touch(src); +#endif + #ifdef broadcast if ( !eq(broadcast(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -275,6 +275,8 @@ OVR(movlps); OVR_VFP(movnt); OVR_VFP(movu); OVR_FP(mul); +OVR_VFP(perm); +OVR_VFP(permil); OVR_VFP(shuf); OVR_INT(sll); OVR_DQ(sllv); @@ -331,6 +333,8 @@ OVR(movntdq); OVR(movntdqa); OVR(movshdup); OVR(movsldup); +OVR(permd); +OVR(permq); OVR(pmovsxbd); OVR(pmovsxbq); OVR(pmovsxdq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -434,7 +434,8 @@ static const struct ext0f38_table { } ext0f38_table[256] = { [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, - [0x0c ... 0x0f] = { .simd_size = simd_packed_fp }, + [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 
0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x13] = { .simd_size = simd_other, .two_op = 1 }, [0x14 ... 0x16] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -477,6 +478,7 @@ static const struct ext0f38_table { [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x8c] = { .simd_size = simd_packed_int }, + [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, [0x96 ... 0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -522,10 +524,10 @@ static const struct ext0f3a_table { uint8_t four_op:1; disp8scale_t d8s:4; } ext0f3a_table[256] = { - [0x00] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0x01] = { .simd_size = simd_packed_fp, .two_op = 1 }, + [0x00] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, + [0x01] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x02] = { .simd_size = simd_packed_int }, - [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1 }, + [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1 }, [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, @@ -8102,6 +8104,9 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf2): /* vpslld xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf3): /* vpsllq xmm/m128,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0c): /* vpermilps [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0d): /* vpermilpd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; if ( b == 0xe2 ) goto avx512f_no_sae; @@ -8447,6 +8452,12 @@ x86_emulate( generate_exception_if(!vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ @@ -8652,6 +8663,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ if ( !evex.w ) host_and_vcpu_must_have(avx512_vbmi); else @@ -9077,6 +9089,12 @@ x86_emulate( generate_exception_if(!vex.l || !vex.w, EXC_UD); goto simd_0f_imm8_avx2; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x00): /* vpermq $imm8,{y,z}mm/mem,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x01): /* vpermpd $imm8,{y,z}mm/mem,{y,z}mm{k} */ + generate_exception_if(!evex.lr || !evex.w, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x38): /* vinserti128 $imm8,xmm/m128,ymm,ymm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x39): /* vextracti128 $imm8,ymm,xmm/m128 */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x46): /* vperm2i128 $imm8,ymm/m256,ymm,ymm */ @@ -9096,6 +9114,12 @@ x86_emulate( generate_exception_if(vex.w, EXC_UD); goto simd_0f_imm8_avx; + case 
X86EMUL_OPC_EVEX_66(0x0f3a, 0x04): /* vpermilps $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x05): /* vpermilpd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w != (b & 1), EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_66(0x0f3a, 0x08): /* roundps $imm8,xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f3a, 0x09): /* roundpd $imm8,xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f3a, 0x0a): /* roundss $imm8,xmm/m128,xmm */ From patchwork Fri Mar 15 10:46:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854481 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D1FC13B5 for ; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86C502A934 for ; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B4F62A938; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 748992A936 for ; Fri, 15 Mar 2019 10:48:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kMH-0005TD-Ke; Fri, 15 Mar 2019 10:46:37 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kMG-0005Sw-K9 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:46:36 +0000 X-Inumbo-ID: 9a05e01e-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 9a05e01e-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:46:34 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:46:34 -0600 Message-Id: <5C8B8289020000780021F1C4@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:46:33 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 18/50] x86emul: support AVX512BW pack insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP No further test harness additions - what is there is good enough for these rather "regular" insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. 
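For reference, the pack insns emulated by this patch narrow each source
element with saturation.  A minimal scalar sketch of the dword-to-word
signed saturation performed per element by vpackssdw (illustrative C
only, not code from the patch or the harness):

    #include <stdint.h>

    /* Each 32-bit source element is clamped to the int16_t range
     * before being stored into the narrower destination element. */
    static int16_t saturate_dw_to_w(int32_t x)
    {
        if ( x > INT16_MAX )
            return INT16_MAX;
        if ( x < INT16_MIN )
            return INT16_MIN;
        return (int16_t)x;
    }

vpackusdw and vpackuswb likewise treat their sources as signed, but clamp
to the corresponding unsigned ranges ([0, UINT16_MAX] / [0, UINT8_MAX]).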
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -306,6 +306,10 @@ static const struct test avx512bw_all[] INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), INSN(movdqu16, f2, 0f, 7f, vl, w, vl), + INSN(packssdw, 66, 0f, 6b, vl, d_nb, vl), + INSN(packsswb, 66, 0f, 63, vl, w, vl), + INSN(packusdw, 66, 0f38, 2b, vl, d_nb, vl), + INSN(packuswb, 66, 0f, 67, vl, w, vl), INSN(paddb, 66, 0f, fc, vl, b, vl), INSN(paddsb, 66, 0f, ec, vl, b, vl), INSN(paddsw, 66, 0f, ed, vl, w, vl), --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -361,6 +361,10 @@ OVR(pextrw); OVR(pinsrb); OVR(pinsrw); # ifdef __AVX512VL__ +OVR(packssdw); +OVR(packsswb); +OVR(packusdw); +OVR(packuswb); OVR(pmaddwd); OVR(pmovsxbw); OVR(pmovzxbw); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -453,7 +453,7 @@ static const struct ext0f38_table { [0x25] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, - [0x2b] = { .simd_size = simd_packed_int }, + [0x2b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2c ... 0x2d] = { .simd_size = simd_packed_fp }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, @@ -6744,6 +6744,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0x69): /* vpunpckhwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ op_bytes = 16 << evex.lr; /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x63): /* vpacksswb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x67): /* vpackuswb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd1): /* vpsrlw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ @@ -6805,6 +6807,12 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6b): /* vpackssdw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2b): /* vpackusdw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6c): /* vpunpcklqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x6d): /* vpunpckhqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; From patchwork Fri Mar 15 10:47:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854483 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FDFF1880 for ; Fri, 15 Mar 2019 10:49:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D87F42A271 for ; Fri, 15 Mar 2019 10:49:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CBF162A2CF; Fri, 15 Mar 2019 10:49:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: 
from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AF6942A1CA for ; Fri, 15 Mar 2019 10:49:22 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNB-0005d4-1f; Fri, 15 Mar 2019 10:47:33 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kN8-0005cl-UU for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:47:31 +0000 X-Inumbo-ID: ba92195c-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ba92195c-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:47:29 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:47:28 -0600 Message-Id: <5C8B82BF020000780021F1C7@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:47:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 19/50] x86emul: support AVX512F floating-point conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVTPS2PD, sharing its main opcode with others, needs a "manual" override of disp8scale. The simd_size change for twobyte_table[0x5a] is benign to pre-existing code, but allows decode_disp8scale() to work as is here. Also correct the comment on an AVX counterpart. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base over changes earlier in the series. v5: Re-base over changes earlier in the series. v4: New. 
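The "manual" disp8scale overrides in this series stem from EVEX's
compressed-displacement rule: the encoded 8-bit displacement is
implicitly multiplied by the size of the memory access, and a widening
convert such as vcvtps2pd touches only half a vector's worth of memory.
A rough sketch of the rule (hypothetical helper, not the emulator's
decode_disp8scale(); broadcast forms, which scale by element size, are
left out):

    #include <stdbool.h>
    #include <stdint.h>

    static int32_t evex_effective_disp(int8_t disp8, unsigned int evex_lr,
                                       bool half_width_mem)
    {
        /* Full-width memory operand: 16/32/64 bytes for L'L = 0/1/2. */
        unsigned int scale_shift = 4 + evex_lr;

        if ( half_width_mem )
            --scale_shift;        /* e.g. vcvtps2pd reads only VL/2 */

        return disp8 * (1 << scale_shift);
    }

This is why the 0x5a special case below simply decrements disp8scale for
the no-prefix (vcvtps2pd) form when no broadcast is in effect.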
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,6 +109,12 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), + INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), + INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), + INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), + INSN(cvtss2sd, f3, 0f, 5a, el, d, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -181,7 +181,9 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# define widen1(x) ((vec_t)BR(cvtps2pd, _mask, x, (vdf_t)undef(), ~0)) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -68,6 +68,7 @@ typedef short __attribute__((vector_size typedef int __attribute__((vector_size(VEC_SIZE))) vsi_t; #if VEC_SIZE >= 8 typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; +typedef double __attribute__((vector_size(VEC_SIZE))) vdf_t; #endif #if ELEM_SIZE == 1 @@ -93,6 +94,7 @@ typedef char __attribute__((vector_size( typedef short __attribute__((vector_size(HALF_SIZE))) vhi_half_t; typedef int __attribute__((vector_size(HALF_SIZE))) vsi_half_t; typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; +typedef float __attribute__((vector_size(HALF_SIZE))) vsf_half_t; # endif # if ELEM_COUNT >= 4 @@ -328,6 +330,13 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(cvtpd2psx); +OVR(cvtpd2psy); +OVR(cvtph2ps); +OVR(cvtps2pd); +OVR(cvtps2ph); +OVR(cvtsd2ss); +OVR(cvtss2sd); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3871,6 +3871,49 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing vcvtph2ps 32(%ecx),%zmm7{%k4}..."); + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(evex_vcvtph2ps); + decl_insn(evex_vcvtps2ph); + + asm volatile ( "vpternlogd $0x81, %%zmm7, %%zmm7, %%zmm7\n\t" + "kmovw %1,%%k4\n" + put_insn(evex_vcvtph2ps, "vcvtph2ps 32(%0), %%zmm7%{%%k4%}") + :: "c" (NULL), "r" (0x3333) ); + + set_insn(evex_vcvtph2ps); + memset(res, 0xff, 128); + res[8] = 0x40003c00; /* (1.0, 2.0) */ + res[10] = 0x44004200; /* (3.0, 4.0) */ + res[12] = 0x3400b800; /* (-.5, .25) */ + res[14] = 0xbc000000; /* (0.0, -1.) 
*/ + regs.ecx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + asm volatile ( "vmovups %%zmm7, %0" : "=m" (res[16]) ); + if ( rc != X86EMUL_OKAY || !check_eip(evex_vcvtph2ps) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vcvtps2ph $0,%zmm3,64(%edx){%k4}..."); + asm volatile ( "vmovups %0, %%zmm3\n" + put_insn(evex_vcvtps2ph, "vcvtps2ph $0, %%zmm3, 128(%1)%{%%k4%}") + :: "m" (res[16]), "d" (NULL) ); + + set_insn(evex_vcvtps2ph); + regs.edx = (unsigned long)res; + memset(res + 32, 0xcc, 32); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(evex_vcvtps2ph) ) + goto fail; + res[15] = res[13] = res[11] = res[9] = 0xcccccccc; + if ( memcmp(res + 8, res + 32, 32) ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -310,7 +310,8 @@ static const struct twobyte_table { [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp }, [0x54 ... 0x57] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, - [0x5a ... 0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x5a] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, + [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, @@ -437,7 +438,7 @@ static const struct ext0f38_table { [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x13] = { .simd_size = simd_other, .two_op = 1 }, + [0x13] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x14 ... 0x16] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x17] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x18] = { .simd_size = simd_scalar_opc, .two_op = 1, .d8s = 2 }, @@ -541,7 +542,7 @@ static const struct ext0f3a_table { [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, [0x1a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, - [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, + [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1e ... 
0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_none, .d8s = 0 }, [0x21] = { .simd_size = simd_other, .d8s = 2 }, @@ -3071,6 +3072,11 @@ x86_decode( modrm_mod = 3; break; + case 0x5a: /* vcvtps2pd needs special casing */ + if ( disp8scale && !evex.pfx && !evex.brs ) + --disp8scale; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -5998,6 +6004,7 @@ x86_emulate( CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5d): /* vmin{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5e): /* vdiv{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5f): /* vmax{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + avx512f_all_fp: generate_exception_if((evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK) || (ea.type != OP_REG && evex.brs && (evex.pfx & VEX_PREFIX_SCALAR_MASK))), @@ -6557,7 +6564,7 @@ x86_emulate( goto simd_zmm; CASE_SIMD_ALL_FP(, 0x0f, 0x5a): /* cvt{p,s}{s,d}2{p,s}{s,d} xmm/mem,xmm */ - CASE_SIMD_ALL_FP(_VEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} xmm/mem,xmm */ + CASE_SIMD_ALL_FP(_VEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} {x,y}mm/mem,{x,y}mm */ /* vcvts{s,d}2s{s,d} xmm/mem,xmm,xmm */ op_bytes = 4 << (((vex.pfx & VEX_PREFIX_SCALAR_MASK) ? 0 : 1 + vex.l) + !!(vex.pfx & VEX_PREFIX_DOUBLE_MASK)); @@ -6566,6 +6573,12 @@ x86_emulate( goto simd_0f_sse2; goto simd_0f_avx; + CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + /* vcvts{s,d}2s{s,d} xmm/mem,xmm,xmm{k} */ + op_bytes = 4 << (((evex.pfx & VEX_PREFIX_SCALAR_MASK) ? 0 : 1 + evex.lr) + + evex.w); + goto avx512f_all_fp; + CASE_SIMD_PACKED_FP(, 0x0f, 0x5b): /* cvt{ps,dq}2{dq,ps} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x5b): /* vcvt{ps,dq}2{dq,ps} {x,y}mm/mem,{x,y}mm */ case X86EMUL_OPC_F3(0x0f, 0x5b): /* cvttps2dq xmm/mem,xmm */ @@ -8455,6 +8468,15 @@ x86_emulate( op_bytes = 8 << vex.l; goto simd_0f_ymm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(false); + op_bytes = 8 << evex.lr; + elem_bytes = 2; + goto simd_zmm; + case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ generate_exception_if(!vex.l || vex.w, EXC_UD); @@ -9283,27 +9305,79 @@ x86_emulate( goto avx512f_imm8_no_sae; case X86EMUL_OPC_VEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,{x,y}mm,xmm/mem */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,[xyz]mm,{x,y}mm/mem{k} */ { uint32_t mxcsr; - generate_exception_if(vex.w || vex.reg != 0xf, EXC_UD); - host_and_vcpu_must_have(f16c); fail_if(!ops->write); + if ( evex_encoded() ) + { + generate_exception_if((evex.w || evex.reg != 0xf || !evex.RX || + (ea.type != OP_REG && (evex.z || evex.brs))), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + opc = init_evex(stub); + } + else + { + generate_exception_if(vex.w || vex.reg != 0xf, EXC_UD); + host_and_vcpu_must_have(f16c); + opc = init_prefixes(stub); + } + + op_bytes = 8 << evex.lr; - opc = init_prefixes(stub); opc[0] = b; opc[1] = modrm; if ( ea.type == OP_MEM ) { /* Convert memory operand to (%rAX). 
*/ vex.b = 1; + evex.b = 1; opc[1] &= 0x38; } opc[2] = imm1; - insn_bytes = PFX_BYTES + 3; + if ( evex_encoded() ) + { + unsigned int full = 0; + + insn_bytes = EVEX_PFX_BYTES + 3; + copy_EVEX(opc, evex); + + if ( ea.type == OP_MEM && evex.opmsk ) + { + full = 0xffff >> (16 - op_bytes / 2); + op_mask &= full; + if ( !op_mask ) + goto complete_insn; + + first_byte = __builtin_ctz(op_mask); + op_mask >>= first_byte; + full >>= first_byte; + first_byte <<= 1; + op_bytes = (32 - __builtin_clz(op_mask)) << 1; + + /* + * We may need to read (parts of) the memory operand for the + * purpose of merging in order to avoid splitting the write + * below into multiple ones. + */ + if ( op_mask != full && + (rc = ops->read(ea.mem.seg, + truncate_ea(ea.mem.off + first_byte), + (void *)mmvalp + first_byte, op_bytes, + ctxt)) != X86EMUL_OKAY ) + goto done; + } + } + else + { + insn_bytes = PFX_BYTES + 3; + copy_VEX(opc, vex); + } opc[3] = 0xc3; - copy_VEX(opc, vex); /* Latch MXCSR - we may need to restore it below. */ invoke_stub("stmxcsr %[mxcsr]", "", "=m" (*mmvalp), [mxcsr] "=m" (mxcsr) : "a" (mmvalp)); @@ -9312,7 +9386,8 @@ x86_emulate( if ( ea.type == OP_MEM ) { - rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp, 8 << vex.l, ctxt); + rc = ops->write(ea.mem.seg, truncate_ea(ea.mem.off + first_byte), + (void *)mmvalp + first_byte, op_bytes, ctxt); if ( rc != X86EMUL_OKAY ) { asm volatile ( "ldmxcsr %0" :: "m" (mxcsr) ); From patchwork Fri Mar 15 10:47:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854485 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C76FC1515 for ; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD9DB2A8F6 for ; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A07CD2A934; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id F243E2A8F6 for ; Fri, 15 Mar 2019 10:49:34 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNY-0005hk-He; Fri, 15 Mar 2019 10:47:56 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNX-0005hZ-G8 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:47:55 +0000 X-Inumbo-ID: c7b84530-470f-11e9-b3a0-63b6fccfb590 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id c7b84530-470f-11e9-b3a0-63b6fccfb590; Fri, 15 Mar 2019 10:47:51 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:47:50 -0600 Message-Id: <5C8B82D6020000780021F1CA@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: 
Fri, 15 Mar 2019 04:47:50 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 20/50] x86emul: support AVX512F legacy-equivalent packed int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP ... including the two AVX512DQ forms which shared encodings, just with EVEX.W set there. VCVTDQ2PD, sharing its main opcode with others, needs a "manual" override of disp8scale. The simd_size changes for the twobyte_table[] entries are benign to pre-existing code, but allow decode_disp8scale() to work as is here. The at this point wrong placement of the 0xe6 case block is once again in anticipation of further additions of case labels. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base over changes earlier in the series. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,8 +109,12 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), + INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), + INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), + INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), @@ -398,6 +402,8 @@ static const struct test avx512dq_all[] INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), + INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), + INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -92,6 +92,13 @@ static inline bool _to_bool(byte_vec_t b # define to_int(x) ((vec_t){ (int)(x)[0] }) #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) +#elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ + (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if FLOAT_SIZE == 4 +# define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# elif FLOAT_SIZE == 8 +# define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if FLOAT_SIZE == 4 # define to_int(x) __builtin_ia32_cvtdq2ps(__builtin_ia32_cvtps2dq(x)) @@ -1142,15 +1149,21 @@ int simd_test(void) touch(src); if ( !eq(x * -alt, -src) ) return __LINE__; -# if defined(recip) && defined(to_int) +# ifdef to_int + + touch(src); + x = to_int(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# ifdef recip touch(src); x = recip(src); touch(src); touch(x); if ( !eq(to_int(recip(x)), src) ) return __LINE__; -# ifdef rsqrt +# ifdef rsqrt x = src * src; touch(x); y = rsqrt(x); @@ -1158,6 +1171,7 @@ 
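The harness exercises these conversions through a round trip: to_int()
converts the integer-valued test vector to packed integers and back, and
the result must compare equal to the source.  A scalar analogue of that
invariant (illustrative helper only, not harness code; the C cast
truncates while vcvtpd2dq rounds per MXCSR, but the two agree on
integer-valued operands):

    #include <assert.h>

    static void roundtrip_check(void)
    {
        for ( int i = -8; i <= 8; ++i )
        {
            double d = i;            /* exactly representable */
            int    n = (int)d;       /* FP -> int, cf. vcvtpd2dq */

            assert((double)n == d);  /* int -> FP, cf. vcvtdq2pd */
        }
    }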
int simd_test(void) if ( !eq(to_int(recip(y)), src) ) return __LINE__; touch(src); if ( !eq(to_int(y), to_int(recip(src))) ) return __LINE__; +# endif # endif # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -244,6 +244,7 @@ asm ( ".macro override insn \n\t" OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); +OVR_VFP(cvtdq2); OVR_FP(add); OVR_INT(add); OVR_BW(adds); @@ -330,13 +331,19 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(cvtpd2dqx); +OVR(cvtpd2dqy); OVR(cvtpd2psx); OVR(cvtpd2psy); OVR(cvtph2ps); +OVR(cvtps2dq); OVR(cvtps2pd); OVR(cvtps2ph); OVR(cvtsd2ss); OVR(cvtss2sd); +OVR(cvttpd2dqx); +OVR(cvttpd2dqy); +OVR(cvttps2dq); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -311,7 +311,7 @@ static const struct twobyte_table { [0x54 ... 0x57] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x5a] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, - [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, @@ -375,7 +375,7 @@ static const struct twobyte_table { [0xe0] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0xe1 ... 0xe2] = { DstImplicit|SrcMem|ModRM, simd_128, 4 }, [0xe3 ... 0xe5] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, - [0xe6] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0xe6] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0xe7] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, [0xe8 ... 
0xef] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0xf0] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, @@ -3081,6 +3081,11 @@ x86_decode( if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; break; + + case 0xe6: /* vcvtdq2pd needs special casing */ + if ( disp8scale && evex.pfx == vex_f3 && !evex.w && !evex.brs ) + --disp8scale; + break; } break; @@ -6587,6 +6592,22 @@ x86_emulate( op_bytes = 16 << vex.l; goto simd_0f_cvt; + case X86EMUL_OPC_EVEX_66(0x0f, 0x5b): /* vcvtps2dq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x5b): /* vcvttps2dq [xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX(0x0f, 0x5b): /* vcvtdq2ps [xyz]mm/mem,[xyz]mm{k} */ + /* vcvtqq2ps [xyz]mm/mem,{x,y}mm{k} */ + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = 16 << evex.lr; + goto simd_zmm; + CASE_SIMD_PACKED_INT(0x0f, 0x60): /* punpcklbw {,x}mm/mem,{,x}mm */ case X86EMUL_OPC_VEX_66(0x0f, 0x60): /* vpunpcklbw {x,y}mm/mem,{x,y}mm,{x,y}mm */ CASE_SIMD_PACKED_INT(0x0f, 0x61): /* punpcklwd {,x}mm/mem,{,x}mm */ @@ -7251,6 +7272,27 @@ x86_emulate( op_bytes = 8; goto simd_0f_xmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0xe6): /* vcvttpd2dq [xyz]mm/mem,{x,y}mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0xe6): /* vcvtpd2dq [xyz]mm/mem,{x,y}mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0xe6): /* vcvtdq2pd {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtqq2pd [xyz]mm/mem,[xyz]mm{k} */ + if ( evex.pfx != vex_f3 ) + host_and_vcpu_must_have(avx512f); + else if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + else + { + host_and_vcpu_must_have(avx512f); + generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); + } + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = 8 << (evex.w + evex.lr); + goto simd_zmm; + case X86EMUL_OPC_F2(0x0f, 0xf0): /* lddqu m128,xmm */ case X86EMUL_OPC_VEX_F2(0x0f, 0xf0): /* vlddqu mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:52:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854487 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC8BD13B5 for ; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D36EF2A93A for ; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C703F2A939; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 502A52A939 for ; Fri, 15 Mar 2019 10:53:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kRe-0006Y5-6d; Fri, 15 Mar 2019 10:52:10 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kRd-0006Y0-4O for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:52:09 +0000 X-Inumbo-ID: 602ec5c7-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 602ec5c7-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:52:07 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:52:06 -0600 Message-Id: <5C8B83D6020000780021F208@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:52:06 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 21/50] x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVT{,T}S{S,D}2SI use EVEX.W for their destination (register) rather than their (possibly memory) source operand size and hence need a "manual" override of disp8scale. While the SDM claims that EVEX.L'L needs to be zero for the 32-bit forms of VCVT{,U}SI2SD (exception type E10NF), observations on my test system do not confirm this (and I've got informal confirmation that this is a doc mistake). Nevertheless, to be on the safe side, force evex.lr to be zero in this case though when constructing the stub. Slightly adjust the scalar to_int() in the test harness, to increase the chances of the operand ending up in memory. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix VCVTSI2SS - cannot re-use VMOV{D,Q} code here, as the register form can't be converted to a memory one when embedded rounding is in effect. Force evex.lr to zero for 32-bit VCVTSI2SD. Permit embedded rounding for VCVT{,T}S{S,D}2SI. Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -117,8 +117,16 @@ static const struct test avx512f_all[] = INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtsd2si, f2, 0f, 2d, el, q, el), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), + INSN(cvtsi2sd, f2, 0f, 2a, el, dq64, el), + INSN(cvtsi2ss, f3, 0f, 2a, el, dq64, el), INSN(cvtss2sd, f3, 0f, 5a, el, d, el), + INSN(cvtss2si, f3, 0f, 2d, el, d, el), + INSN(cvttpd2dq, 66, 0f, e6, vl, q, vl), + INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), + INSN(cvttsd2si, f2, 0f, 2c, el, q, el), + INSN(cvttss2si, f3, 0f, 2c, el, d, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), @@ -746,8 +754,9 @@ static void test_group(const struct test break; case ESZ_dq: - test_pair(&tests[i], vl[j], ESZ_d, "d", ESZ_q, "q", - instr, ctxt); + test_pair(&tests[i], vl[j], ESZ_d, + strncmp(tests[i].mnemonic, "cvt", 3) ? 
"d" : "l", + ESZ_q, "q", instr, ctxt); break; #ifdef __i386__ --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -89,7 +89,7 @@ static inline bool _to_bool(byte_vec_t b #endif #if VEC_SIZE == FLOAT_SIZE -# define to_int(x) ((vec_t){ (int)(x)[0] }) +# define to_int(x) ({ int i_ = (x)[0]; touch(i_); ((vec_t){ i_ }); }) #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -340,10 +340,28 @@ OVR(cvtps2dq); OVR(cvtps2pd); OVR(cvtps2ph); OVR(cvtsd2ss); +OVR(cvtsd2si); +OVR(cvtsd2sil); +OVR(cvtsd2siq); +OVR(cvtsi2sd); +OVR(cvtsi2sdl); +OVR(cvtsi2sdq); +OVR(cvtsi2ss); +OVR(cvtsi2ssl); +OVR(cvtsi2ssq); OVR(cvtss2sd); +OVR(cvtss2si); +OVR(cvtss2sil); +OVR(cvtss2siq); OVR(cvttpd2dqx); OVR(cvttpd2dqy); OVR(cvttps2dq); +OVR(cvttsd2si); +OVR(cvttsd2sil); +OVR(cvttsd2siq); +OVR(cvttss2si); +OVR(cvttss2sil); +OVR(cvttss2siq); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -296,7 +296,7 @@ static const struct twobyte_table { [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM }, [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp, d8s_vl }, - [0x2a] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x2a] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp, d8s_vl }, [0x2c ... 0x2d] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x2e ... 0x2f] = { ImplicitOps|ModRM|TwoOp, simd_none, d8s_dq }, @@ -3072,6 +3072,12 @@ x86_decode( modrm_mod = 3; break; + case 0x2c: /* vcvtts{s,d}2si need special casing */ + case 0x2d: /* vcvts{s,d}2si need special casing */ + if ( evex_encoded() ) + disp8scale = 2 + (evex.pfx & VEX_PREFIX_DOUBLE_MASK); + break; + case 0x5a: /* vcvtps2pd needs special casing */ if ( disp8scale && !evex.pfx && !evex.brs ) --disp8scale; @@ -6199,6 +6205,48 @@ x86_emulate( state->simd_size = simd_none; goto simd_0f_rm; + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2a): /* vcvtsi2s{s,d} r/m,xmm,xmm */ + generate_exception_if(evex.opmsk || (ea.type != OP_REG && evex.brs), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(true); + get_fpu(X86EMUL_FPU_zmm); + + if ( ea.type == OP_MEM ) + { + rc = read_ulong(ea.mem.seg, ea.mem.off, &src.val, + rex_prefix & REX_W ? 8 : 4, ctxt, ops); + if ( rc != X86EMUL_OKAY ) + goto done; + } + else + src.val = *ea.reg; + + opc = init_evex(stub); + opc[0] = b; + /* Convert memory/GPR source to %rAX. */ + evex.b = 1; + if ( !mode_64bit() ) + evex.w = 0; + /* + * SDM version 067 claims that exception type E10NF implies #UD when + * EVEX.L'L is non-zero for 32-bit VCVT{,U}SI2SD. Experimentally this + * cannot be confirmed, but be on the safe side for the stub. 
+ */ + if ( !evex.w && evex.pfx == vex_f2 ) + evex.lr = 0; + opc[1] = (modrm & 0x38) | 0xc0; + insn_bytes = EVEX_PFX_BYTES + 2; + opc[2] = 0xc3; + + copy_EVEX(opc, evex); + invoke_stub("", "", "=g" (dummy) : "a" (src.val)); + + put_stub(stub); + state->simd_size = simd_none; + break; + CASE_SIMD_SCALAR_FP(, 0x0f, 0x2c): /* cvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(_VEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(, 0x0f, 0x2d): /* cvts{s,d}2si xmm/mem,reg */ @@ -6222,14 +6270,17 @@ x86_emulate( } opc = init_prefixes(stub); + cvts_2si: opc[0] = b; /* Convert GPR destination to %rAX and memory operand to (%rCX). */ rex_prefix &= ~REX_R; vex.r = 1; + evex.r = 1; if ( ea.type == OP_MEM ) { rex_prefix &= ~REX_B; vex.b = 1; + evex.b = 1; opc[1] = 0x01; rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, @@ -6240,11 +6291,22 @@ x86_emulate( else opc[1] = modrm & 0xc7; if ( !mode_64bit() ) + { vex.w = 0; - insn_bytes = PFX_BYTES + 2; + evex.w = 0; + } + if ( evex_encoded() ) + { + insn_bytes = EVEX_PFX_BYTES + 2; + copy_EVEX(opc, evex); + } + else + { + insn_bytes = PFX_BYTES + 2; + copy_REX_VEX(opc, rex_prefix, vex); + } opc[2] = 0xc3; - copy_REX_VEX(opc, rex_prefix, vex); ea.reg = decode_gpr(&_regs, modrm_reg); invoke_stub("", "", "=a" (*ea.reg) : "c" (mmvalp), "m" (*mmvalp)); @@ -6252,6 +6314,18 @@ x86_emulate( state->simd_size = simd_none; break; + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */ + generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk || + (ea.type != OP_REG && evex.brs)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(true); + get_fpu(X86EMUL_FPU_zmm); + opc = init_evex(stub); + goto cvts_2si; + CASE_SIMD_PACKED_FP(, 0x0f, 0x2e): /* ucomis{s,d} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2e): /* vucomis{s,d} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(, 0x0f, 0x2f): /* comis{s,d} xmm/mem,xmm */ From patchwork Fri Mar 15 10:52:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854489 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5939513B5 for ; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3DFA12908D for ; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F81C291B4; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A34EE2908D for ; Fri, 15 Mar 2019 10:54:15 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kS2-0006aX-HJ; Fri, 15 Mar 2019 10:52:34 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kS0-0006aM-Qr for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:52:32 +0000 X-Inumbo-ID: 6eb0479b-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6eb0479b-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:52:31 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:52:30 -0600 Message-Id: <5C8B83EE020000780021F20B@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:52:30 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 22/50] x86emul: support AVX512DQ packed quad-int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVT{,T}PS2QQ, sharing their main opcodes with others, once again need "manual" overrides of disp8scale. While not directly related here, also add a scalar variant of to_wint() to the test harness. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Workaround for gcc 7 quirk. v5: Re-base over changes earlier in the series. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -410,8 +410,12 @@ static const struct test avx512dq_all[] INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), + INSN(cvtpd2qq, 66, 0f, 7b, vl, q, vl), + INSN(cvtps2qq, 66, 0f, 7b, vl_2, d, vl), INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), + INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), + INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -90,14 +90,35 @@ static inline bool _to_bool(byte_vec_t b #if VEC_SIZE == FLOAT_SIZE # define to_int(x) ({ int i_ = (x)[0]; touch(i_); ((vec_t){ i_ }); }) +# ifdef __x86_64__ +# define to_wint(x) ({ long l_ = (x)[0]; touch(l_); ((vec_t){ l_ }); }) +# endif #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) # if FLOAT_SIZE == 4 # define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# ifdef __AVX512DQ__ +# define to_wint(x) ({ \ + vsf_half_t t_ = low_half(x); \ + vdi_t lo_, hi_; \ + touch(t_); \ + lo_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + t_ = high_half(x); \ + touch(t_); \ + hi_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + touch(lo_); touch(hi_); \ + insert_half(insert_half(undef(), \ + BR(cvtqq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ + BR(cvtqq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ +}) +# endif # elif FLOAT_SIZE == 8 # define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# ifdef __AVX512DQ__ +# define to_wint(x) BR(cvtqq2pd, 
_mask, BR(cvtpd2qq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) +# endif # endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if FLOAT_SIZE == 4 @@ -121,6 +142,21 @@ static inline bool _to_bool(byte_vec_t b }) #endif +#if VEC_SIZE == 16 && FLOAT_SIZE == 4 && defined(__SSE__) +# define low_half(x) (x) +# define high_half(x) B_(movhlps, , undef(), x) +/* + * GCC 7 (and perhaps earlier) report a bogus type mismatch for the conditional + * expression below. All works well with this no-op wrapper. + */ +static inline vec_t movlhps(vec_t x, vec_t y) { + return __builtin_ia32_movlhps(x, y); +} +# define insert_pair(x, y, p) \ + ((p) ? movlhps(x, y) \ + : ({ vec_t t_ = (x); t_[0] = (y)[0]; t_[1] = (y)[1]; t_; })) +#endif + #if VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW_A__) # define max __builtin_ia32_pfmax # define min __builtin_ia32_pfmin @@ -149,13 +185,16 @@ static inline bool _to_bool(byte_vec_t b # if ELEM_COUNT == 8 /* vextractf{32,64}x4 */ || \ (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextractf32x8 */ || \ (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ -# define low_half(x) ({ \ +# define _half(x, lh) ({ \ half_t t_; \ - asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + asm ( "vextractf%c[w]x%c[n] %[sel], %[s], %[d]" \ : [d] "=m" (t_) \ - : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + : [s] "v" (x), [sel] "i" (lh), \ + [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ t_; \ }) +# define low_half(x) _half(x, 0) +# define high_half(x) _half(x, 1) # endif # if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextractf32x4 */ || \ (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ @@ -1176,6 +1215,13 @@ int simd_test(void) # endif +# ifdef to_wint + touch(src); + x = to_wint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif + # ifdef sqrt x = src * src; touch(x); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -325,6 +325,8 @@ static const struct twobyte_table { [0x77] = { DstImplicit|SrcNone }, [0x78] = { ImplicitOps|ModRM }, [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, + [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, + [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, [0x7c ... 
0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, [0x7e] = { DstMem|SrcImplicit|ModRM|Mov, simd_none, d8s_dq64 }, [0x7f] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, @@ -3083,6 +3085,12 @@ x86_decode( --disp8scale; break; + case 0x7a: /* vcvttps2qq needs special casing */ + case 0x7b: /* vcvtps2qq needs special casing */ + if ( disp8scale && evex.pfx == vex_66 && !evex.w && !evex.brs ) + --disp8scale; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -7355,7 +7363,13 @@ x86_emulate( if ( evex.pfx != vex_f3 ) host_and_vcpu_must_have(avx512f); else if ( evex.w ) + { + case X86EMUL_OPC_EVEX_66(0x0f, 0x7a): /* vcvttps2qq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvttpd2qq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x7b): /* vcvtps2qq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtpd2qq [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512dq); + } else { host_and_vcpu_must_have(avx512f); From patchwork Fri Mar 15 10:53:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854491 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCFE41390 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A4D6E2A864 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9815D2A874; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 23FC92A864 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTB-0006lx-Bp; Fri, 15 Mar 2019 10:53:45 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTA-0006lm-Ew for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:53:44 +0000 X-Inumbo-ID: 9773a936-4710-11e9-81e2-471df63f5f69 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 9773a936-4710-11e9-81e2-471df63f5f69; Fri, 15 Mar 2019 10:53:40 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:53:39 -0600 Message-Id: <5C8B8431020000780021F20E@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:53:37 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 23/50] x86emul: support AVX512{F, DQ} uint-to-FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Some "manual" overrides of disp8scale are needed here again. In particular code ends up simpler when using d8s_dq64 in the twobyte_table[] entry. Test harness additions will be done once the reverse conversions are also available. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -127,6 +127,10 @@ static const struct test avx512f_all[] = INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), INSN(cvttsd2si, f2, 0f, 2c, el, q, el), INSN(cvttss2si, f3, 0f, 2c, el, d, el), + INSN(cvtudq2pd, f3, 0f, 7a, vl_2, d, vl), + INSN(cvtudq2ps, f2, 0f, 7a, vl, d, vl), + INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), + INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), @@ -416,6 +420,8 @@ static const struct test avx512dq_all[] INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), + INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), + INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -326,7 +326,7 @@ static const struct twobyte_table { [0x78] = { ImplicitOps|ModRM }, [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, - [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, + [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x7c ... 0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, [0x7e] = { DstMem|SrcImplicit|ModRM|Mov, simd_none, d8s_dq64 }, [0x7f] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, @@ -3085,12 +3085,16 @@ x86_decode( --disp8scale; break; - case 0x7a: /* vcvttps2qq needs special casing */ - case 0x7b: /* vcvtps2qq needs special casing */ - if ( disp8scale && evex.pfx == vex_66 && !evex.w && !evex.brs ) + case 0x7a: /* vcvttps2qq and vcvtudq2pd need special casing */ + if ( disp8scale && evex.pfx != vex_f2 && !evex.w && !evex.brs ) --disp8scale; break; + case 0x7b: /* vcvtp{s,d}2qq need special casing */ + if ( disp8scale && evex.pfx == vex_66 ) + disp8scale = (evex.brs ? 
2 : 3 + evex.lr) + evex.w; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -6214,6 +6218,7 @@ x86_emulate( goto simd_0f_rm; CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2a): /* vcvtsi2s{s,d} r/m,xmm,xmm */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x7b): /* vcvtusi2s{s,d} r/m,xmm,xmm */ generate_exception_if(evex.opmsk || (ea.type != OP_REG && evex.brs), EXC_UD); host_and_vcpu_must_have(avx512f); @@ -6680,6 +6685,8 @@ x86_emulate( /* fall through */ case X86EMUL_OPC_EVEX(0x0f, 0x5b): /* vcvtdq2ps [xyz]mm/mem,[xyz]mm{k} */ /* vcvtqq2ps [xyz]mm/mem,{x,y}mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x7a): /* vcvtudq2ps [xyz]mm/mem,[xyz]mm{k} */ + /* vcvtuqq2ps [xyz]mm/mem,{x,y}mm{k} */ if ( evex.w ) host_and_vcpu_must_have(avx512dq); else @@ -7358,6 +7365,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_F2(0x0f, 0xe6): /* vcvtpd2dq [xyz]mm/mem,{x,y}mm{k} */ generate_exception_if(!evex.w, EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x7a): /* vcvtudq2pd {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtuqq2pd [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_F3(0x0f, 0xe6): /* vcvtdq2pd {x,y}mm/mem,[xyz]mm{k} */ /* vcvtqq2pd [xyz]mm/mem,[xyz]mm{k} */ if ( evex.pfx != vex_f3 ) From patchwork Fri Mar 15 10:54:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854493 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D4CB1390 for ; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 236FF2A93E for ; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 17BF42A940; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 537712A93E for ; Fri, 15 Mar 2019 10:55:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTV-0006rY-Ms; Fri, 15 Mar 2019 10:54:05 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTU-0006rH-6Z for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:54:04 +0000 X-Inumbo-ID: a5335d75-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id a5335d75-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:54:03 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:02 -0600 Message-Id: <5C8B844B020000780021F211@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:03 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: 
<5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 24/50] x86emul: support AVX512{F, DQ} FP-to-uint conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Along the lines of prior patches, VCVT{,T}PS2UQQ as well as VCVT{,T}S{S,D}2USI need "manual" overrides of disp8scale. The twobyte_table[] entries get altered, with their prior values now put in place in x86_decode_twobyte(). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -112,21 +112,29 @@ static const struct test avx512f_all[] = INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), + INSN(cvtpd2udq, , 0f, 79, vl, q, vl), INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtps2udq, , 0f, 79, vl, d, vl), INSN(cvtsd2si, f2, 0f, 2d, el, q, el), + INSN(cvtsd2usi, f2, 0f, 79, el, q, el), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), INSN(cvtsi2sd, f2, 0f, 2a, el, dq64, el), INSN(cvtsi2ss, f3, 0f, 2a, el, dq64, el), INSN(cvtss2sd, f3, 0f, 5a, el, d, el), INSN(cvtss2si, f3, 0f, 2d, el, d, el), + INSN(cvtss2usi, f3, 0f, 79, el, d, el), INSN(cvttpd2dq, 66, 0f, e6, vl, q, vl), + INSN(cvttpd2udq, , 0f, 78, vl, q, vl), INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), + INSN(cvttps2udq, , 0f, 78, vl, d, vl), INSN(cvttsd2si, f2, 0f, 2c, el, q, el), + INSN(cvttsd2usi, f2, 0f, 78, el, q, el), INSN(cvttss2si, f3, 0f, 2c, el, d, el), + INSN(cvttss2usi, f3, 0f, 78, el, d, el), INSN(cvtudq2pd, f3, 0f, 7a, vl_2, d, vl), INSN(cvtudq2ps, f2, 0f, 7a, vl, d, vl), INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), @@ -415,11 +423,15 @@ static const struct test avx512dq_all[] INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN(cvtpd2qq, 66, 0f, 7b, vl, q, vl), + INSN(cvtpd2uqq, 66, 0f, 79, vl, q, vl), INSN(cvtps2qq, 66, 0f, 7b, vl_2, d, vl), + INSN(cvtps2uqq, 66, 0f, 79, vl_2, d, vl), INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), + INSN(cvttpd2uqq, 66, 0f, 78, vl, q, vl), INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), + INSN(cvttps2uqq, 66, 0f, 78, vl_2, d, vl), INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), INSN_PFP(or, 0f, 56), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -93,31 +93,65 @@ static inline bool _to_bool(byte_vec_t b # ifdef __x86_64__ # define to_wint(x) ({ long l_ = (x)[0]; touch(l_); ((vec_t){ l_ }); }) # endif +# ifdef __AVX512F__ +/* + * Sadly even gcc 9.x, at the time of writing, does not carry out at least + * uint -> FP conversions using VCVTUSI2S{S,D}, so we need to use builtins + * or inline assembly here. The full-vector parameter types of the builtins + * aren't very helpful for our purposes, so use inline assembly. 
+ */ +# if FLOAT_SIZE == 4 +# define to_u_int(type, x) ({ \ + unsigned type u_; \ + float __attribute__((vector_size(16))) t_; \ + asm ( "vcvtss2usi %1, %0" : "=r" (u_) : "m" ((x)[0]) ); \ + asm ( "vcvtusi2ss%z1 %1, %0, %0" : "=v" (t_) : "m" (u_) ); \ + (vec_t){ t_[0] }; \ +}) +# elif FLOAT_SIZE == 8 +# define to_u_int(type, x) ({ \ + unsigned type u_; \ + double __attribute__((vector_size(16))) t_; \ + asm ( "vcvtsd2usi %1, %0" : "=r" (u_) : "m" ((x)[0]) ); \ + asm ( "vcvtusi2sd%z1 %1, %0, %0" : "=v" (t_) : "m" (u_) ); \ + (vec_t){ t_[0] }; \ +}) +# endif +# define to_uint(x) to_u_int(int, x) +# ifdef __x86_64__ +# define to_uwint(x) to_u_int(long, x) +# endif +# endif #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) # if FLOAT_SIZE == 4 # define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# define to_uint(x) BR(cvtudq2ps, _mask, BR(cvtps2udq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) # ifdef __AVX512DQ__ -# define to_wint(x) ({ \ +# define to_w_int(x, s) ({ \ vsf_half_t t_ = low_half(x); \ vdi_t lo_, hi_; \ touch(t_); \ - lo_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + lo_ = BR(cvtps2 ## s ## qq, _mask, t_, (vdi_t)undef(), ~0); \ t_ = high_half(x); \ touch(t_); \ - hi_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + hi_ = BR(cvtps2 ## s ## qq, _mask, t_, (vdi_t)undef(), ~0); \ touch(lo_); touch(hi_); \ insert_half(insert_half(undef(), \ - BR(cvtqq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ - BR(cvtqq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ + BR(cvt ## s ## qq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ + BR(cvt ## s ## qq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ }) +# define to_wint(x) to_w_int(x, ) +# define to_uwint(x) to_w_int(x, u) # endif # elif FLOAT_SIZE == 8 # define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# define to_uint(x) B(cvtudq2pd, _mask, BR(cvtpd2udq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) # ifdef __AVX512DQ__ # define to_wint(x) BR(cvtqq2pd, _mask, BR(cvtpd2qq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) +# define to_uwint(x) BR(cvtuqq2pd, _mask, BR(cvtpd2uqq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) # endif # endif #elif VEC_SIZE == 16 && defined(__SSE2__) @@ -1221,6 +1255,20 @@ int simd_test(void) touch(src); if ( !eq(x, src) ) return __LINE__; # endif + +# ifdef to_uint + touch(src); + x = to_uint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif + +# ifdef to_uwint + touch(src); + x = to_uwint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif # ifdef sqrt x = src * src; --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -323,8 +323,7 @@ static const struct twobyte_table { [0x71 ... 0x73] = { DstImplicit|SrcImmByte|ModRM, simd_none, d8s_vl }, [0x74 ... 0x76] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x77] = { DstImplicit|SrcNone }, - [0x78] = { ImplicitOps|ModRM }, - [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, + [0x78 ... 0x79] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x7c ... 
0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, @@ -2523,6 +2522,8 @@ x86_decode_twobyte( break; case 0x78: + state->desc = ImplicitOps; + state->simd_size = simd_none; switch ( vex.pfx ) { case vex_66: /* extrq $imm8, $imm8, xmm */ @@ -2535,7 +2536,7 @@ x86_decode_twobyte( case 0x10 ... 0x18: case 0x28 ... 0x2f: case 0x50 ... 0x77: - case 0x79 ... 0x7d: + case 0x7a ... 0x7d: case 0x7f: case 0xc2 ... 0xc3: case 0xc5 ... 0xc6: @@ -2557,6 +2558,12 @@ x86_decode_twobyte( op_bytes = mode_64bit() ? 8 : 4; break; + case 0x79: + state->desc = DstReg | SrcMem; + state->simd_size = simd_packed_int; + ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); + break; + case 0x7e: ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); if ( vex.pfx == vex_f3 ) /* movq xmm/m64,xmm */ @@ -3074,6 +3081,18 @@ x86_decode( modrm_mod = 3; break; + case 0x78: + case 0x79: + if ( !evex.pfx ) + break; + /* vcvt{,t}ps2uqq need special casing */ + if ( evex.pfx == vex_66 ) + { + if ( !evex.w && !evex.brs ) + --disp8scale; + break; + } + /* vcvt{,t}s{s,d}2usi need special casing: fall through */ case 0x2c: /* vcvtts{s,d}2si need special casing */ case 0x2d: /* vcvts{s,d}2si need special casing */ if ( evex_encoded() ) @@ -6329,6 +6348,8 @@ x86_emulate( CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x78): /* vcvtts{s,d}2usi xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x79): /* vcvts{s,d}2usi xmm/mem,reg */ generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk || (ea.type != OP_REG && evex.brs)), EXC_UD); @@ -6690,7 +6711,11 @@ x86_emulate( if ( evex.w ) host_and_vcpu_must_have(avx512dq); else + { + case X86EMUL_OPC_EVEX(0x0f, 0x78): /* vcvttp{s,d}2udq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX(0x0f, 0x79): /* vcvtp{s,d}2udq [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512f); + } if ( ea.type != OP_REG || !evex.brs ) avx512_vlen_check(false); d |= TwoOp; @@ -7373,6 +7398,10 @@ x86_emulate( host_and_vcpu_must_have(avx512f); else if ( evex.w ) { + case X86EMUL_OPC_EVEX_66(0x0f, 0x78): /* vcvttps2uqq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvttpd2uqq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x79): /* vcvtps2uqq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtpd2uqq [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x7a): /* vcvttps2qq {x,y}mm/mem,[xyz]mm{k} */ /* vcvttpd2qq [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x7b): /* vcvtps2qq {x,y}mm/mem,[xyz]mm{k} */ From patchwork Fri Mar 15 10:54:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854495 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99C3A13B5 for ; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EF392A943 for ; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 732972A945; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org 
(lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A5AD62A943 for ; Fri, 15 Mar 2019 10:56:27 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kU5-000707-24; Fri, 15 Mar 2019 10:54:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kU3-0006zm-My for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:54:39 +0000 X-Inumbo-ID: b90803a8-4710-11e9-beb8-138a919fd9c5 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id b90803a8-4710-11e9-beb8-138a919fd9c5; Fri, 15 Mar 2019 10:54:36 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:35 -0600 Message-Id: <5C8B846B020000780021F214@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:35 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 25/50] x86emul: support remaining AVX512F legacy-equivalent insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Plus their AVX512BW counterparts. Take the opportunity and also eliminate a pair of open coded instances of scalar_1op(). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Re-base over changes earlier in the series. v5: New. 
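[Not part of the patch; background only.] The $0b1011 immediate used by the trunc() overrides in the hunks below follows the SDM-documented VRNDSCALE* imm8 layout: bits 1:0 hold the rounding control (11 = toward zero), bit 2 defers to MXCSR.RC when set, bit 3 suppresses the precision (#P) exception, and bits 7:4 give the fixed-point length M (round to a multiple of 2^-M). A minimal, purely illustrative C sketch of that decoding:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: the VRNDSCALE* imm8 fields. */
    struct rndscale_imm {
        unsigned int rc;    /* imm8[1:0]: 0 nearest, 1 down, 2 up, 3 toward zero */
        bool use_mxcsr;     /* imm8[2]: take the rounding mode from MXCSR.RC */
        bool suppress_pe;   /* imm8[3]: suppress the precision (#P) exception */
        unsigned int m;     /* imm8[7:4]: round to a multiple of 2^-M */
    };

    static inline struct rndscale_imm decode_rndscale_imm(uint8_t imm8)
    {
        return (struct rndscale_imm){
            .rc          = imm8 & 3,
            .use_mxcsr   = imm8 & 4,
            .suppress_pe = imm8 & 8,
            .m           = imm8 >> 4,
        };
    }

Hence 0b1011 requests plain truncation to an integer (M = 0, round toward zero) without raising #P, which is exactly what the harness's trunc() needs.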
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -193,6 +193,8 @@ static const struct test avx512f_all[] = INSN_PFP_NB(movu, 0f, 10), INSN_PFP_NB(movu, 0f, 11), INSN_FP(mul, 0f, 59), + INSN(pabsd, 66, 0f38, 1e, vl, d, vl), + INSN(pabsq, 66, 0f38, 1f, vl, q, vl), INSN(paddd, 66, 0f, fe, vl, d, vl), INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), @@ -276,6 +278,10 @@ static const struct test avx512f_all[] = INSN(punpckldq, 66, 0f, 62, vl, d, vl), INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), + INSN(rndscalepd, 66, 0f3a, 09, vl, q, vl), + INSN(rndscaleps, 66, 0f3a, 08, vl, d, vl), + INSN(rndscalesd, 66, 0f3a, 0b, el, q, el), + INSN(rndscaless, 66, 0f3a, 0a, el, d, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), @@ -336,6 +342,8 @@ static const struct test avx512bw_all[] INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), INSN(movdqu16, f2, 0f, 7f, vl, w, vl), + INSN(pabsb, 66, 0f38, 1c, vl, b, vl), + INSN(pabsw, 66, 0f38, 1d, vl, w, vl), INSN(packssdw, 66, 0f, 6b, vl, d_nb, vl), INSN(packsswb, 66, 0f, 63, vl, w, vl), INSN(packusdw, 66, 0f38, 2b, vl, d_nb, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -211,8 +211,10 @@ static inline vec_t movlhps(vec_t x, vec #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") +# define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") +# define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif #elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) @@ -263,6 +265,7 @@ static inline vec_t movlhps(vec_t x, vec # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) # define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# define trunc(x) BR(rndscaleps_, _mask, x, 0b1011, undef(), ~0) # define widen1(x) ((vec_t)BR(cvtps2pd, _mask, x, (vdf_t)undef(), ~0)) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) @@ -316,6 +319,7 @@ static inline vec_t movlhps(vec_t x, vec # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) # define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) +# define trunc(x) BR(rndscalepd_, _mask, x, 0b1011, undef(), ~0) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) @@ -548,6 +552,7 @@ static inline vec_t movlhps(vec_t x, vec # endif # endif # if INT_SIZE == 4 +# define abs(x) B(pabsd, _mask, x, undef(), ~0) # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) # define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) @@ -558,6 +563,7 @@ static inline vec_t movlhps(vec_t x, vec # define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) # define widen1(x) ((vec_t)B(pmovzxdq, _mask, (vsi_half_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 8 +# define abs(x) ((vec_t)B(pabsq, _mask, (vdi_t)(x), (vdi_t)undef(), ~0)) # define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # 
define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # elif UINT_SIZE == 8 @@ -625,6 +631,7 @@ static inline vec_t movlhps(vec_t x, vec # define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) # endif # if INT_SIZE == 1 +# define abs(x) ((vec_t)B(pabsb, _mask, (vqi_t)(x), (vqi_t)undef(), ~0)) # define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define widen1(x) ((vec_t)B(pmovsxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) @@ -637,6 +644,7 @@ static inline vec_t movlhps(vec_t x, vec # define widen2(x) ((vec_t)B(pmovzxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) # define widen3(x) ((vec_t)B(pmovzxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 2 +# define abs(x) B(pabsw, _mask, x, undef(), ~0) # define max(x, y) B(pmaxsw, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsw, _mask, x, y, undef(), ~0) # define mul_hi(x, y) B(pmulhw, _mask, x, y, undef(), ~0) @@ -948,19 +956,11 @@ static inline vec_t movlhps(vec_t x, vec #if VEC_SIZE == FLOAT_SIZE # define max(x, y) ((vec_t){({ typeof(x[0]) x_ = (x)[0], y_ = (y)[0]; x_ > y_ ? x_ : y_; })}) # define min(x, y) ((vec_t){({ typeof(x[0]) x_ = (x)[0], y_ = (y)[0]; x_ < y_ ? x_ : y_; })}) -# ifdef __SSE4_1__ +# if defined(__SSE4_1__) && !defined(__AVX512F__) # if FLOAT_SIZE == 4 -# define trunc(x) ({ \ - float __attribute__((vector_size(16))) r_; \ - asm ( "roundss $0b1011,%1,%0" : "=x" (r_) : "m" (x) ); \ - (vec_t){ r_[0] }; \ -}) +# define trunc(x) scalar_1op(x, "roundss $0b1011, %[in], %[out]") # elif FLOAT_SIZE == 8 -# define trunc(x) ({ \ - double __attribute__((vector_size(16))) r_; \ - asm ( "roundsd $0b1011,%1,%0" : "=x" (r_) : "m" (x) ); \ - (vec_t){ r_[0] }; \ -}) +# define trunc(x) scalar_1op(x, "roundsd $0b1011, %[in], %[out]") # endif # endif #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -184,6 +184,8 @@ DECL_OCTET(half); # define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask # define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask # define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask +# define __builtin_ia32_rndscalepd_512_mask __builtin_ia32_rndscalepd_mask +# define __builtin_ia32_rndscaleps_512_mask __builtin_ia32_rndscaleps_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask @@ -245,6 +247,7 @@ OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); OVR_VFP(cvtdq2); +OVR_INT(abs); OVR_FP(add); OVR_INT(add); OVR_BW(adds); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -446,7 +446,7 @@ static const struct ext0f38_table { [0x19] = { .simd_size = simd_scalar_opc, .two_op = 1, .d8s = 3 }, [0x1a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x1b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, - [0x1c ... 0x1e] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x1c ... 
0x1f] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x21] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, [0x22] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, @@ -531,8 +531,8 @@ static const struct ext0f3a_table { [0x02] = { .simd_size = simd_packed_int }, [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, - [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1 }, - [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, + [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc, .d8s = d8s_dq }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp }, [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, @@ -6917,6 +6917,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf9): /* vpsubw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfc): /* vpaddb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfd): /* vpaddw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); generate_exception_if(evex.brs, EXC_UD); elem_bytes = 1 << (b & 1); @@ -8303,6 +8305,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xfa): /* vpsubd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfb): /* vpsubq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfe): /* vpaddd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1e): /* vpabsd [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1f): /* vpabsq [xyz]mm/mem,[xyz]mm{k} */ generate_exception_if(evex.w != (b & 1), EXC_UD); goto avx512f_no_sae; @@ -9331,6 +9335,17 @@ x86_emulate( host_and_vcpu_must_have(sse4_1); goto simd_0f3a_common; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0a): /* vrndscaless $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0b): /* vrndscalesd $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x08): /* vrndscaleps $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x09): /* vrndscalepd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.w != (b & 1), EXC_UD); + avx512_vlen_check(b & 2); + goto simd_imm8_zmm; + case X86EMUL_OPC(0x0f3a, 0x0f): /* palignr $imm8,mm/m64,mm */ case X86EMUL_OPC_66(0x0f3a, 0x0f): /* palignr $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(ssse3); From patchwork Fri Mar 15 10:54:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854497 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 618951390 for ; Fri, 15 Mar 2019 10:56:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 480DC2A940 for ; Fri, 15 Mar 2019 10:56:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 39FA02A944; Fri, 15 Mar 2019 10:56:39 
+0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AF1182A940 for ; Fri, 15 Mar 2019 10:56:38 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kUQ-00075e-Gq; Fri, 15 Mar 2019 10:55:02 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kUO-00075K-UK for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:55:00 +0000 X-Inumbo-ID: c6ebc302-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id c6ebc302-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:54:59 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:59 -0600 Message-Id: <5C8B8482020000780021F217@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:58 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 26/50] x86emul: support remaining AVX512BW legacy-equivalent insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: Re-base. v5: New. 
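[Side note, not part of the patch.] The rotr() overrides added below rely on the usual per-lane PALIGNR behaviour: the two sources are concatenated, shifted right by the immediate number of bytes, and the low 16 bytes of each lane are kept. Feeding the same register as both sources therefore turns this into a right rotation. A hypothetical plain-C model of a single 128-bit lane, for illustration only:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative model of one 128-bit PALIGNR lane (not harness code). */
    static void palignr_lane(uint8_t dst[16], const uint8_t hi[16],
                             const uint8_t lo[16], unsigned int shift)
    {
        uint8_t tmp[32];
        unsigned int i;

        memcpy(tmp, lo, 16);       /* second source forms the low half */
        memcpy(tmp + 16, hi, 16);  /* first source forms the high half */

        for ( i = 0; i < 16; ++i )
            dst[i] = shift + i < 32 ? tmp[shift + i] : 0;
    }

With hi == lo == x and shift = n * sizeof(element), dst[i] == x[(shift + i) % 16], i.e. a rotation right by n elements - which is why rotr(x, n) below can be expressed as vpalignr with an element-scaled immediate.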
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -354,6 +354,7 @@ static const struct test avx512bw_all[] INSN(paddusb, 66, 0f, dc, vl, b, vl), INSN(paddusw, 66, 0f, dd, vl, w, vl), INSN(paddw, 66, 0f, fd, vl, w, vl), + INSN(palignr, 66, 0f3a, 0f, vl, b, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), INSN(pbroadcastb, 66, 0f38, 78, el, b, el), @@ -369,6 +370,7 @@ static const struct test avx512bw_all[] INSN(permw, 66, 0f38, 8d, vl, w, vl), INSN(permi2w, 66, 0f38, 75, vl, w, vl), INSN(permt2w, 66, 0f38, 7d, vl, w, vl), + INSN(pmaddubsw, 66, 0f38, 04, vl, b, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), INSN(pmaxsb, 66, 0f38, 3c, vl, b, vl), INSN(pmaxsw, 66, 0f, ee, vl, w, vl), @@ -386,6 +388,7 @@ static const struct test avx512bw_all[] // pmovw2m, f3, 0f38, 29, w INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), + INSN(pmulhrsw, 66, 0f38, 0b, vl, w, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -587,6 +587,7 @@ static inline vec_t movlhps(vec_t x, vec # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define rotr(x, n) ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), (n) * 8, (vdi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufb, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) # elif defined(__AVX512VBMI__) # define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) @@ -615,6 +616,7 @@ static inline vec_t movlhps(vec_t x, vec # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define rotr(x, n) ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), (n) * 16, (vdi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, \ (vsi_t)B(pshufhw, _mask, \ B(pshuflw, _mask, (vhi_t)(x), 0b00011011, (vhi_t)undef(), ~0), \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -402,9 +402,12 @@ OVR(packssdw); OVR(packsswb); OVR(packusdw); OVR(packuswb); +OVR(palignr); +OVR(pmaddubsw); OVR(pmaddwd); OVR(pmovsxbw); OVR(pmovzxbw); +OVR(pmulhrsw); OVR(pmulhuw); OVR(pmulhw); OVR(pmullw); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -435,7 +435,10 @@ static const struct ext0f38_table { disp8scale_t d8s:4; } ext0f38_table[256] = { [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x01 ... 0x03] = { .simd_size = simd_packed_int }, + [0x04] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x05 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x0b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, @@ -534,7 +537,8 @@ static const struct ext0f3a_table { [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc, .d8s = d8s_dq }, [0x0c ... 
0x0d] = { .simd_size = simd_packed_fp }, - [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, + [0x0e] = { .simd_size = simd_packed_int }, + [0x0f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, @@ -6899,6 +6903,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf5): /* vpmaddwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x00): /* vpshufb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x04): /* vpmaddubsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd5): /* vpmullw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -6917,6 +6922,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf9): /* vpsubw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfc): /* vpaddb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfd): /* vpaddw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); @@ -9374,6 +9380,10 @@ x86_emulate( insn_bytes = PFX_BYTES + 4; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0f): /* vpalignr $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + goto avx512bw_imm; + case X86EMUL_OPC_66(0x0f3a, 0x14): /* pextrb $imm8,xmm,r/m */ case X86EMUL_OPC_66(0x0f3a, 0x15): /* pextrw $imm8,xmm,r/m */ case X86EMUL_OPC_66(0x0f3a, 0x16): /* pextr{d,q} $imm8,xmm,r/m */ From patchwork Fri Mar 15 10:55:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854499 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE9301390 for ; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D371C2A943 for ; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C7D302A945; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DB7F12A943 for ; Fri, 15 Mar 2019 10:57:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kV5-0007Dx-SG; Fri, 15 Mar 2019 10:55:43 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kV4-0007Di-FK for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:55:42 +0000 X-Inumbo-ID: 
ddc1cc88-4710-11e9-bc2d-7bc4b87856ac Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id ddc1cc88-4710-11e9-bc2d-7bc4b87856ac; Fri, 15 Mar 2019 10:55:38 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:55:37 -0600 Message-Id: <5C8B84A8020000780021F23F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:55:36 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 27/50] x86emul: support AVX512{F, ER} reciprocal insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also include the only other AVX512ER insn pair, VEXP2P{D,S}. Note that despite the replacement of the SHA insns' table slots there's no need to special case their decoding: Their insn-specific code already sets op_bytes (as was required due to simd_other), and TwoOp is of no relevance for legacy encoded SIMD insns. The raising of #UD when EVEX.L'L is 3 for AVX512ER scalar insns is done to be on the safe side. The SDM does not clarify behavior there, and it's even more ambiguous here (without AVX512VL in the picture). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for AVX512ER insns. ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base. AVX512ER tests now also successfully run. v5: New. 
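[Background only; the helper below is hypothetical and not part of the patch.] The 14/28 in VRCP14*/VRCP28* and VRSQRT14*/VRSQRT28* denote the guaranteed maximum relative error of 2^-14 resp. 2^-28 of the returned approximation, and a single Newton-Raphson step roughly squares that error, which is how such seeds are normally sharpened:

    /*
     * Hypothetical helper, for background only: one Newton-Raphson step
     * refining an approximate reciprocal y ~= 1/a. The relative error
     * roughly squares per step, so a 2^-14 (VRCP14*) or 2^-28 (VRCP28*)
     * seed converges very quickly.
     */
    static inline double refine_recip(double a, double y)
    {
        return y * (2.0 - a * y);
    }

E.g. starting from y = 0.33 for a = 3, one step yields about 0.3333 and a second about 0.33333333.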
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -72,6 +72,9 @@ avx512bw-flts := avx512dq-vecs := $(avx512f-vecs) avx512dq-ints := $(avx512f-ints) avx512dq-flts := $(avx512f-flts) +avx512er-vecs := 64 +avx512er-ints := +avx512er-flts := 4 8 avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -278,10 +278,14 @@ static const struct test avx512f_all[] = INSN(punpckldq, 66, 0f, 62, vl, d, vl), INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), + INSN(rcp14, 66, 0f38, 4c, vl, sd, vl), + INSN(rcp14, 66, 0f38, 4d, el, sd, el), INSN(rndscalepd, 66, 0f3a, 09, vl, q, vl), INSN(rndscaleps, 66, 0f3a, 08, vl, d, vl), INSN(rndscalesd, 66, 0f3a, 0b, el, q, el), INSN(rndscaless, 66, 0f3a, 0a, el, d, el), + INSN(rsqrt14, 66, 0f38, 4e, vl, sd, vl), + INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), @@ -477,6 +481,14 @@ static const struct test avx512dq_512[] INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; +static const struct test avx512er_512[] = { + INSN(exp2, 66, 0f38, c8, vl, sd, vl), + INSN(rcp28, 66, 0f38, ca, vl, sd, vl), + INSN(rcp28, 66, 0f38, cb, el, sd, el), + INSN(rsqrt28, 66, 0f38, cc, vl, sd, vl), + INSN(rsqrt28, 66, 0f38, cd, el, sd, el), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -837,5 +849,6 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); + RUN(avx512er, 512); RUN(avx512_vbmi, all); } --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -210,9 +210,23 @@ static inline vec_t movlhps(vec_t x, vec }) #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 +# ifdef __AVX512ER__ +# define recip(x) scalar_1op(x, "vrcp28ss %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt28ss %[in], %[out], %[out]") +# else +# define recip(x) scalar_1op(x, "vrcp14ss %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt14ss %[in], %[out], %[out]") +# endif # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 +# ifdef __AVX512ER__ +# define recip(x) scalar_1op(x, "vrcp28sd %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt28sd %[in], %[out], %[out]") +# else +# define recip(x) scalar_1op(x, "vrcp14sd %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt14sd %[in], %[out], %[out]") +# endif # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif @@ -263,6 +277,13 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# if VEC_SIZE == 64 && defined(__AVX512ER__) +# define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) +# define rsqrt(x) BR(rsqrt28ps, _mask, x, undef(), ~0) +# 
else +# define recip(x) B(rcp14ps, _mask, x, undef(), ~0) +# define rsqrt(x) B(rsqrt14ps, _mask, x, undef(), ~0) +# endif # define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) # define trunc(x) BR(rndscaleps_, _mask, x, 0b1011, undef(), ~0) @@ -318,6 +339,13 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# if VEC_SIZE == 64 && defined(__AVX512ER__) +# define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) +# define rsqrt(x) BR(rsqrt28pd, _mask, x, undef(), ~0) +# else +# define recip(x) B(rcp14pd, _mask, x, undef(), ~0) +# define rsqrt(x) B(rsqrt14pd, _mask, x, undef(), ~0) +# endif # define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) # define trunc(x) BR(rndscalepd_, _mask, x, 0b1011, undef(), ~0) # if VEC_SIZE == 16 --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -178,14 +178,20 @@ DECL_OCTET(half); /* Sadly there are a few exceptions to the general naming rules. */ # define __builtin_ia32_broadcastf32x4_512_mask __builtin_ia32_broadcastf32x4_512 # define __builtin_ia32_broadcasti32x4_512_mask __builtin_ia32_broadcasti32x4_512 +# define __builtin_ia32_exp2pd512_mask __builtin_ia32_exp2pd_mask +# define __builtin_ia32_exp2ps512_mask __builtin_ia32_exp2ps_mask # define __builtin_ia32_insertf32x4_512_mask __builtin_ia32_insertf32x4_mask # define __builtin_ia32_insertf32x8_512_mask __builtin_ia32_insertf32x8_mask # define __builtin_ia32_insertf64x4_512_mask __builtin_ia32_insertf64x4_mask # define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask # define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask # define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask +# define __builtin_ia32_rcp28pd512_mask __builtin_ia32_rcp28pd_mask +# define __builtin_ia32_rcp28ps512_mask __builtin_ia32_rcp28ps_mask # define __builtin_ia32_rndscalepd_512_mask __builtin_ia32_rndscalepd_mask # define __builtin_ia32_rndscaleps_512_mask __builtin_ia32_rndscaleps_mask +# define __builtin_ia32_rsqrt28pd512_mask __builtin_ia32_rsqrt28pd_mask +# define __builtin_ia32_rsqrt28ps512_mask __builtin_ia32_rsqrt28ps_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -24,6 +24,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f.h" #include "avx512bw.h" #include "avx512dq.h" +#include "avx512er.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -106,6 +107,11 @@ static bool simd_check_avx512dq_vl(void) return cpu_has_avx512dq && cpu_has_avx512vl; } +static bool simd_check_avx512er(void) +{ + return cpu_has_avx512er; +} + static bool simd_check_avx512bw(void) { return cpu_has_avx512bw; @@ -327,6 +333,10 @@ static const struct { AVX512VL(DQ+VL u64x2, avx512dq, 16u8), AVX512VL(DQ+VL s64x4, avx512dq, 32i8), AVX512VL(DQ+VL u64x4, avx512dq, 32u8), + SIMD(AVX512ER f32 scalar,avx512er, f4), + SIMD(AVX512ER f32x16, avx512er, 64f4), + SIMD(AVX512ER f64 scalar,avx512er, f8), + SIMD(AVX512ER f64x8, avx512er, 64f8), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -134,6 +134,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_bmi2 cp.feat.bmi2 #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) +#define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -471,6 +471,10 @@ static const struct ext0f38_table { [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, @@ -510,7 +514,12 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, - [0xc8 ... 0xcd] = { .simd_size = simd_other }, + [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xc9] = { .simd_size = simd_other }, + [0xca] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xcb] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xdc ... 
0xdf] = { .simd_size = simd_packed_int }, [0xf0] = { .two_op = 1 }, @@ -1873,6 +1882,7 @@ static bool vcpu_has( #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) +#define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) @@ -6168,6 +6178,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x45): /* vpsrlv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x46): /* vpsrav{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x47): /* vpsllv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4c): /* vrcp14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4e): /* vrsqrt14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ avx512f_no_sae: host_and_vcpu_must_have(avx512f); generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); @@ -8865,6 +8877,13 @@ x86_emulate( generate_exception_if(vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_VEX_66(0x0f38, 0x5a): /* vbroadcasti128 m128,ymm */ generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; @@ -9112,6 +9131,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ host_and_vcpu_must_have(avx512f); + simd_zmm_scalar_sae: generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); if ( !evex.brs ) avx512_vlen_check(true); @@ -9127,6 +9147,19 @@ x86_emulate( op_bytes = 16; goto simd_0f38_common; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc8): /* vexp2p{s,d} zmm/mem,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xca): /* vrcp28p{s,d} zmm/mem,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcc): /* vrsqrt28p{s,d} zmm/mem,zmm{k} */ + host_and_vcpu_must_have(avx512er); + generate_exception_if((ea.type != OP_REG || !evex.brs) && evex.lr != 2, + EXC_UD); + goto simd_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcb): /* vrcp28s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcd): /* vrsqrt28s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512er); + goto simd_zmm_scalar_sae; + case X86EMUL_OPC(0x0f38, 0xf0): /* movbe m,r */ case X86EMUL_OPC(0x0f38, 0xf1): /* movbe r,m */ vcpu_must_have(movbe); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -102,6 +102,7 @@ #define cpu_has_avx512dq boot_cpu_has(X86_FEATURE_AVX512DQ) #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) +#define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) #define cpu_has_avx512bw boot_cpu_has(X86_FEATURE_AVX512BW) #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) From patchwork Fri Mar 15 10:56:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854501 Return-Path: Received: from mail.wl.linuxfoundation.org 
(pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D90013B5 for ; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 42A862A946 for ; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 368642A949; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 54E292A946 for ; Fri, 15 Mar 2019 10:57:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVV-0007Iw-7U; Fri, 15 Mar 2019 10:56:09 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVU-0007If-4i for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:56:08 +0000 X-Inumbo-ID: ee9f13c0-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ee9f13c0-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:56:06 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:05 -0600 Message-Id: <5C8B84C5020000780021F242@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:05 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 28/50] x86emul: support AVX512F floating point manipulation insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for scalar insns. ea.type == OP_* -> ea.type != OP_*. Re-base. v5: New. 
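[Not part of the patch; a hedged sketch only.] The simd.c additions below exercise the round-trip scale(getmant(src), getexp(src)) == src. A plain-C model of that identity, restricted to positive, normal, finite inputs and the default imm8=0 normalization interval [1,2) - the real insns additionally define zero/negative/NaN handling and other sign/interval controls, which this sketch ignores:

    #include <math.h>

    /* Illustrative models only - positive, normal, finite x assumed. */
    static double getexp_model(double x)           /* VGETEXP: floor(log2(x)) */
    {
        int e;

        frexp(x, &e);                   /* x = m * 2^e, m in [0.5, 1) */
        return e - 1;
    }

    static double getmant_model(double x)          /* VGETMANT, imm8 = 0 */
    {
        int e;

        return 2 * frexp(x, &e);        /* mantissa normalized into [1, 2) */
    }

    static double scalef_model(double x, double y) /* VSCALEF: x * 2^floor(y) */
    {
        return ldexp(x, (int)floor(y));
    }

For such inputs scalef_model(getmant_model(x), getexp_model(x)) == x, which is the identity the new test relies on when comparing scale(x, y) against src.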
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -140,6 +140,8 @@ static const struct test avx512f_all[] = INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), + INSN(fixupimm, 66, 0f3a, 54, vl, sd, vl), + INSN(fixupimm, 66, 0f3a, 55, el, sd, el), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), INSN(fmadd213, 66, 0f38, a8, vl, sd, vl), @@ -170,6 +172,10 @@ static const struct test avx512f_all[] = INSN(fnmsub213, 66, 0f38, af, el, sd, el), INSN(fnmsub231, 66, 0f38, be, vl, sd, vl), INSN(fnmsub231, 66, 0f38, bf, el, sd, el), + INSN(getexp, 66, 0f38, 42, vl, sd, vl), + INSN(getexp, 66, 0f38, 43, el, sd, el), + INSN(getmant, 66, 0f3a, 26, vl, sd, vl), + INSN(getmant, 66, 0f3a, 27, el, sd, el), INSN_FP(max, 0f, 5f), INSN_FP(min, 0f, 5d), INSN_SFP(mov, 0f, 10), @@ -286,6 +292,8 @@ static const struct test avx512f_all[] = INSN(rndscaless, 66, 0f3a, 0a, el, d, el), INSN(rsqrt14, 66, 0f38, 4e, vl, sd, vl), INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), + INSN(scalef, 66, 0f38, 2c, vl, sd, vl), + INSN(scalef, 66, 0f38, 2d, el, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -174,6 +174,11 @@ static inline bool _to_bool(byte_vec_t b asm ( op : [out] "=&x" (r_) : [in] "m" (x) ); \ (vec_t){ r_[0] }; \ }) +# define scalar_2op(x, y, op) ({ \ + typeof((x)[0]) __attribute__((vector_size(16))) r_ = { x[0] }; \ + asm ( op : [out] "=&x" (r_) : [in1] "[out]" (r_), [in2] "m" (y) ); \ + (vec_t){ r_[0] }; \ +}) #endif #if VEC_SIZE == 16 && FLOAT_SIZE == 4 && defined(__SSE__) @@ -210,6 +215,8 @@ static inline vec_t movlhps(vec_t x, vec }) #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 +# define getexp(x) scalar_1op(x, "vgetexpss %[in], %[out], %[out]") +# define getmant(x) scalar_1op(x, "vgetmantss $0, %[in], %[out], %[out]") # ifdef __AVX512ER__ # define recip(x) scalar_1op(x, "vrcp28ss %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt28ss %[in], %[out], %[out]") @@ -217,9 +224,12 @@ static inline vec_t movlhps(vec_t x, vec # define recip(x) scalar_1op(x, "vrcp14ss %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt14ss %[in], %[out], %[out]") # endif +# define scale(x, y) scalar_2op(x, y, "vscalefss %[in2], %[in1], %[out]") # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 +# define getexp(x) scalar_1op(x, "vgetexpsd %[in], %[out], %[out]") +# define getmant(x) scalar_1op(x, "vgetmantsd $0, %[in], %[out], %[out]") # ifdef __AVX512ER__ # define recip(x) scalar_1op(x, "vrcp28sd %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt28sd %[in], %[out], %[out]") @@ -227,6 +237,7 @@ static inline vec_t movlhps(vec_t x, vec # define recip(x) scalar_1op(x, "vrcp14sd %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt14sd %[in], %[out], %[out]") # endif +# define scale(x, y) scalar_2op(x, y, "vscalefsd %[in2], %[in1], %[out]") # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif @@ -274,9 +285,12 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) # define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) # 
endif +# define getexp(x) BR(getexpps, _mask, x, undef(), ~0) +# define getmant(x) BR(getmantps, _mask, x, 0, undef(), ~0) # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) # define rsqrt(x) BR(rsqrt28ps, _mask, x, undef(), ~0) @@ -336,9 +350,12 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) # define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) # endif +# define getexp(x) BR(getexppd, _mask, x, undef(), ~0) +# define getmant(x) BR(getmantpd, _mask, x, 0, undef(), ~0) # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) # define rsqrt(x) BR(rsqrt28pd, _mask, x, undef(), ~0) @@ -1766,6 +1783,28 @@ int simd_test(void) # endif #endif +#if defined(getexp) && defined(getmant) + touch(src); + x = getmant(src); + touch(src); + y = getexp(src); + touch(src); + for ( j = i = 0; i < ELEM_COUNT; ++i ) + { + if ( y[i] != j ) return __LINE__; + + if ( !((i + 1) & (i + 2)) ) + ++j; + + if ( !(i & (i + 1)) && x[i] != 1 ) return __LINE__; + } +# ifdef scale + touch(y); + z = scale(x, y); + if ( !eq(src, z) ) return __LINE__; +# endif +#endif + #if (defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4)) || \ (defined(__AVX512F__) && defined(FLOAT_SIZE)) return -fma_test(); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3924,6 +3924,44 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing vfixupimmpd $0,8(%edx){1to8},%zmm3,%zmm4..."); + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(vfixupimmpd); + static const struct { + double d[4]; + } + src = { { -1, 0, 1, 2 } }, + dst = { { 3, 4, 5, 6 } }, + out = { { .5, -1, 90, 2 } }; + + asm volatile ( "vbroadcastf64x4 %1, %%zmm3\n\t" + "vbroadcastf64x4 %2, %%zmm4\n" + put_insn(vfixupimmpd, + "vfixupimmpd $0, 8(%0)%{1to8%}, %%zmm3, %%zmm4") + :: "d" (NULL), "m" (src), "m" (dst) ); + + set_insn(vfixupimmpd); + /* + * Nibble (token) mapping (unused ones simply set to zero): + * 2 (ZERO) -> -1 (0x9) + * 3 (POS_ONE) -> 90 (0xc) + * 6 (NEG) -> 1/2 (0xb) + * 7 (POS) -> src (0x1) + */ + res[2] = 0x1b00c900; + regs.edx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + asm volatile ( "vmovupd %%zmm4, %0" : "=m" (res[0]) ); + if ( rc != X86EMUL_OKAY || !check_eip(vfixupimmpd) || + memcmp(res + 0, &out, sizeof(out)) || + memcmp(res + 8, &out, sizeof(out)) ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -459,7 +459,8 @@ static const struct ext0f38_table { [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x2b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x2c ... 
0x2d] = { .simd_size = simd_packed_fp }, + [0x2c] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x2d] = { .simd_size = simd_packed_fp, .d8s = d8s_dq }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x31] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, @@ -470,6 +471,8 @@ static const struct ext0f38_table { [0x36 ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x42] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x43] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -563,6 +566,8 @@ static const struct ext0f3a_table { [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, [0x23] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x26] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x27] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128, .d8s = 4 }, [0x3a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, @@ -577,6 +582,8 @@ static const struct ext0f3a_table { [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4c] = { .simd_size = simd_packed_int, .four_op = 1 }, + [0x54] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x55] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x5c ... 0x5f] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x60 ... 0x63] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x68 ... 
0x69] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -2684,6 +2691,10 @@ x86_decode_0f38( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); break; + case X86EMUL_OPC_EVEX_66(0, 0x2d): /* vscalefs{s,d} */ + state->simd_size = simd_scalar_vexw; + break; + case X86EMUL_OPC_EVEX_66(0, 0x7a): /* vpbroadcastb */ case X86EMUL_OPC_EVEX_66(0, 0x7b): /* vpbroadcastw */ case X86EMUL_OPC_EVEX_66(0, 0x7c): /* vpbroadcast{d,q} */ @@ -9095,6 +9106,8 @@ x86_emulate( host_and_vcpu_must_have(fma); goto simd_0f_ymm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -9118,6 +9131,8 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ @@ -9681,6 +9696,21 @@ x86_emulate( op_bytes = 4; goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x26): /* vgetmantp{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x54): /* vfixupimmp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + goto simd_imm8_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x27): /* vgetmants{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x55): /* vfixupimms{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + if ( !evex.brs ) + avx512_vlen_check(true); + goto simd_imm8_zmm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x30): /* kshiftr{b,w} $imm8,k,k */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x32): /* kshiftl{b,w} $imm8,k,k */ if ( !vex.w ) From patchwork Fri Mar 15 10:56:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854503 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EDCF71390 for ; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2EC02A947 for ; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C45CB2A949; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2488B2A947 for ; Fri, 15 Mar 
2019 10:58:34 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVw-0007Pl-N9; Fri, 15 Mar 2019 10:56:36 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVv-0007PF-53 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:56:35 +0000 X-Inumbo-ID: fd04ddd8-4710-11e9-898b-07180b6782ad Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id fd04ddd8-4710-11e9-898b-07180b6782ad; Fri, 15 Mar 2019 10:56:30 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:29 -0600 Message-Id: <5C8B84DC020000780021F245@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:28 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 29/50] x86emul: support AVX512DQ floating point manipulation insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512DQ in the insn emulator. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for scalar insns. Re-base. v5: New. 
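As an aside to the vfpclasspsz test added in the test_x86_emulator.c hunk below: the expected k-mask value 0xbdef can be worked out by hand. The standalone sketch below is not part of the patch. It assumes the usual imm8 class-bit assignment for VFPCLASS (bit 1 = +0, bit 2 = -0, bit 6 = finite negative, so 0x46 selects exactly the classes named in the test's comment), and fpclass_bit() is a made-up helper which only knows about the five bit patterns the test actually stores.

#include <stdint.h>
#include <stdio.h>

/* Made-up classifier, covering just the five test input patterns. */
static unsigned int fpclass_bit(uint32_t bits)
{
    if ( bits == 0x00000000 ) return 1u << 1;  /* +0 */
    if ( bits == 0x80000000 ) return 1u << 2;  /* -0 */
    if ( bits & 0x80000000 )  return 1u << 6;  /* negative (finite in this data set) */
    return 0;                                  /* +FIN */
}

int main(void)
{
    /* Same element pattern the test places at res[16]..res[30]; res[31] is +0. */
    static const uint32_t elem[5] =
        { 0x00000000, 0x80000000, 0x80000001, 0xff000000, 0x7f000000 };
    unsigned int mask = 0, i;

    for ( i = 0; i < 16; ++i )
        if ( fpclass_bit(elem[i % 5]) & 0x46 )
            mask |= 1u << i;

    printf("%#x\n", mask); /* 0xbdef - only the +FIN elements 4, 9 and 14 drop out */
    return 0;
}

Note in particular that the negative denormal (0x80000001) is reported via the "negative" class even though the denormal class bit (bit 5) is clear in the immediate, which is what puts elements 2, 7 and 12 into the expected mask.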
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -457,11 +457,17 @@ static const struct test avx512dq_all[] INSN(cvttps2uqq, 66, 0f, 78, vl_2, d, vl), INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), + INSN(fpclass, 66, 0f3a, 66, vl, sd, vl), + INSN(fpclass, 66, 0f3a, 67, el, sd, el), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq // pmovq2m, f3, 0f38, 39, q INSN(pmullq, 66, 0f38, 40, vl, q, vl), + INSN(range, 66, 0f3a, 50, vl, sd, vl), + INSN(range, 66, 0f3a, 51, el, sd, el), + INSN(reduce, 66, 0f3a, 56, vl, sd, vl), + INSN(reduce, 66, 0f3a, 57, el, sd, el), INSN_PFP(xor, 0f, 57), }; --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -285,10 +285,18 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) # define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) # endif +# ifdef __AVX512DQ__ +# define frac(x) B(reduceps, _mask, x, 0b00001011, undef(), ~0) +# endif # define getexp(x) BR(getexpps, _mask, x, undef(), ~0) # define getmant(x) BR(getmantps, _mask, x, 0, undef(), ~0) -# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) -# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# ifdef __AVX512DQ__ +# define max(x, y) BR(rangeps, _mask, x, y, 0b0101, undef(), ~0) +# define min(x, y) BR(rangeps, _mask, x, y, 0b0100, undef(), ~0) +# else +# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# endif # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) # define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) @@ -350,10 +358,18 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) # define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) # endif +# ifdef __AVX512DQ__ +# define frac(x) B(reducepd, _mask, x, 0b00001011, undef(), ~0) +# endif # define getexp(x) BR(getexppd, _mask, x, undef(), ~0) # define getmant(x) BR(getmantpd, _mask, x, 0, undef(), ~0) -# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) -# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# ifdef __AVX512DQ__ +# define max(x, y) BR(rangepd, _mask, x, y, 0b0101, undef(), ~0) +# define min(x, y) BR(rangepd, _mask, x, y, 0b0100, undef(), ~0) +# else +# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# endif # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) # define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3962,6 +3962,39 @@ int main(int argc, char **argv) else printf("skipped\n"); + + printf("%-40s", "Testing vfpclasspsz $0x46,64(%edx),%k2..."); + if ( stack_exec && cpu_has_avx512dq ) + { + decl_insn(vfpclassps); + + asm volatile ( put_insn(vfpclassps, + /* 0x46: check for +/- 0 and neg. 
*/ + "vfpclasspsz $0x46, 64(%0), %%k2") + :: "d" (NULL) ); + + set_insn(vfpclassps); + for ( i = 0; i < 3; ++i ) + { + res[16 + i * 5 + 0] = 0x00000000; /* +0 */ + res[16 + i * 5 + 1] = 0x80000000; /* -0 */ + res[16 + i * 5 + 2] = 0x80000001; /* -DEN */ + res[16 + i * 5 + 3] = 0xff000000; /* -FIN */ + res[16 + i * 5 + 4] = 0x7f000000; /* +FIN */ + } + res[31] = 0; + regs.edx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vfpclassps) ) + goto fail; + asm volatile ( "kmovw %%k2, %0" : "=g" (rc) ); + if ( rc != 0xbdef ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -582,10 +582,16 @@ static const struct ext0f3a_table { [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4c] = { .simd_size = simd_packed_int, .four_op = 1 }, + [0x50] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x51] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x54] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x55] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x56] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x57] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x5c ... 0x5f] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x60 ... 0x63] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x66] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x67] = { .simd_size = simd_scalar_vexw, .two_op = 1, .d8s = d8s_dq }, [0x68 ... 0x69] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x6a ... 0x6b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x6c ... 
0x6d] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -9696,6 +9702,10 @@ x86_emulate( op_bytes = 4; goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x50): /* vrangep{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x56): /* vreducep{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512dq); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x26): /* vgetmantp{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x54): /* vfixupimmp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ host_and_vcpu_must_have(avx512f); @@ -9703,6 +9713,10 @@ x86_emulate( avx512_vlen_check(false); goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x51): /* vranges{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x57): /* vreduces{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512dq); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x27): /* vgetmants{s,d} $imm8,xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x55): /* vfixupimms{s,d} $imm8,xmm/mem,xmm,xmm{k} */ host_and_vcpu_must_have(avx512f); @@ -9858,6 +9872,16 @@ x86_emulate( dst.type = OP_NONE; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x66): /* vfpclassp{s,d} $imm8,[xyz]mm/mem,k{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x67): /* vfpclasss{s,d} $imm8,[xyz]mm/mem,k{k} */ + host_and_vcpu_must_have(avx512dq); + generate_exception_if(!evex.r || !evex.R || evex.z, EXC_UD); + if ( !(b & 1) ) + goto avx512f_imm8_no_sae; + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_imm8_zmm; + case X86EMUL_OPC(0x0f3a, 0xcc): /* sha1rnds4 $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(sha); op_bytes = 16; From patchwork Fri Mar 15 10:56:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854513 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9F501390 for ; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9DF702A947 for ; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9209E2A949; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 80C692A947 for ; Fri, 15 Mar 2019 10:59:01 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kWM-0007WA-1S; Fri, 15 Mar 2019 10:57:02 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kWK-0007Vw-Os for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:57:00 +0000 X-Inumbo-ID: 0dd1094e-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 0dd1094e-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:56:58 
+0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:58 -0600 Message-Id: <5C8B84F8020000780021F248@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:56 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 30/50] x86emul: support AVX512{F, _VBMI2} compress/expand insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v7: Re-base. v6: Re-base. Add tests for the byte/word forms. v5: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,6 +109,7 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(compress, 66, 0f38, 8a, vl, sd, el), INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), @@ -140,6 +141,7 @@ static const struct test avx512f_all[] = INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), + INSN(expand, 66, 0f38, 88, vl, sd, el), INSN(fixupimm, 66, 0f3a, 54, vl, sd, vl), INSN(fixupimm, 66, 0f3a, 55, el, sd, el), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), @@ -214,6 +216,7 @@ static const struct test avx512f_all[] = INSN(pcmpgtd, 66, 0f, 66, vl, d, vl), INSN(pcmpgtq, 66, 0f38, 37, vl, q, vl), INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), + INSN(pcompress, 66, 0f38, 8b, vl, dq, el), INSN(permi2, 66, 0f38, 76, vl, dq, vl), INSN(permi2, 66, 0f38, 77, vl, sd, vl), INSN(permilpd, 66, 0f38, 0d, vl, q, vl), @@ -222,6 +225,7 @@ static const struct test avx512f_all[] = INSN(permilps, 66, 0f3a, 04, vl, d, vl), INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), + INSN(pexpand, 66, 0f38, 89, vl, dq, el), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -509,6 +513,11 @@ static const struct test avx512_vbmi_all INSN(permt2b, 66, 0f38, 7d, vl, b, vl), }; +static const struct test avx512_vbmi2_all[] = { + INSN(pcompress, 66, 0f38, 63, vl, bw, el), + INSN(pexpand, 66, 0f38, 62, vl, bw, el), +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -865,4 +874,5 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 512); RUN(avx512er, 512); RUN(avx512_vbmi, all); + RUN(avx512_vbmi2, all); } --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3995,6 +3995,227 @@ int main(int argc, char **argv) else printf("skipped\n"); + /* + * The following compress/expand tests are not only making sure the + * accessed data is correct, but they also verify (by placing operands + * on the mapping boundaries) that elements controlled by clear mask + * bits don't get accessed. 
+ */ + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(vpcompressd); + decl_insn(vpcompressq); + decl_insn(vpexpandd); + decl_insn(vpexpandq); + static const struct { + unsigned int d[16]; + } dsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } }; + static const struct { + unsigned long long q[8]; + } qsrc = { { 0, 1, 2, 3, 4, 5, 6, 7 } }; + unsigned int *ptr = res + MMAP_SZ / sizeof(*res) - 32; + + printf("%-40s", "Testing vpcompressd %zmm1,24*4(%ecx){%k2}..."); + asm volatile ( "kmovw %1, %%k2\n\t" + "vmovdqu32 %2, %%zmm1\n" + put_insn(vpcompressd, + "vpcompressd %%zmm1, 24*4(%0)%{%%k2%}") + :: "c" (NULL), "r" (0x55aa), "m" (dsrc) ); + + memset(ptr, 0xdb, 32 * 4); + set_insn(vpcompressd); + regs.ecx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressd) || + memcmp(ptr, ptr + 8, 16 * 4) ) + goto fail; + for ( i = 0; i < 4; ++i ) + if ( ptr[24 + i] != 2 * i + 1 ) + goto fail; + for ( ; i < 8; ++i ) + if ( ptr[24 + i] != 2 * i ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpexpandd 8*4(%edx),%zmm3{%k2}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm3, %%zmm3, %%zmm3\n" + put_insn(vpexpandd, + "vpexpandd 8*4(%0), %%zmm3%{%%k2%}%{z%}") + :: "d" (NULL) ); + set_insn(vpexpandd); + regs.edx = (unsigned long)(ptr + 16); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandd) ) + goto fail; + asm ( "vmovdqa32 %%zmm1, %%zmm2%{%%k2%}%{z%}\n\t" + "vpcmpeqd %%zmm2, %%zmm3, %%k0\n\t" + "kmovw %%k0, %0" + : "=r" (rc) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpcompressq %zmm4,12*8(%edx){%k3}..."); + asm volatile ( "kmovw %1, %%k3\n\t" + "vmovdqu64 %2, %%zmm4\n" + put_insn(vpcompressq, + "vpcompressq %%zmm4, 12*8(%0)%{%%k3%}") + :: "d" (NULL), "r" (0x5a), "m" (qsrc) ); + + memset(ptr, 0xdb, 16 * 8); + set_insn(vpcompressq); + regs.edx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressq) || + memcmp(ptr, ptr + 8, 8 * 8) ) + goto fail; + for ( i = 0; i < 2; ++i ) + { + if ( ptr[(12 + i) * 2] != 2 * i + 1 || + ptr[(12 + i) * 2 + 1] ) + goto fail; + } + for ( ; i < 4; ++i ) + { + if ( ptr[(12 + i) * 2] != 2 * i || + ptr[(12 + i) * 2 + 1] ) + goto fail; + } + printf("okay\n"); + + printf("%-40s", "Testing vpexpandq 4*8(%ecx),%zmm5{%k3}{z}..."); + asm volatile ( "vpternlogq $0x81, %%zmm5, %%zmm5, %%zmm5\n" + put_insn(vpexpandq, + "vpexpandq 4*8(%0), %%zmm5%{%%k3%}%{z%}") + :: "c" (NULL) ); + set_insn(vpexpandq); + regs.ecx = (unsigned long)(ptr + 16); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandq) ) + goto fail; + asm ( "vmovdqa64 %%zmm4, %%zmm6%{%%k3%}%{z%}\n\t" + "vpcmpeqq %%zmm5, %%zmm6, %%k0\n\t" + "kmovw %%k0, %0" + : "=r" (rc) ); + if ( rc != 0xff ) + goto fail; + printf("okay\n"); + } + +#if __GNUC__ > 7 /* can't check for __AVX512VBMI2__ here */ + if ( stack_exec && cpu_has_avx512_vbmi2 ) + { + decl_insn(vpcompressb); + decl_insn(vpcompressw); + decl_insn(vpexpandb); + decl_insn(vpexpandw); + static const struct { + unsigned char b[64]; + } bsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 48, 49, 50, 51, 52, 53, 54, 55, + 56, 57, 58, 59, 60, 61, 62, 63 } }; + static const struct { + unsigned short w[32]; + } wsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 
17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31 } }; + unsigned char *ptr = (void *)res + MMAP_SZ - 128; + unsigned long long w = 0x55555555aaaaaaaaULL; + + printf("%-40s", "Testing vpcompressb %zmm1,96*1(%ecx){%k2}..."); + asm volatile ( "kmovq %1, %%k2\n\t" + "vmovdqu8 %2, %%zmm1\n" + put_insn(vpcompressb, + "vpcompressb %%zmm1, 96*1(%0)%{%%k2%}") + :: "c" (NULL), "m" (w), "m" (bsrc) ); + + memset(ptr, 0xdb, 128 * 1); + set_insn(vpcompressb); + regs.ecx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressb) || + memcmp(ptr, ptr + 32, 64 * 1) ) + goto fail; + for ( i = 0; i < 16; ++i ) + if ( ptr[96 + i] != 2 * i + 1 ) + goto fail; + for ( ; i < 32; ++i ) + if ( ptr[96 + i] != 2 * i ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpexpandb 32*1(%edx),%zmm3{%k2}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm3, %%zmm3, %%zmm3\n" + put_insn(vpexpandb, + "vpexpandb 32*1(%0), %%zmm3%{%%k2%}%{z%}") + :: "d" (NULL) ); + set_insn(vpexpandb); + regs.edx = (unsigned long)(ptr + 64); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandb) ) + goto fail; + asm ( "vmovdqu8 %%zmm1, %%zmm2%{%%k2%}%{z%}\n\t" + "vpcmpeqb %%zmm2, %%zmm3, %%k0\n\t" + "kmovq %%k0, %0" + : "=m" (w) ); + if ( w != 0xffffffffffffffffULL ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpcompressw %zmm4,48*2(%edx){%k3}..."); + asm volatile ( "kmovd %1, %%k3\n\t" + "vmovdqu16 %2, %%zmm4\n" + put_insn(vpcompressw, + "vpcompressw %%zmm4, 48*2(%0)%{%%k3%}") + :: "d" (NULL), "r" (0x5555aaaa), "m" (wsrc) ); + + memset(ptr, 0xdb, 64 * 2); + set_insn(vpcompressw); + regs.edx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressw) || + memcmp(ptr, ptr + 32, 32 * 2) ) + goto fail; + for ( i = 0; i < 8; ++i ) + { + if ( ptr[(48 + i) * 2] != 2 * i + 1 || + ptr[(48 + i) * 2 + 1] ) + goto fail; + } + for ( ; i < 16; ++i ) + { + if ( ptr[(48 + i) * 2] != 2 * i || + ptr[(48 + i) * 2 + 1] ) + goto fail; + } + printf("okay\n"); + + printf("%-40s", "Testing vpexpandw 16*2(%ecx),%zmm5{%k3}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm5, %%zmm5, %%zmm5\n" + put_insn(vpexpandw, + "vpexpandw 16*2(%0), %%zmm5%{%%k3%}%{z%}") + :: "c" (NULL) ); + set_insn(vpexpandw); + regs.ecx = (unsigned long)(ptr + 64); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandw) ) + goto fail; + asm ( "vmovdqu16 %%zmm4, %%zmm6%{%%k3%}%{z%}\n\t" + "vpcmpeqw %%zmm5, %%zmm6, %%k0\n\t" + "kmovq %%k0, %0" + : "=m" (w) ); + if ( w != 0xffffffff ) + goto fail; + printf("okay\n"); + } +#endif + #undef decl_insn #undef put_insn #undef set_insn --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -59,6 +59,9 @@ (type *)((char *)mptr__ - offsetof(type, member)); \ }) +#define hweight32 __builtin_popcount +#define hweight64 __builtin_popcountll + #define is_canonical_address(x) (((int64_t)(x) >> 47) == ((int64_t)(x) >> 63)) extern uint32_t mxcsr_mask; @@ -138,6 +141,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) +#define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ 
b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -482,6 +482,8 @@ static const struct ext0f38_table { [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, + [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -489,6 +491,10 @@ static const struct ext0f38_table { [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x88] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_dq }, + [0x89] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_dq }, + [0x8a] = { .simd_size = simd_packed_fp, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, + [0x8b] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, @@ -1900,6 +1906,7 @@ static bool vcpu_has( #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) +#define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8905,6 +8912,36 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x62): /* vpexpand{b,w} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x63): /* vpcompress{b,w} [xyz]mm,[xyz]mm/mem{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + elem_bytes = 1 << evex.w; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x88): /* vexpandp{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x89): /* vpexpand{d,q} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8a): /* vcompressp{s,d} [xyz]mm,[xyz]mm/mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8b): /* vpcompress{d,q} [xyz]mm,[xyz]mm/mem{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(false); + /* + * For the respective code below the main switch() to work we need to + * compact op_mask here: Memory accesses are non-sparse even if the + * mask register has sparsely set bits. + */ + if ( likely(fault_suppression) ) + { + n = 1 << ((b & 8 ? 
2 : 4) + evex.lr - evex.w); + EXPECT(elem_bytes > 0); + ASSERT(op_bytes == n * elem_bytes); + op_mask &= ~0ULL >> (64 - n); + n = hweight64(op_mask); + op_bytes = n * elem_bytes; + if ( n ) + op_mask = ~0ULL >> (64 - n); + } + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -109,6 +109,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) +#define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -228,6 +228,7 @@ XEN_CPUFEATURE(AVX512_VBMI, 6*32+ 1) / XEN_CPUFEATURE(UMIP, 6*32+ 2) /*S User Mode Instruction Prevention */ XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ +XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -266,10 +266,10 @@ def crunch_numbers(state): AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, AVX512_VPOPCNTDQ], - # AVX512 extensions acting solely on vectors of bytes/words are made + # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask # registers), despite the SDM not formally making this connection. 
- AVX512BW: [AVX512_VBMI], + AVX512BW: [AVX512_VBMI, AVX512_VBMI2], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 10:58:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854515 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4F5A01575 for ; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30D242A6AF for ; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1F3122A948; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 473092A6AF for ; Fri, 15 Mar 2019 11:00:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kXg-0007mS-TL; Fri, 15 Mar 2019 10:58:24 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kXf-0007m6-Ux for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:58:23 +0000 X-Inumbo-ID: 3fabcaf7-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 3fabcaf7-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:58:22 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:58:21 -0600 Message-Id: <5C8B854E020000780021F24B@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:58:22 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 31/50] x86emul: support remaining misc AVX512{F, BW} insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512BW in the insn emulator, and leaves just the scatter/gather ones open in the AVX512F set. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v5: New. --- TBD: The *blendm* inline functions don't reliably produce the intended insns, as the respective moves are about as good a fit for the compiler when looking for a match for the intended operation. We'd need to switch to inline assembly if we wanted to guarantee the testing of those insns. Thoughts? 
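Regarding the TBD remark: a minimal sketch of what an inline-assembly mix() could look like for the packed-single, 512-bit case in simd.c (the other element widths would follow the same pattern). This is only meant to illustrate the idea, not as a drop-in replacement - it hard-codes the 16-element mask width, and it assumes GCC's "v" (EVEX-encodable vector register) and "Yk" (mask register other than %k0) constraints are usable here; vec_t and ALL_TRUE are the type/macro simd.c already provides.

# define mix(x, y) ({ \
    vec_t r_; \
    unsigned short m_ = 0b1010101010101010 & ALL_TRUE; \
    /* Spell out vblendmps, so the compiler can't pick a masked vmovaps instead. */ \
    asm ( "vblendmps %2, %1, %0%{%3%}" \
          : "=v" (r_) \
          : "v" (x), "v" (y), "Yk" (m_) ); \
    r_; \
})

Whether to convert all of the element-size variants this way, or only the ones the compiler currently turns into plain masked moves, is exactly what the TBD question leaves open.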
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -105,6 +105,8 @@ enum esz { static const struct test avx512f_all[] = { INSN_FP(add, 0f, 58), + INSN(align, 66, 0f3a, 03, vl, dq, vl), + INSN(blendm, 66, 0f38, 65, vl, sd, vl), INSN(broadcastss, 66, 0f38, 18, el, d, el), INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), @@ -207,6 +209,7 @@ static const struct test avx512f_all[] = INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), INSN(pandn, 66, 0f, df, vl, dq, vl), + INSN(pblendm, 66, 0f38, 64, vl, dq, vl), // pbroadcast, 66, 0f38, 7c, dq64 INSN(pbroadcastd, 66, 0f38, 58, el, d, el), INSN(pbroadcastq, 66, 0f38, 59, el, q, el), @@ -354,6 +357,7 @@ static const struct test avx512f_512[] = }; static const struct test avx512bw_all[] = { + INSN(dbpsadbw, 66, 0f3a, 42, vl, b, vl), INSN(movdqu8, f2, 0f, 6f, vl, b, vl), INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), @@ -373,6 +377,7 @@ static const struct test avx512bw_all[] INSN(palignr, 66, 0f3a, 0f, vl, b, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), + INSN(pblendm, 66, 0f38, 66, vl, bw, vl), INSN(pbroadcastb, 66, 0f38, 78, el, b, el), // pbroadcastb, 66, 0f38, 7a, b INSN(pbroadcastw, 66, 0f38, 79, el_2, b, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -297,7 +297,7 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # endif -# define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define mix(x, y) B(blendmps_, _mask, x, y, (0b1010101010101010 & ALL_TRUE)) # define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) @@ -370,7 +370,7 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # endif -# define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define mix(x, y) B(blendmpd_, _mask, x, y, 0b10101010) # define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) @@ -564,8 +564,9 @@ static inline vec_t movlhps(vec_t x, vec 0b00011011, (vsi_t)undef(), ~0)) # define swap2(x) ((vec_t)B_(permvarsi, _mask, (vsi_t)(x), (vsi_t)(inv - 1), (vsi_t)undef(), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ - (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# define mix(x, y) ((vec_t)B(blendmd_, _mask, (vsi_t)(x), (vsi_t)(y), \ + (0b1010101010101010 & ((1 << ELEM_COUNT) - 1)))) +# define rotr(x, n) ((vec_t)B(alignd, _mask, (vsi_t)(x), (vsi_t)(x), n, (vsi_t)undef(), ~0)) # define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) # elif INT_SIZE == 8 || UINT_SIZE == 8 # define broadcast(x) ({ \ @@ -602,7 +603,8 @@ static inline vec_t movlhps(vec_t x, vec 0b01001110, (vsi_t)undef(), ~0)) # define swap2(x) ((vec_t)B(permvardi, _mask, (vdi_t)(x), (vdi_t)(inv - 1), (vdi_t)undef(), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# define mix(x, y) ((vec_t)B(blendmq_, _mask, (vdi_t)(x), (vdi_t)(y), 0b10101010)) +# define rotr(x, n) ((vec_t)B(alignq, _mask, (vdi_t)(x), (vdi_t)(x), n, (vdi_t)undef(), ~0)) # if VEC_SIZE == 32 # define swap3(x) ((vec_t)B_(permdi, _mask, 
(vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0)) # elif VEC_SIZE == 64 @@ -654,8 +656,8 @@ static inline vec_t movlhps(vec_t x, vec # define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varqi, _mask, interleave_lo, (vqi_t)(x), (vqi_t)(y), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdquqi, _mask, (vqi_t)(x), (vqi_t)(y), \ - (0b0101010101010101010101010101010101010101010101010101010101010101LL & ALL_TRUE))) +# define mix(x, y) ((vec_t)B(blendmb_, _mask, (vqi_t)(x), (vqi_t)(y), \ + (0b1010101010101010101010101010101010101010101010101010101010101010LL & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), (vqi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) # define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) @@ -687,8 +689,8 @@ static inline vec_t movlhps(vec_t x, vec # define interleave_hi(x, y) ((vec_t)B(vpermi2varhi, _mask, (vhi_t)(x), interleave_hi, (vhi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varhi, _mask, interleave_lo, (vhi_t)(x), (vhi_t)(y), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdquhi, _mask, (vhi_t)(x), (vhi_t)(y), \ - (0b01010101010101010101010101010101 & ALL_TRUE))) +# define mix(x, y) ((vec_t)B(blendmw_, _mask, (vhi_t)(x), (vhi_t)(y), \ + (0b10101010101010101010101010101010 & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) # define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -484,6 +484,7 @@ static const struct ext0f38_table { [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, + [0x64 ... 0x66] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -550,6 +551,7 @@ static const struct ext0f3a_table { [0x00] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x01] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x02] = { .simd_size = simd_packed_int }, + [0x03] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, @@ -581,8 +583,7 @@ static const struct ext0f3a_table { [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, - [0x42] = { .simd_size = simd_packed_int }, - [0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x44] = { .simd_size = simd_packed_int }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 
0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -6204,6 +6205,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x47): /* vpsllv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x4c): /* vrcp14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x4e): /* vrsqrt14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x64): /* vpblendm{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x65): /* vblendmp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ avx512f_no_sae: host_and_vcpu_must_have(avx512f); generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); @@ -6961,6 +6964,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); generate_exception_if(evex.brs, EXC_UD); elem_bytes = 1 << (b & 1); @@ -8130,10 +8134,12 @@ x86_emulate( goto simd_0f_to_gpr; CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0xc6): /* vshufp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - fault_suppression = false; generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x03): /* valign{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x25): /* vpternlog{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ avx512f_imm8_no_sae: host_and_vcpu_must_have(avx512f); @@ -9471,6 +9477,9 @@ x86_emulate( insn_bytes = PFX_BYTES + 4; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x42): /* vdbpsadbw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0f): /* vpalignr $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; goto avx512bw_imm; From patchwork Fri Mar 15 10:58:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854517 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC0371575 for ; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CECDF2A938 for ; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C00622A948; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8F10D2A938 for ; Fri, 15 Mar 2019 11:00:36 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kY8-0007tH-7w; Fri, 15 Mar 2019 10:58:52 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) 
by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kY6-0007sy-Fc for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:58:50 +0000 X-Inumbo-ID: 4f3598e0-4711-11e9-acbf-7385351564a4 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 4f3598e0-4711-11e9-acbf-7385351564a4; Fri, 15 Mar 2019 10:58:48 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:58:47 -0600 Message-Id: <5C8B8567020000780021F24E@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:58:47 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 32/50] x86emul: support AVX512F gather insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This requires getting modrm_reg and sib_index set correctly in the EVEX case, to account for the high 16 [XYZ]MM registers. Extend the adjustments to modrm_rm as well, such that x86_insn_modrm() would correctly report register numbers (this was a latent issue only as we don't currently have callers of that function which would care about an EVEX case). The adjustment in turn requires dropping the assertion from decode_gpr() as well as re-introducing the explicit masking, as we now need to actively mask off the high bit when a GPR is meant. _decode_gpr() invocations also need slight adjustments, when invoked in generic code ahead of the main switch(). All other uses of modrm_reg and modrm_rm already get suitably masked where necessary. There was also an encoding mistake in the EVEX Disp8 test code, which was benign (due to %rdx getting set to zero) to all non-vSIB tests as it mistakenly encoded (%rdx,%rdx) instead of (%rdx,%riz). In the vSIB case this meant (%rdx,%zmm2) instead of the intended (%rdx,%zmm4). Likewise the access count check wasn't entirely correct for the S/G case: In the quad-word-index but dword-data case only half the number of full vector elements get accessed. As an unrelated change in the main test harness source file distinguish the "n/a" messages by bitness. Signed-off-by: Jan Beulich --- v8: Re-base. v7: Fix ByteOp register decode. Re-base. v6: New. 
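Before diving into the diff: the SIB encoding mistake mentioned above becomes obvious once the byte is split into its fields. The tiny decoder below is not part of the harness and deliberately ignores the REX/EVEX extension bits this patch is actually concerned with - it only shows the architectural 2/3/3 split.

#include <stdio.h>

/* Hypothetical helper: split a plain SIB byte into scale/index/base. */
static void decode_sib(unsigned int sib)
{
    printf("SIB %#04x: scale=%u index=%u base=%u\n",
           sib, 1u << (sib >> 6), (sib >> 3) & 7, sib & 7);
}

int main(void)
{
    decode_sib(0x12); /* index 2, base 2: (%rdx,%rdx,1), or (%rdx,%zmm2,1) under vSIB */
    decode_sib(0x22); /* index 4, base 2: "no index"/(%rdx,%riz,1), but (%rdx,%zmm4,1) under vSIB */
    return 0;
}

Index value 4 means "no index" for ordinary addressing, which is why the old 0x12 byte was benign to the non-vSIB tests (with %rdx zeroed, the bogus index contributed nothing), while under vSIB it silently exercised %zmm2 instead of the intended %zmm4 - hence the change to 0x22 in test_one().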
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -18,7 +18,7 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er FMA := fma4 fma -SG := avx2-sg +SG := avx2-sg avx512f-sg avx512vl-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) OPMASK := avx512f avx512dq avx512bw @@ -66,6 +66,14 @@ xop-flts := $(avx-flts) avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 +avx512f-sg-vecs := 64 +avx512f-sg-idxs := 4 8 +avx512f-sg-ints := $(avx512f-ints) +avx512f-sg-flts := $(avx512f-flts) +avx512vl-sg-vecs := 16 32 +avx512vl-sg-idxs := $(avx512f-sg-idxs) +avx512vl-sg-ints := $(avx512f-ints) +avx512vl-sg-flts := $(avx512f-flts) avx512bw-vecs := $(avx512f-vecs) avx512bw-ints := 1 2 avx512bw-flts := --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -176,6 +176,8 @@ static const struct test avx512f_all[] = INSN(fnmsub213, 66, 0f38, af, el, sd, el), INSN(fnmsub231, 66, 0f38, be, vl, sd, vl), INSN(fnmsub231, 66, 0f38, bf, el, sd, el), + INSN(gatherd, 66, 0f38, 92, vl, sd, el), + INSN(gatherq, 66, 0f38, 93, vl, sd, el), INSN(getexp, 66, 0f38, 42, vl, sd, vl), INSN(getexp, 66, 0f38, 43, el, sd, el), INSN(getmant, 66, 0f3a, 26, vl, sd, vl), @@ -229,6 +231,8 @@ static const struct test avx512f_all[] = INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pexpand, 66, 0f38, 89, vl, dq, el), + INSN(pgatherd, 66, 0f38, 90, vl, dq, el), + INSN(pgatherq, 66, 0f38, 91, vl, dq, el), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -698,7 +702,7 @@ static void test_one(const struct test * instr[3] = evex.raw[2]; instr[4] = test->opc; instr[5] = 0x44 | (test->ext << 3); /* ModR/M */ - instr[6] = 0x12; /* SIB: base rDX, index none / xMM4 */ + instr[6] = 0x22; /* SIB: base rDX, index none / xMM4 */ instr[7] = 1; /* Disp8 */ instr[8] = 0; /* immediate, if any */ @@ -718,7 +722,8 @@ static void test_one(const struct test * if ( accessed[i] ) goto fail; for ( ; i < (test->scale == SC_vl ? vsz : esz) + (sg ? esz : vsz); ++i ) - if ( accessed[i] != (sg ? vsz / esz : 1) ) + if ( accessed[i] != (sg ? (vsz / esz) >> (test->opc & 1 & !evex.w) + : 1) ) goto fail; for ( ; i < ARRAY_SIZE(accessed); ++i ) if ( accessed[i] ) --- a/tools/tests/x86_emulator/simd-sg.c +++ b/tools/tests/x86_emulator/simd-sg.c @@ -35,13 +35,78 @@ typedef long long __attribute__((vector_ #define ITEM_COUNT (VEC_SIZE / ELEM_SIZE < IVEC_SIZE / IDX_SIZE ? 
\ VEC_SIZE / ELEM_SIZE : IVEC_SIZE / IDX_SIZE) -#if VEC_SIZE == 16 -# define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0) -#else -# define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0) -#endif +#if defined(__AVX512F__) +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# if ELEM_SIZE == 4 +# if IDX_SIZE == 4 || defined(__AVX512VL__) +# define to_mask(msk) B(ptestmd, , (vsi_t)(msk), (vsi_t)(msk), ~0) +# define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) +# else +# define widen(x) __builtin_ia32_pmovzxdq512_mask((vsi_t)(x), (idi_t){}, ~0) +# define to_mask(msk) __builtin_ia32_ptestmq512(widen(msk), widen(msk), ~0) +# define eq(x, y) (__builtin_ia32_pcmpeqq512_mask(widen(x), widen(y), ~0) == ALL_TRUE) +# endif +# define BG_(dt, it, reg, mem, idx, msk, scl) \ + __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl) +# else +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# define BG_(dt, it, reg, mem, idx, msk, scl) \ + __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl) +# endif +/* + * Instead of replicating the main IDX_SIZE conditional below three times, use + * a double layer of macro invocations, allowing for substitution of the + * respective relevant macro argument tokens. + */ +# define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl) +# if VEC_MAX < 64 +/* + * The sub-512-bit built-ins have an extra "3" infix, presumably because the + * 512-bit names were chosen without the AVX512VL extension in mind (and hence + * making the latter collide with the AVX2 ones). + */ +# define si 3si +# define di 3di +# endif +# if VEC_MAX == 16 +# define v8df v2df +# define v8di v2di +# define v16sf v4sf +# define v16si v4si +# elif VEC_MAX == 32 +# define v8df v4df +# define v8di v4di +# define v16sf v8sf +# define v16si v8si +# endif +# if IDX_SIZE == 4 +# if INT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl) +# elif INT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl)) +# elif FLOAT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl) +# elif FLOAT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl) +# endif +# elif IDX_SIZE == 8 +# if INT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl) +# elif INT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl)) +# elif FLOAT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl) +# elif FLOAT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl) +# endif +# endif +#elif defined(__AVX2__) +# if VEC_SIZE == 16 +# define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0) +# else +# define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0) +# endif -#if defined(__AVX2__) # if VEC_MAX == 16 # if IDX_SIZE == 4 # if INT_SIZE == 4 @@ -111,6 +176,10 @@ typedef long long __attribute__((vector_ # endif #endif +#ifndef eq +# define eq(x, y) to_bool((x) == (y)) +#endif + #define GLUE_(x, y) x ## y #define GLUE(x, y) GLUE_(x, y) @@ -119,6 +188,7 @@ typedef long long __attribute__((vector_ #define PUT8(n) PUT4(n), PUT4((n) + 4) #define PUT16(n) PUT8(n), PUT8((n) + 8) #define PUT32(n) PUT16(n), PUT16((n) + 16) +#define 
PUT64(n) PUT32(n), PUT32((n) + 32) const typeof((vec_t){}[0]) array[] = { GLUE(PUT, VEC_MAX)(1), @@ -174,7 +244,7 @@ int sg_test(void) y = gather(full, array + ITEM_COUNT, -idx, full, ELEM_SIZE); #if ITEM_COUNT == ELEM_COUNT - if ( !to_bool(y == x - 1) ) + if ( !eq(y, x - 1) ) return __LINE__; #else for ( i = 0; i < ITEM_COUNT; ++i ) --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -22,6 +22,8 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" #include "avx512f.h" +#include "avx512f-sg.h" +#include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512dq.h" #include "avx512er.h" @@ -90,11 +92,13 @@ static bool simd_check_avx512f(void) return cpu_has_avx512f; } #define simd_check_avx512f_opmask simd_check_avx512f +#define simd_check_avx512f_sg simd_check_avx512f static bool simd_check_avx512f_vl(void) { return cpu_has_avx512f && cpu_has_avx512vl; } +#define simd_check_avx512vl_sg simd_check_avx512f_vl static bool simd_check_avx512dq(void) { @@ -291,6 +295,14 @@ static const struct { SIMD(AVX512F u32x16, avx512f, 64u4), SIMD(AVX512F s64x8, avx512f, 64i8), SIMD(AVX512F u64x8, avx512f, 64u8), + SIMD(AVX512F S/G f32[16x32], avx512f_sg, 64x4f4), + SIMD(AVX512F S/G f64[ 8x32], avx512f_sg, 64x4f8), + SIMD(AVX512F S/G f32[ 8x64], avx512f_sg, 64x8f4), + SIMD(AVX512F S/G f64[ 8x64], avx512f_sg, 64x8f8), + SIMD(AVX512F S/G i32[16x32], avx512f_sg, 64x4i4), + SIMD(AVX512F S/G i64[ 8x32], avx512f_sg, 64x4i8), + SIMD(AVX512F S/G i32[ 8x64], avx512f_sg, 64x8i4), + SIMD(AVX512F S/G i64[ 8x64], avx512f_sg, 64x8i8), AVX512VL(VL f32x4, avx512f, 16f4), AVX512VL(VL f64x2, avx512f, 16f8), AVX512VL(VL f32x8, avx512f, 32f4), @@ -303,6 +315,22 @@ static const struct { AVX512VL(VL u64x2, avx512f, 16u8), AVX512VL(VL s64x4, avx512f, 32i8), AVX512VL(VL u64x4, avx512f, 32u8), + SIMD(AVX512VL S/G f32[4x32], avx512vl_sg, 16x4f4), + SIMD(AVX512VL S/G f64[2x32], avx512vl_sg, 16x4f8), + SIMD(AVX512VL S/G f32[2x64], avx512vl_sg, 16x8f4), + SIMD(AVX512VL S/G f64[2x64], avx512vl_sg, 16x8f8), + SIMD(AVX512VL S/G f32[8x32], avx512vl_sg, 32x4f4), + SIMD(AVX512VL S/G f64[4x32], avx512vl_sg, 32x4f8), + SIMD(AVX512VL S/G f32[4x64], avx512vl_sg, 32x8f4), + SIMD(AVX512VL S/G f64[4x64], avx512vl_sg, 32x8f8), + SIMD(AVX512VL S/G i32[4x32], avx512vl_sg, 16x4i4), + SIMD(AVX512VL S/G i64[2x32], avx512vl_sg, 16x4i8), + SIMD(AVX512VL S/G i32[2x64], avx512vl_sg, 16x8i4), + SIMD(AVX512VL S/G i64[2x64], avx512vl_sg, 16x8i8), + SIMD(AVX512VL S/G i32[8x32], avx512vl_sg, 32x4i4), + SIMD(AVX512VL S/G i64[4x32], avx512vl_sg, 32x4i8), + SIMD(AVX512VL S/G i32[4x64], avx512vl_sg, 32x8i4), + SIMD(AVX512VL S/G i64[4x64], avx512vl_sg, 32x8i8), SIMD(AVX512BW s8x64, avx512bw, 64i1), SIMD(AVX512BW u8x64, avx512bw, 64u1), SIMD(AVX512BW s16x32, avx512bw, 64i2), @@ -4260,7 +4288,7 @@ int main(int argc, char **argv) if ( !blobs[j].size ) { - printf("%-39s n/a\n", blobs[j].name); + printf("%-39s n/a (%u-bit)\n", blobs[j].name, blobs[j].bitness); continue; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -499,7 +499,7 @@ static const struct ext0f38_table { [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, - [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, + [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0x96 ... 
0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x9a] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -3054,7 +3054,8 @@ x86_decode( d &= ~ModRM; #undef ModRM /* Only its aliases are valid to use from here on. */ - modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3); + modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3) | + ((evex_encoded() && !evex.R) << 4); modrm_rm = modrm & 0x07; /* @@ -3224,7 +3225,8 @@ x86_decode( if ( modrm_mod == 3 ) { generate_exception_if(d & vSIB, EXC_UD); - modrm_rm |= (rex_prefix & 1) << 3; + modrm_rm |= ((rex_prefix & 1) << 3) | + (evex_encoded() && !evex.x) << 4; ea.type = OP_REG; } else if ( ad_bytes == 2 ) @@ -3289,7 +3291,10 @@ x86_decode( state->sib_index = ((sib >> 3) & 7) | ((rex_prefix << 2) & 8); state->sib_scale = (sib >> 6) & 3; - if ( state->sib_index != 4 && !(d & vSIB) ) + if ( unlikely(d & vSIB) ) + state->sib_index |= (mode_64bit() && evex_encoded() && + !evex.RX) << 4; + else if ( state->sib_index != 4 ) { ea.mem.off = *decode_gpr(state->regs, state->sib_index); ea.mem.off <<= state->sib_scale; @@ -3592,7 +3597,7 @@ x86_emulate( generate_exception_if(state->not_64bit && mode_64bit(), EXC_UD); if ( ea.type == OP_REG ) - ea.reg = _decode_gpr(&_regs, modrm_rm, (d & ByteOp) && !rex_prefix); + ea.reg = _decode_gpr(&_regs, modrm_rm, (d & ByteOp) && !rex_prefix && !vex.opcx); memset(mmvalp, 0xaa /* arbitrary */, sizeof(*mmvalp)); @@ -3606,7 +3611,7 @@ x86_emulate( src.type = OP_REG; if ( d & ByteOp ) { - src.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix); + src.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix && !vex.opcx); src.val = *(uint8_t *)src.reg; src.bytes = 1; } @@ -3704,7 +3709,7 @@ x86_emulate( dst.type = OP_REG; if ( d & ByteOp ) { - dst.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix); + dst.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix && !vex.opcx); dst.val = *(uint8_t *)dst.reg; dst.bytes = 1; } @@ -9119,6 +9124,130 @@ x86_emulate( put_stub(stub); state->simd_size = simd_none; + break; + } + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x90): /* vpgatherd{d,q} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x91): /* vpgatherq{d,q} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x92): /* vgatherdp{s,d} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x93): /* vgatherqp{s,d} mem,[xyz]mm{k} */ + { + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; + bool done = false; + + ASSERT(ea.type == OP_MEM); + generate_exception_if((!evex.opmsk || evex.brs || evex.z || + evex.reg != 0xf || + modrm_reg == state->sib_index), + EXC_UD); + avx512_vlen_check(false); + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + + /* Read destination and index registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x7f; /* vmovdqa{32,64} */ + /* + * The register writeback below has to retain masked-off elements, but + * needs to clear upper portions in the index-wider-than-data cases. + * Therefore read (and write below) the full register. The alternative + * would have been to fiddle with the mask register used. + */ + pevex->opmsk = 0; + /* Use (%rax) as destination and modrm_reg as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp)); + + pevex->pfx = vex_f3; /* vmovdqu{32,64} */ + pevex->w = b & 1; + /* Switch to sib_index as source. 
*/ + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + opc[1] = (state->sib_index & 7) << 3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the destination and mask values. */ + n = 1 << (2 + evex.lr - ((b & 1) | evex.w)); + op_bytes = 4 << evex.w; + memset((void *)mmvalp + n * op_bytes, 0, 64 - n * op_bytes); + op_mask &= (1 << n) - 1; + + for ( i = 0; op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = ops->read(ea.mem.seg, + truncate_ea(ea.mem.off + (idx << state->sib_scale)), + (void *)mmvalp + i * op_bytes, op_bytes, ctxt); + if ( rc != X86EMUL_OKAY ) + { + /* + * If we've made some progress and the access did not fault, + * force a retry instead. This is for example necessary to + * cope with the limited capacity of HVM's MMIO cache. + */ + if ( rc != X86EMUL_EXCEPTION && done ) + rc = X86EMUL_RETRY; + break; + } + + op_mask &= ~(1 << i); + done = true; + +#ifdef __XEN__ + if ( op_mask && local_events_need_delivery() ) + { + rc = X86EMUL_RETRY; + break; + } +#endif + } + + /* Write destination and mask registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x6f; /* vmovdqa{32,64} */ + pevex->opmsk = 0; + /* Use modrm_reg as destination and (%rax) as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "+m" (*mmvalp) : "a" (mmvalp)); + + /* + * kmovw: This is VEX-encoded, so we can't use pevex. Avoid copy_VEX() etc + * as well, since we can easily use the 2-byte VEX form here. + */ + opc -= EVEX_PFX_BYTES; + opc[0] = 0xc5; + opc[1] = 0xf8; + opc[2] = 0x90; + /* Use (%rax) as source. */ + opc[3] = evex.opmsk << 3; + opc[4] = 0xc3; + + invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask)); + put_stub(stub); + + state->simd_size = simd_none; break; } --- a/xen/arch/x86/x86_emulate/x86_emulate.h +++ b/xen/arch/x86/x86_emulate/x86_emulate.h @@ -662,8 +662,6 @@ static inline unsigned long *decode_gpr( BUILD_BUG_ON(ARRAY_SIZE(cpu_user_regs_gpr_offsets) & (ARRAY_SIZE(cpu_user_regs_gpr_offsets) - 1)); - ASSERT(modrm < ARRAY_SIZE(cpu_user_regs_gpr_offsets)); - /* Note that this also acts as array_access_nospec() stand-in. 
*/ modrm &= ARRAY_SIZE(cpu_user_regs_gpr_offsets) - 1; From patchwork Fri Mar 15 10:59:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854519 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B012F1575 for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 988FD2A6AF for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8A0CF2A938; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1C8B82A6AF for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kYc-00082q-QF; Fri, 15 Mar 2019 10:59:22 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kYb-00082d-MO for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:59:21 +0000 X-Inumbo-ID: 6250d186-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6250d186-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:59:20 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:59:19 -0600 Message-Id: <5C8B8587020000780021F251@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:59:19 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 33/50] x86emul: add high register S/G test cases X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP In order to verify that in particular the index register decoding works correctly in the S/G emulation paths, add dedicated (64-bit only) cases disallowing the compiler to use the lower registers. Other than in the generic SIMD case, where occasional uses of %xmm or %ymm registers in generated code cause various internal compiler errors when disallowing use of all of the lower 16 registers (apparently due to insn templates trying to use AVX2 encodings), doing so here in the AVX512F case looks to be fine. While the main goal here is the AVX512F case, add an AVX2 variant as well. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. 
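For illustration only (the snippet below is not part of the patch and the function name is made up): with the lower register files reserved on the command line, roughly as the new Makefile flavours do via -ffixed-ymm<N> / -ffixed-zmm<N>, the compiler has no choice but to allocate destination, index, and temporary vectors to %zmm8 and above (or %zmm16 and above for the "highest" flavour), which is what forces the emulator's widened ModRM.reg and VSIB index decoding to actually be exercised. A minimal sketch, assuming GCC with AVX512F intrinsic support:

#include <immintrin.h>

/* Hypothetical example, not taken from simd-sg.c; build with something like
   -O2 -mavx512f -ffixed-zmm1 ... -ffixed-zmm7 (as the "higher" flavour does). */
__m512i gather_high(const int *base, __m512i idx)
{
    /* vpgatherdd: gather sixteen 32-bit elements via 32-bit indices. */
    return _mm512_i32gather_epi32(idx, base, 4);
}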
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -147,6 +147,12 @@ $(foreach flavor,$(SIMD) $(FMA),$(eval $ $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) +first-string = $(shell for s in $(1); do echo "$$s"; break; done) + +avx2-sg-cflags-x86_64 := "-D_high $(foreach n,7 6 5 4 3 2 1,-ffixed-ymm$(n)) $(call first-string,$(avx2-sg-cflags))" +avx512f-sg-cflags-x86_64 := "-D_higher $(foreach n,7 6 5 4 3 2 1,-ffixed-zmm$(n)) $(call first-string,$(avx512f-sg-cflags))" +avx512f-sg-cflags-x86_64 += "-D_highest $(foreach n,15 14 13 12 11 10 9 8,-ffixed-zmm$(n)) $(call first-string,$(avx512f-sg-cflags-x86_64))" + $(addsuffix .h,$(TESTCASES)): %.h: %.c testcase.mk Makefile rm -f $@.new $*.bin $(foreach arch,$(filter-out $(XEN_COMPILE_ARCH),x86_32) $(XEN_COMPILE_ARCH), \ --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -266,6 +266,9 @@ static const struct { SIMD(AVX2 S/G i64[4x32], avx2_sg, 32x4i8), SIMD(AVX2 S/G i32[4x64], avx2_sg, 32x8i4), SIMD(AVX2 S/G i64[4x64], avx2_sg, 32x8i8), +#ifdef __x86_64__ + SIMD_(64, AVX2 S/G %ymm8+, avx2_sg, high), +#endif SIMD(XOP 128bit single, xop, 16f4), SIMD(XOP 256bit single, xop, 32f4), SIMD(XOP 128bit double, xop, 16f8), @@ -303,6 +306,10 @@ static const struct { SIMD(AVX512F S/G i64[ 8x32], avx512f_sg, 64x4i8), SIMD(AVX512F S/G i32[ 8x64], avx512f_sg, 64x8i4), SIMD(AVX512F S/G i64[ 8x64], avx512f_sg, 64x8i8), +#ifdef __x86_64__ + SIMD_(64, AVX512F S/G %zmm8+, avx512f_sg, higher), + SIMD_(64, AVX512F S/G %zmm16+, avx512f_sg, highest), +#endif AVX512VL(VL f32x4, avx512f, 16f4), AVX512VL(VL f64x2, avx512f, 16f8), AVX512VL(VL f32x8, avx512f, 32f4), From patchwork Fri Mar 15 10:59:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854523 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 22C381575 for ; Fri, 15 Mar 2019 11:02:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 094BB2A949 for ; Fri, 15 Mar 2019 11:02:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F20142A94B; Fri, 15 Mar 2019 11:02:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3F16F2A949 for ; Fri, 15 Mar 2019 11:02:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZ5-00089G-5V; Fri, 15 Mar 2019 10:59:51 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZ4-000896-8f for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:59:50 +0000 X-Inumbo-ID: 72042404-4711-11e9-9a43-9bcba378de18 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by 
us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 72042404-4711-11e9-9a43-9bcba378de18; Fri, 15 Mar 2019 10:59:47 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:59:46 -0600 Message-Id: <5C8B85A2020000780021F254@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:59:46 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 34/50] x86emul: support AVX512F scatter insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512F in the insn emulator. Note that in the test harness there's a little bit of trickery needed to get around the not fully consistent naming of AVX512VL gather and scatter built-ins. To suppress expansion of the "di" and "si" tokens they get constructed by token concatenation in BS(), which is different from BG(). Signed-off-by: Jan Beulich --- TBD: I couldn't really decide whether to duplicate code or merge scatter into gather emulation. --- v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -270,6 +270,8 @@ static const struct test avx512f_all[] = INSN(prolv, 66, 0f38, 15, vl, dq, vl), INSNX(pror, 66, 0f, 72, 0, vl, dq, vl), INSN(prorv, 66, 0f38, 14, vl, dq, vl), + INSN(pscatterd, 66, 0f38, a0, vl, dq, el), + INSN(pscatterq, 66, 0f38, a1, vl, dq, el), INSN(pshufd, 66, 0f, 70, vl, d, vl), INSN(pslld, 66, 0f, f2, el_4, d, vl), INSNX(pslld, 66, 0f, 72, 6, vl, d, vl), @@ -305,6 +307,8 @@ static const struct test avx512f_all[] = INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), INSN(scalef, 66, 0f38, 2c, vl, sd, vl), INSN(scalef, 66, 0f38, 2d, el, sd, el), + INSN(scatterd, 66, 0f38, a2, vl, sd, el), + INSN(scatterq, 66, 0f38, a3, vl, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), --- a/tools/tests/x86_emulator/simd-sg.c +++ b/tools/tests/x86_emulator/simd-sg.c @@ -48,10 +48,14 @@ typedef long long __attribute__((vector_ # endif # define BG_(dt, it, reg, mem, idx, msk, scl) \ __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl) +# define BS_(dt, it, mem, idx, reg, msk, scl) \ + __builtin_ia32_scatter##it##dt(mem, to_mask(msk), idx, reg, scl) # else # define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) # define BG_(dt, it, reg, mem, idx, msk, scl) \ __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl) +# define BS_(dt, it, mem, idx, reg, msk, scl) \ + __builtin_ia32_scatter##it##dt(mem, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), idx, reg, scl) # endif /* * Instead of replicating the main IDX_SIZE conditional below three times, use @@ -59,6 +63,7 @@ typedef long long __attribute__((vector_ * respective relevant macro argument tokens. 
*/ # define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl) +# define BS(dt, it, mem, idx, reg, msk, scl) BS_(dt, it##i, mem, idx, reg, msk, scl) # if VEC_MAX < 64 /* * The sub-512-bit built-ins have an extra "3" infix, presumably because the @@ -82,22 +87,30 @@ typedef long long __attribute__((vector_ # if IDX_SIZE == 4 # if INT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16si, s, mem, idx, reg, msk, scl) # elif INT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl)) +# define scatter(mem, idx, reg, msk, scl) BS(v8di, s, mem, idx, (vdi_t)(reg), msk, scl) # elif FLOAT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16sf, s, mem, idx, reg, msk, scl) # elif FLOAT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v8df, s, mem, idx, reg, msk, scl) # endif # elif IDX_SIZE == 8 # if INT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16si, d, mem, (idi_t)(idx), reg, msk, scl) # elif INT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl)) +# define scatter(mem, idx, reg, msk, scl) BS(v8di, d, mem, (idi_t)(idx), (vdi_t)(reg), msk, scl) # elif FLOAT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16sf, d, mem, (idi_t)(idx), reg, msk, scl) # elif FLOAT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v8df, d, mem, (idi_t)(idx), reg, msk, scl) # endif # endif #elif defined(__AVX2__) @@ -195,6 +208,8 @@ const typeof((vec_t){}[0]) array[] = { GLUE(PUT, VEC_MAX)(VEC_MAX + 1) }; +typeof((vec_t){}[0]) out[VEC_MAX * 2]; + int sg_test(void) { unsigned int i; @@ -275,5 +290,41 @@ int sg_test(void) # endif #endif +#ifdef scatter + + for ( i = 0; i < sizeof(out) / sizeof(*out); ++i ) + out[i] = 0; + + for ( i = 0; i < ITEM_COUNT; ++i ) + x[i] = i + 1; + + touch(x); + + scatter(out, (idx_t){}, x, (vec_t){ 1 } != 0, 1); + if ( out[0] != 1 ) + return __LINE__; + for ( i = 1; i < ITEM_COUNT; ++i ) + if ( out[i] ) + return __LINE__; + + scatter(out, (idx_t){}, x, full, 1); + if ( out[0] != ITEM_COUNT ) + return __LINE__; + for ( i = 1; i < ITEM_COUNT; ++i ) + if ( out[i] ) + return __LINE__; + + scatter(out, idx, x, full, ELEM_SIZE); + for ( i = 1; i <= ITEM_COUNT; ++i ) + if ( out[i] != i ) + return __LINE__; + + scatter(out, inv, x, full, ELEM_SIZE); + for ( i = 1; i <= ITEM_COUNT; ++i ) + if ( out[i] != ITEM_COUNT + 1 - i ) + return __LINE__; + +#endif + return 0; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -508,6 +508,7 @@ static const struct ext0f38_table { [0x9d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x9e] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x9f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xa0 ... 0xa3] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xa6 ... 
0xa8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xa9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xaa] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -9330,6 +9331,102 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa0): /* vpscatterd{d,q} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa1): /* vpscatterq{d,q} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa2): /* vscatterdp{s,d} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa3): /* vscatterqp{s,d} [xyz]mm,mem{k} */ + { + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; + bool done = false; + + ASSERT(ea.type == OP_MEM); + fail_if(!ops->write); + generate_exception_if((!evex.opmsk || evex.brs || evex.z || + evex.reg != 0xf || + modrm_reg == state->sib_index), + EXC_UD); + avx512_vlen_check(false); + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + + /* Read source and index registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x7f; /* vmovdqa{32,64} */ + /* Use (%rax) as destination and modrm_reg as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp)); + + pevex->pfx = vex_f3; /* vmovdqu{32,64} */ + pevex->w = b & 1; + /* Switch to sib_index as source. */ + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + opc[1] = (state->sib_index & 7) << 3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the mask value. */ + n = 1 << (2 + evex.lr - ((b & 1) | evex.w)); + op_bytes = 4 << evex.w; + op_mask &= (1 << n) - 1; + + for ( i = 0; op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = ops->write(ea.mem.seg, + truncate_ea(ea.mem.off + (idx << state->sib_scale)), + (void *)mmvalp + i * op_bytes, op_bytes, ctxt); + if ( rc != X86EMUL_OKAY ) + { + /* See comment in gather emulation. */ + if ( rc != X86EMUL_EXCEPTION && done ) + rc = X86EMUL_RETRY; + break; + } + + op_mask &= ~(1 << i); + done = true; + +#ifdef __XEN__ + if ( op_mask && local_events_need_delivery() ) + { + rc = X86EMUL_RETRY; + break; + } +#endif + } + + /* Write mask register. See comment in gather emulation. */ + opc = get_stub(stub); + opc[0] = 0xc5; + opc[1] = 0xf8; + opc[2] = 0x90; + /* Use (%rax) as source. 
*/ + opc[3] = evex.opmsk << 3; + opc[4] = 0xc3; + + invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask)); + put_stub(stub); + + state->simd_size = simd_none; + break; + } + case X86EMUL_OPC(0x0f38, 0xc8): /* sha1nexte xmm/m128,xmm */ case X86EMUL_OPC(0x0f38, 0xc9): /* sha1msg1 xmm/m128,xmm */ case X86EMUL_OPC(0x0f38, 0xca): /* sha1msg2 xmm/m128,xmm */ From patchwork Fri Mar 15 11:00:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854525 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F182113B5 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D75572A948 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA1562A94A; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2B1CA2A948 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZb-0000PT-I9; Fri, 15 Mar 2019 11:00:23 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZa-0000PF-9B for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:00:22 +0000 X-Inumbo-ID: 85e4d830-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 85e4d830-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:00:20 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:00:19 -0600 Message-Id: <5C8B85C3020000780021F257@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:00:19 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 35/50] x86emul: support AVX512PF insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Some adjustments are necessary to the EVEX Disp8 scaling test code to account for the zero byte reads/writes, which get issued for the test harness only. Signed-off-by: Jan Beulich --- v8: #GP/#SS don't arise here. Add previously missed change to emul_test_init(). v7: Re-base. v6: New. 
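As a rough usage illustration (hypothetical code, not part of the patch): the prefetch forms take the same VSIB memory operand as real gathers/scatters but transfer no data, which is why the harness records them as zero-byte accesses. Assuming GCC's AVX512PF intrinsics:

#include <immintrin.h>

/* Hypothetical example; requires -mavx512pf. */
void prefetch_gather(const float *base, __m512i idx)
{
    /* vgatherpf0dps: prefetch the addressed elements (T0 hint); no data is moved. */
    _mm512_prefetch_i32gather_ps(idx, base, 4, _MM_HINT_T0);
}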
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -520,6 +520,17 @@ static const struct test avx512er_512[] INSN(rsqrt28, 66, 0f38, cd, el, sd, el), }; +static const struct test avx512pf_512[] = { + INSNX(gatherpf0d, 66, 0f38, c6, 1, vl, sd, el), + INSNX(gatherpf0q, 66, 0f38, c7, 1, vl, sd, el), + INSNX(gatherpf1d, 66, 0f38, c6, 2, vl, sd, el), + INSNX(gatherpf1q, 66, 0f38, c7, 2, vl, sd, el), + INSNX(scatterpf0d, 66, 0f38, c6, 5, vl, sd, el), + INSNX(scatterpf0q, 66, 0f38, c7, 5, vl, sd, el), + INSNX(scatterpf1d, 66, 0f38, c6, 6, vl, sd, el), + INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -580,7 +591,7 @@ static bool record_access(enum x86_segme static int read(enum x86_segment seg, unsigned long offset, void *p_data, unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - if ( !record_access(seg, offset, bytes) ) + if ( !record_access(seg, offset, bytes + !bytes) ) return X86EMUL_UNHANDLEABLE; memset(p_data, 0, bytes); return X86EMUL_OKAY; @@ -589,7 +600,7 @@ static int read(enum x86_segment seg, un static int write(enum x86_segment seg, unsigned long offset, void *p_data, unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - if ( !record_access(seg, offset, bytes) ) + if ( !record_access(seg, offset, bytes + !bytes) ) return X86EMUL_UNHANDLEABLE; return X86EMUL_OKAY; } @@ -597,7 +608,7 @@ static int write(enum x86_segment seg, u static void test_one(const struct test *test, enum vl vl, unsigned char *instr, struct x86_emulate_ctxt *ctxt) { - unsigned int vsz, esz, i; + unsigned int vsz, esz, i, n; int rc; bool sg = strstr(test->mnemonic, "gather") || strstr(test->mnemonic, "scatter"); @@ -725,10 +736,20 @@ static void test_one(const struct test * for ( i = 0; i < (test->scale == SC_vl ? vsz : esz); ++i ) if ( accessed[i] ) goto fail; - for ( ; i < (test->scale == SC_vl ? vsz : esz) + (sg ? esz : vsz); ++i ) + + n = test->scale == SC_vl ? vsz : esz; + if ( !sg ) + n += vsz; + else if ( !strstr(test->mnemonic, "pf") ) + n += esz; + else + ++n; + + for ( ; i < n; ++i ) if ( accessed[i] != (sg ? (vsz / esz) >> (test->opc & 1 & !evex.w) : 1) ) goto fail; + for ( ; i < ARRAY_SIZE(accessed); ++i ) if ( accessed[i] ) goto fail; @@ -887,6 +908,8 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, no128); RUN(avx512dq, 512); RUN(avx512er, 512); +#define cpu_has_avx512pf cpu_has_avx512f + RUN(avx512pf, 512); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); } --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -73,6 +73,7 @@ bool emul_test_init(void) */ cp.basic.movbe = true; cp.feat.adx = true; + cp.feat.avx512pf = cp.feat.avx512f; cp.feat.rdpid = true; cp.extd.clzero = true; @@ -135,12 +136,14 @@ int emul_test_cpuid( res->c |= 1U << 22; /* - * The emulator doesn't itself use ADCX/ADOX/RDPID, so we can always run - * the respective tests. + * The emulator doesn't itself use ADCX/ADOX/RDPID nor the S/G prefetch + * insns, so we can always run the respective tests. 
*/ if ( leaf == 7 && subleaf == 0 ) { res->b |= 1U << 19; + if ( res->b & (1U << 16) ) + res->b |= 1U << 26; res->c |= 1U << 22; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -525,6 +525,7 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xc6 ... 0xc7] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xc9] = { .simd_size = simd_other }, [0xca] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, @@ -1903,6 +1904,7 @@ static bool vcpu_has( #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) +#define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) #define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) @@ -9425,6 +9427,94 @@ x86_emulate( state->simd_size = simd_none; break; + } + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc6): + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc7): + { +#ifndef __XEN__ + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; +#endif + + ASSERT(ea.type == OP_MEM); + generate_exception_if((!cpu_has_avx512f || !evex.opmsk || evex.brs || + evex.z || evex.reg != 0xf || evex.lr != 2), + EXC_UD); + + switch ( modrm_reg & 7 ) + { + case 1: /* vgatherpf0{d,q}p{s,d} mem{k} */ + case 2: /* vgatherpf1{d,q}p{s,d} mem{k} */ + case 5: /* vscatterpf0{d,q}p{s,d} mem{k} */ + case 6: /* vscatterpf1{d,q}p{s,d} mem{k} */ + vcpu_must_have(avx512pf); + break; + default: + generate_exception(EXC_UD); + } + + get_fpu(X86EMUL_FPU_zmm); + +#ifndef __XEN__ + /* + * For the test harness perform zero byte memory accesses, such that + * in particular correct Disp8 scaling can be verified. + */ + fail_if((modrm_reg & 4) && !ops->write); + + /* Read index register. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + /* vmovdqu{32,64} */ + opc[0] = 0x7f; + pevex->pfx = vex_f3; + pevex->w = b & 1; + /* Use (%rax) as destination and sib_index as source. */ + pevex->b = 1; + opc[1] = (state->sib_index & 7) << 3; + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the mask value. */ + n = 1 << (4 - ((b & 1) | evex.w)); + op_mask &= (1 << n) - 1; + + for ( i = 0; rc == X86EMUL_OKAY && op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = (modrm_reg & 4 + ? ops->write + : ops->read)(ea.mem.seg, + truncate_ea(ea.mem.off + + (idx << state->sib_scale)), + NULL, 0, ctxt); + if ( rc == X86EMUL_EXCEPTION ) + { + /* Squash memory access related exceptions. 
*/ + x86_emul_reset_event(ctxt); + rc = X86EMUL_OKAY; + } + + op_mask &= ~(1 << i); + } +#endif + + state->simd_size = simd_none; + break; } case X86EMUL_OPC(0x0f38, 0xc8): /* sha1nexte xmm/m128,xmm */ From patchwork Fri Mar 15 11:00:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854527 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 65C7F1575 for ; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4AE562A948 for ; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3CA7B2A94A; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C09382A948 for ; Fri, 15 Mar 2019 11:03:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ka1-0000Va-TV; Fri, 15 Mar 2019 11:00:49 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ka0-0000VL-BZ for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:00:48 +0000 X-Inumbo-ID: 950d664a-4711-11e9-9e4d-bf46a59f0c42 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 950d664a-4711-11e9-9e4d-bf46a59f0c42; Fri, 15 Mar 2019 11:00:45 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:00:44 -0600 Message-Id: <5C8B85DD020000780021F2AC@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:00:45 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 36/50] x86emul: support AVX512CD insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Since the insns here and in particular their memory access patterns follow the usual scheme I didn't think it was necessary to add contrived tests specifically for them, beyond the Disp8 scaling ones. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. 
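For context, a minimal (hypothetical) user of the two main insns being added, assuming GCC's AVX512CD intrinsics:

#include <immintrin.h>

/* Hypothetical example; requires -mavx512cd. */
__m512i conflict_lzcnt(__m512i idx)
{
    /* vpconflictd: per element, a bitmap of earlier elements holding the same value. */
    __m512i conf = _mm512_conflict_epi32(idx);
    /* vplzcntd: per-element leading-zero count. */
    return _mm512_lzcnt_epi32(conf);
}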
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -458,6 +458,13 @@ static const struct test avx512bw_128[] INSN(pinsrw, 66, 0f, c4, el, w, el), }; +static const struct test avx512cd_all[] = { +// pbroadcastmb2q, f3, 0f38, 2a, q +// pbroadcastmw2d, f3, 0f38, 3a, d + INSN(pconflict, 66, 0f38, c4, vl, dq, vl), + INSN(plzcnt, 66, 0f38, 44, vl, dq, vl), +}; + static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), @@ -903,6 +910,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512f, 512); RUN(avx512bw, all); RUN(avx512bw, 128); + RUN(avx512cd, all); RUN(avx512dq, all); RUN(avx512dq, 128); RUN(avx512dq, no128); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -138,6 +138,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) +#define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -473,6 +473,7 @@ static const struct ext0f38_table { [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x42] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x43] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x44] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -525,6 +526,7 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xc4] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0xc6 ... 
0xc7] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xc9] = { .simd_size = simd_other }, @@ -1906,6 +1908,7 @@ static bool vcpu_has( #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) #define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) #define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) +#define vcpu_has_avx512cd() vcpu_has( 7, EBX, 28, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) @@ -8816,6 +8819,20 @@ x86_emulate( evex.opcx = vex_0f; goto vmovdqa; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x2a): /* vpbroadcastmb2q k,[xyz]mm */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x3a): /* vpbroadcastmw2d k,[xyz]mm */ + generate_exception_if((ea.type != OP_REG || evex.opmsk || + evex.w == ((b >> 4) & 1)), + EXC_UD); + d |= TwoOp; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc4): /* vpconflict{d,q} [xyz]mm/mem,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x44): /* vplzcnt{d,q} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512cd); + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x2c): /* vmaskmovps mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2d): /* vmaskmovpd mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2e): /* vmaskmovps {x,y}mm,{x,y}mm,mem */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -103,6 +103,7 @@ #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) #define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) +#define cpu_has_avx512cd boot_cpu_has(X86_FEATURE_AVX512CD) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) #define cpu_has_avx512bw boot_cpu_has(X86_FEATURE_AVX512BW) #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) From patchwork Fri Mar 15 11:01:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854531 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 035451575 for ; Fri, 15 Mar 2019 11:04:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF62228AA8 for ; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D38CD28B17; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 66CAB28AA8 for ; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kaW-0000eL-O5; Fri, 15 Mar 2019 11:01:20 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kaV-0000du-6c for 
xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:01:19 +0000 X-Inumbo-ID: a8758a8b-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id a8758a8b-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:01:18 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:01:17 -0600 Message-Id: <5C8B85FC020000780021F2AF@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:01:16 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 37/50] x86emul: complete support of AVX512_VBMI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also add testing of ones support for which was added before. Sadly gcc's command line option naming is not in line with Intel's naming of the feature, which makes it necessary to mis-name things in the test harness. Since the only new insn here and in particular its memory access pattern follows the usual scheme, I didn't think it was necessary to add a contrived test specifically for it, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -83,6 +83,9 @@ avx512dq-flts := $(avx512f-flts) avx512er-vecs := 64 avx512er-ints := avx512er-flts := 4 8 +avx512vbmi-vecs := $(avx512bw-vecs) +avx512vbmi-ints := $(avx512bw-ints) +avx512vbmi-flts := $(avx512bw-flts) avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -542,6 +542,7 @@ static const struct test avx512_vbmi_all INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), INSN(permt2b, 66, 0f38, 7d, vl, b, vl), + INSN(pmultishiftqb, 66, 0f38, 83, vl, q, vl), }; static const struct test avx512_vbmi2_all[] = { --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -27,6 +27,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw.h" #include "avx512dq.h" #include "avx512er.h" +#include "avx512vbmi.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -127,6 +128,16 @@ static bool simd_check_avx512bw_vl(void) return cpu_has_avx512bw && cpu_has_avx512vl; } +static bool simd_check_avx512vbmi(void) +{ + return cpu_has_avx512_vbmi; +} + +static bool simd_check_avx512vbmi_vl(void) +{ + return cpu_has_avx512_vbmi && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -372,6 +383,18 @@ static const struct { SIMD(AVX512ER f32x16, avx512er, 64f4), SIMD(AVX512ER f64 scalar,avx512er, f8), SIMD(AVX512ER f64x8, avx512er, 64f8), + SIMD(AVX512_VBMI s8x64, avx512vbmi, 64i1), + SIMD(AVX512_VBMI u8x64, avx512vbmi, 64u1), + SIMD(AVX512_VBMI s16x32, avx512vbmi, 64i2), + SIMD(AVX512_VBMI u16x32, avx512vbmi, 64u2), + AVX512VL(_VBMI+VL s8x16, avx512vbmi, 16i1), + AVX512VL(_VBMI+VL u8x16, avx512vbmi, 16u1), + AVX512VL(_VBMI+VL s8x32, avx512vbmi, 32i1), + AVX512VL(_VBMI+VL u8x32, avx512vbmi, 32u1), + AVX512VL(_VBMI+VL s16x8, avx512vbmi, 16i2), + AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), + AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), + AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -493,6 +493,7 @@ static const struct ext0f38_table { [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x83] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x88] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_dq }, [0x89] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_dq }, [0x8a] = { .simd_size = simd_packed_fp, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, @@ -9023,6 +9024,12 @@ x86_emulate( ASSERT(!state->simd_size); break; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x83): /* vpmultishiftqb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + host_and_vcpu_must_have(avx512_vbmi); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x8c): /* vpmaskmov{d,q} mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x8e): /* vpmaskmov{d,q} {x,y}mm,{x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 11:01:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854533 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FF3B15AC for ; Fri, 15 Mar 2019 11:04:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBD1928AA8 for ; Fri, 15 Mar 2019 11:04:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DFC6A28B1C; Fri, 15 Mar 2019 11:04:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6E34628AA8 for ; Fri, 15 Mar 2019 11:04:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by 
lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kb9-0000oX-4I; Fri, 15 Mar 2019 11:01:59 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kb7-0000oD-FI for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:01:57 +0000 X-Inumbo-ID: bebc2e36-4711-11e9-8b95-afa5702a057b Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id bebc2e36-4711-11e9-8b95-afa5702a057b; Fri, 15 Mar 2019 11:01:55 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:01:54 -0600 Message-Id: <5C8B8623020000780021F2B2@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:01:55 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 38/50] x86emul: support of AVX512* population count insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Plus the only other AVX512_BITALG one. As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich --- v7: Re-base. v6: New. 
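As a quick illustration of what gets emulated here (hypothetical code, assuming GCC with -mavx512bitalg and -mavx512vpopcntdq; the function name is made up):

#include <immintrin.h>

/* Hypothetical example, not from the test harness. */
void popcounts(__m512i v, __m512i *per_byte, __m512i *per_dword)
{
    /* vpopcntb (AVX512_BITALG): per-byte population count. */
    *per_byte = _mm512_popcnt_epi8(v);
    /* vpopcntd (AVX512_VPOPCNTDQ): per-doubleword population count. */
    *per_dword = _mm512_popcnt_epi32(v);
}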
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -538,6 +538,11 @@ static const struct test avx512pf_512[] INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), }; +static const struct test avx512_bitalg_all[] = { + INSN(popcnt, 66, 0f38, 54, vl, bw, vl), + INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -550,6 +555,10 @@ static const struct test avx512_vbmi2_al INSN(pexpand, 66, 0f38, 62, vl, bw, el), }; +static const struct test avx512_vpopcntdq_all[] = { + INSN(popcnt, 66, 0f38, 55, vl, dq, vl) +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -919,6 +928,8 @@ void evex_disp8_test(void *instr, struct RUN(avx512er, 512); #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); + RUN(avx512_bitalg, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); + RUN(avx512_vpopcntdq, all); } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -143,6 +143,8 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) +#define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,6 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x54 ... 0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, @@ -501,6 +502,7 @@ static const struct ext0f38_table { [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, + [0x8f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0x96 ... 
0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -1915,6 +1917,8 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) +#define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8923,6 +8927,19 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8f): /* vpshufbitqmb [xyz]mm/mem,[xyz]mm,k{k} */ + generate_exception_if(evex.w || !evex.r || !evex.R || evex.z, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x54): /* vpopcnt{b,w} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_bitalg); + generate_exception_if(evex.brs, EXC_UD); + elem_bytes = 1 << evex.w; + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x55): /* vpopcnt{d,q} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vpopcntdq); + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -111,6 +111,8 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) +#define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -269,7 +269,7 @@ def crunch_numbers(state): # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask # registers), despite the SDM not formally making this connection. 
- AVX512BW: [AVX512_VBMI, AVX512_VBMI2], + AVX512BW: [AVX512_VBMI, AVX512_BITALG, AVX512_VBMI2], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 11:02:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854537 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3BA914DE for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C904A2A817 for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD0A72A823; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 455332A817 for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbc-0000ua-HS; Fri, 15 Mar 2019 11:02:28 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbb-0000uH-Lm for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:02:27 +0000 X-Inumbo-ID: d0b6b531-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id d0b6b531-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:02:25 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:02:25 -0600 Message-Id: <5C8B8640020000780021F2B5@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:02:24 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 39/50] x86emul: support of AVX512_IFMA insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Once again take the liberty and also correct the (public interface) name of the AVX512_IFMA feature flag to match the SDM, on the assumption that no external consumer has actually been using that flag so far. As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Reject EVEX.W=0. v6: New. 
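For reference, a minimal (hypothetical) use of the two insns, assuming GCC's IFMA intrinsics and -mavx512ifma:

#include <immintrin.h>

/* Hypothetical example; not from the test harness. */
__m512i madd52(__m512i acc, __m512i x, __m512i y)
{
    /* vpmadd52luq: per 64-bit lane, add the low 52 bits of the 104-bit
       product of the low 52 bits of x and y to the accumulator. */
    acc = _mm512_madd52lo_epu64(acc, x, y);
    /* vpmadd52huq: likewise, but adding the high 52 bits of that product. */
    return _mm512_madd52hi_epu64(acc, x, y);
}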
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -543,6 +543,11 @@ static const struct test avx512_bitalg_a INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), }; +static const struct test avx512_ifma_all[] = { + INSN(pmadd52huq, 66, 0f38, b5, vl, q, vl), + INSN(pmadd52luq, 66, 0f38, b4, vl, q, vl), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -929,6 +934,7 @@ void evex_disp8_test(void *instr, struct #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); RUN(avx512_bitalg, all); + RUN(avx512_ifma, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); RUN(avx512_vpopcntdq, all); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -137,6 +137,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_bmi2 cp.feat.bmi2 #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) +#define cpu_has_avx512_ifma (cp.feat.avx512_ifma && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -521,6 +521,7 @@ static const struct ext0f38_table { [0xad] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xae] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xaf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xb4 ... 0xb5] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xb6 ... 0xb8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xb9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xba] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -1907,6 +1908,7 @@ static bool vcpu_has( #define vcpu_has_rdseed() vcpu_has( 7, EBX, 18, ctxt, ops) #define vcpu_has_adx() vcpu_has( 7, EBX, 19, ctxt, ops) #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) +#define vcpu_has_avx512_ifma() vcpu_has( 7, EBX, 21, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) #define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) @@ -9470,6 +9472,12 @@ x86_emulate( break; } + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb4): /* vpmadd52luq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb5): /* vpmadd52huq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_ifma); + generate_exception_if(!evex.w, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc6): case X86EMUL_OPC_EVEX_66(0x0f38, 0xc7): { --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -102,6 +102,7 @@ #define cpu_has_avx512dq boot_cpu_has(X86_FEATURE_AVX512DQ) #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) +#define cpu_has_avx512_ifma boot_cpu_has(X86_FEATURE_AVX512_IFMA) #define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) #define cpu_has_avx512cd boot_cpu_has(X86_FEATURE_AVX512CD) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -212,7 +212,7 @@ XEN_CPUFEATURE(AVX512DQ, 5*32+17) / XEN_CPUFEATURE(RDSEED, 5*32+18) /*A RDSEED instruction */ XEN_CPUFEATURE(ADX, 5*32+19) /*A ADCX, ADOX instructions */ XEN_CPUFEATURE(SMAP, 5*32+20) /*S Supervisor 
Mode Access Prevention */ -XEN_CPUFEATURE(AVX512IFMA, 5*32+21) /*A AVX-512 Integer Fused Multiply Add */ +XEN_CPUFEATURE(AVX512_IFMA, 5*32+21) /*A AVX-512 Integer Fused Multiply Add */ XEN_CPUFEATURE(CLFLUSHOPT, 5*32+23) /*A CLFLUSHOPT instruction */ XEN_CPUFEATURE(CLWB, 5*32+24) /*A CLWB instruction */ XEN_CPUFEATURE(AVX512PF, 5*32+26) /*A AVX-512 Prefetch Instructions */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -262,7 +262,7 @@ def crunch_numbers(state): # (which in practice depends on the EVEX prefix to encode) as well # as mask registers, and the instructions themselves. All further # AVX512 features are built on top of AVX512F - AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD, + AVX512F: [AVX512DQ, AVX512_IFMA, AVX512PF, AVX512ER, AVX512CD, AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, AVX512_VPOPCNTDQ], From patchwork Fri Mar 15 11:02:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854539 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40A8F15AC for ; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 294F92A822 for ; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DD832A88F; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AA7CB2A822 for ; Fri, 15 Mar 2019 11:05:07 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbw-00010m-U3; Fri, 15 Mar 2019 11:02:48 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbv-00010L-4s for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:02:47 +0000 X-Inumbo-ID: dcbfce1a-4711-11e9-8da3-a701bc062552 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id dcbfce1a-4711-11e9-8da3-a701bc062552; Fri, 15 Mar 2019 11:02:46 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:02:45 -0600 Message-Id: <5C8B8655020000780021F2B8@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:02:45 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 40/50] x86emul: support remaining AVX512_VBMI2 insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George 
Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base over change earlier in the series. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -558,6 +558,14 @@ static const struct test avx512_vbmi_all static const struct test avx512_vbmi2_all[] = { INSN(pcompress, 66, 0f38, 63, vl, bw, el), INSN(pexpand, 66, 0f38, 62, vl, bw, el), + INSN(pshld, 66, 0f3a, 71, vl, dq, vl), + INSN(pshldv, 66, 0f38, 71, vl, dq, vl), + INSN(pshldvw, 66, 0f38, 70, vl, w, vl), + INSN(pshldw, 66, 0f3a, 70, vl, w, vl), + INSN(pshrd, 66, 0f3a, 73, vl, dq, vl), + INSN(pshrdv, 66, 0f38, 73, vl, dq, vl), + INSN(pshrdvw, 66, 0f38, 72, vl, w, vl), + INSN(pshrdw, 66, 0f3a, 72, vl, w, vl), }; static const struct test avx512_vpopcntdq_all[] = { --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -487,6 +487,7 @@ static const struct ext0f38_table { [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, [0x64 ... 0x66] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x70 ... 0x73] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -611,6 +612,7 @@ static const struct ext0f3a_table { [0x6a ... 0x6b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x6c ... 0x6d] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x6e ... 0x6f] = { .simd_size = simd_scalar_opc, .four_op = 1 }, + [0x70 ... 0x73] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x78 ... 0x79] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x7a ... 0x7b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x7c ... 
0x7d] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -8993,6 +8995,16 @@ x86_emulate( } goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x70): /* vpshldvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x72): /* vpshrdvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + elem_bytes = 2; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x71): /* vpshldv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x73): /* vpshrdv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -10293,6 +10305,16 @@ x86_emulate( avx512_vlen_check(true); goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x70): /* vpshldw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x72): /* vpshrdw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + elem_bytes = 2; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x71): /* vpshld{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x73): /* vpshrd{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + goto avx512f_imm8_no_sae; + case X86EMUL_OPC(0x0f3a, 0xcc): /* sha1rnds4 $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(sha); op_bytes = 16; From patchwork Fri Mar 15 11:04:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854541 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4E50314DE for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 343DF2A94B for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 285A82A94D; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 817762A94B for ; Fri, 15 Mar 2019 11:06:23 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdL-0001HH-HE; Fri, 15 Mar 2019 11:04:15 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdK-0001H5-CN for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:14 +0000 X-Inumbo-ID: 108dcb02-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 108dcb02-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:04:12 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; 
Fri, 15 Mar 2019 05:04:12 -0600 Message-Id: <5C8B86A9020000780021F2BB@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:09 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 41/50] x86emul: support AVX512_4FMAPS insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: Correct vcpu_has_*() insertion point. v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -538,6 +538,13 @@ static const struct test avx512pf_512[] INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), }; +static const struct test avx512_4fmaps_512[] = { + INSN(4fmaddps, f2, 0f38, 9a, el_4, d, vl), + INSN(4fmaddss, f2, 0f38, 9b, el_4, d, vl), + INSN(4fnmaddps, f2, 0f38, aa, el_4, d, vl), + INSN(4fnmaddss, f2, 0f38, ab, el_4, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -941,6 +948,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512er, 512); #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); + RUN(avx512_4fmaps, 512); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -4274,6 +4274,81 @@ int main(int argc, char **argv) } #endif + printf("%-40s", "Testing v4fmaddps 32(%ecx),%zmm4,%zmm4{%k5}..."); + if ( stack_exec && cpu_has_avx512_4fmaps ) + { + decl_insn(v4fmaddps); + static const struct { + float f[16]; + } in = {{ + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 + }}, out = {{ + 1 + 1 * 9 + 2 * 10 + 3 * 11 + 4 * 12, + 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 16 + 16 * 9 + 17 * 10 + 18 * 11 + 19 * 12 + }}; + + asm volatile ( "vmovups %1, %%zmm4\n\t" + "vbroadcastss %%xmm4, %%zmm7\n\t" + "vaddps %%zmm4, %%zmm7, %%zmm5\n\t" + "vaddps %%zmm5, %%zmm7, %%zmm6\n\t" + "vaddps %%zmm6, %%zmm7, %%zmm7\n\t" + "kmovw %2, %%k5\n" + put_insn(v4fmaddps, + "v4fmaddps 32(%0), %%zmm4, %%zmm4%{%%k5%}") + :: "c" (NULL), "m" (in), "rmk" (0x8001) ); + + set_insn(v4fmaddps); + regs.ecx = (unsigned long)∈ + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(v4fmaddps) ) + goto fail; + + asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" : "=g" (rc) : "m" (out) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + + printf("%-40s", "Testing v4fnmaddss 16(%edx),%zmm4,%zmm4{%k3}..."); + if ( stack_exec && cpu_has_avx512_4fmaps ) + { + decl_insn(v4fnmaddss); + static const struct { + float f[16]; + } in = {{ + 1, 2, 3, 4, 5, 6, 7, 8 + }}, out = {{ + 1 - 1 * 5 - 2 * 6 - 3 * 7 - 4 * 8, 2, 3, 4 + }}; + + asm volatile ( "vmovups %1, %%xmm4\n\t" + "vaddss %%xmm4, %%xmm4, %%xmm5\n\t" + "vaddss %%xmm5, %%xmm4, %%xmm6\n\t" + "vaddss %%xmm6, %%xmm4, %%xmm7\n\t" + "kmovw %2, %%k3\n" + put_insn(v4fnmaddss, + "v4fnmaddss 16(%0), %%xmm4, %%xmm4%{%%k3%}") + :: 
"d" (NULL), "m" (in), "rmk" (1) ); + + set_insn(v4fnmaddss); + regs.edx = (unsigned long)∈ + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(v4fnmaddss) ) + goto fail; + + asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" : "=g" (rc) : "m" (out) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -146,6 +146,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) +#define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1924,6 +1924,7 @@ static bool vcpu_has( #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) +#define vcpu_has_avx512_4fmaps() vcpu_has( 7, EDX, 3, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) #define vcpu_must_have(feat) \ @@ -3205,6 +3206,18 @@ x86_decode( state); state->simd_size = simd_other; } + + switch ( b ) + { + /* v4f{,n}madd{p,s}s need special casing */ + case 0x9a: case 0x9b: case 0xaa: case 0xab: + if ( evex.pfx == vex_f2 ) + { + disp8scale = 4; + state->simd_size = simd_128; + } + break; + } } break; @@ -9388,6 +9401,24 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ + host_and_vcpu_must_have(avx512_4fmaps); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr != 2), + EXC_UD); + op_mask = op_mask & 0xffff ? 0xf : 0; + goto simd_zmm; + + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9b): /* v4fmaddss m128,xmm+3,xmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0xab): /* v4fnmaddss m128,xmm+3,xmm{k} */ + host_and_vcpu_must_have(avx512_4fmaps); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr == 3), + EXC_UD); + op_mask = op_mask & 1 ? 
0xf : 0; + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa0): /* vpscatterd{d,q} [xyz]mm,mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xa1): /* vpscatterq{d,q} [xyz]mm,mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xa2): /* vscatterdp{s,d} [xyz]mm,mem{k} */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -116,6 +116,9 @@ #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) +/* CPUID level 0x00000007:0.edx */ +#define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS) + /* CPUID level 0x80000007.edx */ #define cpu_has_itsc boot_cpu_has(X86_FEATURE_ITSC) From patchwork Fri Mar 15 11:04:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854543 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 06BB114DE for ; Fri, 15 Mar 2019 11:06:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E3CD72A94A for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D7B232A94C; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 70E332A94A for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdc-0001LO-SF; Fri, 15 Mar 2019 11:04:32 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdb-0001L5-SV for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:31 +0000 X-Inumbo-ID: 1a1c6bec-4712-11e9-b2e3-97a58ff491c9 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 1a1c6bec-4712-11e9-b2e3-97a58ff491c9; Fri, 15 Mar 2019 11:04:29 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:04:28 -0600 Message-Id: <5C8B86BD020000780021F2BE@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:29 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 42/50] x86emul: support AVX512_4VNNIW insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, 
since the insns here and in particular their memory access patterns follow the AVX512_4FMAPS scheme, I didn't think it was necessary to add contrived tests specifically for them, beyond the Disp8 scaling ones. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Correct vcpu_has_*() insertion point. v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -545,6 +545,11 @@ static const struct test avx512_4fmaps_5 INSN(4fnmaddss, f2, 0f38, ab, el_4, d, vl), }; +static const struct test avx512_4vnniw_512[] = { + INSN(p4dpwssd, f2, 0f38, 52, el_4, d, vl), + INSN(p4dpwssds, f2, 0f38, 53, el_4, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -949,6 +954,7 @@ void evex_disp8_test(void *instr, struct #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); RUN(avx512_4fmaps, 512); + RUN(avx512_4vnniw, 512); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -146,6 +146,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) +#define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,6 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x52 ... 0x53] = { .simd_size = simd_128, .d8s = 4 }, [0x54 ... 0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, @@ -1924,6 +1925,7 @@ static bool vcpu_has( #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) +#define vcpu_has_avx512_4vnniw() vcpu_has( 7, EDX, 2, ctxt, ops) #define vcpu_has_avx512_4fmaps() vcpu_has( 7, EDX, 3, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8944,6 +8946,15 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x52): /* vp4dpwssd m128,zmm+3,zmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x53): /* vp4dpwssds m128,zmm+3,zmm{k} */ + host_and_vcpu_must_have(avx512_4vnniw); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr != 2), + EXC_UD); + op_mask = op_mask & 0xffff ? 
0xf : 0; + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8f): /* vpshufbitqmb [xyz]mm/mem,[xyz]mm,k{k} */ generate_exception_if(evex.w || !evex.r || !evex.R || evex.z, EXC_UD); /* fall through */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -117,6 +117,7 @@ #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x00000007:0.edx */ +#define cpu_has_avx512_4vnniw boot_cpu_has(X86_FEATURE_AVX512_4VNNIW) #define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS) /* CPUID level 0x80000007.edx */ From patchwork Fri Mar 15 11:04:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854545 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C7C7315AC for ; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A99EE2A94A for ; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B6CF2A94C; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DBA132A94A for ; Fri, 15 Mar 2019 11:06:42 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ke5-0001SF-6z; Fri, 15 Mar 2019 11:05:01 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ke3-0001Ro-AC for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:59 +0000 X-Inumbo-ID: 2b67dcac-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 2b67dcac-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:04:57 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:04:57 -0600 Message-Id: <5C8B86D9020000780021F2C1@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:57 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 43/50] x86emul: support AVX512_VNNI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test 
specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich --- v8: Re-base. v7: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -580,6 +580,13 @@ static const struct test avx512_vbmi2_al INSN(pshrdw, 66, 0f3a, 72, vl, w, vl), }; +static const struct test avx512_vnni_all[] = { + INSN(pdpbusd, 66, 0f38, 50, vl, d, vl), + INSN(pdpbusds, 66, 0f38, 51, vl, d, vl), + INSN(pdpwssd, 66, 0f38, 52, vl, d, vl), + INSN(pdpwssds, 66, 0f38, 53, vl, d, vl), +}; + static const struct test avx512_vpopcntdq_all[] = { INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; @@ -959,5 +966,6 @@ void evex_disp8_test(void *instr, struct RUN(avx512_ifma, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); + RUN(avx512_vnni, all); RUN(avx512_vpopcntdq, all); } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,7 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, - [0x52 ... 0x53] = { .simd_size = simd_128, .d8s = 4 }, + [0x50 ... 0x53] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x54 ... 
0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) @@ -3211,6 +3212,8 @@ x86_decode( switch ( b ) { + /* vp4dpwssd{,s} need special casing */ + case 0x52: case 0x53: /* v4f{,n}madd{p,s}s need special casing */ case 0x9a: case 0x9b: case 0xaa: case 0xab: if ( evex.pfx == vex_f2 ) @@ -9412,6 +9415,14 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vnni); + generate_exception_if(evex.w, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4fmaps); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! 
OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -264,7 +264,7 @@ def crunch_numbers(state): # AVX512 features are built on top of AVX512F AVX512F: [AVX512DQ, AVX512_IFMA, AVX512PF, AVX512ER, AVX512CD, AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, - AVX512_VPOPCNTDQ], + AVX512_VNNI, AVX512_VPOPCNTDQ], # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask From patchwork Fri Mar 15 11:05:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854547 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0EFCE15AC for ; Fri, 15 Mar 2019 11:07:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E8A9B2A94A for ; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC7B52A94F; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 23CE92A94A for ; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kec-0001ac-JS; Fri, 15 Mar 2019 11:05:34 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kea-0001aM-Jd for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:05:32 +0000 X-Inumbo-ID: 3f3ea9bc-4712-11e9-b2f8-0b5f97a8f175 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 3f3ea9bc-4712-11e9-b2f8-0b5f97a8f175; Fri, 15 Mar 2019 11:05:31 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:05:30 -0600 Message-Id: <5C8B86FB020000780021F31D@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:05:31 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 44/50] x86emul: support VPCLMULQDQ insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: 
George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As to the feature dependency adjustment, while strictly speaking AVX is a sufficient prereq (to have YMM registers), 256-bit vectors of integers have got fully introduced with AVX2 only. Sadly gcc can't be used as a reference here: They don't provide any AVX512-independent built-in at all. Along the lines of PCLMULQDQ, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- TBD: Should VPCLMULQDQ also depend on PCLMULQDQ? --- v8: No need to set fault_suppression to false. v7: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,10 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +static const struct test vpclmulqdq_all[] = { + INSN(pclmulqdq, 66, 0f3a, 44, vl, q_nb, vl) +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -968,4 +972,9 @@ void evex_disp8_test(void *instr, struct RUN(avx512_vbmi2, all); RUN(avx512_vnni, all); RUN(avx512_vpopcntdq, all); + + if ( cpu_has_avx512f ) + { + RUN(vpclmulqdq, all); + } } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -594,7 +594,7 @@ static const struct ext0f3a_table { [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x44] = { .simd_size = simd_packed_int }, + [0x44] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 
0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) @@ -10219,13 +10220,19 @@ x86_emulate( goto opmask_shift_imm; case X86EMUL_OPC_66(0x0f3a, 0x44): /* pclmulqdq $imm8,xmm/m128,xmm */ - case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,xmm/m128,xmm,xmm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ host_and_vcpu_must_have(pclmulqdq); if ( vex.opcx == vex_none ) goto simd_0f3a_common; - generate_exception_if(vex.l, EXC_UD); + if ( vex.l ) + host_and_vcpu_must_have(vpclmulqdq); goto simd_0f_imm8_avx; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm */ + host_and_vcpu_must_have(vpclmulqdq); + generate_exception_if(evex.brs || evex.opmsk, EXC_UD); + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.w, EXC_UD); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -121,7 +121,7 @@ XEN_CPUFEATURE(PBE, 0*32+31) / /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */ XEN_CPUFEATURE(SSE3, 1*32+ 0) /*A Streaming SIMD Extensions-3 */ -XEN_CPUFEATURE(PCLMULQDQ, 1*32+ 1) /*A Carry-less mulitplication */ +XEN_CPUFEATURE(PCLMULQDQ, 1*32+ 1) /*A Carry-less multiplication */ XEN_CPUFEATURE(DTES64, 1*32+ 2) /* 64-bit Debug Store */ XEN_CPUFEATURE(MONITOR, 1*32+ 3) /* Monitor/Mwait support */ XEN_CPUFEATURE(DSCPL, 1*32+ 4) /* CPL Qualified Debug Store */ @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -255,8 +255,9 @@ def crunch_numbers(state): # This is just the dependency between AVX512 and AVX2 of XSTATE # feature flags. If want to use AVX512, AVX2 must be supported and - # enabled. - AVX2: [AVX512F], + # enabled. 
Certain later extensions, acting on 256-bit vectors of + # integers, better depend on AVX2 than AVX. + AVX2: [AVX512F, VPCLMULQDQ], # AVX512F is taken to mean hardware support for 512bit registers # (which in practice depends on the EVEX prefix to encode) as well From patchwork Fri Mar 15 11:06:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854551 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD9DE14DE for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A48962A94B for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 98F632A94D; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 06C1E2A94B for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kf6-0001ga-3V; Fri, 15 Mar 2019 11:06:04 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kf5-0001gP-DA for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:06:03 +0000 X-Inumbo-ID: 51ec1320-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 51ec1320-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:06:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:06:01 -0600 Message-Id: <5C8B8719020000780021F320@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:06:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 45/50] x86emul: support VAES insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As to the feature dependency adjustment, just like for VPCLMULQDQ while strictly speaking AVX is a sufficient prereq (to have YMM registers), 256-bit vectors of integers have got fully introduced with AVX2 only. A new test case (also covering AESNI) will be added to the harness by a subsequent patch. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- TBD: Should VAES also depend on AESNI? --- v8: No need to set fault_suppression to false. v7: New. 
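As an aside (illustrative only, not part of the patch): the vcpu_has_*() additions in this and the preceding patch key off CPUID leaf 7, sub-leaf 0, ECX — VAES is bit 9 and VPCLMULQDQ bit 10, matching the definitions in the hunks below. A minimal user-space probe for the same bits, assuming GCC/clang's <cpuid.h>, might look like:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=7,ECX=0):ECX — bit 9 = VAES, bit 10 = VPCLMULQDQ. */
    if ( __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
    {
        printf("VAES:       %s\n", (ecx & (1u << 9))  ? "yes" : "no");
        printf("VPCLMULQDQ: %s\n", (ecx & (1u << 10)) ? "yes" : "no");
    }

    return 0;
}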
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,18 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +/* + * The uses of b in this table are simply (one of) the shortest form(s) of + * saying "no broadcast" without introducing a 128-bit granularity enumerator. + * Due to all of the insns being WIG, w, d_nb, and q_nb would all also fit. + */ +static const struct test vaes_all[] = { + INSN(aesdec, 66, 0f38, de, vl, b, vl), + INSN(aesdeclast, 66, 0f38, df, vl, b, vl), + INSN(aesenc, 66, 0f38, dc, vl, b, vl), + INSN(aesenclast, 66, 0f38, dd, vl, b, vl), +}; + static const struct test vpclmulqdq_all[] = { INSN(pclmulqdq, 66, 0f3a, 44, vl, q_nb, vl) }; @@ -975,6 +987,7 @@ void evex_disp8_test(void *instr, struct if ( cpu_has_avx512f ) { + RUN(vaes, all); RUN(vpclmulqdq, all); } } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_vaes (cp.feat.vaes && xcr0_mask(6)) #define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -541,7 +541,7 @@ static const struct ext0f38_table { [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0xdc ... 0xdf] = { .simd_size = simd_packed_int }, + [0xdc ... 0xdf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xf0] = { .two_op = 1 }, [0xf1] = { .to_mem = 1, .two_op = 1 }, [0xf2 ... 
0xf3] = {}, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_vaes() vcpu_has( 7, ECX, 9, ctxt, ops) #define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) @@ -8935,13 +8936,9 @@ x86_emulate( case X86EMUL_OPC_66(0x0f38, 0xdb): /* aesimc xmm/m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xdb): /* vaesimc xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdc): /* aesenc xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdd): /* aesenclast xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xde): /* aesdec xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdf): /* aesdeclast xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdf): /* vaesdeclast xmm/m128,xmm,xmm */ host_and_vcpu_must_have(aesni); if ( vex.opcx == vex_none ) goto simd_0f38_common; @@ -9655,6 +9652,24 @@ x86_emulate( host_and_vcpu_must_have(avx512er); goto simd_zmm_scalar_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xdf): /* vaesdeclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ + if ( !vex.l ) + host_and_vcpu_must_have(aesni); + else + host_and_vcpu_must_have(vaes); + goto simd_0f_avx; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdc): /* vaesenc [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdd): /* vaesenclast [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xde): /* vaesdec [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdf): /* vaesdeclast [xyz]mm/mem,[xyz]mm,[xyz]mm */ + host_and_vcpu_must_have(vaes); + generate_exception_if(evex.brs || evex.opmsk, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC(0x0f38, 0xf0): /* movbe m,r */ case X86EMUL_OPC(0x0f38, 0xf1): /* movbe r,m */ vcpu_must_have(movbe); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_vaes boot_cpu_has(X86_FEATURE_VAES) #define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! 
OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(VAES, 6*32+ 9) /*A Vector AES Instrs */ XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -257,7 +257,7 @@ def crunch_numbers(state): # feature flags. If want to use AVX512, AVX2 must be supported and # enabled. Certain later extensions, acting on 256-bit vectors of # integers, better depend on AVX2 than AVX. - AVX2: [AVX512F, VPCLMULQDQ], + AVX2: [AVX512F, VAES, VPCLMULQDQ], # AVX512F is taken to mean hardware support for 512bit registers # (which in practice depends on the EVEX prefix to encode) as well From patchwork Fri Mar 15 11:06:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854553 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84CA614DE for ; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A0112A94C for ; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5B3862A94E; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8C2B52A94C for ; Fri, 15 Mar 2019 11:08:16 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kfc-0001pN-Dy; Fri, 15 Mar 2019 11:06:36 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kfb-0001pE-G5 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:06:35 +0000 X-Inumbo-ID: 63619caa-4712-11e9-8595-c71003392d02 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 63619caa-4712-11e9-8595-c71003392d02; Fri, 15 Mar 2019 11:06:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:06:31 -0600 Message-Id: <5C8B8733020000780021F323@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:06:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 46/50] x86emul: support GFNI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: 
List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the ISA extensions document revision 035 is ambiguous regarding fault suppression for VGF2P8MULB: Text says it's supported, while the exception specification listed is E4NF. Given the wording here and for the other two insns I'm inclined to trust the text more than the exception reference, which was also confirmed informally. As to the feature dependency adjustment, while strictly speaking SSE is a sufficient prereq (to have XMM registers), vectors of bytes and qwords have got introduced only with SSE2. gcc, for example, uses a similar connection in its respective intrinsics header. Signed-off-by: Jan Beulich --- v8: Add {evex}-producing vgf2p8mulb alias to simd.h. Add missing simd.h dependency. Re-base. v7: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -19,7 +19,8 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) +GF := sse2-gf avx2-gf avx512bw-gf +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -142,12 +143,17 @@ $(1)-cflags := \ $(foreach flt,$($(1)-flts), \ "-D_$(vec)x$(idx)f$(flt) -m$(1:-sg=) $(call non-sse,$(1)) -Os -DVEC_MAX=$(vec) -DIDX_SIZE=$(idx) -DFLOAT_SIZE=$(flt)"))) endef +define simd-gf-defs +$(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ + "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define opmask-defs $(1)-opmask-cflags := $(foreach vec,$($(1)-opmask-vecs), "-D_$(vec) -m$(1) -Os -DSIZE=$(vec)") endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) +$(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) first-string = $(shell for s in $(1); do echo "$$s"; break; done) @@ -197,7 +203,10 @@ $(addsuffix .c,$(FMA)): $(addsuffix .c,$(SG)): ln -sf simd-sg.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG)): simd.h +$(addsuffix .c,$(GF)): + ln -sf simd-gf.c $@ + +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,12 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +static const struct test gfni_all[] = { + INSN(gf2p8affineinvqb, 66, 0f3a, cf, vl, q, vl), + INSN(gf2p8affineqb, 66, 0f3a, ce, vl, q, vl), + INSN(gf2p8mulb, 66, 0f38, cf, vl, b, vl), +}; + /* * The uses of b in this table are simply (one of) the shortest form(s) of * saying "no broadcast" without introducing a 128-bit granularity enumerator. @@ -987,6 +993,7 @@ void evex_disp8_test(void *instr, struct if ( cpu_has_avx512f ) { + RUN(gfni, all); RUN(vaes, all); RUN(vpclmulqdq, all); } --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -371,6 +371,7 @@ OVR(cvttsd2siq); OVR(cvttss2si); OVR(cvttss2sil); OVR(cvttss2siq); +OVR(gf2p8mulb); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- /dev/null +++ b/tools/tests/x86_emulator/simd-gf.c @@ -0,0 +1,80 @@ +#define UINT_SIZE 1 + +#include "simd.h" +ENTRY(gf_test); + +#if VEC_SIZE == 16 +# define GF(op, s, a...) 
__builtin_ia32_vgf2p8 ## op ## _v16qi ## s(a) +#elif VEC_SIZE == 32 +# define GF(op, s, a...) __builtin_ia32_vgf2p8 ## op ## _v32qi ## s(a) +#elif VEC_SIZE == 64 +# define GF(op, s, a...) __builtin_ia32_vgf2p8 ## op ## _v64qi ## s(a) +#endif + +#ifdef __AVX512BW__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# define mul(x, y) GF(mulb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0) +# define transform(m, dir, x, c) ({ \ + vec_t t_; \ + asm ( "vgf2p8affine" #dir "qb %[imm], %[matrix]%{1to%c[n]%}, %[src], %[dst]" \ + : [dst] "=v" (t_) \ + : [matrix] "m" (m), [src] "v" (x), [imm] "i" (c), [n] "i" (VEC_SIZE / 8) ); \ + t_; \ +}) +#else +# if defined(__AVX2__) +# define bcstq(x) ({ \ + vdi_t t_; \ + asm ( "vpbroadcastq %1, %0" : "=x" (t_) : "m" (x) ); \ + t_; \ +}) +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# else +# define bcstq(x) ((vdi_t){x, x}) +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# endif +# define eq(x, y) to_bool((x) == (y)) +# define mul(x, y) GF(mulb, , (vqi_t)(x), (vqi_t)(y)) +# define transform(m, dir, x, c) ({ \ + vdi_t m_ = bcstq(m); \ + touch(m_); \ + ((vec_t)GF(affine ## dir ## qb, , (vqi_t)(x), (vqi_t)m_, c)); \ +}) +#endif + +const unsigned __attribute__((mode(DI))) ident = 0x0102040810204080ULL; + +int gf_test(void) +{ + unsigned int i; + vec_t src, one; + + for ( i = 0; i < ELEM_COUNT; ++i ) + { + src[i] = i; + one[i] = 1; + } + + /* Special case for first iteration. */ + one[0] = 0; + + do { + vec_t inv = transform(ident, inv, src, 0); + + touch(src); + touch(inv); + if ( !eq(mul(src, inv), one) ) return __LINE__; + + touch(src); + touch(inv); + if ( !eq(mul(inv, src), one) ) return __LINE__; + + one[0] = 1; + + src += ELEM_COUNT; + i += ELEM_COUNT; + } while ( i < 256 ); + + return 0; +} --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -11,12 +11,14 @@ asm ( ".pushsection .test, \"ax\", @prog #include "3dnow.h" #include "sse.h" #include "sse2.h" +#include "sse2-gf.h" #include "sse4.h" #include "avx.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" +#include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" #include "avx512dq-opmask.h" @@ -25,6 +27,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-sg.h" #include "avx512vl-sg.h" #include "avx512bw.h" +#include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" #include "avx512vbmi.h" @@ -138,6 +141,26 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_sse2_gf(void) +{ + return cpu_has_gfni && cpu_has_sse2; +} + +static bool simd_check_avx2_gf(void) +{ + return cpu_has_gfni && cpu_has_avx2; +} + +static bool simd_check_avx512bw_gf(void) +{ + return cpu_has_gfni && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_gf_vl(void) +{ + return cpu_has_gfni && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -395,6 +418,12 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(GFNI (legacy), sse2_gf, 16), + SIMD(GFNI (VEX/x16), avx2_gf, 16), + SIMD(GFNI (VEX/x32), avx2_gf, 32), + SIMD(GFNI (EVEX/x64), avx512bw_gf, 64), + AVX512VL(VL+GFNI (x16), avx512bw_gf, 16), + AVX512VL(VL+GFNI (x32), avx512bw_gf, 32), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- 
a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_gfni cp.feat.gfni #define cpu_has_vaes (cp.feat.vaes && xcr0_mask(6)) #define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -540,6 +540,7 @@ static const struct ext0f38_table { [0xcb] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xcf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xdc ... 0xdf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xf0] = { .two_op = 1 }, @@ -619,6 +620,7 @@ static const struct ext0f3a_table { [0x7c ... 0x7d] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x7e ... 0x7f] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0xcc] = { .simd_size = simd_other }, + [0xce ... 0xcf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xdf] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xf0] = {}, }; @@ -1922,6 +1924,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_gfni() vcpu_has( 7, ECX, 8, ctxt, ops) #define vcpu_has_vaes() vcpu_has( 7, ECX, 9, ctxt, ops) #define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) @@ -9652,6 +9655,21 @@ x86_emulate( host_and_vcpu_must_have(avx512er); goto simd_zmm_scalar_sae; + case X86EMUL_OPC_66(0x0f38, 0xcf): /* gf2p8mulb xmm/m128,xmm */ + host_and_vcpu_must_have(gfni); + goto simd_0f38_common; + + case X86EMUL_OPC_VEX_66(0x0f38, 0xcf): /* vgf2p8mulb {x,y}mm/mem,{x,y}mm,{x,y}mm */ + host_and_vcpu_must_have(gfni); + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_avx; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcf): /* vgf2p8mulb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(gfni); + generate_exception_if(evex.w || evex.brs, EXC_UD); + elem_bytes = 1; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc {x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec {x,y}mm/mem,{x,y}mm,{x,y}mm */ @@ -10395,6 +10413,24 @@ x86_emulate( op_bytes = 16; goto simd_0f3a_common; + case X86EMUL_OPC_66(0x0f3a, 0xce): /* gf2p8affineqb $imm8,xmm/m128,xmm */ + case X86EMUL_OPC_66(0x0f3a, 0xcf): /* gf2p8affineinvqb $imm8,xmm/m128,xmm */ + host_and_vcpu_must_have(gfni); + goto simd_0f3a_common; + + case X86EMUL_OPC_VEX_66(0x0f3a, 0xce): /* vgf2p8affineqb $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0xcf): /* vgf2p8affineinvqb $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + host_and_vcpu_must_have(gfni); + generate_exception_if(!vex.w, EXC_UD); + goto simd_0f_imm8_avx; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0xce): /* vgf2p8affineqb $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0xcf): /* 
vgf2p8affineinvqb $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(gfni); + generate_exception_if(!evex.w, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_66(0x0f3a, 0xdf): /* aeskeygenassist $imm8,xmm/m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0xdf): /* vaeskeygenassist $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(aesni); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_gfni boot_cpu_has(X86_FEATURE_GFNI) #define cpu_has_vaes boot_cpu_has(X86_FEATURE_VAES) #define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(GFNI, 6*32+ 8) /*A Galois Field Instrs */ XEN_CPUFEATURE(VAES, 6*32+ 9) /*A Vector AES Instrs */ XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -197,7 +197,7 @@ def crunch_numbers(state): # %XMM support, without specific inter-dependencies. Additionally # AMD has a special mis-alignment sub-mode. SSE: [SSE2, SSE3, SSSE3, SSE4A, MISALIGNSSE, - AESNI, PCLMULQDQ, SHA], + AESNI, PCLMULQDQ, SHA, GFNI], # SSE2 was re-specified as core instructions for 64bit. 
SSE2: [LM], From patchwork Fri Mar 15 11:07:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854555 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 138FA15AC for ; Fri, 15 Mar 2019 11:08:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E5D012A94C for ; Fri, 15 Mar 2019 11:08:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D4BBF2A94E; Fri, 15 Mar 2019 11:08:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6C31D2A94C for ; Fri, 15 Mar 2019 11:08:52 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgB-0001yj-3G; Fri, 15 Mar 2019 11:07:11 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgA-0001yT-6A for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:07:10 +0000 X-Inumbo-ID: 79038e7b-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 79038e7b-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:07:08 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:07:07 -0600 Message-Id: <5C8B875C020000780021F326@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:07:08 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 47/50] x86emul: restore ordering within main switch statement X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Incremental additions and/or mistakes have led to some code blocks sitting in "unexpected" places. Re-sort the case blocks (opcode space; major opcode; 66/F3/F2 prefix; legacy/VEX/EVEX encoding). As an exception, the opcode space 0x0f EVEX-encoded VPEXTRW is left at its current place, to keep it close to the "pextr" label. Pure code movement. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: New.
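As a minimal sketch of the re-sort criterion described above (struct and field names are hypothetical and not taken from x86_emulate.c): case blocks are ordered as if compared lexicographically on a four-field key.

#include <stdbool.h>

/* Hypothetical sort key mirroring the ordering named in the description. */
struct case_key {
    unsigned int space;  /* opcode space: 0x0f < 0x0f38 < 0x0f3a */
    unsigned int opcode; /* major opcode byte */
    unsigned int pfx;    /* mandatory prefix: none < 66 < F3 < F2 (assumed order) */
    unsigned int enc;    /* encoding: legacy < VEX < EVEX */
};

static bool key_before(const struct case_key *a, const struct case_key *b)
{
    if ( a->space != b->space )
        return a->space < b->space;
    if ( a->opcode != b->opcode )
        return a->opcode < b->opcode;
    if ( a->pfx != b->pfx )
        return a->pfx < b->pfx;
    return a->enc < b->enc;
}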
--- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -7129,15 +7129,6 @@ x86_emulate( ASSERT(!state->simd_size); break; - case X86EMUL_OPC_EVEX_F3(0x0f, 0x7e): /* vmovq xmm/m64,xmm */ - case X86EMUL_OPC_EVEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */ - generate_exception_if(evex.lr || !evex.w || evex.opmsk || evex.brs, - EXC_UD); - host_and_vcpu_must_have(avx512f); - d |= TwoOp; - op_bytes = 8; - goto simd_zmm; - case X86EMUL_OPC_66(0x0f, 0xe7): /* movntdq xmm,m128 */ case X86EMUL_OPC_VEX_66(0x0f, 0xe7): /* vmovntdq {x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); @@ -7535,6 +7526,15 @@ x86_emulate( op_bytes = 8; goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f, 0x7e): /* vmovq xmm/m64,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */ + generate_exception_if(evex.lr || !evex.w || evex.opmsk || evex.brs, + EXC_UD); + host_and_vcpu_must_have(avx512f); + d |= TwoOp; + op_bytes = 8; + goto simd_zmm; + case X86EMUL_OPC(0x0f, 0x80) ... X86EMUL_OPC(0x0f, 0x8f): /* jcc (near) */ if ( test_cc(b, _regs.eflags) ) jmp_rel((int32_t)src.val); @@ -8635,63 +8635,6 @@ x86_emulate( dst.type = OP_NONE; break; - case X86EMUL_OPC_EVEX_66(0x0f38, 0x10): /* vpsrlvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x11): /* vpsravw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x12): /* vpsllvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512bw); - generate_exception_if(!evex.w || evex.brs, EXC_UD); - elem_bytes = 2; - goto avx512f_no_sae; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ - op_bytes = elem_bytes; - generate_exception_if(evex.w || evex.brs, EXC_UD); - avx512_broadcast: - /* - * For the respective code below the main switch() to work we need to - * fold op_mask here: A source element gets read whenever any of its - * respective destination elements' mask bits is set. 
- */ - if ( fault_suppression ) - { - n = 1 << ((b & 3) - evex.w); - EXPECT(elem_bytes > 0); - ASSERT(op_bytes == n * elem_bytes); - for ( i = n; i < (16 << evex.lr) / elem_bytes; i += n ) - op_mask |= (op_mask >> i) & ((1 << n) - 1); - } - goto avx512f_no_sae; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ - /* vbroadcastf64x4 m256,zmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ - /* vbroadcasti64x4 m256,zmm{k} */ - generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); - /* fall through */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ - /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ - generate_exception_if(!evex.lr, EXC_UD); - /* fall through */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ - /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ - if ( b == 0x59 ) - op_bytes = 8; - generate_exception_if(evex.brs, EXC_UD); - if ( !evex.w ) - host_and_vcpu_must_have(avx512dq); - goto avx512_broadcast; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ - /* vbroadcastf64x2 m128,{y,z}mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ - /* vbroadcasti64x2 m128,{y,z}mm{k} */ - generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, - EXC_UD); - if ( evex.w ) - host_and_vcpu_must_have(avx512dq); - goto avx512_broadcast; - case X86EMUL_OPC_66(0x0f38, 0x20): /* pmovsxbw xmm/m64,xmm */ case X86EMUL_OPC_66(0x0f38, 0x21): /* pmovsxbd xmm/m32,xmm */ case X86EMUL_OPC_66(0x0f38, 0x22): /* pmovsxbq xmm/m16,xmm */ @@ -8725,47 +8668,14 @@ x86_emulate( host_and_vcpu_must_have(sse4_1); goto simd_0f38_common; - case X86EMUL_OPC_VEX_66(0x0f38, 0x13): /* vcvtph2ps xmm/mem,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - host_and_vcpu_must_have(f16c); - op_bytes = 8 << vex.l; - goto simd_0f_ymm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ - generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); - host_and_vcpu_must_have(avx512f); - if ( !evex.brs ) - avx512_vlen_check(false); - op_bytes = 8 << evex.lr; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x10): /* vpsrlvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x11): /* vpsravw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x12): /* vpsllvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(!evex.w || evex.brs, EXC_UD); elem_bytes = 2; - goto simd_zmm; - - case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ - generate_exception_if(!vex.l || vex.w, EXC_UD); - goto simd_0f_avx2; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ - generate_exception_if(!evex.lr, EXC_UD); - fault_suppression = false; goto avx512f_no_sae; - case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x23): /* vpmovsxwd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x25): /* vpmovsxdq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x30): /* vpmovzxbw 
xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x33): /* vpmovzxwd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x35): /* vpmovzxdq xmm/mem,{x,y}mm */ - op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); - goto simd_0f_int; - case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10): /* vpmovuswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20): /* vpmovswb [xyz]mm,{x,y}mm/mem{k} */ @@ -8811,6 +8721,96 @@ x86_emulate( elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x13): /* vcvtph2ps xmm/mem,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + host_and_vcpu_must_have(f16c); + op_bytes = 8 << vex.l; + goto simd_0f_ymm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(false); + op_bytes = 8 << evex.lr; + elem_bytes = 2; + goto simd_zmm; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ + generate_exception_if(!vex.l || vex.w, EXC_UD); + goto simd_0f_avx2; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ + op_bytes = elem_bytes; + generate_exception_if(evex.w || evex.brs, EXC_UD); + avx512_broadcast: + /* + * For the respective code below the main switch() to work we need to + * fold op_mask here: A source element gets read whenever any of its + * respective destination elements' mask bits is set. 
+ */ + if ( fault_suppression ) + { + n = 1 << ((b & 3) - evex.w); + EXPECT(elem_bytes > 0); + ASSERT(op_bytes == n * elem_bytes); + for ( i = n; i < (16 << evex.lr) / elem_bytes; i += n ) + op_mask |= (op_mask >> i) & ((1 << n) - 1); + } + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ + /* vbroadcastf64x4 m256,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ + /* vbroadcasti64x4 m256,zmm{k} */ + generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ + /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ + /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ + if ( b == 0x59 ) + op_bytes = 8; + generate_exception_if(evex.brs, EXC_UD); + if ( !evex.w ) + host_and_vcpu_must_have(avx512dq); + goto avx512_broadcast; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ + /* vbroadcastf64x2 m128,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ + /* vbroadcasti64x2 m128,{y,z}mm{k} */ + generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, + EXC_UD); + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + goto avx512_broadcast; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x23): /* vpmovsxwd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x25): /* vpmovsxdq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x30): /* vpmovzxbw xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x33): /* vpmovzxwd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x35): /* vpmovzxdq xmm/mem,{x,y}mm */ + op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); + goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x29): /* vpmov{b,w}2m [xyz]mm,k */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x39): /* vpmov{d,q}2m [xyz]mm,k */ generate_exception_if(!evex.r || !evex.R, EXC_UD); @@ -8918,6 +8918,52 @@ x86_emulate( break; } + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9a): /* vfmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9c): /* vfnmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9e): /* vfnmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa6): /* vfmaddsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa7): /* vfmsubadd213p{s,d} 
[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa8): /* vfmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xaa): /* vfmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xac): /* vfnmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xae): /* vfnmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb6): /* vfmaddsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb7): /* vfmsubadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb8): /* vfmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xba): /* vfmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbc): /* vfnmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbe): /* vfnmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + goto simd_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9f): /* vfnmsub132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa9): /* vfmadd213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xab): /* vfmsub213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xad): /* vfnmadd213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xaf): /* vfnmsub213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb9): /* vfmadd231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbb): /* vfmsub231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + simd_zmm_scalar_sae: + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + if ( !evex.brs ) + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_66(0x0f38, 0x37): /* pcmpgtq xmm/m128,xmm */ host_and_vcpu_must_have(sse4_2); goto simd_0f38_common; @@ -8950,6 +8996,31 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vnni); + generate_exception_if(evex.w, EXC_UD); + goto avx512f_no_sae; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,{x,y}mm */ + op_bytes = 1 << ((!(b & 0x20) * 2) + (b & 1)); + /* fall through */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x46): 
/* vpsravd {x,y}mm/mem,{x,y}mm,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_avx2; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x52): /* vp4dpwssd m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0x53): /* vp4dpwssds m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4vnniw); @@ -8972,23 +9043,6 @@ x86_emulate( host_and_vcpu_must_have(avx512_vpopcntdq); goto avx512f_no_sae; - case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,{x,y}mm */ - op_bytes = 1 << ((!(b & 0x20) * 2) + (b & 1)); - /* fall through */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x46): /* vpsravd {x,y}mm/mem,{x,y}mm,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - goto simd_0f_avx2; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ - host_and_vcpu_must_have(avx512f); - generate_exception_if(evex.brs, EXC_UD); - avx512_vlen_check(true); - goto simd_zmm; - case X86EMUL_OPC_VEX_66(0x0f38, 0x5a): /* vbroadcasti128 m128,ymm */ generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; @@ -9370,60 +9424,6 @@ x86_emulate( host_and_vcpu_must_have(fma); goto simd_0f_ymm; - case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9a): /* vfmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9c): /* vfnmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9e): /* vfnmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa6): /* vfmaddsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa7): /* vfmsubadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa8): /* vfmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xaa): /* vfmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xac): /* vfnmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xae): /* vfnmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb6): /* vfmaddsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb7): /* vfmsubadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb8): /* vfmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xba): /* vfmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbc): /* vfnmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case 
X86EMUL_OPC_EVEX_66(0x0f38, 0xbe): /* vfnmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512f); - if ( ea.type != OP_REG || !evex.brs ) - avx512_vlen_check(false); - goto simd_zmm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9f): /* vfnmsub132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa9): /* vfmadd213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xab): /* vfmsub213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xad): /* vfnmadd213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xaf): /* vfnmsub213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb9): /* vfmadd231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbb): /* vfmsub231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ - host_and_vcpu_must_have(avx512f); - simd_zmm_scalar_sae: - generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); - if ( !evex.brs ) - avx512_vlen_check(true); - goto simd_zmm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512_vnni); - generate_exception_if(evex.w, EXC_UD); - goto avx512f_no_sae; - case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4fmaps); @@ -10266,11 +10266,6 @@ x86_emulate( generate_exception_if(evex.brs || evex.opmsk, EXC_UD); goto avx512f_imm8_no_sae; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - goto simd_0f_imm8_avx; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x48): /* vpermil2ps $imm,{x,y}mm/mem,{x,y}mm,{x,y}mm,{x,y}mm */ /* vpermil2ps $imm,{x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x49): /* vpermil2pd $imm,{x,y}mm/mem,{x,y}mm,{x,y}mm,{x,y}mm */ @@ -10278,6 +10273,11 @@ x86_emulate( host_and_vcpu_must_have(xop); goto simd_0f_imm8_ymm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_imm8_avx; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4c): /* vpblendvb {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.w, EXC_UD); goto simd_0f_int_imm8; From patchwork Fri Mar 15 11:07:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854557 Return-Path: Received: from 
mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3460514DE for ; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1B25D2A16E for ; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0F8692A181; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 614A22A171 for ; Fri, 15 Mar 2019 11:09:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgY-00026o-Iy; Fri, 15 Mar 2019 11:07:34 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgW-00026O-Q9 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:07:32 +0000 X-Inumbo-ID: 8640fe96-4712-11e9-b7cf-bf1230b85b28 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 8640fe96-4712-11e9-b7cf-bf1230b85b28; Fri, 15 Mar 2019 11:07:30 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:07:29 -0600 Message-Id: <5C8B8771020000780021F329@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:07:29 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 48/50] x86emul: add an AES/VAES test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: New. 
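A hedged illustration of one identity the new simd-aes.c test below exercises: with all-zero round keys, AESDEC(AESENCLAST(x, 0), 0) reduces to AESIMC(x), because the InvShiftRows/InvSubBytes steps of AESDEC cancel the ShiftRows/SubBytes steps of AESENCLAST, leaving only InvMixColumns. A standalone scalar check using the AES-NI intrinsics (compile with e.g. gcc -maes); the test case itself checks the same relation with vector builtins across the various VAES widths.

#include <immintrin.h>
#include <string.h>

int main(void)
{
    __m128i x = _mm_set_epi32(0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f);
    __m128i zero = _mm_setzero_si128();
    /* AESENCLAST followed by AESDEC, both with zero keys, is InvMixColumns. */
    __m128i lhs = _mm_aesdec_si128(_mm_aesenclast_si128(x, zero), zero);
    __m128i rhs = _mm_aesimc_si128(x);

    return memcmp(&lhs, &rhs, sizeof(lhs)) ? 1 : 0;
}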
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -19,8 +19,9 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg +AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -143,6 +144,10 @@ $(1)-cflags := \ $(foreach flt,$($(1)-flts), \ "-D_$(vec)x$(idx)f$(flt) -m$(1:-sg=) $(call non-sse,$(1)) -Os -DVEC_MAX=$(vec) -DIDX_SIZE=$(idx) -DFLOAT_SIZE=$(flt)"))) endef +define simd-aes-defs +$(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ + "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define simd-gf-defs $(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") @@ -153,6 +158,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) +$(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -203,10 +209,13 @@ $(addsuffix .c,$(FMA)): $(addsuffix .c,$(SG)): ln -sf simd-sg.c $@ +$(addsuffix .c,$(AES)): + ln -sf simd-aes.c $@ + $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-aes.c @@ -0,0 +1,102 @@ +#define UINT_SIZE 1 + +#include "simd.h" +ENTRY(aes_test); + +#if VEC_SIZE == 16 +# define AES(op, a...) __builtin_ia32_vaes ## op ## _v16qi(a) +# define imc(x) ((vec_t)__builtin_ia32_aesimc128((vdi_t)(x))) +#elif VEC_SIZE == 32 +# define AES(op, a...) __builtin_ia32_vaes ## op ## _v32qi(a) +# define imc(x) ({ \ + vec_t r_; \ + unsigned char __attribute__((vector_size(16))) t_; \ + asm ( "vaesimc (%3), %x0\n\t" \ + "vaesimc 16(%3), %1\n\t" \ + "vinserti128 $1, %1, %0, %0" \ + : "=&v" (r_), "=&v" (t_) \ + : "m" (x), "r" (&(x)) ); \ + r_; \ +}) +#elif VEC_SIZE == 64 +# define AES(op, a...) 
__builtin_ia32_vaes ## op ## _v64qi(a) +# define imc(x) ({ \ + vec_t r_; \ + unsigned char __attribute__((vector_size(16))) t_; \ + asm ( "vaesimc (%3), %x0\n\t" \ + "vaesimc 1*16(%3), %1\n\t" \ + "vinserti32x4 $1, %1, %0, %0\n\t" \ + "vaesimc 2*16(%3), %1\n\t" \ + "vinserti32x4 $2, %1, %0, %0\n\t" \ + "vaesimc 3*16(%3), %1\n\t" \ + "vinserti32x4 $3, %1, %0, %0" \ + : "=&v" (r_), "=&v" (t_) \ + : "m" (x), "r" (&(x)) ); \ + r_; \ +}) +#endif + +#ifdef __AVX512BW__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# define aes(op, x, y) ((vec_t)AES(op, (vqi_t)(x), (vqi_t)(y))) +#else +# if defined(__AVX2__) && VEC_SIZE == 32 +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# define aes(op, x, y) ((vec_t)AES(op, (vqi_t)(x), (vqi_t)(y))) +# else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# define aes(op, x, y) ((vec_t)__builtin_ia32_aes ## op ## 128((vdi_t)(x), (vdi_t)(y))) +# endif +# define eq(x, y) to_bool((x) == (y)) +#endif + +int aes_test(void) +{ + unsigned int i; + vec_t src, zero = {}; + + for ( i = 0; i < ELEM_COUNT; ++i ) + src[i] = i; + + do { + vec_t x, y; + + touch(src); + x = imc(src); + touch(src); + + touch(zero); + y = aes(enclast, src, zero); + touch(zero); + y = aes(dec, y, zero); + + if ( !eq(x, y) ) return __LINE__; + + touch(zero); + x = aes(declast, src, zero); + touch(zero); + y = aes(enc, x, zero); + touch(y); + x = imc(y); + + if ( !eq(x, src) ) return __LINE__; + +#if VEC_SIZE == 16 + touch(src); + x = (vec_t)__builtin_ia32_aeskeygenassist128((vdi_t)src, 0); + touch(src); + y = (vec_t)__builtin_ia32_pshufb128((vqi_t)x, + (vqi_t){ 7, 4, 5, 6, + 1, 2, 3, 0, + 15, 12, 13, 14, + 9, 10, 11, 8 }); + if ( !eq(x, y) ) return __LINE__; +#endif + + src += ELEM_COUNT; + i += ELEM_COUNT; + } while ( i <= 256 ); + + return 0; +} --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -340,6 +340,10 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(aesdec); +OVR(aesdeclast); +OVR(aesenc); +OVR(aesenclast); OVR(cvtpd2dqx); OVR(cvtpd2dqy); OVR(cvtpd2psx); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -12,12 +12,15 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse.h" #include "sse2.h" #include "sse2-gf.h" +#include "ssse3-aes.h" #include "sse4.h" #include "avx.h" +#include "avx-aes.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" +#include "avx2-vaes.h" #include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" @@ -27,6 +30,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-sg.h" #include "avx512vl-sg.h" #include "avx512bw.h" +#include "avx512bw-vaes.h" #include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" @@ -91,6 +95,16 @@ static bool simd_check_xop(void) return cpu_has_xop; } +static bool simd_check_ssse3_aes(void) +{ + return cpu_has_aesni && cpu_has_ssse3; +} + +static bool simd_check_avx_aes(void) +{ + return cpu_has_aesni && cpu_has_avx; +} + static bool simd_check_avx512f(void) { return cpu_has_avx512f; @@ -141,6 +155,22 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_avx2_vaes(void) +{ + return cpu_has_aesni && cpu_has_vaes && cpu_has_avx2; +} + +static bool simd_check_avx512bw_vaes(void) +{ + return cpu_has_aesni && cpu_has_vaes && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_vaes_vl(void) +{ + return cpu_has_aesni && 
cpu_has_vaes && + cpu_has_avx512bw && cpu_has_avx512vl; +} + static bool simd_check_sse2_gf(void) { return cpu_has_gfni && cpu_has_sse2; @@ -319,6 +349,8 @@ static const struct { SIMD(XOP i16x16, xop, 32i2), SIMD(XOP i32x8, xop, 32i4), SIMD(XOP i64x4, xop, 32i8), + SIMD(AES (legacy), ssse3_aes, 16), + SIMD(AES (VEX/x16), avx_aes, 16), SIMD(OPMASK/w, avx512f_opmask, 2), SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), @@ -418,6 +450,10 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(VAES (VEX/x32), avx2_vaes, 32), + SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), + AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), + AVX512VL(VL+VAES (x32), avx512bw_vaes, 32), SIMD(GFNI (legacy), sse2_gf, 16), SIMD(GFNI (VEX/x16), avx2_gf, 16), SIMD(GFNI (VEX/x32), avx2_gf, 32), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -125,10 +125,12 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_sse cp.basic.sse #define cpu_has_sse2 cp.basic.sse2 #define cpu_has_sse3 cp.basic.sse3 +#define cpu_has_ssse3 cp.basic.ssse3 #define cpu_has_fma (cp.basic.fma && xcr0_mask(6)) #define cpu_has_sse4_1 cp.basic.sse4_1 #define cpu_has_sse4_2 cp.basic.sse4_2 #define cpu_has_popcnt cp.basic.popcnt +#define cpu_has_aesni cp.basic.aesni #define cpu_has_avx (cp.basic.avx && xcr0_mask(6)) #define cpu_has_f16c (cp.basic.f16c && xcr0_mask(6)) From patchwork Fri Mar 15 11:08:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854559 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C6D314DE for ; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C8E22A94C for ; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0F1222A94E; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0A1302A94C for ; Fri, 15 Mar 2019 11:09:54 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khA-0002G2-VE; Fri, 15 Mar 2019 11:08:12 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kh8-0002Fg-Vq for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:08:11 +0000 X-Inumbo-ID: 9dac069d-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 9dac069d-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:08:09 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:08:08 -0600 Message-Id: <5C8B8795020000780021F32C@prv1-mh.provo.novell.com> 
X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:08:05 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 49/50] x86emul: add a SHA test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also use this for AVX512VL VPRO{L,R}{,V}D as well as some further shifts testing. Signed-off-by: Jan Beulich --- v8: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -20,8 +20,9 @@ SIMD := 3dnow sse sse2 sse4 avx avx2 xop FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes +SHA := sse4-sha avx-sha avx512f-sha GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -148,6 +149,10 @@ define simd-aes-defs $(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") endef +define simd-sha-defs +$(1)-cflags := $(foreach vec,$(sse-vecs), \ + "-D_$(vec) $(addprefix -m,$(subst -,$(space),$(1))) -Os -DVEC_SIZE=$(vec)") +endef define simd-gf-defs $(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") @@ -159,6 +164,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) +$(foreach flavor,$(SHA),$(eval $(call simd-sha-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -212,10 +218,13 @@ $(addsuffix .c,$(SG)): $(addsuffix .c,$(AES)): ln -sf simd-aes.c $@ +$(addsuffix .c,$(SHA)): + ln -sf simd-sha.c $@ + $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-sha.c @@ -0,0 +1,392 @@ +#define INT_SIZE 4 + +#include "simd.h" +ENTRY(sha_test); + +#define SHA(op, a...) 
__builtin_ia32_sha ## op(a) + +#ifdef __AVX512F__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqd, _mask, x, y, -1) == ALL_TRUE) +# define blend(x, y, sel) B(movdqa32_, _mask, y, x, sel) +# define rot_c(f, r, x, n) B(pro ## f ## d, _mask, x, n, undef(), ~0) +# define rot_s(f, r, x, n) ({ /* gcc does not support embedded broadcast */ \ + vec_t r_; \ + asm ( "vpro" #f "vd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define rot_v(d, x, n) B(pro ## d ## vd, _mask, x, n, undef(), ~0) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "vps" #d "lvd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define vshift(d, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_; \ + asm ( "vps" #d "ldq %2, %1, %0" \ + : "=v" (r_) : "m" (x), "i" ((n) * ELEM_SIZE) ); \ + r_; \ +}) +#else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# define eq(x, y) to_bool((x) == (y)) +# define blend(x, y, sel) \ + ((vec_t)__builtin_ia32_pblendw128((vhi_t)(x), (vhi_t)(y), \ + ((sel) & 1 ? 0x03 : 0) | \ + ((sel) & 2 ? 0x0c : 0) | \ + ((sel) & 4 ? 0x30 : 0) | \ + ((sel) & 8 ? 0xc0 : 0))) +# define rot_c(f, r, x, n) (sh ## f ## _c(x, n) | sh ## r ## _c(x, 32 - (n))) +# define rot_s(f, r, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_, t_, n_ = (vec_t){ 32 } - (n); \ + asm ( "ps" #f "ld %2, %0; ps" #r "ld %3, %1; por %1, %0" \ + : "=&x" (r_), "=&x" (t_) \ + : "m" (n), "m" (n_), "0" (x), "1" (x) ); \ + r_; \ +}) +static inline unsigned int rotl(unsigned int x, unsigned int n) +{ + return (x << (n & 0x1f)) | (x >> ((32 - n) & 0x1f)); +} +static inline unsigned int rotr(unsigned int x, unsigned int n) +{ + return (x >> (n & 0x1f)) | (x << ((32 - n) & 0x1f)); +} +# define rot_v(d, x, n) ({ \ + vec_t t_; \ + unsigned int i_; \ + for ( i_ = 0; i_ < ELEM_COUNT; ++i_ ) \ + t_[i_] = rot ## d((x)[i_], (n)[i_]); \ + t_; \ +}) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "ps" #d "ld %1, %0" : "=&x" (r_) : "m" (n), "0" (x) ); \ + r_; \ +}) +# define vshift(d, x, n) \ + (vec_t)(__builtin_ia32_ps ## d ## ldqi128((vdi_t)(x), (n) * ELEM_SIZE * 8)) +#endif + +#define alignr(x, y, n) ((vec_t)__builtin_ia32_palignr128((vdi_t)(x), (vdi_t)(y), (n) * 8)) +#define hadd(x, y) __builtin_ia32_phaddd128(x, y) +#define rol_c(x, n) rot_c(l, r, x, n) +#define rol_s(x, n) rot_s(l, r, x, n) +#define rol_v(x, n...) rot_v(l, x, n) +#define ror_c(x, n) rot_c(r, l, x, n) +#define ror_s(x, n) rot_s(r, l, x, n) +#define ror_v(x, n...) 
rot_v(r, x, n) +#define shl_c(x, n) __builtin_ia32_pslldi128(x, n) +#define shl_s(x, n) shift_s(l, x, n) +#define shr_c(x, n) __builtin_ia32_psrldi128(x, n) +#define shr_s(x, n) shift_s(r, x, n) +#define shuf(x, s) __builtin_ia32_pshufd(x, s) +#define swap(x) shuf(x, 0b00011011) +#define vshl(x, n) vshift(l, x, n) +#define vshr(x, n) vshift(r, x, n) + +static inline vec_t sha256_sigma0(vec_t w) +{ + vec_t res; + + touch(w); + res = ror_c(w, 7); + touch(w); + res ^= rol_c(w, 14); + touch(w); + res ^= shr_c(w, 3); + touch(w); + + return res; +} + +static inline vec_t sha256_sigma1(vec_t w) +{ + vec_t _17 = { 17 }, _19 = { 19 }, _10 = { 10 }; + + return ror_s(w, _17) ^ ror_s(w, _19) ^ shr_s(w, _10); +} + +static inline vec_t sha256_Sigma0(vec_t w) +{ + vec_t res, n1 = { 0, 0, 2, 2 }, n2 = { 0, 0, 13, 13 }, n3 = { 0, 0, 10, 10 }; + + touch(n1); + res = ror_v(w, n1); + touch(n2); + res ^= ror_v(w, n2); + touch(n3); + + return res ^ rol_v(w, n3); +} + +static inline vec_t sha256_Sigma1(vec_t w) +{ + return ror_c(w, 6) ^ ror_c(w, 11) ^ rol_c(w, 7); +} + +int sha_test(void) +{ + unsigned int i; + vec_t src, one = { 1 }; + vqi_t raw = {}; + + for ( i = 1; i < VEC_SIZE; ++i ) + raw[i] = i; + src = (vec_t)raw; + + for ( i = 0; i < 256; i += VEC_SIZE ) + { + vec_t x, y, tmp, hash = -src; + vec_t a, b, c, d, e, g, h; + unsigned int k, r; + + touch(src); + x = SHA(1msg1, hash, src); + touch(src); + y = hash ^ alignr(hash, src, 8); + touch(src); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = hash ^ alignr(src, hash, 12); + touch(tmp); + y = rol_c(tmp, 1); + tmp = hash ^ alignr(src, y, 12); + touch(tmp); + y = rol_c(tmp, 1); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = rol_s(hash ^ alignr(src, hash, 12), one); + y = rol_s(hash ^ alignr(src, tmp, 12), one); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1nexte, hash, src); + touch(src); + touch(hash); + tmp = rol_c(hash, 30); + tmp[2] = tmp[1] = tmp[0] = 0; + + if ( !eq(x, src + tmp) ) return __LINE__; + + /* + * SHA1RNDS4 + * + * SRC1 = { A0, B0, C0, D0 } + * SRC2 = W' = { W[0]E0, W[1], W[2], W[3] } + * + * (NB that the notation is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) + * + * In order to pick a simple rounds function, an immediate value of + * 1 is used; 3 would also be a possibility. 
+ * + * Applying + * + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0) + W'[0] + K + * E1 = D0 + * D1 = C0 + * C1 = ROL30(B0) + * B1 = A0 + * + * iteratively four times and resolving round variable values to + * A and B0, C0, and D0 we get + * + * A4 = ROL5(A3) + (A2 ^ ROL30(A1) ^ ROL30(A0)) + W'[3] + ROL30(B0) + K + * A3 = ROL5(A2) + (A1 ^ ROL30(A0) ^ ROL30(B0)) + W'[2] + C0 + K + * A2 = ROL5(A1) + (A0 ^ ROL30(B0) ^ C0 ) + W'[1] + D0 + K + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0 ) + W'[0] + K + * + * (respective per-column variable names: + * y a b c d src e k + * ) + * + * with + * + * B4 = A3 + * C4 = ROL30(A2) + * D4 = ROL30(A1) + * E4 = ROL30(A0) + * + * and hence + * + * DST = { A4, A3, ROL30(A2), ROL30(A1) } + */ + + touch(src); + x = SHA(1rnds4, hash, src, 1); + touch(src); + + a = vshr(hash, 3); + b = vshr(hash, 2); + touch(hash); + d = rol_c(hash, 30); + touch(hash); + d = blend(d, hash, 0b0011); + c = vshr(d, 1); + e = vshl(d, 1); + tmp = (vec_t){}; + k = rol_c(SHA(1rnds4, tmp, tmp, 1), 2)[0]; + + for ( r = 0; r < 4; ++r ) + { + y = rol_c(a, 5) + (b ^ c ^ d) + swap(src) + e + k; + + switch ( r ) + { + case 0: + c[3] = rol_c(y, 30)[0]; + /* fall through */ + case 1: + b[r + 2] = y[r]; + /* fall through */ + case 2: + a[r + 1] = y[r]; + break; + } + + switch ( r ) + { + case 3: + if ( a[3] != y[2] ) return __LINE__; + /* fall through */ + case 2: + if ( a[2] != y[1] ) return __LINE__; + if ( b[3] != y[1] ) return __LINE__; + /* fall through */ + case 1: + if ( a[1] != y[0] ) return __LINE__; + if ( b[2] != y[0] ) return __LINE__; + if ( c[3] != rol_c(y, 30)[0] ) return __LINE__; + break; + } + } + + a = blend(rol_c(y, 30), y, 0b1100); + + if ( !eq(x, a) ) return __LINE__; + + touch(src); + x = SHA(256msg1, hash, src); + touch(src); + y = hash + sha256_sigma0(alignr(src, hash, 4)); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(256msg2, hash, src); + touch(src); + tmp = hash + sha256_sigma1(alignr(hash, src, 8)); + y = hash + sha256_sigma1(alignr(tmp, src, 8)); + + if ( !eq(x, y) ) return __LINE__; + + /* + * SHA256RNDS2 + * + * SRC1 = { C0, D0, G0, H0 } + * SRC2 = { A0, B0, E0, F0 } + * XMM0 = W' = { ?, ?, WK1, WK0 } + * + * (NB that the notation again is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) 
+ * + * Ch(E,F,G) = (E & F) ^ (~E & G) + * Maj(A,B,C) = (A & B) ^ (A & C) ^ (B & C) + * + * Σ0(A) = ROR2(A) ^ ROR13(A) ^ ROR22(A) + * Σ1(E) = ROR6(E) ^ ROR11(E) ^ ROR25(E) + * + * Applying + * + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * B1 = A0 + * C1 = B0 + * D1 = C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * F1 = E0 + * G1 = F0 + * H1 = G0 + * + * iteratively four times and resolving round variable values to + * A / E and B0, C0, D0, F0, G0, and H0 we get + * + * A2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + Maj(A1, A0, B0) + Σ0(A1) + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * E2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * + * with + * + * B2 = A1 + * F2 = E1 + * + * and hence + * + * DST = { A2, A1, E2, E1 } + * + * which we can simplify a little, by letting A0, B0, and E0 be zero + * and F0 = ~G0, and by then utilizing + * + * Ch(0, 0, x) = x + * Ch(x, 0, y) = ~x & y + * Maj(x, 0, 0) = Maj(0, x, 0) = Maj(0, 0, x) = 0 + * + * A2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + Σ0(A1) + * A1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + Σ0(A0) + * E2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + D0 + * + * (respective per-column variable names: + * y e g e src h d + * ) + */ + + tmp = (vec_t){ ~hash[1] }; + touch(tmp); + x = SHA(256rnds2, hash, tmp, src); + touch(tmp); + + e = y = (vec_t){}; + d = alignr(y, hash, 8); + g = (vec_t){ hash[1], tmp[0], hash[1], tmp[0] }; + h = shuf(hash, 0b01000100); + + for ( r = 0; r < 2; ++r ) + { + y = (~e & g) + sha256_Sigma1(e) + shuf(src, 0b01000100) + + h + sha256_Sigma0(d); + + if ( !r ) + { + d[3] = y[2]; + e[3] = e[1] = y[0]; + } + else if ( d[3] != y[2] ) + return __LINE__; + else if ( e[1] != y[0] ) + return __LINE__; + else if ( e[3] != y[0] ) + return __LINE__; + } + + if ( !eq(x, y) ) return __LINE__; + + src += 0x01010101 * VEC_SIZE; + } + + return 0; +} --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -14,8 +14,10 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse2-gf.h" #include "ssse3-aes.h" #include "sse4.h" +#include "sse4-sha.h" #include "avx.h" #include "avx-aes.h" +#include "avx-sha.h" #include "fma4.h" #include "fma.h" #include "avx2.h" @@ -28,6 +30,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw-opmask.h" #include "avx512f.h" #include "avx512f-sg.h" +#include "avx512f-sha.h" #include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512bw-vaes.h" @@ -155,6 +158,21 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_sse4_sha(void) +{ + return cpu_has_sha && cpu_has_sse4_2; +} + +static bool simd_check_avx_sha(void) +{ + return cpu_has_sha && cpu_has_avx; +} + +static bool simd_check_avx512f_sha_vl(void) +{ + return cpu_has_sha && cpu_has_avx512vl; +} + static bool simd_check_avx2_vaes(void) { return cpu_has_aesni && cpu_has_vaes && cpu_has_avx2; @@ -450,6 +468,9 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(SHA, sse4_sha, 16), + SIMD(AVX+SHA, avx_sha, 16), + AVX512VL(VL+SHA, avx512f_sha, 16), SIMD(VAES (VEX/x32), avx2_vaes, 32), SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -142,6 +142,7 @@ static inline bool 
xcr0_mask(uint64_t ma #define cpu_has_avx512_ifma (cp.feat.avx512_ifma && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) +#define cpu_has_sha cp.feat.sha #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) From patchwork Fri Mar 15 11:08:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854561 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B347715AC for ; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 922282A94D for ; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7DBD22A94F; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9DDB32A94D for ; Fri, 15 Mar 2019 11:10:13 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khX-0002La-9i; Fri, 15 Mar 2019 11:08:35 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khW-0002LL-DM for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:08:34 +0000 X-Inumbo-ID: ab2986d8-4712-11e9-871f-c7ab690a713f Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id ab2986d8-4712-11e9-871f-c7ab690a713f; Fri, 15 Mar 2019 11:08:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:08:31 -0600 Message-Id: <5C8B87B0020000780021F32F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:08:32 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 50/50] x86emul: add a PCLMUL/VPCLMUL test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also use this for AVX512_VBMI2 VPSH{L,R}D{,V}{D,Q,W} testing (only the quad word right shifts get actually used; the assumption is that their "left" counterparts as well as the double word and word forms then work as well). 
Signed-off-by: Jan Beulich Acked-by: Andrew Cooper (subject to all --- v8: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -20,9 +20,10 @@ SIMD := 3dnow sse sse2 sse4 avx avx2 xop FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes +CLMUL := ssse3-pclmul avx-pclmul avx2-vpclmulqdq avx512bw-vpclmulqdq avx512vbmi2-vpclmulqdq SHA := sse4-sha avx-sha avx512f-sha GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(CLMUL) $(SHA) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -89,6 +90,7 @@ avx512er-flts := 4 8 avx512vbmi-vecs := $(avx512bw-vecs) avx512vbmi-ints := $(avx512bw-ints) avx512vbmi-flts := $(avx512bw-flts) +avx512vbmi2-vecs := $(avx512bw-vecs) avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 @@ -149,6 +151,10 @@ define simd-aes-defs $(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") endef +define simd-clmul-defs +$(1)-cflags := $(foreach vec,$($(patsubst %-pclmul,sse,$(1))-vecs) $($(patsubst %-vpclmulqdq,%,$(1))-vecs), \ + "-D_$(vec) -mpclmul $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define simd-sha-defs $(1)-cflags := $(foreach vec,$(sse-vecs), \ "-D_$(vec) $(addprefix -m,$(subst -,$(space),$(1))) -Os -DVEC_SIZE=$(vec)") @@ -164,6 +170,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) +$(foreach flavor,$(CLMUL),$(eval $(call simd-clmul-defs,$(flavor)))) $(foreach flavor,$(SHA),$(eval $(call simd-sha-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -218,13 +225,16 @@ $(addsuffix .c,$(SG)): $(addsuffix .c,$(AES)): ln -sf simd-aes.c $@ +$(addsuffix .c,$(CLMUL)): + ln -sf simd-clmul.c $@ + $(addsuffix .c,$(SHA)): ln -sf simd-sha.c $@ $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(CLMUL) $(SHA) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-clmul.c @@ -0,0 +1,150 @@ +#define UINT_SIZE 8 + +#include "simd.h" +ENTRY(clmul_test); + +#ifdef __AVX512F__ /* AVX512BW may get enabled only below */ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# define lane_shr_unit(x) \ + ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), 64, (vdi_t){}, \ + 0x00ff00ff00ff00ffULL & (~0ULL >> (64 - VEC_SIZE)))) +#else +# if defined(__AVX2__) && VEC_SIZE == 32 +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# endif +# define eq(x, y) to_bool((x) == (y)) +# define lane_shr_unit(x) ((vec_t)B(palignr, , (vdi_t){}, (vdi_t)(x), 64)) +#endif + +#define CLMUL(op, x, y, c) (vec_t)(__builtin_ia32_ ## op((vdi_t)(x), (vdi_t)(y), c)) + +#if VEC_SIZE == 16 +# define clmul(x, y, c) CLMUL(pclmulqdq128, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v2di +#elif VEC_SIZE == 32 +# define clmul(x, y, c) CLMUL(vpclmulqdq_v4di, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v4di 
+#elif VEC_SIZE == 64 +# define clmul(x, y, c) CLMUL(vpclmulqdq_v8di, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v8di +#endif + +#define clmul_ll(x, y) clmul(x, y, 0x00) +#define clmul_hl(x, y) clmul(x, y, 0x01) +#define clmul_lh(x, y) clmul(x, y, 0x10) +#define clmul_hh(x, y) clmul(x, y, 0x11) + +#if defined(__AVX512VBMI2__) +# pragma GCC target ( "avx512bw" ) +# define lane_shr_i(x, n) ({ \ + vec_t h_ = lane_shr_unit(x); \ + touch(h_); \ + (n) < 64 ? (vec_t)vpshrd((vdi_t)(x), (vdi_t)(h_), n) : h_ >> ((n) - 64); \ +}) +# define lane_shr_v(x, n) ({ \ + vec_t t_ = (x), h_ = lane_shr_unit(x); \ + typeof(t_[0]) n_ = (n); \ + if ( (n) < 64 ) \ + /* gcc does not support embedded broadcast */ \ + asm ( "vpshrdvq %2%{1to%c3%}, %1, %0" \ + : "+v" (t_) : "v" (h_), "m" (n_), "i" (ELEM_COUNT) ); \ + else \ + t_ = h_ >> ((n) - 64); \ + t_; \ +}) +#else +# define lane_shr_i lane_shr_v +# define lane_shr_v(x, n) ({ \ + vec_t t_ = (n) > 0 ? lane_shr_unit(x) : (x); \ + (n) < 64 ? ((x) >> (n)) | (t_ << (-(n) & 0x3f)) \ + : t_ >> ((n) - 64); \ +}) +#endif + +int clmul_test(void) +{ + unsigned int i; + vec_t src; + vqi_t raw = {}; + + for ( i = 1; i < VEC_SIZE; ++i ) + raw[i] = i; + src = (vec_t)raw; + + for ( i = 0; i < 256; i += VEC_SIZE ) + { + vec_t x = {}, y, z, lo, hi; + unsigned int j; + + touch(x); + y = clmul_ll(src, x); + touch(x); + + if ( !eq(y, x) ) return __LINE__; + + for ( j = 0; j < ELEM_COUNT; j += 2 ) + x[j] = 1; + + touch(src); + y = clmul_ll(x, src); + touch(src); + z = clmul_lh(x, src); + touch(src); + + for ( j = 0; j < ELEM_COUNT; j += 2 ) + y[j + 1] = z[j]; + + if ( !eq(y, src) ) return __LINE__; + + /* + * Besides the obvious property of the low and high half products + * being the same either direction, the "square" of a number has the + * property of simply being the original bit pattern with a zero bit + * inserted between any two bits. This is what the code below checks. 
+ */ + + x = src; + touch(src); + y = clmul_lh(x, src); + touch(src); + z = clmul_hl(x, src); + + if ( !eq(y, z) ) return __LINE__; + + touch(src); + y = lo = clmul_ll(x, src); + touch(src); + z = hi = clmul_hh(x, src); + touch(src); + + for ( j = 0; j < 64; ++j ) + { + vec_t l = lane_shr_v(lo, 2 * j); + vec_t h = lane_shr_v(hi, 2 * j); + unsigned int n; + + if ( !eq(l, y) ) return __LINE__; + if ( !eq(h, z) ) return __LINE__; + + x = src >> j; + + for ( n = 0; n < ELEM_COUNT; n += 2 ) + { + if ( (x[n + 0] & 1) != (l[n] & 3) ) return __LINE__; + if ( (x[n + 1] & 1) != (h[n] & 3) ) return __LINE__; + } + + touch(y); + y = lane_shr_i(y, 2); + touch(z); + z = lane_shr_i(z, 2); + } + + src += 0x0101010101010101ULL * VEC_SIZE; + } + + return 0; +} --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -381,6 +381,7 @@ OVR(movntdq); OVR(movntdqa); OVR(movshdup); OVR(movsldup); +OVR(pclmulqdq); OVR(permd); OVR(permq); OVR(pmovsxbd); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -13,16 +13,19 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse2.h" #include "sse2-gf.h" #include "ssse3-aes.h" +#include "ssse3-pclmul.h" #include "sse4.h" #include "sse4-sha.h" #include "avx.h" #include "avx-aes.h" +#include "avx-pclmul.h" #include "avx-sha.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" #include "avx2-vaes.h" +#include "avx2-vpclmulqdq.h" #include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" @@ -34,10 +37,12 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512bw-vaes.h" +#include "avx512bw-vpclmulqdq.h" #include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" #include "avx512vbmi.h" +#include "avx512vbmi2-vpclmulqdq.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -108,6 +113,16 @@ static bool simd_check_avx_aes(void) return cpu_has_aesni && cpu_has_avx; } +static bool simd_check_ssse3_pclmul(void) +{ + return cpu_has_pclmulqdq && cpu_has_ssse3; +} + +static bool simd_check_avx_pclmul(void) +{ + return cpu_has_pclmulqdq && cpu_has_avx; +} + static bool simd_check_avx512f(void) { return cpu_has_avx512f; @@ -189,6 +204,31 @@ static bool simd_check_avx512bw_vaes_vl( cpu_has_avx512bw && cpu_has_avx512vl; } +static bool simd_check_avx2_vpclmulqdq(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx2; +} + +static bool simd_check_avx512bw_vpclmulqdq(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_vpclmulqdq_vl(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx512bw && cpu_has_avx512vl; +} + +static bool simd_check_avx512vbmi2_vpclmulqdq(void) +{ + return cpu_has_avx512_vbmi2 && simd_check_avx512bw_vpclmulqdq(); +} + +static bool simd_check_avx512vbmi2_vpclmulqdq_vl(void) +{ + return cpu_has_avx512_vbmi2 && simd_check_avx512bw_vpclmulqdq_vl(); +} + static bool simd_check_sse2_gf(void) { return cpu_has_gfni && cpu_has_sse2; @@ -369,6 +409,8 @@ static const struct { SIMD(XOP i64x4, xop, 32i8), SIMD(AES (legacy), ssse3_aes, 16), SIMD(AES (VEX/x16), avx_aes, 16), + SIMD(PCLMUL (legacy), ssse3_pclmul, 16), + SIMD(PCLMUL (VEX/x2), avx_pclmul, 16), SIMD(OPMASK/w, avx512f_opmask, 2), SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), @@ -475,6 +517,13 @@ static const struct { SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), AVX512VL(VL+VAES (x32), avx512bw_vaes, 32), + SIMD(VPCLMUL (VEX/x4), avx2_vpclmulqdq, 32), + SIMD(VPCLMUL (EVEX/x8), avx512bw_vpclmulqdq, 64), + AVX512VL(VL+VPCLMUL (x4), avx512bw_vpclmulqdq, 16), + AVX512VL(VL+VPCLMUL (x8), avx512bw_vpclmulqdq, 32), + SIMD(AVX512_VBMI2+VPCLMUL (x8), avx512vbmi2_vpclmulqdq, 64), + AVX512VL(_VBMI2+VL+VPCLMUL (x2), avx512vbmi2_vpclmulqdq, 16), + AVX512VL(_VBMI2+VL+VPCLMUL (x4), avx512vbmi2_vpclmulqdq, 32), SIMD(GFNI (legacy), sse2_gf, 16), SIMD(GFNI (VEX/x16), avx2_gf, 16), SIMD(GFNI (VEX/x32), avx2_gf, 32), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -125,6 +125,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_sse cp.basic.sse #define cpu_has_sse2 cp.basic.sse2 #define cpu_has_sse3 cp.basic.sse3 +#define cpu_has_pclmulqdq cp.basic.pclmulqdq #define cpu_has_ssse3 cp.basic.ssse3 #define cpu_has_fma (cp.basic.fma && xcr0_mask(6)) #define cpu_has_sse4_1 cp.basic.sse4_1
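For anyone wanting to poke at the selector encoding outside the harness,
the clmul_ll/hl/lh/hh macros mirror the PCLMULQDQ immediate directly; a
rough standalone sketch using the compiler intrinsic (assumes GCC or Clang,
a PCLMULQDQ-capable host, and building with -mpclmul; the operand values
are arbitrary examples):

#include <stdint.h>
#include <stdio.h>
#include <wmmintrin.h>

int main(void)
{
    /* _mm_set_epi64x() takes the high quadword first. */
    __m128i a = _mm_set_epi64x(7, 3);
    __m128i b = _mm_set_epi64x(9, 5);
    uint64_t r[2];

    /*
     * Immediate bit 0 selects the quadword of the first operand, bit 4 the
     * quadword of the second one, i.e. the 0x00/0x01/0x10/0x11 encoding
     * used by the clmul_ll/hl/lh/hh macros above.
     */
    _mm_storeu_si128((__m128i *)r, _mm_clmulepi64_si128(a, b, 0x00));

    /* Carry-lessly, 0b11 * 0b101 = 0b1111, so this prints 0:0xf. */
    printf("lo x lo = %#llx:%#llx\n",
           (unsigned long long)r[1], (unsigned long long)r[0]);

    return 0;
}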