From patchwork Fri Mar 15 10:36:26 2019
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 10854447
Message-Id: <5C8B802A020000780021F116@prv1-mh.provo.novell.com>
Date: Fri, 15 Mar 2019 04:36:26 -0600
From: "Jan Beulich"
To: "xen-devel"
Subject: [Xen-devel] [PATCH v8 01/50] x86emul: no need to set fault_suppression to false for VMOVNT*
Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monne

When evex.opmsk is required to be zero there's no need for this, as it
won't have been set to true in the first place.

Signed-off-by: Jan Beulich
Reviewed-by: Andrew Cooper
---
v8: Add this previously standalone patch into the series.
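[Editor's note] As background for the removal below, here is an illustrative sketch only, with hypothetical names and types (not the emulator's actual code): fault suppression is only ever enabled when an opmask register is in use, and the VMOVNT* paths already raise #UD for a non-zero EVEX.opmsk, so the explicit clearing being dropped was dead code.

/*
 * Illustrative sketch only (hypothetical names/types, not the real
 * x86_emulate() code): the flag can only ever become true when an
 * opmask register is in use.
 */
#include <stdbool.h>

struct evex_bits {
    unsigned int opmsk:3;      /* EVEX.aaa; 0 means "no masking" */
};

static bool fault_suppression_enabled(struct evex_bits evex)
{
    return evex.opmsk != 0;    /* the only way the flag gets set */
}

static bool emulate_vmovnt(struct evex_bits evex, bool mem_operand)
{
    /* Mirrors the checks visible in the hunks below: #UD on any masking. */
    if ( !mem_operand || evex.opmsk )
        return false;                      /* would raise #UD */

    /*
     * Reaching this point implies evex.opmsk == 0, hence
     * fault_suppression_enabled(evex) is already false -- clearing it
     * again (the lines the patch removes) was a no-op.
     */
    return true;
}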
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5911,7 +5911,6 @@ x86_emulate(
     CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x2b): /* vmovntp{s,d} [xyz]mm,mem */
         generate_exception_if(ea.type != OP_MEM || evex.opmsk, EXC_UD);
         sfence = true;
-        fault_suppression = false;
         /* fall through */
     CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x10): /* vmovup{s,d} [xyz]mm/mem,[xyz]mm{k} */
     CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x10): /* vmovs{s,d} mem,xmm{k} */
@@ -6795,7 +6794,6 @@ x86_emulate(
         generate_exception_if(ea.type != OP_MEM || evex.opmsk || evex.w,
                               EXC_UD);
         sfence = true;
-        fault_suppression = false;
         /* fall through */
     case X86EMUL_OPC_EVEX_66(0x0f, 0x6f): /* vmovdqa{32,64} [xyz]mm/mem,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_F3(0x0f, 0x6f): /* vmovdqu{32,64} [xyz]mm/mem,[xyz]mm{k} */

From patchwork Fri Mar 15 10:36:53 2019
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 10854449
Message-Id: <5C8B8045020000780021F119@prv1-mh.provo.novell.com>
Date: Fri, 15 Mar 2019 04:36:53 -0600
From: "Jan Beulich"
To: "xen-devel"
Subject: [Xen-devel] [PATCH v8 02/50] x86emul: support AVX512{F,BW,DQ} extract insns
Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau
Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: Make use of d8s_dq64. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -212,6 +212,7 @@ static const struct test avx512f_all[] = }; static const struct test avx512f_128[] = { + INSN(extractps, 66, 0f3a, 17, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), INSN(movq, f3, 0f, 7e, el, q, el), @@ -221,10 +222,14 @@ static const struct test avx512f_128[] = static const struct test avx512f_no128[] = { INSN(broadcastf32x4, 66, 0f38, 1a, el_4, d, vl), INSN(broadcastsd, 66, 0f38, 19, el, q, el), + INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), + INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), }; static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), + INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), + INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), }; static const struct test avx512bw_all[] = { @@ -280,6 +285,12 @@ static const struct test avx512bw_all[] INSN(ptestnm, f3, 0f38, 26, vl, bw, vl), }; +static const struct test avx512bw_128[] = { + INSN(pextrb, 66, 0f3a, 14, el, b, el), +// pextrw, 66, 0f, c5, w + INSN(pextrw, 66, 0f3a, 15, el, w, el), +}; + static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), @@ -288,13 +299,21 @@ static const struct test avx512dq_all[] INSN_PFP(xor, 0f, 57), }; +static const struct test avx512dq_128[] = { + INSN(pextr, 66, 0f3a, 16, el, dq64, el), +}; + static const struct test avx512dq_no128[] = { INSN(broadcastf32x2, 66, 0f38, 19, el_2, d, vl), INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), + INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), + INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), }; static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), + INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), + INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), }; static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; @@ -632,7 +651,9 @@ void evex_disp8_test(void *instr, struct RUN(avx512f, no128); RUN(avx512f, 512); RUN(avx512bw, all); + RUN(avx512bw, 128); RUN(avx512dq, all); + RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -512,9 +512,13 @@ static const struct ext0f3a_table { [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp }, [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, - [0x14 ... 0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1 }, + [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, + [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, + [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, + [0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 2 }, [0x18] = { .simd_size = simd_128 }, - [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1 }, + [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, [0x1e ... 
0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_none }, @@ -523,7 +527,8 @@ static const struct ext0f3a_table { [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128 }, - [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1 }, + [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, [0x42] = { .simd_size = simd_packed_int }, @@ -2676,6 +2681,8 @@ x86_decode_0f3a( ... X86EMUL_OPC_66(0, 0x17): /* pextr*, extractps */ case X86EMUL_OPC_VEX_66(0, 0x14) ... X86EMUL_OPC_VEX_66(0, 0x17): /* vpextr*, vextractps */ + case X86EMUL_OPC_EVEX_66(0, 0x14) + ... X86EMUL_OPC_EVEX_66(0, 0x17): /* vpextr*, vextractps */ case X86EMUL_OPC_VEX_F2(0, 0xf0): /* rorx */ break; @@ -8878,9 +8885,9 @@ x86_emulate( opc[0] = b; /* Convert memory/GPR operand to (%rAX). */ rex_prefix &= ~REX_B; - vex.b = 1; + evex.b = vex.b = 1; if ( !mode_64bit() ) - vex.w = 0; + evex.w = vex.w = 0; opc[1] = modrm & 0x38; opc[2] = imm1; opc[3] = 0xc3; @@ -8890,7 +8897,10 @@ x86_emulate( --opc; } - copy_REX_VEX(opc, rex_prefix, vex); + if ( evex_encoded() ) + copy_EVEX(opc, evex); + else + copy_REX_VEX(opc, rex_prefix, vex); invoke_stub("", "", "=m" (dst.val) : "a" (&dst.val)); put_stub(stub); @@ -8915,6 +8925,52 @@ x86_emulate( opc = init_prefixes(stub); goto pextr; + case X86EMUL_OPC_EVEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */ + generate_exception_if(ea.type != OP_REG, EXC_UD); + /* Convert to alternative encoding: We want to use a memory operand. 
*/ + evex.opcx = ext_0f3a; + b = 0x15; + modrm <<= 3; + evex.r = evex.b; + evex.R = evex.x; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x14): /* vpextrb $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x15): /* vpextrw $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x16): /* vpextr{d,q} $imm8,xmm,r/m */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x17): /* vextractps $imm8,xmm,r/m */ + generate_exception_if((evex.lr || evex.reg != 0xf || !evex.RX || + evex.opmsk || evex.brs), + EXC_UD); + if ( !(b & 2) ) + host_and_vcpu_must_have(avx512bw); + else if ( !(b & 1) ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + opc = init_evex(stub); + goto pextr; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x19): /* vextractf32x4 $imm8,{y,z}mm,xmm/m128{k} */ + /* vextractf64x2 $imm8,{y,z}mm,xmm/m128{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x39): /* vextracti32x4 $imm8,{y,z}mm,xmm/m128{k} */ + /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + generate_exception_if(!evex.lr || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1b): /* vextractf32x8 $imm8,zmm,ymm/m256{k} */ + /* vextractf64x4 $imm8,zmm,ymm/m256{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3b): /* vextracti32x8 $imm8,zmm,ymm/m256{k} */ + /* vextracti64x4 $imm8,zmm,ymm/m256{k} */ + if ( !evex.w ) + host_and_vcpu_must_have(avx512dq); + generate_exception_if(evex.lr != 2 || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,{x,y}mm,xmm/mem */ { uint32_t mxcsr; From patchwork Fri Mar 15 10:37:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854451 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B5A71575 for ; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E6A82A16A for ; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1FACC2A191; Fri, 15 Mar 2019 10:39:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 661612A16A for ; Fri, 15 Mar 2019 10:39:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kDM-00032j-5g; Fri, 15 Mar 2019 10:37:24 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kDL-00032X-1l for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:37:23 +0000 X-Inumbo-ID: 505686fe-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 505686fe-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:37:21 +0000 
(UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:37:21 -0600 Message-Id: <5C8B8060020000780021F11C@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:37:20 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 03/50] x86emul: support AVX512{F, BW, DQ} insert insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also correct the comment of the AVX form of VINSERTPS. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Don't refuse to emulate VINSERTPS without AVX512VL. v4: Make use of d8s_dq64. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -213,6 +213,7 @@ static const struct test avx512f_all[] = static const struct test avx512f_128[] = { INSN(extractps, 66, 0f3a, 17, el, d, el), + INSN(insertps, 66, 0f3a, 21, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), INSN(movq, f3, 0f, 7e, el, q, el), @@ -224,12 +225,16 @@ static const struct test avx512f_no128[] INSN(broadcastsd, 66, 0f38, 19, el, q, el), INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), + INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), + INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), }; static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), + INSN(insertf64x4, 66, 0f3a, 1a, el_4, q, vl), + INSN(inserti64x4, 66, 0f3a, 3a, el_4, q, vl), }; static const struct test avx512bw_all[] = { @@ -289,6 +294,8 @@ static const struct test avx512bw_128[] INSN(pextrb, 66, 0f3a, 14, el, b, el), // pextrw, 66, 0f, c5, w INSN(pextrw, 66, 0f3a, 15, el, w, el), + INSN(pinsrb, 66, 0f3a, 20, el, b, el), + INSN(pinsrw, 66, 0f, c4, el, w, el), }; static const struct test avx512dq_all[] = { @@ -301,6 +308,7 @@ static const struct test avx512dq_all[] static const struct test avx512dq_128[] = { INSN(pextr, 66, 0f3a, 16, el, dq64, el), + INSN(pinsr, 66, 0f3a, 22, el, dq64, el), }; static const struct test avx512dq_no128[] = { @@ -308,12 +316,16 @@ static const struct test avx512dq_no128[ INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), + INSN(insertf64x2, 66, 0f3a, 18, el_2, q, vl), + INSN(inserti64x2, 66, 0f3a, 38, el_2, q, vl), }; static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), + INSN(insertf32x8, 66, 0f3a, 1a, el_8, d, vl), + INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -360,7 +360,7 @@ static const struct twobyte_table { [0xc1] = { 
DstMem|SrcReg|ModRM }, [0xc2] = { DstImplicit|SrcImmByte|ModRM, simd_any_fp, d8s_vl }, [0xc3] = { DstMem|SrcReg|ModRM|Mov }, - [0xc4] = { DstReg|SrcImmByte|ModRM, simd_packed_int }, + [0xc4] = { DstReg|SrcImmByte|ModRM, simd_packed_int, 1 }, [0xc5] = { DstReg|SrcImmByte|ModRM|Mov }, [0xc6] = { DstImplicit|SrcImmByte|ModRM, simd_packed_fp, d8s_vl }, [0xc7] = { ImplicitOps|ModRM }, @@ -516,17 +516,19 @@ static const struct ext0f3a_table { [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, [0x17] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 2 }, - [0x18] = { .simd_size = simd_128 }, + [0x18] = { .simd_size = simd_128, .d8s = 4 }, [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, + [0x1a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, [0x1e ... 0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x20] = { .simd_size = simd_none }, - [0x21] = { .simd_size = simd_other }, - [0x22] = { .simd_size = simd_none }, + [0x20] = { .simd_size = simd_none, .d8s = 0 }, + [0x21] = { .simd_size = simd_other, .d8s = 2 }, + [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, - [0x38] = { .simd_size = simd_128 }, + [0x38] = { .simd_size = simd_128, .d8s = 4 }, + [0x3a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x39] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, @@ -2586,6 +2588,7 @@ x86_decode_twobyte( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); /* fall through */ case X86EMUL_OPC_VEX_66(0, 0xc4): /* vpinsrw */ + case X86EMUL_OPC_EVEX_66(0, 0xc4): /* vpinsrw */ state->desc = DstReg | SrcMem16; break; @@ -2688,6 +2691,7 @@ x86_decode_0f3a( case X86EMUL_OPC_66(0, 0x20): /* pinsrb */ case X86EMUL_OPC_VEX_66(0, 0x20): /* vpinsrb */ + case X86EMUL_OPC_EVEX_66(0, 0x20): /* vpinsrb */ state->desc = DstImplicit | SrcMem; if ( modrm_mod != 3 ) state->desc |= ByteOp; @@ -2695,6 +2699,7 @@ x86_decode_0f3a( case X86EMUL_OPC_66(0, 0x22): /* pinsr{d,q} */ case X86EMUL_OPC_VEX_66(0, 0x22): /* vpinsr{d,q} */ + case X86EMUL_OPC_EVEX_66(0, 0x22): /* vpinsr{d,q} */ state->desc = DstImplicit | SrcMem; break; @@ -7735,6 +7740,23 @@ x86_emulate( ea.type = OP_MEM; goto simd_0f_int_imm8; + case X86EMUL_OPC_EVEX_66(0x0f, 0xc4): /* vpinsrw $imm8,r32/m16,xmm,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x20): /* vpinsrb $imm8,r32/m8,xmm,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x22): /* vpinsr{d,q} $imm8,r/m,xmm,xmm */ + generate_exception_if(evex.lr || evex.opmsk || evex.brs, EXC_UD); + if ( b & 2 ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512bw); + if ( !mode_64bit() ) + evex.w = 0; + memcpy(mmvalp, &src.val, op_bytes); + ea.type = OP_MEM; + op_bytes = src.bytes; + d = SrcMem16; /* Fake for the common SIMD code below. 
*/ + state->simd_size = simd_other; + goto avx512f_imm8_no_sae; + CASE_SIMD_PACKED_INT(0x0f, 0xc5): /* pextrw $imm8,{,x}mm,reg */ case X86EMUL_OPC_VEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */ generate_exception_if(vex.l, EXC_UD); @@ -8951,8 +8973,12 @@ x86_emulate( opc = init_evex(stub); goto pextr; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x18): /* vinsertf32x4 $imm8,xmm/m128,{y,z}mm{k} */ + /* vinsertf64x2 $imm8,xmm/m128,{y,z}mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x19): /* vextractf32x4 $imm8,{y,z}mm,xmm/m128{k} */ /* vextractf64x2 $imm8,{y,z}mm,xmm/m128{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x38): /* vinserti32x4 $imm8,xmm/m128,{y,z}mm{k} */ + /* vinserti64x2 $imm8,xmm/m128,{y,z}mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x39): /* vextracti32x4 $imm8,{y,z}mm,xmm/m128{k} */ /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ if ( evex.w ) @@ -8961,8 +8987,12 @@ x86_emulate( fault_suppression = false; goto avx512f_imm8_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1a): /* vinsertf32x4 $imm8,ymm/m256,zmm{k} */ + /* vinsertf64x2 $imm8,ymm/m256,zmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1b): /* vextractf32x8 $imm8,zmm,ymm/m256{k} */ /* vextractf64x4 $imm8,zmm,ymm/m256{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3a): /* vinserti32x4 $imm8,ymm/m256,zmm{k} */ + /* vinserti64x2 $imm8,ymm/m256,zmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x3b): /* vextracti32x8 $imm8,zmm,ymm/m256{k} */ /* vextracti64x4 $imm8,zmm,ymm/m256{k} */ if ( !evex.w ) @@ -9055,13 +9085,20 @@ x86_emulate( op_bytes = 4; goto simd_0f3a_common; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m128,xmm,xmm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m32,xmm,xmm */ op_bytes = 4; /* fall through */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x41): /* vdppd $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.l, EXC_UD); goto simd_0f_imm8_avx; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x21): /* vinsertps $imm8,xmm/m32,xmm,xmm */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.lr || evex.w || evex.opmsk || evex.brs, + EXC_UD); + op_bytes = 4; + goto simd_imm8_zmm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x30): /* kshiftr{b,w} $imm8,k,k */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x32): /* kshiftl{b,w} $imm8,k,k */ if ( !vex.w ) From patchwork Fri Mar 15 10:38:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854453 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E77961575 for ; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CBE812A934 for ; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BFC832A937; Fri, 15 Mar 2019 10:40:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D7BCD2A934 for ; Fri, 15 Mar 2019 10:40:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by 
lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEb-0003EJ-I9; Fri, 15 Mar 2019 10:38:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEa-0003E9-1H for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:38:40 +0000 X-Inumbo-ID: 7dfca9dc-470e-11e9-8d48-b7475793dce7 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 7dfca9dc-470e-11e9-8d48-b7475793dce7; Fri, 15 Mar 2019 10:38:38 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:38:37 -0600 Message-Id: <5C8B80AB020000780021F11F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:38:35 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 04/50] x86emul: basic AVX512F testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Fix formatting in simd.h. v5: Add VSQRT* tests. v4: Make eq() also work for 4- and 8-byte integer element sizes. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -63,6 +63,9 @@ avx2-sg-flts := 4 8 xop-vecs := $(avx-vecs) xop-ints := 1 2 4 8 xop-flts := $(avx-flts) +avx512f-vecs := 64 +avx512f-ints := 4 8 +avx512f-flts := 4 8 avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 @@ -170,7 +173,7 @@ $(addsuffix .c,$(SG)): $(addsuffix .h,$(SIMD) $(FMA) $(SG)): simd.h -xop.h: simd-fma.c +xop.h avx512f.h: simd-fma.c endif # 32-bit override --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -2,7 +2,41 @@ ENTRY(simd_test); -#if VEC_SIZE == 8 && defined(__SSE__) +#if defined(__AVX512F__) +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# if VEC_SIZE == 4 +# define eq(x, y) ({ \ + float x_ = (x)[0]; \ + float __attribute__((vector_size(16))) y_ = { (y)[0] }; \ + unsigned short r_; \ + asm ( "vcmpss $0, %1, %2, %0" : "=k" (r_) : "m" (x_), "v" (y_) ); \ + r_ == 1; \ +}) +# elif VEC_SIZE == 8 +# define eq(x, y) ({ \ + double x_ = (x)[0]; \ + double __attribute__((vector_size(16))) y_ = { (y)[0] }; \ + unsigned short r_; \ + asm ( "vcmpsd $0, %1, %2, %0" : "=k" (r_) : "m" (x_), "v" (y_) ); \ + r_ == 1; \ +}) +# elif FLOAT_SIZE == 4 +/* + * gcc's (up to at least 8.2) __builtin_ia32_cmpps256_mask() has an anomaly in + * that its return type is QI rather than UQI, and hence the value would get + * sign-extended before comapring to ALL_TRUE. The same oddity does not matter + * for __builtin_ia32_cmppd256_mask(), as there only 4 bits are significant. 
+ * Hence the extra " & ALL_TRUE". + */ +# define eq(x, y) ((BR(cmpps, _mask, x, y, 0, -1) & ALL_TRUE) == ALL_TRUE) +# elif FLOAT_SIZE == 8 +# define eq(x, y) (BR(cmppd, _mask, x, y, 0, -1) == ALL_TRUE) +# elif INT_SIZE == 4 || UINT_SIZE == 4 +# define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) +# elif INT_SIZE == 8 || UINT_SIZE == 8 +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# endif +#elif VEC_SIZE == 8 && defined(__SSE__) # define to_bool(cmp) (__builtin_ia32_pmovmskb(cmp) == 0xff) #elif VEC_SIZE == 16 # if defined(__AVX__) && defined(FLOAT_SIZE) @@ -93,6 +127,56 @@ static inline bool _to_bool(byte_vec_t b touch(x); \ __builtin_ia32_pfrcpit2(__builtin_ia32_pfrsqit1(__builtin_ia32_pfmul(t_, t_), x), t_); \ }) +#elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) +# if FLOAT_SIZE == 4 +# define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") +# elif FLOAT_SIZE == 8 +# define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") +# endif +#elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ + (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if FLOAT_SIZE == 4 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vbroadcastss %1, %0" \ + : "=v" (t_) : "m" (*(float[1]){ x }) ); \ + t_; \ +}) +# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) +# define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) +# define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# endif +# elif FLOAT_SIZE == 8 +# if VEC_SIZE >= 32 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vbroadcastsd %1, %0" : "=v" (t_) \ + : "m" (*(double[1]){ x }) ); \ + t_; \ +}) +# else +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastq %1, %0" \ + : "=v" (t_) : "m" (*(double[1]){ x }) ); \ + t_; \ +}) +# endif +# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) +# define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) +# define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# endif +# endif #elif FLOAT_SIZE == 4 && defined(__SSE__) # if VEC_SIZE == 32 && defined(__AVX__) # if defined(__AVX2__) @@ -191,7 +275,30 @@ static inline bool _to_bool(byte_vec_t b # define sqrt(x) scalar_1op(x, "sqrtsd %[in], %[out]") # endif #endif -#if VEC_SIZE == 16 && defined(__SSE2__) +#if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ + defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if INT_SIZE == 4 || UINT_SIZE == 4 +# define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ + (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# elif INT_SIZE == 8 || UINT_SIZE == 8 +# define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# endif +# if INT_SIZE == 4 +# define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) +# define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) +# define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 4 +# define max(x, y) 
((vec_t)B(pmaxud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) +# elif INT_SIZE == 8 +# define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 8 +# define max(x, y) ((vec_t)B(pmaxuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# endif +#elif VEC_SIZE == 16 && defined(__SSE2__) # if INT_SIZE == 1 || UINT_SIZE == 1 # define interleave_hi(x, y) ((vec_t)__builtin_ia32_punpckhbw128((vqi_t)(x), (vqi_t)(y))) # define interleave_lo(x, y) ((vec_t)__builtin_ia32_punpcklbw128((vqi_t)(x), (vqi_t)(y))) @@ -587,6 +694,10 @@ static inline bool _to_bool(byte_vec_t b # endif #endif +#if defined(__AVX512F__) && defined(FLOAT_SIZE) +# include "simd-fma.c" +#endif + int simd_test(void) { unsigned int i, j; @@ -1034,7 +1145,8 @@ int simd_test(void) # endif #endif -#if defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4) +#if (defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4)) || \ + (defined(__AVX512F__) && defined(FLOAT_SIZE)) return -fma_test(); #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,9 +70,111 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if VEC_SIZE == 16 +# define B(n, s, a...) __builtin_ia32_ ## n ## 128 ## s(a) +# define B_(n, s, a...) __builtin_ia32_ ## n ## s(a) +#elif VEC_SIZE == 32 +# define B(n, s, a...) __builtin_ia32_ ## n ## 256 ## s(a) +#elif VEC_SIZE == 64 +# define B(n, s, a...) __builtin_ia32_ ## n ## 512 ## s(a) +# define BR(n, s, a...) __builtin_ia32_ ## n ## 512 ## s(a, 4) +#endif +#ifndef B_ +# define B_ B +#endif +#ifndef BR +# define BR B +# define BR_ B_ +#endif +#ifndef BR_ +# define BR_ BR +#endif + +#ifdef __AVX512F__ + +/* + * The original plan was to effect use of EVEX encodings for scalar as well as + * 128- and 256-bit insn variants by restricting the compiler to use (on 64-bit + * only of course) XMM16-XMM31 only. All sorts of compiler errors result when + * doing this with gcc 8.2. Therefore resort to injecting {evex} prefixes, + * which has the benefit of also working for 32-bit. Granted, there is a lot of + * escaping to get right here. 
+ */ +asm ( ".macro override insn \n\t" + ".macro $\\insn o:vararg \n\t" + ".purgem \\insn \n\t" + "{evex} \\insn \\(\\)o \n\t" + ".macro \\insn o:vararg \n\t" + "$\\insn \\(\\(\\))o \n\t" + ".endm \n\t" + ".endm \n\t" + ".macro \\insn o:vararg \n\t" + "$\\insn \\(\\)o \n\t" + ".endm \n\t" + ".endm" ); + +# define OVR(n) asm ( "override v" #n ) +# define OVR_SFP(n) OVR(n ## sd); OVR(n ## ss) + +# ifdef __AVX512VL__ +# ifdef __AVX512BW__ +# define OVR_BW(n) OVR(p ## n ## b); OVR(p ## n ## w) +# else +# define OVR_BW(n) +# endif +# define OVR_DQ(n) OVR(p ## n ## d); OVR(p ## n ## q) +# define OVR_VFP(n) OVR(n ## pd); OVR(n ## ps) +# else +# define OVR_BW(n) +# define OVR_DQ(n) +# define OVR_VFP(n) +# endif + +# define OVR_FMA(n, w) OVR_ ## w(n ## 132); OVR_ ## w(n ## 213); \ + OVR_ ## w(n ## 231) +# define OVR_FP(n) OVR_VFP(n); OVR_SFP(n) +# define OVR_INT(n) OVR_BW(n); OVR_DQ(n) + +OVR_SFP(broadcast); +OVR_SFP(comi); +OVR_FP(add); +OVR_FP(div); +OVR(extractps); +OVR_FMA(fmadd, FP); +OVR_FMA(fmsub, FP); +OVR_FMA(fnmadd, FP); +OVR_FMA(fnmsub, FP); +OVR(insertps); +OVR_FP(max); +OVR_FP(min); +OVR(movd); +OVR(movq); +OVR_SFP(mov); +OVR_FP(mul); +OVR_FP(sqrt); +OVR_FP(sub); +OVR_SFP(ucomi); + +# undef OVR_VFP +# undef OVR_SFP +# undef OVR_INT +# undef OVR_FP +# undef OVR_FMA +# undef OVR_DQ +# undef OVR_BW +# undef OVR + +#endif /* __AVX512F__ */ + /* * Suppress value propagation by the compiler, preventing unwanted * optimization. This at once makes the compiler use memory operands * more often, which for our purposes is the more interesting case. */ #define touch(var) asm volatile ( "" : "+m" (var) ) + +static inline vec_t undef(void) +{ + vec_t v = v; + return v; +} --- a/tools/tests/x86_emulator/simd-fma.c +++ b/tools/tests/x86_emulator/simd-fma.c @@ -1,10 +1,9 @@ +#if !defined(__XOP__) && !defined(__AVX512F__) #include "simd.h" - -#ifndef __XOP__ ENTRY(fma_test); #endif -#if VEC_SIZE < 16 +#if VEC_SIZE < 16 && !defined(to_bool) # define to_bool(cmp) (!~(cmp)[0]) #elif VEC_SIZE == 16 # if FLOAT_SIZE == 4 @@ -24,7 +23,13 @@ ENTRY(fma_test); # define eq(x, y) to_bool((x) == (y)) #endif -#if VEC_SIZE == 16 +#if defined(__AVX512F__) && VEC_SIZE > FLOAT_SIZE +# if FLOAT_SIZE == 4 +# define fmaddsub(x, y, z) BR(vfmaddsubps, _mask, x, y, z, ~0) +# elif FLOAT_SIZE == 8 +# define fmaddsub(x, y, z) BR(vfmaddsubpd, _mask, x, y, z, ~0) +# endif +#elif VEC_SIZE == 16 # if FLOAT_SIZE == 4 # define addsub(x, y) __builtin_ia32_addsubps(x, y) # if defined(__FMA4__) || defined(__FMA__) @@ -50,6 +55,10 @@ ENTRY(fma_test); # endif #endif +#if defined(fmaddsub) && !defined(addsub) +# define addsub(x, y) fmaddsub(x, broadcast(1), y) +#endif + int fma_test(void) { unsigned int i; --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -21,6 +21,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-opmask.h" #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" +#include "avx512f.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -248,6 +249,14 @@ static const struct { SIMD(OPMASK/b, avx512dq_opmask, 1), SIMD(OPMASK/d, avx512bw_opmask, 4), SIMD(OPMASK/q, avx512bw_opmask, 8), + SIMD(AVX512F f32 scalar, avx512f, f4), + SIMD(AVX512F f32x16, avx512f, 64f4), + SIMD(AVX512F f64 scalar, avx512f, f8), + SIMD(AVX512F f64x8, avx512f, 64f8), + SIMD(AVX512F s32x16, avx512f, 64i4), + SIMD(AVX512F u32x16, avx512f, 64u4), + SIMD(AVX512F s64x8, avx512f, 64i8), + SIMD(AVX512F u64x8, avx512f, 64u8), #undef SIMD_ #undef SIMD }; From patchwork Fri Mar 15 10:39:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854455 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E85841575 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CD5F72A933 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C0C862A937; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 06AAF2A933 for ; Fri, 15 Mar 2019 10:40:50 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEz-0003K1-1T; Fri, 15 Mar 2019 10:39:05 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kEy-0003Ji-0l for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:39:04 +0000 X-Inumbo-ID: 8c55f094-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 8c55f094-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:39:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:39:01 -0600 Message-Id: <5C8B80C5020000780021F122@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:39:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 05/50] x86emul: support AVX512{F, BW, DQ} integer broadcast insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the pbroadcastw table entry in evex-disp8.c is slightly different from what one would expect, due to it requiring EVEX.W to be zero. Signed-off-by: Jan Beulich --- v7: Use dummy output in invoke_stub(). Re-base. 
v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -164,6 +164,9 @@ static const struct test avx512f_all[] = INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), INSN(pandn, 66, 0f, df, vl, dq, vl), +// pbroadcast, 66, 0f38, 7c, dq64 + INSN(pbroadcastd, 66, 0f38, 58, el, d, el), + INSN(pbroadcastq, 66, 0f38, 59, el, q, el), INSN(pcmp, 66, 0f3a, 1f, vl, dq, vl), INSN(pcmpeqd, 66, 0f, 76, vl, d, vl), INSN(pcmpeqq, 66, 0f38, 29, vl, q, vl), @@ -222,6 +225,7 @@ static const struct test avx512f_128[] = static const struct test avx512f_no128[] = { INSN(broadcastf32x4, 66, 0f38, 1a, el_4, d, vl), + INSN(broadcasti32x4, 66, 0f38, 5a, el_4, d, vl), INSN(broadcastsd, 66, 0f38, 19, el, q, el), INSN(extractf32x4, 66, 0f3a, 19, el_4, d, vl), INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), @@ -231,6 +235,7 @@ static const struct test avx512f_no128[] static const struct test avx512f_512[] = { INSN(broadcastf64x4, 66, 0f38, 1b, el_4, q, vl), + INSN(broadcasti64x4, 66, 0f38, 5b, el_4, q, vl), INSN(extractf64x4, 66, 0f3a, 1b, el_4, q, vl), INSN(extracti64x4, 66, 0f3a, 3b, el_4, q, vl), INSN(insertf64x4, 66, 0f3a, 1a, el_4, q, vl), @@ -250,6 +255,10 @@ static const struct test avx512bw_all[] INSN(paddw, 66, 0f, fd, vl, w, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), + INSN(pbroadcastb, 66, 0f38, 78, el, b, el), +// pbroadcastb, 66, 0f38, 7a, b + INSN(pbroadcastw, 66, 0f38, 79, el_2, b, vl), +// pbroadcastw, 66, 0f38, 7b, b INSN(pcmp, 66, 0f3a, 3f, vl, bw, vl), INSN(pcmpeqb, 66, 0f, 74, vl, b, vl), INSN(pcmpeqw, 66, 0f, 75, vl, w, vl), @@ -301,6 +310,7 @@ static const struct test avx512bw_128[] static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), + INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN_PFP(or, 0f, 56), INSN(pmullq, 66, 0f38, 40, vl, q, vl), INSN_PFP(xor, 0f, 57), @@ -314,6 +324,7 @@ static const struct test avx512dq_128[] static const struct test avx512dq_no128[] = { INSN(broadcastf32x2, 66, 0f38, 19, el_2, d, vl), INSN(broadcastf64x2, 66, 0f38, 1a, el_2, q, vl), + INSN(broadcasti64x2, 66, 0f38, 5a, el_2, q, vl), INSN(extractf64x2, 66, 0f3a, 19, el_2, q, vl), INSN(extracti64x2, 66, 0f3a, 39, el_2, q, vl), INSN(insertf64x2, 66, 0f3a, 18, el_2, q, vl), @@ -322,6 +333,7 @@ static const struct test avx512dq_no128[ static const struct test avx512dq_512[] = { INSN(broadcastf32x8, 66, 0f38, 1b, el_8, d, vl), + INSN(broadcasti32x8, 66, 0f38, 5b, el_8, d, vl), INSN(extractf32x8, 66, 0f3a, 1b, el_8, d, vl), INSN(extracti32x8, 66, 0f3a, 3b, el_8, d, vl), INSN(insertf32x8, 66, 0f3a, 1a, el_8, d, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -278,9 +278,33 @@ static inline bool _to_bool(byte_vec_t b #if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) # if INT_SIZE == 4 || UINT_SIZE == 4 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastd %1, %0" \ + : "=v" (t_) : "m" (*(int[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) # elif INT_SIZE == 8 || UINT_SIZE == 8 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastq %1, %0" \ + : "=v" (t_) : "m" (*(long long[1]){ x }) ); \ + t_; \ +}) +# ifdef 
__x86_64__ +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastq %1, %0" : "=v" (t_) : "r" ((x) + 0ULL) ); \ + t_; \ +}) +# endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif # if INT_SIZE == 4 @@ -977,10 +1001,14 @@ int simd_test(void) if ( !eq(swap2(src), inv) ) return __LINE__; #endif -#if defined(broadcast) +#ifdef broadcast if ( !eq(broadcast(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif +#ifdef broadcast2 + if ( !eq(broadcast2(ELEM_COUNT + 1), src + inv) ) return __LINE__; +#endif + #if defined(interleave_lo) && defined(interleave_hi) touch(src); x = interleave_lo(inv, src); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -454,9 +454,13 @@ static const struct ext0f38_table { [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x58 ... 0x59] = { .simd_size = simd_other, .two_op = 1 }, - [0x5a] = { .simd_size = simd_128, .two_op = 1 }, - [0x78 ... 0x79] = { .simd_size = simd_other, .two_op = 1 }, + [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, + [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, + [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, + [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x78] = { .simd_size = simd_other, .two_op = 1 }, + [0x79] = { .simd_size = simd_other, .two_op = 1, .d8s = 1 }, + [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x8c] = { .simd_size = simd_packed_int }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, @@ -2636,6 +2640,11 @@ x86_decode_0f38( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); break; + case X86EMUL_OPC_EVEX_66(0, 0x7a): /* vpbroadcastb */ + case X86EMUL_OPC_EVEX_66(0, 0x7b): /* vpbroadcastw */ + case X86EMUL_OPC_EVEX_66(0, 0x7c): /* vpbroadcast{d,q} */ + break; + case 0xf0: /* movbe / crc32 */ state->desc |= repne_prefix() ? 
ByteOp : Mov; if ( rep_prefix() ) @@ -8233,6 +8242,8 @@ x86_emulate( goto avx512f_no_sae; case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ + op_bytes = elem_bytes; generate_exception_if(evex.w || evex.brs, EXC_UD); avx512_broadcast: /* @@ -8252,17 +8263,27 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ /* vbroadcastf64x4 m256,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ + /* vbroadcasti64x4 m256,zmm{k} */ generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ - generate_exception_if(!evex.lr || evex.brs, EXC_UD); + generate_exception_if(!evex.lr, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ + /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ + if ( b == 0x59 ) + op_bytes = 8; + generate_exception_if(evex.brs, EXC_UD); if ( !evex.w ) host_and_vcpu_must_have(avx512dq); goto avx512_broadcast; case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ /* vbroadcastf64x2 m128,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ + /* vbroadcasti64x2 m128,{y,z}mm{k} */ generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, EXC_UD); if ( evex.w ) @@ -8456,6 +8477,45 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.w || evex.brs, EXC_UD); + op_bytes = elem_bytes = 1 << (b & 1); + /* See the comment at the avx512_broadcast label. */ + op_mask |= !(b & 1 ? !(uint32_t)op_mask : !op_mask); + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7a): /* vpbroadcastb r32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7b): /* vpbroadcastw r32,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7c): /* vpbroadcast{d,q} reg,[xyz]mm{k} */ + generate_exception_if((ea.type != OP_REG || evex.brs || + evex.reg != 0xf || !evex.RX), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + get_fpu(X86EMUL_FPU_zmm); + + opc = init_evex(stub); + opc[0] = b; + /* Convert GPR source to %rAX. 
*/ + evex.b = 1; + if ( !mode_64bit() ) + evex.w = 0; + opc[1] = modrm & 0xf8; + insn_bytes = EVEX_PFX_BYTES + 2; + opc[2] = 0xc3; + + copy_EVEX(opc, evex); + invoke_stub("", "", "=g" (dummy) : "a" (src.val)); + + put_stub(stub); + ASSERT(!state->simd_size); + break; + case X86EMUL_OPC_VEX_66(0x0f38, 0x8c): /* vpmaskmov{d,q} mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x8e): /* vpmaskmov{d,q} {x,y}mm,{x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:39:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854457 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83FAA1575 for ; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 67E0C2A934 for ; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 57E0B2A933; Fri, 15 Mar 2019 10:41:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E2CCF2A933 for ; Fri, 15 Mar 2019 10:41:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kFP-0003PU-CK; Fri, 15 Mar 2019 10:39:31 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kFN-0003PA-Sh for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:39:29 +0000 X-Inumbo-ID: 9ad0a48c-470e-11e9-bd06-fb9f09d34ca8 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 9ad0a48c-470e-11e9-bd06-fb9f09d34ca8; Fri, 15 Mar 2019 10:39:27 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:39:26 -0600 Message-Id: <5C8B80DF020000780021F125@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:39:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 06/50] x86emul: basic AVX512VL testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test the 128- and 256-bit variants of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Don't enable AVX512VL for scalar tests, nor for S/G ones with index wider than data. 
Re-base over changes earlier in the series. v4: Move OVR() additions into __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -63,7 +63,7 @@ avx2-sg-flts := 4 8 xop-vecs := $(avx-vecs) xop-ints := 1 2 4 8 xop-flts := $(avx-flts) -avx512f-vecs := 64 +avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 --- a/tools/tests/x86_emulator/simd-fma.c +++ b/tools/tests/x86_emulator/simd-fma.c @@ -5,13 +5,13 @@ ENTRY(fma_test); #if VEC_SIZE < 16 && !defined(to_bool) # define to_bool(cmp) (!~(cmp)[0]) -#elif VEC_SIZE == 16 +#elif VEC_SIZE == 16 && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define to_bool(cmp) __builtin_ia32_vtestcps(cmp, (vec_t){} == 0) # elif FLOAT_SIZE == 8 # define to_bool(cmp) __builtin_ia32_vtestcpd(cmp, (vec_t){} == 0) # endif -#elif VEC_SIZE == 32 +#elif VEC_SIZE == 32 && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define to_bool(cmp) __builtin_ia32_vtestcps256(cmp, (vec_t){} == 0) # elif FLOAT_SIZE == 8 --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -539,7 +539,7 @@ static inline bool _to_bool(byte_vec_t b # define rotr(x, n) ((vec_t)__builtin_ia32_palignr128((vdi_t)(x), (vdi_t)(x), (n) * 64)) # endif #endif -#if VEC_SIZE == 16 && defined(__SSE4_1__) +#if VEC_SIZE == 16 && defined(__SSE4_1__) && !defined(__AVX512VL__) # if INT_SIZE == 1 # define max(x, y) ((vec_t)__builtin_ia32_pmaxsb128((vqi_t)(x), (vqi_t)(y))) # define min(x, y) ((vec_t)__builtin_ia32_pminsb128((vqi_t)(x), (vqi_t)(y))) @@ -593,7 +593,7 @@ static inline bool _to_bool(byte_vec_t b # define mix(x, y) __builtin_ia32_blendpd(x, y, 0b10) # endif #endif -#if VEC_SIZE == 32 && defined(__AVX__) +#if VEC_SIZE == 32 && defined(__AVX__) && !defined(__AVX512VL__) # if FLOAT_SIZE == 4 # define dot_product(x, y) ({ \ vec_t t_ = __builtin_ia32_dpps256(x, y, 0b11110001); \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -92,6 +92,15 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ +# if VEC_SIZE > ELEM_SIZE && (defined(VEC_MAX) ? 
VEC_MAX : VEC_SIZE) < 64 +# pragma GCC target ( "avx512vl" ) +# endif + +# define REN(insn, old, new) \ + asm ( ".macro v" #insn #old " o:vararg \n\t" \ + "v" #insn #new " \\o \n\t" \ + ".endm" ) + /* * The original plan was to effect use of EVEX encodings for scalar as well as * 128- and 256-bit insn variants by restricting the compiler to use (on 64-bit @@ -135,25 +144,88 @@ asm ( ".macro override insn \n\t" # define OVR_FP(n) OVR_VFP(n); OVR_SFP(n) # define OVR_INT(n) OVR_BW(n); OVR_DQ(n) +OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); OVR_FP(add); +OVR_INT(add); OVR_FP(div); OVR(extractps); OVR_FMA(fmadd, FP); +OVR_FMA(fmaddsub, VFP); OVR_FMA(fmsub, FP); +OVR_FMA(fmsubadd, VFP); OVR_FMA(fnmadd, FP); OVR_FMA(fnmsub, FP); OVR(insertps); OVR_FP(max); +OVR_INT(maxs); +OVR_INT(maxu); OVR_FP(min); +OVR_INT(mins); +OVR_INT(minu); OVR(movd); OVR(movq); OVR_SFP(mov); +OVR_VFP(mova); +OVR_VFP(movnt); +OVR_VFP(movu); OVR_FP(mul); +OVR_VFP(shuf); +OVR_INT(sll); +OVR_DQ(sllv); OVR_FP(sqrt); +OVR_INT(sra); +OVR_DQ(srav); +OVR_INT(srl); +OVR_DQ(srlv); OVR_FP(sub); +OVR_INT(sub); OVR_SFP(ucomi); +OVR_VFP(unpckh); +OVR_VFP(unpckl); + +# ifdef __AVX512VL__ +# if ELEM_SIZE == 8 && defined(__AVX512DQ__) +REN(extract, f128, f64x2); +REN(extract, i128, i64x2); +REN(insert, f128, f64x2); +REN(insert, i128, i64x2); +# else +REN(extract, f128, f32x4); +REN(extract, i128, i32x4); +REN(insert, f128, f32x4); +REN(insert, i128, i32x4); +# endif +# if ELEM_SIZE == 8 +REN(movdqa, , 64); +REN(movdqu, , 64); +REN(pand, , q); +REN(pandn, , q); +REN(por, , q); +REN(pxor, , q); +# else +# if ELEM_SIZE == 1 && defined(__AVX512BW__) +REN(movdq, a, u8); +REN(movdqu, , 8); +# elif ELEM_SIZE == 2 && defined(__AVX512BW__) +REN(movdq, a, u16); +REN(movdqu, , 16); +# else +REN(movdqa, , 32); +REN(movdqu, , 32); +# endif +REN(pand, , d); +REN(pandn, , d); +REN(por, , d); +REN(pxor, , d); +# endif +OVR(movntdq); +OVR(movntdqa); +OVR(pmulld); +OVR(pmuldq); +OVR(pmuludq); +# endif # undef OVR_VFP # undef OVR_SFP --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -88,6 +88,11 @@ static bool simd_check_avx512f(void) } #define simd_check_avx512f_opmask simd_check_avx512f +static bool simd_check_avx512f_vl(void) +{ + return cpu_has_avx512f && cpu_has_avx512vl; +} + static bool simd_check_avx512dq(void) { return cpu_has_avx512dq; @@ -142,11 +147,21 @@ static const struct { .check_cpu = simd_check_ ## feat, \ .set_regs = simd_set_regs, \ .check_regs = simd_check_regs } +#define AVX512VL_(bits, desc, feat, form) \ + { .code = feat ## _x86_ ## bits ## _D ## _ ## form, \ + .size = sizeof(feat ## _x86_ ## bits ## _D ## _ ## form), \ + .bitness = bits, .name = "AVX512" #desc, \ + .check_cpu = simd_check_ ## feat ## _vl, \ + .set_regs = simd_set_regs, \ + .check_regs = simd_check_regs } #ifdef __x86_64__ # define SIMD(desc, feat, form) SIMD_(64, desc, feat, form), \ SIMD_(32, desc, feat, form) +# define AVX512VL(desc, feat, form) AVX512VL_(64, desc, feat, form), \ + AVX512VL_(32, desc, feat, form) #else # define SIMD(desc, feat, form) SIMD_(32, desc, feat, form) +# define AVX512VL(desc, feat, form) AVX512VL_(32, desc, feat, form) #endif SIMD(3DNow! 
single, _3dnow, 8f4), SIMD(SSE scalar single, sse, f4), @@ -257,6 +272,20 @@ static const struct { SIMD(AVX512F u32x16, avx512f, 64u4), SIMD(AVX512F s64x8, avx512f, 64i8), SIMD(AVX512F u64x8, avx512f, 64u8), + AVX512VL(VL f32x4, avx512f, 16f4), + AVX512VL(VL f64x2, avx512f, 16f8), + AVX512VL(VL f32x8, avx512f, 32f4), + AVX512VL(VL f64x4, avx512f, 32f8), + AVX512VL(VL s32x4, avx512f, 16i4), + AVX512VL(VL u32x4, avx512f, 16u4), + AVX512VL(VL s32x8, avx512f, 32i4), + AVX512VL(VL u32x8, avx512f, 32u4), + AVX512VL(VL s64x2, avx512f, 16i8), + AVX512VL(VL u64x2, avx512f, 16u8), + AVX512VL(VL s64x4, avx512f, 32i8), + AVX512VL(VL u64x4, avx512f, 32u8), +#undef AVX512VL_ +#undef AVX512VL #undef SIMD_ #undef SIMD }; From patchwork Fri Mar 15 10:40:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854459 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7AF61575 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9F3022A933 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9371D2A936; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0BED72A933 for ; Fri, 15 Mar 2019 10:41:51 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kG6-00044P-OZ; Fri, 15 Mar 2019 10:40:14 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kG4-000448-Vg for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:40:13 +0000 X-Inumbo-ID: b542a0ff-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id b542a0ff-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:40:11 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:40:10 -0600 Message-Id: <5C8B810A020000780021F128@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:40:10 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 07/50] x86emul: support AVX512{F, BW} zero- and sign-extending moves X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using 
ClamSMTP Note that the testing in simd.c doesn't really follow the ISA extension pattern - to fit the scheme, extensions from byte and word granular vectors can (currently) sensibly only happen in the AVX512BW case (and hence respective abstraction macros will be added there rather than here). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Raise #UD when EVEX.b is set. Re-base. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -177,6 +177,16 @@ static const struct test avx512f_all[] = INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), INSN(pminu, 66, 0f38, 3b, vl, dq, vl), + INSN(pmovsxbd, 66, 0f38, 21, vl_4, b, vl), + INSN(pmovsxbq, 66, 0f38, 22, vl_8, b, vl), + INSN(pmovsxwd, 66, 0f38, 23, vl_2, w, vl), + INSN(pmovsxwq, 66, 0f38, 24, vl_4, w, vl), + INSN(pmovsxdq, 66, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovzxbd, 66, 0f38, 31, vl_4, b, vl), + INSN(pmovzxbq, 66, 0f38, 32, vl_8, b, vl), + INSN(pmovzxwd, 66, 0f38, 33, vl_2, w, vl), + INSN(pmovzxwq, 66, 0f38, 34, vl_4, w, vl), + INSN(pmovzxdq, 66, 0f38, 35, vl_2, d_nb, vl), INSN(pmuldq, 66, 0f38, 28, vl, q, vl), INSN(pmulld, 66, 0f38, 40, vl, d, vl), INSN(pmuludq, 66, 0f, f4, vl, q, vl), @@ -274,6 +284,8 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), + INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), + INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -443,13 +443,23 @@ static const struct ext0f38_table { [0x1a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x1b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1c ... 0x1e] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0x20 ... 0x25] = { .simd_size = simd_other, .two_op = 1 }, + [0x20] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x21] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x22] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, + [0x23] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x24] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x25] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x2b] = { .simd_size = simd_packed_int }, [0x2c ... 0x2d] = { .simd_size = simd_packed_fp }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, - [0x30 ... 0x35] = { .simd_size = simd_other, .two_op = 1 }, + [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x31] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x32] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, + [0x33] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x34] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, + [0x35] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x36 ... 
0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, @@ -8349,6 +8359,25 @@ x86_emulate( op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); goto simd_0f_int; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x30): /* vpmovzxbw {x,y}mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x23): /* vpmovsxwd {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x25): /* vpmovsxdq {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x33): /* vpmovzxwd {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x35): /* vpmovzxdq {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.brs || (evex.w && (b & 7) == 5), EXC_UD); + op_bytes = 32 >> (pmov_convert_delta[b & 7] + 1 - evex.lr); + elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; + goto avx512f_no_sae; + case X86EMUL_OPC_66(0x0f38, 0x2a): /* movntdqa m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2a): /* vmovntdqa mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -311,10 +311,12 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) # define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovsxdq, _mask, x, (vdi_t)undef(), ~0)) # elif UINT_SIZE == 4 # define max(x, y) ((vec_t)B(pmaxud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminud, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxdq, _mask, (vsi_half_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 8 # define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -222,6 +222,16 @@ REN(pxor, , d); # endif OVR(movntdq); OVR(movntdqa); +OVR(pmovsxbd); +OVR(pmovsxbq); +OVR(pmovsxdq); +OVR(pmovsxwd); +OVR(pmovsxwq); +OVR(pmovzxbd); +OVR(pmovzxbq); +OVR(pmovzxdq); +OVR(pmovzxwd); +OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); From patchwork Fri Mar 15 10:40:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854461 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2005A1575 for ; Fri, 15 Mar 2019 10:42:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 02F4B2A933 for 
; Fri, 15 Mar 2019 10:42:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EAFA32A936; Fri, 15 Mar 2019 10:42:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 26BCD2A933 for ; Fri, 15 Mar 2019 10:42:15 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGX-0004B1-8g; Fri, 15 Mar 2019 10:40:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGV-0004Ad-RD for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:40:39 +0000 X-Inumbo-ID: c4517480-470e-11e9-abf2-57ce10c0dfab Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id c4517480-470e-11e9-abf2-57ce10c0dfab; Fri, 15 Mar 2019 10:40:36 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:40:35 -0600 Message-Id: <5C8B8122020000780021F161@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:40:34 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 08/50] x86emul: support AVX512{F, BW} down conversion moves X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the vpmov{,s,us}{d,q}w table entries in evex-disp8.c are slightly different from what one would expect, due to them requiring EVEX.W to be zero. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Adjustment for XSA-289: Use XOR instead of ADD when fiddling with b as an array index. v7: ea.type == OP_* -> ea.type != OP_*. Re-base over change in previous patch. Re-base. v5: Also adjust x86_insn_is_mem_write(). v4: Also #UD when evex.z is set with a memory operand. v3: New. 
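As an aside, purely illustrative and not part of the patch: the vl_2/vl_4/vl_8 designations used in the evex-disp8.c entries below reflect that the memory operand of these down-conversion moves covers only a half, quarter, or eighth of the full vector length, and the EVEX Disp8 compressed displacement is scaled by that (smaller) operand size rather than by the full vector width. A minimal stand-alone C sketch of the scaling arithmetic follows; disp8_scale() is a made-up helper name and not emulator code:

#include <stdio.h>

/* Size in bytes of the memory operand, i.e. the Disp8 scale factor N. */
static unsigned int disp8_scale(unsigned int evex_lr, /* 0: 128, 1: 256, 2: 512 bit */
                                unsigned int shrink)  /* 1: /2, 2: /4, 3: /8 */
{
    return (16u << evex_lr) >> shrink;
}

int main(void)
{
    /* vpmovqb with a 512-bit source writes 64/8 = 8 bytes of memory ... */
    unsigned int n = disp8_scale(2, 3);

    /* ... so an encoded disp8 of 0x10 addresses byte offset 16 * 8 = 128. */
    printf("N=%u, disp8 0x10 -> offset %u\n", n, 0x10 * n);

    return 0;
}
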
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -177,11 +177,26 @@ static const struct test avx512f_all[] = INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), INSN(pminu, 66, 0f38, 3b, vl, dq, vl), + INSN(pmovdb, f3, 0f38, 31, vl_4, b, vl), + INSN(pmovdw, f3, 0f38, 33, vl_2, b, vl), + INSN(pmovqb, f3, 0f38, 32, vl_8, b, vl), + INSN(pmovqd, f3, 0f38, 35, vl_2, d_nb, vl), + INSN(pmovqw, f3, 0f38, 34, vl_4, b, vl), + INSN(pmovsdb, f3, 0f38, 21, vl_4, b, vl), + INSN(pmovsdw, f3, 0f38, 23, vl_2, b, vl), + INSN(pmovsqb, f3, 0f38, 22, vl_8, b, vl), + INSN(pmovsqd, f3, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovsqw, f3, 0f38, 24, vl_4, b, vl), INSN(pmovsxbd, 66, 0f38, 21, vl_4, b, vl), INSN(pmovsxbq, 66, 0f38, 22, vl_8, b, vl), INSN(pmovsxwd, 66, 0f38, 23, vl_2, w, vl), INSN(pmovsxwq, 66, 0f38, 24, vl_4, w, vl), INSN(pmovsxdq, 66, 0f38, 25, vl_2, d_nb, vl), + INSN(pmovusdb, f3, 0f38, 11, vl_4, b, vl), + INSN(pmovusdw, f3, 0f38, 13, vl_2, b, vl), + INSN(pmovusqb, f3, 0f38, 12, vl_8, b, vl), + INSN(pmovusqd, f3, 0f38, 15, vl_2, d_nb, vl), + INSN(pmovusqw, f3, 0f38, 14, vl_4, b, vl), INSN(pmovzxbd, 66, 0f38, 31, vl_4, b, vl), INSN(pmovzxbq, 66, 0f38, 32, vl_8, b, vl), INSN(pmovzxwd, 66, 0f38, 33, vl_2, w, vl), @@ -284,7 +299,10 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), + INSN(pmovswb, f3, 0f38, 20, vl_2, b, vl), INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), + INSN(pmovuswb, f3, 0f38, 10, vl_2, b, vl), + INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -277,6 +277,17 @@ static inline bool _to_bool(byte_vec_t b #endif #if (INT_SIZE == 4 || UINT_SIZE == 4 || INT_SIZE == 8 || UINT_SIZE == 8) && \ defined(__AVX512F__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if ELEM_COUNT == 8 /* vextracti{32,64}x4 */ || \ + (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextracti32x8 */ || \ + (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextracti64x2 */ +# define low_half(x) ({ \ + half_t t_; \ + asm ( "vextracti%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + t_; \ +}) +# endif # if INT_SIZE == 4 || UINT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -291,6 +302,7 @@ static inline bool _to_bool(byte_vec_t b }) # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) # elif INT_SIZE == 8 || UINT_SIZE == 8 # define broadcast(x) ({ \ vec_t t_; \ @@ -720,6 +732,27 @@ static inline bool _to_bool(byte_vec_t b # endif #endif +#if VEC_SIZE >= 16 + +# if !defined(low_half) && defined(HALF_SIZE) +static inline half_t low_half(vec_t x) +{ +# if HALF_SIZE < VEC_SIZE + half_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 2; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + +#endif + #if defined(__AVX512F__) && defined(FLOAT_SIZE) # include "simd-fma.c" #endif @@ -1087,6 +1120,21 @@ int simd_test(void) #endif +#if defined(widen1) && defined(shrink1) + { + half_t aux1 = low_half(src), aux2; + + touch(aux1); + x = widen1(aux1); + touch(x); + aux2 = shrink1(x); + touch(aux2); + 
for ( i = 0; i < ELEM_COUNT / 2; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } +#endif + #ifdef dup_lo touch(src); x = dup_lo(src); --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,6 +70,23 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if VEC_SIZE >= 16 + +# if ELEM_COUNT >= 2 +# if VEC_SIZE > 32 +# define HALF_SIZE (VEC_SIZE / 2) +# else +# define HALF_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(HALF_SIZE))) half_t; +typedef char __attribute__((vector_size(HALF_SIZE))) vqi_half_t; +typedef short __attribute__((vector_size(HALF_SIZE))) vhi_half_t; +typedef int __attribute__((vector_size(HALF_SIZE))) vsi_half_t; +typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; +# endif + +#endif + #if VEC_SIZE == 16 # define B(n, s, a...) __builtin_ia32_ ## n ## 128 ## s(a) # define B_(n, s, a...) __builtin_ia32_ ## n ## s(a) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -3068,7 +3068,22 @@ x86_decode( d |= vSIB; state->simd_size = ext0f38_table[b].simd_size; if ( evex_encoded() ) - disp8scale = decode_disp8scale(ext0f38_table[b].d8s, state); + { + /* + * VPMOVUS* are identical to VPMOVS* Disp8-scaling-wise, but + * their attributes don't match those of the vex_66 encoded + * insns with the same base opcodes. Rather than adding new + * columns to the table, handle this here for now. + */ + if ( evex.pfx != vex_f3 || (b & 0xf8) != 0x10 ) + disp8scale = decode_disp8scale(ext0f38_table[b].d8s, state); + else + { + disp8scale = decode_disp8scale(ext0f38_table[b ^ 0x30].d8s, + state); + state->simd_size = simd_other; + } + } break; case ext_0f3a: @@ -8359,10 +8374,14 @@ x86_emulate( op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10): /* vpmovuswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20): /* vpmovswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x30): /* vpmovzxbw {x,y}mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x30): /* vpmovwb [xyz]mm,{x,y}mm/mem{k} */ host_and_vcpu_must_have(avx512bw); - /* fall through */ + if ( evex.pfx != vex_f3 ) + { case X86EMUL_OPC_EVEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x23): /* vpmovsxwd {x,y}mm/mem,[xyz]mm{k} */ @@ -8373,7 +8392,29 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x33): /* vpmovzxwd {x,y}mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x35): /* vpmovzxdq {x,y}mm/mem,[xyz]mm{k} */ - generate_exception_if(evex.brs || (evex.w && (b & 7) == 5), EXC_UD); + generate_exception_if(evex.w && (b & 7) == 5, EXC_UD); + } + else + { + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x11): /* vpmovusdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x12): /* vpmovusqb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x13): /* vpmovusdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x14): /* vpmovusqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x15): /* vpmovusqd [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x21): /* vpmovsdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x22): /* vpmovsqb [xyz]mm,xmm/mem{k} */ + case 
X86EMUL_OPC_EVEX_F3(0x0f38, 0x23): /* vpmovsdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x24): /* vpmovsqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x25): /* vpmovsqd [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x31): /* vpmovdb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x32): /* vpmovqb [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x33): /* vpmovdw [xyz]mm,{x,y}mm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x34): /* vpmovqw [xyz]mm,xmm/mem{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x35): /* vpmovqd [xyz]mm,{x,y}mm/mem{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.z), EXC_UD); + d = DstMem | SrcReg | TwoOp; + } + generate_exception_if(evex.brs, EXC_UD); op_bytes = 32 >> (pmov_convert_delta[b & 7] + 1 - evex.lr); elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; @@ -10212,6 +10253,12 @@ x86_insn_is_mem_write(const struct x86_e case X86EMUL_OPC(0x0f, 0xab): /* BTS */ case X86EMUL_OPC(0x0f, 0xb3): /* BTR */ case X86EMUL_OPC(0x0f, 0xbb): /* BTC */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x15): /* VPMOVUS* */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x25): /* VPMOVS* */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x30) ... + X86EMUL_OPC_EVEX_F3(0x0f38, 0x35): /* VPMOV{D,Q,W}* */ return true; case 0xd9: From patchwork Fri Mar 15 10:41:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854463 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB41E1575 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2F092A933 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C711D2A936; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 470F52A933 for ; Fri, 15 Mar 2019 10:42:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGw-0004HN-KW; Fri, 15 Mar 2019 10:41:06 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kGv-0004HD-RX for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:41:05 +0000 X-Inumbo-ID: d52b4836-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id d52b4836-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:41:04 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:41:04 -0600 Message-Id: <5C8B813F020000780021F164@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:41:03 -0600 From: "Jan Beulich" To: 
"xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 09/50] x86emul: support AVX512{F, BW} integer unpack insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP There's once again one extra twobyte_table[] entry which gets its Disp8 shift value set right away without getting support implemented just yet, again to avoid needlessly splitting groups of entries. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Re-base. v6: Re-base over changes earlier in the series. v4: Move OVR() additions into __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -229,6 +229,10 @@ static const struct test avx512f_all[] = INSN(pternlog, 66, 0f3a, 25, vl, dq, vl), INSN(ptestm, 66, 0f38, 27, vl, dq, vl), INSN(ptestnm, f3, 0f38, 27, vl, dq, vl), + INSN(punpckhdq, 66, 0f, 6a, vl, d, vl), + INSN(punpckhqdq, 66, 0f, 6d, vl, q, vl), + INSN(punpckldq, 66, 0f, 62, vl, d, vl), + INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), @@ -327,6 +331,10 @@ static const struct test avx512bw_all[] INSN(psubw, 66, 0f, f9, vl, w, vl), INSN(ptestm, 66, 0f38, 26, vl, bw, vl), INSN(ptestnm, f3, 0f38, 26, vl, bw, vl), + INSN(punpckhbw, 66, 0f, 68, vl, b, vl), + INSN(punpckhwd, 66, 0f, 69, vl, w, vl), + INSN(punpcklbw, 66, 0f, 60, vl, b, vl), + INSN(punpcklwd, 66, 0f, 61, vl, w, vl), }; static const struct test avx512bw_128[] = { --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -300,6 +300,10 @@ static inline bool _to_bool(byte_vec_t b asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ t_; \ }) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) # define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) @@ -317,6 +321,10 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif # if INT_SIZE == 4 --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -252,6 +252,10 @@ OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); +OVR(punpckhdq); +OVR(punpckhqdq); +OVR(punpckldq); +OVR(punpcklqdq); # endif # undef OVR_VFP --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -312,10 +312,10 @@ static const struct twobyte_table { [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x5a ... 
0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, - [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other }, + [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, - [0x68 ... 0x6a] = { DstImplicit|SrcMem|ModRM, simd_other }, - [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int }, + [0x68 ... 0x6a] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, + [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x6e] = { DstImplicit|SrcMem|ModRM|Mov, simd_none, d8s_dq64 }, [0x6f] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_int, d8s_vl }, [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other }, @@ -6681,6 +6681,12 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf6): /* vpsadbw [xyz]mm/mem,[xyz]mm,[xyz]mm */ generate_exception_if(evex.opmsk, EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x60): /* vpunpcklbw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x61): /* vpunpcklwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x68): /* vpunpckhbw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x69): /* vpunpckhwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + op_bytes = 16 << evex.lr; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd1): /* vpsrlw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ @@ -6708,6 +6714,13 @@ x86_emulate( elem_bytes = 1 << (b & 1); goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f, 0x62): /* vpunpckldq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x6a): /* vpunpckhdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + fault_suppression = false; + op_bytes = 16 << evex.lr; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x26): /* vptestnm{b,w} [xyz]mm/mem,[xyz]mm,k{k} */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x27): /* vptestnm{d,q} [xyz]mm/mem,[xyz]mm,k{k} */ op_bytes = 16 << evex.lr; @@ -6734,6 +6747,10 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6c): /* vpunpcklqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x6d): /* vpunpckhqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd4): /* vpaddq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf4): /* vpmuludq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x28): /* vpmuldq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ From patchwork Fri Mar 15 10:41:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854465 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 82F6215AC for ; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 66CDF2A933 for ; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 584CA2A936; Fri, 15 Mar 2019 10:43:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 76F242A933 for ; Fri, 15 Mar 2019 10:43:40 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kHf-0004Rc-W6; Fri, 15 Mar 2019 10:41:51 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kHe-0004RF-Hu for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:41:50 +0000 X-Inumbo-ID: ef3f5313-470e-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ef3f5313-470e-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:41:48 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:41:47 -0600 Message-Id: <5C8B816C020000780021F167@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:41:48 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 10/50] x86emul: support AVX512{F, BW, _VBMI} full permute insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Take the liberty and also correct the (public interface) name of the AVX512_VBMI feature flag, on the assumption that no external consumer has actually been using that flag so far. Furthermore make it have AVX512BW instead of AVX512F as a prerequisite, for requiring full 64-bit mask registers (the upper 48 bits of which can't be accessed other than through XSAVE/XRSTOR without AVX512BW support). Signed-off-by: Jan Beulich Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v5: Re-base. v3: New. 
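As an aside, purely illustrative and not part of the patch: the vpermi2* / vpermt2* insns select each destination element from the concatenation of two source vectors, with bit log2(ELEM_COUNT) of the corresponding index choosing between the two halves of that table; the interleave_lo/interleave_hi index vectors built in the simd.c hunk below rely on exactly that. A minimal scalar C model of the selection follows; NELEM and permt2() are made-up names, not emulator or test-harness code:

#include <stdint.h>
#include <stdio.h>

#define NELEM 8 /* e.g. dword elements of a 256-bit vector */

/* Each index selects one element out of the concatenation {a, b}. */
static void permt2(uint32_t dst[NELEM], const uint32_t idx[NELEM],
                   const uint32_t a[NELEM], const uint32_t b[NELEM])
{
    for ( unsigned int i = 0; i < NELEM; ++i )
    {
        unsigned int sel = idx[i] & (2 * NELEM - 1);

        dst[i] = sel < NELEM ? a[sel] : b[sel - NELEM];
    }
}

int main(void)
{
    uint32_t a[NELEM], b[NELEM], idx[NELEM], out[NELEM];

    for ( unsigned int i = 0; i < NELEM; ++i )
    {
        a[i] = i;
        b[i] = 100 + i;
        /* Index pattern matching the test harness's interleave_lo(). */
        idx[i] = ((i & 1) * NELEM) | (i >> 1);
    }

    permt2(out, idx, a, b);

    for ( unsigned int i = 0; i < NELEM; ++i )
        printf("%u ", out[i]); /* prints: 0 100 1 101 2 102 3 103 */
    printf("\n");

    return 0;
}
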
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -173,6 +173,10 @@ static const struct test avx512f_all[] = INSN(pcmpgtd, 66, 0f, 66, vl, d, vl), INSN(pcmpgtq, 66, 0f38, 37, vl, q, vl), INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), + INSN(permi2, 66, 0f38, 76, vl, dq, vl), + INSN(permi2, 66, 0f38, 77, vl, sd, vl), + INSN(permt2, 66, 0f38, 7e, vl, dq, vl), + INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -294,6 +298,8 @@ static const struct test avx512bw_all[] INSN(pcmpgtb, 66, 0f, 64, vl, b, vl), INSN(pcmpgtw, 66, 0f, 65, vl, w, vl), INSN(pcmpu, 66, 0f3a, 3e, vl, bw, vl), + INSN(permi2w, 66, 0f38, 75, vl, w, vl), + INSN(permt2w, 66, 0f38, 7d, vl, w, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), INSN(pmaxsb, 66, 0f38, 3c, vl, b, vl), INSN(pmaxsw, 66, 0f, ee, vl, w, vl), @@ -378,6 +384,11 @@ static const struct test avx512dq_512[] INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; +static const struct test avx512_vbmi_all[] = { + INSN(permi2b, 66, 0f38, 75, vl, b, vl), + INSN(permt2b, 66, 0f38, 7d, vl, b, vl), +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -718,4 +729,5 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); + RUN(avx512_vbmi, all); } --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -150,6 +150,9 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# else +# define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) +# define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -175,6 +178,9 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) # define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# else +# define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) +# define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -303,6 +309,9 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -324,6 +333,9 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2varq, _mask, (vdi_t)(x), interleave_hi, (vdi_t)(y), 
~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varq, _mask, interleave_lo, (vdi_t)(x), (vdi_t)(y), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif @@ -769,6 +781,7 @@ int simd_test(void) { unsigned int i, j; vec_t x, y, z, src, inv, alt, sh; + vint_t interleave_lo, interleave_hi; for ( i = 0, j = ELEM_SIZE << 3; i < ELEM_COUNT; ++i ) { @@ -782,6 +795,9 @@ int simd_test(void) if ( !(i & (i + 1)) ) --j; sh[i] = j; + + interleave_lo[i] = ((i & 1) * ELEM_COUNT) | (i >> 1); + interleave_hi[i] = interleave_lo[i] + (ELEM_COUNT / 2); } touch(src); @@ -1075,7 +1091,7 @@ int simd_test(void) x = src * alt; y = interleave_lo(x, alt < 0); touch(x); - z = widen1(x); + z = widen1(low_half(x)); touch(x); if ( !eq(z, y) ) return __LINE__; @@ -1107,7 +1123,7 @@ int simd_test(void) # ifdef widen1 touch(src); - x = widen1(src); + x = widen1(low_half(src)); touch(src); if ( !eq(x, y) ) return __LINE__; # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -70,6 +70,16 @@ typedef int __attribute__((vector_size(V typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; #endif +#if ELEM_SIZE == 1 +typedef vqi_t vint_t; +#elif ELEM_SIZE == 2 +typedef vhi_t vint_t; +#elif ELEM_SIZE == 4 +typedef vsi_t vint_t; +#elif ELEM_SIZE == 8 +typedef vdi_t vint_t; +#endif + #if VEC_SIZE >= 16 # if ELEM_COUNT >= 2 --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -136,6 +136,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) +#define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -468,9 +468,13 @@ static const struct ext0f38_table { [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, [0x79] = { .simd_size = simd_other, .two_op = 1, .d8s = 1 }, [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, + [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x8c] = { .simd_size = simd_packed_int }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 
0x93] = { .simd_size = simd_other, .vsib = 1 }, @@ -1861,6 +1865,7 @@ static bool vcpu_has( #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) +#define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -6043,6 +6048,11 @@ x86_emulate( CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x15): /* vunpckhp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x76): /* vpermi2{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x77): /* vpermi2p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7e): /* vpermt2{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7f): /* vpermt2p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xdb): /* vpand{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -8564,6 +8574,16 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + if ( !evex.w ) + host_and_vcpu_must_have(avx512_vbmi); + else + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -107,6 +107,7 @@ #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) /* CPUID level 0x00000007:0.ecx */ +#define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -224,7 +224,7 @@ XEN_CPUFEATURE(AVX512VL, 5*32+31) / /* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */ XEN_CPUFEATURE(PREFETCHWT1, 6*32+ 0) /*A PREFETCHWT1 instruction */ -XEN_CPUFEATURE(AVX512VBMI, 6*32+ 1) /*A AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_VBMI, 6*32+ 1) /*A AVX-512 Vector Byte Manipulation Instrs */ XEN_CPUFEATURE(UMIP, 6*32+ 2) /*S User Mode Instruction Prevention */ XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -259,12 +259,17 @@ def crunch_numbers(state): AVX2: [AVX512F], # AVX512F is taken to mean hardware support for 512bit registers - # (which in practice depends on the EVEX prefix to encode), and the - # instructions themselves. All further AVX512 features are built on - # top of AVX512F + # (which in practice depends on the EVEX prefix to encode) as well + # as mask registers, and the instructions themselves. 
All further + # AVX512 features are built on top of AVX512F AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD, - AVX512BW, AVX512VL, AVX512VBMI, AVX512_4VNNIW, - AVX512_4FMAPS, AVX512_VPOPCNTDQ], + AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, + AVX512_VPOPCNTDQ], + + # AVX512 extensions acting solely on vectors of bytes/words are made + # dependents of AVX512BW (as to requiring wider than 16-bit mask + # registers), despite the SDM not formally making this connection. + AVX512BW: [AVX512_VBMI], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 10:43:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854467 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25C7A15AC for ; Fri, 15 Mar 2019 10:44:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 06DF72A842 for ; Fri, 15 Mar 2019 10:44:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EB1702A86D; Fri, 15 Mar 2019 10:44:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 31F9B2A842 for ; Fri, 15 Mar 2019 10:44:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kIr-0004cV-GD; Fri, 15 Mar 2019 10:43:05 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kIq-0004cM-Rn for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:43:04 +0000 X-Inumbo-ID: 1b04e870-470f-11e9-9060-ab704e06104d Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 1b04e870-470f-11e9-9060-ab704e06104d; Fri, 15 Mar 2019 10:43:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:01 -0600 Message-Id: <5C8B81B5020000780021F16A@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 11/50] x86emul: support AVX512{F, BW} integer shuffle insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also include vshuff{32x4,64x2} as being very similar to vshufi{32x4,64x2}. 
Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Re-base. v7: Disable fault suppression for VPSHUF{D,{H,L}W}. Re-base. v6: Re-base over changes earlier in the series. v5: Re-base over changes earlier in the series. v4: Move OVR() addition into __AVX512VL__ conditional. Correct comments. v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -214,6 +214,7 @@ static const struct test avx512f_all[] = INSN(prolv, 66, 0f38, 15, vl, dq, vl), INSNX(pror, 66, 0f, 72, 0, vl, dq, vl), INSN(prorv, 66, 0f38, 14, vl, dq, vl), + INSN(pshufd, 66, 0f, 70, vl, d, vl), INSN(pslld, 66, 0f, f2, el_4, d, vl), INSNX(pslld, 66, 0f, 72, 6, vl, d, vl), INSN(psllq, 66, 0f, f3, el_2, q, vl), @@ -264,6 +265,10 @@ static const struct test avx512f_no128[] INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), + INSN(shuff32x4, 66, 0f3a, 23, vl, d, vl), + INSN(shuff64x2, 66, 0f3a, 23, vl, q, vl), + INSN(shufi32x4, 66, 0f3a, 43, vl, d, vl), + INSN(shufi64x2, 66, 0f3a, 43, vl, q, vl), }; static const struct test avx512f_512[] = { @@ -318,6 +323,9 @@ static const struct test avx512bw_all[] INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), INSN(psadbw, 66, 0f, f6, vl, b, vl), + INSN(pshufb, 66, 0f38, 00, vl, b, vl), + INSN(pshufhw, f3, 0f, 70, vl, w, vl), + INSN(pshuflw, f2, 0f, 70, vl, w, vl), INSNX(pslldq, 66, 0f, 73, 7, vl, b, vl), INSN(psllvw, 66, 0f38, 12, vl, w, vl), INSN(psllw, 66, 0f, f1, el_8, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -153,6 +153,10 @@ static inline bool _to_bool(byte_vec_t b # else # define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) +# define swap(x) ({ \ + vec_t t_ = B(shuf_f32x4_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ + B(shufps, _mask, t_, t_, 0b00011011, undef(), ~0); \ +}) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -181,6 +185,10 @@ static inline bool _to_bool(byte_vec_t b # else # define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) +# define swap(x) ({ \ + vec_t t_ = B(shuf_f64x2_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ + B(shufpd, _mask, t_, t_, 0b01010101, undef(), ~0); \ +}) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -309,9 +317,14 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b00011011, (vsi_t)undef(), ~0)) # else # define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + B(shuf_i32x4_, _mask, (vsi_t)(x), (vsi_t)(x), \ + VEC_SIZE == 32 ? 
0b01 : 0b00011011, (vsi_t)undef(), ~0), \ + 0b00011011, (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -333,9 +346,14 @@ static inline bool _to_bool(byte_vec_t b # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b01001110, (vsi_t)undef(), ~0)) # else # define interleave_hi(x, y) ((vec_t)B(vpermi2varq, _mask, (vdi_t)(x), interleave_hi, (vdi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varq, _mask, interleave_lo, (vdi_t)(x), (vdi_t)(y), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + (vsi_t)B(shuf_i64x2_, _mask, (vdi_t)(x), (vdi_t)(x), \ + VEC_SIZE == 32 ? 0b01 : 0b00011011, (vdi_t)undef(), ~0), \ + 0b01001110, (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -119,6 +119,12 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ +/* Sadly there are a few exceptions to the general naming rules. */ +# define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask +# define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask +# define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask +# define __builtin_ia32_shuf_i64x2_512_mask __builtin_ia32_shuf_i64x2_mask + # if VEC_SIZE > ELEM_SIZE && (defined(VEC_MAX) ? VEC_MAX : VEC_SIZE) < 64 # pragma GCC target ( "avx512vl" ) # endif @@ -262,6 +268,7 @@ OVR(pmovzxwq); OVR(pmulld); OVR(pmuldq); OVR(pmuludq); +OVR(pshufd); OVR(punpckhdq); OVR(punpckhqdq); OVR(punpckldq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -318,7 +318,7 @@ static const struct twobyte_table { [0x6b ... 0x6d] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x6e] = { DstImplicit|SrcMem|ModRM|Mov, simd_none, d8s_dq64 }, [0x6f] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_int, d8s_vl }, - [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other }, + [0x70] = { SrcImmByte|ModRM|TwoOp, simd_other, d8s_vl }, [0x71 ... 0x73] = { DstImplicit|SrcImmByte|ModRM, simd_none, d8s_vl }, [0x74 ... 0x76] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x77] = { DstImplicit|SrcNone }, @@ -432,7 +432,8 @@ static const struct ext0f38_table { uint8_t vsib:1; disp8scale_t d8s:4; } ext0f38_table[256] = { - [0x00 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, [0x0c ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x13] = { .simd_size = simd_other, .two_op = 1 }, @@ -543,6 +544,7 @@ static const struct ext0f3a_table { [0x20] = { .simd_size = simd_none, .d8s = 0 }, [0x21] = { .simd_size = simd_other, .d8s = 2 }, [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, + [0x23] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128, .d8s = 4 }, @@ -552,6 +554,7 @@ static const struct ext0f3a_table { [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 
0x41] = { .simd_size = simd_packed_fp }, [0x42] = { .simd_size = simd_packed_int }, + [0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x44] = { .simd_size = simd_packed_int }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -6701,6 +6704,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf5): /* vpmaddwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x00): /* vpshufb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd5): /* vpmullw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -6955,6 +6959,21 @@ x86_emulate( insn_bytes = PFX_BYTES + 3; break; + case X86EMUL_OPC_EVEX_66(0x0f, 0x70): /* vpshufd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x70): /* vpshufhw $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x70): /* vpshuflw $imm8,[xyz]mm/mem,[xyz]mm{k} */ + if ( evex.pfx == vex_66 ) + generate_exception_if(evex.w, EXC_UD); + else + { + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.brs, EXC_UD); + } + d = (d & ~SrcMask) | SrcMem | TwoOp; + op_bytes = 16 << evex.lr; + fault_suppression = false; + goto avx512f_imm8_no_sae; + CASE_SIMD_PACKED_INT(0x0f, 0x71): /* Grp12 */ case X86EMUL_OPC_VEX_66(0x0f, 0x71): CASE_SIMD_PACKED_INT(0x0f, 0x72): /* Grp13 */ @@ -9150,7 +9169,13 @@ x86_emulate( /* vextracti64x2 $imm8,{y,z}mm,xmm/m128{k} */ if ( evex.w ) host_and_vcpu_must_have(avx512dq); - generate_exception_if(!evex.lr || evex.brs, EXC_UD); + generate_exception_if(evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x23): /* vshuff32x4 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + /* vshuff64x2 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x43): /* vshufi32x4 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + /* vshufi64x2 $imm8,{y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); fault_suppression = false; goto avx512f_imm8_no_sae; From patchwork Fri Mar 15 10:43:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854469 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8CBF915AC for ; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71C7D2A842 for ; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6077B2A86D; Fri, 15 Mar 2019 10:45:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id EEB5C2A842 for ; Fri, 15 Mar 2019 10:45:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kJJ-0004gY-RJ; Fri, 15 Mar 2019 10:43:33 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJJ-0004gQ-3I for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:43:33 +0000 X-Inumbo-ID: 2d040f67-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 2d040f67-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:43:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:31 -0600 Message-Id: <5C8B81D3020000780021F16D@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:31 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 12/50] x86emul: support AVX512{BW, DQ} mask move insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Entries to the tables in evex-disp8.c are added despite these insns not allowing for memory operands, with the goal of the tables giving a complete picture of the supported EVEX-encoded insns in the end. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v3: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -314,9 +314,12 @@ static const struct test avx512bw_all[] INSN(pminsw, 66, 0f, ea, vl, w, vl), INSN(pminub, 66, 0f, da, vl, b, vl), INSN(pminuw, 66, 0f38, 3a, vl, w, vl), +// pmovb2m, f3, 0f38, 29, b +// pmovm2, f3, 0f38, 28, bw INSN(pmovswb, f3, 0f38, 20, vl_2, b, vl), INSN(pmovsxbw, 66, 0f38, 20, vl_2, b, vl), INSN(pmovuswb, f3, 0f38, 10, vl_2, b, vl), +// pmovw2m, f3, 0f38, 29, w INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), @@ -364,6 +367,9 @@ static const struct test avx512dq_all[] INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN_PFP(or, 0f, 56), +// pmovd2m, f3, 0f38, 39, d +// pmovm2, f3, 0f38, 38, dq +// pmovq2m, f3, 0f38, 39, q INSN(pmullq, 66, 0f38, 40, vl, q, vl), INSN_PFP(xor, 0f, 57), }; --- a/tools/tests/x86_emulator/opmask.S +++ b/tools/tests/x86_emulator/opmask.S @@ -12,17 +12,23 @@ #if SIZE == 1 # define _(x) x##b +# define _v(x, t) _v_(x##q, t) #elif SIZE == 2 # define _(x) x##w +# define _v(x, t) _v_(x##d, t) # define WIDEN(x) x##bw #elif SIZE == 4 # define _(x) x##d +# define _v(x, t) _v_(x##w, t) # define WIDEN(x) x##wd #elif SIZE == 8 # define _(x) x##q +# define _v(x, t) _v_(x##b, t) # define WIDEN(x) x##dq #endif +#define _v_(x, t) v##x##t + .macro check res1:req, res2:req, line:req _(kmov) %\res1, DATA(out) #if SIZE < 8 || !defined(__i386__) @@ -131,6 +137,15 @@ _start: #endif +#if SIZE > 2 ? 
defined(__AVX512BW__) : defined(__AVX512DQ__) + + _(kmov) DATA(in1), %k0 + _v(pmovm2,) %k0, %zmm7 + _v(pmov,2m) %zmm7, %k3 + check k0, k3, __LINE__ + +#endif + xor %eax, %eax ret --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -8465,6 +8465,21 @@ x86_emulate( elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x29): /* vpmov{b,w}2m [xyz]mm,k */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x39): /* vpmov{d,q}2m [xyz]mm,k */ + generate_exception_if(!evex.r || !evex.R, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x28): /* vpmovm2{b,w} k,[xyz]mm */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x38): /* vpmovm2{d,q} k,[xyz]mm */ + if ( b & 0x10 ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512bw); + generate_exception_if(evex.opmsk || ea.type != OP_REG, EXC_UD); + d |= TwoOp; + op_bytes = 16 << evex.lr; + goto avx512f_no_sae; + case X86EMUL_OPC_66(0x0f38, 0x2a): /* movntdqa m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2a): /* vmovntdqa mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:43:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854471 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 500C015AC for ; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 336FF2A934 for ; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 257492A937; Fri, 15 Mar 2019 10:45:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4411A2A934 for ; Fri, 15 Mar 2019 10:45:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJp-0004mm-6i; Fri, 15 Mar 2019 10:44:05 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kJn-0004mX-Na for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:03 +0000 X-Inumbo-ID: 3d50d2f4-470f-11e9-8a90-0b134121ba9f Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 3d50d2f4-470f-11e9-8a90-0b134121ba9f; Fri, 15 Mar 2019 10:43:59 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:43:58 -0600 Message-Id: <5C8B81ED020000780021F170@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:43:57 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 
Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 13/50] x86emul: basic AVX512BW testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Correct PS{R,L}LDQ overrides. v6: Re-base over changes earlier in the series. v4: Add __AVX512VL__ conditional around majority of OVR() additions. Correct eq() for 1- and 2-byte cases. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -66,6 +66,9 @@ xop-flts := $(avx-flts) avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 +avx512bw-vecs := $(avx512f-vecs) +avx512bw-ints := 1 2 +avx512bw-flts := avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -31,6 +31,10 @@ ENTRY(simd_test); # define eq(x, y) ((BR(cmpps, _mask, x, y, 0, -1) & ALL_TRUE) == ALL_TRUE) # elif FLOAT_SIZE == 8 # define eq(x, y) (BR(cmppd, _mask, x, y, 0, -1) == ALL_TRUE) +# elif (INT_SIZE == 1 || UINT_SIZE == 1) && defined(__AVX512BW__) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# elif (INT_SIZE == 2 || UINT_SIZE == 2) && defined(__AVX512BW__) +# define eq(x, y) (B(pcmpeqw, _mask, (vhi_t)(x), (vhi_t)(y), -1) == ALL_TRUE) # elif INT_SIZE == 4 || UINT_SIZE == 4 # define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) # elif INT_SIZE == 8 || UINT_SIZE == 8 @@ -374,6 +378,87 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) ((vec_t)B(pmaxuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminuq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # endif +#elif (INT_SIZE == 1 || UINT_SIZE == 1 || INT_SIZE == 2 || UINT_SIZE == 2) && \ + defined(__AVX512BW__) && (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if INT_SIZE == 1 || UINT_SIZE == 1 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastb %1, %0" \ + : "=v" (t_) : "m" (*(char[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastb %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufb, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) +# elif defined(__AVX512VBMI__) +# define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varqi, _mask, interleave_lo, (vqi_t)(x), (vqi_t)(y), ~0)) +# endif +# define mix(x, y) ((vec_t)B(movdquqi, _mask, (vqi_t)(x), (vqi_t)(y), \ + (0b0101010101010101010101010101010101010101010101010101010101010101LL & ALL_TRUE))) +# define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), 
(vqi_half_t){}, ~0)) +# define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) +# define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) +# elif INT_SIZE == 2 || UINT_SIZE == 2 +# define broadcast(x) ({ \ + vec_t t_; \ + asm ( "%{evex%} vpbroadcastw %1, %0" \ + : "=v" (t_) : "m" (*(short[1]){ x }) ); \ + t_; \ +}) +# define broadcast2(x) ({ \ + vec_t t_; \ + asm ( "vpbroadcastw %k1, %0" : "=v" (t_) : "r" (x) ); \ + t_; \ +}) +# if VEC_SIZE == 16 +# define interleave_hi(x, y) ((vec_t)B(punpckhwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define interleave_lo(x, y) ((vec_t)B(punpcklwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define swap(x) ((vec_t)B(pshufd, _mask, \ + (vsi_t)B(pshufhw, _mask, \ + B(pshuflw, _mask, (vhi_t)(x), 0b00011011, (vhi_t)undef(), ~0), \ + 0b00011011, (vhi_t)undef(), ~0), \ + 0b01001110, (vsi_t)undef(), ~0)) +# else +# define interleave_hi(x, y) ((vec_t)B(vpermi2varhi, _mask, (vhi_t)(x), interleave_hi, (vhi_t)(y), ~0)) +# define interleave_lo(x, y) ((vec_t)B(vpermt2varhi, _mask, interleave_lo, (vhi_t)(x), (vhi_t)(y), ~0)) +# endif +# define mix(x, y) ((vec_t)B(movdquhi, _mask, (vhi_t)(x), (vhi_t)(y), \ + (0b01010101010101010101010101010101 & ALL_TRUE))) +# define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) +# define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) +# endif +# if INT_SIZE == 1 +# define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovsxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovsxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) +# define widen3(x) ((vec_t)B(pmovsxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 1 +# define max(x, y) ((vec_t)B(pmaxub, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminub, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovzxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) +# define widen3(x) ((vec_t)B(pmovzxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) +# elif INT_SIZE == 2 +# define max(x, y) B(pmaxsw, _mask, x, y, undef(), ~0) +# define min(x, y) B(pminsw, _mask, x, y, undef(), ~0) +# define mul_hi(x, y) B(pmulhw, _mask, x, y, undef(), ~0) +# define widen1(x) ((vec_t)B(pmovsxwd, _mask, x, (vsi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovsxwq, _mask, x, (vdi_t)undef(), ~0)) +# elif UINT_SIZE == 2 +# define max(x, y) ((vec_t)B(pmaxuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define min(x, y) ((vec_t)B(pminuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define mul_hi(x, y) ((vec_t)B(pmulhuw, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define widen1(x) ((vec_t)B(pmovzxwd, _mask, (vhi_half_t)(x), (vsi_t)undef(), ~0)) +# define widen2(x) ((vec_t)B(pmovzxwq, _mask, (vhi_quarter_t)(x), (vdi_t)undef(), ~0)) +# endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if INT_SIZE == 1 || UINT_SIZE == 1 # define interleave_hi(x, y) ((vec_t)__builtin_ia32_punpckhbw128((vqi_t)(x), (vqi_t)(y))) @@ -565,7 +650,7 @@ static inline bool _to_bool(byte_vec_t b # endif # endif #endif -#if VEC_SIZE == 16 && defined(__SSSE3__) +#if VEC_SIZE == 16 && defined(__SSSE3__) && !defined(__AVX512VL__) # if 
INT_SIZE == 1 # define abs(x) ((vec_t)__builtin_ia32_pabsb128((vqi_t)(x))) # elif INT_SIZE == 2 @@ -789,6 +874,40 @@ static inline half_t low_half(vec_t x) } # endif +# if !defined(low_quarter) && defined(QUARTER_SIZE) +static inline quarter_t low_quarter(vec_t x) +{ +# if QUARTER_SIZE < VEC_SIZE + quarter_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + +# if !defined(low_eighth) && defined(EIGHTH_SIZE) +static inline eighth_t low_eighth(vec_t x) +{ +# if EIGHTH_SIZE < VEC_SIZE + eighth_t y; + unsigned int i; + + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + y[i] = x[i]; + + return y; +# else + return x; +# endif +} +# endif + #endif #if defined(__AVX512F__) && defined(FLOAT_SIZE) @@ -1117,7 +1236,7 @@ int simd_test(void) y = interleave_lo(alt < 0, alt < 0); y = interleave_lo(z, y); touch(x); - z = widen2(x); + z = widen2(low_quarter(x)); touch(x); if ( !eq(z, y) ) return __LINE__; @@ -1126,7 +1245,7 @@ int simd_test(void) y = interleave_lo(y, y); y = interleave_lo(z, y); touch(x); - z = widen3(x); + z = widen3(low_eighth(x)); touch(x); if ( !eq(z, y) ) return __LINE__; # endif @@ -1148,14 +1267,14 @@ int simd_test(void) # ifdef widen2 touch(src); - x = widen2(src); + x = widen2(low_quarter(src)); touch(src); if ( !eq(x, z) ) return __LINE__; # endif # ifdef widen3 touch(src); - x = widen3(src); + x = widen3(low_eighth(src)); touch(src); if ( !eq(x, interleave_lo(z, (vec_t){})) ) return __LINE__; # endif @@ -1175,6 +1294,36 @@ int simd_test(void) if ( aux2[i] != src[i] ) return __LINE__; } +#endif + +#if defined(widen2) && defined(shrink2) + { + quarter_t aux1 = low_quarter(src), aux2; + + touch(aux1); + x = widen2(aux1); + touch(x); + aux2 = shrink2(x); + touch(aux2); + for ( i = 0; i < ELEM_COUNT / 4; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } +#endif + +#if defined(widen3) && defined(shrink3) + { + eighth_t aux1 = low_eighth(src), aux2; + + touch(aux1); + x = widen3(aux1); + touch(x); + aux2 = shrink3(x); + touch(aux2); + for ( i = 0; i < ELEM_COUNT / 8; ++i ) + if ( aux2[i] != src[i] ) + return __LINE__; + } #endif #ifdef dup_lo --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -95,6 +95,32 @@ typedef int __attribute__((vector_size(H typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; # endif +# if ELEM_COUNT >= 4 +# if VEC_SIZE > 64 +# define QUARTER_SIZE (VEC_SIZE / 4) +# else +# define QUARTER_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(QUARTER_SIZE))) quarter_t; +typedef char __attribute__((vector_size(QUARTER_SIZE))) vqi_quarter_t; +typedef short __attribute__((vector_size(QUARTER_SIZE))) vhi_quarter_t; +typedef int __attribute__((vector_size(QUARTER_SIZE))) vsi_quarter_t; +typedef long long __attribute__((vector_size(QUARTER_SIZE))) vdi_quarter_t; +# endif + +# if ELEM_COUNT >= 8 +# if VEC_SIZE > 128 +# define EIGHTH_SIZE (VEC_SIZE / 8) +# else +# define EIGHTH_SIZE 16 +# endif +typedef typeof((vec_t){}[0]) __attribute__((vector_size(EIGHTH_SIZE))) eighth_t; +typedef char __attribute__((vector_size(EIGHTH_SIZE))) vqi_eighth_t; +typedef short __attribute__((vector_size(EIGHTH_SIZE))) vhi_eighth_t; +typedef int __attribute__((vector_size(EIGHTH_SIZE))) vsi_eighth_t; +typedef long long __attribute__((vector_size(EIGHTH_SIZE))) vdi_eighth_t; +# endif + #endif #if VEC_SIZE == 16 @@ -182,6 +208,9 @@ OVR_SFP(broadcast); OVR_SFP(comi); OVR_FP(add); OVR_INT(add); +OVR_BW(adds); +OVR_BW(addus); +OVR_BW(avg); 
OVR_FP(div); OVR(extractps); OVR_FMA(fmadd, FP); @@ -214,6 +243,8 @@ OVR_INT(srl); OVR_DQ(srlv); OVR_FP(sub); OVR_INT(sub); +OVR_BW(subs); +OVR_BW(subus); OVR_SFP(ucomi); OVR_VFP(unpckh); OVR_VFP(unpckl); @@ -275,6 +306,31 @@ OVR(punpckldq); OVR(punpcklqdq); # endif +# ifdef __AVX512BW__ +OVR(pextrb); +OVR(pextrw); +OVR(pinsrb); +OVR(pinsrw); +# ifdef __AVX512VL__ +OVR(pmaddwd); +OVR(pmovsxbw); +OVR(pmovzxbw); +OVR(pmulhuw); +OVR(pmulhw); +OVR(pmullw); +OVR(psadbw); +OVR(pshufb); +OVR(pshufhw); +OVR(pshuflw); +OVR(pslldq); +OVR(psrldq); +OVR(punpckhbw); +OVR(punpckhwd); +OVR(punpcklbw); +OVR(punpcklwd); +# endif +# endif + # undef OVR_VFP # undef OVR_SFP # undef OVR_INT --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -22,6 +22,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" #include "avx512f.h" +#include "avx512bw.h" #define verbose false /* Switch to true for far more logging. */ @@ -105,6 +106,11 @@ static bool simd_check_avx512bw(void) } #define simd_check_avx512bw_opmask simd_check_avx512bw +static bool simd_check_avx512bw_vl(void) +{ + return cpu_has_avx512bw && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -284,6 +290,18 @@ static const struct { AVX512VL(VL u64x2, avx512f, 16u8), AVX512VL(VL s64x4, avx512f, 32i8), AVX512VL(VL u64x4, avx512f, 32u8), + SIMD(AVX512BW s8x64, avx512bw, 64i1), + SIMD(AVX512BW u8x64, avx512bw, 64u1), + SIMD(AVX512BW s16x32, avx512bw, 64i2), + SIMD(AVX512BW u16x32, avx512bw, 64u2), + AVX512VL(BW+VL s8x16, avx512bw, 16i1), + AVX512VL(BW+VL u8x16, avx512bw, 16u1), + AVX512VL(BW+VL s8x32, avx512bw, 32i1), + AVX512VL(BW+VL u8x32, avx512bw, 32u1), + AVX512VL(BW+VL s16x8, avx512bw, 16i2), + AVX512VL(BW+VL u16x8, avx512bw, 16u2), + AVX512VL(BW+VL s16x16, avx512bw, 32i2), + AVX512VL(BW+VL u16x16, avx512bw, 32u2), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ From patchwork Fri Mar 15 10:44:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854473 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BDD713B5 for ; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1F4832A934 for ; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 132502A938; Fri, 15 Mar 2019 10:46:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0362C2A934 for ; Fri, 15 Mar 2019 10:46:27 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKB-0004tU-Mo; Fri, 15 Mar 2019 10:44:27 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKA-0004t5-DN for 
xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:26 +0000 X-Inumbo-ID: 4c4c4af7-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 4c4c4af7-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:44:24 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:44:23 -0600 Message-Id: <5C8B8207020000780021F173@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:44:23 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 14/50] x86emul: basic AVX512DQ testing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Test various of the insns which have been implemented already. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: Re-base. v5: Re-base over changes earlier in the series. v4: Wrap OVR(pmullq) in __AVX512VL__ conditional. v3: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -69,9 +69,12 @@ avx512f-flts := 4 8 avx512bw-vecs := $(avx512f-vecs) avx512bw-ints := 1 2 avx512bw-flts := +avx512dq-vecs := $(avx512f-vecs) +avx512dq-ints := $(avx512f-ints) +avx512dq-flts := $(avx512f-flts) avx512f-opmask-vecs := 2 -avx512dq-opmask-vecs := 1 +avx512dq-opmask-vecs := 1 2 avx512bw-opmask-vecs := 4 8 # Suppress building by default of the harness if the compiler can't deal --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -121,6 +121,34 @@ typedef int __attribute__((vector_size(E typedef long long __attribute__((vector_size(EIGHTH_SIZE))) vdi_eighth_t; # endif +# define DECL_PAIR(w) \ +typedef w ## _t pair_t; \ +typedef vsi_ ## w ## _t vsi_pair_t; \ +typedef vdi_ ## w ## _t vdi_pair_t +# define DECL_QUARTET(w) \ +typedef w ## _t quartet_t; \ +typedef vsi_ ## w ## _t vsi_quartet_t; \ +typedef vdi_ ## w ## _t vdi_quartet_t +# define DECL_OCTET(w) \ +typedef w ## _t octet_t; \ +typedef vsi_ ## w ## _t vsi_octet_t; \ +typedef vdi_ ## w ## _t vdi_octet_t + +# if ELEM_COUNT == 4 +DECL_PAIR(half); +# elif ELEM_COUNT == 8 +DECL_PAIR(quarter); +DECL_QUARTET(half); +# elif ELEM_COUNT == 16 +DECL_PAIR(eighth); +DECL_QUARTET(quarter); +DECL_OCTET(half); +# endif + +# undef DECL_OCTET +# undef DECL_QUARTET +# undef DECL_PAIR + #endif #if VEC_SIZE == 16 @@ -146,6 +174,14 @@ typedef long long __attribute__((vector_ #ifdef __AVX512F__ /* Sadly there are a few exceptions to the general naming rules. 
*/ +# define __builtin_ia32_broadcastf32x4_512_mask __builtin_ia32_broadcastf32x4_512 +# define __builtin_ia32_broadcasti32x4_512_mask __builtin_ia32_broadcasti32x4_512 +# define __builtin_ia32_insertf32x4_512_mask __builtin_ia32_insertf32x4_mask +# define __builtin_ia32_insertf32x8_512_mask __builtin_ia32_insertf32x8_mask +# define __builtin_ia32_insertf64x4_512_mask __builtin_ia32_insertf64x4_mask +# define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask +# define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask +# define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask @@ -331,6 +367,20 @@ OVR(punpcklwd); # endif # endif +# ifdef __AVX512DQ__ +OVR_VFP(and); +OVR_VFP(andn); +OVR_VFP(or); +OVR(pextrd); +OVR(pextrq); +OVR(pinsrd); +OVR(pinsrq); +# ifdef __AVX512VL__ +OVR(pmullq); +# endif +OVR_VFP(xor); +# endif + # undef OVR_VFP # undef OVR_SFP # undef OVR_INT --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -139,6 +139,27 @@ static inline bool _to_bool(byte_vec_t b # endif #elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if ELEM_COUNT == 8 /* vextractf{32,64}x4 */ || \ + (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextractf32x8 */ || \ + (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ +# define low_half(x) ({ \ + half_t t_; \ + asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + t_; \ +}) +# endif +# if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextractf32x4 */ || \ + (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ +# define low_quarter(x) ({ \ + quarter_t t_; \ + asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 4) ); \ + t_; \ +}) +# endif # if FLOAT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -146,6 +167,17 @@ static inline bool _to_bool(byte_vec_t b : "=v" (t_) : "m" (*(float[1]){ x }) ); \ t_; \ }) +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) ({ \ + vec_t t_; \ + asm ( "vbroadcastf32x2 %1, %0" : "=v" (t_) : "m" (x) ); \ + t_; \ +}) +# endif +# if VEC_SIZE == 64 && defined(__AVX512DQ__) +# define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) +# define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) +# endif # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) @@ -155,6 +187,13 @@ static inline bool _to_bool(byte_vec_t b # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) # else +# define broadcast_quartet(x) B(broadcastf32x4_, _mask, x, undef(), ~0) +# define insert_pair(x, y, p) \ + B(insertf32x4_, _mask, x, \ + /* Cast needed below to work around gcc 7.x quirk. */ \ + (p) & 1 ? 
(typeof(y))__builtin_ia32_shufps(y, y, 0b01000100) : (y), \ + (p) >> 1, x, 3 << ((p) * 2)) +# define insert_quartet(x, y, p) B(insertf32x4_, _mask, x, y, p, undef(), ~0) # define interleave_hi(x, y) B(vpermi2varps, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varps, _mask, interleave_lo, x, y, ~0) # define swap(x) ({ \ @@ -178,6 +217,14 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) B(broadcastf64x2_, _mask, x, undef(), ~0) +# define insert_pair(x, y, p) B(insertf64x2_, _mask, x, y, p, undef(), ~0) +# endif +# if VEC_SIZE == 64 +# define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) +# define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) +# endif # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) @@ -306,6 +353,16 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextracti32x4 */ || \ + (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextracti64x2 */ +# define low_quarter(x) ({ \ + quarter_t t_; \ + asm ( "vextracti%c[w]x%c[n] $0, %[s], %[d]" \ + : [d] "=m" (t_) \ + : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 4) ); \ + t_; \ +}) +# endif # if INT_SIZE == 4 || UINT_SIZE == 4 # define broadcast(x) ({ \ vec_t t_; \ @@ -318,11 +375,30 @@ static inline bool _to_bool(byte_vec_t b asm ( "vpbroadcastd %k1, %0" : "=v" (t_) : "r" (x) ); \ t_; \ }) +# ifdef __AVX512DQ__ +# define broadcast_pair(x) ({ \ + vec_t t_; \ + asm ( "vbroadcasti32x2 %1, %0" : "=v" (t_) : "m" (x) ); \ + t_; \ +}) +# endif +# if VEC_SIZE == 64 && defined(__AVX512DQ__) +# define broadcast_octet(x) ((vec_t)B(broadcasti32x8_, _mask, (vsi_octet_t)(x), (vsi_t)undef(), ~0)) +# define insert_octet(x, y, p) ((vec_t)B(inserti32x8_, _mask, (vsi_t)(x), (vsi_octet_t)(y), p, (vsi_t)undef(), ~0)) +# endif # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhdq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpckldq, _mask, (vsi_t)(x), (vsi_t)(y), (vsi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, (vsi_t)(x), 0b00011011, (vsi_t)undef(), ~0)) # else +# define broadcast_quartet(x) ((vec_t)B(broadcasti32x4_, _mask, (vsi_quartet_t)(x), (vsi_t)undef(), ~0)) +# define insert_pair(x, y, p) \ + (vec_t)(B(inserti32x4_, _mask, (vsi_t)(x), \ + /* First cast needed below to work around gcc 7.x quirk. */ \ + (p) & 1 ? 
(vsi_pair_t)__builtin_ia32_pshufd((vsi_pair_t)(y), 0b01000100) \ + : (vsi_pair_t)(y), \ + (p) >> 1, (vsi_t)(x), 3 << ((p) * 2))) +# define insert_quartet(x, y, p) ((vec_t)B(inserti32x4_, _mask, (vsi_t)(x), (vsi_quartet_t)(y), p, (vsi_t)undef(), ~0)) # define interleave_hi(x, y) ((vec_t)B(vpermi2vard, _mask, (vsi_t)(x), interleave_hi, (vsi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2vard, _mask, interleave_lo, (vsi_t)(x), (vsi_t)(y), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, \ @@ -347,6 +423,14 @@ static inline bool _to_bool(byte_vec_t b t_; \ }) # endif +# if VEC_SIZE >= 32 && defined(__AVX512DQ__) +# define broadcast_pair(x) ((vec_t)B(broadcasti64x2_, _mask, (vdi_pair_t)(x), (vdi_t)undef(), ~0)) +# define insert_pair(x, y, p) ((vec_t)B(inserti64x2_, _mask, (vdi_t)(x), (vdi_pair_t)(y), p, (vdi_t)undef(), ~0)) +# endif +# if VEC_SIZE == 64 +# define broadcast_quartet(x) ((vec_t)B(broadcasti64x4_, , (vdi_quartet_t)(x), (vdi_t)undef(), ~0)) +# define insert_quartet(x, y, p) ((vec_t)B(inserti64x4_, _mask, (vdi_t)(x), (vdi_quartet_t)(y), p, (vdi_t)undef(), ~0)) +# endif # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklqdq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) @@ -898,7 +982,7 @@ static inline eighth_t low_eighth(vec_t eighth_t y; unsigned int i; - for ( i = 0; i < ELEM_COUNT / 4; ++i ) + for ( i = 0; i < ELEM_COUNT / 8; ++i ) y[i] = x[i]; return y; @@ -910,6 +994,50 @@ static inline eighth_t low_eighth(vec_t #endif +#ifdef broadcast_pair +# if ELEM_COUNT == 4 +# define broadcast_half broadcast_pair +# elif ELEM_COUNT == 8 +# define broadcast_quarter broadcast_pair +# elif ELEM_COUNT == 16 +# define broadcast_eighth broadcast_pair +# endif +#endif + +#ifdef insert_pair +# if ELEM_COUNT == 4 +# define insert_half insert_pair +# elif ELEM_COUNT == 8 +# define insert_quarter insert_pair +# elif ELEM_COUNT == 16 +# define insert_eighth insert_pair +# endif +#endif + +#ifdef broadcast_quartet +# if ELEM_COUNT == 8 +# define broadcast_half broadcast_quartet +# elif ELEM_COUNT == 16 +# define broadcast_quarter broadcast_quartet +# endif +#endif + +#ifdef insert_quartet +# if ELEM_COUNT == 8 +# define insert_half insert_quartet +# elif ELEM_COUNT == 16 +# define insert_quarter insert_quartet +# endif +#endif + +#if defined(broadcast_octet) && ELEM_COUNT == 16 +# define broadcast_half broadcast_octet +#endif + +#if defined(insert_octet) && ELEM_COUNT == 16 +# define insert_half insert_octet +#endif + #if defined(__AVX512F__) && defined(FLOAT_SIZE) # include "simd-fma.c" #endif @@ -1205,6 +1333,60 @@ int simd_test(void) if ( !eq(broadcast2(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif +#if defined(broadcast_half) && defined(insert_half) + { + half_t aux = low_half(src); + + touch(aux); + x = broadcast_half(aux); + touch(aux); + y = insert_half(src, aux, 1); + if ( !eq(x, y) ) return __LINE__; + } +#endif + +#if defined(broadcast_quarter) && defined(insert_quarter) + { + quarter_t aux = low_quarter(src); + + touch(aux); + x = broadcast_quarter(aux); + touch(aux); + y = insert_quarter(src, aux, 1); + touch(aux); + y = insert_quarter(y, aux, 2); + touch(aux); + y = insert_quarter(y, aux, 3); + if ( !eq(x, y) ) return __LINE__; + } +#endif + +#if defined(broadcast_eighth) && defined(insert_eighth) && \ + /* At least gcc 7.3 "optimizes" away all insert_eighth() calls below. 
*/ \ + __GNUC__ >= 8 + { + eighth_t aux = low_eighth(src); + + touch(aux); + x = broadcast_eighth(aux); + touch(aux); + y = insert_eighth(src, aux, 1); + touch(aux); + y = insert_eighth(y, aux, 2); + touch(aux); + y = insert_eighth(y, aux, 3); + touch(aux); + y = insert_eighth(y, aux, 4); + touch(aux); + y = insert_eighth(y, aux, 5); + touch(aux); + y = insert_eighth(y, aux, 6); + touch(aux); + y = insert_eighth(y, aux, 7); + if ( !eq(x, y) ) return __LINE__; + } +#endif + #if defined(interleave_lo) && defined(interleave_hi) touch(src); x = interleave_lo(inv, src); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -23,6 +23,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw-opmask.h" #include "avx512f.h" #include "avx512bw.h" +#include "avx512dq.h" #define verbose false /* Switch to true for far more logging. */ @@ -100,6 +101,11 @@ static bool simd_check_avx512dq(void) } #define simd_check_avx512dq_opmask simd_check_avx512dq +static bool simd_check_avx512dq_vl(void) +{ + return cpu_has_avx512dq && cpu_has_avx512vl; +} + static bool simd_check_avx512bw(void) { return cpu_has_avx512bw; @@ -267,9 +273,10 @@ static const struct { SIMD(XOP i32x8, xop, 32i4), SIMD(XOP i64x4, xop, 32i8), SIMD(OPMASK/w, avx512f_opmask, 2), - SIMD(OPMASK/b, avx512dq_opmask, 1), - SIMD(OPMASK/d, avx512bw_opmask, 4), - SIMD(OPMASK/q, avx512bw_opmask, 8), + SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), + SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), + SIMD(OPMASK+BW/d, avx512bw_opmask, 4), + SIMD(OPMASK+BW/q, avx512bw_opmask, 8), SIMD(AVX512F f32 scalar, avx512f, f4), SIMD(AVX512F f32x16, avx512f, 64f4), SIMD(AVX512F f64 scalar, avx512f, f8), @@ -302,6 +309,24 @@ static const struct { AVX512VL(BW+VL u16x8, avx512bw, 16u2), AVX512VL(BW+VL s16x16, avx512bw, 32i2), AVX512VL(BW+VL u16x16, avx512bw, 32u2), + SIMD(AVX512DQ f32x16, avx512dq, 64f4), + SIMD(AVX512DQ f64x8, avx512dq, 64f8), + SIMD(AVX512DQ s32x16, avx512dq, 64i4), + SIMD(AVX512DQ u32x16, avx512dq, 64u4), + SIMD(AVX512DQ s64x8, avx512dq, 64i8), + SIMD(AVX512DQ u64x8, avx512dq, 64u8), + AVX512VL(DQ+VL f32x4, avx512dq, 16f4), + AVX512VL(DQ+VL f64x2, avx512dq, 16f8), + AVX512VL(DQ+VL f32x8, avx512dq, 32f4), + AVX512VL(DQ+VL f64x4, avx512dq, 32f8), + AVX512VL(DQ+VL s32x4, avx512dq, 16i4), + AVX512VL(DQ+VL u32x4, avx512dq, 16u4), + AVX512VL(DQ+VL s32x8, avx512dq, 32i4), + AVX512VL(DQ+VL u32x8, avx512dq, 32u4), + AVX512VL(DQ+VL s64x2, avx512dq, 16i8), + AVX512VL(DQ+VL u64x2, avx512dq, 16u8), + AVX512VL(DQ+VL s64x4, avx512dq, 32i8), + AVX512VL(DQ+VL u64x4, avx512dq, 32u8), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ From patchwork Fri Mar 15 10:44:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854475 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6336E15AC for ; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4A7922A934 for ; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3EC602A937; Fri, 15 Mar 2019 10:46:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, 
RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D0A7B2A934 for ; Fri, 15 Mar 2019 10:46:35 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKh-00052u-Bz; Fri, 15 Mar 2019 10:44:59 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kKg-00052d-4f for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:44:58 +0000 X-Inumbo-ID: 5f0e27b6-470f-11e9-9848-bb533d287e20 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 5f0e27b6-470f-11e9-9848-bb533d287e20; Fri, 15 Mar 2019 10:44:56 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:44:55 -0600 Message-Id: <5C8B8228020000780021F176@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:44:56 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 15/50] x86emul: support AVX512F move high/low insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP No explicit test harness additions other than the overrides, as the compiler already makes use of the insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: No need to set fault_suppression to false. v4: New. 
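[Editorial sketch, not part of the patch.] For readers less familiar with these encodings, the emulator hunk below keys off the ModRM mode: with no mandatory prefix, opcode 0F 12 is VMOVLPS for a memory source but VMOVHLPS for a register source, and 0F 16 is likewise VMOVHPS (memory) vs VMOVLHPS (register). A minimal illustration of that split (the helper name is made up):

#include <stdbool.h>
#include <stdint.h>

static const char *vmovlh_form(uint8_t opc, bool modrm_reg)
{
    /* 0F 12: memory form loads the low half; register form is VMOVHLPS. */
    if ( opc == 0x12 )
        return modrm_reg ? "vmovhlps" : "vmovlps";
    /* 0F 16: memory form loads the high half; register form is VMOVLHPS. */
    if ( opc == 0x16 )
        return modrm_reg ? "vmovlhps" : "vmovhps";
    return "<other>";
}

The 66-prefixed forms (VMOVLPD/VMOVHPD) and the 0F 13/17 stores have no register variant, which is why the handler below raises #UD for them when the operand is not in memory.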
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -253,6 +253,16 @@ static const struct test avx512f_128[] = INSN(insertps, 66, 0f3a, 21, el, d, el), INSN(mov, 66, 0f, 6e, el, dq64, el), INSN(mov, 66, 0f, 7e, el, dq64, el), +// movhlps, , 0f, 12, d + INSN(movhpd, 66, 0f, 16, el, q, vl), + INSN(movhpd, 66, 0f, 17, el, q, vl), + INSN(movhps, , 0f, 16, el_2, d, vl), + INSN(movhps, , 0f, 17, el_2, d, vl), +// movlhps, , 0f, 16, d + INSN(movlpd, 66, 0f, 12, el, q, vl), + INSN(movlpd, 66, 0f, 13, el, q, vl), + INSN(movlps, , 0f, 12, el_2, d, vl), + INSN(movlps, , 0f, 13, el_2, d, vl), INSN(movq, f3, 0f, 7e, el, q, el), INSN(movq, 66, 0f, d6, el, q, el), }; --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -266,6 +266,12 @@ OVR(movd); OVR(movq); OVR_SFP(mov); OVR_VFP(mova); +OVR(movhlps); +OVR(movhpd); +OVR(movhps); +OVR(movlhps); +OVR(movlpd); +OVR(movlps); OVR_VFP(movnt); OVR_VFP(movu); OVR_FP(mul); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -286,11 +286,11 @@ static const struct twobyte_table { [0x0f] = { ModRM|SrcImmByte }, [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp, d8s_vl }, - [0x12] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, - [0x13] = { DstMem|SrcImplicit|ModRM|Mov, simd_other }, + [0x12] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, 3 }, + [0x13] = { DstMem|SrcImplicit|ModRM|Mov, simd_other, 3 }, [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, - [0x16] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, - [0x17] = { DstMem|SrcImplicit|ModRM|Mov, simd_other }, + [0x16] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, 3 }, + [0x17] = { DstMem|SrcImplicit|ModRM|Mov, simd_other, 3 }, [0x18 ... 0x1f] = { ImplicitOps|ModRM }, [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM }, [0x22 ... 
0x23] = { DstImplicit|SrcMem|ModRM }, @@ -6032,6 +6032,25 @@ x86_emulate( op_bytes = 8; goto simd_0f_fp; + case X86EMUL_OPC_EVEX_66(0x0f, 0x12): /* vmovlpd m64,xmm,xmm */ + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x13): /* vmovlp{s,d} xmm,m64 */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x16): /* vmovhpd m64,xmm,xmm */ + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x17): /* vmovhp{s,d} xmm,m64 */ + generate_exception_if(ea.type != OP_MEM, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX(0x0f, 0x12): /* vmovlps m64,xmm,xmm */ + /* vmovhlps xmm,xmm,xmm */ + case X86EMUL_OPC_EVEX(0x0f, 0x16): /* vmovhps m64,xmm,xmm */ + /* vmovlhps xmm,xmm,xmm */ + generate_exception_if((evex.lr || evex.opmsk || evex.brs || + evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( (d & DstMask) != DstMem ) + d &= ~TwoOp; + op_bytes = 8; + goto simd_zmm; + case X86EMUL_OPC_F3(0x0f, 0x12): /* movsldup xmm/m128,xmm */ case X86EMUL_OPC_VEX_F3(0x0f, 0x12): /* vmovsldup {x,y}mm/mem,{x,y}mm */ case X86EMUL_OPC_F2(0x0f, 0x12): /* movddup xmm/m64,xmm */ From patchwork Fri Mar 15 10:45:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854477 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C37F15AC for ; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64D172A934 for ; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 58BF82A937; Fri, 15 Mar 2019 10:46:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DF84D2A934 for ; Fri, 15 Mar 2019 10:46:56 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kL8-0005B6-P0; Fri, 15 Mar 2019 10:45:26 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kL6-0005A6-Oa for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:45:24 +0000 X-Inumbo-ID: 6f4167f9-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6f4167f9-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:45:23 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:45:22 -0600 Message-Id: <5C8B8241020000780021F1BE@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:45:21 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 16/50] x86emul: support AVX512F move duplicate insns X-BeenThere: 
xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Judging from insn prefixes, these are scalar insns, but their (memory) operands are vector ones (with the exception of 128-bit VMOVDDUP). For this some adjustments to disp8scale calculation code are needed. No explicit test harness additions other than the overrides, as the compiler already makes use of the insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Fix Disp8 test for VMOVDDUP when AVX512VL is unavailable. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -146,6 +146,7 @@ static const struct test avx512f_all[] = INSN_SFP(mov, 0f, 11), INSN_PFP_NB(mova, 0f, 28), INSN_PFP_NB(mova, 0f, 29), + INSN(movddup, f2, 0f, 12, vl, q_nb, vl), INSN(movdqa32, 66, 0f, 6f, vl, d_nb, vl), INSN(movdqa32, 66, 0f, 7f, vl, d_nb, vl), INSN(movdqa64, 66, 0f, 6f, vl, q_nb, vl), @@ -157,6 +158,8 @@ static const struct test avx512f_all[] = INSN(movntdq, 66, 0f, e7, vl, d_nb, vl), INSN(movntdqa, 66, 0f38, 2a, vl, d_nb, vl), INSN_PFP_NB(movnt, 0f, 2b), + INSN(movshdup, f3, 0f, 16, vl, d_nb, vl), + INSN(movsldup, f3, 0f, 12, vl, d_nb, vl), INSN_PFP_NB(movu, 0f, 10), INSN_PFP_NB(movu, 0f, 11), INSN_FP(mul, 0f, 59), @@ -694,6 +697,19 @@ static void test_group(const struct test switch ( tests[i].esz ) { + case ESZ_q_nb: + /* The 128-bit form of VMOVDDUP needs special casing. */ + if ( vl[j] == VL_128 && tests[i].spc == SPC_0f && + tests[i].opc == 0x12 && tests[i].pfx == PFX_f2 ) + { + struct test test = tests[i]; + + test.vsz = VSZ_el; + test.scale = SC_el; + test_one(&test, vl[j], instr, ctxt); + continue; + } + /* fall through */ default: test_one(&tests[i], vl[j], instr, ctxt); break; --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -326,8 +326,11 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(movddup); OVR(movntdq); OVR(movntdqa); +OVR(movshdup); +OVR(movsldup); OVR(pmovsxbd); OVR(pmovsxbq); OVR(pmovsxdq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -3048,6 +3048,15 @@ x86_decode( switch ( b ) { + case 0x12: /* vmovsldup / vmovddup */ + if ( evex.pfx == vex_f2 ) + disp8scale = evex.lr ? 4 + evex.lr : 3; + /* fall through */ + case 0x16: /* vmovshdup */ + if ( evex.pfx == vex_f3 ) + disp8scale = 4 + evex.lr; + break; + case 0x20: /* mov cr,reg */ case 0x21: /* mov dr,reg */ case 0x22: /* mov reg,cr */ @@ -6066,6 +6075,20 @@ x86_emulate( host_and_vcpu_must_have(sse3); goto simd_0f_xmm; + case X86EMUL_OPC_EVEX_F3(0x0f, 0x12): /* vmovsldup [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x12): /* vmovddup [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x16): /* vmovshdup [xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if((evex.brs || + evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = !(evex.pfx & VEX_PREFIX_DOUBLE_MASK) || evex.lr + ? 
16 << evex.lr : 8; + fault_suppression = false; + goto simd_zmm; + CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x14): /* vunpcklp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0x15): /* vunpckhp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), From patchwork Fri Mar 15 10:46:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854479 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6CF6615AC for ; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 511292A934 for ; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 456A42A937; Fri, 15 Mar 2019 10:47:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 659F62A934 for ; Fri, 15 Mar 2019 10:47:49 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kLp-0005MQ-8n; Fri, 15 Mar 2019 10:46:09 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kLn-0005M6-Ly for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:46:07 +0000 X-Inumbo-ID: 87f83496-470f-11e9-84e6-a7c41e3a5e29 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 87f83496-470f-11e9-84e6-a7c41e3a5e29; Fri, 15 Mar 2019 10:46:04 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:46:03 -0600 Message-Id: <5C8B826C020000780021F1C1@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:46:04 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 17/50] x86emul: support AVX512{F, BW, _VBMI} permute insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v5: Re-base over changes earlier in the series. v4: New. 
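[Editorial sketch, not part of either patch.] As background for the disp8scale (d8s_*) adjustments in the preceding hunk and the d8s_vl table entries below: EVEX encodings compress an 8-bit displacement by an implicit factor N, so the effective displacement is disp8 * N, and the emulator stores log2(N). For full-vector-width accesses N equals the vector size in bytes (16/32/64, i.e. log2(N) = 4 + EVEX.L'L), while the 128-bit VMOVDDUP form only reads 8 bytes, hence its special case. A sketch under those assumptions (helper name made up):

/* log2 of the Disp8 scale factor for VMOVDDUP at a given EVEX.L'L. */
static unsigned int movddup_disp8shift(unsigned int evex_lr)
{
    return evex_lr ? 4 + evex_lr : 3;
}

/*
 * Effective displacement = disp8 * (1 << shift); e.g. disp8 = 2 with a
 * 512-bit operand (L'L == 2) addresses offset 2 * 64 = 128 bytes.
 */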
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -178,6 +178,10 @@ static const struct test avx512f_all[] = INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), INSN(permi2, 66, 0f38, 76, vl, dq, vl), INSN(permi2, 66, 0f38, 77, vl, sd, vl), + INSN(permilpd, 66, 0f38, 0d, vl, q, vl), + INSN(permilpd, 66, 0f3a, 05, vl, q, vl), + INSN(permilps, 66, 0f38, 0c, vl, d, vl), + INSN(permilps, 66, 0f3a, 04, vl, d, vl), INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), @@ -278,6 +282,10 @@ static const struct test avx512f_no128[] INSN(extracti32x4, 66, 0f3a, 39, el_4, d, vl), INSN(insertf32x4, 66, 0f3a, 18, el_4, d, vl), INSN(inserti32x4, 66, 0f3a, 38, el_4, d, vl), + INSN(perm, 66, 0f38, 36, vl, dq, vl), + INSN(perm, 66, 0f38, 16, vl, sd, vl), + INSN(permpd, 66, 0f3a, 01, vl, q, vl), + INSN(permq, 66, 0f3a, 00, vl, q, vl), INSN(shuff32x4, 66, 0f3a, 23, vl, d, vl), INSN(shuff64x2, 66, 0f3a, 23, vl, q, vl), INSN(shufi32x4, 66, 0f3a, 43, vl, d, vl), @@ -316,6 +324,7 @@ static const struct test avx512bw_all[] INSN(pcmpgtb, 66, 0f, 64, vl, b, vl), INSN(pcmpgtw, 66, 0f, 65, vl, w, vl), INSN(pcmpu, 66, 0f3a, 3e, vl, bw, vl), + INSN(permw, 66, 0f38, 8d, vl, w, vl), INSN(permi2w, 66, 0f38, 75, vl, w, vl), INSN(permt2w, 66, 0f38, 7d, vl, w, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), @@ -412,6 +421,7 @@ static const struct test avx512dq_512[] }; static const struct test avx512_vbmi_all[] = { + INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), INSN(permt2b, 66, 0f38, 7d, vl, b, vl), }; --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -186,6 +186,7 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) # define swap(x) B(shufps, _mask, x, x, 0b00011011, undef(), ~0) +# define swap2(x) B_(vpermilps, _mask, x, 0b00011011, undef(), ~0) # else # define broadcast_quartet(x) B(broadcastf32x4_, _mask, x, undef(), ~0) # define insert_pair(x, y, p) \ @@ -200,6 +201,10 @@ static inline bool _to_bool(byte_vec_t b vec_t t_ = B(shuf_f32x4_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ B(shufps, _mask, t_, t_, 0b00011011, undef(), ~0); \ }) +# define swap2(x) B(vpermilps, _mask, \ + B(shuf_f32x4_, _mask, x, x, \ + VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0), \ + 0b00011011, undef(), ~0) # endif # elif FLOAT_SIZE == 8 # if VEC_SIZE >= 32 @@ -233,6 +238,7 @@ static inline bool _to_bool(byte_vec_t b # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) # define swap(x) B(shufpd, _mask, x, x, 0b01, undef(), ~0) +# define swap2(x) B_(vpermilpd, _mask, x, 0b01, undef(), ~0) # else # define interleave_hi(x, y) B(vpermi2varpd, _mask, x, interleave_hi, y, ~0) # define interleave_lo(x, y) B(vpermt2varpd, _mask, interleave_lo, x, y, ~0) @@ -240,6 +246,10 @@ static inline bool _to_bool(byte_vec_t b vec_t t_ = B(shuf_f64x2_, _mask, x, x, VEC_SIZE == 32 ? 0b01 : 0b00011011, undef(), ~0); \ B(shufpd, _mask, t_, t_, 0b01010101, undef(), ~0); \ }) +# define swap2(x) B(vpermilpd, _mask, \ + B(shuf_f64x2_, _mask, x, x, \ + VEC_SIZE == 32 ? 
0b01 : 0b00011011, undef(), ~0), \ + 0b01010101, undef(), ~0) # endif # endif #elif FLOAT_SIZE == 4 && defined(__SSE__) @@ -405,6 +415,7 @@ static inline bool _to_bool(byte_vec_t b B(shuf_i32x4_, _mask, (vsi_t)(x), (vsi_t)(x), \ VEC_SIZE == 32 ? 0b01 : 0b00011011, (vsi_t)undef(), ~0), \ 0b00011011, (vsi_t)undef(), ~0)) +# define swap2(x) ((vec_t)B_(permvarsi, _mask, (vsi_t)(x), (vsi_t)(inv - 1), (vsi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) @@ -442,8 +453,17 @@ static inline bool _to_bool(byte_vec_t b (vsi_t)B(shuf_i64x2_, _mask, (vdi_t)(x), (vdi_t)(x), \ VEC_SIZE == 32 ? 0b01 : 0b00011011, (vdi_t)undef(), ~0), \ 0b01001110, (vsi_t)undef(), ~0)) +# define swap2(x) ((vec_t)B(permvardi, _mask, (vdi_t)(x), (vdi_t)(inv - 1), (vdi_t)undef(), ~0)) # endif # define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# if VEC_SIZE == 32 +# define swap3(x) ((vec_t)B_(permdi, _mask, (vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0)) +# elif VEC_SIZE == 64 +# define swap3(x) ({ \ + vdi_t t_ = B_(permdi, _mask, (vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0); \ + B(shuf_i64x2_, _mask, t_, t_, 0b01001110, (vdi_t)undef(), ~0); \ +}) +# endif # endif # if INT_SIZE == 4 # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) @@ -489,6 +509,9 @@ static inline bool _to_bool(byte_vec_t b # define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), (vqi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) # define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) +# ifdef __AVX512VBMI__ +# define swap2(x) ((vec_t)B(permvarqi, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) +# endif # elif INT_SIZE == 2 || UINT_SIZE == 2 # define broadcast(x) ({ \ vec_t t_; \ @@ -517,6 +540,7 @@ static inline bool _to_bool(byte_vec_t b (0b01010101010101010101010101010101 & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) +# define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) # endif # if INT_SIZE == 1 # define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) @@ -1325,6 +1349,12 @@ int simd_test(void) if ( !eq(swap2(src), inv) ) return __LINE__; #endif +#ifdef swap3 + touch(src); + if ( !eq(swap3(src), inv) ) return __LINE__; + touch(src); +#endif + #ifdef broadcast if ( !eq(broadcast(ELEM_COUNT + 1), src + inv) ) return __LINE__; #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -275,6 +275,8 @@ OVR(movlps); OVR_VFP(movnt); OVR_VFP(movu); OVR_FP(mul); +OVR_VFP(perm); +OVR_VFP(permil); OVR_VFP(shuf); OVR_INT(sll); OVR_DQ(sllv); @@ -331,6 +333,8 @@ OVR(movntdq); OVR(movntdqa); OVR(movshdup); OVR(movsldup); +OVR(permd); +OVR(permq); OVR(pmovsxbd); OVR(pmovsxbq); OVR(pmovsxdq); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -434,7 +434,8 @@ static const struct ext0f38_table { } ext0f38_table[256] = { [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, - [0x0c ... 0x0f] = { .simd_size = simd_packed_fp }, + [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 
0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x13] = { .simd_size = simd_other, .two_op = 1 }, [0x14 ... 0x16] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -477,6 +478,7 @@ static const struct ext0f38_table { [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x8c] = { .simd_size = simd_packed_int }, + [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, [0x96 ... 0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -522,10 +524,10 @@ static const struct ext0f3a_table { uint8_t four_op:1; disp8scale_t d8s:4; } ext0f3a_table[256] = { - [0x00] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0x01] = { .simd_size = simd_packed_fp, .two_op = 1 }, + [0x00] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, + [0x01] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x02] = { .simd_size = simd_packed_int }, - [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1 }, + [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1 }, [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, @@ -8102,6 +8104,9 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf2): /* vpslld xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf3): /* vpsllq xmm/m128,[xyz]mm,[xyz]mm{k} */ generate_exception_if(evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0c): /* vpermilps [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0d): /* vpermilpd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; if ( b == 0xe2 ) goto avx512f_no_sae; @@ -8447,6 +8452,12 @@ x86_emulate( generate_exception_if(!vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ @@ -8652,6 +8663,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ if ( !evex.w ) host_and_vcpu_must_have(avx512_vbmi); else @@ -9077,6 +9089,12 @@ x86_emulate( generate_exception_if(!vex.l || !vex.w, EXC_UD); goto simd_0f_imm8_avx2; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x00): /* vpermq $imm8,{y,z}mm/mem,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x01): /* vpermpd $imm8,{y,z}mm/mem,{y,z}mm{k} */ + generate_exception_if(!evex.lr || !evex.w, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x38): /* vinserti128 $imm8,xmm/m128,ymm,ymm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x39): /* vextracti128 $imm8,ymm,xmm/m128 */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x46): /* vperm2i128 $imm8,ymm/m256,ymm,ymm */ @@ -9096,6 +9114,12 @@ x86_emulate( generate_exception_if(vex.w, EXC_UD); goto simd_0f_imm8_avx; + case 
X86EMUL_OPC_EVEX_66(0x0f3a, 0x04): /* vpermilps $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x05): /* vpermilpd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w != (b & 1), EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_66(0x0f3a, 0x08): /* roundps $imm8,xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f3a, 0x09): /* roundpd $imm8,xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f3a, 0x0a): /* roundss $imm8,xmm/m128,xmm */ From patchwork Fri Mar 15 10:46:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854481 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D1FC13B5 for ; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86C502A934 for ; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B4F62A938; Fri, 15 Mar 2019 10:48:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 748992A936 for ; Fri, 15 Mar 2019 10:48:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kMH-0005TD-Ke; Fri, 15 Mar 2019 10:46:37 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kMG-0005Sw-K9 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:46:36 +0000 X-Inumbo-ID: 9a05e01e-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 9a05e01e-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:46:34 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:46:34 -0600 Message-Id: <5C8B8289020000780021F1C4@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:46:33 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 18/50] x86emul: support AVX512BW pack insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP No further test harness additions - what is there is good enough for these rather "regular" insns. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. 
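For reference, the pack insns emulated by this patch narrow each source
element with saturation.  A minimal scalar sketch of the dword-to-word
signed saturation performed per element by vpackssdw (illustrative C
only, not code from the patch or the harness):

    #include <stdint.h>

    /* Each 32-bit source element is clamped to the int16_t range
     * before being stored into the narrower destination element. */
    static int16_t saturate_dw_to_w(int32_t x)
    {
        if ( x > INT16_MAX )
            return INT16_MAX;
        if ( x < INT16_MIN )
            return INT16_MIN;
        return (int16_t)x;
    }

vpackusdw and vpackuswb likewise treat their sources as signed, but clamp
to the corresponding unsigned ranges ([0, UINT16_MAX] / [0, UINT8_MAX]).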
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -306,6 +306,10 @@ static const struct test avx512bw_all[] INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), INSN(movdqu16, f2, 0f, 7f, vl, w, vl), + INSN(packssdw, 66, 0f, 6b, vl, d_nb, vl), + INSN(packsswb, 66, 0f, 63, vl, w, vl), + INSN(packusdw, 66, 0f38, 2b, vl, d_nb, vl), + INSN(packuswb, 66, 0f, 67, vl, w, vl), INSN(paddb, 66, 0f, fc, vl, b, vl), INSN(paddsb, 66, 0f, ec, vl, b, vl), INSN(paddsw, 66, 0f, ed, vl, w, vl), --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -361,6 +361,10 @@ OVR(pextrw); OVR(pinsrb); OVR(pinsrw); # ifdef __AVX512VL__ +OVR(packssdw); +OVR(packsswb); +OVR(packusdw); +OVR(packuswb); OVR(pmaddwd); OVR(pmovsxbw); OVR(pmovzxbw); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -453,7 +453,7 @@ static const struct ext0f38_table { [0x25] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, - [0x2b] = { .simd_size = simd_packed_int }, + [0x2b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2c ... 0x2d] = { .simd_size = simd_packed_fp }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, @@ -6744,6 +6744,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0x69): /* vpunpckhwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ op_bytes = 16 << evex.lr; /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x63): /* vpacksswb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x67): /* vpackuswb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd1): /* vpsrlw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xe1): /* vpsraw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ @@ -6805,6 +6807,12 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6b): /* vpackssdw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2b): /* vpackusdw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w || evex.brs, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f, 0x6c): /* vpunpcklqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x6d): /* vpunpckhqdq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; From patchwork Fri Mar 15 10:47:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854483 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FDFF1880 for ; Fri, 15 Mar 2019 10:49:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D87F42A271 for ; Fri, 15 Mar 2019 10:49:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CBF162A2CF; Fri, 15 Mar 2019 10:49:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: 
from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AF6942A1CA for ; Fri, 15 Mar 2019 10:49:22 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNB-0005d4-1f; Fri, 15 Mar 2019 10:47:33 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kN8-0005cl-UU for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:47:31 +0000 X-Inumbo-ID: ba92195c-470f-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ba92195c-470f-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:47:29 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:47:28 -0600 Message-Id: <5C8B82BF020000780021F1C7@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:47:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 19/50] x86emul: support AVX512F floating-point conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVTPS2PD, sharing its main opcode with others, needs a "manual" override of disp8scale. The simd_size change for twobyte_table[0x5a] is benign to pre-existing code, but allows decode_disp8scale() to work as is here. Also correct the comment on an AVX counterpart. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base over changes earlier in the series. v5: Re-base over changes earlier in the series. v4: New. 
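The "manual" disp8scale overrides in this series stem from EVEX's
compressed-displacement rule: the encoded 8-bit displacement is
implicitly multiplied by the size of the memory access, and a widening
convert such as vcvtps2pd touches only half a vector's worth of memory.
A rough sketch of the rule (hypothetical helper, not the emulator's
decode_disp8scale(); broadcast forms, which scale by element size, are
left out):

    #include <stdbool.h>
    #include <stdint.h>

    static int32_t evex_effective_disp(int8_t disp8, unsigned int evex_lr,
                                       bool half_width_mem)
    {
        /* Full-width memory operand: 16/32/64 bytes for L'L = 0/1/2. */
        unsigned int scale_shift = 4 + evex_lr;

        if ( half_width_mem )
            --scale_shift;        /* e.g. vcvtps2pd reads only VL/2 */

        return disp8 * (1 << scale_shift);
    }

This is why the 0x5a special case below simply decrements disp8scale for
the no-prefix (vcvtps2pd) form when no broadcast is in effect.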
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,6 +109,12 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), + INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), + INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), + INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), + INSN(cvtss2sd, f3, 0f, 5a, el, d, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -181,7 +181,9 @@ static inline bool _to_bool(byte_vec_t b # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# define widen1(x) ((vec_t)BR(cvtps2pd, _mask, x, (vdf_t)undef(), ~0)) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklps, _mask, x, y, undef(), ~0) --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -68,6 +68,7 @@ typedef short __attribute__((vector_size typedef int __attribute__((vector_size(VEC_SIZE))) vsi_t; #if VEC_SIZE >= 8 typedef long long __attribute__((vector_size(VEC_SIZE))) vdi_t; +typedef double __attribute__((vector_size(VEC_SIZE))) vdf_t; #endif #if ELEM_SIZE == 1 @@ -93,6 +94,7 @@ typedef char __attribute__((vector_size( typedef short __attribute__((vector_size(HALF_SIZE))) vhi_half_t; typedef int __attribute__((vector_size(HALF_SIZE))) vsi_half_t; typedef long long __attribute__((vector_size(HALF_SIZE))) vdi_half_t; +typedef float __attribute__((vector_size(HALF_SIZE))) vsf_half_t; # endif # if ELEM_COUNT >= 4 @@ -328,6 +330,13 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(cvtpd2psx); +OVR(cvtpd2psy); +OVR(cvtph2ps); +OVR(cvtps2pd); +OVR(cvtps2ph); +OVR(cvtsd2ss); +OVR(cvtss2sd); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3871,6 +3871,49 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing vcvtph2ps 32(%ecx),%zmm7{%k4}..."); + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(evex_vcvtph2ps); + decl_insn(evex_vcvtps2ph); + + asm volatile ( "vpternlogd $0x81, %%zmm7, %%zmm7, %%zmm7\n\t" + "kmovw %1,%%k4\n" + put_insn(evex_vcvtph2ps, "vcvtph2ps 32(%0), %%zmm7%{%%k4%}") + :: "c" (NULL), "r" (0x3333) ); + + set_insn(evex_vcvtph2ps); + memset(res, 0xff, 128); + res[8] = 0x40003c00; /* (1.0, 2.0) */ + res[10] = 0x44004200; /* (3.0, 4.0) */ + res[12] = 0x3400b800; /* (-.5, .25) */ + res[14] = 0xbc000000; /* (0.0, -1.) 
*/ + regs.ecx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + asm volatile ( "vmovups %%zmm7, %0" : "=m" (res[16]) ); + if ( rc != X86EMUL_OKAY || !check_eip(evex_vcvtph2ps) ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vcvtps2ph $0,%zmm3,64(%edx){%k4}..."); + asm volatile ( "vmovups %0, %%zmm3\n" + put_insn(evex_vcvtps2ph, "vcvtps2ph $0, %%zmm3, 128(%1)%{%%k4%}") + :: "m" (res[16]), "d" (NULL) ); + + set_insn(evex_vcvtps2ph); + regs.edx = (unsigned long)res; + memset(res + 32, 0xcc, 32); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(evex_vcvtps2ph) ) + goto fail; + res[15] = res[13] = res[11] = res[9] = 0xcccccccc; + if ( memcmp(res + 8, res + 32, 32) ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -310,7 +310,8 @@ static const struct twobyte_table { [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp }, [0x54 ... 0x57] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, - [0x5a ... 0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x5a] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, + [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, @@ -437,7 +438,7 @@ static const struct ext0f38_table { [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x13] = { .simd_size = simd_other, .two_op = 1 }, + [0x13] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x14 ... 0x16] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x17] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x18] = { .simd_size = simd_scalar_opc, .two_op = 1, .d8s = 2 }, @@ -541,7 +542,7 @@ static const struct ext0f3a_table { [0x19] = { .simd_size = simd_128, .to_mem = 1, .two_op = 1, .d8s = 4 }, [0x1a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, [0x1b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, - [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1 }, + [0x1d] = { .simd_size = simd_other, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x1e ... 
0x1f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_none, .d8s = 0 }, [0x21] = { .simd_size = simd_other, .d8s = 2 }, @@ -3071,6 +3072,11 @@ x86_decode( modrm_mod = 3; break; + case 0x5a: /* vcvtps2pd needs special casing */ + if ( disp8scale && !evex.pfx && !evex.brs ) + --disp8scale; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -5998,6 +6004,7 @@ x86_emulate( CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5d): /* vmin{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5e): /* vdiv{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5f): /* vmax{p,s}{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + avx512f_all_fp: generate_exception_if((evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK) || (ea.type != OP_REG && evex.brs && (evex.pfx & VEX_PREFIX_SCALAR_MASK))), @@ -6557,7 +6564,7 @@ x86_emulate( goto simd_zmm; CASE_SIMD_ALL_FP(, 0x0f, 0x5a): /* cvt{p,s}{s,d}2{p,s}{s,d} xmm/mem,xmm */ - CASE_SIMD_ALL_FP(_VEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} xmm/mem,xmm */ + CASE_SIMD_ALL_FP(_VEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} {x,y}mm/mem,{x,y}mm */ /* vcvts{s,d}2s{s,d} xmm/mem,xmm,xmm */ op_bytes = 4 << (((vex.pfx & VEX_PREFIX_SCALAR_MASK) ? 0 : 1 + vex.l) + !!(vex.pfx & VEX_PREFIX_DOUBLE_MASK)); @@ -6566,6 +6573,12 @@ x86_emulate( goto simd_0f_sse2; goto simd_0f_avx; + CASE_SIMD_ALL_FP(_EVEX, 0x0f, 0x5a): /* vcvtp{s,d}2p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + /* vcvts{s,d}2s{s,d} xmm/mem,xmm,xmm{k} */ + op_bytes = 4 << (((evex.pfx & VEX_PREFIX_SCALAR_MASK) ? 0 : 1 + evex.lr) + + evex.w); + goto avx512f_all_fp; + CASE_SIMD_PACKED_FP(, 0x0f, 0x5b): /* cvt{ps,dq}2{dq,ps} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x5b): /* vcvt{ps,dq}2{dq,ps} {x,y}mm/mem,{x,y}mm */ case X86EMUL_OPC_F3(0x0f, 0x5b): /* cvttps2dq xmm/mem,xmm */ @@ -8455,6 +8468,15 @@ x86_emulate( op_bytes = 8 << vex.l; goto simd_0f_ymm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(false); + op_bytes = 8 << evex.lr; + elem_bytes = 2; + goto simd_zmm; + case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ generate_exception_if(!vex.l || vex.w, EXC_UD); @@ -9283,27 +9305,79 @@ x86_emulate( goto avx512f_imm8_no_sae; case X86EMUL_OPC_VEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,{x,y}mm,xmm/mem */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x1d): /* vcvtps2ph $imm8,[xyz]mm,{x,y}mm/mem{k} */ { uint32_t mxcsr; - generate_exception_if(vex.w || vex.reg != 0xf, EXC_UD); - host_and_vcpu_must_have(f16c); fail_if(!ops->write); + if ( evex_encoded() ) + { + generate_exception_if((evex.w || evex.reg != 0xf || !evex.RX || + (ea.type != OP_REG && (evex.z || evex.brs))), + EXC_UD); + host_and_vcpu_must_have(avx512f); + avx512_vlen_check(false); + opc = init_evex(stub); + } + else + { + generate_exception_if(vex.w || vex.reg != 0xf, EXC_UD); + host_and_vcpu_must_have(f16c); + opc = init_prefixes(stub); + } + + op_bytes = 8 << evex.lr; - opc = init_prefixes(stub); opc[0] = b; opc[1] = modrm; if ( ea.type == OP_MEM ) { /* Convert memory operand to (%rAX). 
*/ vex.b = 1; + evex.b = 1; opc[1] &= 0x38; } opc[2] = imm1; - insn_bytes = PFX_BYTES + 3; + if ( evex_encoded() ) + { + unsigned int full = 0; + + insn_bytes = EVEX_PFX_BYTES + 3; + copy_EVEX(opc, evex); + + if ( ea.type == OP_MEM && evex.opmsk ) + { + full = 0xffff >> (16 - op_bytes / 2); + op_mask &= full; + if ( !op_mask ) + goto complete_insn; + + first_byte = __builtin_ctz(op_mask); + op_mask >>= first_byte; + full >>= first_byte; + first_byte <<= 1; + op_bytes = (32 - __builtin_clz(op_mask)) << 1; + + /* + * We may need to read (parts of) the memory operand for the + * purpose of merging in order to avoid splitting the write + * below into multiple ones. + */ + if ( op_mask != full && + (rc = ops->read(ea.mem.seg, + truncate_ea(ea.mem.off + first_byte), + (void *)mmvalp + first_byte, op_bytes, + ctxt)) != X86EMUL_OKAY ) + goto done; + } + } + else + { + insn_bytes = PFX_BYTES + 3; + copy_VEX(opc, vex); + } opc[3] = 0xc3; - copy_VEX(opc, vex); /* Latch MXCSR - we may need to restore it below. */ invoke_stub("stmxcsr %[mxcsr]", "", "=m" (*mmvalp), [mxcsr] "=m" (mxcsr) : "a" (mmvalp)); @@ -9312,7 +9386,8 @@ x86_emulate( if ( ea.type == OP_MEM ) { - rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp, 8 << vex.l, ctxt); + rc = ops->write(ea.mem.seg, truncate_ea(ea.mem.off + first_byte), + (void *)mmvalp + first_byte, op_bytes, ctxt); if ( rc != X86EMUL_OKAY ) { asm volatile ( "ldmxcsr %0" :: "m" (mxcsr) ); From patchwork Fri Mar 15 10:47:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854485 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C76FC1515 for ; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD9DB2A8F6 for ; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A07CD2A934; Fri, 15 Mar 2019 10:49:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id F243E2A8F6 for ; Fri, 15 Mar 2019 10:49:34 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNY-0005hk-He; Fri, 15 Mar 2019 10:47:56 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kNX-0005hZ-G8 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:47:55 +0000 X-Inumbo-ID: c7b84530-470f-11e9-b3a0-63b6fccfb590 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id c7b84530-470f-11e9-b3a0-63b6fccfb590; Fri, 15 Mar 2019 10:47:51 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:47:50 -0600 Message-Id: <5C8B82D6020000780021F1CA@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: 
Fri, 15 Mar 2019 04:47:50 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 20/50] x86emul: support AVX512F legacy-equivalent packed int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP ... including the two AVX512DQ forms which shared encodings, just with EVEX.W set there. VCVTDQ2PD, sharing its main opcode with others, needs a "manual" override of disp8scale. The simd_size changes for the twobyte_table[] entries are benign to pre-existing code, but allow decode_disp8scale() to work as is here. The at this point wrong placement of the 0xe6 case block is once again in anticipation of further additions of case labels. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base over changes earlier in the series. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,8 +109,12 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), + INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), + INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), + INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), @@ -398,6 +402,8 @@ static const struct test avx512dq_all[] INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), + INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), + INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -92,6 +92,13 @@ static inline bool _to_bool(byte_vec_t b # define to_int(x) ((vec_t){ (int)(x)[0] }) #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) +#elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ + (VEC_SIZE == 64 || defined(__AVX512VL__)) +# if FLOAT_SIZE == 4 +# define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# elif FLOAT_SIZE == 8 +# define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if FLOAT_SIZE == 4 # define to_int(x) __builtin_ia32_cvtdq2ps(__builtin_ia32_cvtps2dq(x)) @@ -1142,15 +1149,21 @@ int simd_test(void) touch(src); if ( !eq(x * -alt, -src) ) return __LINE__; -# if defined(recip) && defined(to_int) +# ifdef to_int + + touch(src); + x = to_int(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# ifdef recip touch(src); x = recip(src); touch(src); touch(x); if ( !eq(to_int(recip(x)), src) ) return __LINE__; -# ifdef rsqrt +# ifdef rsqrt x = src * src; touch(x); y = rsqrt(x); @@ -1158,6 +1171,7 @@ 
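The harness exercises these conversions through a round trip: to_int()
converts the integer-valued test vector to packed integers and back, and
the result must compare equal to the source.  A scalar analogue of that
invariant (illustrative helper only, not harness code; the C cast
truncates while vcvtpd2dq rounds per MXCSR, but the two agree on
integer-valued operands):

    #include <assert.h>

    static void roundtrip_check(void)
    {
        for ( int i = -8; i <= 8; ++i )
        {
            double d = i;            /* exactly representable */
            int    n = (int)d;       /* FP -> int, cf. vcvtpd2dq */

            assert((double)n == d);  /* int -> FP, cf. vcvtdq2pd */
        }
    }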
int simd_test(void) if ( !eq(to_int(recip(y)), src) ) return __LINE__; touch(src); if ( !eq(to_int(y), to_int(recip(src))) ) return __LINE__; +# endif # endif # endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -244,6 +244,7 @@ asm ( ".macro override insn \n\t" OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); +OVR_VFP(cvtdq2); OVR_FP(add); OVR_INT(add); OVR_BW(adds); @@ -330,13 +331,19 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(cvtpd2dqx); +OVR(cvtpd2dqy); OVR(cvtpd2psx); OVR(cvtpd2psy); OVR(cvtph2ps); +OVR(cvtps2dq); OVR(cvtps2pd); OVR(cvtps2ph); OVR(cvtsd2ss); OVR(cvtss2sd); +OVR(cvttpd2dqx); +OVR(cvttpd2dqy); +OVR(cvttps2dq); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -311,7 +311,7 @@ static const struct twobyte_table { [0x54 ... 0x57] = { DstImplicit|SrcMem|ModRM, simd_packed_fp, d8s_vl }, [0x58 ... 0x59] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x5a] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp, d8s_vl }, - [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x5b] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x5c ... 0x5f] = { DstImplicit|SrcMem|ModRM, simd_any_fp, d8s_vl }, [0x60 ... 0x62] = { DstImplicit|SrcMem|ModRM, simd_other, d8s_vl }, [0x63 ... 0x67] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, @@ -375,7 +375,7 @@ static const struct twobyte_table { [0xe0] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0xe1 ... 0xe2] = { DstImplicit|SrcMem|ModRM, simd_128, 4 }, [0xe3 ... 0xe5] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, - [0xe6] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0xe6] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0xe7] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, [0xe8 ... 
0xef] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0xf0] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, @@ -3081,6 +3081,11 @@ x86_decode( if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; break; + + case 0xe6: /* vcvtdq2pd needs special casing */ + if ( disp8scale && evex.pfx == vex_f3 && !evex.w && !evex.brs ) + --disp8scale; + break; } break; @@ -6587,6 +6592,22 @@ x86_emulate( op_bytes = 16 << vex.l; goto simd_0f_cvt; + case X86EMUL_OPC_EVEX_66(0x0f, 0x5b): /* vcvtps2dq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x5b): /* vcvttps2dq [xyz]mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX(0x0f, 0x5b): /* vcvtdq2ps [xyz]mm/mem,[xyz]mm{k} */ + /* vcvtqq2ps [xyz]mm/mem,{x,y}mm{k} */ + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + else + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = 16 << evex.lr; + goto simd_zmm; + CASE_SIMD_PACKED_INT(0x0f, 0x60): /* punpcklbw {,x}mm/mem,{,x}mm */ case X86EMUL_OPC_VEX_66(0x0f, 0x60): /* vpunpcklbw {x,y}mm/mem,{x,y}mm,{x,y}mm */ CASE_SIMD_PACKED_INT(0x0f, 0x61): /* punpcklwd {,x}mm/mem,{,x}mm */ @@ -7251,6 +7272,27 @@ x86_emulate( op_bytes = 8; goto simd_0f_xmm; + case X86EMUL_OPC_EVEX_66(0x0f, 0xe6): /* vcvttpd2dq [xyz]mm/mem,{x,y}mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0xe6): /* vcvtpd2dq [xyz]mm/mem,{x,y}mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0xe6): /* vcvtdq2pd {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtqq2pd [xyz]mm/mem,[xyz]mm{k} */ + if ( evex.pfx != vex_f3 ) + host_and_vcpu_must_have(avx512f); + else if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + else + { + host_and_vcpu_must_have(avx512f); + generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); + } + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + d |= TwoOp; + op_bytes = 8 << (evex.w + evex.lr); + goto simd_zmm; + case X86EMUL_OPC_F2(0x0f, 0xf0): /* lddqu m128,xmm */ case X86EMUL_OPC_VEX_F2(0x0f, 0xf0): /* vlddqu mem,{x,y}mm */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 10:52:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854487 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC8BD13B5 for ; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D36EF2A93A for ; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C703F2A939; Fri, 15 Mar 2019 10:53:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 502A52A939 for ; Fri, 15 Mar 2019 10:53:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kRe-0006Y5-6d; Fri, 15 Mar 2019 10:52:10 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kRd-0006Y0-4O for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:52:09 +0000 X-Inumbo-ID: 602ec5c7-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 602ec5c7-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:52:07 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:52:06 -0600 Message-Id: <5C8B83D6020000780021F208@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:52:06 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 21/50] x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVT{,T}S{S,D}2SI use EVEX.W for their destination (register) rather than their (possibly memory) source operand size and hence need a "manual" override of disp8scale. While the SDM claims that EVEX.L'L needs to be zero for the 32-bit forms of VCVT{,U}SI2SD (exception type E10NF), observations on my test system do not confirm this (and I've got informal confirmation that this is a doc mistake). Nevertheless, to be on the safe side, force evex.lr to be zero in this case though when constructing the stub. Slightly adjust the scalar to_int() in the test harness, to increase the chances of the operand ending up in memory. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix VCVTSI2SS - cannot re-use VMOV{D,Q} code here, as the register form can't be converted to a memory one when embedded rounding is in effect. Force evex.lr to zero for 32-bit VCVTSI2SD. Permit embedded rounding for VCVT{,T}S{S,D}2SI. Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -117,8 +117,16 @@ static const struct test avx512f_all[] = INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtsd2si, f2, 0f, 2d, el, q, el), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), + INSN(cvtsi2sd, f2, 0f, 2a, el, dq64, el), + INSN(cvtsi2ss, f3, 0f, 2a, el, dq64, el), INSN(cvtss2sd, f3, 0f, 5a, el, d, el), + INSN(cvtss2si, f3, 0f, 2d, el, d, el), + INSN(cvttpd2dq, 66, 0f, e6, vl, q, vl), + INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), + INSN(cvttsd2si, f2, 0f, 2c, el, q, el), + INSN(cvttss2si, f3, 0f, 2c, el, d, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), @@ -746,8 +754,9 @@ static void test_group(const struct test break; case ESZ_dq: - test_pair(&tests[i], vl[j], ESZ_d, "d", ESZ_q, "q", - instr, ctxt); + test_pair(&tests[i], vl[j], ESZ_d, + strncmp(tests[i].mnemonic, "cvt", 3) ? 
"d" : "l", + ESZ_q, "q", instr, ctxt); break; #ifdef __i386__ --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -89,7 +89,7 @@ static inline bool _to_bool(byte_vec_t b #endif #if VEC_SIZE == FLOAT_SIZE -# define to_int(x) ((vec_t){ (int)(x)[0] }) +# define to_int(x) ({ int i_ = (x)[0]; touch(i_); ((vec_t){ i_ }); }) #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -340,10 +340,28 @@ OVR(cvtps2dq); OVR(cvtps2pd); OVR(cvtps2ph); OVR(cvtsd2ss); +OVR(cvtsd2si); +OVR(cvtsd2sil); +OVR(cvtsd2siq); +OVR(cvtsi2sd); +OVR(cvtsi2sdl); +OVR(cvtsi2sdq); +OVR(cvtsi2ss); +OVR(cvtsi2ssl); +OVR(cvtsi2ssq); OVR(cvtss2sd); +OVR(cvtss2si); +OVR(cvtss2sil); +OVR(cvtss2siq); OVR(cvttpd2dqx); OVR(cvttpd2dqy); OVR(cvttps2dq); +OVR(cvttsd2si); +OVR(cvttsd2sil); +OVR(cvttsd2siq); +OVR(cvttss2si); +OVR(cvttss2sil); +OVR(cvttss2siq); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -296,7 +296,7 @@ static const struct twobyte_table { [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM }, [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp, d8s_vl }, - [0x2a] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, + [0x2a] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp, d8s_vl }, [0x2c ... 0x2d] = { DstImplicit|SrcMem|ModRM|Mov, simd_other }, [0x2e ... 0x2f] = { ImplicitOps|ModRM|TwoOp, simd_none, d8s_dq }, @@ -3072,6 +3072,12 @@ x86_decode( modrm_mod = 3; break; + case 0x2c: /* vcvtts{s,d}2si need special casing */ + case 0x2d: /* vcvts{s,d}2si need special casing */ + if ( evex_encoded() ) + disp8scale = 2 + (evex.pfx & VEX_PREFIX_DOUBLE_MASK); + break; + case 0x5a: /* vcvtps2pd needs special casing */ if ( disp8scale && !evex.pfx && !evex.brs ) --disp8scale; @@ -6199,6 +6205,48 @@ x86_emulate( state->simd_size = simd_none; goto simd_0f_rm; + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2a): /* vcvtsi2s{s,d} r/m,xmm,xmm */ + generate_exception_if(evex.opmsk || (ea.type != OP_REG && evex.brs), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(true); + get_fpu(X86EMUL_FPU_zmm); + + if ( ea.type == OP_MEM ) + { + rc = read_ulong(ea.mem.seg, ea.mem.off, &src.val, + rex_prefix & REX_W ? 8 : 4, ctxt, ops); + if ( rc != X86EMUL_OKAY ) + goto done; + } + else + src.val = *ea.reg; + + opc = init_evex(stub); + opc[0] = b; + /* Convert memory/GPR source to %rAX. */ + evex.b = 1; + if ( !mode_64bit() ) + evex.w = 0; + /* + * SDM version 067 claims that exception type E10NF implies #UD when + * EVEX.L'L is non-zero for 32-bit VCVT{,U}SI2SD. Experimentally this + * cannot be confirmed, but be on the safe side for the stub. 
+ */ + if ( !evex.w && evex.pfx == vex_f2 ) + evex.lr = 0; + opc[1] = (modrm & 0x38) | 0xc0; + insn_bytes = EVEX_PFX_BYTES + 2; + opc[2] = 0xc3; + + copy_EVEX(opc, evex); + invoke_stub("", "", "=g" (dummy) : "a" (src.val)); + + put_stub(stub); + state->simd_size = simd_none; + break; + CASE_SIMD_SCALAR_FP(, 0x0f, 0x2c): /* cvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(_VEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(, 0x0f, 0x2d): /* cvts{s,d}2si xmm/mem,reg */ @@ -6222,14 +6270,17 @@ x86_emulate( } opc = init_prefixes(stub); + cvts_2si: opc[0] = b; /* Convert GPR destination to %rAX and memory operand to (%rCX). */ rex_prefix &= ~REX_R; vex.r = 1; + evex.r = 1; if ( ea.type == OP_MEM ) { rex_prefix &= ~REX_B; vex.b = 1; + evex.b = 1; opc[1] = 0x01; rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, @@ -6240,11 +6291,22 @@ x86_emulate( else opc[1] = modrm & 0xc7; if ( !mode_64bit() ) + { vex.w = 0; - insn_bytes = PFX_BYTES + 2; + evex.w = 0; + } + if ( evex_encoded() ) + { + insn_bytes = EVEX_PFX_BYTES + 2; + copy_EVEX(opc, evex); + } + else + { + insn_bytes = PFX_BYTES + 2; + copy_REX_VEX(opc, rex_prefix, vex); + } opc[2] = 0xc3; - copy_REX_VEX(opc, rex_prefix, vex); ea.reg = decode_gpr(&_regs, modrm_reg); invoke_stub("", "", "=a" (*ea.reg) : "c" (mmvalp), "m" (*mmvalp)); @@ -6252,6 +6314,18 @@ x86_emulate( state->simd_size = simd_none; break; + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */ + generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk || + (ea.type != OP_REG && evex.brs)), + EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(true); + get_fpu(X86EMUL_FPU_zmm); + opc = init_evex(stub); + goto cvts_2si; + CASE_SIMD_PACKED_FP(, 0x0f, 0x2e): /* ucomis{s,d} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2e): /* vucomis{s,d} xmm/mem,xmm */ CASE_SIMD_PACKED_FP(, 0x0f, 0x2f): /* comis{s,d} xmm/mem,xmm */ From patchwork Fri Mar 15 10:52:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854489 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5939513B5 for ; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3DFA12908D for ; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F81C291B4; Fri, 15 Mar 2019 10:54:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A34EE2908D for ; Fri, 15 Mar 2019 10:54:15 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kS2-0006aX-HJ; Fri, 15 Mar 2019 10:52:34 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1h4kS0-0006aM-Qr for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:52:32 +0000 X-Inumbo-ID: 6eb0479b-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6eb0479b-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:52:31 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:52:30 -0600 Message-Id: <5C8B83EE020000780021F20B@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:52:30 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 22/50] x86emul: support AVX512DQ packed quad-int/FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP VCVT{,T}PS2QQ, sharing their main opcodes with others, once again need "manual" overrides of disp8scale. While not directly related here, also add a scalar variant of to_wint() to the test harness. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Workaround for gcc 7 quirk. v5: Re-base over changes earlier in the series. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -410,8 +410,12 @@ static const struct test avx512dq_all[] INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), + INSN(cvtpd2qq, 66, 0f, 7b, vl, q, vl), + INSN(cvtps2qq, 66, 0f, 7b, vl_2, d, vl), INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), + INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), + INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -90,14 +90,35 @@ static inline bool _to_bool(byte_vec_t b #if VEC_SIZE == FLOAT_SIZE # define to_int(x) ({ int i_ = (x)[0]; touch(i_); ((vec_t){ i_ }); }) +# ifdef __x86_64__ +# define to_wint(x) ({ long l_ = (x)[0]; touch(l_); ((vec_t){ l_ }); }) +# endif #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) # if FLOAT_SIZE == 4 # define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# ifdef __AVX512DQ__ +# define to_wint(x) ({ \ + vsf_half_t t_ = low_half(x); \ + vdi_t lo_, hi_; \ + touch(t_); \ + lo_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + t_ = high_half(x); \ + touch(t_); \ + hi_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + touch(lo_); touch(hi_); \ + insert_half(insert_half(undef(), \ + BR(cvtqq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ + BR(cvtqq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ +}) +# endif # elif FLOAT_SIZE == 8 # define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# ifdef __AVX512DQ__ +# define to_wint(x) BR(cvtqq2pd, 
_mask, BR(cvtpd2qq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) +# endif # endif #elif VEC_SIZE == 16 && defined(__SSE2__) # if FLOAT_SIZE == 4 @@ -121,6 +142,21 @@ static inline bool _to_bool(byte_vec_t b }) #endif +#if VEC_SIZE == 16 && FLOAT_SIZE == 4 && defined(__SSE__) +# define low_half(x) (x) +# define high_half(x) B_(movhlps, , undef(), x) +/* + * GCC 7 (and perhaps earlier) report a bogus type mismatch for the conditional + * expression below. All works well with this no-op wrapper. + */ +static inline vec_t movlhps(vec_t x, vec_t y) { + return __builtin_ia32_movlhps(x, y); +} +# define insert_pair(x, y, p) \ + ((p) ? movlhps(x, y) \ + : ({ vec_t t_ = (x); t_[0] = (y)[0]; t_[1] = (y)[1]; t_; })) +#endif + #if VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW_A__) # define max __builtin_ia32_pfmax # define min __builtin_ia32_pfmin @@ -149,13 +185,16 @@ static inline bool _to_bool(byte_vec_t b # if ELEM_COUNT == 8 /* vextractf{32,64}x4 */ || \ (ELEM_COUNT == 16 && ELEM_SIZE == 4 && defined(__AVX512DQ__)) /* vextractf32x8 */ || \ (ELEM_COUNT == 4 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ -# define low_half(x) ({ \ +# define _half(x, lh) ({ \ half_t t_; \ - asm ( "vextractf%c[w]x%c[n] $0, %[s], %[d]" \ + asm ( "vextractf%c[w]x%c[n] %[sel], %[s], %[d]" \ : [d] "=m" (t_) \ - : [s] "v" (x), [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ + : [s] "v" (x), [sel] "i" (lh), \ + [w] "i" (ELEM_SIZE * 8), [n] "i" (ELEM_COUNT / 2) ); \ t_; \ }) +# define low_half(x) _half(x, 0) +# define high_half(x) _half(x, 1) # endif # if (ELEM_COUNT == 16 && ELEM_SIZE == 4) /* vextractf32x4 */ || \ (ELEM_COUNT == 8 && ELEM_SIZE == 8 && defined(__AVX512DQ__)) /* vextractf64x2 */ @@ -1176,6 +1215,13 @@ int simd_test(void) # endif +# ifdef to_wint + touch(src); + x = to_wint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif + # ifdef sqrt x = src * src; touch(x); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -325,6 +325,8 @@ static const struct twobyte_table { [0x77] = { DstImplicit|SrcNone }, [0x78] = { ImplicitOps|ModRM }, [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, + [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, + [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, [0x7c ... 
0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, [0x7e] = { DstMem|SrcImplicit|ModRM|Mov, simd_none, d8s_dq64 }, [0x7f] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, @@ -3083,6 +3085,12 @@ x86_decode( --disp8scale; break; + case 0x7a: /* vcvttps2qq needs special casing */ + case 0x7b: /* vcvtps2qq needs special casing */ + if ( disp8scale && evex.pfx == vex_66 && !evex.w && !evex.brs ) + --disp8scale; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -7355,7 +7363,13 @@ x86_emulate( if ( evex.pfx != vex_f3 ) host_and_vcpu_must_have(avx512f); else if ( evex.w ) + { + case X86EMUL_OPC_EVEX_66(0x0f, 0x7a): /* vcvttps2qq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvttpd2qq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x7b): /* vcvtps2qq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtpd2qq [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512dq); + } else { host_and_vcpu_must_have(avx512f); From patchwork Fri Mar 15 10:53:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854491 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCFE41390 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A4D6E2A864 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9815D2A874; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 23FC92A864 for ; Fri, 15 Mar 2019 10:55:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTB-0006lx-Bp; Fri, 15 Mar 2019 10:53:45 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTA-0006lm-Ew for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:53:44 +0000 X-Inumbo-ID: 9773a936-4710-11e9-81e2-471df63f5f69 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 9773a936-4710-11e9-81e2-471df63f5f69; Fri, 15 Mar 2019 10:53:40 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:53:39 -0600 Message-Id: <5C8B8431020000780021F20E@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:53:37 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 23/50] x86emul: support AVX512{F, DQ} uint-to-FP conversion insns X-BeenThere: xen-devel@lists.xenproject.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Some "manual" overrides of disp8scale are needed here again. In particular code ends up simpler when using d8s_dq64 in the twobyte_table[] entry. Test harness additions will be done once the reverse conversions are also available. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -127,6 +127,10 @@ static const struct test avx512f_all[] = INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), INSN(cvttsd2si, f2, 0f, 2c, el, q, el), INSN(cvttss2si, f3, 0f, 2c, el, d, el), + INSN(cvtudq2pd, f3, 0f, 7a, vl_2, d, vl), + INSN(cvtudq2ps, f2, 0f, 7a, vl, d, vl), + INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), + INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), @@ -416,6 +420,8 @@ static const struct test avx512dq_all[] INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), + INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), + INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -326,7 +326,7 @@ static const struct twobyte_table { [0x78] = { ImplicitOps|ModRM }, [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, - [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, + [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x7c ... 0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, [0x7e] = { DstMem|SrcImplicit|ModRM|Mov, simd_none, d8s_dq64 }, [0x7f] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_int, d8s_vl }, @@ -3085,12 +3085,16 @@ x86_decode( --disp8scale; break; - case 0x7a: /* vcvttps2qq needs special casing */ - case 0x7b: /* vcvtps2qq needs special casing */ - if ( disp8scale && evex.pfx == vex_66 && !evex.w && !evex.brs ) + case 0x7a: /* vcvttps2qq and vcvtudq2pd need special casing */ + if ( disp8scale && evex.pfx != vex_f2 && !evex.w && !evex.brs ) --disp8scale; break; + case 0x7b: /* vcvtp{s,d}2qq need special casing */ + if ( disp8scale && evex.pfx == vex_66 ) + disp8scale = (evex.brs ? 
2 : 3 + evex.lr) + evex.w; + break; + case 0x7e: /* vmovq xmm/m64,xmm needs special casing */ if ( disp8scale == 2 && evex.pfx == vex_f3 ) disp8scale = 3; @@ -6214,6 +6218,7 @@ x86_emulate( goto simd_0f_rm; CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2a): /* vcvtsi2s{s,d} r/m,xmm,xmm */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x7b): /* vcvtusi2s{s,d} r/m,xmm,xmm */ generate_exception_if(evex.opmsk || (ea.type != OP_REG && evex.brs), EXC_UD); host_and_vcpu_must_have(avx512f); @@ -6680,6 +6685,8 @@ x86_emulate( /* fall through */ case X86EMUL_OPC_EVEX(0x0f, 0x5b): /* vcvtdq2ps [xyz]mm/mem,[xyz]mm{k} */ /* vcvtqq2ps [xyz]mm/mem,{x,y}mm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f, 0x7a): /* vcvtudq2ps [xyz]mm/mem,[xyz]mm{k} */ + /* vcvtuqq2ps [xyz]mm/mem,{x,y}mm{k} */ if ( evex.w ) host_and_vcpu_must_have(avx512dq); else @@ -7358,6 +7365,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_F2(0x0f, 0xe6): /* vcvtpd2dq [xyz]mm/mem,{x,y}mm{k} */ generate_exception_if(!evex.w, EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_F3(0x0f, 0x7a): /* vcvtudq2pd {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtuqq2pd [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_F3(0x0f, 0xe6): /* vcvtdq2pd {x,y}mm/mem,[xyz]mm{k} */ /* vcvtqq2pd [xyz]mm/mem,[xyz]mm{k} */ if ( evex.pfx != vex_f3 ) From patchwork Fri Mar 15 10:54:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854493 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D4CB1390 for ; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 236FF2A93E for ; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 17BF42A940; Fri, 15 Mar 2019 10:55:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 537712A93E for ; Fri, 15 Mar 2019 10:55:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTV-0006rY-Ms; Fri, 15 Mar 2019 10:54:05 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kTU-0006rH-6Z for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:54:04 +0000 X-Inumbo-ID: a5335d75-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id a5335d75-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:54:03 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:02 -0600 Message-Id: <5C8B844B020000780021F211@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:03 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: 
<5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 24/50] x86emul: support AVX512{F, DQ} FP-to-uint conversion insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Along the lines of prior patches, VCVT{,T}PS2UQQ as well as VCVT{,T}S{S,D}2USI need "manual" overrides of disp8scale. The twobyte_table[] entries get altered, with their prior values now put in place in x86_decode_twobyte(). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v4: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -112,21 +112,29 @@ static const struct test avx512f_all[] = INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), + INSN(cvtpd2udq, , 0f, 79, vl, q, vl), INSN(cvtpd2ps, 66, 0f, 5a, vl, q, vl), INSN(cvtph2ps, 66, 0f38, 13, vl_2, d_nb, vl), INSN(cvtps2dq, 66, 0f, 5b, vl, d, vl), INSN(cvtps2pd, , 0f, 5a, vl_2, d, vl), INSN(cvtps2ph, 66, 0f3a, 1d, vl_2, d_nb, vl), + INSN(cvtps2udq, , 0f, 79, vl, d, vl), INSN(cvtsd2si, f2, 0f, 2d, el, q, el), + INSN(cvtsd2usi, f2, 0f, 79, el, q, el), INSN(cvtsd2ss, f2, 0f, 5a, el, q, el), INSN(cvtsi2sd, f2, 0f, 2a, el, dq64, el), INSN(cvtsi2ss, f3, 0f, 2a, el, dq64, el), INSN(cvtss2sd, f3, 0f, 5a, el, d, el), INSN(cvtss2si, f3, 0f, 2d, el, d, el), + INSN(cvtss2usi, f3, 0f, 79, el, d, el), INSN(cvttpd2dq, 66, 0f, e6, vl, q, vl), + INSN(cvttpd2udq, , 0f, 78, vl, q, vl), INSN(cvttps2dq, f3, 0f, 5b, vl, d, vl), + INSN(cvttps2udq, , 0f, 78, vl, d, vl), INSN(cvttsd2si, f2, 0f, 2c, el, q, el), + INSN(cvttsd2usi, f2, 0f, 78, el, q, el), INSN(cvttss2si, f3, 0f, 2c, el, d, el), + INSN(cvttss2usi, f3, 0f, 78, el, d, el), INSN(cvtudq2pd, f3, 0f, 7a, vl_2, d, vl), INSN(cvtudq2ps, f2, 0f, 7a, vl, d, vl), INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), @@ -415,11 +423,15 @@ static const struct test avx512dq_all[] INSN_PFP(andn, 0f, 55), INSN(broadcasti32x2, 66, 0f38, 59, el_2, d, vl), INSN(cvtpd2qq, 66, 0f, 7b, vl, q, vl), + INSN(cvtpd2uqq, 66, 0f, 79, vl, q, vl), INSN(cvtps2qq, 66, 0f, 7b, vl_2, d, vl), + INSN(cvtps2uqq, 66, 0f, 79, vl_2, d, vl), INSN(cvtqq2pd, f3, 0f, e6, vl, q, vl), INSN(cvtqq2ps, , 0f, 5b, vl, q, vl), INSN(cvttpd2qq, 66, 0f, 7a, vl, q, vl), + INSN(cvttpd2uqq, 66, 0f, 78, vl, q, vl), INSN(cvttps2qq, 66, 0f, 7a, vl_2, d, vl), + INSN(cvttps2uqq, 66, 0f, 78, vl_2, d, vl), INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), INSN_PFP(or, 0f, 56), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -93,31 +93,65 @@ static inline bool _to_bool(byte_vec_t b # ifdef __x86_64__ # define to_wint(x) ({ long l_ = (x)[0]; touch(l_); ((vec_t){ l_ }); }) # endif +# ifdef __AVX512F__ +/* + * Sadly even gcc 9.x, at the time of writing, does not carry out at least + * uint -> FP conversions using VCVTUSI2S{S,D}, so we need to use builtins + * or inline assembly here. The full-vector parameter types of the builtins + * aren't very helpful for our purposes, so use inline assembly. 
+ */ +# if FLOAT_SIZE == 4 +# define to_u_int(type, x) ({ \ + unsigned type u_; \ + float __attribute__((vector_size(16))) t_; \ + asm ( "vcvtss2usi %1, %0" : "=r" (u_) : "m" ((x)[0]) ); \ + asm ( "vcvtusi2ss%z1 %1, %0, %0" : "=v" (t_) : "m" (u_) ); \ + (vec_t){ t_[0] }; \ +}) +# elif FLOAT_SIZE == 8 +# define to_u_int(type, x) ({ \ + unsigned type u_; \ + double __attribute__((vector_size(16))) t_; \ + asm ( "vcvtsd2usi %1, %0" : "=r" (u_) : "m" ((x)[0]) ); \ + asm ( "vcvtusi2sd%z1 %1, %0, %0" : "=v" (t_) : "m" (u_) ); \ + (vec_t){ t_[0] }; \ +}) +# endif +# define to_uint(x) to_u_int(int, x) +# ifdef __x86_64__ +# define to_uwint(x) to_u_int(long, x) +# endif +# endif #elif VEC_SIZE == 8 && FLOAT_SIZE == 4 && defined(__3dNOW__) # define to_int(x) __builtin_ia32_pi2fd(__builtin_ia32_pf2id(x)) #elif defined(FLOAT_SIZE) && VEC_SIZE > FLOAT_SIZE && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) # if FLOAT_SIZE == 4 # define to_int(x) BR(cvtdq2ps, _mask, BR(cvtps2dq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) +# define to_uint(x) BR(cvtudq2ps, _mask, BR(cvtps2udq, _mask, x, (vsi_t)undef(), ~0), undef(), ~0) # ifdef __AVX512DQ__ -# define to_wint(x) ({ \ +# define to_w_int(x, s) ({ \ vsf_half_t t_ = low_half(x); \ vdi_t lo_, hi_; \ touch(t_); \ - lo_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + lo_ = BR(cvtps2 ## s ## qq, _mask, t_, (vdi_t)undef(), ~0); \ t_ = high_half(x); \ touch(t_); \ - hi_ = BR(cvtps2qq, _mask, t_, (vdi_t)undef(), ~0); \ + hi_ = BR(cvtps2 ## s ## qq, _mask, t_, (vdi_t)undef(), ~0); \ touch(lo_); touch(hi_); \ insert_half(insert_half(undef(), \ - BR(cvtqq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ - BR(cvtqq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ + BR(cvt ## s ## qq2ps, _mask, lo_, (vsf_half_t){}, ~0), 0), \ + BR(cvt ## s ## qq2ps, _mask, hi_, (vsf_half_t){}, ~0), 1); \ }) +# define to_wint(x) to_w_int(x, ) +# define to_uwint(x) to_w_int(x, u) # endif # elif FLOAT_SIZE == 8 # define to_int(x) B(cvtdq2pd, _mask, BR(cvtpd2dq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) +# define to_uint(x) B(cvtudq2pd, _mask, BR(cvtpd2udq, _mask, x, (vsi_half_t){}, ~0), undef(), ~0) # ifdef __AVX512DQ__ # define to_wint(x) BR(cvtqq2pd, _mask, BR(cvtpd2qq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) +# define to_uwint(x) BR(cvtuqq2pd, _mask, BR(cvtpd2uqq, _mask, x, (vdi_t)undef(), ~0), undef(), ~0) # endif # endif #elif VEC_SIZE == 16 && defined(__SSE2__) @@ -1221,6 +1255,20 @@ int simd_test(void) touch(src); if ( !eq(x, src) ) return __LINE__; # endif + +# ifdef to_uint + touch(src); + x = to_uint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif + +# ifdef to_uwint + touch(src); + x = to_uwint(src); + touch(src); + if ( !eq(x, src) ) return __LINE__; +# endif # ifdef sqrt x = src * src; --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -323,8 +323,7 @@ static const struct twobyte_table { [0x71 ... 0x73] = { DstImplicit|SrcImmByte|ModRM, simd_none, d8s_vl }, [0x74 ... 0x76] = { DstImplicit|SrcMem|ModRM, simd_packed_int, d8s_vl }, [0x77] = { DstImplicit|SrcNone }, - [0x78] = { ImplicitOps|ModRM }, - [0x79] = { DstReg|SrcMem|ModRM, simd_packed_int }, + [0x78 ... 0x79] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_vl }, [0x7a] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp, d8s_vl }, [0x7b] = { DstImplicit|SrcMem|ModRM|Mov, simd_other, d8s_dq64 }, [0x7c ... 
0x7d] = { DstImplicit|SrcMem|ModRM, simd_other }, @@ -2523,6 +2522,8 @@ x86_decode_twobyte( break; case 0x78: + state->desc = ImplicitOps; + state->simd_size = simd_none; switch ( vex.pfx ) { case vex_66: /* extrq $imm8, $imm8, xmm */ @@ -2535,7 +2536,7 @@ x86_decode_twobyte( case 0x10 ... 0x18: case 0x28 ... 0x2f: case 0x50 ... 0x77: - case 0x79 ... 0x7d: + case 0x7a ... 0x7d: case 0x7f: case 0xc2 ... 0xc3: case 0xc5 ... 0xc6: @@ -2557,6 +2558,12 @@ x86_decode_twobyte( op_bytes = mode_64bit() ? 8 : 4; break; + case 0x79: + state->desc = DstReg | SrcMem; + state->simd_size = simd_packed_int; + ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); + break; + case 0x7e: ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); if ( vex.pfx == vex_f3 ) /* movq xmm/m64,xmm */ @@ -3074,6 +3081,18 @@ x86_decode( modrm_mod = 3; break; + case 0x78: + case 0x79: + if ( !evex.pfx ) + break; + /* vcvt{,t}ps2uqq need special casing */ + if ( evex.pfx == vex_66 ) + { + if ( !evex.w && !evex.brs ) + --disp8scale; + break; + } + /* vcvt{,t}s{s,d}2usi need special casing: fall through */ case 0x2c: /* vcvtts{s,d}2si need special casing */ case 0x2d: /* vcvts{s,d}2si need special casing */ if ( evex_encoded() ) @@ -6329,6 +6348,8 @@ x86_emulate( CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2c): /* vcvtts{s,d}2si xmm/mem,reg */ CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x78): /* vcvtts{s,d}2usi xmm/mem,reg */ + CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x79): /* vcvts{s,d}2usi xmm/mem,reg */ generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk || (ea.type != OP_REG && evex.brs)), EXC_UD); @@ -6690,7 +6711,11 @@ x86_emulate( if ( evex.w ) host_and_vcpu_must_have(avx512dq); else + { + case X86EMUL_OPC_EVEX(0x0f, 0x78): /* vcvttp{s,d}2udq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX(0x0f, 0x79): /* vcvtp{s,d}2udq [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512f); + } if ( ea.type != OP_REG || !evex.brs ) avx512_vlen_check(false); d |= TwoOp; @@ -7373,6 +7398,10 @@ x86_emulate( host_and_vcpu_must_have(avx512f); else if ( evex.w ) { + case X86EMUL_OPC_EVEX_66(0x0f, 0x78): /* vcvttps2uqq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvttpd2uqq [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f, 0x79): /* vcvtps2uqq {x,y}mm/mem,[xyz]mm{k} */ + /* vcvtpd2uqq [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x7a): /* vcvttps2qq {x,y}mm/mem,[xyz]mm{k} */ /* vcvttpd2qq [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0x7b): /* vcvtps2qq {x,y}mm/mem,[xyz]mm{k} */ From patchwork Fri Mar 15 10:54:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854495 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99C3A13B5 for ; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EF392A943 for ; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 732972A945; Fri, 15 Mar 2019 10:56:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org 
(lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A5AD62A943 for ; Fri, 15 Mar 2019 10:56:27 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kU5-000707-24; Fri, 15 Mar 2019 10:54:41 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kU3-0006zm-My for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:54:39 +0000 X-Inumbo-ID: b90803a8-4710-11e9-beb8-138a919fd9c5 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id b90803a8-4710-11e9-beb8-138a919fd9c5; Fri, 15 Mar 2019 10:54:36 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:35 -0600 Message-Id: <5C8B846B020000780021F214@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:35 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 25/50] x86emul: support remaining AVX512F legacy-equivalent insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Plus their AVX512BW counterparts. Take the opportunity and also eliminate a pair of open coded instances of scalar_1op(). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base. v6: Re-base over changes earlier in the series. v5: New. 
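[Not part of the patch; background only.] The $0b1011 immediate used by the trunc() overrides in the hunks below follows the SDM-documented VRNDSCALE* imm8 layout: bits 1:0 hold the rounding control (11 = toward zero), bit 2 defers to MXCSR.RC when set, bit 3 suppresses the precision (#P) exception, and bits 7:4 give the fixed-point length M (round to a multiple of 2^-M). A minimal, purely illustrative C sketch of that decoding:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: the VRNDSCALE* imm8 fields. */
    struct rndscale_imm {
        unsigned int rc;    /* imm8[1:0]: 0 nearest, 1 down, 2 up, 3 toward zero */
        bool use_mxcsr;     /* imm8[2]: take the rounding mode from MXCSR.RC */
        bool suppress_pe;   /* imm8[3]: suppress the precision (#P) exception */
        unsigned int m;     /* imm8[7:4]: round to a multiple of 2^-M */
    };

    static inline struct rndscale_imm decode_rndscale_imm(uint8_t imm8)
    {
        return (struct rndscale_imm){
            .rc          = imm8 & 3,
            .use_mxcsr   = imm8 & 4,
            .suppress_pe = imm8 & 8,
            .m           = imm8 >> 4,
        };
    }

Hence 0b1011 requests plain truncation to an integer (M = 0, round toward zero) without raising #P, which is exactly what the harness's trunc() needs.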
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -193,6 +193,8 @@ static const struct test avx512f_all[] = INSN_PFP_NB(movu, 0f, 10), INSN_PFP_NB(movu, 0f, 11), INSN_FP(mul, 0f, 59), + INSN(pabsd, 66, 0f38, 1e, vl, d, vl), + INSN(pabsq, 66, 0f38, 1f, vl, q, vl), INSN(paddd, 66, 0f, fe, vl, d, vl), INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), @@ -276,6 +278,10 @@ static const struct test avx512f_all[] = INSN(punpckldq, 66, 0f, 62, vl, d, vl), INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), + INSN(rndscalepd, 66, 0f3a, 09, vl, q, vl), + INSN(rndscaleps, 66, 0f3a, 08, vl, d, vl), + INSN(rndscalesd, 66, 0f3a, 0b, el, q, el), + INSN(rndscaless, 66, 0f3a, 0a, el, d, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), @@ -336,6 +342,8 @@ static const struct test avx512bw_all[] INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), INSN(movdqu16, f2, 0f, 7f, vl, w, vl), + INSN(pabsb, 66, 0f38, 1c, vl, b, vl), + INSN(pabsw, 66, 0f38, 1d, vl, w, vl), INSN(packssdw, 66, 0f, 6b, vl, d_nb, vl), INSN(packsswb, 66, 0f, 63, vl, w, vl), INSN(packusdw, 66, 0f38, 2b, vl, d_nb, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -211,8 +211,10 @@ static inline vec_t movlhps(vec_t x, vec #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") +# define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") +# define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif #elif defined(FLOAT_SIZE) && defined(__AVX512F__) && \ (VEC_SIZE == 64 || defined(__AVX512VL__)) @@ -263,6 +265,7 @@ static inline vec_t movlhps(vec_t x, vec # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) # define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) +# define trunc(x) BR(rndscaleps_, _mask, x, 0b1011, undef(), ~0) # define widen1(x) ((vec_t)BR(cvtps2pd, _mask, x, (vdf_t)undef(), ~0)) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhps, _mask, x, y, undef(), ~0) @@ -316,6 +319,7 @@ static inline vec_t movlhps(vec_t x, vec # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) # define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) +# define trunc(x) BR(rndscalepd_, _mask, x, 0b1011, undef(), ~0) # if VEC_SIZE == 16 # define interleave_hi(x, y) B(unpckhpd, _mask, x, y, undef(), ~0) # define interleave_lo(x, y) B(unpcklpd, _mask, x, y, undef(), ~0) @@ -548,6 +552,7 @@ static inline vec_t movlhps(vec_t x, vec # endif # endif # if INT_SIZE == 4 +# define abs(x) B(pabsd, _mask, x, undef(), ~0) # define max(x, y) B(pmaxsd, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsd, _mask, x, y, undef(), ~0) # define mul_full(x, y) ((vec_t)B(pmuldq, _mask, x, y, (vdi_t)undef(), ~0)) @@ -558,6 +563,7 @@ static inline vec_t movlhps(vec_t x, vec # define mul_full(x, y) ((vec_t)B(pmuludq, _mask, (vsi_t)(x), (vsi_t)(y), (vdi_t)undef(), ~0)) # define widen1(x) ((vec_t)B(pmovzxdq, _mask, (vsi_half_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 8 +# define abs(x) ((vec_t)B(pabsq, _mask, (vdi_t)(x), (vdi_t)undef(), ~0)) # define max(x, y) ((vec_t)B(pmaxsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # 
define min(x, y) ((vec_t)B(pminsq, _mask, (vdi_t)(x), (vdi_t)(y), (vdi_t)undef(), ~0)) # elif UINT_SIZE == 8 @@ -625,6 +631,7 @@ static inline vec_t movlhps(vec_t x, vec # define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) # endif # if INT_SIZE == 1 +# define abs(x) ((vec_t)B(pabsb, _mask, (vqi_t)(x), (vqi_t)undef(), ~0)) # define max(x, y) ((vec_t)B(pmaxsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define min(x, y) ((vec_t)B(pminsb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define widen1(x) ((vec_t)B(pmovsxbw, _mask, (vqi_half_t)(x), (vhi_t)undef(), ~0)) @@ -637,6 +644,7 @@ static inline vec_t movlhps(vec_t x, vec # define widen2(x) ((vec_t)B(pmovzxbd, _mask, (vqi_quarter_t)(x), (vsi_t)undef(), ~0)) # define widen3(x) ((vec_t)B(pmovzxbq, _mask, (vqi_eighth_t)(x), (vdi_t)undef(), ~0)) # elif INT_SIZE == 2 +# define abs(x) B(pabsw, _mask, x, undef(), ~0) # define max(x, y) B(pmaxsw, _mask, x, y, undef(), ~0) # define min(x, y) B(pminsw, _mask, x, y, undef(), ~0) # define mul_hi(x, y) B(pmulhw, _mask, x, y, undef(), ~0) @@ -948,19 +956,11 @@ static inline vec_t movlhps(vec_t x, vec #if VEC_SIZE == FLOAT_SIZE # define max(x, y) ((vec_t){({ typeof(x[0]) x_ = (x)[0], y_ = (y)[0]; x_ > y_ ? x_ : y_; })}) # define min(x, y) ((vec_t){({ typeof(x[0]) x_ = (x)[0], y_ = (y)[0]; x_ < y_ ? x_ : y_; })}) -# ifdef __SSE4_1__ +# if defined(__SSE4_1__) && !defined(__AVX512F__) # if FLOAT_SIZE == 4 -# define trunc(x) ({ \ - float __attribute__((vector_size(16))) r_; \ - asm ( "roundss $0b1011,%1,%0" : "=x" (r_) : "m" (x) ); \ - (vec_t){ r_[0] }; \ -}) +# define trunc(x) scalar_1op(x, "roundss $0b1011, %[in], %[out]") # elif FLOAT_SIZE == 8 -# define trunc(x) ({ \ - double __attribute__((vector_size(16))) r_; \ - asm ( "roundsd $0b1011,%1,%0" : "=x" (r_) : "m" (x) ); \ - (vec_t){ r_[0] }; \ -}) +# define trunc(x) scalar_1op(x, "roundsd $0b1011, %[in], %[out]") # endif # endif #endif --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -184,6 +184,8 @@ DECL_OCTET(half); # define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask # define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask # define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask +# define __builtin_ia32_rndscalepd_512_mask __builtin_ia32_rndscalepd_mask +# define __builtin_ia32_rndscaleps_512_mask __builtin_ia32_rndscaleps_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask @@ -245,6 +247,7 @@ OVR_INT(broadcast); OVR_SFP(broadcast); OVR_SFP(comi); OVR_VFP(cvtdq2); +OVR_INT(abs); OVR_FP(add); OVR_INT(add); OVR_BW(adds); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -446,7 +446,7 @@ static const struct ext0f38_table { [0x19] = { .simd_size = simd_scalar_opc, .two_op = 1, .d8s = 3 }, [0x1a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x1b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, - [0x1c ... 0x1e] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x1c ... 
0x1f] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x20] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x21] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, [0x22] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_8 }, @@ -531,8 +531,8 @@ static const struct ext0f3a_table { [0x02] = { .simd_size = simd_packed_int }, [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, - [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1 }, - [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc }, + [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc, .d8s = d8s_dq }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp }, [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, @@ -6917,6 +6917,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf9): /* vpsubw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfc): /* vpaddb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfd): /* vpaddw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); generate_exception_if(evex.brs, EXC_UD); elem_bytes = 1 << (b & 1); @@ -8303,6 +8305,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xfa): /* vpsubd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfb): /* vpsubq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfe): /* vpaddd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1e): /* vpabsd [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1f): /* vpabsq [xyz]mm/mem,[xyz]mm{k} */ generate_exception_if(evex.w != (b & 1), EXC_UD); goto avx512f_no_sae; @@ -9331,6 +9335,17 @@ x86_emulate( host_and_vcpu_must_have(sse4_1); goto simd_0f3a_common; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0a): /* vrndscaless $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0b): /* vrndscalesd $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x08): /* vrndscaleps $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x09): /* vrndscalepd $imm8,[xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.w != (b & 1), EXC_UD); + avx512_vlen_check(b & 2); + goto simd_imm8_zmm; + case X86EMUL_OPC(0x0f3a, 0x0f): /* palignr $imm8,mm/m64,mm */ case X86EMUL_OPC_66(0x0f3a, 0x0f): /* palignr $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(ssse3); From patchwork Fri Mar 15 10:54:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854497 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 618951390 for ; Fri, 15 Mar 2019 10:56:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 480DC2A940 for ; Fri, 15 Mar 2019 10:56:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 39FA02A944; Fri, 15 Mar 2019 10:56:39 
+0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AF1182A940 for ; Fri, 15 Mar 2019 10:56:38 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kUQ-00075e-Gq; Fri, 15 Mar 2019 10:55:02 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kUO-00075K-UK for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:55:00 +0000 X-Inumbo-ID: c6ebc302-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id c6ebc302-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:54:59 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:54:59 -0600 Message-Id: <5C8B8482020000780021F217@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:54:58 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 26/50] x86emul: support remaining AVX512BW legacy-equivalent insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: Re-base. v5: New. 
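[Side note, not part of the patch.] The rotr() overrides added below rely on the usual per-lane PALIGNR behaviour: the two sources are concatenated, shifted right by the immediate number of bytes, and the low 16 bytes of each lane are kept. Feeding the same register as both sources therefore turns this into a right rotation. A hypothetical plain-C model of a single 128-bit lane, for illustration only:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative model of one 128-bit PALIGNR lane (not harness code). */
    static void palignr_lane(uint8_t dst[16], const uint8_t hi[16],
                             const uint8_t lo[16], unsigned int shift)
    {
        uint8_t tmp[32];
        unsigned int i;

        memcpy(tmp, lo, 16);       /* second source forms the low half */
        memcpy(tmp + 16, hi, 16);  /* first source forms the high half */

        for ( i = 0; i < 16; ++i )
            dst[i] = shift + i < 32 ? tmp[shift + i] : 0;
    }

With hi == lo == x and shift = n * sizeof(element), dst[i] == x[(shift + i) % 16], i.e. a rotation right by n elements - which is why rotr(x, n) below can be expressed as vpalignr with an element-scaled immediate.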
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -354,6 +354,7 @@ static const struct test avx512bw_all[] INSN(paddusb, 66, 0f, dc, vl, b, vl), INSN(paddusw, 66, 0f, dd, vl, w, vl), INSN(paddw, 66, 0f, fd, vl, w, vl), + INSN(palignr, 66, 0f3a, 0f, vl, b, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), INSN(pbroadcastb, 66, 0f38, 78, el, b, el), @@ -369,6 +370,7 @@ static const struct test avx512bw_all[] INSN(permw, 66, 0f38, 8d, vl, w, vl), INSN(permi2w, 66, 0f38, 75, vl, w, vl), INSN(permt2w, 66, 0f38, 7d, vl, w, vl), + INSN(pmaddubsw, 66, 0f38, 04, vl, b, vl), INSN(pmaddwd, 66, 0f, f5, vl, w, vl), INSN(pmaxsb, 66, 0f38, 3c, vl, b, vl), INSN(pmaxsw, 66, 0f, ee, vl, w, vl), @@ -386,6 +388,7 @@ static const struct test avx512bw_all[] // pmovw2m, f3, 0f38, 29, w INSN(pmovwb, f3, 0f38, 30, vl_2, b, vl), INSN(pmovzxbw, 66, 0f38, 30, vl_2, b, vl), + INSN(pmulhrsw, 66, 0f38, 0b, vl, w, vl), INSN(pmulhuw, 66, 0f, e4, vl, w, vl), INSN(pmulhw, 66, 0f, e5, vl, w, vl), INSN(pmullw, 66, 0f, d5, vl, w, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -587,6 +587,7 @@ static inline vec_t movlhps(vec_t x, vec # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklbw, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0)) +# define rotr(x, n) ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), (n) * 8, (vdi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufb, _mask, (vqi_t)(x), (vqi_t)(inv - 1), (vqi_t)undef(), ~0)) # elif defined(__AVX512VBMI__) # define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) @@ -615,6 +616,7 @@ static inline vec_t movlhps(vec_t x, vec # if VEC_SIZE == 16 # define interleave_hi(x, y) ((vec_t)B(punpckhwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) # define interleave_lo(x, y) ((vec_t)B(punpcklwd, _mask, (vhi_t)(x), (vhi_t)(y), (vhi_t)undef(), ~0)) +# define rotr(x, n) ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), (n) * 16, (vdi_t)undef(), ~0)) # define swap(x) ((vec_t)B(pshufd, _mask, \ (vsi_t)B(pshufhw, _mask, \ B(pshuflw, _mask, (vhi_t)(x), 0b00011011, (vhi_t)undef(), ~0), \ --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -402,9 +402,12 @@ OVR(packssdw); OVR(packsswb); OVR(packusdw); OVR(packuswb); +OVR(palignr); +OVR(pmaddubsw); OVR(pmaddwd); OVR(pmovsxbw); OVR(pmovzxbw); +OVR(pmulhrsw); OVR(pmulhuw); OVR(pmulhw); OVR(pmullw); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -435,7 +435,10 @@ static const struct ext0f38_table { disp8scale_t d8s:4; } ext0f38_table[256] = { [0x00] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x01 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x01 ... 0x03] = { .simd_size = simd_packed_int }, + [0x04] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x05 ... 0x0b] = { .simd_size = simd_packed_int }, + [0x0b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x0c ... 0x0d] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x0e ... 0x0f] = { .simd_size = simd_packed_fp }, [0x10 ... 0x12] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, @@ -534,7 +537,8 @@ static const struct ext0f3a_table { [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x0a ... 0x0b] = { .simd_size = simd_scalar_opc, .d8s = d8s_dq }, [0x0c ... 
0x0d] = { .simd_size = simd_packed_fp }, - [0x0e ... 0x0f] = { .simd_size = simd_packed_int }, + [0x0e] = { .simd_size = simd_packed_int }, + [0x0f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x14] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 0 }, [0x15] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = 1 }, [0x16] = { .simd_size = simd_none, .to_mem = 1, .two_op = 1, .d8s = d8s_dq64 }, @@ -6899,6 +6903,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf1): /* vpsllw xmm/m128,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xf5): /* vpmaddwd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x00): /* vpshufb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x04): /* vpmaddubsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f, 0xd5): /* vpmullw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -6917,6 +6922,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f, 0xf9): /* vpsubw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfc): /* vpaddb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f, 0xfd): /* vpaddw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); @@ -9374,6 +9380,10 @@ x86_emulate( insn_bytes = PFX_BYTES + 4; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0f): /* vpalignr $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + goto avx512bw_imm; + case X86EMUL_OPC_66(0x0f3a, 0x14): /* pextrb $imm8,xmm,r/m */ case X86EMUL_OPC_66(0x0f3a, 0x15): /* pextrw $imm8,xmm,r/m */ case X86EMUL_OPC_66(0x0f3a, 0x16): /* pextr{d,q} $imm8,xmm,r/m */ From patchwork Fri Mar 15 10:55:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854499 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE9301390 for ; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D371C2A943 for ; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C7D302A945; Fri, 15 Mar 2019 10:57:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DB7F12A943 for ; Fri, 15 Mar 2019 10:57:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kV5-0007Dx-SG; Fri, 15 Mar 2019 10:55:43 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kV4-0007Di-FK for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:55:42 +0000 X-Inumbo-ID: 
ddc1cc88-4710-11e9-bc2d-7bc4b87856ac Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id ddc1cc88-4710-11e9-bc2d-7bc4b87856ac; Fri, 15 Mar 2019 10:55:38 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:55:37 -0600 Message-Id: <5C8B84A8020000780021F23F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:55:36 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 27/50] x86emul: support AVX512{F, ER} reciprocal insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also include the only other AVX512ER insn pair, VEXP2P{D,S}. Note that despite the replacement of the SHA insns' table slots there's no need to special case their decoding: Their insn-specific code already sets op_bytes (as was required due to simd_other), and TwoOp is of no relevance for legacy encoded SIMD insns. The raising of #UD when EVEX.L'L is 3 for AVX512ER scalar insns is done to be on the safe side. The SDM does not clarify behavior there, and it's even more ambiguous here (without AVX512VL in the picture). Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for AVX512ER insns. ea.type == OP_* -> ea.type != OP_*. Re-base. v6: Re-base. AVX512ER tests now also successfully run. v5: New. 
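[Background only; the helper below is hypothetical and not part of the patch.] The 14/28 in VRCP14*/VRCP28* and VRSQRT14*/VRSQRT28* denote the guaranteed maximum relative error of 2^-14 resp. 2^-28 of the returned approximation, and a single Newton-Raphson step roughly squares that error, which is how such seeds are normally sharpened:

    /*
     * Hypothetical helper, for background only: one Newton-Raphson step
     * refining an approximate reciprocal y ~= 1/a. The relative error
     * roughly squares per step, so a 2^-14 (VRCP14*) or 2^-28 (VRCP28*)
     * seed converges very quickly.
     */
    static inline double refine_recip(double a, double y)
    {
        return y * (2.0 - a * y);
    }

E.g. starting from y = 0.33 for a = 3, one step yields about 0.3333 and a second about 0.33333333.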
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er FMA := fma4 fma SG := avx2-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -72,6 +72,9 @@ avx512bw-flts := avx512dq-vecs := $(avx512f-vecs) avx512dq-ints := $(avx512f-ints) avx512dq-flts := $(avx512f-flts) +avx512er-vecs := 64 +avx512er-ints := +avx512er-flts := 4 8 avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -278,10 +278,14 @@ static const struct test avx512f_all[] = INSN(punpckldq, 66, 0f, 62, vl, d, vl), INSN(punpcklqdq, 66, 0f, 6c, vl, q, vl), INSN(pxor, 66, 0f, ef, vl, dq, vl), + INSN(rcp14, 66, 0f38, 4c, vl, sd, vl), + INSN(rcp14, 66, 0f38, 4d, el, sd, el), INSN(rndscalepd, 66, 0f3a, 09, vl, q, vl), INSN(rndscaleps, 66, 0f3a, 08, vl, d, vl), INSN(rndscalesd, 66, 0f3a, 0b, el, q, el), INSN(rndscaless, 66, 0f3a, 0a, el, d, el), + INSN(rsqrt14, 66, 0f38, 4e, vl, sd, vl), + INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), @@ -477,6 +481,14 @@ static const struct test avx512dq_512[] INSN(inserti32x8, 66, 0f3a, 3a, el_8, d, vl), }; +static const struct test avx512er_512[] = { + INSN(exp2, 66, 0f38, c8, vl, sd, vl), + INSN(rcp28, 66, 0f38, ca, vl, sd, vl), + INSN(rcp28, 66, 0f38, cb, el, sd, el), + INSN(rsqrt28, 66, 0f38, cc, vl, sd, vl), + INSN(rsqrt28, 66, 0f38, cd, el, sd, el), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -837,5 +849,6 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 128); RUN(avx512dq, no128); RUN(avx512dq, 512); + RUN(avx512er, 512); RUN(avx512_vbmi, all); } --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -210,9 +210,23 @@ static inline vec_t movlhps(vec_t x, vec }) #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 +# ifdef __AVX512ER__ +# define recip(x) scalar_1op(x, "vrcp28ss %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt28ss %[in], %[out], %[out]") +# else +# define recip(x) scalar_1op(x, "vrcp14ss %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt14ss %[in], %[out], %[out]") +# endif # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 +# ifdef __AVX512ER__ +# define recip(x) scalar_1op(x, "vrcp28sd %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt28sd %[in], %[out], %[out]") +# else +# define recip(x) scalar_1op(x, "vrcp14sd %[in], %[out], %[out]") +# define rsqrt(x) scalar_1op(x, "vrsqrt14sd %[in], %[out], %[out]") +# endif # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif @@ -263,6 +277,13 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# if VEC_SIZE == 64 && defined(__AVX512ER__) +# define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) +# define rsqrt(x) BR(rsqrt28ps, _mask, x, undef(), ~0) +# 
else +# define recip(x) B(rcp14ps, _mask, x, undef(), ~0) +# define rsqrt(x) B(rsqrt14ps, _mask, x, undef(), ~0) +# endif # define shrink1(x) BR_(cvtpd2ps, _mask, (vdf_t)(x), (vsf_half_t){}, ~0) # define sqrt(x) BR(sqrtps, _mask, x, undef(), ~0) # define trunc(x) BR(rndscaleps_, _mask, x, 0b1011, undef(), ~0) @@ -318,6 +339,13 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# if VEC_SIZE == 64 && defined(__AVX512ER__) +# define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) +# define rsqrt(x) BR(rsqrt28pd, _mask, x, undef(), ~0) +# else +# define recip(x) B(rcp14pd, _mask, x, undef(), ~0) +# define rsqrt(x) B(rsqrt14pd, _mask, x, undef(), ~0) +# endif # define sqrt(x) BR(sqrtpd, _mask, x, undef(), ~0) # define trunc(x) BR(rndscalepd_, _mask, x, 0b1011, undef(), ~0) # if VEC_SIZE == 16 --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -178,14 +178,20 @@ DECL_OCTET(half); /* Sadly there are a few exceptions to the general naming rules. */ # define __builtin_ia32_broadcastf32x4_512_mask __builtin_ia32_broadcastf32x4_512 # define __builtin_ia32_broadcasti32x4_512_mask __builtin_ia32_broadcasti32x4_512 +# define __builtin_ia32_exp2pd512_mask __builtin_ia32_exp2pd_mask +# define __builtin_ia32_exp2ps512_mask __builtin_ia32_exp2ps_mask # define __builtin_ia32_insertf32x4_512_mask __builtin_ia32_insertf32x4_mask # define __builtin_ia32_insertf32x8_512_mask __builtin_ia32_insertf32x8_mask # define __builtin_ia32_insertf64x4_512_mask __builtin_ia32_insertf64x4_mask # define __builtin_ia32_inserti32x4_512_mask __builtin_ia32_inserti32x4_mask # define __builtin_ia32_inserti32x8_512_mask __builtin_ia32_inserti32x8_mask # define __builtin_ia32_inserti64x4_512_mask __builtin_ia32_inserti64x4_mask +# define __builtin_ia32_rcp28pd512_mask __builtin_ia32_rcp28pd_mask +# define __builtin_ia32_rcp28ps512_mask __builtin_ia32_rcp28ps_mask # define __builtin_ia32_rndscalepd_512_mask __builtin_ia32_rndscalepd_mask # define __builtin_ia32_rndscaleps_512_mask __builtin_ia32_rndscaleps_mask +# define __builtin_ia32_rsqrt28pd512_mask __builtin_ia32_rsqrt28pd_mask +# define __builtin_ia32_rsqrt28ps512_mask __builtin_ia32_rsqrt28ps_mask # define __builtin_ia32_shuf_f32x4_512_mask __builtin_ia32_shuf_f32x4_mask # define __builtin_ia32_shuf_f64x2_512_mask __builtin_ia32_shuf_f64x2_mask # define __builtin_ia32_shuf_i32x4_512_mask __builtin_ia32_shuf_i32x4_mask --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -24,6 +24,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f.h" #include "avx512bw.h" #include "avx512dq.h" +#include "avx512er.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -106,6 +107,11 @@ static bool simd_check_avx512dq_vl(void) return cpu_has_avx512dq && cpu_has_avx512vl; } +static bool simd_check_avx512er(void) +{ + return cpu_has_avx512er; +} + static bool simd_check_avx512bw(void) { return cpu_has_avx512bw; @@ -327,6 +333,10 @@ static const struct { AVX512VL(DQ+VL u64x2, avx512dq, 16u8), AVX512VL(DQ+VL s64x4, avx512dq, 32i8), AVX512VL(DQ+VL u64x4, avx512dq, 32u8), + SIMD(AVX512ER f32 scalar,avx512er, f4), + SIMD(AVX512ER f32x16, avx512er, 64f4), + SIMD(AVX512ER f64 scalar,avx512er, f8), + SIMD(AVX512ER f64x8, avx512er, 64f8), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -134,6 +134,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_bmi2 cp.feat.bmi2 #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) +#define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -471,6 +471,10 @@ static const struct ext0f38_table { [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, @@ -510,7 +514,12 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, - [0xc8 ... 0xcd] = { .simd_size = simd_other }, + [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xc9] = { .simd_size = simd_other }, + [0xca] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xcb] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xdc ... 
0xdf] = { .simd_size = simd_packed_int }, [0xf0] = { .two_op = 1 }, @@ -1873,6 +1882,7 @@ static bool vcpu_has( #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) +#define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) @@ -6168,6 +6178,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x45): /* vpsrlv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x46): /* vpsrav{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x47): /* vpsllv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4c): /* vrcp14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4e): /* vrsqrt14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ avx512f_no_sae: host_and_vcpu_must_have(avx512f); generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); @@ -8865,6 +8877,13 @@ x86_emulate( generate_exception_if(vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_VEX_66(0x0f38, 0x5a): /* vbroadcasti128 m128,ymm */ generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; @@ -9112,6 +9131,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ host_and_vcpu_must_have(avx512f); + simd_zmm_scalar_sae: generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); if ( !evex.brs ) avx512_vlen_check(true); @@ -9127,6 +9147,19 @@ x86_emulate( op_bytes = 16; goto simd_0f38_common; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc8): /* vexp2p{s,d} zmm/mem,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xca): /* vrcp28p{s,d} zmm/mem,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcc): /* vrsqrt28p{s,d} zmm/mem,zmm{k} */ + host_and_vcpu_must_have(avx512er); + generate_exception_if((ea.type != OP_REG || !evex.brs) && evex.lr != 2, + EXC_UD); + goto simd_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcb): /* vrcp28s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcd): /* vrsqrt28s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512er); + goto simd_zmm_scalar_sae; + case X86EMUL_OPC(0x0f38, 0xf0): /* movbe m,r */ case X86EMUL_OPC(0x0f38, 0xf1): /* movbe r,m */ vcpu_must_have(movbe); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -102,6 +102,7 @@ #define cpu_has_avx512dq boot_cpu_has(X86_FEATURE_AVX512DQ) #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) +#define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) #define cpu_has_avx512bw boot_cpu_has(X86_FEATURE_AVX512BW) #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) From patchwork Fri Mar 15 10:56:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854501 Return-Path: Received: from mail.wl.linuxfoundation.org 
(pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D90013B5 for ; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 42A862A946 for ; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 368642A949; Fri, 15 Mar 2019 10:57:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 54E292A946 for ; Fri, 15 Mar 2019 10:57:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVV-0007Iw-7U; Fri, 15 Mar 2019 10:56:09 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVU-0007If-4i for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:56:08 +0000 X-Inumbo-ID: ee9f13c0-4710-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id ee9f13c0-4710-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:56:06 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:05 -0600 Message-Id: <5C8B84C5020000780021F242@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:05 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 28/50] x86emul: support AVX512F floating point manipulation insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for scalar insns. ea.type == OP_* -> ea.type != OP_*. Re-base. v5: New. 
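[Not part of the patch; a hedged sketch only.] The simd.c additions below exercise the round-trip scale(getmant(src), getexp(src)) == src. A plain-C model of that identity, restricted to positive, normal, finite inputs and the default imm8=0 normalization interval [1,2) - the real insns additionally define zero/negative/NaN handling and other sign/interval controls, which this sketch ignores:

    #include <math.h>

    /* Illustrative models only - positive, normal, finite x assumed. */
    static double getexp_model(double x)           /* VGETEXP: floor(log2(x)) */
    {
        int e;

        frexp(x, &e);                   /* x = m * 2^e, m in [0.5, 1) */
        return e - 1;
    }

    static double getmant_model(double x)          /* VGETMANT, imm8 = 0 */
    {
        int e;

        return 2 * frexp(x, &e);        /* mantissa normalized into [1, 2) */
    }

    static double scalef_model(double x, double y) /* VSCALEF: x * 2^floor(y) */
    {
        return ldexp(x, (int)floor(y));
    }

For such inputs scalef_model(getmant_model(x), getexp_model(x)) == x, which is the identity the new test relies on when comparing scale(x, y) against src.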
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -140,6 +140,8 @@ static const struct test avx512f_all[] = INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), + INSN(fixupimm, 66, 0f3a, 54, vl, sd, vl), + INSN(fixupimm, 66, 0f3a, 55, el, sd, el), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), INSN(fmadd132, 66, 0f38, 99, el, sd, el), INSN(fmadd213, 66, 0f38, a8, vl, sd, vl), @@ -170,6 +172,10 @@ static const struct test avx512f_all[] = INSN(fnmsub213, 66, 0f38, af, el, sd, el), INSN(fnmsub231, 66, 0f38, be, vl, sd, vl), INSN(fnmsub231, 66, 0f38, bf, el, sd, el), + INSN(getexp, 66, 0f38, 42, vl, sd, vl), + INSN(getexp, 66, 0f38, 43, el, sd, el), + INSN(getmant, 66, 0f3a, 26, vl, sd, vl), + INSN(getmant, 66, 0f3a, 27, el, sd, el), INSN_FP(max, 0f, 5f), INSN_FP(min, 0f, 5d), INSN_SFP(mov, 0f, 10), @@ -286,6 +292,8 @@ static const struct test avx512f_all[] = INSN(rndscaless, 66, 0f3a, 0a, el, d, el), INSN(rsqrt14, 66, 0f38, 4e, vl, sd, vl), INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), + INSN(scalef, 66, 0f38, 2c, vl, sd, vl), + INSN(scalef, 66, 0f38, 2d, el, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -174,6 +174,11 @@ static inline bool _to_bool(byte_vec_t b asm ( op : [out] "=&x" (r_) : [in] "m" (x) ); \ (vec_t){ r_[0] }; \ }) +# define scalar_2op(x, y, op) ({ \ + typeof((x)[0]) __attribute__((vector_size(16))) r_ = { x[0] }; \ + asm ( op : [out] "=&x" (r_) : [in1] "[out]" (r_), [in2] "m" (y) ); \ + (vec_t){ r_[0] }; \ +}) #endif #if VEC_SIZE == 16 && FLOAT_SIZE == 4 && defined(__SSE__) @@ -210,6 +215,8 @@ static inline vec_t movlhps(vec_t x, vec }) #elif defined(FLOAT_SIZE) && VEC_SIZE == FLOAT_SIZE && defined(__AVX512F__) # if FLOAT_SIZE == 4 +# define getexp(x) scalar_1op(x, "vgetexpss %[in], %[out], %[out]") +# define getmant(x) scalar_1op(x, "vgetmantss $0, %[in], %[out], %[out]") # ifdef __AVX512ER__ # define recip(x) scalar_1op(x, "vrcp28ss %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt28ss %[in], %[out], %[out]") @@ -217,9 +224,12 @@ static inline vec_t movlhps(vec_t x, vec # define recip(x) scalar_1op(x, "vrcp14ss %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt14ss %[in], %[out], %[out]") # endif +# define scale(x, y) scalar_2op(x, y, "vscalefss %[in2], %[in1], %[out]") # define sqrt(x) scalar_1op(x, "vsqrtss %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscaless $0b1011, %[in], %[out], %[out]") # elif FLOAT_SIZE == 8 +# define getexp(x) scalar_1op(x, "vgetexpsd %[in], %[out], %[out]") +# define getmant(x) scalar_1op(x, "vgetmantsd $0, %[in], %[out], %[out]") # ifdef __AVX512ER__ # define recip(x) scalar_1op(x, "vrcp28sd %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt28sd %[in], %[out], %[out]") @@ -227,6 +237,7 @@ static inline vec_t movlhps(vec_t x, vec # define recip(x) scalar_1op(x, "vrcp14sd %[in], %[out], %[out]") # define rsqrt(x) scalar_1op(x, "vrsqrt14sd %[in], %[out], %[out]") # endif +# define scale(x, y) scalar_2op(x, y, "vscalefsd %[in2], %[in1], %[out]") # define sqrt(x) scalar_1op(x, "vsqrtsd %[in], %[out], %[out]") # define trunc(x) scalar_1op(x, "vrndscalesd $0b1011, %[in], %[out], %[out]") # endif @@ -274,9 +285,12 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) # define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) # 
endif +# define getexp(x) BR(getexpps, _mask, x, undef(), ~0) +# define getmant(x) BR(getmantps, _mask, x, 0, undef(), ~0) # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) # define rsqrt(x) BR(rsqrt28ps, _mask, x, undef(), ~0) @@ -336,9 +350,12 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) # define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) # endif +# define getexp(x) BR(getexppd, _mask, x, undef(), ~0) +# define getmant(x) BR(getmantpd, _mask, x, 0, undef(), ~0) # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) # define rsqrt(x) BR(rsqrt28pd, _mask, x, undef(), ~0) @@ -1766,6 +1783,28 @@ int simd_test(void) # endif #endif +#if defined(getexp) && defined(getmant) + touch(src); + x = getmant(src); + touch(src); + y = getexp(src); + touch(src); + for ( j = i = 0; i < ELEM_COUNT; ++i ) + { + if ( y[i] != j ) return __LINE__; + + if ( !((i + 1) & (i + 2)) ) + ++j; + + if ( !(i & (i + 1)) && x[i] != 1 ) return __LINE__; + } +# ifdef scale + touch(y); + z = scale(x, y); + if ( !eq(src, z) ) return __LINE__; +# endif +#endif + #if (defined(__XOP__) && VEC_SIZE == 16 && (INT_SIZE == 2 || INT_SIZE == 4)) || \ (defined(__AVX512F__) && defined(FLOAT_SIZE)) return -fma_test(); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3924,6 +3924,44 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing vfixupimmpd $0,8(%edx){1to8},%zmm3,%zmm4..."); + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(vfixupimmpd); + static const struct { + double d[4]; + } + src = { { -1, 0, 1, 2 } }, + dst = { { 3, 4, 5, 6 } }, + out = { { .5, -1, 90, 2 } }; + + asm volatile ( "vbroadcastf64x4 %1, %%zmm3\n\t" + "vbroadcastf64x4 %2, %%zmm4\n" + put_insn(vfixupimmpd, + "vfixupimmpd $0, 8(%0)%{1to8%}, %%zmm3, %%zmm4") + :: "d" (NULL), "m" (src), "m" (dst) ); + + set_insn(vfixupimmpd); + /* + * Nibble (token) mapping (unused ones simply set to zero): + * 2 (ZERO) -> -1 (0x9) + * 3 (POS_ONE) -> 90 (0xc) + * 6 (NEG) -> 1/2 (0xb) + * 7 (POS) -> src (0x1) + */ + res[2] = 0x1b00c900; + regs.edx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + asm volatile ( "vmovupd %%zmm4, %0" : "=m" (res[0]) ); + if ( rc != X86EMUL_OKAY || !check_eip(vfixupimmpd) || + memcmp(res + 0, &out, sizeof(out)) || + memcmp(res + 8, &out, sizeof(out)) ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -459,7 +459,8 @@ static const struct ext0f38_table { [0x26 ... 0x29] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x2a] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x2b] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x2c ... 
0x2d] = { .simd_size = simd_packed_fp }, + [0x2c] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x2d] = { .simd_size = simd_packed_fp, .d8s = d8s_dq }, [0x2e ... 0x2f] = { .simd_size = simd_packed_fp, .to_mem = 1 }, [0x30] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x31] = { .simd_size = simd_other, .two_op = 1, .d8s = d8s_vl_by_4 }, @@ -470,6 +471,8 @@ static const struct ext0f38_table { [0x36 ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x42] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x43] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -563,6 +566,8 @@ static const struct ext0f3a_table { [0x22] = { .simd_size = simd_none, .d8s = d8s_dq64 }, [0x23] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x25] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x26] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x27] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x30 ... 0x33] = { .simd_size = simd_other, .two_op = 1 }, [0x38] = { .simd_size = simd_128, .d8s = 4 }, [0x3a] = { .simd_size = simd_256, .d8s = d8s_vl_by_2 }, @@ -577,6 +582,8 @@ static const struct ext0f3a_table { [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4c] = { .simd_size = simd_packed_int, .four_op = 1 }, + [0x54] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x55] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x5c ... 0x5f] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x60 ... 0x63] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x68 ... 
0x69] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -2684,6 +2691,10 @@ x86_decode_0f38( ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); break; + case X86EMUL_OPC_EVEX_66(0, 0x2d): /* vscalefs{s,d} */ + state->simd_size = simd_scalar_vexw; + break; + case X86EMUL_OPC_EVEX_66(0, 0x7a): /* vpbroadcastb */ case X86EMUL_OPC_EVEX_66(0, 0x7b): /* vpbroadcastw */ case X86EMUL_OPC_EVEX_66(0, 0x7c): /* vpbroadcast{d,q} */ @@ -9095,6 +9106,8 @@ x86_emulate( host_and_vcpu_must_have(fma); goto simd_0f_ymm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -9118,6 +9131,8 @@ x86_emulate( avx512_vlen_check(false); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ @@ -9681,6 +9696,21 @@ x86_emulate( op_bytes = 4; goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x26): /* vgetmantp{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x54): /* vfixupimmp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + goto simd_imm8_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x27): /* vgetmants{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x55): /* vfixupimms{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + if ( !evex.brs ) + avx512_vlen_check(true); + goto simd_imm8_zmm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x30): /* kshiftr{b,w} $imm8,k,k */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x32): /* kshiftl{b,w} $imm8,k,k */ if ( !vex.w ) From patchwork Fri Mar 15 10:56:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854503 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EDCF71390 for ; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2EC02A947 for ; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C45CB2A949; Fri, 15 Mar 2019 10:58:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2488B2A947 for ; Fri, 15 Mar 
2019 10:58:34 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVw-0007Pl-N9; Fri, 15 Mar 2019 10:56:36 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kVv-0007PF-53 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:56:35 +0000 X-Inumbo-ID: fd04ddd8-4710-11e9-898b-07180b6782ad Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id fd04ddd8-4710-11e9-898b-07180b6782ad; Fri, 15 Mar 2019 10:56:30 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:29 -0600 Message-Id: <5C8B84DC020000780021F245@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:28 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 29/50] x86emul: support AVX512DQ floating point manipulation insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512DQ in the insn emulator. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Fix vector length check for scalar insns. Re-base. v5: New. 
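As an aside to the vfpclasspsz test added in the test_x86_emulator.c hunk below: the expected k-mask value 0xbdef can be worked out by hand. The standalone sketch below is not part of the patch. It assumes the usual imm8 class-bit assignment for VFPCLASS (bit 1 = +0, bit 2 = -0, bit 6 = finite negative, so 0x46 selects exactly the classes named in the test's comment), and fpclass_bit() is a made-up helper which only knows about the five bit patterns the test actually stores.

#include <stdint.h>
#include <stdio.h>

/* Made-up classifier, covering just the five test input patterns. */
static unsigned int fpclass_bit(uint32_t bits)
{
    if ( bits == 0x00000000 ) return 1u << 1;  /* +0 */
    if ( bits == 0x80000000 ) return 1u << 2;  /* -0 */
    if ( bits & 0x80000000 )  return 1u << 6;  /* negative (finite in this data set) */
    return 0;                                  /* +FIN */
}

int main(void)
{
    /* Same element pattern the test places at res[16]..res[30]; res[31] is +0. */
    static const uint32_t elem[5] =
        { 0x00000000, 0x80000000, 0x80000001, 0xff000000, 0x7f000000 };
    unsigned int mask = 0, i;

    for ( i = 0; i < 16; ++i )
        if ( fpclass_bit(elem[i % 5]) & 0x46 )
            mask |= 1u << i;

    printf("%#x\n", mask); /* 0xbdef - only the +FIN elements 4, 9 and 14 drop out */
    return 0;
}

Note in particular that the negative denormal (0x80000001) is reported via the "negative" class even though the denormal class bit (bit 5) is clear in the immediate, which is what puts elements 2, 7 and 12 into the expected mask.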
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -457,11 +457,17 @@ static const struct test avx512dq_all[] INSN(cvttps2uqq, 66, 0f, 78, vl_2, d, vl), INSN(cvtuqq2pd, f3, 0f, 7a, vl, q, vl), INSN(cvtuqq2ps, f2, 0f, 7a, vl, q, vl), + INSN(fpclass, 66, 0f3a, 66, vl, sd, vl), + INSN(fpclass, 66, 0f3a, 67, el, sd, el), INSN_PFP(or, 0f, 56), // pmovd2m, f3, 0f38, 39, d // pmovm2, f3, 0f38, 38, dq // pmovq2m, f3, 0f38, 39, q INSN(pmullq, 66, 0f38, 40, vl, q, vl), + INSN(range, 66, 0f3a, 50, vl, sd, vl), + INSN(range, 66, 0f3a, 51, el, sd, el), + INSN(reduce, 66, 0f3a, 56, vl, sd, vl), + INSN(reduce, 66, 0f3a, 57, el, sd, el), INSN_PFP(xor, 0f, 57), }; --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -285,10 +285,18 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_octet(x) B(broadcastf32x8_, _mask, x, undef(), ~0) # define insert_octet(x, y, p) B(insertf32x8_, _mask, x, y, p, undef(), ~0) # endif +# ifdef __AVX512DQ__ +# define frac(x) B(reduceps, _mask, x, 0b00001011, undef(), ~0) +# endif # define getexp(x) BR(getexpps, _mask, x, undef(), ~0) # define getmant(x) BR(getmantps, _mask, x, 0, undef(), ~0) -# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) -# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# ifdef __AVX512DQ__ +# define max(x, y) BR(rangeps, _mask, x, y, 0b0101, undef(), ~0) +# define min(x, y) BR(rangeps, _mask, x, y, 0b0100, undef(), ~0) +# else +# define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) +# endif # define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) # define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) @@ -350,10 +358,18 @@ static inline vec_t movlhps(vec_t x, vec # define broadcast_quartet(x) B(broadcastf64x4_, , x, undef(), ~0) # define insert_quartet(x, y, p) B(insertf64x4_, _mask, x, y, p, undef(), ~0) # endif +# ifdef __AVX512DQ__ +# define frac(x) B(reducepd, _mask, x, 0b00001011, undef(), ~0) +# endif # define getexp(x) BR(getexppd, _mask, x, undef(), ~0) # define getmant(x) BR(getmantpd, _mask, x, 0, undef(), ~0) -# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) -# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# ifdef __AVX512DQ__ +# define max(x, y) BR(rangepd, _mask, x, y, 0b0101, undef(), ~0) +# define min(x, y) BR(rangepd, _mask, x, y, 0b0100, undef(), ~0) +# else +# define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) +# define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) +# endif # define mix(x, y) B(movapd, _mask, x, y, 0b01010101) # define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3962,6 +3962,39 @@ int main(int argc, char **argv) else printf("skipped\n"); + + printf("%-40s", "Testing vfpclasspsz $0x46,64(%edx),%k2..."); + if ( stack_exec && cpu_has_avx512dq ) + { + decl_insn(vfpclassps); + + asm volatile ( put_insn(vfpclassps, + /* 0x46: check for +/- 0 and neg. 
*/ + "vfpclasspsz $0x46, 64(%0), %%k2") + :: "d" (NULL) ); + + set_insn(vfpclassps); + for ( i = 0; i < 3; ++i ) + { + res[16 + i * 5 + 0] = 0x00000000; /* +0 */ + res[16 + i * 5 + 1] = 0x80000000; /* -0 */ + res[16 + i * 5 + 2] = 0x80000001; /* -DEN */ + res[16 + i * 5 + 3] = 0xff000000; /* -FIN */ + res[16 + i * 5 + 4] = 0x7f000000; /* +FIN */ + } + res[31] = 0; + regs.edx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vfpclassps) ) + goto fail; + asm volatile ( "kmovw %%k2, %0" : "=g" (rc) ); + if ( rc != 0xbdef ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -582,10 +582,16 @@ static const struct ext0f3a_table { [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4c] = { .simd_size = simd_packed_int, .four_op = 1 }, + [0x50] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x51] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x54] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x55] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x56] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x57] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x5c ... 0x5f] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x60 ... 0x63] = { .simd_size = simd_packed_int, .two_op = 1 }, + [0x66] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, + [0x67] = { .simd_size = simd_scalar_vexw, .two_op = 1, .d8s = d8s_dq }, [0x68 ... 0x69] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x6a ... 0x6b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x6c ... 
0x6d] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -9696,6 +9702,10 @@ x86_emulate( op_bytes = 4; goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x50): /* vrangep{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x56): /* vreducep{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512dq); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x26): /* vgetmantp{s,d} $imm8,[xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x54): /* vfixupimmp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ host_and_vcpu_must_have(avx512f); @@ -9703,6 +9713,10 @@ x86_emulate( avx512_vlen_check(false); goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x51): /* vranges{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x57): /* vreduces{s,d} $imm8,xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512dq); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x27): /* vgetmants{s,d} $imm8,xmm/mem,xmm,xmm{k} */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x55): /* vfixupimms{s,d} $imm8,xmm/mem,xmm,xmm{k} */ host_and_vcpu_must_have(avx512f); @@ -9858,6 +9872,16 @@ x86_emulate( dst.type = OP_NONE; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x66): /* vfpclassp{s,d} $imm8,[xyz]mm/mem,k{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x67): /* vfpclasss{s,d} $imm8,[xyz]mm/mem,k{k} */ + host_and_vcpu_must_have(avx512dq); + generate_exception_if(!evex.r || !evex.R || evex.z, EXC_UD); + if ( !(b & 1) ) + goto avx512f_imm8_no_sae; + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_imm8_zmm; + case X86EMUL_OPC(0x0f3a, 0xcc): /* sha1rnds4 $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(sha); op_bytes = 16; From patchwork Fri Mar 15 10:56:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854513 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9F501390 for ; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9DF702A947 for ; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9209E2A949; Fri, 15 Mar 2019 10:59:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 80C692A947 for ; Fri, 15 Mar 2019 10:59:01 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kWM-0007WA-1S; Fri, 15 Mar 2019 10:57:02 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kWK-0007Vw-Os for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:57:00 +0000 X-Inumbo-ID: 0dd1094e-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 0dd1094e-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:56:58 
+0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:56:58 -0600 Message-Id: <5C8B84F8020000780021F248@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:56:56 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 30/50] x86emul: support AVX512{F, _VBMI2} compress/expand insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v7: Re-base. v6: Re-base. Add tests for the byte/word forms. v5: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -109,6 +109,7 @@ static const struct test avx512f_all[] = INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), INSN(comiss, , 0f, 2f, el, d, el), + INSN(compress, 66, 0f38, 8a, vl, sd, el), INSN(cvtdq2pd, f3, 0f, e6, vl_2, d, vl), INSN(cvtdq2ps, , 0f, 5b, vl, d, vl), INSN(cvtpd2dq, f2, 0f, e6, vl, q, vl), @@ -140,6 +141,7 @@ static const struct test avx512f_all[] = INSN(cvtusi2sd, f2, 0f, 7b, el, dq64, el), INSN(cvtusi2ss, f3, 0f, 7b, el, dq64, el), INSN_FP(div, 0f, 5e), + INSN(expand, 66, 0f38, 88, vl, sd, el), INSN(fixupimm, 66, 0f3a, 54, vl, sd, vl), INSN(fixupimm, 66, 0f3a, 55, el, sd, el), INSN(fmadd132, 66, 0f38, 98, vl, sd, vl), @@ -214,6 +216,7 @@ static const struct test avx512f_all[] = INSN(pcmpgtd, 66, 0f, 66, vl, d, vl), INSN(pcmpgtq, 66, 0f38, 37, vl, q, vl), INSN(pcmpu, 66, 0f3a, 1e, vl, dq, vl), + INSN(pcompress, 66, 0f38, 8b, vl, dq, el), INSN(permi2, 66, 0f38, 76, vl, dq, vl), INSN(permi2, 66, 0f38, 77, vl, sd, vl), INSN(permilpd, 66, 0f38, 0d, vl, q, vl), @@ -222,6 +225,7 @@ static const struct test avx512f_all[] = INSN(permilps, 66, 0f3a, 04, vl, d, vl), INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), + INSN(pexpand, 66, 0f38, 89, vl, dq, el), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -509,6 +513,11 @@ static const struct test avx512_vbmi_all INSN(permt2b, 66, 0f38, 7d, vl, b, vl), }; +static const struct test avx512_vbmi2_all[] = { + INSN(pcompress, 66, 0f38, 63, vl, bw, el), + INSN(pexpand, 66, 0f38, 62, vl, bw, el), +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -865,4 +874,5 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, 512); RUN(avx512er, 512); RUN(avx512_vbmi, all); + RUN(avx512_vbmi2, all); } --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -3995,6 +3995,227 @@ int main(int argc, char **argv) else printf("skipped\n"); + /* + * The following compress/expand tests are not only making sure the + * accessed data is correct, but they also verify (by placing operands + * on the mapping boundaries) that elements controlled by clear mask + * bits don't get accessed. 
+ */ + if ( stack_exec && cpu_has_avx512f ) + { + decl_insn(vpcompressd); + decl_insn(vpcompressq); + decl_insn(vpexpandd); + decl_insn(vpexpandq); + static const struct { + unsigned int d[16]; + } dsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } }; + static const struct { + unsigned long long q[8]; + } qsrc = { { 0, 1, 2, 3, 4, 5, 6, 7 } }; + unsigned int *ptr = res + MMAP_SZ / sizeof(*res) - 32; + + printf("%-40s", "Testing vpcompressd %zmm1,24*4(%ecx){%k2}..."); + asm volatile ( "kmovw %1, %%k2\n\t" + "vmovdqu32 %2, %%zmm1\n" + put_insn(vpcompressd, + "vpcompressd %%zmm1, 24*4(%0)%{%%k2%}") + :: "c" (NULL), "r" (0x55aa), "m" (dsrc) ); + + memset(ptr, 0xdb, 32 * 4); + set_insn(vpcompressd); + regs.ecx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressd) || + memcmp(ptr, ptr + 8, 16 * 4) ) + goto fail; + for ( i = 0; i < 4; ++i ) + if ( ptr[24 + i] != 2 * i + 1 ) + goto fail; + for ( ; i < 8; ++i ) + if ( ptr[24 + i] != 2 * i ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpexpandd 8*4(%edx),%zmm3{%k2}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm3, %%zmm3, %%zmm3\n" + put_insn(vpexpandd, + "vpexpandd 8*4(%0), %%zmm3%{%%k2%}%{z%}") + :: "d" (NULL) ); + set_insn(vpexpandd); + regs.edx = (unsigned long)(ptr + 16); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandd) ) + goto fail; + asm ( "vmovdqa32 %%zmm1, %%zmm2%{%%k2%}%{z%}\n\t" + "vpcmpeqd %%zmm2, %%zmm3, %%k0\n\t" + "kmovw %%k0, %0" + : "=r" (rc) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpcompressq %zmm4,12*8(%edx){%k3}..."); + asm volatile ( "kmovw %1, %%k3\n\t" + "vmovdqu64 %2, %%zmm4\n" + put_insn(vpcompressq, + "vpcompressq %%zmm4, 12*8(%0)%{%%k3%}") + :: "d" (NULL), "r" (0x5a), "m" (qsrc) ); + + memset(ptr, 0xdb, 16 * 8); + set_insn(vpcompressq); + regs.edx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressq) || + memcmp(ptr, ptr + 8, 8 * 8) ) + goto fail; + for ( i = 0; i < 2; ++i ) + { + if ( ptr[(12 + i) * 2] != 2 * i + 1 || + ptr[(12 + i) * 2 + 1] ) + goto fail; + } + for ( ; i < 4; ++i ) + { + if ( ptr[(12 + i) * 2] != 2 * i || + ptr[(12 + i) * 2 + 1] ) + goto fail; + } + printf("okay\n"); + + printf("%-40s", "Testing vpexpandq 4*8(%ecx),%zmm5{%k3}{z}..."); + asm volatile ( "vpternlogq $0x81, %%zmm5, %%zmm5, %%zmm5\n" + put_insn(vpexpandq, + "vpexpandq 4*8(%0), %%zmm5%{%%k3%}%{z%}") + :: "c" (NULL) ); + set_insn(vpexpandq); + regs.ecx = (unsigned long)(ptr + 16); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandq) ) + goto fail; + asm ( "vmovdqa64 %%zmm4, %%zmm6%{%%k3%}%{z%}\n\t" + "vpcmpeqq %%zmm5, %%zmm6, %%k0\n\t" + "kmovw %%k0, %0" + : "=r" (rc) ); + if ( rc != 0xff ) + goto fail; + printf("okay\n"); + } + +#if __GNUC__ > 7 /* can't check for __AVX512VBMI2__ here */ + if ( stack_exec && cpu_has_avx512_vbmi2 ) + { + decl_insn(vpcompressb); + decl_insn(vpcompressw); + decl_insn(vpexpandb); + decl_insn(vpexpandw); + static const struct { + unsigned char b[64]; + } bsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 48, 49, 50, 51, 52, 53, 54, 55, + 56, 57, 58, 59, 60, 61, 62, 63 } }; + static const struct { + unsigned short w[32]; + } wsrc = { { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 
17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31 } }; + unsigned char *ptr = (void *)res + MMAP_SZ - 128; + unsigned long long w = 0x55555555aaaaaaaaULL; + + printf("%-40s", "Testing vpcompressb %zmm1,96*1(%ecx){%k2}..."); + asm volatile ( "kmovq %1, %%k2\n\t" + "vmovdqu8 %2, %%zmm1\n" + put_insn(vpcompressb, + "vpcompressb %%zmm1, 96*1(%0)%{%%k2%}") + :: "c" (NULL), "m" (w), "m" (bsrc) ); + + memset(ptr, 0xdb, 128 * 1); + set_insn(vpcompressb); + regs.ecx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressb) || + memcmp(ptr, ptr + 32, 64 * 1) ) + goto fail; + for ( i = 0; i < 16; ++i ) + if ( ptr[96 + i] != 2 * i + 1 ) + goto fail; + for ( ; i < 32; ++i ) + if ( ptr[96 + i] != 2 * i ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpexpandb 32*1(%edx),%zmm3{%k2}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm3, %%zmm3, %%zmm3\n" + put_insn(vpexpandb, + "vpexpandb 32*1(%0), %%zmm3%{%%k2%}%{z%}") + :: "d" (NULL) ); + set_insn(vpexpandb); + regs.edx = (unsigned long)(ptr + 64); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandb) ) + goto fail; + asm ( "vmovdqu8 %%zmm1, %%zmm2%{%%k2%}%{z%}\n\t" + "vpcmpeqb %%zmm2, %%zmm3, %%k0\n\t" + "kmovq %%k0, %0" + : "=m" (w) ); + if ( w != 0xffffffffffffffffULL ) + goto fail; + printf("okay\n"); + + printf("%-40s", "Testing vpcompressw %zmm4,48*2(%edx){%k3}..."); + asm volatile ( "kmovd %1, %%k3\n\t" + "vmovdqu16 %2, %%zmm4\n" + put_insn(vpcompressw, + "vpcompressw %%zmm4, 48*2(%0)%{%%k3%}") + :: "d" (NULL), "r" (0x5555aaaa), "m" (wsrc) ); + + memset(ptr, 0xdb, 64 * 2); + set_insn(vpcompressw); + regs.edx = (unsigned long)ptr; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpcompressw) || + memcmp(ptr, ptr + 32, 32 * 2) ) + goto fail; + for ( i = 0; i < 8; ++i ) + { + if ( ptr[(48 + i) * 2] != 2 * i + 1 || + ptr[(48 + i) * 2 + 1] ) + goto fail; + } + for ( ; i < 16; ++i ) + { + if ( ptr[(48 + i) * 2] != 2 * i || + ptr[(48 + i) * 2 + 1] ) + goto fail; + } + printf("okay\n"); + + printf("%-40s", "Testing vpexpandw 16*2(%ecx),%zmm5{%k3}{z}..."); + asm volatile ( "vpternlogd $0x81, %%zmm5, %%zmm5, %%zmm5\n" + put_insn(vpexpandw, + "vpexpandw 16*2(%0), %%zmm5%{%%k3%}%{z%}") + :: "c" (NULL) ); + set_insn(vpexpandw); + regs.ecx = (unsigned long)(ptr + 64); + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(vpexpandw) ) + goto fail; + asm ( "vmovdqu16 %%zmm4, %%zmm6%{%%k3%}%{z%}\n\t" + "vpcmpeqw %%zmm5, %%zmm6, %%k0\n\t" + "kmovq %%k0, %0" + : "=m" (w) ); + if ( w != 0xffffffff ) + goto fail; + printf("okay\n"); + } +#endif + #undef decl_insn #undef put_insn #undef set_insn --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -59,6 +59,9 @@ (type *)((char *)mptr__ - offsetof(type, member)); \ }) +#define hweight32 __builtin_popcount +#define hweight64 __builtin_popcountll + #define is_canonical_address(x) (((int64_t)(x) >> 47) == ((int64_t)(x) >> 63)) extern uint32_t mxcsr_mask; @@ -138,6 +141,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) +#define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ 
b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -482,6 +482,8 @@ static const struct ext0f38_table { [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, + [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, + [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -489,6 +491,10 @@ static const struct ext0f38_table { [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x88] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_dq }, + [0x89] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_dq }, + [0x8a] = { .simd_size = simd_packed_fp, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, + [0x8b] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, @@ -1900,6 +1906,7 @@ static bool vcpu_has( #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) +#define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8905,6 +8912,36 @@ x86_emulate( generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x62): /* vpexpand{b,w} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x63): /* vpcompress{b,w} [xyz]mm,[xyz]mm/mem{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + elem_bytes = 1 << evex.w; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x88): /* vexpandp{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x89): /* vpexpand{d,q} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8a): /* vcompressp{s,d} [xyz]mm,[xyz]mm/mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8b): /* vpcompress{d,q} [xyz]mm,[xyz]mm/mem{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(false); + /* + * For the respective code below the main switch() to work we need to + * compact op_mask here: Memory accesses are non-sparse even if the + * mask register has sparsely set bits. + */ + if ( likely(fault_suppression) ) + { + n = 1 << ((b & 8 ? 
2 : 4) + evex.lr - evex.w); + EXPECT(elem_bytes > 0); + ASSERT(op_bytes == n * elem_bytes); + op_mask &= ~0ULL >> (64 - n); + n = hweight64(op_mask); + op_bytes = n * elem_bytes; + if ( n ) + op_mask = ~0ULL >> (64 - n); + } + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -109,6 +109,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) +#define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -228,6 +228,7 @@ XEN_CPUFEATURE(AVX512_VBMI, 6*32+ 1) / XEN_CPUFEATURE(UMIP, 6*32+ 2) /*S User Mode Instruction Prevention */ XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ +XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -266,10 +266,10 @@ def crunch_numbers(state): AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, AVX512_VPOPCNTDQ], - # AVX512 extensions acting solely on vectors of bytes/words are made + # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask # registers), despite the SDM not formally making this connection. 
- AVX512BW: [AVX512_VBMI], + AVX512BW: [AVX512_VBMI, AVX512_VBMI2], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 10:58:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854515 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4F5A01575 for ; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30D242A6AF for ; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1F3122A948; Fri, 15 Mar 2019 11:00:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 473092A6AF for ; Fri, 15 Mar 2019 11:00:08 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kXg-0007mS-TL; Fri, 15 Mar 2019 10:58:24 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kXf-0007m6-Ux for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:58:23 +0000 X-Inumbo-ID: 3fabcaf7-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 3fabcaf7-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:58:22 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:58:21 -0600 Message-Id: <5C8B854E020000780021F24B@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:58:22 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 31/50] x86emul: support remaining misc AVX512{F, BW} insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512BW in the insn emulator, and leaves just the scatter/gather ones open in the AVX512F set. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v5: New. --- TBD: The *blendm* inline functions don't reliably produce the intended insns, as the respective moves are about as good a fit for the compiler when looking for a match for the intended operation. We'd need to switch to inline assembly if we wanted to guarantee the testing of those insns. Thoughts? 
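Regarding the TBD remark: a minimal sketch of what an inline-assembly mix() could look like for the packed-single, 512-bit case in simd.c (the other element widths would follow the same pattern). This is only meant to illustrate the idea, not as a drop-in replacement - it hard-codes the 16-element mask width, and it assumes GCC's "v" (EVEX-encodable vector register) and "Yk" (mask register other than %k0) constraints are usable here; vec_t and ALL_TRUE are the type/macro simd.c already provides.

# define mix(x, y) ({ \
    vec_t r_; \
    unsigned short m_ = 0b1010101010101010 & ALL_TRUE; \
    /* Spell out vblendmps, so the compiler can't pick a masked vmovaps instead. */ \
    asm ( "vblendmps %2, %1, %0%{%3%}" \
          : "=v" (r_) \
          : "v" (x), "v" (y), "Yk" (m_) ); \
    r_; \
})

Whether to convert all of the element-size variants this way, or only the ones the compiler currently turns into plain masked moves, is exactly what the TBD question leaves open.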
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -105,6 +105,8 @@ enum esz { static const struct test avx512f_all[] = { INSN_FP(add, 0f, 58), + INSN(align, 66, 0f3a, 03, vl, dq, vl), + INSN(blendm, 66, 0f38, 65, vl, sd, vl), INSN(broadcastss, 66, 0f38, 18, el, d, el), INSN_FP(cmp, 0f, c2), INSN(comisd, 66, 0f, 2f, el, q, el), @@ -207,6 +209,7 @@ static const struct test avx512f_all[] = INSN(paddq, 66, 0f, d4, vl, q, vl), INSN(pand, 66, 0f, db, vl, dq, vl), INSN(pandn, 66, 0f, df, vl, dq, vl), + INSN(pblendm, 66, 0f38, 64, vl, dq, vl), // pbroadcast, 66, 0f38, 7c, dq64 INSN(pbroadcastd, 66, 0f38, 58, el, d, el), INSN(pbroadcastq, 66, 0f38, 59, el, q, el), @@ -354,6 +357,7 @@ static const struct test avx512f_512[] = }; static const struct test avx512bw_all[] = { + INSN(dbpsadbw, 66, 0f3a, 42, vl, b, vl), INSN(movdqu8, f2, 0f, 6f, vl, b, vl), INSN(movdqu8, f2, 0f, 7f, vl, b, vl), INSN(movdqu16, f2, 0f, 6f, vl, w, vl), @@ -373,6 +377,7 @@ static const struct test avx512bw_all[] INSN(palignr, 66, 0f3a, 0f, vl, b, vl), INSN(pavgb, 66, 0f, e0, vl, b, vl), INSN(pavgw, 66, 0f, e3, vl, w, vl), + INSN(pblendm, 66, 0f38, 66, vl, bw, vl), INSN(pbroadcastb, 66, 0f38, 78, el, b, el), // pbroadcastb, 66, 0f38, 7a, b INSN(pbroadcastw, 66, 0f38, 79, el_2, b, vl), --- a/tools/tests/x86_emulator/simd.c +++ b/tools/tests/x86_emulator/simd.c @@ -297,7 +297,7 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0) # endif -# define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE)) +# define mix(x, y) B(blendmps_, _mask, x, y, (0b1010101010101010 & ALL_TRUE)) # define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28ps, _mask, x, undef(), ~0) @@ -370,7 +370,7 @@ static inline vec_t movlhps(vec_t x, vec # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0) # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0) # endif -# define mix(x, y) B(movapd, _mask, x, y, 0b01010101) +# define mix(x, y) B(blendmpd_, _mask, x, y, 0b10101010) # define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0) # if VEC_SIZE == 64 && defined(__AVX512ER__) # define recip(x) BR(rcp28pd, _mask, x, undef(), ~0) @@ -564,8 +564,9 @@ static inline vec_t movlhps(vec_t x, vec 0b00011011, (vsi_t)undef(), ~0)) # define swap2(x) ((vec_t)B_(permvarsi, _mask, (vsi_t)(x), (vsi_t)(inv - 1), (vsi_t)undef(), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \ - (0b0101010101010101 & ((1 << ELEM_COUNT) - 1)))) +# define mix(x, y) ((vec_t)B(blendmd_, _mask, (vsi_t)(x), (vsi_t)(y), \ + (0b1010101010101010 & ((1 << ELEM_COUNT) - 1)))) +# define rotr(x, n) ((vec_t)B(alignd, _mask, (vsi_t)(x), (vsi_t)(x), n, (vsi_t)undef(), ~0)) # define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0)) # elif INT_SIZE == 8 || UINT_SIZE == 8 # define broadcast(x) ({ \ @@ -602,7 +603,8 @@ static inline vec_t movlhps(vec_t x, vec 0b01001110, (vsi_t)undef(), ~0)) # define swap2(x) ((vec_t)B(permvardi, _mask, (vdi_t)(x), (vdi_t)(inv - 1), (vdi_t)undef(), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101)) +# define mix(x, y) ((vec_t)B(blendmq_, _mask, (vdi_t)(x), (vdi_t)(y), 0b10101010)) +# define rotr(x, n) ((vec_t)B(alignq, _mask, (vdi_t)(x), (vdi_t)(x), n, (vdi_t)undef(), ~0)) # if VEC_SIZE == 32 # define swap3(x) ((vec_t)B_(permdi, _mask, 
(vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0)) # elif VEC_SIZE == 64 @@ -654,8 +656,8 @@ static inline vec_t movlhps(vec_t x, vec # define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varqi, _mask, interleave_lo, (vqi_t)(x), (vqi_t)(y), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdquqi, _mask, (vqi_t)(x), (vqi_t)(y), \ - (0b0101010101010101010101010101010101010101010101010101010101010101LL & ALL_TRUE))) +# define mix(x, y) ((vec_t)B(blendmb_, _mask, (vqi_t)(x), (vqi_t)(y), \ + (0b1010101010101010101010101010101010101010101010101010101010101010LL & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), (vqi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0)) # define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0)) @@ -687,8 +689,8 @@ static inline vec_t movlhps(vec_t x, vec # define interleave_hi(x, y) ((vec_t)B(vpermi2varhi, _mask, (vhi_t)(x), interleave_hi, (vhi_t)(y), ~0)) # define interleave_lo(x, y) ((vec_t)B(vpermt2varhi, _mask, interleave_lo, (vhi_t)(x), (vhi_t)(y), ~0)) # endif -# define mix(x, y) ((vec_t)B(movdquhi, _mask, (vhi_t)(x), (vhi_t)(y), \ - (0b01010101010101010101010101010101 & ALL_TRUE))) +# define mix(x, y) ((vec_t)B(blendmw_, _mask, (vhi_t)(x), (vhi_t)(y), \ + (0b10101010101010101010101010101010 & ALL_TRUE))) # define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0)) # define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0)) # define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -484,6 +484,7 @@ static const struct ext0f38_table { [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, + [0x64 ... 0x66] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -550,6 +551,7 @@ static const struct ext0f3a_table { [0x00] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x01] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x02] = { .simd_size = simd_packed_int }, + [0x03] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x06] = { .simd_size = simd_packed_fp }, [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, @@ -581,8 +583,7 @@ static const struct ext0f3a_table { [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 }, [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, - [0x42] = { .simd_size = simd_packed_int }, - [0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x44] = { .simd_size = simd_packed_int }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 
0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -6204,6 +6205,8 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x47): /* vpsllv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x4c): /* vrcp14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x4e): /* vrsqrt14p{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x64): /* vpblendm{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x65): /* vblendmp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ avx512f_no_sae: host_and_vcpu_must_have(avx512f); generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD); @@ -6961,6 +6964,7 @@ x86_emulate( case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ host_and_vcpu_must_have(avx512bw); generate_exception_if(evex.brs, EXC_UD); elem_bytes = 1 << (b & 1); @@ -8130,10 +8134,12 @@ x86_emulate( goto simd_0f_to_gpr; CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0xc6): /* vshufp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - fault_suppression = false; generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK), EXC_UD); /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x03): /* valign{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x25): /* vpternlog{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ avx512f_imm8_no_sae: host_and_vcpu_must_have(avx512f); @@ -9471,6 +9477,9 @@ x86_emulate( insn_bytes = PFX_BYTES + 4; break; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x42): /* vdbpsadbw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(evex.w, EXC_UD); + /* fall through */ case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0f): /* vpalignr $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ fault_suppression = false; goto avx512bw_imm; From patchwork Fri Mar 15 10:58:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854517 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC0371575 for ; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CECDF2A938 for ; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C00622A948; Fri, 15 Mar 2019 11:00:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8F10D2A938 for ; Fri, 15 Mar 2019 11:00:36 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kY8-0007tH-7w; Fri, 15 Mar 2019 10:58:52 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) 
by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kY6-0007sy-Fc for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:58:50 +0000 X-Inumbo-ID: 4f3598e0-4711-11e9-acbf-7385351564a4 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 4f3598e0-4711-11e9-acbf-7385351564a4; Fri, 15 Mar 2019 10:58:48 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:58:47 -0600 Message-Id: <5C8B8567020000780021F24E@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:58:47 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 32/50] x86emul: support AVX512F gather insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This requires getting modrm_reg and sib_index set correctly in the EVEX case, to account for the high 16 [XYZ]MM registers. Extend the adjustments to modrm_rm as well, such that x86_insn_modrm() would correctly report register numbers (this was a latent issue only as we don't currently have callers of that function which would care about an EVEX case). The adjustment in turn requires dropping the assertion from decode_gpr() as well as re-introducing the explicit masking, as we now need to actively mask off the high bit when a GPR is meant. _decode_gpr() invocations also need slight adjustments, when invoked in generic code ahead of the main switch(). All other uses of modrm_reg and modrm_rm already get suitably masked where necessary. There was also an encoding mistake in the EVEX Disp8 test code, which was benign (due to %rdx getting set to zero) to all non-vSIB tests as it mistakenly encoded (%rdx,%rdx) instead of (%rdx,%riz). In the vSIB case this meant (%rdx,%zmm2) instead of the intended (%rdx,%zmm4). Likewise the access count check wasn't entirely correct for the S/G case: In the quad-word-index but dword-data case only half the number of full vector elements get accessed. As an unrelated change in the main test harness source file distinguish the "n/a" messages by bitness. Signed-off-by: Jan Beulich --- v8: Re-base. v7: Fix ByteOp register decode. Re-base. v6: New. 
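Before diving into the diff: the SIB encoding mistake mentioned above becomes obvious once the byte is split into its fields. The tiny decoder below is not part of the harness and deliberately ignores the REX/EVEX extension bits this patch is actually concerned with - it only shows the architectural 2/3/3 split.

#include <stdio.h>

/* Hypothetical helper: split a plain SIB byte into scale/index/base. */
static void decode_sib(unsigned int sib)
{
    printf("SIB %#04x: scale=%u index=%u base=%u\n",
           sib, 1u << (sib >> 6), (sib >> 3) & 7, sib & 7);
}

int main(void)
{
    decode_sib(0x12); /* index 2, base 2: (%rdx,%rdx,1), or (%rdx,%zmm2,1) under vSIB */
    decode_sib(0x22); /* index 4, base 2: "no index"/(%rdx,%riz,1), but (%rdx,%zmm4,1) under vSIB */
    return 0;
}

Index value 4 means "no index" for ordinary addressing, which is why the old 0x12 byte was benign to the non-vSIB tests (with %rdx zeroed, the bogus index contributed nothing), while under vSIB it silently exercised %zmm2 instead of the intended %zmm4 - hence the change to 0x22 in test_one().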
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -18,7 +18,7 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er FMA := fma4 fma -SG := avx2-sg +SG := avx2-sg avx512f-sg avx512vl-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) OPMASK := avx512f avx512dq avx512bw @@ -66,6 +66,14 @@ xop-flts := $(avx-flts) avx512f-vecs := 64 16 32 avx512f-ints := 4 8 avx512f-flts := 4 8 +avx512f-sg-vecs := 64 +avx512f-sg-idxs := 4 8 +avx512f-sg-ints := $(avx512f-ints) +avx512f-sg-flts := $(avx512f-flts) +avx512vl-sg-vecs := 16 32 +avx512vl-sg-idxs := $(avx512f-sg-idxs) +avx512vl-sg-ints := $(avx512f-ints) +avx512vl-sg-flts := $(avx512f-flts) avx512bw-vecs := $(avx512f-vecs) avx512bw-ints := 1 2 avx512bw-flts := --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -176,6 +176,8 @@ static const struct test avx512f_all[] = INSN(fnmsub213, 66, 0f38, af, el, sd, el), INSN(fnmsub231, 66, 0f38, be, vl, sd, vl), INSN(fnmsub231, 66, 0f38, bf, el, sd, el), + INSN(gatherd, 66, 0f38, 92, vl, sd, el), + INSN(gatherq, 66, 0f38, 93, vl, sd, el), INSN(getexp, 66, 0f38, 42, vl, sd, vl), INSN(getexp, 66, 0f38, 43, el, sd, el), INSN(getmant, 66, 0f3a, 26, vl, sd, vl), @@ -229,6 +231,8 @@ static const struct test avx512f_all[] = INSN(permt2, 66, 0f38, 7e, vl, dq, vl), INSN(permt2, 66, 0f38, 7f, vl, sd, vl), INSN(pexpand, 66, 0f38, 89, vl, dq, el), + INSN(pgatherd, 66, 0f38, 90, vl, dq, el), + INSN(pgatherq, 66, 0f38, 91, vl, dq, el), INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl), INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl), INSN(pmins, 66, 0f38, 39, vl, dq, vl), @@ -698,7 +702,7 @@ static void test_one(const struct test * instr[3] = evex.raw[2]; instr[4] = test->opc; instr[5] = 0x44 | (test->ext << 3); /* ModR/M */ - instr[6] = 0x12; /* SIB: base rDX, index none / xMM4 */ + instr[6] = 0x22; /* SIB: base rDX, index none / xMM4 */ instr[7] = 1; /* Disp8 */ instr[8] = 0; /* immediate, if any */ @@ -718,7 +722,8 @@ static void test_one(const struct test * if ( accessed[i] ) goto fail; for ( ; i < (test->scale == SC_vl ? vsz : esz) + (sg ? esz : vsz); ++i ) - if ( accessed[i] != (sg ? vsz / esz : 1) ) + if ( accessed[i] != (sg ? (vsz / esz) >> (test->opc & 1 & !evex.w) + : 1) ) goto fail; for ( ; i < ARRAY_SIZE(accessed); ++i ) if ( accessed[i] ) --- a/tools/tests/x86_emulator/simd-sg.c +++ b/tools/tests/x86_emulator/simd-sg.c @@ -35,13 +35,78 @@ typedef long long __attribute__((vector_ #define ITEM_COUNT (VEC_SIZE / ELEM_SIZE < IVEC_SIZE / IDX_SIZE ? 
\ VEC_SIZE / ELEM_SIZE : IVEC_SIZE / IDX_SIZE) -#if VEC_SIZE == 16 -# define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0) -#else -# define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0) -#endif +#if defined(__AVX512F__) +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# if ELEM_SIZE == 4 +# if IDX_SIZE == 4 || defined(__AVX512VL__) +# define to_mask(msk) B(ptestmd, , (vsi_t)(msk), (vsi_t)(msk), ~0) +# define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE) +# else +# define widen(x) __builtin_ia32_pmovzxdq512_mask((vsi_t)(x), (idi_t){}, ~0) +# define to_mask(msk) __builtin_ia32_ptestmq512(widen(msk), widen(msk), ~0) +# define eq(x, y) (__builtin_ia32_pcmpeqq512_mask(widen(x), widen(y), ~0) == ALL_TRUE) +# endif +# define BG_(dt, it, reg, mem, idx, msk, scl) \ + __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl) +# else +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# define BG_(dt, it, reg, mem, idx, msk, scl) \ + __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl) +# endif +/* + * Instead of replicating the main IDX_SIZE conditional below three times, use + * a double layer of macro invocations, allowing for substitution of the + * respective relevant macro argument tokens. + */ +# define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl) +# if VEC_MAX < 64 +/* + * The sub-512-bit built-ins have an extra "3" infix, presumably because the + * 512-bit names were chosen without the AVX512VL extension in mind (and hence + * making the latter collide with the AVX2 ones). + */ +# define si 3si +# define di 3di +# endif +# if VEC_MAX == 16 +# define v8df v2df +# define v8di v2di +# define v16sf v4sf +# define v16si v4si +# elif VEC_MAX == 32 +# define v8df v4df +# define v8di v4di +# define v16sf v8sf +# define v16si v8si +# endif +# if IDX_SIZE == 4 +# if INT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl) +# elif INT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl)) +# elif FLOAT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl) +# elif FLOAT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl) +# endif +# elif IDX_SIZE == 8 +# if INT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl) +# elif INT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl)) +# elif FLOAT_SIZE == 4 +# define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl) +# elif FLOAT_SIZE == 8 +# define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl) +# endif +# endif +#elif defined(__AVX2__) +# if VEC_SIZE == 16 +# define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0) +# else +# define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0) +# endif -#if defined(__AVX2__) # if VEC_MAX == 16 # if IDX_SIZE == 4 # if INT_SIZE == 4 @@ -111,6 +176,10 @@ typedef long long __attribute__((vector_ # endif #endif +#ifndef eq +# define eq(x, y) to_bool((x) == (y)) +#endif + #define GLUE_(x, y) x ## y #define GLUE(x, y) GLUE_(x, y) @@ -119,6 +188,7 @@ typedef long long __attribute__((vector_ #define PUT8(n) PUT4(n), PUT4((n) + 4) #define PUT16(n) PUT8(n), PUT8((n) + 8) #define PUT32(n) PUT16(n), PUT16((n) + 16) +#define 
PUT64(n) PUT32(n), PUT32((n) + 32) const typeof((vec_t){}[0]) array[] = { GLUE(PUT, VEC_MAX)(1), @@ -174,7 +244,7 @@ int sg_test(void) y = gather(full, array + ITEM_COUNT, -idx, full, ELEM_SIZE); #if ITEM_COUNT == ELEM_COUNT - if ( !to_bool(y == x - 1) ) + if ( !eq(y, x - 1) ) return __LINE__; #else for ( i = 0; i < ITEM_COUNT; ++i ) --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -22,6 +22,8 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512dq-opmask.h" #include "avx512bw-opmask.h" #include "avx512f.h" +#include "avx512f-sg.h" +#include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512dq.h" #include "avx512er.h" @@ -90,11 +92,13 @@ static bool simd_check_avx512f(void) return cpu_has_avx512f; } #define simd_check_avx512f_opmask simd_check_avx512f +#define simd_check_avx512f_sg simd_check_avx512f static bool simd_check_avx512f_vl(void) { return cpu_has_avx512f && cpu_has_avx512vl; } +#define simd_check_avx512vl_sg simd_check_avx512f_vl static bool simd_check_avx512dq(void) { @@ -291,6 +295,14 @@ static const struct { SIMD(AVX512F u32x16, avx512f, 64u4), SIMD(AVX512F s64x8, avx512f, 64i8), SIMD(AVX512F u64x8, avx512f, 64u8), + SIMD(AVX512F S/G f32[16x32], avx512f_sg, 64x4f4), + SIMD(AVX512F S/G f64[ 8x32], avx512f_sg, 64x4f8), + SIMD(AVX512F S/G f32[ 8x64], avx512f_sg, 64x8f4), + SIMD(AVX512F S/G f64[ 8x64], avx512f_sg, 64x8f8), + SIMD(AVX512F S/G i32[16x32], avx512f_sg, 64x4i4), + SIMD(AVX512F S/G i64[ 8x32], avx512f_sg, 64x4i8), + SIMD(AVX512F S/G i32[ 8x64], avx512f_sg, 64x8i4), + SIMD(AVX512F S/G i64[ 8x64], avx512f_sg, 64x8i8), AVX512VL(VL f32x4, avx512f, 16f4), AVX512VL(VL f64x2, avx512f, 16f8), AVX512VL(VL f32x8, avx512f, 32f4), @@ -303,6 +315,22 @@ static const struct { AVX512VL(VL u64x2, avx512f, 16u8), AVX512VL(VL s64x4, avx512f, 32i8), AVX512VL(VL u64x4, avx512f, 32u8), + SIMD(AVX512VL S/G f32[4x32], avx512vl_sg, 16x4f4), + SIMD(AVX512VL S/G f64[2x32], avx512vl_sg, 16x4f8), + SIMD(AVX512VL S/G f32[2x64], avx512vl_sg, 16x8f4), + SIMD(AVX512VL S/G f64[2x64], avx512vl_sg, 16x8f8), + SIMD(AVX512VL S/G f32[8x32], avx512vl_sg, 32x4f4), + SIMD(AVX512VL S/G f64[4x32], avx512vl_sg, 32x4f8), + SIMD(AVX512VL S/G f32[4x64], avx512vl_sg, 32x8f4), + SIMD(AVX512VL S/G f64[4x64], avx512vl_sg, 32x8f8), + SIMD(AVX512VL S/G i32[4x32], avx512vl_sg, 16x4i4), + SIMD(AVX512VL S/G i64[2x32], avx512vl_sg, 16x4i8), + SIMD(AVX512VL S/G i32[2x64], avx512vl_sg, 16x8i4), + SIMD(AVX512VL S/G i64[2x64], avx512vl_sg, 16x8i8), + SIMD(AVX512VL S/G i32[8x32], avx512vl_sg, 32x4i4), + SIMD(AVX512VL S/G i64[4x32], avx512vl_sg, 32x4i8), + SIMD(AVX512VL S/G i32[4x64], avx512vl_sg, 32x8i4), + SIMD(AVX512VL S/G i64[4x64], avx512vl_sg, 32x8i8), SIMD(AVX512BW s8x64, avx512bw, 64i1), SIMD(AVX512BW u8x64, avx512bw, 64u1), SIMD(AVX512BW s16x32, avx512bw, 64i2), @@ -4260,7 +4288,7 @@ int main(int argc, char **argv) if ( !blobs[j].size ) { - printf("%-39s n/a\n", blobs[j].name); + printf("%-39s n/a (%u-bit)\n", blobs[j].name, blobs[j].bitness); continue; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -499,7 +499,7 @@ static const struct ext0f38_table { [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, - [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 }, + [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0x96 ... 
0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x9a] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -3054,7 +3054,8 @@ x86_decode( d &= ~ModRM; #undef ModRM /* Only its aliases are valid to use from here on. */ - modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3); + modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3) | + ((evex_encoded() && !evex.R) << 4); modrm_rm = modrm & 0x07; /* @@ -3224,7 +3225,8 @@ x86_decode( if ( modrm_mod == 3 ) { generate_exception_if(d & vSIB, EXC_UD); - modrm_rm |= (rex_prefix & 1) << 3; + modrm_rm |= ((rex_prefix & 1) << 3) | + (evex_encoded() && !evex.x) << 4; ea.type = OP_REG; } else if ( ad_bytes == 2 ) @@ -3289,7 +3291,10 @@ x86_decode( state->sib_index = ((sib >> 3) & 7) | ((rex_prefix << 2) & 8); state->sib_scale = (sib >> 6) & 3; - if ( state->sib_index != 4 && !(d & vSIB) ) + if ( unlikely(d & vSIB) ) + state->sib_index |= (mode_64bit() && evex_encoded() && + !evex.RX) << 4; + else if ( state->sib_index != 4 ) { ea.mem.off = *decode_gpr(state->regs, state->sib_index); ea.mem.off <<= state->sib_scale; @@ -3592,7 +3597,7 @@ x86_emulate( generate_exception_if(state->not_64bit && mode_64bit(), EXC_UD); if ( ea.type == OP_REG ) - ea.reg = _decode_gpr(&_regs, modrm_rm, (d & ByteOp) && !rex_prefix); + ea.reg = _decode_gpr(&_regs, modrm_rm, (d & ByteOp) && !rex_prefix && !vex.opcx); memset(mmvalp, 0xaa /* arbitrary */, sizeof(*mmvalp)); @@ -3606,7 +3611,7 @@ x86_emulate( src.type = OP_REG; if ( d & ByteOp ) { - src.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix); + src.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix && !vex.opcx); src.val = *(uint8_t *)src.reg; src.bytes = 1; } @@ -3704,7 +3709,7 @@ x86_emulate( dst.type = OP_REG; if ( d & ByteOp ) { - dst.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix); + dst.reg = _decode_gpr(&_regs, modrm_reg, !rex_prefix && !vex.opcx); dst.val = *(uint8_t *)dst.reg; dst.bytes = 1; } @@ -9119,6 +9124,130 @@ x86_emulate( put_stub(stub); state->simd_size = simd_none; + break; + } + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x90): /* vpgatherd{d,q} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x91): /* vpgatherq{d,q} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x92): /* vgatherdp{s,d} mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x93): /* vgatherqp{s,d} mem,[xyz]mm{k} */ + { + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; + bool done = false; + + ASSERT(ea.type == OP_MEM); + generate_exception_if((!evex.opmsk || evex.brs || evex.z || + evex.reg != 0xf || + modrm_reg == state->sib_index), + EXC_UD); + avx512_vlen_check(false); + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + + /* Read destination and index registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x7f; /* vmovdqa{32,64} */ + /* + * The register writeback below has to retain masked-off elements, but + * needs to clear upper portions in the index-wider-than-data cases. + * Therefore read (and write below) the full register. The alternative + * would have been to fiddle with the mask register used. + */ + pevex->opmsk = 0; + /* Use (%rax) as destination and modrm_reg as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp)); + + pevex->pfx = vex_f3; /* vmovdqu{32,64} */ + pevex->w = b & 1; + /* Switch to sib_index as source. 
*/ + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + opc[1] = (state->sib_index & 7) << 3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the destination and mask values. */ + n = 1 << (2 + evex.lr - ((b & 1) | evex.w)); + op_bytes = 4 << evex.w; + memset((void *)mmvalp + n * op_bytes, 0, 64 - n * op_bytes); + op_mask &= (1 << n) - 1; + + for ( i = 0; op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = ops->read(ea.mem.seg, + truncate_ea(ea.mem.off + (idx << state->sib_scale)), + (void *)mmvalp + i * op_bytes, op_bytes, ctxt); + if ( rc != X86EMUL_OKAY ) + { + /* + * If we've made some progress and the access did not fault, + * force a retry instead. This is for example necessary to + * cope with the limited capacity of HVM's MMIO cache. + */ + if ( rc != X86EMUL_EXCEPTION && done ) + rc = X86EMUL_RETRY; + break; + } + + op_mask &= ~(1 << i); + done = true; + +#ifdef __XEN__ + if ( op_mask && local_events_need_delivery() ) + { + rc = X86EMUL_RETRY; + break; + } +#endif + } + + /* Write destination and mask registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x6f; /* vmovdqa{32,64} */ + pevex->opmsk = 0; + /* Use modrm_reg as destination and (%rax) as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "+m" (*mmvalp) : "a" (mmvalp)); + + /* + * kmovw: This is VEX-encoded, so we can't use pevex. Avoid copy_VEX() etc + * as well, since we can easily use the 2-byte VEX form here. + */ + opc -= EVEX_PFX_BYTES; + opc[0] = 0xc5; + opc[1] = 0xf8; + opc[2] = 0x90; + /* Use (%rax) as source. */ + opc[3] = evex.opmsk << 3; + opc[4] = 0xc3; + + invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask)); + put_stub(stub); + + state->simd_size = simd_none; break; } --- a/xen/arch/x86/x86_emulate/x86_emulate.h +++ b/xen/arch/x86/x86_emulate/x86_emulate.h @@ -662,8 +662,6 @@ static inline unsigned long *decode_gpr( BUILD_BUG_ON(ARRAY_SIZE(cpu_user_regs_gpr_offsets) & (ARRAY_SIZE(cpu_user_regs_gpr_offsets) - 1)); - ASSERT(modrm < ARRAY_SIZE(cpu_user_regs_gpr_offsets)); - /* Note that this also acts as array_access_nospec() stand-in. 
*/ modrm &= ARRAY_SIZE(cpu_user_regs_gpr_offsets) - 1; From patchwork Fri Mar 15 10:59:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854519 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B012F1575 for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 988FD2A6AF for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8A0CF2A938; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1C8B82A6AF for ; Fri, 15 Mar 2019 11:01:04 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kYc-00082q-QF; Fri, 15 Mar 2019 10:59:22 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kYb-00082d-MO for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:59:21 +0000 X-Inumbo-ID: 6250d186-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 6250d186-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 10:59:20 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:59:19 -0600 Message-Id: <5C8B8587020000780021F251@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:59:19 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 33/50] x86emul: add high register S/G test cases X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP In order to verify that in particular the index register decoding works correctly in the S/G emulation paths, add dedicated (64-bit only) cases disallowing the compiler to use the lower registers. Other than in the generic SIMD case, where occasional uses of %xmm or %ymm registers in generated code cause various internal compiler errors when disallowing use of all of the lower 16 registers (apparently due to insn templates trying to use AVX2 encodings), doing so here in the AVX512F case looks to be fine. While the main goal here is the AVX512F case, add an AVX2 variant as well. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. 
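For illustration only (the snippet below is not part of the patch and the function name is made up): with the lower register files reserved on the command line, roughly as the new Makefile flavours do via -ffixed-ymm<N> / -ffixed-zmm<N>, the compiler has no choice but to allocate destination, index, and temporary vectors to %zmm8 and above (or %zmm16 and above for the "highest" flavour), which is what forces the emulator's widened ModRM.reg and VSIB index decoding to actually be exercised. A minimal sketch, assuming GCC with AVX512F intrinsic support:

#include <immintrin.h>

/* Hypothetical example, not taken from simd-sg.c; build with something like
   -O2 -mavx512f -ffixed-zmm1 ... -ffixed-zmm7 (as the "higher" flavour does). */
__m512i gather_high(const int *base, __m512i idx)
{
    /* vpgatherdd: gather sixteen 32-bit elements via 32-bit indices. */
    return _mm512_i32gather_epi32(idx, base, 4);
}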
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -147,6 +147,12 @@ $(foreach flavor,$(SIMD) $(FMA),$(eval $ $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) +first-string = $(shell for s in $(1); do echo "$$s"; break; done) + +avx2-sg-cflags-x86_64 := "-D_high $(foreach n,7 6 5 4 3 2 1,-ffixed-ymm$(n)) $(call first-string,$(avx2-sg-cflags))" +avx512f-sg-cflags-x86_64 := "-D_higher $(foreach n,7 6 5 4 3 2 1,-ffixed-zmm$(n)) $(call first-string,$(avx512f-sg-cflags))" +avx512f-sg-cflags-x86_64 += "-D_highest $(foreach n,15 14 13 12 11 10 9 8,-ffixed-zmm$(n)) $(call first-string,$(avx512f-sg-cflags-x86_64))" + $(addsuffix .h,$(TESTCASES)): %.h: %.c testcase.mk Makefile rm -f $@.new $*.bin $(foreach arch,$(filter-out $(XEN_COMPILE_ARCH),x86_32) $(XEN_COMPILE_ARCH), \ --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -266,6 +266,9 @@ static const struct { SIMD(AVX2 S/G i64[4x32], avx2_sg, 32x4i8), SIMD(AVX2 S/G i32[4x64], avx2_sg, 32x8i4), SIMD(AVX2 S/G i64[4x64], avx2_sg, 32x8i8), +#ifdef __x86_64__ + SIMD_(64, AVX2 S/G %ymm8+, avx2_sg, high), +#endif SIMD(XOP 128bit single, xop, 16f4), SIMD(XOP 256bit single, xop, 32f4), SIMD(XOP 128bit double, xop, 16f8), @@ -303,6 +306,10 @@ static const struct { SIMD(AVX512F S/G i64[ 8x32], avx512f_sg, 64x4i8), SIMD(AVX512F S/G i32[ 8x64], avx512f_sg, 64x8i4), SIMD(AVX512F S/G i64[ 8x64], avx512f_sg, 64x8i8), +#ifdef __x86_64__ + SIMD_(64, AVX512F S/G %zmm8+, avx512f_sg, higher), + SIMD_(64, AVX512F S/G %zmm16+, avx512f_sg, highest), +#endif AVX512VL(VL f32x4, avx512f, 16f4), AVX512VL(VL f64x2, avx512f, 16f8), AVX512VL(VL f32x8, avx512f, 32f4), From patchwork Fri Mar 15 10:59:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854523 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 22C381575 for ; Fri, 15 Mar 2019 11:02:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 094BB2A949 for ; Fri, 15 Mar 2019 11:02:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F20142A94B; Fri, 15 Mar 2019 11:02:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3F16F2A949 for ; Fri, 15 Mar 2019 11:02:41 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZ5-00089G-5V; Fri, 15 Mar 2019 10:59:51 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZ4-000896-8f for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 10:59:50 +0000 X-Inumbo-ID: 72042404-4711-11e9-9a43-9bcba378de18 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by 
us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 72042404-4711-11e9-9a43-9bcba378de18; Fri, 15 Mar 2019 10:59:47 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 04:59:46 -0600 Message-Id: <5C8B85A2020000780021F254@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 04:59:46 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 34/50] x86emul: support AVX512F scatter insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP This completes support of AVX512F in the insn emulator. Note that in the test harness there's a little bit of trickery needed to get around the not fully consistent naming of AVX512VL gather and scatter built-ins. To suppress expansion of the "di" and "si" tokens they get constructed by token concatenation in BS(), which is different from BG(). Signed-off-by: Jan Beulich --- TBD: I couldn't really decide whether to duplicate code or merge scatter into gather emulation. --- v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -270,6 +270,8 @@ static const struct test avx512f_all[] = INSN(prolv, 66, 0f38, 15, vl, dq, vl), INSNX(pror, 66, 0f, 72, 0, vl, dq, vl), INSN(prorv, 66, 0f38, 14, vl, dq, vl), + INSN(pscatterd, 66, 0f38, a0, vl, dq, el), + INSN(pscatterq, 66, 0f38, a1, vl, dq, el), INSN(pshufd, 66, 0f, 70, vl, d, vl), INSN(pslld, 66, 0f, f2, el_4, d, vl), INSNX(pslld, 66, 0f, 72, 6, vl, d, vl), @@ -305,6 +307,8 @@ static const struct test avx512f_all[] = INSN(rsqrt14, 66, 0f38, 4f, el, sd, el), INSN(scalef, 66, 0f38, 2c, vl, sd, vl), INSN(scalef, 66, 0f38, 2d, el, sd, el), + INSN(scatterd, 66, 0f38, a2, vl, sd, el), + INSN(scatterq, 66, 0f38, a3, vl, sd, el), INSN_PFP(shuf, 0f, c6), INSN_FP(sqrt, 0f, 51), INSN_FP(sub, 0f, 5c), --- a/tools/tests/x86_emulator/simd-sg.c +++ b/tools/tests/x86_emulator/simd-sg.c @@ -48,10 +48,14 @@ typedef long long __attribute__((vector_ # endif # define BG_(dt, it, reg, mem, idx, msk, scl) \ __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl) +# define BS_(dt, it, mem, idx, reg, msk, scl) \ + __builtin_ia32_scatter##it##dt(mem, to_mask(msk), idx, reg, scl) # else # define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) # define BG_(dt, it, reg, mem, idx, msk, scl) \ __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl) +# define BS_(dt, it, mem, idx, reg, msk, scl) \ + __builtin_ia32_scatter##it##dt(mem, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), idx, reg, scl) # endif /* * Instead of replicating the main IDX_SIZE conditional below three times, use @@ -59,6 +63,7 @@ typedef long long __attribute__((vector_ * respective relevant macro argument tokens. 
*/ # define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl) +# define BS(dt, it, mem, idx, reg, msk, scl) BS_(dt, it##i, mem, idx, reg, msk, scl) # if VEC_MAX < 64 /* * The sub-512-bit built-ins have an extra "3" infix, presumably because the @@ -82,22 +87,30 @@ typedef long long __attribute__((vector_ # if IDX_SIZE == 4 # if INT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16si, s, mem, idx, reg, msk, scl) # elif INT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl)) +# define scatter(mem, idx, reg, msk, scl) BS(v8di, s, mem, idx, (vdi_t)(reg), msk, scl) # elif FLOAT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16sf, s, mem, idx, reg, msk, scl) # elif FLOAT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v8df, s, mem, idx, reg, msk, scl) # endif # elif IDX_SIZE == 8 # if INT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16si, d, mem, (idi_t)(idx), reg, msk, scl) # elif INT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl)) +# define scatter(mem, idx, reg, msk, scl) BS(v8di, d, mem, (idi_t)(idx), (vdi_t)(reg), msk, scl) # elif FLOAT_SIZE == 4 # define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v16sf, d, mem, (idi_t)(idx), reg, msk, scl) # elif FLOAT_SIZE == 8 # define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl) +# define scatter(mem, idx, reg, msk, scl) BS(v8df, d, mem, (idi_t)(idx), reg, msk, scl) # endif # endif #elif defined(__AVX2__) @@ -195,6 +208,8 @@ const typeof((vec_t){}[0]) array[] = { GLUE(PUT, VEC_MAX)(VEC_MAX + 1) }; +typeof((vec_t){}[0]) out[VEC_MAX * 2]; + int sg_test(void) { unsigned int i; @@ -275,5 +290,41 @@ int sg_test(void) # endif #endif +#ifdef scatter + + for ( i = 0; i < sizeof(out) / sizeof(*out); ++i ) + out[i] = 0; + + for ( i = 0; i < ITEM_COUNT; ++i ) + x[i] = i + 1; + + touch(x); + + scatter(out, (idx_t){}, x, (vec_t){ 1 } != 0, 1); + if ( out[0] != 1 ) + return __LINE__; + for ( i = 1; i < ITEM_COUNT; ++i ) + if ( out[i] ) + return __LINE__; + + scatter(out, (idx_t){}, x, full, 1); + if ( out[0] != ITEM_COUNT ) + return __LINE__; + for ( i = 1; i < ITEM_COUNT; ++i ) + if ( out[i] ) + return __LINE__; + + scatter(out, idx, x, full, ELEM_SIZE); + for ( i = 1; i <= ITEM_COUNT; ++i ) + if ( out[i] != i ) + return __LINE__; + + scatter(out, inv, x, full, ELEM_SIZE); + for ( i = 1; i <= ITEM_COUNT; ++i ) + if ( out[i] != ITEM_COUNT + 1 - i ) + return __LINE__; + +#endif + return 0; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -508,6 +508,7 @@ static const struct ext0f38_table { [0x9d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x9e] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x9f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xa0 ... 0xa3] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xa6 ... 
0xa8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xa9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xaa] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -9330,6 +9331,102 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa0): /* vpscatterd{d,q} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa1): /* vpscatterq{d,q} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa2): /* vscatterdp{s,d} [xyz]mm,mem{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa3): /* vscatterqp{s,d} [xyz]mm,mem{k} */ + { + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; + bool done = false; + + ASSERT(ea.type == OP_MEM); + fail_if(!ops->write); + generate_exception_if((!evex.opmsk || evex.brs || evex.z || + evex.reg != 0xf || + modrm_reg == state->sib_index), + EXC_UD); + avx512_vlen_check(false); + host_and_vcpu_must_have(avx512f); + get_fpu(X86EMUL_FPU_zmm); + + /* Read source and index registers. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + opc[0] = 0x7f; /* vmovdqa{32,64} */ + /* Use (%rax) as destination and modrm_reg as source. */ + pevex->b = 1; + opc[1] = (modrm_reg & 7) << 3; + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp)); + + pevex->pfx = vex_f3; /* vmovdqu{32,64} */ + pevex->w = b & 1; + /* Switch to sib_index as source. */ + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + opc[1] = (state->sib_index & 7) << 3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the mask value. */ + n = 1 << (2 + evex.lr - ((b & 1) | evex.w)); + op_bytes = 4 << evex.w; + op_mask &= (1 << n) - 1; + + for ( i = 0; op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = ops->write(ea.mem.seg, + truncate_ea(ea.mem.off + (idx << state->sib_scale)), + (void *)mmvalp + i * op_bytes, op_bytes, ctxt); + if ( rc != X86EMUL_OKAY ) + { + /* See comment in gather emulation. */ + if ( rc != X86EMUL_EXCEPTION && done ) + rc = X86EMUL_RETRY; + break; + } + + op_mask &= ~(1 << i); + done = true; + +#ifdef __XEN__ + if ( op_mask && local_events_need_delivery() ) + { + rc = X86EMUL_RETRY; + break; + } +#endif + } + + /* Write mask register. See comment in gather emulation. */ + opc = get_stub(stub); + opc[0] = 0xc5; + opc[1] = 0xf8; + opc[2] = 0x90; + /* Use (%rax) as source. 
*/ + opc[3] = evex.opmsk << 3; + opc[4] = 0xc3; + + invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask)); + put_stub(stub); + + state->simd_size = simd_none; + break; + } + case X86EMUL_OPC(0x0f38, 0xc8): /* sha1nexte xmm/m128,xmm */ case X86EMUL_OPC(0x0f38, 0xc9): /* sha1msg1 xmm/m128,xmm */ case X86EMUL_OPC(0x0f38, 0xca): /* sha1msg2 xmm/m128,xmm */ From patchwork Fri Mar 15 11:00:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854525 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F182113B5 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D75572A948 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA1562A94A; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2B1CA2A948 for ; Fri, 15 Mar 2019 11:02:57 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZb-0000PT-I9; Fri, 15 Mar 2019 11:00:23 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kZa-0000PF-9B for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:00:22 +0000 X-Inumbo-ID: 85e4d830-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 85e4d830-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:00:20 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:00:19 -0600 Message-Id: <5C8B85C3020000780021F257@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:00:19 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 35/50] x86emul: support AVX512PF insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Some adjustments are necessary to the EVEX Disp8 scaling test code to account for the zero byte reads/writes, which get issued for the test harness only. Signed-off-by: Jan Beulich --- v8: #GP/#SS don't arise here. Add previously missed change to emul_test_init(). v7: Re-base. v6: New. 
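As a rough usage illustration (hypothetical code, not part of the patch): the prefetch forms take the same VSIB memory operand as real gathers/scatters but transfer no data, which is why the harness records them as zero-byte accesses. Assuming GCC's AVX512PF intrinsics:

#include <immintrin.h>

/* Hypothetical example; requires -mavx512pf. */
void prefetch_gather(const float *base, __m512i idx)
{
    /* vgatherpf0dps: prefetch the addressed elements (T0 hint); no data is moved. */
    _mm512_prefetch_i32gather_ps(idx, base, 4, _MM_HINT_T0);
}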
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -520,6 +520,17 @@ static const struct test avx512er_512[] INSN(rsqrt28, 66, 0f38, cd, el, sd, el), }; +static const struct test avx512pf_512[] = { + INSNX(gatherpf0d, 66, 0f38, c6, 1, vl, sd, el), + INSNX(gatherpf0q, 66, 0f38, c7, 1, vl, sd, el), + INSNX(gatherpf1d, 66, 0f38, c6, 2, vl, sd, el), + INSNX(gatherpf1q, 66, 0f38, c7, 2, vl, sd, el), + INSNX(scatterpf0d, 66, 0f38, c6, 5, vl, sd, el), + INSNX(scatterpf0q, 66, 0f38, c7, 5, vl, sd, el), + INSNX(scatterpf1d, 66, 0f38, c6, 6, vl, sd, el), + INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -580,7 +591,7 @@ static bool record_access(enum x86_segme static int read(enum x86_segment seg, unsigned long offset, void *p_data, unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - if ( !record_access(seg, offset, bytes) ) + if ( !record_access(seg, offset, bytes + !bytes) ) return X86EMUL_UNHANDLEABLE; memset(p_data, 0, bytes); return X86EMUL_OKAY; @@ -589,7 +600,7 @@ static int read(enum x86_segment seg, un static int write(enum x86_segment seg, unsigned long offset, void *p_data, unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - if ( !record_access(seg, offset, bytes) ) + if ( !record_access(seg, offset, bytes + !bytes) ) return X86EMUL_UNHANDLEABLE; return X86EMUL_OKAY; } @@ -597,7 +608,7 @@ static int write(enum x86_segment seg, u static void test_one(const struct test *test, enum vl vl, unsigned char *instr, struct x86_emulate_ctxt *ctxt) { - unsigned int vsz, esz, i; + unsigned int vsz, esz, i, n; int rc; bool sg = strstr(test->mnemonic, "gather") || strstr(test->mnemonic, "scatter"); @@ -725,10 +736,20 @@ static void test_one(const struct test * for ( i = 0; i < (test->scale == SC_vl ? vsz : esz); ++i ) if ( accessed[i] ) goto fail; - for ( ; i < (test->scale == SC_vl ? vsz : esz) + (sg ? esz : vsz); ++i ) + + n = test->scale == SC_vl ? vsz : esz; + if ( !sg ) + n += vsz; + else if ( !strstr(test->mnemonic, "pf") ) + n += esz; + else + ++n; + + for ( ; i < n; ++i ) if ( accessed[i] != (sg ? (vsz / esz) >> (test->opc & 1 & !evex.w) : 1) ) goto fail; + for ( ; i < ARRAY_SIZE(accessed); ++i ) if ( accessed[i] ) goto fail; @@ -887,6 +908,8 @@ void evex_disp8_test(void *instr, struct RUN(avx512dq, no128); RUN(avx512dq, 512); RUN(avx512er, 512); +#define cpu_has_avx512pf cpu_has_avx512f + RUN(avx512pf, 512); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); } --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -73,6 +73,7 @@ bool emul_test_init(void) */ cp.basic.movbe = true; cp.feat.adx = true; + cp.feat.avx512pf = cp.feat.avx512f; cp.feat.rdpid = true; cp.extd.clzero = true; @@ -135,12 +136,14 @@ int emul_test_cpuid( res->c |= 1U << 22; /* - * The emulator doesn't itself use ADCX/ADOX/RDPID, so we can always run - * the respective tests. + * The emulator doesn't itself use ADCX/ADOX/RDPID nor the S/G prefetch + * insns, so we can always run the respective tests. 
*/ if ( leaf == 7 && subleaf == 0 ) { res->b |= 1U << 19; + if ( res->b & (1U << 16) ) + res->b |= 1U << 26; res->c |= 1U << 22; } --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -525,6 +525,7 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xc6 ... 0xc7] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xc9] = { .simd_size = simd_other }, [0xca] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, @@ -1903,6 +1904,7 @@ static bool vcpu_has( #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) +#define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) #define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) @@ -9425,6 +9427,94 @@ x86_emulate( state->simd_size = simd_none; break; + } + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc6): + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc7): + { +#ifndef __XEN__ + typeof(evex) *pevex; + union { + int32_t dw[16]; + int64_t qw[8]; + } index; +#endif + + ASSERT(ea.type == OP_MEM); + generate_exception_if((!cpu_has_avx512f || !evex.opmsk || evex.brs || + evex.z || evex.reg != 0xf || evex.lr != 2), + EXC_UD); + + switch ( modrm_reg & 7 ) + { + case 1: /* vgatherpf0{d,q}p{s,d} mem{k} */ + case 2: /* vgatherpf1{d,q}p{s,d} mem{k} */ + case 5: /* vscatterpf0{d,q}p{s,d} mem{k} */ + case 6: /* vscatterpf1{d,q}p{s,d} mem{k} */ + vcpu_must_have(avx512pf); + break; + default: + generate_exception(EXC_UD); + } + + get_fpu(X86EMUL_FPU_zmm); + +#ifndef __XEN__ + /* + * For the test harness perform zero byte memory accesses, such that + * in particular correct Disp8 scaling can be verified. + */ + fail_if((modrm_reg & 4) && !ops->write); + + /* Read index register. */ + opc = init_evex(stub); + pevex = copy_EVEX(opc, evex); + pevex->opcx = vex_0f; + /* vmovdqu{32,64} */ + opc[0] = 0x7f; + pevex->pfx = vex_f3; + pevex->w = b & 1; + /* Use (%rax) as destination and sib_index as source. */ + pevex->b = 1; + opc[1] = (state->sib_index & 7) << 3; + pevex->r = !mode_64bit() || !(state->sib_index & 0x08); + pevex->R = !mode_64bit() || !(state->sib_index & 0x10); + pevex->RX = 1; + opc[2] = 0xc3; + + invoke_stub("", "", "=m" (index) : "a" (&index)); + put_stub(stub); + + /* Clear untouched parts of the mask value. */ + n = 1 << (4 - ((b & 1) | evex.w)); + op_mask &= (1 << n) - 1; + + for ( i = 0; rc == X86EMUL_OKAY && op_mask; ++i ) + { + signed long idx = b & 1 ? index.qw[i] : index.dw[i]; + + if ( !(op_mask & (1 << i)) ) + continue; + + rc = (modrm_reg & 4 + ? ops->write + : ops->read)(ea.mem.seg, + truncate_ea(ea.mem.off + + (idx << state->sib_scale)), + NULL, 0, ctxt); + if ( rc == X86EMUL_EXCEPTION ) + { + /* Squash memory access related exceptions. 
*/ + x86_emul_reset_event(ctxt); + rc = X86EMUL_OKAY; + } + + op_mask &= ~(1 << i); + } +#endif + + state->simd_size = simd_none; + break; } case X86EMUL_OPC(0x0f38, 0xc8): /* sha1nexte xmm/m128,xmm */ From patchwork Fri Mar 15 11:00:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854527 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 65C7F1575 for ; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4AE562A948 for ; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3CA7B2A94A; Fri, 15 Mar 2019 11:03:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C09382A948 for ; Fri, 15 Mar 2019 11:03:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ka1-0000Va-TV; Fri, 15 Mar 2019 11:00:49 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ka0-0000VL-BZ for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:00:48 +0000 X-Inumbo-ID: 950d664a-4711-11e9-9e4d-bf46a59f0c42 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 950d664a-4711-11e9-9e4d-bf46a59f0c42; Fri, 15 Mar 2019 11:00:45 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:00:44 -0600 Message-Id: <5C8B85DD020000780021F2AC@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:00:45 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 36/50] x86emul: support AVX512CD insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Since the insns here and in particular their memory access patterns follow the usual scheme I didn't think it was necessary to add contrived tests specifically for them, beyond the Disp8 scaling ones. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. 
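For context, a minimal (hypothetical) user of the two main insns being added, assuming GCC's AVX512CD intrinsics:

#include <immintrin.h>

/* Hypothetical example; requires -mavx512cd. */
__m512i conflict_lzcnt(__m512i idx)
{
    /* vpconflictd: per element, a bitmap of earlier elements holding the same value. */
    __m512i conf = _mm512_conflict_epi32(idx);
    /* vplzcntd: per-element leading-zero count. */
    return _mm512_lzcnt_epi32(conf);
}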
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -458,6 +458,13 @@ static const struct test avx512bw_128[] INSN(pinsrw, 66, 0f, c4, el, w, el), }; +static const struct test avx512cd_all[] = { +// pbroadcastmb2q, f3, 0f38, 2a, q +// pbroadcastmw2d, f3, 0f38, 3a, d + INSN(pconflict, 66, 0f38, c4, vl, dq, vl), + INSN(plzcnt, 66, 0f38, 44, vl, dq, vl), +}; + static const struct test avx512dq_all[] = { INSN_PFP(and, 0f, 54), INSN_PFP(andn, 0f, 55), @@ -903,6 +910,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512f, 512); RUN(avx512bw, all); RUN(avx512bw, 128); + RUN(avx512cd, all); RUN(avx512dq, all); RUN(avx512dq, 128); RUN(avx512dq, no128); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -138,6 +138,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) +#define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -473,6 +473,7 @@ static const struct ext0f38_table { [0x41] = { .simd_size = simd_packed_int, .two_op = 1 }, [0x42] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x43] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x44] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x45 ... 0x47] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x4c] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -525,6 +526,7 @@ static const struct ext0f38_table { [0xbd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xbe] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xbf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xc4] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0xc6 ... 
0xc7] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0xc8] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xc9] = { .simd_size = simd_other }, @@ -1906,6 +1908,7 @@ static bool vcpu_has( #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) #define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) #define vcpu_has_avx512er() vcpu_has( 7, EBX, 27, ctxt, ops) +#define vcpu_has_avx512cd() vcpu_has( 7, EBX, 28, ctxt, ops) #define vcpu_has_sha() vcpu_has( 7, EBX, 29, ctxt, ops) #define vcpu_has_avx512bw() vcpu_has( 7, EBX, 30, ctxt, ops) #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) @@ -8816,6 +8819,20 @@ x86_emulate( evex.opcx = vex_0f; goto vmovdqa; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x2a): /* vpbroadcastmb2q k,[xyz]mm */ + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x3a): /* vpbroadcastmw2d k,[xyz]mm */ + generate_exception_if((ea.type != OP_REG || evex.opmsk || + evex.w == ((b >> 4) & 1)), + EXC_UD); + d |= TwoOp; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc4): /* vpconflict{d,q} [xyz]mm/mem,[xyz]mm{k} */ + fault_suppression = false; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x44): /* vplzcnt{d,q} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512cd); + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x2c): /* vmaskmovps mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2d): /* vmaskmovpd mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x2e): /* vmaskmovps {x,y}mm,{x,y}mm,mem */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -103,6 +103,7 @@ #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) #define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) +#define cpu_has_avx512cd boot_cpu_has(X86_FEATURE_AVX512CD) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) #define cpu_has_avx512bw boot_cpu_has(X86_FEATURE_AVX512BW) #define cpu_has_avx512vl boot_cpu_has(X86_FEATURE_AVX512VL) From patchwork Fri Mar 15 11:01:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854531 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 035451575 for ; Fri, 15 Mar 2019 11:04:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF62228AA8 for ; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D38CD28B17; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 66CAB28AA8 for ; Fri, 15 Mar 2019 11:04:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kaW-0000eL-O5; Fri, 15 Mar 2019 11:01:20 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kaV-0000du-6c for 
xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:01:19 +0000 X-Inumbo-ID: a8758a8b-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id a8758a8b-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:01:18 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:01:17 -0600 Message-Id: <5C8B85FC020000780021F2AF@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:01:16 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 37/50] x86emul: complete support of AVX512_VBMI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also add testing of ones support for which was added before. Sadly gcc's command line option naming is not in line with Intel's naming of the feature, which makes it necessary to mis-name things in the test harness. Since the only new insn here and in particular its memory access pattern follows the usual scheme, I didn't think it was necessary to add a contrived test specifically for it, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v6: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -16,7 +16,7 @@ vpath %.c $(XEN_ROOT)/xen/lib/x86 CFLAGS += $(CFLAGS_xeninclude) -SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er +SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg TESTCASES := blowfish $(SIMD) $(FMA) $(SG) @@ -83,6 +83,9 @@ avx512dq-flts := $(avx512f-flts) avx512er-vecs := 64 avx512er-ints := avx512er-flts := 4 8 +avx512vbmi-vecs := $(avx512bw-vecs) +avx512vbmi-ints := $(avx512bw-ints) +avx512vbmi-flts := $(avx512bw-flts) avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -542,6 +542,7 @@ static const struct test avx512_vbmi_all INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), INSN(permt2b, 66, 0f38, 7d, vl, b, vl), + INSN(pmultishiftqb, 66, 0f38, 83, vl, q, vl), }; static const struct test avx512_vbmi2_all[] = { --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -27,6 +27,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw.h" #include "avx512dq.h" #include "avx512er.h" +#include "avx512vbmi.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -127,6 +128,16 @@ static bool simd_check_avx512bw_vl(void) return cpu_has_avx512bw && cpu_has_avx512vl; } +static bool simd_check_avx512vbmi(void) +{ + return cpu_has_avx512_vbmi; +} + +static bool simd_check_avx512vbmi_vl(void) +{ + return cpu_has_avx512_vbmi && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -372,6 +383,18 @@ static const struct { SIMD(AVX512ER f32x16, avx512er, 64f4), SIMD(AVX512ER f64 scalar,avx512er, f8), SIMD(AVX512ER f64x8, avx512er, 64f8), + SIMD(AVX512_VBMI s8x64, avx512vbmi, 64i1), + SIMD(AVX512_VBMI u8x64, avx512vbmi, 64u1), + SIMD(AVX512_VBMI s16x32, avx512vbmi, 64i2), + SIMD(AVX512_VBMI u16x32, avx512vbmi, 64u2), + AVX512VL(_VBMI+VL s8x16, avx512vbmi, 16i1), + AVX512VL(_VBMI+VL u8x16, avx512vbmi, 16u1), + AVX512VL(_VBMI+VL s8x32, avx512vbmi, 32i1), + AVX512VL(_VBMI+VL u8x32, avx512vbmi, 32u1), + AVX512VL(_VBMI+VL s16x8, avx512vbmi, 16i2), + AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), + AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), + AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -493,6 +493,7 @@ static const struct ext0f38_table { [0x7a ... 0x7c] = { .simd_size = simd_none, .two_op = 1 }, [0x7d ... 0x7e] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x7f] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, + [0x83] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x88] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_dq }, [0x89] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_dq }, [0x8a] = { .simd_size = simd_packed_fp, .to_mem = 1, .two_op = 1, .d8s = d8s_dq }, @@ -9023,6 +9024,12 @@ x86_emulate( ASSERT(!state->simd_size); break; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x83): /* vpmultishiftqb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + host_and_vcpu_must_have(avx512_vbmi); + fault_suppression = false; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x8c): /* vpmaskmov{d,q} mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x8e): /* vpmaskmov{d,q} {x,y}mm,{x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); From patchwork Fri Mar 15 11:01:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854533 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FF3B15AC for ; Fri, 15 Mar 2019 11:04:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBD1928AA8 for ; Fri, 15 Mar 2019 11:04:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DFC6A28B1C; Fri, 15 Mar 2019 11:04:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6E34628AA8 for ; Fri, 15 Mar 2019 11:04:21 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by 
lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kb9-0000oX-4I; Fri, 15 Mar 2019 11:01:59 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kb7-0000oD-FI for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:01:57 +0000 X-Inumbo-ID: bebc2e36-4711-11e9-8b95-afa5702a057b Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id bebc2e36-4711-11e9-8b95-afa5702a057b; Fri, 15 Mar 2019 11:01:55 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:01:54 -0600 Message-Id: <5C8B8623020000780021F2B2@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:01:55 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 38/50] x86emul: support of AVX512* population count insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Plus the only other AVX512_BITALG one. As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich --- v7: Re-base. v6: New. 
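As a quick illustration of what gets emulated here (hypothetical code, assuming GCC with -mavx512bitalg and -mavx512vpopcntdq; the function name is made up):

#include <immintrin.h>

/* Hypothetical example, not from the test harness. */
void popcounts(__m512i v, __m512i *per_byte, __m512i *per_dword)
{
    /* vpopcntb (AVX512_BITALG): per-byte population count. */
    *per_byte = _mm512_popcnt_epi8(v);
    /* vpopcntd (AVX512_VPOPCNTDQ): per-doubleword population count. */
    *per_dword = _mm512_popcnt_epi32(v);
}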
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -538,6 +538,11 @@ static const struct test avx512pf_512[] INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), }; +static const struct test avx512_bitalg_all[] = { + INSN(popcnt, 66, 0f38, 54, vl, bw, vl), + INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -550,6 +555,10 @@ static const struct test avx512_vbmi2_al INSN(pexpand, 66, 0f38, 62, vl, bw, el), }; +static const struct test avx512_vpopcntdq_all[] = { + INSN(popcnt, 66, 0f38, 55, vl, dq, vl) +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -919,6 +928,8 @@ void evex_disp8_test(void *instr, struct RUN(avx512er, 512); #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); + RUN(avx512_bitalg, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); + RUN(avx512_vpopcntdq, all); } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -143,6 +143,8 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) +#define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,6 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x54 ... 0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 }, @@ -501,6 +502,7 @@ static const struct ext0f38_table { [0x8c] = { .simd_size = simd_packed_int }, [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 }, + [0x8f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq }, [0x96 ... 
0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, @@ -1915,6 +1917,8 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) +#define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8923,6 +8927,19 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8f): /* vpshufbitqmb [xyz]mm/mem,[xyz]mm,k{k} */ + generate_exception_if(evex.w || !evex.r || !evex.R || evex.z, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x54): /* vpopcnt{b,w} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_bitalg); + generate_exception_if(evex.brs, EXC_UD); + elem_bytes = 1 << evex.w; + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x55): /* vpopcnt{d,q} [xyz]mm/mem,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vpopcntdq); + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -111,6 +111,8 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) +#define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x80000007.edx */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -269,7 +269,7 @@ def crunch_numbers(state): # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask # registers), despite the SDM not formally making this connection. 
- AVX512BW: [AVX512_VBMI, AVX512_VBMI2], + AVX512BW: [AVX512_VBMI, AVX512_BITALG, AVX512_VBMI2], # The features: # * Single Thread Indirect Branch Predictors From patchwork Fri Mar 15 11:02:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854537 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3BA914DE for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C904A2A817 for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD0A72A823; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 455332A817 for ; Fri, 15 Mar 2019 11:04:54 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbc-0000ua-HS; Fri, 15 Mar 2019 11:02:28 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbb-0000uH-Lm for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:02:27 +0000 X-Inumbo-ID: d0b6b531-4711-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id d0b6b531-4711-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:02:25 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:02:25 -0600 Message-Id: <5C8B8640020000780021F2B5@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:02:24 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 39/50] x86emul: support of AVX512_IFMA insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Once again take the liberty and also correct the (public interface) name of the AVX512_IFMA feature flag to match the SDM, on the assumption that no external consumer has actually been using that flag so far. As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Reject EVEX.W=0. v6: New. 
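For reference, a minimal (hypothetical) use of the two insns, assuming GCC's IFMA intrinsics and -mavx512ifma:

#include <immintrin.h>

/* Hypothetical example; not from the test harness. */
__m512i madd52(__m512i acc, __m512i x, __m512i y)
{
    /* vpmadd52luq: per 64-bit lane, add the low 52 bits of the 104-bit
       product of the low 52 bits of x and y to the accumulator. */
    acc = _mm512_madd52lo_epu64(acc, x, y);
    /* vpmadd52huq: likewise, but adding the high 52 bits of that product. */
    return _mm512_madd52hi_epu64(acc, x, y);
}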
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -543,6 +543,11 @@ static const struct test avx512_bitalg_a INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), }; +static const struct test avx512_ifma_all[] = { + INSN(pmadd52huq, 66, 0f38, b5, vl, q, vl), + INSN(pmadd52luq, 66, 0f38, b4, vl, q, vl), +}; + static const struct test avx512_vbmi_all[] = { INSN(permb, 66, 0f38, 8d, vl, b, vl), INSN(permi2b, 66, 0f38, 75, vl, b, vl), @@ -929,6 +934,7 @@ void evex_disp8_test(void *instr, struct #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); RUN(avx512_bitalg, all); + RUN(avx512_ifma, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); RUN(avx512_vpopcntdq, all); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -137,6 +137,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_bmi2 cp.feat.bmi2 #define cpu_has_avx512f (cp.feat.avx512f && xcr0_mask(0xe6)) #define cpu_has_avx512dq (cp.feat.avx512dq && xcr0_mask(0xe6)) +#define cpu_has_avx512_ifma (cp.feat.avx512_ifma && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -521,6 +521,7 @@ static const struct ext0f38_table { [0xad] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xae] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xaf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xb4 ... 0xb5] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xb6 ... 0xb8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0xb9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xba] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, @@ -1907,6 +1908,7 @@ static bool vcpu_has( #define vcpu_has_rdseed() vcpu_has( 7, EBX, 18, ctxt, ops) #define vcpu_has_adx() vcpu_has( 7, EBX, 19, ctxt, ops) #define vcpu_has_smap() vcpu_has( 7, EBX, 20, ctxt, ops) +#define vcpu_has_avx512_ifma() vcpu_has( 7, EBX, 21, ctxt, ops) #define vcpu_has_clflushopt() vcpu_has( 7, EBX, 23, ctxt, ops) #define vcpu_has_clwb() vcpu_has( 7, EBX, 24, ctxt, ops) #define vcpu_has_avx512pf() vcpu_has( 7, EBX, 26, ctxt, ops) @@ -9470,6 +9472,12 @@ x86_emulate( break; } + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb4): /* vpmadd52luq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb5): /* vpmadd52huq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_ifma); + generate_exception_if(!evex.w, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xc6): case X86EMUL_OPC_EVEX_66(0x0f38, 0xc7): { --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -102,6 +102,7 @@ #define cpu_has_avx512dq boot_cpu_has(X86_FEATURE_AVX512DQ) #define cpu_has_rdseed boot_cpu_has(X86_FEATURE_RDSEED) #define cpu_has_smap boot_cpu_has(X86_FEATURE_SMAP) +#define cpu_has_avx512_ifma boot_cpu_has(X86_FEATURE_AVX512_IFMA) #define cpu_has_avx512er boot_cpu_has(X86_FEATURE_AVX512ER) #define cpu_has_avx512cd boot_cpu_has(X86_FEATURE_AVX512CD) #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -212,7 +212,7 @@ XEN_CPUFEATURE(AVX512DQ, 5*32+17) / XEN_CPUFEATURE(RDSEED, 5*32+18) /*A RDSEED instruction */ XEN_CPUFEATURE(ADX, 5*32+19) /*A ADCX, ADOX instructions */ XEN_CPUFEATURE(SMAP, 5*32+20) /*S Supervisor 
Mode Access Prevention */ -XEN_CPUFEATURE(AVX512IFMA, 5*32+21) /*A AVX-512 Integer Fused Multiply Add */ +XEN_CPUFEATURE(AVX512_IFMA, 5*32+21) /*A AVX-512 Integer Fused Multiply Add */ XEN_CPUFEATURE(CLFLUSHOPT, 5*32+23) /*A CLFLUSHOPT instruction */ XEN_CPUFEATURE(CLWB, 5*32+24) /*A CLWB instruction */ XEN_CPUFEATURE(AVX512PF, 5*32+26) /*A AVX-512 Prefetch Instructions */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -262,7 +262,7 @@ def crunch_numbers(state): # (which in practice depends on the EVEX prefix to encode) as well # as mask registers, and the instructions themselves. All further # AVX512 features are built on top of AVX512F - AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD, + AVX512F: [AVX512DQ, AVX512_IFMA, AVX512PF, AVX512ER, AVX512CD, AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, AVX512_VPOPCNTDQ], From patchwork Fri Mar 15 11:02:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854539 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40A8F15AC for ; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 294F92A822 for ; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DD832A88F; Fri, 15 Mar 2019 11:05:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AA7CB2A822 for ; Fri, 15 Mar 2019 11:05:07 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbw-00010m-U3; Fri, 15 Mar 2019 11:02:48 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kbv-00010L-4s for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:02:47 +0000 X-Inumbo-ID: dcbfce1a-4711-11e9-8da3-a701bc062552 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id dcbfce1a-4711-11e9-8da3-a701bc062552; Fri, 15 Mar 2019 11:02:46 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:02:45 -0600 Message-Id: <5C8B8655020000780021F2B8@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:02:45 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 40/50] x86emul: support remaining AVX512_VBMI2 insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George 
Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: Re-base over change earlier in the series. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -558,6 +558,14 @@ static const struct test avx512_vbmi_all static const struct test avx512_vbmi2_all[] = { INSN(pcompress, 66, 0f38, 63, vl, bw, el), INSN(pexpand, 66, 0f38, 62, vl, bw, el), + INSN(pshld, 66, 0f3a, 71, vl, dq, vl), + INSN(pshldv, 66, 0f38, 71, vl, dq, vl), + INSN(pshldvw, 66, 0f38, 70, vl, w, vl), + INSN(pshldw, 66, 0f3a, 70, vl, w, vl), + INSN(pshrd, 66, 0f3a, 73, vl, dq, vl), + INSN(pshrdv, 66, 0f38, 73, vl, dq, vl), + INSN(pshrdvw, 66, 0f38, 72, vl, w, vl), + INSN(pshrdw, 66, 0f3a, 72, vl, w, vl), }; static const struct test avx512_vpopcntdq_all[] = { --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -487,6 +487,7 @@ static const struct ext0f38_table { [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw }, [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw }, [0x64 ... 0x66] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, + [0x70 ... 0x73] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl }, [0x78] = { .simd_size = simd_other, .two_op = 1 }, @@ -611,6 +612,7 @@ static const struct ext0f3a_table { [0x6a ... 0x6b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x6c ... 0x6d] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x6e ... 0x6f] = { .simd_size = simd_scalar_opc, .four_op = 1 }, + [0x70 ... 0x73] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x78 ... 0x79] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x7a ... 0x7b] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0x7c ... 
0x7d] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -8993,6 +8995,16 @@ x86_emulate( } goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x70): /* vpshldvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x72): /* vpshrdvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + elem_bytes = 2; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x71): /* vpshldv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x73): /* vpshrdv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ @@ -10293,6 +10305,16 @@ x86_emulate( avx512_vlen_check(true); goto simd_imm8_zmm; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x70): /* vpshldw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x72): /* vpshrdw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + generate_exception_if(!evex.w, EXC_UD); + elem_bytes = 2; + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x71): /* vpshld{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x73): /* vpshrd{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vbmi2); + goto avx512f_imm8_no_sae; + case X86EMUL_OPC(0x0f3a, 0xcc): /* sha1rnds4 $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(sha); op_bytes = 16; From patchwork Fri Mar 15 11:04:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854541 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4E50314DE for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 343DF2A94B for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 285A82A94D; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 817762A94B for ; Fri, 15 Mar 2019 11:06:23 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdL-0001HH-HE; Fri, 15 Mar 2019 11:04:15 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdK-0001H5-CN for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:14 +0000 X-Inumbo-ID: 108dcb02-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 108dcb02-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:04:12 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; 
Fri, 15 Mar 2019 05:04:12 -0600 Message-Id: <5C8B86A9020000780021F2BB@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:09 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 41/50] x86emul: support AVX512_4FMAPS insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: Correct vcpu_has_*() insertion point. v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -538,6 +538,13 @@ static const struct test avx512pf_512[] INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el), }; +static const struct test avx512_4fmaps_512[] = { + INSN(4fmaddps, f2, 0f38, 9a, el_4, d, vl), + INSN(4fmaddss, f2, 0f38, 9b, el_4, d, vl), + INSN(4fnmaddps, f2, 0f38, aa, el_4, d, vl), + INSN(4fnmaddss, f2, 0f38, ab, el_4, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -941,6 +948,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512er, 512); #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); + RUN(avx512_4fmaps, 512); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -4274,6 +4274,81 @@ int main(int argc, char **argv) } #endif + printf("%-40s", "Testing v4fmaddps 32(%ecx),%zmm4,%zmm4{%k5}..."); + if ( stack_exec && cpu_has_avx512_4fmaps ) + { + decl_insn(v4fmaddps); + static const struct { + float f[16]; + } in = {{ + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 + }}, out = {{ + 1 + 1 * 9 + 2 * 10 + 3 * 11 + 4 * 12, + 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 16 + 16 * 9 + 17 * 10 + 18 * 11 + 19 * 12 + }}; + + asm volatile ( "vmovups %1, %%zmm4\n\t" + "vbroadcastss %%xmm4, %%zmm7\n\t" + "vaddps %%zmm4, %%zmm7, %%zmm5\n\t" + "vaddps %%zmm5, %%zmm7, %%zmm6\n\t" + "vaddps %%zmm6, %%zmm7, %%zmm7\n\t" + "kmovw %2, %%k5\n" + put_insn(v4fmaddps, + "v4fmaddps 32(%0), %%zmm4, %%zmm4%{%%k5%}") + :: "c" (NULL), "m" (in), "rmk" (0x8001) ); + + set_insn(v4fmaddps); + regs.ecx = (unsigned long)∈ + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(v4fmaddps) ) + goto fail; + + asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" : "=g" (rc) : "m" (out) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + + printf("%-40s", "Testing v4fnmaddss 16(%edx),%zmm4,%zmm4{%k3}..."); + if ( stack_exec && cpu_has_avx512_4fmaps ) + { + decl_insn(v4fnmaddss); + static const struct { + float f[16]; + } in = {{ + 1, 2, 3, 4, 5, 6, 7, 8 + }}, out = {{ + 1 - 1 * 5 - 2 * 6 - 3 * 7 - 4 * 8, 2, 3, 4 + }}; + + asm volatile ( "vmovups %1, %%xmm4\n\t" + "vaddss %%xmm4, %%xmm4, %%xmm5\n\t" + "vaddss %%xmm5, %%xmm4, %%xmm6\n\t" + "vaddss %%xmm6, %%xmm4, %%xmm7\n\t" + "kmovw %2, %%k3\n" + put_insn(v4fnmaddss, + "v4fnmaddss 16(%0), %%xmm4, %%xmm4%{%%k3%}") + :: 
"d" (NULL), "m" (in), "rmk" (1) ); + + set_insn(v4fnmaddss); + regs.edx = (unsigned long)∈ + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY || !check_eip(v4fnmaddss) ) + goto fail; + + asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" : "=g" (rc) : "m" (out) ); + if ( rc != 0xffff ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + #undef decl_insn #undef put_insn #undef set_insn --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -146,6 +146,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) +#define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1924,6 +1924,7 @@ static bool vcpu_has( #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) +#define vcpu_has_avx512_4fmaps() vcpu_has( 7, EDX, 3, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) #define vcpu_must_have(feat) \ @@ -3205,6 +3206,18 @@ x86_decode( state); state->simd_size = simd_other; } + + switch ( b ) + { + /* v4f{,n}madd{p,s}s need special casing */ + case 0x9a: case 0x9b: case 0xaa: case 0xab: + if ( evex.pfx == vex_f2 ) + { + disp8scale = 4; + state->simd_size = simd_128; + } + break; + } } break; @@ -9388,6 +9401,24 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ + host_and_vcpu_must_have(avx512_4fmaps); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr != 2), + EXC_UD); + op_mask = op_mask & 0xffff ? 0xf : 0; + goto simd_zmm; + + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9b): /* v4fmaddss m128,xmm+3,xmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0xab): /* v4fnmaddss m128,xmm+3,xmm{k} */ + host_and_vcpu_must_have(avx512_4fmaps); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr == 3), + EXC_UD); + op_mask = op_mask & 1 ? 
0xf : 0; + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa0): /* vpscatterd{d,q} [xyz]mm,mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xa1): /* vpscatterq{d,q} [xyz]mm,mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0xa2): /* vscatterdp{s,d} [xyz]mm,mem{k} */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -116,6 +116,9 @@ #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) +/* CPUID level 0x00000007:0.edx */ +#define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS) + /* CPUID level 0x80000007.edx */ #define cpu_has_itsc boot_cpu_has(X86_FEATURE_ITSC) From patchwork Fri Mar 15 11:04:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854543 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 06BB114DE for ; Fri, 15 Mar 2019 11:06:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E3CD72A94A for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D7B232A94C; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 70E332A94A for ; Fri, 15 Mar 2019 11:06:24 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdc-0001LO-SF; Fri, 15 Mar 2019 11:04:32 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kdb-0001L5-SV for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:31 +0000 X-Inumbo-ID: 1a1c6bec-4712-11e9-b2e3-97a58ff491c9 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 1a1c6bec-4712-11e9-b2e3-97a58ff491c9; Fri, 15 Mar 2019 11:04:29 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:04:28 -0600 Message-Id: <5C8B86BD020000780021F2BE@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:29 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 42/50] x86emul: support AVX512_4VNNIW insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, 
since the insns here and in particular their memory access patterns follow the AVX512_4FMAPS scheme, I didn't think it was necessary to add contrived tests specifically for them, beyond the Disp8 scaling ones. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: Correct vcpu_has_*() insertion point. v7: Re-base. v6: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -545,6 +545,11 @@ static const struct test avx512_4fmaps_5 INSN(4fnmaddss, f2, 0f38, ab, el_4, d, vl), }; +static const struct test avx512_4vnniw_512[] = { + INSN(p4dpwssd, f2, 0f38, 52, el_4, d, vl), + INSN(p4dpwssds, f2, 0f38, 53, el_4, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -949,6 +954,7 @@ void evex_disp8_test(void *instr, struct #define cpu_has_avx512pf cpu_has_avx512f RUN(avx512pf, 512); RUN(avx512_4fmaps, 512); + RUN(avx512_4vnniw, 512); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -146,6 +146,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) +#define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,6 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0x52 ... 0x53] = { .simd_size = simd_128, .d8s = 4 }, [0x54 ... 0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, @@ -1924,6 +1925,7 @@ static bool vcpu_has( #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) +#define vcpu_has_avx512_4vnniw() vcpu_has( 7, EDX, 2, ctxt, ops) #define vcpu_has_avx512_4fmaps() vcpu_has( 7, EDX, 3, ctxt, ops) #define vcpu_has_clzero() vcpu_has(0x80000008, EBX, 0, ctxt, ops) @@ -8944,6 +8946,15 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x52): /* vp4dpwssd m128,zmm+3,zmm{k} */ + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x53): /* vp4dpwssds m128,zmm+3,zmm{k} */ + host_and_vcpu_must_have(avx512_4vnniw); + generate_exception_if((ea.type != OP_MEM || evex.w || evex.brs || + evex.lr != 2), + EXC_UD); + op_mask = op_mask & 0xffff ? 
0xf : 0; + goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x8f): /* vpshufbitqmb [xyz]mm/mem,[xyz]mm,k{k} */ generate_exception_if(evex.w || !evex.r || !evex.R || evex.z, EXC_UD); /* fall through */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -117,6 +117,7 @@ #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) /* CPUID level 0x00000007:0.edx */ +#define cpu_has_avx512_4vnniw boot_cpu_has(X86_FEATURE_AVX512_4VNNIW) #define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS) /* CPUID level 0x80000007.edx */ From patchwork Fri Mar 15 11:04:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854545 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C7C7315AC for ; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A99EE2A94A for ; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B6CF2A94C; Fri, 15 Mar 2019 11:06:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DBA132A94A for ; Fri, 15 Mar 2019 11:06:42 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ke5-0001SF-6z; Fri, 15 Mar 2019 11:05:01 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4ke3-0001Ro-AC for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:04:59 +0000 X-Inumbo-ID: 2b67dcac-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 2b67dcac-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:04:57 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:04:57 -0600 Message-Id: <5C8B86D9020000780021F2C1@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:04:57 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 43/50] x86emul: support AVX512_VNNI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As in a few cases before, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test 
specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich --- v8: Re-base. v7: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -580,6 +580,13 @@ static const struct test avx512_vbmi2_al INSN(pshrdw, 66, 0f3a, 72, vl, w, vl), }; +static const struct test avx512_vnni_all[] = { + INSN(pdpbusd, 66, 0f38, 50, vl, d, vl), + INSN(pdpbusds, 66, 0f38, 51, vl, d, vl), + INSN(pdpwssd, 66, 0f38, 52, vl, d, vl), + INSN(pdpwssds, 66, 0f38, 53, vl, d, vl), +}; + static const struct test avx512_vpopcntdq_all[] = { INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; @@ -959,5 +966,6 @@ void evex_disp8_test(void *instr, struct RUN(avx512_ifma, all); RUN(avx512_vbmi, all); RUN(avx512_vbmi2, all); + RUN(avx512_vnni, all); RUN(avx512_vpopcntdq, all); } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -479,7 +479,7 @@ static const struct ext0f38_table { [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, - [0x52 ... 0x53] = { .simd_size = simd_128, .d8s = 4 }, + [0x50 ... 0x53] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x54 ... 
0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl }, [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 }, [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 }, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) #define vcpu_has_rdpid() vcpu_has( 7, ECX, 22, ctxt, ops) @@ -3211,6 +3212,8 @@ x86_decode( switch ( b ) { + /* vp4dpwssd{,s} need special casing */ + case 0x52: case 0x53: /* v4f{,n}madd{p,s}s need special casing */ case 0x9a: case 0x9b: case 0xaa: case 0xab: if ( evex.pfx == vex_f2 ) @@ -9412,6 +9415,14 @@ x86_emulate( avx512_vlen_check(true); goto simd_zmm; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vnni); + generate_exception_if(evex.w, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4fmaps); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) #define cpu_has_rdpid boot_cpu_has(X86_FEATURE_RDPID) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! 
OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -264,7 +264,7 @@ def crunch_numbers(state): # AVX512 features are built on top of AVX512F AVX512F: [AVX512DQ, AVX512_IFMA, AVX512PF, AVX512ER, AVX512CD, AVX512BW, AVX512VL, AVX512_4VNNIW, AVX512_4FMAPS, - AVX512_VPOPCNTDQ], + AVX512_VNNI, AVX512_VPOPCNTDQ], # AVX512 extensions acting (solely) on vectors of bytes/words are made # dependents of AVX512BW (as to requiring wider than 16-bit mask From patchwork Fri Mar 15 11:05:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854547 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0EFCE15AC for ; Fri, 15 Mar 2019 11:07:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E8A9B2A94A for ; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC7B52A94F; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 23CE92A94A for ; Fri, 15 Mar 2019 11:07:13 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kec-0001ac-JS; Fri, 15 Mar 2019 11:05:34 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kea-0001aM-Jd for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:05:32 +0000 X-Inumbo-ID: 3f3ea9bc-4712-11e9-b2f8-0b5f97a8f175 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 3f3ea9bc-4712-11e9-b2f8-0b5f97a8f175; Fri, 15 Mar 2019 11:05:31 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:05:30 -0600 Message-Id: <5C8B86FB020000780021F31D@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:05:31 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 44/50] x86emul: support VPCLMULQDQ insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: 
George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As to the feature dependency adjustment, while strictly speaking AVX is a sufficient prereq (to have YMM registers), 256-bit vectors of integers have got fully introduced with AVX2 only. Sadly gcc can't be used as a reference here: They don't provide any AVX512-independent built-in at all. Along the lines of PCLMULQDQ, since the insns here and in particular their memory access patterns follow the usual scheme, I didn't think it was necessary to add a contrived test specifically for them, beyond the Disp8 scaling one. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- TBD: Should VPCLMULQDQ also depend on PCLMULQDQ? --- v8: No need to set fault_suppression to false. v7: New. --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,10 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +static const struct test vpclmulqdq_all[] = { + INSN(pclmulqdq, 66, 0f3a, 44, vl, q_nb, vl) +}; + static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 }; static const unsigned char vl_128[] = { VL_128 }; static const unsigned char vl_no128[] = { VL_512, VL_256 }; @@ -968,4 +972,9 @@ void evex_disp8_test(void *instr, struct RUN(avx512_vbmi2, all); RUN(avx512_vnni, all); RUN(avx512_vpopcntdq, all); + + if ( cpu_has_avx512f ) + { + RUN(vpclmulqdq, all); + } } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -594,7 +594,7 @@ static const struct ext0f3a_table { [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x40 ... 0x41] = { .simd_size = simd_packed_fp }, [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, - [0x44] = { .simd_size = simd_packed_int }, + [0x44] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0x46] = { .simd_size = simd_packed_int }, [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x4a ... 
0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 }, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) #define vcpu_has_avx512_vpopcntdq() vcpu_has( 7, ECX, 14, ctxt, ops) @@ -10219,13 +10220,19 @@ x86_emulate( goto opmask_shift_imm; case X86EMUL_OPC_66(0x0f3a, 0x44): /* pclmulqdq $imm8,xmm/m128,xmm */ - case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,xmm/m128,xmm,xmm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ host_and_vcpu_must_have(pclmulqdq); if ( vex.opcx == vex_none ) goto simd_0f3a_common; - generate_exception_if(vex.l, EXC_UD); + if ( vex.l ) + host_and_vcpu_must_have(vpclmulqdq); goto simd_0f_imm8_avx; + case X86EMUL_OPC_EVEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm */ + host_and_vcpu_must_have(vpclmulqdq); + generate_exception_if(evex.brs || evex.opmsk, EXC_UD); + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.w, EXC_UD); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -121,7 +121,7 @@ XEN_CPUFEATURE(PBE, 0*32+31) / /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */ XEN_CPUFEATURE(SSE3, 1*32+ 0) /*A Streaming SIMD Extensions-3 */ -XEN_CPUFEATURE(PCLMULQDQ, 1*32+ 1) /*A Carry-less mulitplication */ +XEN_CPUFEATURE(PCLMULQDQ, 1*32+ 1) /*A Carry-less multiplication */ XEN_CPUFEATURE(DTES64, 1*32+ 2) /* 64-bit Debug Store */ XEN_CPUFEATURE(MONITOR, 1*32+ 3) /* Monitor/Mwait support */ XEN_CPUFEATURE(DSCPL, 1*32+ 4) /* CPL Qualified Debug Store */ @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -255,8 +255,9 @@ def crunch_numbers(state): # This is just the dependency between AVX512 and AVX2 of XSTATE # feature flags. If want to use AVX512, AVX2 must be supported and - # enabled. - AVX2: [AVX512F], + # enabled. 
Certain later extensions, acting on 256-bit vectors of + # integers, better depend on AVX2 than AVX. + AVX2: [AVX512F, VPCLMULQDQ], # AVX512F is taken to mean hardware support for 512bit registers # (which in practice depends on the EVEX prefix to encode) as well From patchwork Fri Mar 15 11:06:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854551 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD9DE14DE for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A48962A94B for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 98F632A94D; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 06C1E2A94B for ; Fri, 15 Mar 2019 11:07:37 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kf6-0001ga-3V; Fri, 15 Mar 2019 11:06:04 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kf5-0001gP-DA for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:06:03 +0000 X-Inumbo-ID: 51ec1320-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 51ec1320-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:06:02 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:06:01 -0600 Message-Id: <5C8B8719020000780021F320@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:06:01 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 45/50] x86emul: support VAES insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP As to the feature dependency adjustment, just like for VPCLMULQDQ while strictly speaking AVX is a sufficient prereq (to have YMM registers), 256-bit vectors of integers have got fully introduced with AVX2 only. A new test case (also covering AESNI) will be added to the harness by a subsequent patch. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- TBD: Should VAES also depend on AESNI? --- v8: No need to set fault_suppression to false. v7: New. 
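As an aside (illustrative only, not part of the patch): the vcpu_has_*() additions in this and the preceding patch key off CPUID leaf 7, sub-leaf 0, ECX — VAES is bit 9 and VPCLMULQDQ bit 10, matching the definitions in the hunks below. A minimal user-space probe for the same bits, assuming GCC/clang's <cpuid.h>, might look like:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=7,ECX=0):ECX — bit 9 = VAES, bit 10 = VPCLMULQDQ. */
    if ( __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
    {
        printf("VAES:       %s\n", (ecx & (1u << 9))  ? "yes" : "no");
        printf("VPCLMULQDQ: %s\n", (ecx & (1u << 10)) ? "yes" : "no");
    }

    return 0;
}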
--- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,18 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +/* + * The uses of b in this table are simply (one of) the shortest form(s) of + * saying "no broadcast" without introducing a 128-bit granularity enumerator. + * Due to all of the insns being WIG, w, d_nb, and q_nb would all also fit. + */ +static const struct test vaes_all[] = { + INSN(aesdec, 66, 0f38, de, vl, b, vl), + INSN(aesdeclast, 66, 0f38, df, vl, b, vl), + INSN(aesenc, 66, 0f38, dc, vl, b, vl), + INSN(aesenclast, 66, 0f38, dd, vl, b, vl), +}; + static const struct test vpclmulqdq_all[] = { INSN(pclmulqdq, 66, 0f3a, 44, vl, q_nb, vl) }; @@ -975,6 +987,7 @@ void evex_disp8_test(void *instr, struct if ( cpu_has_avx512f ) { + RUN(vaes, all); RUN(vpclmulqdq, all); } } --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_vaes (cp.feat.vaes && xcr0_mask(6)) #define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -541,7 +541,7 @@ static const struct ext0f38_table { [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, - [0xdc ... 0xdf] = { .simd_size = simd_packed_int }, + [0xdc ... 0xdf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xf0] = { .two_op = 1 }, [0xf1] = { .to_mem = 1, .two_op = 1 }, [0xf2 ... 
0xf3] = {}, @@ -1922,6 +1922,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_vaes() vcpu_has( 7, ECX, 9, ctxt, ops) #define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) #define vcpu_has_avx512_bitalg() vcpu_has( 7, ECX, 12, ctxt, ops) @@ -8935,13 +8936,9 @@ x86_emulate( case X86EMUL_OPC_66(0x0f38, 0xdb): /* aesimc xmm/m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xdb): /* vaesimc xmm/m128,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdc): /* aesenc xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdd): /* aesenclast xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xde): /* aesdec xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec xmm/m128,xmm,xmm */ case X86EMUL_OPC_66(0x0f38, 0xdf): /* aesdeclast xmm/m128,xmm,xmm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0xdf): /* vaesdeclast xmm/m128,xmm,xmm */ host_and_vcpu_must_have(aesni); if ( vex.opcx == vex_none ) goto simd_0f38_common; @@ -9655,6 +9652,24 @@ x86_emulate( host_and_vcpu_must_have(avx512er); goto simd_zmm_scalar_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec {x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0xdf): /* vaesdeclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ + if ( !vex.l ) + host_and_vcpu_must_have(aesni); + else + host_and_vcpu_must_have(vaes); + goto simd_0f_avx; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdc): /* vaesenc [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdd): /* vaesenclast [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xde): /* vaesdec [xyz]mm/mem,[xyz]mm,[xyz]mm */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xdf): /* vaesdeclast [xyz]mm/mem,[xyz]mm,[xyz]mm */ + host_and_vcpu_must_have(vaes); + generate_exception_if(evex.brs || evex.opmsk, EXC_UD); + goto avx512f_no_sae; + case X86EMUL_OPC(0x0f38, 0xf0): /* movbe m,r */ case X86EMUL_OPC(0x0f38, 0xf1): /* movbe r,m */ vcpu_must_have(movbe); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_vaes boot_cpu_has(X86_FEATURE_VAES) #define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) #define cpu_has_avx512_bitalg boot_cpu_has(X86_FEATURE_AVX512_BITALG) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! 
OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(VAES, 6*32+ 9) /*A Vector AES Instrs */ XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A Support for VPOPCNT[B,W] and VPSHUFBITQMB */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -257,7 +257,7 @@ def crunch_numbers(state): # feature flags. If want to use AVX512, AVX2 must be supported and # enabled. Certain later extensions, acting on 256-bit vectors of # integers, better depend on AVX2 than AVX. - AVX2: [AVX512F, VPCLMULQDQ], + AVX2: [AVX512F, VAES, VPCLMULQDQ], # AVX512F is taken to mean hardware support for 512bit registers # (which in practice depends on the EVEX prefix to encode) as well From patchwork Fri Mar 15 11:06:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854553 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84CA614DE for ; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A0112A94C for ; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5B3862A94E; Fri, 15 Mar 2019 11:08:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8C2B52A94C for ; Fri, 15 Mar 2019 11:08:16 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kfc-0001pN-Dy; Fri, 15 Mar 2019 11:06:36 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kfb-0001pE-G5 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:06:35 +0000 X-Inumbo-ID: 63619caa-4712-11e9-8595-c71003392d02 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 63619caa-4712-11e9-8595-c71003392d02; Fri, 15 Mar 2019 11:06:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:06:31 -0600 Message-Id: <5C8B8733020000780021F323@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:06:27 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 46/50] x86emul: support GFNI insns X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: 
List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Note that the ISA extensions document revision 035 is ambiguous regarding fault suppression for VGF2P8MULB: Text says it's supported, while the exception specification listed is E4NF. Given the wording here and for the other two insns I'm inclined to trust the text more than the exception reference, which was also confirmed informally. As to the feature dependency adjustment, while strictly speaking SSE is a sufficient prereq (to have XMM registers), vectors of bytes and qwords have got introduced only with SSE2. gcc, for example, uses a similar connection in its respective intrinsics header. Signed-off-by: Jan Beulich --- v8: Add {evex}-producing vgf2p8mulb alias to simd.h. Add missing simd.h dependency. Re-base. v7: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -19,7 +19,8 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) +GF := sse2-gf avx2-gf avx512bw-gf +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -142,12 +143,17 @@ $(1)-cflags := \ $(foreach flt,$($(1)-flts), \ "-D_$(vec)x$(idx)f$(flt) -m$(1:-sg=) $(call non-sse,$(1)) -Os -DVEC_MAX=$(vec) -DIDX_SIZE=$(idx) -DFLOAT_SIZE=$(flt)"))) endef +define simd-gf-defs +$(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ + "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define opmask-defs $(1)-opmask-cflags := $(foreach vec,$($(1)-opmask-vecs), "-D_$(vec) -m$(1) -Os -DSIZE=$(vec)") endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) +$(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) first-string = $(shell for s in $(1); do echo "$$s"; break; done) @@ -197,7 +203,10 @@ $(addsuffix .c,$(FMA)): $(addsuffix .c,$(SG)): ln -sf simd-sg.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG)): simd.h +$(addsuffix .c,$(GF)): + ln -sf simd-gf.c $@ + +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -591,6 +591,12 @@ static const struct test avx512_vpopcntd INSN(popcnt, 66, 0f38, 55, vl, dq, vl) }; +static const struct test gfni_all[] = { + INSN(gf2p8affineinvqb, 66, 0f3a, cf, vl, q, vl), + INSN(gf2p8affineqb, 66, 0f3a, ce, vl, q, vl), + INSN(gf2p8mulb, 66, 0f38, cf, vl, b, vl), +}; + /* * The uses of b in this table are simply (one of) the shortest form(s) of * saying "no broadcast" without introducing a 128-bit granularity enumerator. @@ -987,6 +993,7 @@ void evex_disp8_test(void *instr, struct if ( cpu_has_avx512f ) { + RUN(gfni, all); RUN(vaes, all); RUN(vpclmulqdq, all); } --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -371,6 +371,7 @@ OVR(cvttsd2siq); OVR(cvttss2si); OVR(cvttss2sil); OVR(cvttss2siq); +OVR(gf2p8mulb); OVR(movddup); OVR(movntdq); OVR(movntdqa); --- /dev/null +++ b/tools/tests/x86_emulator/simd-gf.c @@ -0,0 +1,80 @@ +#define UINT_SIZE 1 + +#include "simd.h" +ENTRY(gf_test); + +#if VEC_SIZE == 16 +# define GF(op, s, a...) 
__builtin_ia32_vgf2p8 ## op ## _v16qi ## s(a) +#elif VEC_SIZE == 32 +# define GF(op, s, a...) __builtin_ia32_vgf2p8 ## op ## _v32qi ## s(a) +#elif VEC_SIZE == 64 +# define GF(op, s, a...) __builtin_ia32_vgf2p8 ## op ## _v64qi ## s(a) +#endif + +#ifdef __AVX512BW__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# define mul(x, y) GF(mulb, _mask, (vqi_t)(x), (vqi_t)(y), (vqi_t)undef(), ~0) +# define transform(m, dir, x, c) ({ \ + vec_t t_; \ + asm ( "vgf2p8affine" #dir "qb %[imm], %[matrix]%{1to%c[n]%}, %[src], %[dst]" \ + : [dst] "=v" (t_) \ + : [matrix] "m" (m), [src] "v" (x), [imm] "i" (c), [n] "i" (VEC_SIZE / 8) ); \ + t_; \ +}) +#else +# if defined(__AVX2__) +# define bcstq(x) ({ \ + vdi_t t_; \ + asm ( "vpbroadcastq %1, %0" : "=x" (t_) : "m" (x) ); \ + t_; \ +}) +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# else +# define bcstq(x) ((vdi_t){x, x}) +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# endif +# define eq(x, y) to_bool((x) == (y)) +# define mul(x, y) GF(mulb, , (vqi_t)(x), (vqi_t)(y)) +# define transform(m, dir, x, c) ({ \ + vdi_t m_ = bcstq(m); \ + touch(m_); \ + ((vec_t)GF(affine ## dir ## qb, , (vqi_t)(x), (vqi_t)m_, c)); \ +}) +#endif + +const unsigned __attribute__((mode(DI))) ident = 0x0102040810204080ULL; + +int gf_test(void) +{ + unsigned int i; + vec_t src, one; + + for ( i = 0; i < ELEM_COUNT; ++i ) + { + src[i] = i; + one[i] = 1; + } + + /* Special case for first iteration. */ + one[0] = 0; + + do { + vec_t inv = transform(ident, inv, src, 0); + + touch(src); + touch(inv); + if ( !eq(mul(src, inv), one) ) return __LINE__; + + touch(src); + touch(inv); + if ( !eq(mul(inv, src), one) ) return __LINE__; + + one[0] = 1; + + src += ELEM_COUNT; + i += ELEM_COUNT; + } while ( i < 256 ); + + return 0; +} --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -11,12 +11,14 @@ asm ( ".pushsection .test, \"ax\", @prog #include "3dnow.h" #include "sse.h" #include "sse2.h" +#include "sse2-gf.h" #include "sse4.h" #include "avx.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" +#include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" #include "avx512dq-opmask.h" @@ -25,6 +27,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-sg.h" #include "avx512vl-sg.h" #include "avx512bw.h" +#include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" #include "avx512vbmi.h" @@ -138,6 +141,26 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_sse2_gf(void) +{ + return cpu_has_gfni && cpu_has_sse2; +} + +static bool simd_check_avx2_gf(void) +{ + return cpu_has_gfni && cpu_has_avx2; +} + +static bool simd_check_avx512bw_gf(void) +{ + return cpu_has_gfni && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_gf_vl(void) +{ + return cpu_has_gfni && cpu_has_avx512vl; +} + static void simd_set_regs(struct cpu_user_regs *regs) { if ( cpu_has_mmx ) @@ -395,6 +418,12 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(GFNI (legacy), sse2_gf, 16), + SIMD(GFNI (VEX/x16), avx2_gf, 16), + SIMD(GFNI (VEX/x32), avx2_gf, 32), + SIMD(GFNI (EVEX/x64), avx512bw_gf, 64), + AVX512VL(VL+GFNI (x16), avx512bw_gf, 16), + AVX512VL(VL+GFNI (x32), avx512bw_gf, 32), #undef AVX512VL_ #undef AVX512VL #undef SIMD_ --- 
a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -144,6 +144,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6)) +#define cpu_has_gfni cp.feat.gfni #define cpu_has_vaes (cp.feat.vaes && xcr0_mask(6)) #define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6)) #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -540,6 +540,7 @@ static const struct ext0f38_table { [0xcb] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, [0xcc] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl }, [0xcd] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq }, + [0xcf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xdc ... 0xdf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xf0] = { .two_op = 1 }, @@ -619,6 +620,7 @@ static const struct ext0f3a_table { [0x7c ... 0x7d] = { .simd_size = simd_packed_fp, .four_op = 1 }, [0x7e ... 0x7f] = { .simd_size = simd_scalar_opc, .four_op = 1 }, [0xcc] = { .simd_size = simd_other }, + [0xce ... 0xcf] = { .simd_size = simd_packed_int, .d8s = d8s_vl }, [0xdf] = { .simd_size = simd_packed_int, .two_op = 1 }, [0xf0] = {}, }; @@ -1922,6 +1924,7 @@ static bool vcpu_has( #define vcpu_has_avx512vl() vcpu_has( 7, EBX, 31, ctxt, ops) #define vcpu_has_avx512_vbmi() vcpu_has( 7, ECX, 1, ctxt, ops) #define vcpu_has_avx512_vbmi2() vcpu_has( 7, ECX, 6, ctxt, ops) +#define vcpu_has_gfni() vcpu_has( 7, ECX, 8, ctxt, ops) #define vcpu_has_vaes() vcpu_has( 7, ECX, 9, ctxt, ops) #define vcpu_has_vpclmulqdq() vcpu_has( 7, ECX, 10, ctxt, ops) #define vcpu_has_avx512_vnni() vcpu_has( 7, ECX, 11, ctxt, ops) @@ -9652,6 +9655,21 @@ x86_emulate( host_and_vcpu_must_have(avx512er); goto simd_zmm_scalar_sae; + case X86EMUL_OPC_66(0x0f38, 0xcf): /* gf2p8mulb xmm/m128,xmm */ + host_and_vcpu_must_have(gfni); + goto simd_0f38_common; + + case X86EMUL_OPC_VEX_66(0x0f38, 0xcf): /* vgf2p8mulb {x,y}mm/mem,{x,y}mm,{x,y}mm */ + host_and_vcpu_must_have(gfni); + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_avx; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0xcf): /* vgf2p8mulb [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(gfni); + generate_exception_if(evex.w || evex.brs, EXC_UD); + elem_bytes = 1; + goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0xdc): /* vaesenc {x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xdd): /* vaesenclast {x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f38, 0xde): /* vaesdec {x,y}mm/mem,{x,y}mm,{x,y}mm */ @@ -10395,6 +10413,24 @@ x86_emulate( op_bytes = 16; goto simd_0f3a_common; + case X86EMUL_OPC_66(0x0f3a, 0xce): /* gf2p8affineqb $imm8,xmm/m128,xmm */ + case X86EMUL_OPC_66(0x0f3a, 0xcf): /* gf2p8affineinvqb $imm8,xmm/m128,xmm */ + host_and_vcpu_must_have(gfni); + goto simd_0f3a_common; + + case X86EMUL_OPC_VEX_66(0x0f3a, 0xce): /* vgf2p8affineqb $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0xcf): /* vgf2p8affineinvqb $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + host_and_vcpu_must_have(gfni); + generate_exception_if(!vex.w, EXC_UD); + goto simd_0f_imm8_avx; + + case X86EMUL_OPC_EVEX_66(0x0f3a, 0xce): /* vgf2p8affineqb $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f3a, 0xcf): /* 
vgf2p8affineinvqb $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(gfni); + generate_exception_if(!evex.w, EXC_UD); + fault_suppression = false; + goto avx512f_imm8_no_sae; + case X86EMUL_OPC_66(0x0f3a, 0xdf): /* aeskeygenassist $imm8,xmm/m128,xmm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0xdf): /* vaeskeygenassist $imm8,xmm/m128,xmm */ host_and_vcpu_must_have(aesni); --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -112,6 +112,7 @@ /* CPUID level 0x00000007:0.ecx */ #define cpu_has_avx512_vbmi boot_cpu_has(X86_FEATURE_AVX512_VBMI) #define cpu_has_avx512_vbmi2 boot_cpu_has(X86_FEATURE_AVX512_VBMI2) +#define cpu_has_gfni boot_cpu_has(X86_FEATURE_GFNI) #define cpu_has_vaes boot_cpu_has(X86_FEATURE_VAES) #define cpu_has_vpclmulqdq boot_cpu_has(X86_FEATURE_VPCLMULQDQ) #define cpu_has_avx512_vnni boot_cpu_has(X86_FEATURE_AVX512_VNNI) --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP, 6*32+ 2) / XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ XEN_CPUFEATURE(AVX512_VBMI2, 6*32+ 6) /*A Additional AVX-512 Vector Byte Manipulation Instrs */ +XEN_CPUFEATURE(GFNI, 6*32+ 8) /*A Galois Field Instrs */ XEN_CPUFEATURE(VAES, 6*32+ 9) /*A Vector AES Instrs */ XEN_CPUFEATURE(VPCLMULQDQ, 6*32+10) /*A Vector Carry-less Multiplication Instrs */ XEN_CPUFEATURE(AVX512_VNNI, 6*32+11) /*A Vector Neural Network Instrs */ --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -197,7 +197,7 @@ def crunch_numbers(state): # %XMM support, without specific inter-dependencies. Additionally # AMD has a special mis-alignment sub-mode. SSE: [SSE2, SSE3, SSSE3, SSE4A, MISALIGNSSE, - AESNI, PCLMULQDQ, SHA], + AESNI, PCLMULQDQ, SHA, GFNI], # SSE2 was re-specified as core instructions for 64bit. 
SSE2: [LM], From patchwork Fri Mar 15 11:07:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854555 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 138FA15AC for ; Fri, 15 Mar 2019 11:08:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E5D012A94C for ; Fri, 15 Mar 2019 11:08:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D4BBF2A94E; Fri, 15 Mar 2019 11:08:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6C31D2A94C for ; Fri, 15 Mar 2019 11:08:52 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgB-0001yj-3G; Fri, 15 Mar 2019 11:07:11 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgA-0001yT-6A for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:07:10 +0000 X-Inumbo-ID: 79038e7b-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 79038e7b-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:07:08 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:07:07 -0600 Message-Id: <5C8B875C020000780021F326@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:07:08 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 47/50] x86emul: restore ordering within main switch statement X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Incremental additions and/or mistakes have led to some code blocks sitting in "unexpected" places. Re-sort the case blocks (opcode space; major opcode; 66/F3/F2 prefix; legacy/VEX/EVEX encoding). As an exception, the opcode space 0x0f EVEX-encoded VPEXTRW is left at its current place, to keep it close to the "pextr" label. Pure code movement. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v7: New.
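As a minimal sketch of the re-sort criterion described above (struct and field names are hypothetical and not taken from x86_emulate.c): case blocks are ordered as if compared lexicographically on a four-field key.

#include <stdbool.h>

/* Hypothetical sort key mirroring the ordering named in the description. */
struct case_key {
    unsigned int space;  /* opcode space: 0x0f < 0x0f38 < 0x0f3a */
    unsigned int opcode; /* major opcode byte */
    unsigned int pfx;    /* mandatory prefix: none < 66 < F3 < F2 (assumed order) */
    unsigned int enc;    /* encoding: legacy < VEX < EVEX */
};

static bool key_before(const struct case_key *a, const struct case_key *b)
{
    if ( a->space != b->space )
        return a->space < b->space;
    if ( a->opcode != b->opcode )
        return a->opcode < b->opcode;
    if ( a->pfx != b->pfx )
        return a->pfx < b->pfx;
    return a->enc < b->enc;
}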
--- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -7129,15 +7129,6 @@ x86_emulate( ASSERT(!state->simd_size); break; - case X86EMUL_OPC_EVEX_F3(0x0f, 0x7e): /* vmovq xmm/m64,xmm */ - case X86EMUL_OPC_EVEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */ - generate_exception_if(evex.lr || !evex.w || evex.opmsk || evex.brs, - EXC_UD); - host_and_vcpu_must_have(avx512f); - d |= TwoOp; - op_bytes = 8; - goto simd_zmm; - case X86EMUL_OPC_66(0x0f, 0xe7): /* movntdq xmm,m128 */ case X86EMUL_OPC_VEX_66(0x0f, 0xe7): /* vmovntdq {x,y}mm,mem */ generate_exception_if(ea.type != OP_MEM, EXC_UD); @@ -7535,6 +7526,15 @@ x86_emulate( op_bytes = 8; goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f, 0x7e): /* vmovq xmm/m64,xmm */ + case X86EMUL_OPC_EVEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */ + generate_exception_if(evex.lr || !evex.w || evex.opmsk || evex.brs, + EXC_UD); + host_and_vcpu_must_have(avx512f); + d |= TwoOp; + op_bytes = 8; + goto simd_zmm; + case X86EMUL_OPC(0x0f, 0x80) ... X86EMUL_OPC(0x0f, 0x8f): /* jcc (near) */ if ( test_cc(b, _regs.eflags) ) jmp_rel((int32_t)src.val); @@ -8635,63 +8635,6 @@ x86_emulate( dst.type = OP_NONE; break; - case X86EMUL_OPC_EVEX_66(0x0f38, 0x10): /* vpsrlvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x11): /* vpsravw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x12): /* vpsllvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512bw); - generate_exception_if(!evex.w || evex.brs, EXC_UD); - elem_bytes = 2; - goto avx512f_no_sae; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ - op_bytes = elem_bytes; - generate_exception_if(evex.w || evex.brs, EXC_UD); - avx512_broadcast: - /* - * For the respective code below the main switch() to work we need to - * fold op_mask here: A source element gets read whenever any of its - * respective destination elements' mask bits is set. 
- */ - if ( fault_suppression ) - { - n = 1 << ((b & 3) - evex.w); - EXPECT(elem_bytes > 0); - ASSERT(op_bytes == n * elem_bytes); - for ( i = n; i < (16 << evex.lr) / elem_bytes; i += n ) - op_mask |= (op_mask >> i) & ((1 << n) - 1); - } - goto avx512f_no_sae; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ - /* vbroadcastf64x4 m256,zmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ - /* vbroadcasti64x4 m256,zmm{k} */ - generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); - /* fall through */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ - /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ - generate_exception_if(!evex.lr, EXC_UD); - /* fall through */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ - /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ - if ( b == 0x59 ) - op_bytes = 8; - generate_exception_if(evex.brs, EXC_UD); - if ( !evex.w ) - host_and_vcpu_must_have(avx512dq); - goto avx512_broadcast; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ - /* vbroadcastf64x2 m128,{y,z}mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ - /* vbroadcasti64x2 m128,{y,z}mm{k} */ - generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, - EXC_UD); - if ( evex.w ) - host_and_vcpu_must_have(avx512dq); - goto avx512_broadcast; - case X86EMUL_OPC_66(0x0f38, 0x20): /* pmovsxbw xmm/m64,xmm */ case X86EMUL_OPC_66(0x0f38, 0x21): /* pmovsxbd xmm/m32,xmm */ case X86EMUL_OPC_66(0x0f38, 0x22): /* pmovsxbq xmm/m16,xmm */ @@ -8725,47 +8668,14 @@ x86_emulate( host_and_vcpu_must_have(sse4_1); goto simd_0f38_common; - case X86EMUL_OPC_VEX_66(0x0f38, 0x13): /* vcvtph2ps xmm/mem,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - host_and_vcpu_must_have(f16c); - op_bytes = 8 << vex.l; - goto simd_0f_ymm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ - generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); - host_and_vcpu_must_have(avx512f); - if ( !evex.brs ) - avx512_vlen_check(false); - op_bytes = 8 << evex.lr; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x10): /* vpsrlvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x11): /* vpsravw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x12): /* vpsllvw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512bw); + generate_exception_if(!evex.w || evex.brs, EXC_UD); elem_bytes = 2; - goto simd_zmm; - - case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ - generate_exception_if(!vex.l || vex.w, EXC_UD); - goto simd_0f_avx2; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ - generate_exception_if(!evex.lr, EXC_UD); - fault_suppression = false; goto avx512f_no_sae; - case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x23): /* vpmovsxwd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x25): /* vpmovsxdq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x30): /* vpmovzxbw 
xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x33): /* vpmovzxwd xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x35): /* vpmovzxdq xmm/mem,{x,y}mm */ - op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); - goto simd_0f_int; - case X86EMUL_OPC_EVEX_F3(0x0f38, 0x10): /* vpmovuswb [xyz]mm,{x,y}mm/mem{k} */ case X86EMUL_OPC_EVEX_66(0x0f38, 0x20): /* vpmovsxbw {x,y}mm/mem,[xyz]mm{k} */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x20): /* vpmovswb [xyz]mm,{x,y}mm/mem{k} */ @@ -8811,6 +8721,96 @@ x86_emulate( elem_bytes = (b & 7) < 3 ? 1 : (b & 7) != 5 ? 2 : 4; goto avx512f_no_sae; + case X86EMUL_OPC_VEX_66(0x0f38, 0x13): /* vcvtph2ps xmm/mem,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + host_and_vcpu_must_have(f16c); + op_bytes = 8 << vex.l; + goto simd_0f_ymm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x13): /* vcvtph2ps {x,y}mm/mem,[xyz]mm{k} */ + generate_exception_if(evex.w || (ea.type != OP_REG && evex.brs), EXC_UD); + host_and_vcpu_must_have(avx512f); + if ( !evex.brs ) + avx512_vlen_check(false); + op_bytes = 8 << evex.lr; + elem_bytes = 2; + goto simd_zmm; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x16): /* vpermps ymm/m256,ymm,ymm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x36): /* vpermd ymm/m256,ymm,ymm */ + generate_exception_if(!vex.l || vex.w, EXC_UD); + goto simd_0f_avx2; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x16): /* vpermp{s,d} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x36): /* vperm{d,q} {y,z}mm/mem,{y,z}mm,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + fault_suppression = false; + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x18): /* vbroadcastss xmm/m32,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,[xyz]mm{k} */ + op_bytes = elem_bytes; + generate_exception_if(evex.w || evex.brs, EXC_UD); + avx512_broadcast: + /* + * For the respective code below the main switch() to work we need to + * fold op_mask here: A source element gets read whenever any of its + * respective destination elements' mask bits is set. 
+ */ + if ( fault_suppression ) + { + n = 1 << ((b & 3) - evex.w); + EXPECT(elem_bytes > 0); + ASSERT(op_bytes == n * elem_bytes); + for ( i = n; i < (16 << evex.lr) / elem_bytes; i += n ) + op_mask |= (op_mask >> i) & ((1 << n) - 1); + } + goto avx512f_no_sae; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1b): /* vbroadcastf32x8 m256,zmm{k} */ + /* vbroadcastf64x4 m256,zmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5b): /* vbroadcasti32x8 m256,zmm{k} */ + /* vbroadcasti64x4 m256,zmm{k} */ + generate_exception_if(ea.type != OP_MEM || evex.lr != 2, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x19): /* vbroadcastsd xmm/m64,{y,z}mm{k} */ + /* vbroadcastf32x2 xmm/m64,{y,z}mm{k} */ + generate_exception_if(!evex.lr, EXC_UD); + /* fall through */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,[xyz]mm{k} */ + /* vbroadcasti32x2 xmm/m64,[xyz]mm{k} */ + if ( b == 0x59 ) + op_bytes = 8; + generate_exception_if(evex.brs, EXC_UD); + if ( !evex.w ) + host_and_vcpu_must_have(avx512dq); + goto avx512_broadcast; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x1a): /* vbroadcastf32x4 m128,{y,z}mm{k} */ + /* vbroadcastf64x2 m128,{y,z}mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x5a): /* vbroadcasti32x4 m128,{y,z}mm{k} */ + /* vbroadcasti64x2 m128,{y,z}mm{k} */ + generate_exception_if(ea.type != OP_MEM || !evex.lr || evex.brs, + EXC_UD); + if ( evex.w ) + host_and_vcpu_must_have(avx512dq); + goto avx512_broadcast; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x20): /* vpmovsxbw xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x21): /* vpmovsxbd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x22): /* vpmovsxbq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x23): /* vpmovsxwd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x24): /* vpmovsxwq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x25): /* vpmovsxdq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x30): /* vpmovzxbw xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x31): /* vpmovzxbd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x32): /* vpmovzxbq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x33): /* vpmovzxwd xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x34): /* vpmovzxwq xmm/mem,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x35): /* vpmovzxdq xmm/mem,{x,y}mm */ + op_bytes = 16 >> (pmov_convert_delta[b & 7] - vex.l); + goto simd_0f_int; + case X86EMUL_OPC_EVEX_F3(0x0f38, 0x29): /* vpmov{b,w}2m [xyz]mm,k */ case X86EMUL_OPC_EVEX_F3(0x0f38, 0x39): /* vpmov{d,q}2m [xyz]mm,k */ generate_exception_if(!evex.r || !evex.R, EXC_UD); @@ -8918,6 +8918,52 @@ x86_emulate( break; } + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9a): /* vfmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9c): /* vfnmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9e): /* vfnmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa6): /* vfmaddsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa7): /* vfmsubadd213p{s,d} 
[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa8): /* vfmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xaa): /* vfmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xac): /* vfnmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xae): /* vfnmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb6): /* vfmaddsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb7): /* vfmsubadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb8): /* vfmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xba): /* vfmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbc): /* vfnmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbe): /* vfnmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512f); + if ( ea.type != OP_REG || !evex.brs ) + avx512_vlen_check(false); + goto simd_zmm; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x9f): /* vfnmsub132s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xa9): /* vfmadd213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xab): /* vfmsub213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xad): /* vfnmadd213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xaf): /* vfnmsub213s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xb9): /* vfmadd231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbb): /* vfmsub231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + simd_zmm_scalar_sae: + generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); + if ( !evex.brs ) + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_66(0x0f38, 0x37): /* pcmpgtq xmm/m128,xmm */ host_and_vcpu_must_have(sse4_2); goto simd_0f38_common; @@ -8950,6 +8996,31 @@ x86_emulate( generate_exception_if(vex.l, EXC_UD); goto simd_0f_avx; + case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ + host_and_vcpu_must_have(avx512_vnni); + generate_exception_if(evex.w, EXC_UD); + goto avx512f_no_sae; + + case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,{x,y}mm */ + op_bytes = 1 << ((!(b & 0x20) * 2) + (b & 1)); + /* fall through */ + case X86EMUL_OPC_VEX_66(0x0f38, 0x46): 
/* vpsravd {x,y}mm/mem,{x,y}mm,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_avx2; + + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ + case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ + host_and_vcpu_must_have(avx512f); + generate_exception_if(evex.brs, EXC_UD); + avx512_vlen_check(true); + goto simd_zmm; + case X86EMUL_OPC_EVEX_F2(0x0f38, 0x52): /* vp4dpwssd m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0x53): /* vp4dpwssds m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4vnniw); @@ -8972,23 +9043,6 @@ x86_emulate( host_and_vcpu_must_have(avx512_vpopcntdq); goto avx512f_no_sae; - case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x79): /* vpbroadcastw xmm/m16,{x,y}mm */ - op_bytes = 1 << ((!(b & 0x20) * 2) + (b & 1)); - /* fall through */ - case X86EMUL_OPC_VEX_66(0x0f38, 0x46): /* vpsravd {x,y}mm/mem,{x,y}mm,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - goto simd_0f_avx2; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x4d): /* vrcp14s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x4f): /* vrsqrt14s{s,d} xmm/mem,xmm,xmm{k} */ - host_and_vcpu_must_have(avx512f); - generate_exception_if(evex.brs, EXC_UD); - avx512_vlen_check(true); - goto simd_zmm; - case X86EMUL_OPC_VEX_66(0x0f38, 0x5a): /* vbroadcasti128 m128,ymm */ generate_exception_if(ea.type != OP_MEM || !vex.l || vex.w, EXC_UD); goto simd_0f_avx2; @@ -9370,60 +9424,6 @@ x86_emulate( host_and_vcpu_must_have(fma); goto simd_0f_ymm; - case X86EMUL_OPC_EVEX_66(0x0f38, 0x2c): /* vscalefp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x42): /* vgetexpp{s,d} [xyz]mm/mem,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x96): /* vfmaddsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x97): /* vfmsubadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x98): /* vfmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9a): /* vfmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9c): /* vfnmadd132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9e): /* vfnmsub132p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa6): /* vfmaddsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa7): /* vfmsubadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa8): /* vfmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xaa): /* vfmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xac): /* vfnmadd213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xae): /* vfnmsub213p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb6): /* vfmaddsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb7): /* vfmsubadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb8): /* vfmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xba): /* vfmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbc): /* vfnmadd231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case 
X86EMUL_OPC_EVEX_66(0x0f38, 0xbe): /* vfnmsub231p{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512f); - if ( ea.type != OP_REG || !evex.brs ) - avx512_vlen_check(false); - goto simd_zmm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x2d): /* vscalefs{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x43): /* vgetexps{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x99): /* vfmadd132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9b): /* vfmsub132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9d): /* vfnmadd132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x9f): /* vfnmsub132s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xa9): /* vfmadd213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xab): /* vfmsub213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xad): /* vfnmadd213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xaf): /* vfnmsub213s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xb9): /* vfmadd231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbb): /* vfmsub231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbd): /* vfnmadd231s{s,d} xmm/mem,xmm,xmm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0xbf): /* vfnmsub231s{s,d} xmm/mem,xmm,xmm{k} */ - host_and_vcpu_must_have(avx512f); - simd_zmm_scalar_sae: - generate_exception_if(ea.type != OP_REG && evex.brs, EXC_UD); - if ( !evex.brs ) - avx512_vlen_check(true); - goto simd_zmm; - - case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x52): /* vpdpwssd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - case X86EMUL_OPC_EVEX_66(0x0f38, 0x53): /* vpdpwssds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */ - host_and_vcpu_must_have(avx512_vnni); - generate_exception_if(evex.w, EXC_UD); - goto avx512f_no_sae; - case X86EMUL_OPC_EVEX_F2(0x0f38, 0x9a): /* v4fmaddps m128,zmm+3,zmm{k} */ case X86EMUL_OPC_EVEX_F2(0x0f38, 0xaa): /* v4fnmaddps m128,zmm+3,zmm{k} */ host_and_vcpu_must_have(avx512_4fmaps); @@ -10266,11 +10266,6 @@ x86_emulate( generate_exception_if(evex.brs || evex.opmsk, EXC_UD); goto avx512f_imm8_no_sae; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ - case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ - generate_exception_if(vex.w, EXC_UD); - goto simd_0f_imm8_avx; - case X86EMUL_OPC_VEX_66(0x0f3a, 0x48): /* vpermil2ps $imm,{x,y}mm/mem,{x,y}mm,{x,y}mm,{x,y}mm */ /* vpermil2ps $imm,{x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ case X86EMUL_OPC_VEX_66(0x0f3a, 0x49): /* vpermil2pd $imm,{x,y}mm/mem,{x,y}mm,{x,y}mm,{x,y}mm */ @@ -10278,6 +10273,11 @@ x86_emulate( host_and_vcpu_must_have(xop); goto simd_0f_imm8_ymm; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ + generate_exception_if(vex.w, EXC_UD); + goto simd_0f_imm8_avx; + case X86EMUL_OPC_VEX_66(0x0f3a, 0x4c): /* vpblendvb {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */ generate_exception_if(vex.w, EXC_UD); goto simd_0f_int_imm8; From patchwork Fri Mar 15 11:07:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854557 Return-Path: Received: from 
mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3460514DE for ; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1B25D2A16E for ; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0F8692A181; Fri, 15 Mar 2019 11:09:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 614A22A171 for ; Fri, 15 Mar 2019 11:09:10 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgY-00026o-Iy; Fri, 15 Mar 2019 11:07:34 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kgW-00026O-Q9 for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:07:32 +0000 X-Inumbo-ID: 8640fe96-4712-11e9-b7cf-bf1230b85b28 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 8640fe96-4712-11e9-b7cf-bf1230b85b28; Fri, 15 Mar 2019 11:07:30 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:07:29 -0600 Message-Id: <5C8B8771020000780021F329@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:07:29 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 48/50] x86emul: add an AES/VAES test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jan Beulich --- v8: New. 
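A hedged illustration of one identity the new simd-aes.c test below exercises: with all-zero round keys, AESDEC(AESENCLAST(x, 0), 0) reduces to AESIMC(x), because the InvShiftRows/InvSubBytes steps of AESDEC cancel the ShiftRows/SubBytes steps of AESENCLAST, leaving only InvMixColumns. A standalone scalar check using the AES-NI intrinsics (compile with e.g. gcc -maes); the test case itself checks the same relation with vector builtins across the various VAES widths.

#include <immintrin.h>
#include <string.h>

int main(void)
{
    __m128i x = _mm_set_epi32(0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f);
    __m128i zero = _mm_setzero_si128();
    /* AESENCLAST followed by AESDEC, both with zero keys, is InvMixColumns. */
    __m128i lhs = _mm_aesdec_si128(_mm_aesenclast_si128(x, zero), zero);
    __m128i rhs = _mm_aesimc_si128(x);

    return memcmp(&lhs, &rhs, sizeof(lhs)) ? 1 : 0;
}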
--- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -19,8 +19,9 @@ CFLAGS += $(CFLAGS_xeninclude) SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er avx512vbmi FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg +AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -143,6 +144,10 @@ $(1)-cflags := \ $(foreach flt,$($(1)-flts), \ "-D_$(vec)x$(idx)f$(flt) -m$(1:-sg=) $(call non-sse,$(1)) -Os -DVEC_MAX=$(vec) -DIDX_SIZE=$(idx) -DFLOAT_SIZE=$(flt)"))) endef +define simd-aes-defs +$(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ + "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define simd-gf-defs $(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") @@ -153,6 +158,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) +$(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -203,10 +209,13 @@ $(addsuffix .c,$(FMA)): $(addsuffix .c,$(SG)): ln -sf simd-sg.c $@ +$(addsuffix .c,$(AES)): + ln -sf simd-aes.c $@ + $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-aes.c @@ -0,0 +1,102 @@ +#define UINT_SIZE 1 + +#include "simd.h" +ENTRY(aes_test); + +#if VEC_SIZE == 16 +# define AES(op, a...) __builtin_ia32_vaes ## op ## _v16qi(a) +# define imc(x) ((vec_t)__builtin_ia32_aesimc128((vdi_t)(x))) +#elif VEC_SIZE == 32 +# define AES(op, a...) __builtin_ia32_vaes ## op ## _v32qi(a) +# define imc(x) ({ \ + vec_t r_; \ + unsigned char __attribute__((vector_size(16))) t_; \ + asm ( "vaesimc (%3), %x0\n\t" \ + "vaesimc 16(%3), %1\n\t" \ + "vinserti128 $1, %1, %0, %0" \ + : "=&v" (r_), "=&v" (t_) \ + : "m" (x), "r" (&(x)) ); \ + r_; \ +}) +#elif VEC_SIZE == 64 +# define AES(op, a...) 
__builtin_ia32_vaes ## op ## _v64qi(a) +# define imc(x) ({ \ + vec_t r_; \ + unsigned char __attribute__((vector_size(16))) t_; \ + asm ( "vaesimc (%3), %x0\n\t" \ + "vaesimc 1*16(%3), %1\n\t" \ + "vinserti32x4 $1, %1, %0, %0\n\t" \ + "vaesimc 2*16(%3), %1\n\t" \ + "vinserti32x4 $2, %1, %0, %0\n\t" \ + "vaesimc 3*16(%3), %1\n\t" \ + "vinserti32x4 $3, %1, %0, %0" \ + : "=&v" (r_), "=&v" (t_) \ + : "m" (x), "r" (&(x)) ); \ + r_; \ +}) +#endif + +#ifdef __AVX512BW__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqb, _mask, (vqi_t)(x), (vqi_t)(y), -1) == ALL_TRUE) +# define aes(op, x, y) ((vec_t)AES(op, (vqi_t)(x), (vqi_t)(y))) +#else +# if defined(__AVX2__) && VEC_SIZE == 32 +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# define aes(op, x, y) ((vec_t)AES(op, (vqi_t)(x), (vqi_t)(y))) +# else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# define aes(op, x, y) ((vec_t)__builtin_ia32_aes ## op ## 128((vdi_t)(x), (vdi_t)(y))) +# endif +# define eq(x, y) to_bool((x) == (y)) +#endif + +int aes_test(void) +{ + unsigned int i; + vec_t src, zero = {}; + + for ( i = 0; i < ELEM_COUNT; ++i ) + src[i] = i; + + do { + vec_t x, y; + + touch(src); + x = imc(src); + touch(src); + + touch(zero); + y = aes(enclast, src, zero); + touch(zero); + y = aes(dec, y, zero); + + if ( !eq(x, y) ) return __LINE__; + + touch(zero); + x = aes(declast, src, zero); + touch(zero); + y = aes(enc, x, zero); + touch(y); + x = imc(y); + + if ( !eq(x, src) ) return __LINE__; + +#if VEC_SIZE == 16 + touch(src); + x = (vec_t)__builtin_ia32_aeskeygenassist128((vdi_t)src, 0); + touch(src); + y = (vec_t)__builtin_ia32_pshufb128((vqi_t)x, + (vqi_t){ 7, 4, 5, 6, + 1, 2, 3, 0, + 15, 12, 13, 14, + 9, 10, 11, 8 }); + if ( !eq(x, y) ) return __LINE__; +#endif + + src += ELEM_COUNT; + i += ELEM_COUNT; + } while ( i <= 256 ); + + return 0; +} --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -340,6 +340,10 @@ REN(pandn, , d); REN(por, , d); REN(pxor, , d); # endif +OVR(aesdec); +OVR(aesdeclast); +OVR(aesenc); +OVR(aesenclast); OVR(cvtpd2dqx); OVR(cvtpd2dqy); OVR(cvtpd2psx); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -12,12 +12,15 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse.h" #include "sse2.h" #include "sse2-gf.h" +#include "ssse3-aes.h" #include "sse4.h" #include "avx.h" +#include "avx-aes.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" +#include "avx2-vaes.h" #include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" @@ -27,6 +30,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512f-sg.h" #include "avx512vl-sg.h" #include "avx512bw.h" +#include "avx512bw-vaes.h" #include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" @@ -91,6 +95,16 @@ static bool simd_check_xop(void) return cpu_has_xop; } +static bool simd_check_ssse3_aes(void) +{ + return cpu_has_aesni && cpu_has_ssse3; +} + +static bool simd_check_avx_aes(void) +{ + return cpu_has_aesni && cpu_has_avx; +} + static bool simd_check_avx512f(void) { return cpu_has_avx512f; @@ -141,6 +155,22 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_avx2_vaes(void) +{ + return cpu_has_aesni && cpu_has_vaes && cpu_has_avx2; +} + +static bool simd_check_avx512bw_vaes(void) +{ + return cpu_has_aesni && cpu_has_vaes && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_vaes_vl(void) +{ + return cpu_has_aesni && 
cpu_has_vaes && + cpu_has_avx512bw && cpu_has_avx512vl; +} + static bool simd_check_sse2_gf(void) { return cpu_has_gfni && cpu_has_sse2; @@ -319,6 +349,8 @@ static const struct { SIMD(XOP i16x16, xop, 32i2), SIMD(XOP i32x8, xop, 32i4), SIMD(XOP i64x4, xop, 32i8), + SIMD(AES (legacy), ssse3_aes, 16), + SIMD(AES (VEX/x16), avx_aes, 16), SIMD(OPMASK/w, avx512f_opmask, 2), SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), @@ -418,6 +450,10 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(VAES (VEX/x32), avx2_vaes, 32), + SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), + AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), + AVX512VL(VL+VAES (x32), avx512bw_vaes, 32), SIMD(GFNI (legacy), sse2_gf, 16), SIMD(GFNI (VEX/x16), avx2_gf, 16), SIMD(GFNI (VEX/x32), avx2_gf, 32), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -125,10 +125,12 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_sse cp.basic.sse #define cpu_has_sse2 cp.basic.sse2 #define cpu_has_sse3 cp.basic.sse3 +#define cpu_has_ssse3 cp.basic.ssse3 #define cpu_has_fma (cp.basic.fma && xcr0_mask(6)) #define cpu_has_sse4_1 cp.basic.sse4_1 #define cpu_has_sse4_2 cp.basic.sse4_2 #define cpu_has_popcnt cp.basic.popcnt +#define cpu_has_aesni cp.basic.aesni #define cpu_has_avx (cp.basic.avx && xcr0_mask(6)) #define cpu_has_f16c (cp.basic.f16c && xcr0_mask(6)) From patchwork Fri Mar 15 11:08:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854559 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C6D314DE for ; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C8E22A94C for ; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0F1222A94E; Fri, 15 Mar 2019 11:09:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0A1302A94C for ; Fri, 15 Mar 2019 11:09:54 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khA-0002G2-VE; Fri, 15 Mar 2019 11:08:12 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4kh8-0002Fg-Vq for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:08:11 +0000 X-Inumbo-ID: 9dac069d-4712-11e9-bc90-bc764e045a96 Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 9dac069d-4712-11e9-bc90-bc764e045a96; Fri, 15 Mar 2019 11:08:09 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:08:08 -0600 Message-Id: <5C8B8795020000780021F32C@prv1-mh.provo.novell.com> 
X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:08:05 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 49/50] x86emul: add a SHA test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also use this for AVX512VL VPRO{L,R}{,V}D as well as some further shifts testing. Signed-off-by: Jan Beulich --- v8: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -20,8 +20,9 @@ SIMD := 3dnow sse sse2 sse4 avx avx2 xop FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes +SHA := sse4-sha avx-sha avx512f-sha GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -148,6 +149,10 @@ define simd-aes-defs $(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") endef +define simd-sha-defs +$(1)-cflags := $(foreach vec,$(sse-vecs), \ + "-D_$(vec) $(addprefix -m,$(subst -,$(space),$(1))) -Os -DVEC_SIZE=$(vec)") +endef define simd-gf-defs $(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") @@ -159,6 +164,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) +$(foreach flavor,$(SHA),$(eval $(call simd-sha-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -212,10 +218,13 @@ $(addsuffix .c,$(SG)): $(addsuffix .c,$(AES)): ln -sf simd-aes.c $@ +$(addsuffix .c,$(SHA)): + ln -sf simd-sha.c $@ + $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-sha.c @@ -0,0 +1,392 @@ +#define INT_SIZE 4 + +#include "simd.h" +ENTRY(sha_test); + +#define SHA(op, a...) 
__builtin_ia32_sha ## op(a) + +#ifdef __AVX512F__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqd, _mask, x, y, -1) == ALL_TRUE) +# define blend(x, y, sel) B(movdqa32_, _mask, y, x, sel) +# define rot_c(f, r, x, n) B(pro ## f ## d, _mask, x, n, undef(), ~0) +# define rot_s(f, r, x, n) ({ /* gcc does not support embedded broadcast */ \ + vec_t r_; \ + asm ( "vpro" #f "vd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define rot_v(d, x, n) B(pro ## d ## vd, _mask, x, n, undef(), ~0) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "vps" #d "lvd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define vshift(d, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_; \ + asm ( "vps" #d "ldq %2, %1, %0" \ + : "=v" (r_) : "m" (x), "i" ((n) * ELEM_SIZE) ); \ + r_; \ +}) +#else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# define eq(x, y) to_bool((x) == (y)) +# define blend(x, y, sel) \ + ((vec_t)__builtin_ia32_pblendw128((vhi_t)(x), (vhi_t)(y), \ + ((sel) & 1 ? 0x03 : 0) | \ + ((sel) & 2 ? 0x0c : 0) | \ + ((sel) & 4 ? 0x30 : 0) | \ + ((sel) & 8 ? 0xc0 : 0))) +# define rot_c(f, r, x, n) (sh ## f ## _c(x, n) | sh ## r ## _c(x, 32 - (n))) +# define rot_s(f, r, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_, t_, n_ = (vec_t){ 32 } - (n); \ + asm ( "ps" #f "ld %2, %0; ps" #r "ld %3, %1; por %1, %0" \ + : "=&x" (r_), "=&x" (t_) \ + : "m" (n), "m" (n_), "0" (x), "1" (x) ); \ + r_; \ +}) +static inline unsigned int rotl(unsigned int x, unsigned int n) +{ + return (x << (n & 0x1f)) | (x >> ((32 - n) & 0x1f)); +} +static inline unsigned int rotr(unsigned int x, unsigned int n) +{ + return (x >> (n & 0x1f)) | (x << ((32 - n) & 0x1f)); +} +# define rot_v(d, x, n) ({ \ + vec_t t_; \ + unsigned int i_; \ + for ( i_ = 0; i_ < ELEM_COUNT; ++i_ ) \ + t_[i_] = rot ## d((x)[i_], (n)[i_]); \ + t_; \ +}) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "ps" #d "ld %1, %0" : "=&x" (r_) : "m" (n), "0" (x) ); \ + r_; \ +}) +# define vshift(d, x, n) \ + (vec_t)(__builtin_ia32_ps ## d ## ldqi128((vdi_t)(x), (n) * ELEM_SIZE * 8)) +#endif + +#define alignr(x, y, n) ((vec_t)__builtin_ia32_palignr128((vdi_t)(x), (vdi_t)(y), (n) * 8)) +#define hadd(x, y) __builtin_ia32_phaddd128(x, y) +#define rol_c(x, n) rot_c(l, r, x, n) +#define rol_s(x, n) rot_s(l, r, x, n) +#define rol_v(x, n...) rot_v(l, x, n) +#define ror_c(x, n) rot_c(r, l, x, n) +#define ror_s(x, n) rot_s(r, l, x, n) +#define ror_v(x, n...) 
rot_v(r, x, n) +#define shl_c(x, n) __builtin_ia32_pslldi128(x, n) +#define shl_s(x, n) shift_s(l, x, n) +#define shr_c(x, n) __builtin_ia32_psrldi128(x, n) +#define shr_s(x, n) shift_s(r, x, n) +#define shuf(x, s) __builtin_ia32_pshufd(x, s) +#define swap(x) shuf(x, 0b00011011) +#define vshl(x, n) vshift(l, x, n) +#define vshr(x, n) vshift(r, x, n) + +static inline vec_t sha256_sigma0(vec_t w) +{ + vec_t res; + + touch(w); + res = ror_c(w, 7); + touch(w); + res ^= rol_c(w, 14); + touch(w); + res ^= shr_c(w, 3); + touch(w); + + return res; +} + +static inline vec_t sha256_sigma1(vec_t w) +{ + vec_t _17 = { 17 }, _19 = { 19 }, _10 = { 10 }; + + return ror_s(w, _17) ^ ror_s(w, _19) ^ shr_s(w, _10); +} + +static inline vec_t sha256_Sigma0(vec_t w) +{ + vec_t res, n1 = { 0, 0, 2, 2 }, n2 = { 0, 0, 13, 13 }, n3 = { 0, 0, 10, 10 }; + + touch(n1); + res = ror_v(w, n1); + touch(n2); + res ^= ror_v(w, n2); + touch(n3); + + return res ^ rol_v(w, n3); +} + +static inline vec_t sha256_Sigma1(vec_t w) +{ + return ror_c(w, 6) ^ ror_c(w, 11) ^ rol_c(w, 7); +} + +int sha_test(void) +{ + unsigned int i; + vec_t src, one = { 1 }; + vqi_t raw = {}; + + for ( i = 1; i < VEC_SIZE; ++i ) + raw[i] = i; + src = (vec_t)raw; + + for ( i = 0; i < 256; i += VEC_SIZE ) + { + vec_t x, y, tmp, hash = -src; + vec_t a, b, c, d, e, g, h; + unsigned int k, r; + + touch(src); + x = SHA(1msg1, hash, src); + touch(src); + y = hash ^ alignr(hash, src, 8); + touch(src); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = hash ^ alignr(src, hash, 12); + touch(tmp); + y = rol_c(tmp, 1); + tmp = hash ^ alignr(src, y, 12); + touch(tmp); + y = rol_c(tmp, 1); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = rol_s(hash ^ alignr(src, hash, 12), one); + y = rol_s(hash ^ alignr(src, tmp, 12), one); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1nexte, hash, src); + touch(src); + touch(hash); + tmp = rol_c(hash, 30); + tmp[2] = tmp[1] = tmp[0] = 0; + + if ( !eq(x, src + tmp) ) return __LINE__; + + /* + * SHA1RNDS4 + * + * SRC1 = { A0, B0, C0, D0 } + * SRC2 = W' = { W[0]E0, W[1], W[2], W[3] } + * + * (NB that the notation is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) + * + * In order to pick a simple rounds function, an immediate value of + * 1 is used; 3 would also be a possibility. 
+ * + * Applying + * + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0) + W'[0] + K + * E1 = D0 + * D1 = C0 + * C1 = ROL30(B0) + * B1 = A0 + * + * iteratively four times and resolving round variable values to + * A and B0, C0, and D0 we get + * + * A4 = ROL5(A3) + (A2 ^ ROL30(A1) ^ ROL30(A0)) + W'[3] + ROL30(B0) + K + * A3 = ROL5(A2) + (A1 ^ ROL30(A0) ^ ROL30(B0)) + W'[2] + C0 + K + * A2 = ROL5(A1) + (A0 ^ ROL30(B0) ^ C0 ) + W'[1] + D0 + K + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0 ) + W'[0] + K + * + * (respective per-column variable names: + * y a b c d src e k + * ) + * + * with + * + * B4 = A3 + * C4 = ROL30(A2) + * D4 = ROL30(A1) + * E4 = ROL30(A0) + * + * and hence + * + * DST = { A4, A3, ROL30(A2), ROL30(A1) } + */ + + touch(src); + x = SHA(1rnds4, hash, src, 1); + touch(src); + + a = vshr(hash, 3); + b = vshr(hash, 2); + touch(hash); + d = rol_c(hash, 30); + touch(hash); + d = blend(d, hash, 0b0011); + c = vshr(d, 1); + e = vshl(d, 1); + tmp = (vec_t){}; + k = rol_c(SHA(1rnds4, tmp, tmp, 1), 2)[0]; + + for ( r = 0; r < 4; ++r ) + { + y = rol_c(a, 5) + (b ^ c ^ d) + swap(src) + e + k; + + switch ( r ) + { + case 0: + c[3] = rol_c(y, 30)[0]; + /* fall through */ + case 1: + b[r + 2] = y[r]; + /* fall through */ + case 2: + a[r + 1] = y[r]; + break; + } + + switch ( r ) + { + case 3: + if ( a[3] != y[2] ) return __LINE__; + /* fall through */ + case 2: + if ( a[2] != y[1] ) return __LINE__; + if ( b[3] != y[1] ) return __LINE__; + /* fall through */ + case 1: + if ( a[1] != y[0] ) return __LINE__; + if ( b[2] != y[0] ) return __LINE__; + if ( c[3] != rol_c(y, 30)[0] ) return __LINE__; + break; + } + } + + a = blend(rol_c(y, 30), y, 0b1100); + + if ( !eq(x, a) ) return __LINE__; + + touch(src); + x = SHA(256msg1, hash, src); + touch(src); + y = hash + sha256_sigma0(alignr(src, hash, 4)); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(256msg2, hash, src); + touch(src); + tmp = hash + sha256_sigma1(alignr(hash, src, 8)); + y = hash + sha256_sigma1(alignr(tmp, src, 8)); + + if ( !eq(x, y) ) return __LINE__; + + /* + * SHA256RNDS2 + * + * SRC1 = { C0, D0, G0, H0 } + * SRC2 = { A0, B0, E0, F0 } + * XMM0 = W' = { ?, ?, WK1, WK0 } + * + * (NB that the notation again is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) 
+ * + * Ch(E,F,G) = (E & F) ^ (~E & G) + * Maj(A,B,C) = (A & B) ^ (A & C) ^ (B & C) + * + * Σ0(A) = ROR2(A) ^ ROR13(A) ^ ROR22(A) + * Σ1(E) = ROR6(E) ^ ROR11(E) ^ ROR25(E) + * + * Applying + * + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * B1 = A0 + * C1 = B0 + * D1 = C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * F1 = E0 + * G1 = F0 + * H1 = G0 + * + * iteratively four times and resolving round variable values to + * A / E and B0, C0, D0, F0, G0, and H0 we get + * + * A2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + Maj(A1, A0, B0) + Σ0(A1) + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * E2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * + * with + * + * B2 = A1 + * F2 = E1 + * + * and hence + * + * DST = { A2, A1, E2, E1 } + * + * which we can simplify a little, by letting A0, B0, and E0 be zero + * and F0 = ~G0, and by then utilizing + * + * Ch(0, 0, x) = x + * Ch(x, 0, y) = ~x & y + * Maj(x, 0, 0) = Maj(0, x, 0) = Maj(0, 0, x) = 0 + * + * A2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + Σ0(A1) + * A1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + Σ0(A0) + * E2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + D0 + * + * (respective per-column variable names: + * y e g e src h d + * ) + */ + + tmp = (vec_t){ ~hash[1] }; + touch(tmp); + x = SHA(256rnds2, hash, tmp, src); + touch(tmp); + + e = y = (vec_t){}; + d = alignr(y, hash, 8); + g = (vec_t){ hash[1], tmp[0], hash[1], tmp[0] }; + h = shuf(hash, 0b01000100); + + for ( r = 0; r < 2; ++r ) + { + y = (~e & g) + sha256_Sigma1(e) + shuf(src, 0b01000100) + + h + sha256_Sigma0(d); + + if ( !r ) + { + d[3] = y[2]; + e[3] = e[1] = y[0]; + } + else if ( d[3] != y[2] ) + return __LINE__; + else if ( e[1] != y[0] ) + return __LINE__; + else if ( e[3] != y[0] ) + return __LINE__; + } + + if ( !eq(x, y) ) return __LINE__; + + src += 0x01010101 * VEC_SIZE; + } + + return 0; +} --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -14,8 +14,10 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse2-gf.h" #include "ssse3-aes.h" #include "sse4.h" +#include "sse4-sha.h" #include "avx.h" #include "avx-aes.h" +#include "avx-sha.h" #include "fma4.h" #include "fma.h" #include "avx2.h" @@ -28,6 +30,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw-opmask.h" #include "avx512f.h" #include "avx512f-sg.h" +#include "avx512f-sha.h" #include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512bw-vaes.h" @@ -155,6 +158,21 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_sse4_sha(void) +{ + return cpu_has_sha && cpu_has_sse4_2; +} + +static bool simd_check_avx_sha(void) +{ + return cpu_has_sha && cpu_has_avx; +} + +static bool simd_check_avx512f_sha_vl(void) +{ + return cpu_has_sha && cpu_has_avx512vl; +} + static bool simd_check_avx2_vaes(void) { return cpu_has_aesni && cpu_has_vaes && cpu_has_avx2; @@ -450,6 +468,9 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(SHA, sse4_sha, 16), + SIMD(AVX+SHA, avx_sha, 16), + AVX512VL(VL+SHA, avx512f_sha, 16), SIMD(VAES (VEX/x32), avx2_vaes, 32), SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -142,6 +142,7 @@ static inline bool 
xcr0_mask(uint64_t ma #define cpu_has_avx512_ifma (cp.feat.avx512_ifma && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) +#define cpu_has_sha cp.feat.sha #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6)) From patchwork Fri Mar 15 11:08:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 10854561 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B347715AC for ; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 922282A94D for ; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7DBD22A94F; Fri, 15 Mar 2019 11:10:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9DDB32A94D for ; Fri, 15 Mar 2019 11:10:13 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khX-0002La-9i; Fri, 15 Mar 2019 11:08:35 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1h4khW-0002LL-DM for xen-devel@lists.xenproject.org; Fri, 15 Mar 2019 11:08:34 +0000 X-Inumbo-ID: ab2986d8-4712-11e9-871f-c7ab690a713f Received: from prv1-mh.provo.novell.com (unknown [137.65.248.33]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id ab2986d8-4712-11e9-871f-c7ab690a713f; Fri, 15 Mar 2019 11:08:32 +0000 (UTC) Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Fri, 15 Mar 2019 05:08:31 -0600 Message-Id: <5C8B87B0020000780021F32F@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.1.0 Date: Fri, 15 Mar 2019 05:08:32 -0600 From: "Jan Beulich" To: "xen-devel" References: <5B6BF83602000078001DC548@prv1-mh.provo.novell.com> <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> In-Reply-To: <5C8B7EC0020000780021F10B@prv1-mh.provo.novell.com> Mime-Version: 1.0 Content-Disposition: inline Subject: [Xen-devel] [PATCH v8 50/50] x86emul: add a PCLMUL/VPCLMUL test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , Roger Pau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also use this for AVX512_VBMI2 VPSH{L,R}D{,V}{D,Q,W} testing (only the quad word right shifts get actually used; the assumption is that their "left" counterparts as well as the double word and word forms then work as well). 
Signed-off-by: Jan Beulich Acked-by: Andrew Cooper (subject to all --- v8: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -20,9 +20,10 @@ SIMD := 3dnow sse sse2 sse4 avx avx2 xop FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes +CLMUL := ssse3-pclmul avx-pclmul avx2-vpclmulqdq avx512bw-vpclmulqdq avx512vbmi2-vpclmulqdq SHA := sse4-sha avx-sha avx512f-sha GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(CLMUL) $(SHA) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -89,6 +90,7 @@ avx512er-flts := 4 8 avx512vbmi-vecs := $(avx512bw-vecs) avx512vbmi-ints := $(avx512bw-ints) avx512vbmi-flts := $(avx512bw-flts) +avx512vbmi2-vecs := $(avx512bw-vecs) avx512f-opmask-vecs := 2 avx512dq-opmask-vecs := 1 2 @@ -149,6 +151,10 @@ define simd-aes-defs $(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") endef +define simd-clmul-defs +$(1)-cflags := $(foreach vec,$($(patsubst %-pclmul,sse,$(1))-vecs) $($(patsubst %-vpclmulqdq,%,$(1))-vecs), \ + "-D_$(vec) -mpclmul $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") +endef define simd-sha-defs $(1)-cflags := $(foreach vec,$(sse-vecs), \ "-D_$(vec) $(addprefix -m,$(subst -,$(space),$(1))) -Os -DVEC_SIZE=$(vec)") @@ -164,6 +170,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) +$(foreach flavor,$(CLMUL),$(eval $(call simd-clmul-defs,$(flavor)))) $(foreach flavor,$(SHA),$(eval $(call simd-sha-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -218,13 +225,16 @@ $(addsuffix .c,$(SG)): $(addsuffix .c,$(AES)): ln -sf simd-aes.c $@ +$(addsuffix .c,$(CLMUL)): + ln -sf simd-clmul.c $@ + $(addsuffix .c,$(SHA)): ln -sf simd-sha.c $@ $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(CLMUL) $(SHA) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-clmul.c @@ -0,0 +1,150 @@ +#define UINT_SIZE 8 + +#include "simd.h" +ENTRY(clmul_test); + +#ifdef __AVX512F__ /* AVX512BW may get enabled only below */ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE) +# define lane_shr_unit(x) \ + ((vec_t)B(palignr, _mask, (vdi_t)(x), (vdi_t)(x), 64, (vdi_t){}, \ + 0x00ff00ff00ff00ffULL & (~0ULL >> (64 - VEC_SIZE)))) +#else +# if defined(__AVX2__) && VEC_SIZE == 32 +# define to_bool(cmp) B(ptestc, , cmp, (vdi_t){} == 0) +# else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# endif +# define eq(x, y) to_bool((x) == (y)) +# define lane_shr_unit(x) ((vec_t)B(palignr, , (vdi_t){}, (vdi_t)(x), 64)) +#endif + +#define CLMUL(op, x, y, c) (vec_t)(__builtin_ia32_ ## op((vdi_t)(x), (vdi_t)(y), c)) + +#if VEC_SIZE == 16 +# define clmul(x, y, c) CLMUL(pclmulqdq128, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v2di +#elif VEC_SIZE == 32 +# define clmul(x, y, c) CLMUL(vpclmulqdq_v4di, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v4di 
+#elif VEC_SIZE == 64 +# define clmul(x, y, c) CLMUL(vpclmulqdq_v8di, x, y, c) +# define vpshrd __builtin_ia32_vpshrd_v8di +#endif + +#define clmul_ll(x, y) clmul(x, y, 0x00) +#define clmul_hl(x, y) clmul(x, y, 0x01) +#define clmul_lh(x, y) clmul(x, y, 0x10) +#define clmul_hh(x, y) clmul(x, y, 0x11) + +#if defined(__AVX512VBMI2__) +# pragma GCC target ( "avx512bw" ) +# define lane_shr_i(x, n) ({ \ + vec_t h_ = lane_shr_unit(x); \ + touch(h_); \ + (n) < 64 ? (vec_t)vpshrd((vdi_t)(x), (vdi_t)(h_), n) : h_ >> ((n) - 64); \ +}) +# define lane_shr_v(x, n) ({ \ + vec_t t_ = (x), h_ = lane_shr_unit(x); \ + typeof(t_[0]) n_ = (n); \ + if ( (n) < 64 ) \ + /* gcc does not support embedded broadcast */ \ + asm ( "vpshrdvq %2%{1to%c3%}, %1, %0" \ + : "+v" (t_) : "v" (h_), "m" (n_), "i" (ELEM_COUNT) ); \ + else \ + t_ = h_ >> ((n) - 64); \ + t_; \ +}) +#else +# define lane_shr_i lane_shr_v +# define lane_shr_v(x, n) ({ \ + vec_t t_ = (n) > 0 ? lane_shr_unit(x) : (x); \ + (n) < 64 ? ((x) >> (n)) | (t_ << (-(n) & 0x3f)) \ + : t_ >> ((n) - 64); \ +}) +#endif + +int clmul_test(void) +{ + unsigned int i; + vec_t src; + vqi_t raw = {}; + + for ( i = 1; i < VEC_SIZE; ++i ) + raw[i] = i; + src = (vec_t)raw; + + for ( i = 0; i < 256; i += VEC_SIZE ) + { + vec_t x = {}, y, z, lo, hi; + unsigned int j; + + touch(x); + y = clmul_ll(src, x); + touch(x); + + if ( !eq(y, x) ) return __LINE__; + + for ( j = 0; j < ELEM_COUNT; j += 2 ) + x[j] = 1; + + touch(src); + y = clmul_ll(x, src); + touch(src); + z = clmul_lh(x, src); + touch(src); + + for ( j = 0; j < ELEM_COUNT; j += 2 ) + y[j + 1] = z[j]; + + if ( !eq(y, src) ) return __LINE__; + + /* + * Besides the obvious property of the low and high half products + * being the same either direction, the "square" of a number has the + * property of simply being the original bit pattern with a zero bit + * inserted between any two bits. This is what the code below checks. 
+ */ + + x = src; + touch(src); + y = clmul_lh(x, src); + touch(src); + z = clmul_hl(x, src); + + if ( !eq(y, z) ) return __LINE__; + + touch(src); + y = lo = clmul_ll(x, src); + touch(src); + z = hi = clmul_hh(x, src); + touch(src); + + for ( j = 0; j < 64; ++j ) + { + vec_t l = lane_shr_v(lo, 2 * j); + vec_t h = lane_shr_v(hi, 2 * j); + unsigned int n; + + if ( !eq(l, y) ) return __LINE__; + if ( !eq(h, z) ) return __LINE__; + + x = src >> j; + + for ( n = 0; n < ELEM_COUNT; n += 2 ) + { + if ( (x[n + 0] & 1) != (l[n] & 3) ) return __LINE__; + if ( (x[n + 1] & 1) != (h[n] & 3) ) return __LINE__; + } + + touch(y); + y = lane_shr_i(y, 2); + touch(z); + z = lane_shr_i(z, 2); + } + + src += 0x0101010101010101ULL * VEC_SIZE; + } + + return 0; +} --- a/tools/tests/x86_emulator/simd.h +++ b/tools/tests/x86_emulator/simd.h @@ -381,6 +381,7 @@ OVR(movntdq); OVR(movntdqa); OVR(movshdup); OVR(movsldup); +OVR(pclmulqdq); OVR(permd); OVR(permq); OVR(pmovsxbd); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -13,16 +13,19 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse2.h" #include "sse2-gf.h" #include "ssse3-aes.h" +#include "ssse3-pclmul.h" #include "sse4.h" #include "sse4-sha.h" #include "avx.h" #include "avx-aes.h" +#include "avx-pclmul.h" #include "avx-sha.h" #include "fma4.h" #include "fma.h" #include "avx2.h" #include "avx2-sg.h" #include "avx2-vaes.h" +#include "avx2-vpclmulqdq.h" #include "avx2-gf.h" #include "xop.h" #include "avx512f-opmask.h" @@ -34,10 +37,12 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512bw-vaes.h" +#include "avx512bw-vpclmulqdq.h" #include "avx512bw-gf.h" #include "avx512dq.h" #include "avx512er.h" #include "avx512vbmi.h" +#include "avx512vbmi2-vpclmulqdq.h" #define verbose false /* Switch to true for far more logging. 
*/ @@ -108,6 +113,16 @@ static bool simd_check_avx_aes(void) return cpu_has_aesni && cpu_has_avx; } +static bool simd_check_ssse3_pclmul(void) +{ + return cpu_has_pclmulqdq && cpu_has_ssse3; +} + +static bool simd_check_avx_pclmul(void) +{ + return cpu_has_pclmulqdq && cpu_has_avx; +} + static bool simd_check_avx512f(void) { return cpu_has_avx512f; @@ -189,6 +204,31 @@ static bool simd_check_avx512bw_vaes_vl( cpu_has_avx512bw && cpu_has_avx512vl; } +static bool simd_check_avx2_vpclmulqdq(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx2; +} + +static bool simd_check_avx512bw_vpclmulqdq(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx512bw; +} + +static bool simd_check_avx512bw_vpclmulqdq_vl(void) +{ + return cpu_has_vpclmulqdq && cpu_has_avx512bw && cpu_has_avx512vl; +} + +static bool simd_check_avx512vbmi2_vpclmulqdq(void) +{ + return cpu_has_avx512_vbmi2 && simd_check_avx512bw_vpclmulqdq(); +} + +static bool simd_check_avx512vbmi2_vpclmulqdq_vl(void) +{ + return cpu_has_avx512_vbmi2 && simd_check_avx512bw_vpclmulqdq_vl(); +} + static bool simd_check_sse2_gf(void) { return cpu_has_gfni && cpu_has_sse2; @@ -369,6 +409,8 @@ static const struct { SIMD(XOP i64x4, xop, 32i8), SIMD(AES (legacy), ssse3_aes, 16), SIMD(AES (VEX/x16), avx_aes, 16), + SIMD(PCLMUL (legacy), ssse3_pclmul, 16), + SIMD(PCLMUL (VEX/x2), avx_pclmul, 16), SIMD(OPMASK/w, avx512f_opmask, 2), SIMD(OPMASK+DQ/b, avx512dq_opmask, 1), SIMD(OPMASK+DQ/w, avx512dq_opmask, 2), @@ -475,6 +517,13 @@ static const struct { SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), AVX512VL(VL+VAES (x32), avx512bw_vaes, 32), + SIMD(VPCLMUL (VEX/x4), avx2_vpclmulqdq, 32), + SIMD(VPCLMUL (EVEX/x8), avx512bw_vpclmulqdq, 64), + AVX512VL(VL+VPCLMUL (x4), avx512bw_vpclmulqdq, 16), + AVX512VL(VL+VPCLMUL (x8), avx512bw_vpclmulqdq, 32), + SIMD(AVX512_VBMI2+VPCLMUL (x8), avx512vbmi2_vpclmulqdq, 64), + AVX512VL(_VBMI2+VL+VPCLMUL (x2), avx512vbmi2_vpclmulqdq, 16), + AVX512VL(_VBMI2+VL+VPCLMUL (x4), avx512vbmi2_vpclmulqdq, 32), SIMD(GFNI (legacy), sse2_gf, 16), SIMD(GFNI (VEX/x16), avx2_gf, 16), SIMD(GFNI (VEX/x32), avx2_gf, 32), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -125,6 +125,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_sse cp.basic.sse #define cpu_has_sse2 cp.basic.sse2 #define cpu_has_sse3 cp.basic.sse3 +#define cpu_has_pclmulqdq cp.basic.pclmulqdq #define cpu_has_ssse3 cp.basic.ssse3 #define cpu_has_fma (cp.basic.fma && xcr0_mask(6)) #define cpu_has_sse4_1 cp.basic.sse4_1
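For anyone wanting to poke at the selector encoding outside the harness,
the clmul_ll/hl/lh/hh macros mirror the PCLMULQDQ immediate directly; a
rough standalone sketch using the compiler intrinsic (assumes GCC or Clang,
a PCLMULQDQ-capable host, and building with -mpclmul; the operand values
are arbitrary examples):

#include <stdint.h>
#include <stdio.h>
#include <wmmintrin.h>

int main(void)
{
    /* _mm_set_epi64x() takes the high quadword first. */
    __m128i a = _mm_set_epi64x(7, 3);
    __m128i b = _mm_set_epi64x(9, 5);
    uint64_t r[2];

    /*
     * Immediate bit 0 selects the quadword of the first operand, bit 4 the
     * quadword of the second one, i.e. the 0x00/0x01/0x10/0x11 encoding
     * used by the clmul_ll/hl/lh/hh macros above.
     */
    _mm_storeu_si128((__m128i *)r, _mm_clmulepi64_si128(a, b, 0x00));

    /* Carry-lessly, 0b11 * 0b101 = 0b1111, so this prints 0:0xf. */
    printf("lo x lo = %#llx:%#llx\n",
           (unsigned long long)r[1], (unsigned long long)r[0]);

    return 0;
}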