
[v8,44/50] x86emul: support VPCLMULQDQ insns

Message ID 5C8B86FB020000780021F31D@prv1-mh.provo.novell.com (mailing list archive)
State New, archived
Series x86emul: remaining AVX512 support

Commit Message

Jan Beulich March 15, 2019, 11:05 a.m. UTC
As to the feature dependency adjustment, while strictly speaking AVX is
a sufficient prereq (to have YMM registers), 256-bit vectors of integers
were fully introduced only with AVX2. Sadly gcc can't be used as a
reference here: it doesn't provide any AVX512-independent built-in at
all.

Along the lines of PCLMULQDQ, since the insns here and in particular
their memory access patterns follow the usual scheme, I didn't think it
was necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
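For reference, the Disp8 scaling being tested is EVEX's compressed displacement: for "full vector" tuple insns such as vpclmulqdq (d8s_vl in the tables below), the encoded 8-bit displacement is multiplied by the vector length in bytes. A minimal sketch of the rule (function name is illustrative, not from the patch):

```python
def scale_disp8(disp8_byte, evex_ll):
    """EVEX compressed displacement for the full-vector (d8s_vl) tuple.

    The scale factor is the vector length in bytes: 16/32/64 for
    EVEX.L'L = 0/1/2.  disp8_byte is the raw (unsigned) encoded byte.
    """
    n = 16 << evex_ll
    # disp8 is a signed byte; sign-extend before scaling.
    d = disp8_byte - 256 if disp8_byte >= 128 else disp8_byte
    return d * n
```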

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Should VPCLMULQDQ also depend on PCLMULQDQ?
---
v8: No need to set fault_suppression to false.
v7: New.

Comments

Andrew Cooper June 21, 2019, 12:52 p.m. UTC | #1
On 15/03/2019 11:05, Jan Beulich wrote:
> As to the feature dependency adjustment, while strictly speaking AVX is
> a sufficient prereq (to have YMM registers), 256-bit vectors of integers
> were fully introduced only with AVX2. Sadly gcc can't be used as a
> reference here: it doesn't provide any AVX512-independent built-in at
> all.
>
> Along the lines of PCLMULQDQ, since the insns here and in particular
> their memory access patterns follow the usual scheme, I didn't think it
> was necessary to add a contrived test specifically for them, beyond the
> Disp8 scaling one.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: Should VPCLMULQDQ also depend on PCLMULQDQ?

I think so, yes.  These are all 64 by 64 multiplies with a 128 bit
result, and an imm8 to choose which quadwords get used, so both these
features will be using the silicon block in the vector pipeline.  The
only difference is whether it's wired through from the legacy SSE
instructions, or the [E]VEX instructions.

I certainly don't expect to ever see hardware with VPCLMULQDQ but
lacking PCLMULQDQ.
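The operation being described can be modelled in a few lines; assuming the SDM's operand selection (imm8 bit 0 picks the quadword of the first source, bit 4 that of the second), a per-128-bit-lane sketch looks like:

```python
MASK64 = (1 << 64) - 1

def clmul64(a, b):
    """Carry-less (XOR-accumulating) 64x64 multiply -> 128-bit result."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r

def pclmulqdq(src1, src2, imm8):
    """Model of one 128-bit lane of [V]PCLMULQDQ.

    imm8 bit 0 selects the low/high quadword of src1,
    imm8 bit 4 selects the low/high quadword of src2.
    """
    q1 = (src1 >> (64 * (imm8 & 1))) & MASK64
    q2 = (src2 >> (64 * ((imm8 >> 4) & 1))) & MASK64
    return clmul64(q1, q2)
```

Note the difference from an ordinary multiply: partial products are XORed rather than added, so e.g. 3 clmul 3 is 5, not 9.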

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -255,8 +255,9 @@ def crunch_numbers(state):
>  
>          # This is just the dependency between AVX512 and AVX2 of XSTATE
>          # feature flags.  If want to use AVX512, AVX2 must be supported and
> -        # enabled.
> -        AVX2: [AVX512F],
> +        # enabled.  Certain later extensions, acting on 256-bit vectors of
> +        # integers, better depend on AVX2 than AVX.
> +        AVX2: [AVX512F, VPCLMULQDQ],

Hmm - this is awkward, because in practice, there won't be any hardware
in existence with VPCLMULQDQ and AVX2 but lacking AVX512.

However, the VEX encoding is legitimate in the absence of the EVEX
encoding, and doesn't depend on the AVX512 XCR0 state.
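That distinction is visible in the test harness's XCR0 checks below: the VEX form needs only the SSE and YMM state components enabled (mask 6), whereas the AVX512 features also require the opmask and ZMM components (mask 0xe6). A toy version of the check (mask values taken from the x86-emulate.h hunk):

```python
def xcr0_covers(xcr0, mask):
    """True iff every state component in `mask` is enabled in XCR0."""
    return (xcr0 & mask) == mask

XCR0_AVX_ONLY = 0x7    # x87 + SSE + YMM: enough for VEX vpclmulqdq,
                       # not for any EVEX-encoded AVX512 insn (0xe6).
```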

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich June 21, 2019, 1:44 p.m. UTC | #2
>>> On 21.06.19 at 14:52, <andrew.cooper3@citrix.com> wrote:
> On 15/03/2019 11:05, Jan Beulich wrote:
>> As to the feature dependency adjustment, while strictly speaking AVX is
>> a sufficient prereq (to have YMM registers), 256-bit vectors of integers
>> were fully introduced only with AVX2. Sadly gcc can't be used as a
>> reference here: it doesn't provide any AVX512-independent built-in at
>> all.
>>
>> Along the lines of PCLMULQDQ, since the insns here and in particular
>> their memory access patterns follow the usual scheme, I didn't think it
>> was necessary to add a contrived test specifically for them, beyond the
>> Disp8 scaling one.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: Should VPCLMULQDQ also depend on PCLMULQDQ?
> 
> I think so, yes.  These are all 64 by 64 multiplies with a 128 bit
> result, and an imm8 to choose which quadwords get used, so both these
> features will be using the silicon block in the vector pipeline.  The
> only difference is whether it's wired through from the legacy SSE
> instructions, or the [E]VEX instructions.
> 
> I certainly don't expect to ever see hardware with VPCLMULQDQ but
> lacking PCLMULQDQ.

Okay, will do, and I'll assume that ...

>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -255,8 +255,9 @@ def crunch_numbers(state):
>>  
>>          # This is just the dependency between AVX512 and AVX2 of XSTATE
>>          # feature flags.  If want to use AVX512, AVX2 must be supported and
>> -        # enabled.
>> -        AVX2: [AVX512F],
>> +        # enabled.  Certain later extensions, acting on 256-bit vectors of
>> +        # integers, better depend on AVX2 than AVX.
>> +        AVX2: [AVX512F, VPCLMULQDQ],
> 
> Hmm - this is awkward, because in practice, there won't be any hardware
> in existence with VPCLMULQDQ and AVX2 but lacking AVX512.
> 
> However, the VEX encoding is legitimate in the absence of the EVEX
> encoding, and doesn't depend on the AVX512 XCR0 state.
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

... this is meant for a patch with that addition to the dependency tree.

Btw - for GFNI and its VEX/EVEX forms this gets even more interesting.
It didn't even occur to me that there could be new hardware supporting
GFNI but no AVX* whatsoever, but I've been told the current split
between SDM and ISA extensions doc is to reflect exactly this.

Jan
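The effect of the gen-cpuid.py hunk under discussion can be illustrated with a toy transitive closure over a dependency map (feature names abridged; the real script computes this over the full tree): with VPCLMULQDQ listed under AVX2, disabling AVX2 for a guest automatically disables VPCLMULQDQ as well.

```python
# Miniature of gen-cpuid.py's dependency handling: each key maps to the
# features that require it, so clearing a feature must also clear
# everything transitively below it.
deps = {
    "AVX": ["AVX2"],
    "AVX2": ["AVX512F", "VPCLMULQDQ"],
    "AVX512F": ["AVX512VL"],
}

def deep_deps(feature):
    """Transitive closure of features depending on `feature`."""
    out = set()
    for child in deps.get(feature, []):
        out.add(child)
        out |= deep_deps(child)
    return out
```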

Patch

--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -591,6 +591,10 @@  static const struct test avx512_vpopcntd
     INSN(popcnt, 66, 0f38, 55, vl, dq, vl)
 };
 
+static const struct test vpclmulqdq_all[] = {
+    INSN(pclmulqdq, 66, 0f3a, 44, vl, q_nb, vl)
+};
+
 static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
 static const unsigned char vl_128[] = { VL_128 };
 static const unsigned char vl_no128[] = { VL_512, VL_256 };
@@ -968,4 +972,9 @@  void evex_disp8_test(void *instr, struct
     RUN(avx512_vbmi2, all);
     RUN(avx512_vnni, all);
     RUN(avx512_vpopcntdq, all);
+
+    if ( cpu_has_avx512f )
+    {
+        RUN(vpclmulqdq, all);
+    }
 }
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -144,6 +144,7 @@  static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512vl  (cp.feat.avx512vl && xcr0_mask(0xe6))
 #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6))
 #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6))
+#define cpu_has_vpclmulqdq (cp.feat.vpclmulqdq && xcr0_mask(6))
 #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6))
 #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6))
 #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6))
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -594,7 +594,7 @@  static const struct ext0f3a_table {
     [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x40 ... 0x41] = { .simd_size = simd_packed_fp },
     [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
-    [0x44] = { .simd_size = simd_packed_int },
+    [0x44] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x46] = { .simd_size = simd_packed_int },
     [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 },
     [0x4a ... 0x4b] = { .simd_size = simd_packed_fp, .four_op = 1 },
@@ -1922,6 +1922,7 @@  static bool vcpu_has(
 #define vcpu_has_avx512vl()    vcpu_has(         7, EBX, 31, ctxt, ops)
 #define vcpu_has_avx512_vbmi() vcpu_has(         7, ECX,  1, ctxt, ops)
 #define vcpu_has_avx512_vbmi2() vcpu_has(        7, ECX,  6, ctxt, ops)
+#define vcpu_has_vpclmulqdq()  vcpu_has(         7, ECX, 10, ctxt, ops)
 #define vcpu_has_avx512_vnni() vcpu_has(         7, ECX, 11, ctxt, ops)
 #define vcpu_has_avx512_bitalg() vcpu_has(       7, ECX, 12, ctxt, ops)
 #define vcpu_has_avx512_vpopcntdq() vcpu_has(    7, ECX, 14, ctxt, ops)
@@ -10219,13 +10220,19 @@  x86_emulate(
         goto opmask_shift_imm;
 
     case X86EMUL_OPC_66(0x0f3a, 0x44):     /* pclmulqdq $imm8,xmm/m128,xmm */
-    case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,xmm/m128,xmm,xmm */
+    case X86EMUL_OPC_VEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,{x,y}mm/mem,{x,y}mm,{x,y}mm */
         host_and_vcpu_must_have(pclmulqdq);
         if ( vex.opcx == vex_none )
             goto simd_0f3a_common;
-        generate_exception_if(vex.l, EXC_UD);
+        if ( vex.l )
+            host_and_vcpu_must_have(vpclmulqdq);
         goto simd_0f_imm8_avx;
 
+    case X86EMUL_OPC_EVEX_66(0x0f3a, 0x44): /* vpclmulqdq $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm */
+        host_and_vcpu_must_have(vpclmulqdq);
+        generate_exception_if(evex.brs || evex.opmsk, EXC_UD);
+        goto avx512f_imm8_no_sae;
+
     case X86EMUL_OPC_VEX_66(0x0f3a, 0x4a): /* vblendvps {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */
     case X86EMUL_OPC_VEX_66(0x0f3a, 0x4b): /* vblendvpd {x,y}mm,{x,y}mm/mem,{x,y}mm,{x,y}mm */
         generate_exception_if(vex.w, EXC_UD);
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -112,6 +112,7 @@ 
 /* CPUID level 0x00000007:0.ecx */
 #define cpu_has_avx512_vbmi     boot_cpu_has(X86_FEATURE_AVX512_VBMI)
 #define cpu_has_avx512_vbmi2    boot_cpu_has(X86_FEATURE_AVX512_VBMI2)
+#define cpu_has_vpclmulqdq      boot_cpu_has(X86_FEATURE_VPCLMULQDQ)
 #define cpu_has_avx512_vnni     boot_cpu_has(X86_FEATURE_AVX512_VNNI)
 #define cpu_has_avx512_bitalg   boot_cpu_has(X86_FEATURE_AVX512_BITALG)
 #define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ)
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -121,7 +121,7 @@  XEN_CPUFEATURE(PBE,           0*32+31) /
 
 /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
 XEN_CPUFEATURE(SSE3,          1*32+ 0) /*A  Streaming SIMD Extensions-3 */
-XEN_CPUFEATURE(PCLMULQDQ,     1*32+ 1) /*A  Carry-less mulitplication */
+XEN_CPUFEATURE(PCLMULQDQ,     1*32+ 1) /*A  Carry-less multiplication */
 XEN_CPUFEATURE(DTES64,        1*32+ 2) /*   64-bit Debug Store */
 XEN_CPUFEATURE(MONITOR,       1*32+ 3) /*   Monitor/Mwait support */
 XEN_CPUFEATURE(DSCPL,         1*32+ 4) /*   CPL Qualified Debug Store */
@@ -229,6 +229,7 @@  XEN_CPUFEATURE(UMIP,          6*32+ 2) /
 XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
 XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
 XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
+XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
 XEN_CPUFEATURE(AVX512_VNNI,   6*32+11) /*A  Vector Neural Network Instrs */
 XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A  Support for VPOPCNT[B,W] and VPSHUFBITQMB */
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -255,8 +255,9 @@  def crunch_numbers(state):
 
         # This is just the dependency between AVX512 and AVX2 of XSTATE
         # feature flags.  If want to use AVX512, AVX2 must be supported and
-        # enabled.
-        AVX2: [AVX512F],
+        # enabled.  Certain later extensions, acting on 256-bit vectors of
+        # integers, better depend on AVX2 than AVX.
+        AVX2: [AVX512F, VPCLMULQDQ],
 
         # AVX512F is taken to mean hardware support for 512bit registers
         # (which in practice depends on the EVEX prefix to encode) as well