[v3,5/8] x86emul: support MOVDIR{I,64B} insns

Message ID fa007bb5-1644-6116-fe96-00b00f7241a4@suse.com (mailing list archive)
State New, archived
Series x86emul: further work

Commit Message

Jan Beulich Sept. 3, 2019, 9:39 a.m. UTC
Note that SDM revision 070 doesn't specify exception behavior for
ModRM.mod == 0b11; assuming #UD here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Update description.

Comments

Andrew Cooper Sept. 3, 2019, 10:28 a.m. UTC | #1
On 03/09/2019 10:39, Jan Beulich wrote:
> Note that SDM revision 070 doesn't specify exception behavior for
> ModRM.mod == 0b11; assuming #UD here.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

What are we going to do about the ->write() hook atomicity?  I'm happy
to put it on the TODO list, but we can't simply ignore the problem.

~Andrew
Jan Beulich Sept. 3, 2019, 12:25 p.m. UTC | #2
On 03.09.2019 12:28, Andrew Cooper wrote:
> On 03/09/2019 10:39, Jan Beulich wrote:
>> Note that SDM revision 070 doesn't specify exception behavior for
>> ModRM.mod == 0b11; assuming #UD here.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> What are we going to do about the ->write() hook atomicity?  I'm happy
> to put it on the TODO list, but we can't simply ignore the problem.

So do you not agree with my assessment that our memcpy()
implementation satisfies the need, and it's just not very
nice that the ->write() hook is dependent upon this?

Jan
Andrew Cooper Sept. 4, 2019, 10:19 a.m. UTC | #3
On 03/09/2019 13:25, Jan Beulich wrote:
> On 03.09.2019 12:28, Andrew Cooper wrote:
>> On 03/09/2019 10:39, Jan Beulich wrote:
>>> Note that SDM revision 070 doesn't specify exception behavior for
>>> ModRM.mod == 0b11; assuming #UD here.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> What are we going to do about the ->write() hook atomicity?  I'm happy
>> to put it on the TODO list, but we can't simply ignore the problem.
> So do you not agree with my assessment that our memcpy()
> implementation satisfies the need, and it's just not very
> nice that the ->write() hook is dependent upon this?

We use __builtin_memcpy(), not necessarily our own local implementation.

Our own copy uses rep movsq followed by rep movsb of up to 7 bytes,
which doesn't handle 2 and 4 byte copies atomically.  More generally,
rep movs doesn't provide guarantees about the external visibility of
component writes, owing to the many different ways that such writes may
be implemented, and optimised.

Even if the above were fixed (and I'm not sure we could make it correct
even for misaligned writes), we cannot depend on our own memcpy() never
changing in the future.  For one, it should be a straight rep movsb on
most Intel hardware these days, for performance.
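
For illustration, a C-level sketch of the copy shape described above
(not Xen's actual memcpy() implementation, and ignoring alignment
concerns for brevity).  With n == 4 on x86-64 the word loop runs zero
times and the value goes out as four individual byte stores, so another
agent can observe a torn value:

    #include <stddef.h>

    /* Illustrative only: "rep movsq" followed by a "rep movsb" tail. */
    static void copy_like_rep_movs(void *dst, const void *src, size_t n)
    {
        unsigned long *dq = dst;
        const unsigned long *sq = src;
        size_t nwords = n / sizeof(unsigned long);
        size_t tail = n % sizeof(unsigned long);
        unsigned char *db = (unsigned char *)dst + n - tail;
        const unsigned char *sb = (const unsigned char *)src + n - tail;
        size_t i;

        for ( i = 0; i < nwords; ++i )   /* "rep movsq" part */
            dq[i] = sq[i];

        for ( i = 0; i < tail; ++i )     /* "rep movsb" tail, 0..7 bytes */
            db[i] = sb[i];
    }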

~Andrew
Jan Beulich Sept. 4, 2019, 11:52 a.m. UTC | #4
On 04.09.2019 12:19, Andrew Cooper wrote:
> On 03/09/2019 13:25, Jan Beulich wrote:
>> On 03.09.2019 12:28, Andrew Cooper wrote:
>>> On 03/09/2019 10:39, Jan Beulich wrote:
>>>> Note that SDM revision 070 doesn't specify exception behavior for
>>>> ModRM.mod == 0b11; assuming #UD here.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> What are we going to do about the ->write() hook atomicity?  I'm happy
>>> to put it on the TODO list, but we can't simply ignore the problem.
>> So do you not agree with my assessment that our memcpy()
>> implementation satisfies the need, and it's just not very
>> nice that the ->write() hook is dependent upon this?
> 
> We use __builtin_memcpy(), not necessarily our own local implementation.
> 
> Our own copy uses rep movsq followed by rep movsb of up to 7 bytes,
> which doesn't handle 2 and 4 byte copies atomically.  More generally,
> rep movs doesn't provide guarantees about the external visibility of
> component writes, owing to the many different ways that such writes may
> be implemented, and optimised.
> 
> Even if the above were fixed (and I'm not sure we could make it correct
> even for misaligned writes), we cannot depend on our own memcpy() never
> changing in the future.  For one, it should be a straight rep movsb on
> most Intel hardware these days, for performance.

Well, okay, I'll add a pre-patch making HVM's ->write() hook no longer
use memcpy() for small aligned accesses. I guess I should then also do
the same for ->read(), if that's not already the case.
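
To sketch the idea (a hypothetical helper; name and shape are invented
here, this is not the actual pre-patch), the hook's copy step could
special-case naturally aligned power-of-two sizes and only fall back to
memcpy() otherwise:

    #include <stdint.h>
    #include <string.h>

    /*
     * Hypothetical sketch only: perform naturally aligned 2/4/8-byte
     * writes as single stores, so they can't be torn by the copy
     * routine itself; everything else goes through memcpy().
     */
    static void write_maybe_atomic(void *dst, const void *src,
                                   unsigned int bytes)
    {
        if ( !((unsigned long)dst & (bytes - 1)) )
            switch ( bytes )
            {
            case 2: *(volatile uint16_t *)dst = *(const uint16_t *)src; return;
            case 4: *(volatile uint32_t *)dst = *(const uint32_t *)src; return;
            case 8: *(volatile uint64_t *)dst = *(const uint64_t *)src; return;
            }

        memcpy(dst, src, bytes);
    }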

However, this implies that MOVDIR64B can't use the ->write() hook at
all. We'd need to introduce a new hook, but to be honest I have no good
idea how to model it (i.e. what other operations it might cover later
on); possibly we should come back to this when deciding how to
implement insns accessing large memory blocks (FXSAVE/FXRSTOR etc.).
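
Purely to illustrate the kind of hook in question (the name and exact
shape below are invented, mirroring the style of the existing ->write()
hook, and are not an agreed interface), it would have to hand the whole
block over in one call so the implementation can issue it as a single,
non-torn access:

    /*
     * Invented example of an additional struct x86_emulate_ops member:
     * the emulator passes the complete (e.g. 64-byte) block, and the
     * implementation is responsible for writing it out as one access.
     */
    int (*blk_write)(
        enum x86_segment seg,
        unsigned long offset,
        const void *p_data,
        unsigned int bytes,
        struct x86_emulate_ctxt *ctxt);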

Furthermore, I don't think we currently have the means to make the
split behavior of MOVDIRI correct: by using ->write(), we can't
guarantee it'll be exactly two writes.

Jan
Patch

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -2196,6 +2196,36 @@  int main(int argc, char **argv)
         goto fail;
     printf("okay\n");
 
+    printf("%-40s", "Testing movdiri %edx,(%ecx)...");
+    instr[0] = 0x0f; instr[1] = 0x38; instr[2] = 0xf9; instr[3] = 0x11;
+    regs.eip = (unsigned long)&instr[0];
+    regs.ecx = (unsigned long)memset(res, -1, 16);
+    regs.edx = 0x44332211;
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.eip != (unsigned long)&instr[4]) ||
+         res[0] != 0x44332211 || ~res[1] )
+        goto fail;
+    printf("okay\n");
+
+    printf("%-40s", "Testing movdir64b 144(%edx),%ecx...");
+    instr[0] = 0x66; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8;
+    instr[4] = 0x8a; instr[5] = 0x90; instr[8] = instr[7] = instr[6] = 0;
+    regs.eip = (unsigned long)&instr[0];
+    for ( i = 0; i < 64; ++i )
+        res[i] = i - 20;
+    regs.edx = (unsigned long)res;
+    regs.ecx = (unsigned long)(res + 16);
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.eip != (unsigned long)&instr[9]) ||
+         res[15] != -5 || res[32] != 12 )
+        goto fail;
+    for ( i = 16; i < 32; ++i )
+        if ( res[i] != i )
+            goto fail;
+    printf("okay\n");
+
     printf("%-40s", "Testing movq %mm3,(%ecx)...");
     if ( stack_exec && cpu_has_mmx )
     {
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -76,6 +76,8 @@  bool emul_test_init(void)
     cp.feat.adx = true;
     cp.feat.avx512pf = cp.feat.avx512f;
     cp.feat.rdpid = true;
+    cp.feat.movdiri = true;
+    cp.feat.movdir64b = true;
     cp.extd.clzero = true;
 
     if ( cpu_has_xsave )
@@ -137,15 +139,15 @@  int emul_test_cpuid(
         res->c |= 1U << 22;
 
     /*
-     * The emulator doesn't itself use ADCX/ADOX/RDPID nor the S/G prefetch
-     * insns, so we can always run the respective tests.
+     * The emulator doesn't itself use ADCX/ADOX/RDPID/MOVDIR* nor the S/G
+     * prefetch insns, so we can always run the respective tests.
      */
     if ( leaf == 7 && subleaf == 0 )
     {
         res->b |= (1U << 10) | (1U << 19);
         if ( res->b & (1U << 16) )
             res->b |= 1U << 26;
-        res->c |= 1U << 22;
+        res->c |= (1U << 22) | (1U << 27) | (1U << 28);
     }
 
     /*
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -548,6 +548,8 @@  static const struct ext0f38_table {
     [0xf1] = { .to_mem = 1, .two_op = 1 },
     [0xf2 ... 0xf3] = {},
     [0xf5 ... 0xf7] = {},
+    [0xf8] = { .simd_size = simd_other },
+    [0xf9] = { .to_mem = 1 },
 };
 
 /* Shift values between src and dst sizes of pmov{s,z}x{b,w,d}{w,d,q}. */
@@ -1902,6 +1904,8 @@  in_protmode(
 #define vcpu_has_avx512_bitalg() (ctxt->cpuid->feat.avx512_bitalg)
 #define vcpu_has_avx512_vpopcntdq() (ctxt->cpuid->feat.avx512_vpopcntdq)
 #define vcpu_has_rdpid()       (ctxt->cpuid->feat.rdpid)
+#define vcpu_has_movdiri()     (ctxt->cpuid->feat.movdiri)
+#define vcpu_has_movdir64b()   (ctxt->cpuid->feat.movdir64b)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
 
@@ -2693,10 +2697,12 @@  x86_decode_0f38(
     {
     case 0x00 ... 0xef:
     case 0xf2 ... 0xf5:
-    case 0xf7 ... 0xff:
+    case 0xf7 ... 0xf8:
+    case 0xfa ... 0xff:
         op_bytes = 0;
         /* fall through */
     case 0xf6: /* adcx / adox */
+    case 0xf9: /* movdiri */
         ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
         break;
 
@@ -9896,6 +9902,32 @@  x86_emulate(
                             : "0" ((uint32_t)src.val), "rm" (_regs.edx) );
         break;
 
+    case X86EMUL_OPC_66(0x0f38, 0xf8): /* movdir64b r,m512 */
+        vcpu_must_have(movdir64b);
+        generate_exception_if(ea.type != OP_MEM, EXC_UD);
+        src.val = truncate_ea(*dst.reg);
+        generate_exception_if(!is_aligned(x86_seg_es, src.val, 64, ctxt, ops),
+                              EXC_GP, 0);
+        /* Ignore the non-temporal behavior for now. */
+        fail_if(!ops->write);
+        BUILD_BUG_ON(sizeof(*mmvalp) < 64);
+        if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, 64,
+                             ctxt)) != X86EMUL_OKAY ||
+             (rc = ops->write(x86_seg_es, src.val, mmvalp, 64,
+                              ctxt)) != X86EMUL_OKAY )
+            goto done;
+        state->simd_size = simd_none;
+        sfence = true;
+        break;
+
+    case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */
+        vcpu_must_have(movdiri);
+        generate_exception_if(dst.type != OP_MEM, EXC_UD);
+        /* Ignore the non-temporal behavior for now. */
+        dst.val = src.val;
+        sfence = true;
+        break;
+
     case X86EMUL_OPC_VEX_66(0x0f3a, 0x00): /* vpermq $imm8,ymm/m256,ymm */
     case X86EMUL_OPC_VEX_66(0x0f3a, 0x01): /* vpermpd $imm8,ymm/m256,ymm */
         generate_exception_if(!vex.l || !vex.w, EXC_UD);
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -237,6 +237,8 @@  XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
 XEN_CPUFEATURE(RDPID,         6*32+22) /*A  RDPID instruction */
 XEN_CPUFEATURE(CLDEMOTE,      6*32+25) /*A  CLDEMOTE instruction */
+XEN_CPUFEATURE(MOVDIRI,       6*32+27) /*A  MOVDIRI instruction */
+XEN_CPUFEATURE(MOVDIR64B,     6*32+28) /*A  MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,          7*32+ 8) /*   Invariant TSC */
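
For reference, and not part of the patch: guest code would typically
reach these instructions through the GCC/Clang intrinsics (compiled
with -mmovdiri -mmovdir64b); the function and parameter names below are
invented for illustration.

    #include <immintrin.h>
    #include <stdint.h>

    /*
     * Illustrative guest-side use only: _movdir64b() performs the
     * 64-byte direct store emulated above (the destination must be
     * 64-byte aligned), _directstoreu_u32() the 4-byte MOVDIRI store.
     */
    static void post_work_descriptor(void *portal, const void *desc,
                                     volatile uint32_t *doorbell)
    {
        _movdir64b(portal, desc);
        _directstoreu_u32((void *)doorbell, 1);
    }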