
[v2] x86emul: don't read mask register on AVX512F-incapable platforms

Message ID 5C9DCC4A0200007800222AEA@prv1-mh.provo.novell.com (mailing list archive)
State New, archived
Series [v2] x86emul: don't read mask register on AVX512F-incapable platforms

Commit Message

Jan Beulich March 29, 2019, 7:42 a.m. UTC
Nor when register state isn't sufficiently enabled.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is surely a stable tree candidate, unless it could still make it
into 4.12 before the release.
---
v2: Add XCR0 check.
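For illustration, the v2 XCR0 check amounts to asking whether the OS has actually enabled the opmask state component, in addition to the CPU supporting AVX512F. The sketch below is minimal and makes several assumptions. It uses the architectural XCR0 bit layout from the Intel SDM (bit 1 SSE, bit 2 AVX/YMM, bit 5 opmask). The macro and function names are hypothetical and are not Xen's. The real _get_fpu() may check a different set of bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural XCR0 state-component bits (hypothetical macro names). */
#define XCR0_SSE     (UINT64_C(1) << 1)
#define XCR0_YMM     (UINT64_C(1) << 2)
#define XCR0_OPMASK  (UINT64_C(1) << 5)

/*
 * Hypothetical helper: only read a mask register (via KMOV in a stub) when
 * the CPU supports AVX512F *and* opmask state is enabled in XCR0.
 * AVX-512 state also requires SSE and YMM state, so those are checked too.
 */
static bool may_read_opmask(bool cpu_has_avx512f, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_YMM | XCR0_OPMASK;

    return cpu_has_avx512f && (xcr0 & need) == need;
}
```

For example, may_read_opmask(true, 0x7) is false, because XCR0 bit 5 is clear even though the CPU is AVX512F-capable.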

Comments

Andrew Cooper March 29, 2019, 9:19 a.m. UTC | #1
On 29/03/2019 07:42, Jan Beulich wrote:
> Nor when register state isn't sufficiently enabled.
>
> Reported-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This is surely a stable tree candidate, unless it could still make it
> into 4.12 before the release.
> ---
> v2: Add XCR0 check.
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -3511,7 +3511,8 @@ x86_emulate(
>      }
>  
>      /* With a memory operand, fetch the mask register in use (if any). */
> -    if ( ea.type == OP_MEM && evex.opmsk )
> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )

The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
_get_fpu() being happy with X86EMUL_FPU_opmask.

Preferably with it dropped, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

>      {
>          uint8_t *stb = get_stub(stub);
>  
> @@ -3532,6 +3533,14 @@ x86_emulate(
>          fault_suppression = true;
>      }
>  
> +    if ( fpu_type == X86EMUL_FPU_opmask )
> +    {
> +        /* Squash (side) effects of the _get_fpu() above. */
> +        x86_emul_reset_event(ctxt);
> +        put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
> +        fpu_type = X86EMUL_FPU_none;
> +    }
> +
>      /* Decode (but don't fetch) the destination operand: register or memory. */
>      switch ( d & DstMask )
>      {
Jan Beulich March 29, 2019, 9:36 a.m. UTC | #2
>>> On 29.03.19 at 10:19, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 07:42, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -3511,7 +3511,8 @@ x86_emulate(
>>      }
>>  
>>      /* With a memory operand, fetch the mask register in use (if any). */
>> -    if ( ea.type == OP_MEM && evex.opmsk )
>> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
>> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
> 
> The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
> _get_fpu() being happy with X86EMUL_FPU_opmask.

Well, that'll end up being inconsistent with what we do elsewhere:
If we did as you say, host_and_vcpu_must_have(avx512f) could
(and for consistency then should) all become just
vcpu_must_have(avx512f). Similarly for AVX.

I'd like to put up the other option then: Rather than using
_get_fpu() (and in particular the read_xcr() and read_cr() hooks)
we could read the real XCR0 here. After all we issue the KMOV not
because the guest has specified it, but because we need the value
of the register for correct fault suppression emulation.

> Preferably with it dropped, Reviewed-by: Andrew Cooper
> <andrew.cooper3@citrix.com>

Let me know of the applicability of this.

Jan
Andrew Cooper March 29, 2019, 10:02 a.m. UTC | #3
On 29/03/2019 09:36, Jan Beulich wrote:
>>>> On 29.03.19 at 10:19, <andrew.cooper3@citrix.com> wrote:
>> On 29/03/2019 07:42, Jan Beulich wrote:
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> @@ -3511,7 +3511,8 @@ x86_emulate(
>>>      }
>>>  
>>>      /* With a memory operand, fetch the mask register in use (if any). */
>>> -    if ( ea.type == OP_MEM && evex.opmsk )
>>> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
>>> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
>> The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
>> _get_fpu() being happy with X86EMUL_FPU_opmask.
> Well, that'll end up being inconsistent with what we do elsewhere:
> If we did as you say, host_and_vcpu_must_have(avx512f) could
> (and for consistency then should) all become just
> vcpu_must_have(avx512f). Similarly for AVX.

That case isn't the same.

For kmov, we don't care about the instruction group per se.  We care
that kmask xsave state is active and usable.

If you recall, the reason why you chose not to merge the host_and_vcpu
and vcpu predicates when I queried this on initial review was for the
theoretical case of the guest being offered features not present in
hardware, and having the emulator fill in the gaps.  (Also, the code may
have pre-dated {pv,hvm}_cpuid() handing back properly audited content,
which is something that has definitely been fixed now.)

Given many years of hindsight on the matter, I'm not actually sure how
useful a use case this is.  Obviously, there are some
cross-vendor applicabilities, but these only extend to individual
instructions whose behaviour can be fully replaced in other ways (i.e.
not for instructions which we decode and replay).

I don't see us ever gaining support for using instructions in cases
where the relevant xstate isn't available in hardware.

>
> I'd like to put up the other option then: Rather than using
> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
> we could read the real XCR0 here. After all we issue the KMOV not
> because the guest has specified it, but because we need the value
> of the register for correct fault suppression emulation.

True, and that would be rather smaller and less invasive than
deliberately squashing the other side effects of get_fpu()

~Andrew
Jan Beulich March 29, 2019, 10:56 a.m. UTC | #4
>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 09:36, Jan Beulich wrote:
>> I'd like to put up the other option then: Rather than using
>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>> we could read the real XCR0 here. After all we issue the KMOV not
>> because the guest has specified it, but because we need the value
>> of the register for correct fault suppression emulation.
> 
> True, and that would be rather smaller and less invasive than
> deliberately squashing the other side effects of get_fpu()

Hmm, I've tried to do this, but this is more complicated: CR0.TS
may be set, in which case we need to invoke the get_fpu() hook
to get it cleared with appropriate bookkeeping. I don't think it's
worth further complicating the code by invoking the hook _only_
in that case. So I guess we better stick to v2.

Which makes me come back to your request to drop the
cpu_has_avx512f part of the condition: Right now the fuzzer
uses emul_test_read_xcr() instead of actually fuzzing the
value. Once it does, would we expect it to never set any bits
in the returned value that aren't set in hardware, but could
in principle be set based on (real) CPUID output? In that case
I could agree to remove the extra condition.

Jan
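Jan's objection to reading the real XCR0 directly can be pictured as a three-way decision. The following is a hypothetical sketch, not the emulator's code. The enum and function names are invented. It assumes the architectural positions of CR0.TS (bit 3) and the XCR0 opmask bit (bit 5). It shows the extra branch the direct-read approach would need: even with opmask state enabled, a set CR0.TS would make the KMOV raise #NM. In that case the get_fpu() hook still has to run so that it can clear TS with the appropriate bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_TS       (UINT64_C(1) << 3)   /* architectural CR0.TS */
#define XCR0_OPMASK  (UINT64_C(1) << 5)   /* architectural XCR0 opmask bit */

/* Hypothetical outcomes of the "read the real XCR0" alternative. */
enum opmask_plan {
    OPMASK_SKIP,        /* opmask state not enabled: don't issue KMOV */
    OPMASK_DIRECT,      /* state enabled, TS clear: KMOV can run directly */
    OPMASK_NEED_HOOK,   /* TS set: the get_fpu() hook must run first */
};

static enum opmask_plan plan_opmask_read(uint64_t host_xcr0, uint64_t cr0)
{
    if ( !(host_xcr0 & XCR0_OPMASK) )
        return OPMASK_SKIP;
    /* A VEX-encoded KMOV in the stub would raise #NM with CR0.TS set. */
    if ( cr0 & CR0_TS )
        return OPMASK_NEED_HOOK;
    return OPMASK_DIRECT;
}
```

The OPMASK_NEED_HOOK case is the "invoking the hook _only_ in that case" path that Jan considered not worth the added complexity.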
Andrew Cooper April 1, 2019, 2:14 p.m. UTC | #5
On 29/03/2019 10:56, Jan Beulich wrote:
>>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
>> On 29/03/2019 09:36, Jan Beulich wrote:
>>> I'd like to put up the other option then: Rather than using
>>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>>> we could read the real XCR0 here. After all we issue the KMOV not
>>> because the guest has specified it, but because we need the value
>>> of the register for correct fault suppression emulation.
>> True, and that would be rather smaller and less invasive than
>> deliberately squashing the other side effects of get_fpu()
> Hmm, I've tried to do this, but this is more complicated: CR0.TS
> may be set, in which case we need to invoke the get_fpu() hook
> to get it cleared with appropriate bookkeeping. I don't think it's
> worth further complicating the code by invoking the hook _only_
> in that case. So I guess we better stick to v2.

Oh, OK.  That does complicate things.  Let's just use the existing
infrastructure, even if it is rather heavyweight.

>
> Which makes me come back to your request to drop the
> cpu_has_avx512f part of the condition: Right now the fuzzer
> uses emul_test_read_xcr() instead of actually fuzzing the
> value. Once it does, would we expect it to never set any bits
> in the returned value that aren't set in hardware, but could
> in principle be set based on (real) CPUID output? In that case
> I could agree to remove the extra condition.

I don't see how we could ever emulate with a (v)xcr0 different to a
legitimate value in hardware, as the stubs would #UD.

I also don't see how the userspace tools could ever test with a value
other than what they can see in xgetbv, because only the kernel gets to
choose %xcr0.  Even with faking up a smaller xcr0, you'd end up with
instructions which should fault but don't.

~Andrew
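Andrew's point that the emulator can only ever run with an XCR0 the hardware could legitimately hold suggests one way a fuzzer might derive XCR0 instead of taking fuzzed input verbatim. This is a minimal sketch under stated assumptions. The helper name is hypothetical and is not the fuzzer's emul_test_read_xcr(). host_xcr0 stands for what xgetbv reports. The consistency rules are the architectural ones from the SDM's XSETBV description: x87 is always set, AVX state requires SSE state, and AVX-512 bits 7:5 are all-or-nothing and require AVX state. Only these components are modelled. Every other bit is dropped.

```c
#include <assert.h>
#include <stdint.h>

#define XCR0_X87     (UINT64_C(1) << 0)
#define XCR0_SSE     (UINT64_C(1) << 1)
#define XCR0_YMM     (UINT64_C(1) << 2)
#define XCR0_AVX512  (UINT64_C(7) << 5)  /* opmask, ZMM_Hi256, Hi16_ZMM */

/*
 * Hypothetical helper: turn an arbitrary fuzzer-supplied value into an XCR0
 * that hardware would accept, restricted to what the host has enabled.
 */
static uint64_t sanitize_xcr0(uint64_t fuzzed, uint64_t host_xcr0)
{
    uint64_t v = fuzzed & host_xcr0 &
                 (XCR0_X87 | XCR0_SSE | XCR0_YMM | XCR0_AVX512);

    v |= XCR0_X87;                       /* x87 state is always enabled */
    if ( (v & XCR0_YMM) && !(v & XCR0_SSE) )
        v &= ~XCR0_YMM;                  /* AVX state requires SSE state */
    if ( (v & XCR0_AVX512) != XCR0_AVX512 || !(v & XCR0_YMM) )
        v &= ~XCR0_AVX512;               /* AVX-512: all three bits plus AVX */
    return v;
}
```

For example, sanitize_xcr0(0xe3, 0xe7) yields 0x3, because AVX-512 state without AVX state is dropped. With a host XCR0 lacking AVX-512 (e.g. 0x7), the result can never have the opmask bit set. That property is what would make a separate cpu_has_avx512f test redundant.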
Jan Beulich April 1, 2019, 3:02 p.m. UTC | #6
>>> On 01.04.19 at 16:14, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 10:56, Jan Beulich wrote:
>>>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
>>> On 29/03/2019 09:36, Jan Beulich wrote:
>>>> I'd like to put up the other option then: Rather than using
>>>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>>>> we could read the real XCR0 here. After all we issue the KMOV not
>>>> because the guest has specified it, but because we need the value
>>>> of the register for correct fault suppression emulation.
>>> True, and that would be rather smaller and less invasive than
>>> deliberately squashing the other side effects of get_fpu()
>> Hmm, I've tried to do this, but this is more complicated: CR0.TS
>> may be set, in which case we need to invoke the get_fpu() hook
>> to get it cleared with appropriate bookkeeping. I don't think it's
>> worth further complicating the code by invoking the hook _only_
>> in that case. So I guess we better stick to v2.
> 
> Oh, OK.  That does complicate things.  Let's just use the existing
> infrastructure, even if it is rather heavyweight.
> 
>> Which makes me come back to your request to drop the
>> cpu_has_avx512f part of the condition: Right now the fuzzer
>> uses emul_test_read_xcr() instead of actually fuzzing the
>> value. Once it does, would we expect it to never set any bits
>> in the returned value that aren't set in hardware, but could
>> in principle be set based on (real) CPUID output? In that case
>> I could agree to remove the extra condition.
> 
> I don't see how we could ever emulate with a (v)xcr0 different to a
> legitimate value in hardware, as the stubs would #UD.
> 
> I also don't see how the userspace tools could ever test with a value
> other than what they can see in xgetbv, because only the kernel gets to
> choose %xcr0.  Even with faking up a smaller xcr0, you'd end up with
> instructions which should fault but don't.

Would you mind looking at what we do for CR0 and CR4 right now
in the fuzzer stubs? I don't see why, in principle, these and XCR0
would need handling differently: Either we supply sane state
rather than fully fuzzed state, or we don't. But preferably uniformly.
Yet right now XCR0 gets sane values, while CR0 and CR4 get
fuzzed in architecturally impossible ways.

As to faulting: The same would be true if the emulator used e.g.
the fsgsbase insns itself, but based its decision on the presented
CR4 value: It might fault when it shouldn't, or it might not fault
when it should, depending on host CR4.

Jan

Patch

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3511,7 +3511,8 @@  x86_emulate(
     }
 
     /* With a memory operand, fetch the mask register in use (if any). */
-    if ( ea.type == OP_MEM && evex.opmsk )
+    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
+         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
     {
         uint8_t *stb = get_stub(stub);
 
@@ -3532,6 +3533,14 @@  x86_emulate(
         fault_suppression = true;
     }
 
+    if ( fpu_type == X86EMUL_FPU_opmask )
+    {
+        /* Squash (side) effects of the _get_fpu() above. */
+        x86_emul_reset_event(ctxt);
+        put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
+        fpu_type = X86EMUL_FPU_none;
+    }
+
     /* Decode (but don't fetch) the destination operand: register or memory. */
     switch ( d & DstMask )
     {
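The patch relies on C's left-to-right short-circuit evaluation of &&. The assignment `fpu_type = X86EMUL_FPU_opmask` inside _get_fpu()'s argument only takes effect if every earlier operand was true. The later `fpu_type == X86EMUL_FPU_opmask` test therefore fires exactly when _get_fpu() was attempted, whether or not it succeeded. This assumes fpu_type holds X86EMUL_FPU_none beforehand. Below is a self-contained sketch of that control flow. The mock functions stand in for _get_fpu(), x86_emul_reset_event() and put_fpu(), and all names here are hypothetical, not Xen's.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock stand-ins for the emulator's FPU types and hooks (hypothetical). */
enum fpu_kind { FPU_NONE, FPU_OPMASK };

static int get_calls, put_calls, reset_calls;
static bool get_succeeds;

static bool mock_get_fpu(enum fpu_kind type)
{
    (void)type;
    get_calls++;
    return get_succeeds;   /* a failing hook may also have queued an event */
}

static void mock_put_fpu(enum fpu_kind type) { (void)type; put_calls++; }
static void mock_reset_event(void) { reset_calls++; }

/* Returns true if the mask register would have been read via the stub. */
static bool fetch_mask(bool mem_operand, bool has_avx512f, unsigned int opmsk)
{
    enum fpu_kind fpu_type = FPU_NONE;
    bool read = false;

    /* The assignment inside the call only happens if the call is evaluated. */
    if ( mem_operand && has_avx512f && opmsk &&
         mock_get_fpu(fpu_type = FPU_OPMASK) )
        read = true;                     /* the KMOV stub would run here */

    if ( fpu_type == FPU_OPMASK )
    {
        /* Undo whatever the get_fpu attempt did, successful or not. */
        mock_reset_event();
        mock_put_fpu(FPU_OPMASK);
        fpu_type = FPU_NONE;
    }
    return read;
}
```

On an AVX512F-incapable platform (has_avx512f false) the hook is never called and no cleanup is performed. This is the behaviour the subject line asks for.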