Message ID | 5C9DCC4A0200007800222AEA@prv1-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] x86emul: don't read mask register on AVX512F-incapable platforms |
On 29/03/2019 07:42, Jan Beulich wrote:
> Nor when register state isn't sufficiently enabled.
>
> Reported-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This is surely a stable tree candidate, unless it could still make it
> into 4.12 before the release.
> ---
> v2: Add XCR0 check.
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -3511,7 +3511,8 @@ x86_emulate(
>      }
>
>      /* With a memory operand, fetch the mask register in use (if any). */
> -    if ( ea.type == OP_MEM && evex.opmsk )
> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )

The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
_get_fpu() being happy with X86EMUL_FPU_opmask.

Preferably with it dropped, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

>      {
>          uint8_t *stb = get_stub(stub);
>
> @@ -3532,6 +3533,14 @@ x86_emulate(
>          fault_suppression = true;
>      }
>
> +    if ( fpu_type == X86EMUL_FPU_opmask )
> +    {
> +        /* Squash (side) effects of the _get_fpu() above. */
> +        x86_emul_reset_event(ctxt);
> +        put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
> +        fpu_type = X86EMUL_FPU_none;
> +    }
> +
>      /* Decode (but don't fetch) the destination operand: register or memory. */
>      switch ( d & DstMask )
>      {
>>> On 29.03.19 at 10:19, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 07:42, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -3511,7 +3511,8 @@ x86_emulate(
>>      }
>>
>>      /* With a memory operand, fetch the mask register in use (if any). */
>> -    if ( ea.type == OP_MEM && evex.opmsk )
>> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
>> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
>
> The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
> _get_fpu() being happy with X86EMUL_FPU_opmask.

Well, that'll end up being inconsistent with what we do elsewhere:
If we did as you say, host_and_vcpu_must_have(avx512f) could
(and for consistency then should) all become just
vcpu_must_have(avx512f). Similarly for AVX.

I'd like to put up the other option then: Rather than using
_get_fpu() (and in particular the read_xcr() and read_cr() hooks)
we could read the real XCR0 here. After all we issue the KMOV not
because the guest has specified it, but because we need the value
of the register for correct fault suppression emulation.

> Preferably with it dropped, Reviewed-by: Andrew Cooper
> <andrew.cooper3@citrix.com>

Let me know of the applicability of this.

Jan
On 29/03/2019 09:36, Jan Beulich wrote:
>>>> On 29.03.19 at 10:19, <andrew.cooper3@citrix.com> wrote:
>> On 29/03/2019 07:42, Jan Beulich wrote:
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> @@ -3511,7 +3511,8 @@ x86_emulate(
>>>      }
>>>
>>>      /* With a memory operand, fetch the mask register in use (if any). */
>>> -    if ( ea.type == OP_MEM && evex.opmsk )
>>> +    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
>>> +         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
>> The cpu_has_avx512f check is now redundant.  It is fully encapsulated by
>> _get_fpu() being happy with X86EMUL_FPU_opmask.
> Well, that'll end up being inconsistent with what we do elsewhere:
> If we did as you say, host_and_vcpu_must_have(avx512f) could
> (and for consistency then should) all become just
> vcpu_must_have(avx512f). Similarly for AVX.

That case isn't the same.

For kmov, we don't care about the instruction group per se.  We care
that kmask xsave state is active and usable.

If you recall, the reason why you chose not to merge the host_and_vcpu
and vcpu predicates when I queried this on initial review was for the
theoretical case of the guest being offered features not present in
hardware, and having the emulator fill in the gaps.  (Also, the code may
have pre-dated {pv,hvm}_cpuid() handing back properly audited content,
which is something that has definitely been fixed now.)

Given many years of retrospect on the matter, I'm not actually sure how
much of a useful use case this is.  Obviously, there are some
cross-vendor applicabilities, but these only extend to individual
instructions whose behaviour can be fully replaced in other ways (i.e.
not for instructions which we decode and replay).  I don't see us ever
gaining support for using instructions in cases where the relevant
xstate isn't available in hardware.

>
> I'd like to put up the other option then: Rather than using
> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
> we could read the real XCR0 here. After all we issue the KMOV not
> because the guest has specified it, but because we need the value
> of the register for correct fault suppression emulation.

True, and that would be rather smaller and less invasive than
deliberately squashing the other side effects of get_fpu().

~Andrew
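To make the disagreement above concrete, here is a minimal sketch of the two candidate conditions. It is not Xen's code: opmask_state_enabled() is a hypothetical stand-in for "_get_fpu() is happy with X86EMUL_FPU_opmask", and the real function consults more state than a single XCR0 bit. Only the bit position of the opmask state component in XCR0 (bit 5) is taken from the architecture.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Architectural XCR0 bit: opmask (k0-k7) state component. */
#define XCR0_OPMASK (UINT64_C(1) << 5)

/*
 * Hypothetical stand-in for the state check done via _get_fpu();
 * the name and the single-bit test are assumptions for illustration.
 */
static bool opmask_state_enabled(uint64_t xcr0)
{
    return (xcr0 & XCR0_OPMASK) != 0;
}

/* Shape of the v2 condition: host capability check plus state check. */
static bool may_read_opmask_v2(bool host_has_avx512f, uint64_t xcr0)
{
    return host_has_avx512f && opmask_state_enabled(xcr0);
}

/* Shape of the suggested variant: rely on the state check alone. */
static bool may_read_opmask_state_only(uint64_t xcr0)
{
    return opmask_state_enabled(xcr0);
}
```

The two predicates agree whenever the XCR0 value handed to the emulator is one the hardware could actually hold. They differ only for a combination such as host_has_avx512f == false with XCR0_OPMASK set, which is the kind of input the later fuzzer discussion is about.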
>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 09:36, Jan Beulich wrote:
>> I'd like to put up the other option then: Rather than using
>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>> we could read the real XCR0 here. After all we issue the KMOV not
>> because the guest has specified it, but because we need the value
>> of the register for correct fault suppression emulation.
>
> True, and that would be rather smaller and less invasive than
> deliberately squashing the other side effects of get_fpu()

Hmm, I've tried to do this, but this is more complicated: CR0.TS
may be set, in which case we need to invoke the get_fpu() hook
to get it cleared with appropriate bookkeeping. I don't think it's
worth further complicating the code by invoking the hook _only_
in that case. So I guess we better stick to v2.

Which makes me come back to your request to drop the
cpu_has_avx512f part of the condition: Right now the fuzzer
uses emul_test_read_xcr() instead of actually fuzzing the
value. Once it does, would we expect it to never set any bits
in the returned value that aren't set in hardware, but could
in principle be set based on (real) CPUID output? In that case
I could agree to remove the extra condition.

Jan
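The CR0.TS complication can be sketched as a three-way decision. This is an illustration under assumptions, not the emulator's logic: the enum and choose_opmask_path() are hypothetical names, and only the bit positions (CR0.TS is bit 3, the XCR0 opmask component is bit 5) are architectural.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

#define CR0_TS      (UINT64_C(1) << 3)  /* Task Switched: FPU use raises #NM */
#define XCR0_OPMASK (UINT64_C(1) << 5)  /* opmask (k0-k7) state component */

/* Hypothetical classification of how the mask register could be read. */
enum opmask_path {
    OPMASK_SKIP,      /* state not enabled: do not issue KMOV at all */
    OPMASK_DIRECT,    /* real XCR0 suffices, no hook needed */
    OPMASK_VIA_HOOK,  /* CR0.TS set: get_fpu() hook must clear it first */
};

static enum opmask_path choose_opmask_path(uint64_t host_xcr0,
                                           uint64_t host_cr0)
{
    if ( !(host_xcr0 & XCR0_OPMASK) )
        return OPMASK_SKIP;

    /* With TS set, executing KMOV in the stub would fault with #NM. */
    if ( host_cr0 & CR0_TS )
        return OPMASK_VIA_HOOK;

    return OPMASK_DIRECT;
}
```

The OPMASK_VIA_HOOK case is what makes the "read the real XCR0" alternative less attractive: the hook would still have to be called in some situations, so the code would carry both paths instead of the single one v2 uses.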
On 29/03/2019 10:56, Jan Beulich wrote:
>>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
>> On 29/03/2019 09:36, Jan Beulich wrote:
>>> I'd like to put up the other option then: Rather than using
>>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>>> we could read the real XCR0 here. After all we issue the KMOV not
>>> because the guest has specified it, but because we need the value
>>> of the register for correct fault suppression emulation.
>> True, and that would be rather smaller and less invasive than
>> deliberately squashing the other side effects of get_fpu()
> Hmm, I've tried to do this, but this is more complicated: CR0.TS
> may be set, in which case we need to invoke the get_fpu() hook
> to get it cleared with appropriate bookkeeping. I don't think it's
> worth further complicating the code by invoking the hook _only_
> in that case. So I guess we better stick to v2.

Oh ok.  That does complicate things.  Let's just use the existing
infrastructure, even if it is rather heavyweight.

>
> Which makes me come back to your request to drop the
> cpu_has_avx512f part of the condition: Right now the fuzzer
> uses emul_test_read_xcr() instead of actually fuzzing the
> value. Once it does, would we expect it to never set any bits
> in the returned value that aren't set in hardware, but could
> in principle be set based on (real) CPUID output? In that case
> I could agree to remove the extra condition.

I don't see how we could ever emulate with a (v)xcr0 different to a
legitimate value in hardware, as the stubs would #UD.

I also don't see how the userspace tools could ever test with a value
other than what they can see in xgetbv, because only the kernel gets to
choose %xcr0.  Even with faking up a smaller xcr0, you'd end up with
instructions which should fault but don't.

~Andrew
>>> On 01.04.19 at 16:14, <andrew.cooper3@citrix.com> wrote:
> On 29/03/2019 10:56, Jan Beulich wrote:
>>>>> On 29.03.19 at 11:02, <andrew.cooper3@citrix.com> wrote:
>>> On 29/03/2019 09:36, Jan Beulich wrote:
>>>> I'd like to put up the other option then: Rather than using
>>>> _get_fpu() (and in particular the read_xcr() and read_cr() hooks)
>>>> we could read the real XCR0 here. After all we issue the KMOV not
>>>> because the guest has specified it, but because we need the value
>>>> of the register for correct fault suppression emulation.
>>> True, and that would be rather smaller and less invasive than
>>> deliberately squashing the other side effects of get_fpu()
>> Hmm, I've tried to do this, but this is more complicated: CR0.TS
>> may be set, in which case we need to invoke the get_fpu() hook
>> to get it cleared with appropriate bookkeeping. I don't think it's
>> worth further complicating the code by invoking the hook _only_
>> in that case. So I guess we better stick to v2.
>
> Oh ok.  That does complicate things.  Let's just use the existing
> infrastructure, even if it is rather heavyweight.
>
>> Which makes me come back to your request to drop the
>> cpu_has_avx512f part of the condition: Right now the fuzzer
>> uses emul_test_read_xcr() instead of actually fuzzing the
>> value. Once it does, would we expect it to never set any bits
>> in the returned value that aren't set in hardware, but could
>> in principle be set based on (real) CPUID output? In that case
>> I could agree to remove the extra condition.
>
> I don't see how we could ever emulate with a (v)xcr0 different to a
> legitimate value in hardware, as the stubs would #UD.
>
> I also don't see how the userspace tools could ever test with a value
> other than what they can see in xgetbv, because only the kernel gets to
> choose %xcr0.  Even with faking up a smaller xcr0, you'd end up with
> instructions which should fault but don't.

Would you mind looking at what we do for CR0 and CR4 right now in
the fuzzer stubs? I don't see why, in principle, these and XCR0
would need handling differently: Either we supply sane state rather
than fully fuzzed one, or we don't. But preferably uniformly. Yet
right now XCR0 gets sane values, while CR0 and CR4 get fuzzed in
architecturally impossible ways.

As to faulting: The same would be true if the emulator used e.g.
the fsgsbase insns itself, but based its decision on the presented
CR4 value: It might fault when it shouldn't, or it might not fault
when it should, depending on host CR4.

Jan
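As an illustration of what "sane state rather than fully fuzzed" could mean for XCR0, the sketch below clamps an arbitrary fuzzed value to something the hardware could legitimately hold. sanitize_xcr0() is a hypothetical helper, not part of the fuzzer; the consistency rules it applies (x87 always set, AVX requires SSE, the three AVX-512 components are all-or-nothing and require AVX) are the architectural XSETBV constraints, and hw_xcr0 is assumed to be a valid value itself.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Architectural XCR0 state-component bits. */
#define XCR0_X87    (UINT64_C(1) << 0)
#define XCR0_SSE    (UINT64_C(1) << 1)
#define XCR0_AVX    (UINT64_C(1) << 2)
#define XCR0_AVX512 (UINT64_C(7) << 5)  /* opmask, ZMM_Hi256, Hi16_ZMM */

/*
 * Hypothetical helper: reduce a fuzzed XCR0 to a value that is both a
 * subset of what hardware has enabled and architecturally consistent.
 */
static uint64_t sanitize_xcr0(uint64_t fuzzed, uint64_t hw_xcr0)
{
    uint64_t v = (fuzzed & hw_xcr0) | XCR0_X87;  /* x87 is always enabled */

    if ( !(v & XCR0_SSE) )
        v &= ~XCR0_AVX;                          /* AVX requires SSE */

    if ( !(v & XCR0_AVX) || (v & XCR0_AVX512) != XCR0_AVX512 )
        v &= ~XCR0_AVX512;                       /* all-or-nothing, needs AVX */

    return v;
}
```

Whether the fuzzer should sanitize like this, or instead fuzz XCR0 as freely as CR0 and CR4, is exactly the open question in the message above; the sketch only shows one of the two uniform options.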
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3511,7 +3511,8 @@ x86_emulate(
     }
 
     /* With a memory operand, fetch the mask register in use (if any). */
-    if ( ea.type == OP_MEM && evex.opmsk )
+    if ( ea.type == OP_MEM && cpu_has_avx512f && evex.opmsk &&
+         _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
     {
         uint8_t *stb = get_stub(stub);
 
@@ -3532,6 +3533,14 @@ x86_emulate(
         fault_suppression = true;
     }
 
+    if ( fpu_type == X86EMUL_FPU_opmask )
+    {
+        /* Squash (side) effects of the _get_fpu() above. */
+        x86_emul_reset_event(ctxt);
+        put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
+        fpu_type = X86EMUL_FPU_none;
+    }
+
     /* Decode (but don't fetch) the destination operand: register or memory. */
     switch ( d & DstMask )
     {
Nor when register state isn't sufficiently enabled.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is surely a stable tree candidate, unless it could still make it
into 4.12 before the release.
---
v2: Add XCR0 check.