Message ID | 1554127273-18663-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV | expand |
>>> On 01.04.19 at 16:01, <andrew.cooper3@citrix.com> wrote: > There are a number of bugs. There are no read/write hooks on the HVM side, so > guest accesses fall into the "read/write-discard" defaults, which bypass the > correct faulting behaviour and the Intel special case. > > For the PV side, writes are discarded (again, bypassing proper faulting), > except for a pinned dom0, which is permitted to actually write the values > other than 0. This is pointless with read hook implementing the Intel special > case. > > However, implementing the Intel special case is pointless. First of all, OS > software can't guarentee to read back 0 in the first place, because a) this > behaviour isn't guarenteed in the SDM, and b) there are SMM handlers which use > the CPUID instruction. I didn't think there was ever an intention to be able to (reliably) hand back zero on a subsequent read. > Secondly, when a guest executes CPUID, this doesn't > typically result in Xen executing a CPUID instruction in practice. Wait - this is true for DomU, but used to be false for (PV) Dom0, until you've switched over from pv_cpuid() to guest_cpuid(). I would assume it was an unintended side effect to, this way, eliminate the actual CPUID access from the special ucode rev reading sequence Dom0 may perform. As to DomU, I severely doubt it's a good idea to expose the ucode revision to a guest, at least as long as the guest can be migrated. But yes, we can leave avoiding to do so to the guest itself: We won't mislead it any worse by handing it the real revision than we would by handing it some synthetized one (e.g. constant 0). > With the dom0 special case removed, there are now no writes to this MSR other > than Xen's microcode loading facilities, which means that the value held in > the MSR will be properly up-to-date. Forward it directly, without jumping > through any hoops. I agree with this, so Reviewed-by: Jan Beulich <jbeulich@suse.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Wei Liu <wei.liu2@citrix.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: David Wang <davidwang@zhaoxin.com> > > This patch texturally (but not functionally) interacts with "xen/sched: Remove > d->is_pinned" but rebasing either is easy. It would also be best applied with > Sergey's "x86/microcode: always collect_cpu_info() during boot". Well, "best" is probably an understatement, as you'd supply stale data if we never came through that code. So I'd call it a strict prereq. Jan
On 01/04/2019 15:55, Jan Beulich wrote: > >> Secondly, when a guest executes CPUID, this doesn't >> typically result in Xen executing a CPUID instruction in practice. > Wait - this is true for DomU, but used to be false for (PV) Dom0, > until you've switched over from pv_cpuid() to guest_cpuid(). I > would assume it was an unintended side effect to, this way, > eliminate the actual CPUID access from the special ucode rev > reading sequence Dom0 may perform. It certainly was an unintended side effect of the cpuid_policy work. In practice, dom0 still doesn't have faulting activated (due to the domain builder not having switched to DOMCTL_set_cpu_policy yet), so can still execute real CPUID instructions. This is not going to stay true for very much longer. > As to DomU, I severely doubt it's a good idea to expose the > ucode revision to a guest, at least as long as the guest can be > migrated. But yes, we can leave avoiding to do so to the guest > itself: We won't mislead it any worse by handing it the real > revision than we would by handing it some synthetized one (e.g. > constant 0). All guests in practice read the current microcode version. As it is model specific, we can't #GP on read, and faking up 0 is the only other option. Furthermore, the speculative sidechannels mean that guests in practice do need to consult the microcode version for cases where MSR_ARCH_CAPS isn't provided. On Broadwell at least, it is a non-optional part of evaluating the safety of retpoline. We have always made the microcode version available, and even the visibility of differences across migrate. It would be better if guests ignored microcode entirely if they found themselves virtualised, but I doubt that is going to happen. >> With the dom0 special case removed, there are now no writes to this MSR other >> than Xen's microcode loading facilities, which means that the value held in >> the MSR will be properly up-to-date. Forward it directly, without jumping >> through any hoops. > I agree with this, so > Reviewed-by: Jan Beulich <jbeulich@suse.com> Thanks, but I'll wait to see if we get a Shanghai answer in a reasonable period of time. > >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Wei Liu <wei.liu2@citrix.com> >> CC: Roger Pau Monné <roger.pau@citrix.com> >> CC: David Wang <davidwang@zhaoxin.com> >> >> This patch texturally (but not functionally) interacts with "xen/sched: Remove >> d->is_pinned" but rebasing either is easy. It would also be best applied with >> Sergey's "x86/microcode: always collect_cpu_info() during boot". > Well, "best" is probably an understatement, as you'd supply > stale data if we never came through that code. So I'd call it a > strict prereq. Hmm - I've missed part of the explanation. a) This behaviour is what happens currently for HVM guests. b) In practice, it is only the very original P6 with no microcode which may have a fully read/write MSR here without anything to update the version. AFACIT, for all CPUs which Xen will boot on, all that matters is that a CPUID instruction has been executed previously, which is covered by the AP boot path. i.e. I think the logic would be more consistent to follow if this patch was behind Sergey's, but there is no functional dependency. ~Andrew
>>> On 01.04.19 at 17:35, <andrew.cooper3@citrix.com> wrote: > On 01/04/2019 15:55, Jan Beulich wrote: >> >>> Secondly, when a guest executes CPUID, this doesn't >>> typically result in Xen executing a CPUID instruction in practice. >> Wait - this is true for DomU, but used to be false for (PV) Dom0, >> until you've switched over from pv_cpuid() to guest_cpuid(). I >> would assume it was an unintended side effect to, this way, >> eliminate the actual CPUID access from the special ucode rev >> reading sequence Dom0 may perform. > > It certainly was an unintended side effect of the cpuid_policy work. In > practice, dom0 still doesn't have faulting activated (due to the domain > builder not having switched to DOMCTL_set_cpu_policy yet), so can still > execute real CPUID instructions. Provided it doesn't use the forced-#UD prefix, which I don't think the Dom0 kernel would make any effort to bypass in this special case. > This is not going to stay true for very much longer. Right. >> As to DomU, I severely doubt it's a good idea to expose the >> ucode revision to a guest, at least as long as the guest can be >> migrated. But yes, we can leave avoiding to do so to the guest >> itself: We won't mislead it any worse by handing it the real >> revision than we would by handing it some synthetized one (e.g. >> constant 0). > > All guests in practice read the current microcode version. Not really, no. In our XenoLinux forward port I had specifically suppressed it, for being an undue operation to do by a PV domain. > As it is > model specific, we can't #GP on read, and faking up 0 is the only other > option. > > Furthermore, the speculative sidechannels mean that guests in practice > do need to consult the microcode version for cases where MSR_ARCH_CAPS > isn't provided. On Broadwell at least, it is a non-optional part of > evaluating the safety of retpoline. But any decisions derived from this information are of limited applicability. >>> This patch texturally (but not functionally) interacts with "xen/sched: Remove >>> d->is_pinned" but rebasing either is easy. It would also be best applied with >>> Sergey's "x86/microcode: always collect_cpu_info() during boot". >> Well, "best" is probably an understatement, as you'd supply >> stale data if we never came through that code. So I'd call it a >> strict prereq. > > Hmm - I've missed part of the explanation. > > a) This behaviour is what happens currently for HVM guests. > > b) In practice, it is only the very original P6 with no microcode which > may have a fully read/write MSR here without anything to update the > version. AFACIT, for all CPUs which Xen will boot on, all that matters > is that a CPUID instruction has been executed previously, which is > covered by the AP boot path. > > i.e. I think the logic would be more consistent to follow if this patch > was behind Sergey's, but there is no functional dependency. Ah, yes, good point. Jan
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 4df4a59..59b4298 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -135,6 +135,28 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) /* Not offered to guests. */ goto gp_fault; + case MSR_AMD_PATCHLEVEL: + BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); + /* + * AMD and Intel use the same MSR for the current microcode version. + * + * There is no need to jump through the SDM-provided hoops for Intel. + * A guest might itself perform the "write 0, CPUID, read" sequence, + * but servicing the CPUID for the guest typically wont result in + * actually executing a CPUID instruction. + * + * However, as a guest can't influence the value of this MSR, the + * value will be from Xen's last microcode load, which can be + * forwarded straight to the guest. + */ + if ( (d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL && + d->arch.cpuid->x86_vendor != X86_VENDOR_AMD) || + (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL && + boot_cpu_data.x86_vendor != X86_VENDOR_AMD) || + rdmsr_safe(MSR_AMD_PATCHLEVEL, *val) ) + goto gp_fault; + break; + case MSR_SPEC_CTRL: if ( !cp->feat.ibrsb ) goto gp_fault; @@ -236,6 +258,19 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) /* Not offered to guests. */ goto gp_fault; + case MSR_AMD_PATCHLEVEL: + BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); + /* + * AMD and Intel use the same MSR for the current microcode version. + * + * Both document it as read-only. However Intel also document that, + * for backwards compatiblity, the OS should write 0 to it before + * trying to access the current microcode version. + */ + if ( d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL || val != 0 ) + goto gp_fault; + break; + case MSR_AMD_PATCHLOADER: /* * See note on MSR_IA32_UCODE_WRITE below, which may or may not apply diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c index 84ce67c..a4f3ffc 100644 --- a/xen/arch/x86/pv/emul-priv-op.c +++ b/xen/arch/x86/pv/emul-priv-op.c @@ -893,17 +893,6 @@ static int read_msr(unsigned int reg, uint64_t *val, *val = 0; return X86EMUL_OKAY; - case MSR_IA32_UCODE_REV: - BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); - if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ) - { - if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) ) - break; - /* As documented in the SDM: Do a CPUID 1 here */ - cpuid_eax(1); - } - goto normal; - case MSR_IA32_MISC_ENABLE: rdmsrl(reg, *val); *val = guest_misc_enable(*val); @@ -1047,17 +1036,6 @@ static int write_msr(unsigned int reg, uint64_t val, return X86EMUL_OKAY; break; - case MSR_IA32_UCODE_REV: - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) - break; - if ( !is_hwdom_pinned_vcpu(curr) ) - return X86EMUL_OKAY; - if ( rdmsr_safe(reg, temp) ) - break; - if ( val ) - goto invalid; - return X86EMUL_OKAY; - case MSR_IA32_MISC_ENABLE: rdmsrl(reg, temp); if ( val != guest_misc_enable(temp) )
There are a number of bugs. There are no read/write hooks on the HVM side, so guest accesses fall into the "read/write-discard" defaults, which bypass the correct faulting behaviour and the Intel special case. For the PV side, writes are discarded (again, bypassing proper faulting), except for a pinned dom0, which is permitted to actually write the values other than 0. This is pointless with read hook implementing the Intel special case. However, implementing the Intel special case is pointless. First of all, OS software can't guarentee to read back 0 in the first place, because a) this behaviour isn't guarenteed in the SDM, and b) there are SMM handlers which use the CPUID instruction. Secondly, when a guest executes CPUID, this doesn't typically result in Xen executing a CPUID instruction in practice. With the dom0 special case removed, there are now no writes to this MSR other than Xen's microcode loading facilities, which means that the value held in the MSR will be properly up-to-date. Forward it directly, without jumping through any hoops. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: David Wang <davidwang@zhaoxin.com> This patch texturally (but not functionally) interacts with "xen/sched: Remove d->is_pinned" but rebasing either is easy. It would also be best applied with Sergey's "x86/microcode: always collect_cpu_info() during boot". The migration case is complicated. A guest which re-evaluates its idea of the world may find something completely different, but this is probably less bad letting it see the wrong microcode version. The cross-vendor case is even worse, because Intel and AMD report the version in different 32bit words in this MSR. The result here is fairly close to current behaviour. David: Do Shanghai processors do microcode loading, and if so, how? I can't find any documentation (at all), and don't see any support in Xen or Linux. --- xen/arch/x86/msr.c | 35 +++++++++++++++++++++++++++++++++++ xen/arch/x86/pv/emul-priv-op.c | 22 ---------------------- 2 files changed, 35 insertions(+), 22 deletions(-)