diff mbox series

x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV

Message ID 1554127273-18663-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV | expand

Commit Message

Andrew Cooper April 1, 2019, 2:01 p.m. UTC
There are a number of bugs.  There are no read/write hooks on the HVM side, so
guest accesses fall into the "read/write-discard" defaults, which bypass the
correct faulting behaviour and the Intel special case.

For the PV side, writes are discarded (again, bypassing proper faulting),
except for a pinned dom0, which is permitted to actually write the values
other than 0.  This is pointless with read hook implementing the Intel special
case.

However, implementing the Intel special case is pointless.  First of all, OS
software can't guarentee to read back 0 in the first place, because a) this
behaviour isn't guarenteed in the SDM, and b) there are SMM handlers which use
the CPUID instruction.  Secondly, when a guest executes CPUID, this doesn't
typically result in Xen executing a CPUID instruction in practice.

With the dom0 special case removed, there are now no writes to this MSR other
than Xen's microcode loading facilities, which means that the value held in
the MSR will be properly up-to-date.  Forward it directly, without jumping
through any hoops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: David Wang <davidwang@zhaoxin.com>

This patch texturally (but not functionally) interacts with "xen/sched: Remove
d->is_pinned" but rebasing either is easy.  It would also be best applied with
Sergey's "x86/microcode: always collect_cpu_info() during boot".

The migration case is complicated.  A guest which re-evaluates its idea of the
world may find something completely different, but this is probably less bad
letting it see the wrong microcode version.  The cross-vendor case is even
worse, because Intel and AMD report the version in different 32bit words in
this MSR.  The result here is fairly close to current behaviour.

David: Do Shanghai processors do microcode loading, and if so, how?  I can't
find any documentation (at all), and don't see any support in Xen or Linux.
---
 xen/arch/x86/msr.c             | 35 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/emul-priv-op.c | 22 ----------------------
 2 files changed, 35 insertions(+), 22 deletions(-)

Comments

Jan Beulich April 1, 2019, 2:55 p.m. UTC | #1
>>> On 01.04.19 at 16:01, <andrew.cooper3@citrix.com> wrote:
> There are a number of bugs.  There are no read/write hooks on the HVM side, so
> guest accesses fall into the "read/write-discard" defaults, which bypass the
> correct faulting behaviour and the Intel special case.
> 
> For the PV side, writes are discarded (again, bypassing proper faulting),
> except for a pinned dom0, which is permitted to actually write the values
> other than 0.  This is pointless with read hook implementing the Intel special
> case.
> 
> However, implementing the Intel special case is pointless.  First of all, OS
> software can't guarentee to read back 0 in the first place, because a) this
> behaviour isn't guarenteed in the SDM, and b) there are SMM handlers which use
> the CPUID instruction.

I didn't think there was ever an intention to be able to (reliably)
hand back zero on a subsequent read.

>  Secondly, when a guest executes CPUID, this doesn't
> typically result in Xen executing a CPUID instruction in practice.

Wait - this is true for DomU, but used to be false for (PV) Dom0,
until you've switched over from pv_cpuid() to guest_cpuid(). I
would assume it was an unintended side effect to, this way,
eliminate the actual CPUID access from the special ucode rev
reading sequence Dom0 may perform.

As to DomU, I severely doubt it's a good idea to expose the
ucode revision to a guest, at least as long as the guest can be
migrated. But yes, we can leave avoiding to do so to the guest
itself: We won't mislead it any worse by handing it the real
revision than we would by handing it some synthetized one (e.g.
constant 0).

> With the dom0 special case removed, there are now no writes to this MSR other
> than Xen's microcode loading facilities, which means that the value held in
> the MSR will be properly up-to-date.  Forward it directly, without jumping
> through any hoops.

I agree with this, so
Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: David Wang <davidwang@zhaoxin.com>
> 
> This patch texturally (but not functionally) interacts with "xen/sched: Remove
> d->is_pinned" but rebasing either is easy.  It would also be best applied with
> Sergey's "x86/microcode: always collect_cpu_info() during boot".

Well, "best" is probably an understatement, as you'd supply
stale data if we never came through that code. So I'd call it a
strict prereq.

Jan
Andrew Cooper April 1, 2019, 3:35 p.m. UTC | #2
On 01/04/2019 15:55, Jan Beulich wrote:
>
>>  Secondly, when a guest executes CPUID, this doesn't
>> typically result in Xen executing a CPUID instruction in practice.
> Wait - this is true for DomU, but used to be false for (PV) Dom0,
> until you've switched over from pv_cpuid() to guest_cpuid(). I
> would assume it was an unintended side effect to, this way,
> eliminate the actual CPUID access from the special ucode rev
> reading sequence Dom0 may perform.

It certainly was an unintended side effect of the cpuid_policy work.  In
practice, dom0 still doesn't have faulting activated (due to the domain
builder not having switched to DOMCTL_set_cpu_policy yet), so can still
execute real CPUID instructions.

This is not going to stay true for very much longer.

> As to DomU, I severely doubt it's a good idea to expose the
> ucode revision to a guest, at least as long as the guest can be
> migrated. But yes, we can leave avoiding to do so to the guest
> itself: We won't mislead it any worse by handing it the real
> revision than we would by handing it some synthetized one (e.g.
> constant 0).

All guests in practice read the current microcode version.  As it is
model specific, we can't #GP on read, and faking up 0 is the only other
option.

Furthermore, the speculative sidechannels mean that guests in practice
do need to consult the microcode version for cases where MSR_ARCH_CAPS
isn't provided.  On Broadwell at least, it is a non-optional part of
evaluating the safety of retpoline.

We have always made the microcode version available, and even the
visibility of differences across migrate.  It would be better if guests
ignored microcode entirely if they found themselves virtualised, but I
doubt that is going to happen.

>> With the dom0 special case removed, there are now no writes to this MSR other
>> than Xen's microcode loading facilities, which means that the value held in
>> the MSR will be properly up-to-date.  Forward it directly, without jumping
>> through any hoops.
> I agree with this, so
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but I'll wait to see if we get a Shanghai answer in a reasonable
period of time.

>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: David Wang <davidwang@zhaoxin.com>
>>
>> This patch texturally (but not functionally) interacts with "xen/sched: Remove
>> d->is_pinned" but rebasing either is easy.  It would also be best applied with
>> Sergey's "x86/microcode: always collect_cpu_info() during boot".
> Well, "best" is probably an understatement, as you'd supply
> stale data if we never came through that code. So I'd call it a
> strict prereq.

Hmm - I've missed part of the explanation.

a) This behaviour is what happens currently for HVM guests.

b) In practice, it is only the very original P6 with no microcode which
may have a fully read/write MSR here without anything to update the
version.  AFACIT, for all CPUs which Xen will boot on, all that matters
is that a CPUID instruction has been executed previously, which is
covered by the AP boot path.

i.e. I think the logic would be more consistent to follow if this patch
was behind Sergey's, but there is no functional dependency.

~Andrew
Jan Beulich April 2, 2019, 10:40 a.m. UTC | #3
>>> On 01.04.19 at 17:35, <andrew.cooper3@citrix.com> wrote:
> On 01/04/2019 15:55, Jan Beulich wrote:
>>
>>>  Secondly, when a guest executes CPUID, this doesn't
>>> typically result in Xen executing a CPUID instruction in practice.
>> Wait - this is true for DomU, but used to be false for (PV) Dom0,
>> until you've switched over from pv_cpuid() to guest_cpuid(). I
>> would assume it was an unintended side effect to, this way,
>> eliminate the actual CPUID access from the special ucode rev
>> reading sequence Dom0 may perform.
> 
> It certainly was an unintended side effect of the cpuid_policy work.  In
> practice, dom0 still doesn't have faulting activated (due to the domain
> builder not having switched to DOMCTL_set_cpu_policy yet), so can still
> execute real CPUID instructions.

Provided it doesn't use the forced-#UD prefix, which I don't think
the Dom0 kernel would make any effort to bypass in this special
case.

> This is not going to stay true for very much longer.

Right.

>> As to DomU, I severely doubt it's a good idea to expose the
>> ucode revision to a guest, at least as long as the guest can be
>> migrated. But yes, we can leave avoiding to do so to the guest
>> itself: We won't mislead it any worse by handing it the real
>> revision than we would by handing it some synthetized one (e.g.
>> constant 0).
> 
> All guests in practice read the current microcode version.

Not really, no. In our XenoLinux forward port I had specifically
suppressed it, for being an undue operation to do by a PV
domain.

>  As it is
> model specific, we can't #GP on read, and faking up 0 is the only other
> option.
> 
> Furthermore, the speculative sidechannels mean that guests in practice
> do need to consult the microcode version for cases where MSR_ARCH_CAPS
> isn't provided.  On Broadwell at least, it is a non-optional part of
> evaluating the safety of retpoline.

But any decisions derived from this information are of limited
applicability.

>>> This patch texturally (but not functionally) interacts with "xen/sched: Remove
>>> d->is_pinned" but rebasing either is easy.  It would also be best applied with
>>> Sergey's "x86/microcode: always collect_cpu_info() during boot".
>> Well, "best" is probably an understatement, as you'd supply
>> stale data if we never came through that code. So I'd call it a
>> strict prereq.
> 
> Hmm - I've missed part of the explanation.
> 
> a) This behaviour is what happens currently for HVM guests.
> 
> b) In practice, it is only the very original P6 with no microcode which
> may have a fully read/write MSR here without anything to update the
> version.  AFACIT, for all CPUs which Xen will boot on, all that matters
> is that a CPUID instruction has been executed previously, which is
> covered by the AP boot path.
> 
> i.e. I think the logic would be more consistent to follow if this patch
> was behind Sergey's, but there is no functional dependency.

Ah, yes, good point.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 4df4a59..59b4298 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -135,6 +135,28 @@  int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
         /* Not offered to guests. */
         goto gp_fault;
 
+    case MSR_AMD_PATCHLEVEL:
+        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
+        /*
+         * AMD and Intel use the same MSR for the current microcode version.
+         *
+         * There is no need to jump through the SDM-provided hoops for Intel.
+         * A guest might itself perform the "write 0, CPUID, read" sequence,
+         * but servicing the CPUID for the guest typically wont result in
+         * actually executing a CPUID instruction.
+         *
+         * However, as a guest can't influence the value of this MSR, the
+         * value will be from Xen's last microcode load, which can be
+         * forwarded straight to the guest.
+         */
+        if ( (d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL &&
+              d->arch.cpuid->x86_vendor != X86_VENDOR_AMD) ||
+             (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+              boot_cpu_data.x86_vendor != X86_VENDOR_AMD) ||
+             rdmsr_safe(MSR_AMD_PATCHLEVEL, *val) )
+            goto gp_fault;
+        break;
+
     case MSR_SPEC_CTRL:
         if ( !cp->feat.ibrsb )
             goto gp_fault;
@@ -236,6 +258,19 @@  int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         /* Not offered to guests. */
         goto gp_fault;
 
+    case MSR_AMD_PATCHLEVEL:
+        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
+        /*
+         * AMD and Intel use the same MSR for the current microcode version.
+         *
+         * Both document it as read-only.  However Intel also document that,
+         * for backwards compatiblity, the OS should write 0 to it before
+         * trying to access the current microcode version.
+         */
+        if ( d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL || val != 0 )
+            goto gp_fault;
+        break;
+
     case MSR_AMD_PATCHLOADER:
         /*
          * See note on MSR_IA32_UCODE_WRITE below, which may or may not apply
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 84ce67c..a4f3ffc 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -893,17 +893,6 @@  static int read_msr(unsigned int reg, uint64_t *val,
         *val = 0;
         return X86EMUL_OKAY;
 
-    case MSR_IA32_UCODE_REV:
-        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
-                break;
-            /* As documented in the SDM: Do a CPUID 1 here */
-            cpuid_eax(1);
-        }
-        goto normal;
-
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(reg, *val);
         *val = guest_misc_enable(*val);
@@ -1047,17 +1036,6 @@  static int write_msr(unsigned int reg, uint64_t val,
             return X86EMUL_OKAY;
         break;
 
-    case MSR_IA32_UCODE_REV:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hwdom_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val )
-            goto invalid;
-        return X86EMUL_OKAY;
-
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(reg, temp);
         if ( val != guest_misc_enable(temp) )