diff mbox

KVM: X86: Allow userspace to define the microcode version

Message ID 1519629838-4898-1-git-send-email-wanpengli@tencent.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wanpeng Li Feb. 26, 2018, 7:23 a.m. UTC
From: Wanpeng Li <wanpengli@tencent.com>

Linux (among the others) has checks to make sure that certain features 
aren't enabled on a certain family/model/stepping if the microcode version 
isn't greater than or equal to a known good version.

By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi <sironi@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Borislav Petkov Feb. 26, 2018, 9:41 a.m. UTC | #1
On Mon, Feb 26, 2018 at 03:23:58PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Linux (among the others) has checks to make sure that certain features 
> aren't enabled on a certain family/model/stepping if the microcode version 
> isn't greater than or equal to a known good version.
> 
> By exposing the real microcode version, we're preventing buggy guests that

Where do we prevent userspace from coming up with some non-sensical
microcode revision?
Wanpeng Li Feb. 26, 2018, 10:06 a.m. UTC | #2
2018-02-26 17:41 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Feb 26, 2018 at 03:23:58PM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpengli@tencent.com>
>>
>> Linux (among the others) has checks to make sure that certain features
>> aren't enabled on a certain family/model/stepping if the microcode version
>> isn't greater than or equal to a known good version.
>>
>> By exposing the real microcode version, we're preventing buggy guests that
>
> Where do we prevent userspace from coming up with some non-sensical
> microcode revision?

I think it is the host admin(e.g. cloud provider)'s responsibility to
set an expected microcode revision. In addition, the non-sensical
value which is written by the guest will not reflect to guest-visible
microcode revision and just be ignored in this implementation.

Regards,
Wanpeng Li
Borislav Petkov Feb. 26, 2018, 10:49 a.m. UTC | #3
On Mon, Feb 26, 2018 at 06:06:42PM +0800, Wanpeng Li wrote:
> I think it is the host admin(e.g. cloud provider)'s responsibility to
> set an expected microcode revision.

+       vcpu->arch.microcode_version = 0x1;

That already looks pretty arbitrary and non-sensical to me.

>In addition, the non-sensical value which is written by the guest will
>not reflect to guest-visible microcode revision and just be ignored in
>this implementation.

Huh? How so?

So a guest will have *two* microcode revisions - both of which are most
likely wrong?!

This whole thing sounds like the wrong approach to me.

> Linux (among the others) has checks to make sure that certain features
> aren't enabled on a certain family/model/stepping if the microcode version
> isn't greater than or equal to a known good version.

It sounds to me like the proper fix is to make the kernel *not* look at
microcode revisions when running virtualized. The same way we're not
loading microcode in a guest:

        if (native_cpuid_ecx(1) & BIT(31))

Letting userspace control the microcode revision number is revision
number management SNAFU waiting to happen IMO.
Wanpeng Li Feb. 26, 2018, 11:02 a.m. UTC | #4
2018-02-26 18:49 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Feb 26, 2018 at 06:06:42PM +0800, Wanpeng Li wrote:
>> I think it is the host admin(e.g. cloud provider)'s responsibility to
>> set an expected microcode revision.
>
> +       vcpu->arch.microcode_version = 0x1;
>
> That already looks pretty arbitrary and non-sensical to me.

This is the original kvm implementation before the patch.

>
>>In addition, the non-sensical value which is written by the guest will
>>not reflect to guest-visible microcode revision and just be ignored in
>>this implementation.
>
> Huh? How so?
>
> So a guest will have *two* microcode revisions - both of which are most
> likely wrong?!

Just one revision.

>
> This whole thing sounds like the wrong approach to me.
>
>> Linux (among the others) has checks to make sure that certain features
>> aren't enabled on a certain family/model/stepping if the microcode version
>> isn't greater than or equal to a known good version.
>
> It sounds to me like the proper fix is to make the kernel *not* look at
> microcode revisions when running virtualized. The same way we're not
> loading microcode in a guest:
>
>         if (native_cpuid_ecx(1) & BIT(31))
>
> Letting userspace control the microcode revision number is revision
> number management SNAFU waiting to happen IMO.

https://lkml.org/lkml/2017/12/9/29 The original discussion explain in
more details.

Regards,
Wanpeng Li
Borislav Petkov Feb. 26, 2018, 11:16 a.m. UTC | #5
On Mon, Feb 26, 2018 at 07:02:29PM +0800, Wanpeng Li wrote:
> > So a guest will have *two* microcode revisions - both of which are most
> > likely wrong?!
> 
> Just one revision.

So what does "the non-sensical value which is written by the guest will
not reflect to guest-visible microcode revision" even mean then?

cat /proc/cpuinfo

in the guest shows what exactly?

And what would RDMSR 0x8b show then?

> https://lkml.org/lkml/2017/12/9/29 The original discussion explain in
> more details.

My argument stands: exposing microcode revisions to guests is the wrong
approach. Instead, the kernel should not look at microcode revisions if
it runs virtualized.
Wanpeng Li Feb. 26, 2018, 11:25 a.m. UTC | #6
2018-02-26 19:16 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Feb 26, 2018 at 07:02:29PM +0800, Wanpeng Li wrote:
>> > So a guest will have *two* microcode revisions - both of which are most
>> > likely wrong?!
>>
>> Just one revision.
>
> So what does "the non-sensical value which is written by the guest will
> not reflect to guest-visible microcode revision" even mean then?
>
> cat /proc/cpuinfo
>
> in the guest shows what exactly?
>
> And what would RDMSR 0x8b show then?

Both are the same values set by kvm userspace.

>
>> https://lkml.org/lkml/2017/12/9/29 The original discussion explain in
>> more details.
>

> approach. Instead, the kernel should not look at microcode revisions if
> it runs virtualized.

This is correct. The link explains why the userspace sets microcode
revision is still needed.

Regards,
Wanpeng Li
Borislav Petkov Feb. 26, 2018, 11:30 a.m. UTC | #7
On Mon, Feb 26, 2018 at 07:25:28PM +0800, Wanpeng Li wrote:
> Both are the same values set by kvm userspace.

This still doesn't answer my question what "the non-sensical value which
is written by the guest will not reflect to guest-visible microcode
revision" means?

> This is correct. The link explains why the userspace sets microcode
> revision is still needed.

Why is it still needed?
Wanpeng Li Feb. 26, 2018, 11:37 a.m. UTC | #8
2018-02-26 19:30 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Feb 26, 2018 at 07:25:28PM +0800, Wanpeng Li wrote:
>> Both are the same values set by kvm userspace.
>
> This still doesn't answer my question what "the non-sensical value which
> is written by the guest will not reflect to guest-visible microcode
> revision" means?

The guest write is ignored as the original kvm implementation before the patch.

>
>> This is correct. The link explains why the userspace sets microcode
>> revision is still needed.
>
> Why is it still needed?

Hmm, the apic_check_deadline_errata() example can be referred to.

Regards,
Wanpeng Li
Borislav Petkov Feb. 26, 2018, 11:44 a.m. UTC | #9
On Mon, Feb 26, 2018 at 07:37:32PM +0800, Wanpeng Li wrote:
> The guest write is ignored as the original kvm implementation before the patch.

That will never work because there's no virtualized microcode loader.
Which will be a dumb idea anyway.

Goes to show that dealing with microcode revisions for a guest is the
wrong approach.

> Hmm, the apic_check_deadline_errata() example can be referred to.

So that's basically what I'm saying - fix apic_check_deadline_errata()
to check whether the kernel runs as a guest.
Paolo Bonzini Feb. 26, 2018, 11:47 a.m. UTC | #10
On 26/02/2018 11:49, Borislav Petkov wrote:
>> I think it is the host admin(e.g. cloud provider)'s responsibility to
>> set an expected microcode revision.
> +       vcpu->arch.microcode_version = 0x1;
> 
> That already looks pretty arbitrary and non-sensical to me.

It's actually 0x100000000.

>> In addition, the non-sensical value which is written by the guest will
>> not reflect to guest-visible microcode revision and just be ignored in
>> this implementation.
>
> Huh? How so?
> 
> So a guest will have *two* microcode revisions - both of which are most
> likely wrong?!

I don't understand this either.

Actually I think this patch makes sense, since some errata actually can
be worked around in the guest in the same way as the host.  However, it
should also be tied to the recently introduced mechanism to read MSR
contents from the host.

Paolo
Wanpeng Li Feb. 26, 2018, 11:52 a.m. UTC | #11
2018-02-26 19:44 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Feb 26, 2018 at 07:37:32PM +0800, Wanpeng Li wrote:
>> The guest write is ignored as the original kvm implementation before the patch.
>
> That will never work because there's no virtualized microcode loader.
> Which will be a dumb idea anyway.
>
> Goes to show that dealing with microcode revisions for a guest is the
> wrong approach.
>
>> Hmm, the apic_check_deadline_errata() example can be referred to.
>
> So that's basically what I'm saying - fix apic_check_deadline_errata()
> to check whether the kernel runs as a guest.

Both I and the link agree with your opinion. However, it is hard to
fix all the guest images which have already been used by customers in
cloud environment, anyway, this patch supplies an alternative way to
work around by host admin.

Regards,
Wanpeng Li
Paolo Bonzini Feb. 26, 2018, 11:54 a.m. UTC | #12
On 26/02/2018 12:44, Borislav Petkov wrote:
>> The guest write is ignored as the original kvm implementation before the patch.
>
> That will never work because there's no virtualized microcode loader.
> Which will be a dumb idea anyway.
> 
> Goes to show that dealing with microcode revisions for a guest is the
> wrong approach.

I don't understand how one thing follows from the other.  How are writes
to 0x8B related to having a virtualized microcode loaded (which is a
concept that actually makes no sense at all)?

> So that's basically what I'm saying - fix apic_check_deadline_errata()
> to check whether the kernel runs as a guest.

It has already been fixed for a few months, and fixing it is indeed the
right thing to do independent of this patch.

Paolo
Borislav Petkov Feb. 26, 2018, 12:15 p.m. UTC | #13
On Mon, Feb 26, 2018 at 12:54:52PM +0100, Paolo Bonzini wrote:
> I don't understand how one thing follows from the other.  How are writes
> to 0x8B related to having a virtualized microcode loaded (which is a
> concept that actually makes no sense at all)?

I'm questioning the whole idea. 0x8b is the MSR which gives you the
microcode revision. Most CPUs don't even allow writing to it, AFAICT.
(SDM says "may prevent writing" on VM transitions.)

So how is that host-initiated write to 0x8b is even going to work, in
reality? kvm module writes the microcode version in there? How does the
admin work around that?

> It has already been fixed for a few months, and fixing it is indeed the
> right thing to do independent of this patch.

Yap.
Paolo Bonzini Feb. 26, 2018, 12:16 p.m. UTC | #14
On 26/02/2018 13:15, Borislav Petkov wrote:
> On Mon, Feb 26, 2018 at 12:54:52PM +0100, Paolo Bonzini wrote:
>> I don't understand how one thing follows from the other.  How are writes
>> to 0x8B related to having a virtualized microcode loaded (which is a
>> concept that actually makes no sense at all)?
> 
> I'm questioning the whole idea. 0x8b is the MSR which gives you the
> microcode revision. Most CPUs don't even allow writing to it, AFAICT.
> (SDM says "may prevent writing" on VM transitions.)
> 
> So how is that host-initiated write to 0x8b is even going to work, in
> reality? kvm module writes the microcode version in there? How does the
> admin work around that?

In this context, "host-initiated" write means written by KVM userspace
with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
or live migration.

Thanks,

Paolo

>> It has already been fixed for a few months, and fixing it is indeed the
>> right thing to do independent of this patch.
> 
> Yap.
>
Paolo Bonzini Feb. 26, 2018, 12:18 p.m. UTC | #15
On 26/02/2018 13:16, Paolo Bonzini wrote:
> On 26/02/2018 13:15, Borislav Petkov wrote:
>> On Mon, Feb 26, 2018 at 12:54:52PM +0100, Paolo Bonzini wrote:
>>> I don't understand how one thing follows from the other.  How are writes
>>> to 0x8B related to having a virtualized microcode loaded (which is a
>>> concept that actually makes no sense at all)?
>>
>> I'm questioning the whole idea. 0x8b is the MSR which gives you the
>> microcode revision. Most CPUs don't even allow writing to it, AFAICT.
>> (SDM says "may prevent writing" on VM transitions.)
>>
>> So how is that host-initiated write to 0x8b is even going to work, in
>> reality? kvm module writes the microcode version in there? How does the
>> admin work around that?
> 
> In this context, "host-initiated" write means written by KVM userspace
> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
> or live migration.

To be clear, the target of the write is still the vCPU's emulated MSR.

Paolo
Borislav Petkov Feb. 26, 2018, 12:20 p.m. UTC | #16
On Mon, Feb 26, 2018 at 12:47:27PM +0100, Paolo Bonzini wrote:
> It's actually 0x100000000.

Even more wrong. Revision ID is bits [31:0].

> Actually I think this patch makes sense, since some errata actually can
> be worked around in the guest in the same way as the host.  However, it
> should also be tied to the recently introduced mechanism to read MSR
> contents from the host.

So if we decide to do microcode revisions in the guest, we should
consider the fact that microcode revisions need to be handled the same
way as CPUID bits: a revision number means something on baremetal and
implies a certain functionality, just like a set CPUID bit does.

So we either have that same functionality in the guest or we don't talk
about revisions at all. Because if we end up lying by coming up with
revision numbers, all hell will break loose in the guest.

IMO, of course.
Borislav Petkov Feb. 26, 2018, 12:22 p.m. UTC | #17
On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
> > In this context, "host-initiated" write means written by KVM userspace
> > with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
> > or live migration.
> 
> To be clear, the target of the write is still the vCPU's emulated MSR.

So how am I to imagine this as a user:

qemu-system-x86_64 --microcode-revision=0xdeadbeef...

?
Paolo Bonzini Feb. 26, 2018, 12:41 p.m. UTC | #18
On 26/02/2018 13:22, Borislav Petkov wrote:
> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
>>> In this context, "host-initiated" write means written by KVM userspace
>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
>>> or live migration.
>>
>> To be clear, the target of the write is still the vCPU's emulated MSR.
> 
> So how am I to imagine this as a user:
> 
> qemu-system-x86_64 --microcode-revision=0xdeadbeef...

More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
happen is one of the following:

1) "-cpu host" sets ucode_rev to the same value of the host, everyone
else leaves it to zero as is now.

2) Only Amazon uses this feature and we ignore it. :)

Paolo
Borislav Petkov Feb. 26, 2018, 1:05 p.m. UTC | #19
On Mon, Feb 26, 2018 at 01:41:38PM +0100, Paolo Bonzini wrote:
> More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
> happen is one of the following:
> 
> 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
> else leaves it to zero as is now.
> 
> 2) Only Amazon uses this feature and we ignore it. :)

I fear that that might get misused and we probably should consider some
trivial range checking and each qemu cpu model would have a valid range
or so.

Or we should better do that in kvm_set_msr_common directly... although
if we do it here, it would require kvm knowing about all those different
microcode revisions and qemu cpu models sounds better...
Wanpeng Li April 17, 2018, 10:40 a.m. UTC | #20
Cc Eduardo,
2018-02-26 20:41 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 26/02/2018 13:22, Borislav Petkov wrote:
>> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
>>>> In this context, "host-initiated" write means written by KVM userspace
>>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
>>>> or live migration.
>>>
>>> To be clear, the target of the write is still the vCPU's emulated MSR.
>>
>> So how am I to imagine this as a user:
>>
>> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
>
> More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
> happen is one of the following:
>
> 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
> else leaves it to zero as is now.

Hi Paolo,

Do you mean the host admin to get the ucode_rev from the host and set
to -cpu host, ucode_rev=xxxxxx or qemu get the ucode_rev directly by
rdmsr?

Regards,
Wanpeng Li
Eduardo Habkost April 17, 2018, 8:24 p.m. UTC | #21
On Tue, Apr 17, 2018 at 06:40:58PM +0800, Wanpeng Li wrote:
> Cc Eduardo,
> 2018-02-26 20:41 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> > On 26/02/2018 13:22, Borislav Petkov wrote:
> >> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
> >>>> In this context, "host-initiated" write means written by KVM userspace
> >>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
> >>>> or live migration.
> >>>
> >>> To be clear, the target of the write is still the vCPU's emulated MSR.
> >>
> >> So how am I to imagine this as a user:
> >>
> >> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
> >
> > More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
> > happen is one of the following:
> >
> > 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
> > else leaves it to zero as is now.
> 
> Hi Paolo,
> 
> Do you mean the host admin to get the ucode_rev from the host and set
> to -cpu host, ucode_rev=xxxxxx or qemu get the ucode_rev directly by
> rdmsr?

QEMU setting ucode_rev automatically using the host value when
using "-cpu host" (with no need for explicit ucode_rev option)
makes sense to me.
Wanpeng Li April 18, 2018, 3:24 a.m. UTC | #22
2018-04-18 4:24 GMT+08:00 Eduardo Habkost <ehabkost@redhat.com>:
> On Tue, Apr 17, 2018 at 06:40:58PM +0800, Wanpeng Li wrote:
>> Cc Eduardo,
>> 2018-02-26 20:41 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> > On 26/02/2018 13:22, Borislav Petkov wrote:
>> >> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
>> >>>> In this context, "host-initiated" write means written by KVM userspace
>> >>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
>> >>>> or live migration.
>> >>>
>> >>> To be clear, the target of the write is still the vCPU's emulated MSR.
>> >>
>> >> So how am I to imagine this as a user:
>> >>
>> >> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
>> >
>> > More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
>> > happen is one of the following:
>> >
>> > 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
>> > else leaves it to zero as is now.
>>
>> Hi Paolo,
>>
>> Do you mean the host admin to get the ucode_rev from the host and set
>> to -cpu host, ucode_rev=xxxxxx or qemu get the ucode_rev directly by
>> rdmsr?
>
> QEMU setting ucode_rev automatically using the host value when
> using "-cpu host" (with no need for explicit ucode_rev option)
> makes sense to me.

QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
since rdmsr will #GP when ring !=0, any idea?

Regards,
Wanpeng Li
Eduardo Habkost April 18, 2018, 9:03 a.m. UTC | #23
On Wed, Apr 18, 2018 at 11:24:22AM +0800, Wanpeng Li wrote:
> 2018-04-18 4:24 GMT+08:00 Eduardo Habkost <ehabkost@redhat.com>:
> > On Tue, Apr 17, 2018 at 06:40:58PM +0800, Wanpeng Li wrote:
> >> Cc Eduardo,
> >> 2018-02-26 20:41 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> >> > On 26/02/2018 13:22, Borislav Petkov wrote:
> >> >> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
> >> >>>> In this context, "host-initiated" write means written by KVM userspace
> >> >>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
> >> >>>> or live migration.
> >> >>>
> >> >>> To be clear, the target of the write is still the vCPU's emulated MSR.
> >> >>
> >> >> So how am I to imagine this as a user:
> >> >>
> >> >> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
> >> >
> >> > More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
> >> > happen is one of the following:
> >> >
> >> > 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
> >> > else leaves it to zero as is now.
> >>
> >> Hi Paolo,
> >>
> >> Do you mean the host admin to get the ucode_rev from the host and set
> >> to -cpu host, ucode_rev=xxxxxx or qemu get the ucode_rev directly by
> >> rdmsr?
> >
> > QEMU setting ucode_rev automatically using the host value when
> > using "-cpu host" (with no need for explicit ucode_rev option)
> > makes sense to me.
> 
> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
> since rdmsr will #GP when ring !=0, any idea?

By looking at kvm_get_msr_feature(), it looks like
ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
for us.
Paolo Bonzini April 18, 2018, 10:36 a.m. UTC | #24
On 18/04/2018 11:03, Eduardo Habkost wrote:
>>> QEMU setting ucode_rev automatically using the host value when
>>> using "-cpu host" (with no need for explicit ucode_rev option)
>>> makes sense to me.
>> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
>> since rdmsr will #GP when ring !=0, any idea?
> By looking at kvm_get_msr_feature(), it looks like
> ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
> for us.

Yes, that's exactly what it was introduced for (together with other MSRs
including VMX capabilities).

Paolo
Borislav Petkov April 23, 2018, 12:58 p.m. UTC | #25
On Wed, Apr 18, 2018 at 12:36:37PM +0200, Paolo Bonzini wrote:
> On 18/04/2018 11:03, Eduardo Habkost wrote:
> >>> QEMU setting ucode_rev automatically using the host value when
> >>> using "-cpu host" (with no need for explicit ucode_rev option)
> >>> makes sense to me.
> >> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
> >> since rdmsr will #GP when ring !=0, any idea?
> > By looking at kvm_get_msr_feature(), it looks like
> > ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
> > for us.
> 
> Yes, that's exactly what it was introduced for (together with other MSRs
> including VMX capabilities).

Can't qemu do:

grep microcode /proc/cpuinfo | awk '{ print $3 }' | head -n 1

?

:)
Eduardo Habkost April 23, 2018, 1:08 p.m. UTC | #26
On Mon, Apr 23, 2018 at 02:58:49PM +0200, Borislav Petkov wrote:
> On Wed, Apr 18, 2018 at 12:36:37PM +0200, Paolo Bonzini wrote:
> > On 18/04/2018 11:03, Eduardo Habkost wrote:
> > >>> QEMU setting ucode_rev automatically using the host value when
> > >>> using "-cpu host" (with no need for explicit ucode_rev option)
> > >>> makes sense to me.
> > >> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
> > >> since rdmsr will #GP when ring !=0, any idea?
> > > By looking at kvm_get_msr_feature(), it looks like
> > > ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
> > > for us.
> > 
> > Yes, that's exactly what it was introduced for (together with other MSRs
> > including VMX capabilities).
> 
> Can't qemu do:
> 
> grep microcode /proc/cpuinfo | awk '{ print $3 }' | head -n 1

It could, but why would QEMU do it if a real API already exists
for that?
Borislav Petkov April 23, 2018, 1:23 p.m. UTC | #27
On Mon, Apr 23, 2018 at 10:08:23AM -0300, Eduardo Habkost wrote:
> On Mon, Apr 23, 2018 at 02:58:49PM +0200, Borislav Petkov wrote:
> > On Wed, Apr 18, 2018 at 12:36:37PM +0200, Paolo Bonzini wrote:
> > > On 18/04/2018 11:03, Eduardo Habkost wrote:
> > > >>> QEMU setting ucode_rev automatically using the host value when
> > > >>> using "-cpu host" (with no need for explicit ucode_rev option)
> > > >>> makes sense to me.
> > > >> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
> > > >> since rdmsr will #GP when ring !=0, any idea?
> > > > By looking at kvm_get_msr_feature(), it looks like
> > > > ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
> > > > for us.
> > > 
> > > Yes, that's exactly what it was introduced for (together with other MSRs
> > > including VMX capabilities).
> > 
> > Can't qemu do:
> > 
> > grep microcode /proc/cpuinfo | awk '{ print $3 }' | head -n 1
> 
> It could, but why would QEMU do it if a real API already exists
> for that?

To save an MSR read per logical CPU but from the looks of it, that
KVM_GET_MSRS is widely used in qemu with kvm_get_msrs() so I guess one
more MSR doesn't matter.
Paolo Bonzini April 23, 2018, 4:03 p.m. UTC | #28
On 23/04/2018 14:58, Borislav Petkov wrote:
>>>> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
>>>> since rdmsr will #GP when ring !=0, any idea?
>>> By looking at kvm_get_msr_feature(), it looks like
>>> ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
>>> for us.
>> Yes, that's exactly what it was introduced for (together with other MSRs
>> including VMX capabilities).
> Can't qemu do:
> 
> grep microcode /proc/cpuinfo | awk '{ print $3 }' | head -n 1
> 
> ?

Yes, but you took that a statement bit too narrowly.

We didn't include KVM_GET_MSRS because microcode version wasn't
otherwise available; rather, there's a general need for KVM userspace to
know the values of host MSRs, and microcode is an example.

Paolo
Wanpeng Li April 24, 2018, 2:56 a.m. UTC | #29
2018-02-26 20:41 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 26/02/2018 13:22, Borislav Petkov wrote:
>> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
>>>> In this context, "host-initiated" write means written by KVM userspace
>>>> with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
>>>> or live migration.
>>>
>>> To be clear, the target of the write is still the vCPU's emulated MSR.
>>
>> So how am I to imagine this as a user:
>>
>> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
>
> More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
> happen is one of the following:
>
> 1) "-cpu host" sets ucode_rev to the same value of the host, everyone

I found vmware on my macbook will also do this. :)

Regards,
Wanpeng Li
Wanpeng Li April 24, 2018, 2:59 a.m. UTC | #30
2018-04-18 18:36 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 18/04/2018 11:03, Eduardo Habkost wrote:
>>>> QEMU setting ucode_rev automatically using the host value when
>>>> using "-cpu host" (with no need for explicit ucode_rev option)
>>>> makes sense to me.
>>> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
>>> since rdmsr will #GP when ring !=0, any idea?
>> By looking at kvm_get_msr_feature(), it looks like
>> ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
>> for us.
>
> Yes, that's exactly what it was introduced for (together with other MSRs
> including VMX capabilities).

How about the live migration? What will happen if the source and
destination machines have different microcode version?

Regards,
Wanpeng Li
Konrad Rzeszutek Wilk April 24, 2018, 3:14 a.m. UTC | #31
On Tue, Apr 24, 2018 at 10:59:04AM +0800, Wanpeng Li wrote:
> 2018-04-18 18:36 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> > On 18/04/2018 11:03, Eduardo Habkost wrote:
> >>>> QEMU setting ucode_rev automatically using the host value when
> >>>> using "-cpu host" (with no need for explicit ucode_rev option)
> >>>> makes sense to me.
> >>> QEMU can't get the host value by rdmsr MSR_IA32_UCODE_REV directly
> >>> since rdmsr will #GP when ring !=0, any idea?
> >> By looking at kvm_get_msr_feature(), it looks like
> >> ioctl(system_fd, KVM_GET_MSRS) would return the host MSR value
> >> for us.
> >
> > Yes, that's exactly what it was introduced for (together with other MSRs
> > including VMX capabilities).
> 
> How about the live migration? What will happen if the source and
> destination machines have different microcode version?

You would need to include the microcode version in the migration stream.

But this brings another point - what if we want to manifest certain
new CPUID bits?

For example, see:
https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf

5.3:
"To remedy this situation, an operating system running as a VM can query bit 2 of the
IA32_ARCH_CAPABILITIES MSR, known as “RSB Alternate” (RSBA). When RSBA is set, it
indicates that the VM may run on a processor vulnerable to exploits of Empty RSB
conditions regardless of the processor’s DisplayFamily/DisplayModel signature, and
that the operating system should deploy appropriate mitigations. Virtual machine
managers (VMM) may set RSBA via MSR interception to indicate that a virtual machine
might run at some time in the future on a vulnerable processor."

Perhaps the guest should do a bit of sampling of various CPUIDs as the migration
has been done? Is there a nice KVM hook inside of the guest to do this?

> 
> Regards,
> Wanpeng Li
Paolo Bonzini April 24, 2018, 5:09 a.m. UTC | #32
On 24/04/2018 05:14, Konrad Rzeszutek Wilk wrote:
> You would need to include the microcode version in the migration stream.
> 
> But this brings another point - what if we want to manifest certain
> new CPUID bits?

You don't do that across migration.  Generally if you want to do live
migration and you set up the guest to know everything about the host
(down to the microcode level), you should make sure your host are pretty
much identical.

Paolo
Konrad Rzeszutek Wilk April 24, 2018, 1:44 p.m. UTC | #33
On April 24, 2018 1:09:00 AM EDT, Paolo Bonzini <pbonzini@redhat.com> wrote:
>On 24/04/2018 05:14, Konrad Rzeszutek Wilk wrote:
>> You would need to include the microcode version in the migration
>stream.
>> 
>> But this brings another point - what if we want to manifest certain
>> new CPUID bits?
>
>You don't do that across migration.  Generally if you want to do live
>migration and you set up the guest to know everything about the host
>(down to the microcode level), you should make sure your host are
>pretty
>much identical.

I understand how it ought to be but sadly the cloud vendors have a mix of hardware.


With the retpoline/IBRS support (like what RH kernel has) you could migrate from Skylake to Broadwell and switching over from IBRS to retpoline would be good.

Hence asking about this.

>
>Paolo
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938d453..6e13f2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -507,6 +507,7 @@  struct kvm_vcpu_arch {
 	u64 smi_count;
 	bool tpr_access_reporting;
 	u64 ia32_xss;
+	u32 microcode_version;
 
 	/*
 	 * Paging state of the vcpu
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3ed81..cc51c61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2247,7 +2247,6 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr) {
 	case MSR_AMD64_NB_CFG:
-	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_UCODE_WRITE:
 	case MSR_VM_HSAVE_PA:
 	case MSR_AMD64_PATCH_LOADER:
@@ -2255,6 +2254,10 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_AMD64_DC_CFG:
 		break;
 
+	case MSR_IA32_UCODE_REV:
+		if (msr_info->host_initiated)
+			vcpu->arch.microcode_version = data >> 32;
+		break;
 	case MSR_EFER:
 		return set_efer(vcpu, data);
 	case MSR_K7_HWCR:
@@ -2550,7 +2553,7 @@  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = 0;
 		break;
 	case MSR_IA32_UCODE_REV:
-		msr_info->data = 0x100000000ULL;
+		msr_info->data = (u64)vcpu->arch.microcode_version << 32;
 		break;
 	case MSR_MTRRcap:
 	case 0x200 ... 0x2ff:
@@ -8232,6 +8235,7 @@  void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.regs_dirty = ~0;
 
 	vcpu->arch.ia32_xss = 0;
+	vcpu->arch.microcode_version = 0x1;
 
 	kvm_x86_ops->vcpu_reset(vcpu, init_event);
 }