diff mbox

[2/2] introduce -cpu host target

Message ID 1245707277-769-1-git-send-email-andre.przywara@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andre Przywara June 22, 2009, 9:47 p.m. UTC
Although the guest's CPUID bits can be controlled in a fine grained way
in QEMU, a simple way to inject the host CPU is missing. This is handy
for KVM desktop virtualization, where one wants the guest to support the
full host feature set.
Introduce another CPU type called 'host', which will propagate the host's
CPUID bits to the guest. Problematic bits can still be turned off by using
the existing syntax (-cpu host,-skinit)

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
 target-i386/helper.c |   65 +++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 59 insertions(+), 6 deletions(-)

I know that this has some more implications, currently it does not boot
Linux kernels on AMD K8 and Fam10h boxes with KVM. I will send out patches
for KVM to fix these issues (unhandled MSRs) later.
Should we ignore unhandled MSRs like QEMU or Xen do?
Also I want to hear your opinions on disabling CPUID bits afterwards
(like ibs, skinit and other things we don't virtualize yet).

Thanks and Regards,
Andre.

Comments

Avi Kivity June 23, 2009, 10:04 a.m. UTC | #1
On 06/23/2009 12:47 AM, Andre Przywara wrote:
> Although the guest's CPUID bits can be controlled in a fine grained way
> in QEMU, a simple way to inject the host CPU is missing. This is handy
> for KVM desktop virtualization, where one wants the guest to support the
> full host feature set.
> Introduce another CPU type called 'host', which will propagate the host's
> CPUID bits to the guest. Problematic bits can still be turned off by using
> the existing syntax (-cpu host,-skinit)
>    

kvm already knows how to filter unknown bits to prevent runtime 
failures.  This is even more important for qemu/tcg, since even simple 
bits which only define new instructions need explicit support in qemu.  
I think -cpu host should default to filtering unsupported bits instead 
of hoping the guest will ignore them or expecting the user to know which 
bits to remove.
Avi Kivity June 24, 2009, 9:54 a.m. UTC | #2
On 06/23/2009 12:47 AM, Andre Przywara wrote:
> Should we ignore unhandled MSRs like QEMU or Xen do?
>    

Ignoring unhandled msrs is dangerous.  If a write has some effect the 
guest depends on, and we're not emulating that effect, the guest will 
fail.  Similarly if you don't know what a register mean, who knows what 
returning zero for a read will do.
Andre Przywara June 24, 2009, 11:04 a.m. UTC | #3
Avi Kivity wrote:
> On 06/23/2009 12:47 AM, Andre Przywara wrote:
>> Should we ignore unhandled MSRs like QEMU or Xen do?
>>    
> 
> Ignoring unhandled msrs is dangerous.  If a write has some effect the 
> guest depends on, and we're not emulating that effect, the guest will 
> fail.  Similarly if you don't know what a register mean, who knows what 
> returning zero for a read will do.
I agree - from an academic POV.
But if the pragmatic approach simply enables many guests to run, then 
it's at least worth considering it.
And with the current approach the guest fails, too (due to the injected 
#GP).
If I only look at AMD's list of MSRs (not to speak of the internal list 
;-), there will be a lot of work to emulate them. Even worse, most of 
them cannot be properly emulated (like disable Lock prefix).

But nevertheless I would like to continue the "patch-on-demand" path by 
catching those MSRs that in-the-wild OSes really touch and handle them 
appropriately. Hopefully that will cover most of the MSRs.
Maybe we could consider an (module? QEMU cmdline?) option to ignore 
unknown MSRs.

Regards,
Andre.
Avi Kivity June 24, 2009, 11:26 a.m. UTC | #4
On 06/24/2009 02:04 PM, Andre Przywara wrote:
> Avi Kivity wrote:
>> On 06/23/2009 12:47 AM, Andre Przywara wrote:
>>> Should we ignore unhandled MSRs like QEMU or Xen do?
>>
>> Ignoring unhandled msrs is dangerous.  If a write has some effect the 
>> guest depends on, and we're not emulating that effect, the guest will 
>> fail.  Similarly if you don't know what a register mean, who knows 
>> what returning zero for a read will do.
> I agree - from an academic POV.
> But if the pragmatic approach simply enables many guests to run, then 
> it's at least worth considering it.
> And with the current approach the guest fails, too (due to the 
> injected #GP).
> If I only look at AMD's list of MSRs (not to speak of the internal 
> list ;-), there will be a lot of work to emulate them. Even worse, 
> most of them cannot be properly emulated (like disable Lock prefix).

We don't need to emulate all values.  We can allow default values or 
bits that don't matter for virtualization, and warn on bits that don't 
emulate properly.  I'm not concerned about failure -- just silent failure.

>
> But nevertheless I would like to continue the "patch-on-demand" path 
> by catching those MSRs that in-the-wild OSes really touch and handle 
> them appropriately. Hopefully that will cover most of the MSRs.
> Maybe we could consider an (module? QEMU cmdline?) option to ignore 
> unknown MSRs.

Good idea.  We can start with a module option, if it proves popular we 
can graduate it to a per-guest ioctl.
Jamie Lokier June 24, 2009, 4:43 p.m. UTC | #5
Andre Przywara wrote:
> Even worse, most of them cannot be properly emulated (like disable
> Lock prefix).

Disable Lock prefix should be easy to emulate by ignoring it,
shouldn't it? :-)

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filip Navara June 24, 2009, 5:37 p.m. UTC | #6
On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity<avi@redhat.com> wrote:
> On 06/23/2009 12:47 AM, Andre Przywara wrote:
>>
>> Should we ignore unhandled MSRs like QEMU or Xen do?
>>
>
> Ignoring unhandled msrs is dangerous.  If a write has some effect the guest
> depends on, and we're not emulating that effect, the guest will fail.
>  Similarly if you don't know what a register mean, who knows what returning
> zero for a read will do.

It is definitely a bad idea to ignore unknown MSRs. Kernel patch
protection scheme used by certain operating system depend on them to
work properly and it's pretty hard to debug when you don't know what
failed (the MSR read in this case).

http://www.uninformed.org/?v=3&a=3
http://www.uninformed.org/?v=6&a=1
http://www.uninformed.org/?v=8&a=5
http://en.wikipedia.org/wiki/Kernel_Patch_Protection

Best regards,
Filip Navara
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity June 24, 2009, 5:46 p.m. UTC | #7
On 06/24/2009 08:37 PM, Filip Navara wrote:
> On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity<avi@redhat.com>  wrote:
>    
>> On 06/23/2009 12:47 AM, Andre Przywara wrote:
>>      
>>> Should we ignore unhandled MSRs like QEMU or Xen do?
>>>
>>>        
>> Ignoring unhandled msrs is dangerous.  If a write has some effect the guest
>> depends on, and we're not emulating that effect, the guest will fail.
>>   Similarly if you don't know what a register mean, who knows what returning
>> zero for a read will do.
>>      
>
> It is definitely a bad idea to ignore unknown MSRs. Kernel patch
> protection scheme used by certain operating system depend on them to
> work properly and it's pretty hard to debug when you don't know what
> failed (the MSR read in this case).
>
> http://www.uninformed.org/?v=3&a=3
> http://www.uninformed.org/?v=6&a=1
> http://www.uninformed.org/?v=8&a=5
> http://en.wikipedia.org/wiki/Kernel_Patch_Protection
>
>    

Which unknown msrs are used by kernel patch protection?
Filip Navara June 24, 2009, 5:59 p.m. UTC | #8
On Wed, Jun 24, 2009 at 7:46 PM, Avi Kivity<avi@redhat.com> wrote:
> On 06/24/2009 08:37 PM, Filip Navara wrote:
>>
>> On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity<avi@redhat.com>  wrote:
>>
>>>
>>> On 06/23/2009 12:47 AM, Andre Przywara wrote:
>>>
>>>>
>>>> Should we ignore unhandled MSRs like QEMU or Xen do?
>>>>
>>>>
>>>
>>> Ignoring unhandled msrs is dangerous.  If a write has some effect the
>>> guest
>>> depends on, and we're not emulating that effect, the guest will fail.
>>>  Similarly if you don't know what a register mean, who knows what
>>> returning
>>> zero for a read will do.
>>>
>>
>> It is definitely a bad idea to ignore unknown MSRs. Kernel patch
>> protection scheme used by certain operating system depend on them to
>> work properly and it's pretty hard to debug when you don't know what
>> failed (the MSR read in this case).
>>
>> http://www.uninformed.org/?v=3&a=3
>> http://www.uninformed.org/?v=6&a=1
>> http://www.uninformed.org/?v=8&a=5
>> http://en.wikipedia.org/wiki/Kernel_Patch_Protection
>>
>>
>
> Which unknown msrs are used by kernel patch protection?

It's a moving target. At the time I first got Win64 running on QEMU it
was the one for getting number of implemented virtual address bits
(0x80000008 iirc) and some other for getting cache sizes
(0x80000005/0x80000006 iirc). Both of them were documented in AMD
manuals and not implemented by QEMU. Also the higher bits of virtual
addresses must be treated as sign-extended (as per the information in
the 0x80000008 MSR) even though there are actually bits stored in the
address. Me and Alex Ionescu have spent considerable time by reversing
the PatchGuard v1 and that information is described in more detail in
the first link above. I haven't looked at PatchGuard v2/v3 yet.

Best regards,
Filip Navara
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 529f962..80b5621 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -273,9 +273,9 @@  static x86_def_t x86_defs[] = {
     {
         .name = "athlon",
         .level = 2,
-        .vendor1 = 0x68747541, /* "Auth" */
-        .vendor2 = 0x69746e65, /* "enti" */
-        .vendor3 = 0x444d4163, /* "cAMD" */
+        .vendor1 = CPUID_VENDOR_AMD_1,
+        .vendor2 = CPUID_VENDOR_AMD_2,
+        .vendor3 = CPUID_VENDOR_AMD_3,
         .family = 6,
         .model = 2,
         .stepping = 3,
@@ -308,6 +308,54 @@  static x86_def_t x86_defs[] = {
     },
 };
 
+static void host_cpuid(uint32_t function, uint32_t count, uint32_t *eax,
+                               uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
+
+static int cpu_x86_fill_model_id(char *str)
+{
+    uint32_t eax, ebx, ecx, edx;
+    int i;
+
+    for (i = 0; i < 3; i++) {
+        host_cpuid(0x80000002 + i, 0, &eax, &ebx, &ecx, &edx);
+        memcpy(str + i * 16 +  0, &eax, 4);
+        memcpy(str + i * 16 +  4, &ebx, 4);
+        memcpy(str + i * 16 +  8, &ecx, 4);
+        memcpy(str + i * 16 + 12, &edx, 4);
+    }
+    return 0;
+}
+
+static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
+{
+    uint32_t eax, ebx, ecx, edx;
+
+    x86_cpu_def->name = "host";
+    host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
+    x86_cpu_def->level = eax;
+    x86_cpu_def->vendor1 = ebx;
+    x86_cpu_def->vendor2 = edx;
+    x86_cpu_def->vendor3 = ecx;
+
+    host_cpuid(0x1, 0, &eax, &ebx, &ecx, &edx);
+    x86_cpu_def->family = ((eax >> 8) & 0x0F) + ((eax >> 20) & 0xFF);
+    x86_cpu_def->model = ((eax >> 4) & 0x0F) | ((eax & 0xF0000) >> 12);
+    x86_cpu_def->stepping = eax & 0x0F;
+    x86_cpu_def->ext_features = ecx;
+    x86_cpu_def->features = edx;
+
+    host_cpuid(0x80000000, 0, &eax, &ebx, &ecx, &edx);
+    x86_cpu_def->xlevel = eax;
+
+    host_cpuid(0x80000001, 0, &eax, &ebx, &ecx, &edx);
+    x86_cpu_def->ext2_features = edx;
+    x86_cpu_def->ext3_features = ecx;
+    cpu_x86_fill_model_id(x86_cpu_def->model_id);
+    x86_cpu_def->vendor_override = 0;
+
+    return 0;
+}
+
 static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
 {
     unsigned int i;
@@ -326,9 +374,14 @@  static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
             break;
         }
     }
-    if (!def)
-        goto error;
-    memcpy(x86_cpu_def, def, sizeof(*def));
+    if (!def) {
+        if (strcmp(name, "host") != 0) {
+            goto error;
+        }
+        cpu_x86_fill_host(x86_cpu_def);
+    } else {
+        memcpy(x86_cpu_def, def, sizeof(*def));
+    }
 
     if (kvm_enabled()) {
         add_flagname_to_bitmaps("hypervisor", &plus_features,