xen/vPMU: Do not clobber IA32_MISC_ENABLE

Message ID 1457360841-28587-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper March 7, 2016, 2:27 p.m. UTC
The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the BTS and PEBS
UNAVAIL bits.

Some 64-bit versions of Windows include IA32_MISC_ENABLE in the set of items
checked by PatchGuard, and will suffer a BSOD 0x109
CRITICAL_STRUCTURE_CORRUPTION if the contents change on migrate.

The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>

This appears to have been broken since the vPMU code was first introduced.  It
appears to have lurked this long due to a hole (now fixed) in XenServer's
upgrade testing.  The BSODs occur ~80% of the time on Windows 8 through 10, but
appear very hard to provoke on Windows 7.

This MSR still leaks mostly host state through into the guest.  Therefore
migration of Windows is still liable to crash if moving between two
non-identical servers.  I need to get proper MSR levelling sorted before this
issue can be resolved fully.
---
 xen/arch/x86/cpu/vpmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Boris Ostrovsky March 7, 2016, 2:45 p.m. UTC | #1
On 03/07/2016 09:27 AM, Andrew Cooper wrote:
> The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
> vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the BTS and PEBS
> UNAVAIL bits.
>
> Some 64-bit versions of Windows include IA32_MISC_ENABLE in the set of items
> checked by PatchGuard, and will suffer a BSOD 0x109
> CRITICAL_STRUCTURE_CORRUPTION if the contents change on migrate.
>
> The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> This appears to have been broken since the vPMU code was first introduced.  It
> appears to have lurked this long due to a hole (now fixed) in XenServer's
> upgrade testing.  The BSODs occur ~80% of the time on Windows 8 through 10, but
> appear very hard to provoke on Windows 7.
>
> This MSR still leaks mostly host state through into the guest.  Therefore
> migration of Windows is still liable to crash if moving between two
> non-identical servers.  I need to get proper MSR levelling sorted before this
> issue can be resolved fully.
> ---
>   xen/arch/x86/cpu/vpmu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index 237b5ff..2f9ddf6 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -169,7 +169,7 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>       return ret;
>   
>    nop:
> -    if ( !is_write )
> +    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
>           *msr_content = 0;
>   
>       return 0;


This is an Intel-specific register, so the test should really be happening
in vpmu_intel.c. Of course then you'd need to always dereference
vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
the one at the top of vpmu_do_msr()).

So maybe at least have the vendor check too?

-boris
Andrew Cooper March 7, 2016, 2:58 p.m. UTC | #2
On 07/03/16 14:45, Boris Ostrovsky wrote:
> This is an Intel-specific register, so the test should really be happening
> in vpmu_intel.c. Of course then you'd need to always dereference
> vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
> the one at the top of vpmu_do_msr()).
>
> So maybe at least have the vendor check too?

Strictly speaking, if we were to do a vendor check, it should be a guest
vendor check, not a host vendor check.

OTOH, we won't get here on a non-Intel host system, and emulating a
cross-vendor vPMU is going to end in disaster.  I personally don't think
it's worth it.

~Andrew
Boris Ostrovsky March 7, 2016, 3:11 p.m. UTC | #3
On 03/07/2016 09:58 AM, Andrew Cooper wrote:
> On 07/03/16 14:45, Boris Ostrovsky wrote:
>> This is an Intel-specific register, so the test should really be happening
>> in vpmu_intel.c. Of course then you'd need to always dereference
>> vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
>> the one at the top of vpmu_do_msr()).
>>
>> So maybe at least have the vendor check too?
> Strictly speaking, if we were to do a vendor check, it should be a guest
> vendor check, not a host vendor check.
>
> OTOH, we won't get here on a non-Intel host system, and emulating a
> cross-vendor vPMU is going to end in disaster.  I personally don't think
> it's worth it.

I wasn't thinking about cross-vendor cases. But I forgot that we now go 
to VPMU code only for VPMU registers.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Patch

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 237b5ff..2f9ddf6 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -169,7 +169,7 @@  int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
     return ret;
 
  nop:
-    if ( !is_write )
+    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
         *msr_content = 0;
 
     return 0;