Message ID | 1457360841-28587-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
On 03/07/2016 09:27 AM, Andrew Cooper wrote:
> The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
> vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the PTS and PEBS
> UNAVAIL bits.
>
> Some 64bit Windows include IA32_MISC_ENABLE in the set of items checked by
> PatchGuard, and will suffer a BSOD 0x109 CRITICAL_STRUCTURE_CORRUPTION if the
> contents change on migrate.
>
> The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> This appears to have been broken since the vPMU code was first introduced. It
> appears to have lurked this long due to a hole (now fixed) in XenServer's
> upgrade testing. The BSODs occur ~80% of the time on Win 8 thru 10, but
> appear very hard to provoke on Windows 7.
>
> This MSR still leaks mostly host state through into the guest. Therefore
> migration of Windows is still liable to crash if moving between two
> non-identical servers. I need to get proper MSR levelling sorted before this
> issue can be resolved fully.
> ---
> xen/arch/x86/cpu/vpmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index 237b5ff..2f9ddf6 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -169,7 +169,7 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>      return ret;
>
>  nop:
> -    if ( !is_write )
> +    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
>          *msr_content = 0;
>
>      return 0;

This is an Intel-specific register, so the test should really be happening
in vpmu_intel.c. Of course then you'd need to always dereference
vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
the one at the top of vpmu_do_msr()).

So maybe at least have the vendor check too??

-boris
On 07/03/16 14:45, Boris Ostrovsky wrote:
> On 03/07/2016 09:27 AM, Andrew Cooper wrote:
>> The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
>> vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the PTS
>> and PEBS UNAVAIL bits.
>>
>> Some 64bit Windows include IA32_MISC_ENABLE in the set of items
>> checked by PatchGuard, and will suffer a BSOD 0x109
>> CRITICAL_STRUCTURE_CORRUPTION if the contents change on migrate.
>>
>> The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>
>> This appears to have been broken since the vPMU code was first
>> introduced. It appears to have lurked this long due to a hole (now
>> fixed) in XenServer's upgrade testing. The BSODs occur ~80% of the
>> time on Win 8 thru 10, but appear very hard to provoke on Windows 7.
>>
>> This MSR still leaks mostly host state through into the guest.
>> Therefore migration of Windows is still liable to crash if moving
>> between two non-identical servers. I need to get proper MSR levelling
>> sorted before this issue can be resolved fully.
>> ---
>> xen/arch/x86/cpu/vpmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
>> index 237b5ff..2f9ddf6 100644
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -169,7 +169,7 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>>      return ret;
>>  nop:
>> -    if ( !is_write )
>> +    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
>>          *msr_content = 0;
>>      return 0;
>
> This is an Intel-specific register, so the test should really be happening
> in vpmu_intel.c. Of course then you'd need to always dereference
> vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
> the one at the top of vpmu_do_msr()).
>
> So maybe at least have the vendor check too??

Strictly speaking, if we were to do a vendor check, it should be a guest
vendor check, not a host vendor check.

OTOH, we won't get here on a non-Intel host system, and emulating a
cross-vendor vPMU is going to end in disaster. I personally don't think
it's worth it.

~Andrew
On 03/07/2016 09:58 AM, Andrew Cooper wrote:
> On 07/03/16 14:45, Boris Ostrovsky wrote:
>> On 03/07/2016 09:27 AM, Andrew Cooper wrote:
>>> The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
>>> vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the PTS
>>> and PEBS UNAVAIL bits.
>>>
>>> Some 64bit Windows include IA32_MISC_ENABLE in the set of items
>>> checked by PatchGuard, and will suffer a BSOD 0x109
>>> CRITICAL_STRUCTURE_CORRUPTION if the contents change on migrate.
>>>
>>> The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>
>>> This appears to have been broken since the vPMU code was first
>>> introduced. It appears to have lurked this long due to a hole (now
>>> fixed) in XenServer's upgrade testing. The BSODs occur ~80% of the
>>> time on Win 8 thru 10, but appear very hard to provoke on Windows 7.
>>>
>>> This MSR still leaks mostly host state through into the guest.
>>> Therefore migration of Windows is still liable to crash if moving
>>> between two non-identical servers. I need to get proper MSR levelling
>>> sorted before this issue can be resolved fully.
>>> ---
>>> xen/arch/x86/cpu/vpmu.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
>>> index 237b5ff..2f9ddf6 100644
>>> --- a/xen/arch/x86/cpu/vpmu.c
>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>> @@ -169,7 +169,7 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>>>      return ret;
>>>  nop:
>>> -    if ( !is_write )
>>> +    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
>>>          *msr_content = 0;
>>>      return 0;
>>
>> This is an Intel-specific register, so the test should really be happening
>> in vpmu_intel.c. Of course then you'd need to always dereference
>> vcpu_vpmu() and possibly add more checks to read/write ops (to mirror
>> the one at the top of vpmu_do_msr()).
>>
>> So maybe at least have the vendor check too??
> Strictly speaking, if we were to do a vendor check, it should be a guest
> vendor check, not a host vendor check.
>
> OTOH, we won't get here on a non-Intel host system, and emulating a
> cross-vendor vPMU is going to end in disaster. I personally don't think
> it's worth it.

I wasn't thinking about cross-vendor cases. But I forgot that we now go
to VPMU code only for VPMU registers.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 237b5ff..2f9ddf6 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -169,7 +169,7 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
     return ret;

  nop:
-    if ( !is_write )
+    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
         *msr_content = 0;

     return 0;
The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the PTS and PEBS
UNAVAIL bits.

Some 64bit Windows include IA32_MISC_ENABLE in the set of items checked by
PatchGuard, and will suffer a BSOD 0x109 CRITICAL_STRUCTURE_CORRUPTION if the
contents change on migrate.

The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>

This appears to have been broken since the vPMU code was first introduced. It
appears to have lurked this long due to a hole (now fixed) in XenServer's
upgrade testing. The BSODs occur ~80% of the time on Win 8 thru 10, but
appear very hard to provoke on Windows 7.

This MSR still leaks mostly host state through into the guest. Therefore
migration of Windows is still liable to crash if moving between two
non-identical servers. I need to get proper MSR levelling sorted before this
issue can be resolved fully.
---
 xen/arch/x86/cpu/vpmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
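
For readers who want to see the effect of the one-line change in isolation, the following is a minimal standalone sketch, not the Xen code itself. The helper names vpmu_nop_tail_old() and vpmu_nop_tail_new() are hypothetical and model only the "nop:" tail of vpmu_do_msr() before and after the patch; the rest of the function (vendor dispatch, the checks at the top) is assumed away. The value 0x1a0 is the architectural index of IA32_MISC_ENABLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR index of IA32_MISC_ENABLE. */
#define MSR_IA32_MISC_ENABLE 0x000001a0

/*
 * Hypothetical model of the "nop:" tail before the patch: every read that
 * reaches it has its content zeroed, including IA32_MISC_ENABLE.
 */
static int vpmu_nop_tail_old(unsigned int msr, uint64_t *msr_content,
                             bool is_write)
{
    (void)msr;

    if ( !is_write )
        *msr_content = 0;

    return 0;
}

/*
 * Hypothetical model of the "nop:" tail after the patch: a read of
 * IA32_MISC_ENABLE keeps whatever value the caller already filled in;
 * reads of any other MSR are still zeroed, and writes are untouched.
 */
static int vpmu_nop_tail_new(unsigned int msr, uint64_t *msr_content,
                             bool is_write)
{
    if ( !is_write && (msr != MSR_IA32_MISC_ENABLE) )
        *msr_content = 0;

    return 0;
}
```

In this sketch, calling vpmu_nop_tail_old() on a read of MSR_IA32_MISC_ENABLE leaves *msr_content at zero, which is the kind of content change the commit message ties to the PatchGuard failure, while vpmu_nop_tail_new() leaves the caller's value intact.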