Message ID | 20200520160740.6144-3-mlevitsk@redhat.com |
---|---|
State | New, archived |
Series | Fix breakage from adding MSR_IA32_UMWAIT_CONTROL |
Maxim Levitsky <mlevitsk@redhat.com> writes:

> This msr is only available when the host supports WAITPKG feature.
>
> This breaks a nested guest, if the L1 hypervisor is set to ignore
> unknown msrs, because the only other safety check that the
> kernel does is that it attempts to read the msr and
> rejects it if it gets an exception.
>
> Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe3a24fd6b263..9c507b32b1b77 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void)
>  			if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
>  			    min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
>  				continue;
> +			break;
> +		case MSR_IA32_UMWAIT_CONTROL:
> +			if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
> +				continue;

I'm probably missing something but (if I understand correctly) the only
effect of dropping MSR_IA32_UMWAIT_CONTROL from msrs_to_save would be
that KVM userspace won't see it in e.g. KVM_GET_MSR_INDEX_LIST. But why
is this causing an issue? I see both vmx_get_msr()/vmx_set_msr() have a
'host_initiated' check:

	case MSR_IA32_UMWAIT_CONTROL:
		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
			return 1;

so KVM userspace should be able to read/write this MSR even when there's
no hardware support for it. Or who's trying to read/write it?

Also, the kvm_cpu_cap_has() check is not equivalent to vmx_has_waitpkg(),
which checks secondary execution controls.

>  		default:
>  			break;
>  		}
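To make the 'host_initiated' distinction concrete, here is a hypothetical, simplified model of the UMWAIT case in vmx_get_msr(). The names model_get_umwait_control(), struct msr_access and struct vcpu_model are illustrative, not KVM's; only the condition is taken from the quoted code. A guest-initiated read is refused when the vCPU lacks WAITPKG, while a host-initiated read (userspace going through KVM_GET_MSRS) always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not KVM's real data structures. */
#define MODEL_MSR_IA32_UMWAIT_CONTROL 0xe1u

struct msr_access {
	uint32_t index;
	uint64_t data;
	bool host_initiated;	/* true when userspace issues the access */
};

struct vcpu_model {
	bool has_waitpkg;		/* stands in for vmx_has_waitpkg(vmx) */
	uint64_t umwait_control;	/* saved guest value of the MSR */
};

/* Returns 0 on success and 1 on failure, like the vendor get_msr hooks. */
static int model_get_umwait_control(const struct vcpu_model *vcpu,
				    struct msr_access *msr)
{
	assert(msr->index == MODEL_MSR_IA32_UMWAIT_CONTROL);

	/* Same shape as the quoted check: only the guest is refused. */
	if (!msr->host_initiated && !vcpu->has_waitpkg)
		return 1;

	msr->data = vcpu->umwait_control;
	return 0;
}
```

This check only helps if the VMX hook is the one that runs; on an AMD host the KVM_GET_MSRS request never reaches vmx_get_msr(), which is the point of the reply that follows.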
On Wed, 2020-05-20 at 18:33 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
>
> > This msr is only available when the host supports WAITPKG feature.
> > [...]
> > +		case MSR_IA32_UMWAIT_CONTROL:
> > +			if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
> > +				continue;
>
> I'm probably missing something but (if I understand correctly) the only
> effect of dropping MSR_IA32_UMWAIT_CONTROL from msrs_to_save would be
> that KVM userspace won't see it in e.g. KVM_GET_MSR_INDEX_LIST. But why
> is this causing an issue? I see both vmx_get_msr()/vmx_set_msr() have a
> 'host_initiated' check:
>
> 	case MSR_IA32_UMWAIT_CONTROL:
> 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
> 			return 1;

Here it fails like this:

1. KVM_GET_MSR_INDEX_LIST returns this msr, and QEMU notes that
   it is supported in the 'has_msr_umwait' global variable.

2. QEMU does kvm_arch_get/put_registers->kvm_get/put_msrs->ioctl(KVM_GET_MSRS),
   and while doing this it adds MSR_IA32_UMWAIT_CONTROL to that msr list.
   That reaches 'svm_get_msr', and this one knows nothing about
   MSR_IA32_UMWAIT_CONTROL.

So the difference here is that vmx_get_msr is not called at all.
I could add this msr to svm_get_msr instead, but that feels wrong since this
feature is not yet supported on AMD.
When AMD adds support for this feature, the VMX-specific code can be moved to
kvm_get_msr_common, I guess.

>
> so KVM userspace should be able to read/write this MSR even when there's
> no hardware support for it. Or who's trying to read/write it?
>
> Also, the kvm_cpu_cap_has() check is not equivalent to vmx_has_waitpkg(),
> which checks secondary execution controls.

I was afraid that something like that would happen, but in this particular
case we can only check CPUID support, and if it is supported, then it means
we are dealing with an Intel system, and thus vmx_get_msr will be called and
ignore that msr.

Calling vmx_has_waitpkg from the common code doesn't seem right, and besides,
it checks the secondary controls, which are set by the host and can change,
at least in theory, during runtime (I don't know if KVM does this).

Note that, if I now understand correctly, 'host_initiated' means that the
MSR read/write is done by the host itself and not on behalf of the guest.

Best regards,
Maxim Levitsky
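Step 1 above is an ordinary two-call query against /dev/kvm. The following is a minimal userspace sketch of it, using the KVM_GET_MSR_INDEX_LIST ioctl and struct kvm_msr_list from <linux/kvm.h>; the helper name kvm_lists_msr() is hypothetical, and QEMU's own code is structured differently. The first call deliberately passes nmsrs = 0: KVM then fails with E2BIG and writes the required count back into nmsrs, so the caller knows how much to allocate.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 1 if 'index' is in the list reported by
 * KVM_GET_MSR_INDEX_LIST on the /dev/kvm fd 'kvm_fd', 0 if it is not,
 * and -1 on error.
 */
static int kvm_lists_msr(int kvm_fd, uint32_t index)
{
	struct kvm_msr_list probe = { .nmsrs = 0 };
	struct kvm_msr_list *list;
	int found = 0;

	/* Room for zero entries: KVM fails with E2BIG and sets nmsrs. */
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) == 0)
		return 0;	/* the list is genuinely empty */
	if (errno != E2BIG)
		return -1;

	list = malloc(sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
	if (!list)
		return -1;
	list->nmsrs = probe.nmsrs;

	/* Second call: the buffer is now large enough for every index. */
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		free(list);
		return -1;
	}

	for (uint32_t i = 0; i < list->nmsrs; i++) {
		if (list->indices[i] == index) {
			found = 1;
			break;
		}
	}
	free(list);
	return found;
}
```

Calling it with a /dev/kvm file descriptor and 0xe1 (the value of MSR_IA32_UMWAIT_CONTROL) asks the same question whose answer sets QEMU's 'has_msr_umwait'.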
Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Wed, 2020-05-20 at 18:33 +0200, Vitaly Kuznetsov wrote:
>> Maxim Levitsky <mlevitsk@redhat.com> writes:
>>
>> [...]
>>
>> I'm probably missing something but (if I understand correctly) the only
>> effect of dropping MSR_IA32_UMWAIT_CONTROL from msrs_to_save would be
>> that KVM userspace won't see it in e.g. KVM_GET_MSR_INDEX_LIST. But why
>> is this causing an issue? I see both vmx_get_msr()/vmx_set_msr() have a
>> 'host_initiated' check:
>>
>> 	case MSR_IA32_UMWAIT_CONTROL:
>> 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
>> 			return 1;
>
> Here it fails like this:
>
> 1. KVM_GET_MSR_INDEX_LIST returns this msr, and QEMU notes that
>    it is supported in the 'has_msr_umwait' global variable.
>
> 2. QEMU does kvm_arch_get/put_registers->kvm_get/put_msrs->ioctl(KVM_GET_MSRS),
>    and while doing this it adds MSR_IA32_UMWAIT_CONTROL to that msr list.
>    That reaches 'svm_get_msr', and this one knows nothing about
>    MSR_IA32_UMWAIT_CONTROL.
>
> So the difference here is that vmx_get_msr is not called at all.
> I could add this msr to svm_get_msr instead, but that feels wrong since this
> feature is not yet supported on AMD.
> When AMD adds support for this feature, the VMX-specific code can be moved to
> kvm_get_msr_common, I guess.
>

Oh, SVM, I missed that completely)

> [...]
>
> Note that, if I now understand correctly, 'host_initiated' means that the
> MSR read/write is done by the host itself and not on behalf of the guest.

Yes, it does that.

We have the kvm_x86_ops.has_emulated_msr() mechanism; can we use it here?
E.g. (completely untested):

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 38f6aeefeb55..c19a9542e6c3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3471,6 +3471,8 @@ static bool svm_has_emulated_msr(int index)
 	case MSR_IA32_MCG_EXT_CTL:
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		return false;
+	case MSR_IA32_UMWAIT_CONTROL:
+		return false;
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d786c7d27ce5..f45153ef3b81 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1183,7 +1183,6 @@ static const u32 msrs_to_save_all[] = {
 	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
 	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
-	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
 	MSR_ARCH_PERFMON_FIXED_CTR0 + 2, MSR_ARCH_PERFMON_FIXED_CTR0 + 3,
@@ -1266,6 +1265,7 @@ static const u32 emulated_msrs_all[] = {
 	MSR_IA32_VMX_PROCBASED_CTLS2,
 	MSR_IA32_VMX_EPT_VPID_CAP,
 	MSR_IA32_VMX_VMFUNC,
+	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_K7_HWCR,
 	MSR_KVM_POLL_CONTROL,
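kvm_init_msr_list() filters emulated_msrs_all differently from msrs_to_save_all: it asks the vendor module through kvm_x86_ops.has_emulated_msr() instead of probing the hardware MSR. Below is a minimal model of that filter under the proposal above. All model_* names are hypothetical, and the VMX hook is simplified to accept every index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MSR_IA32_UMWAIT_CONTROL	0xe1u
#define MODEL_MSR_K7_HWCR		0xc0010015u

/* Hypothetical vendor hook, modelled on kvm_x86_ops.has_emulated_msr. */
typedef bool (*has_emulated_msr_fn)(uint32_t index);

static bool model_vmx_has_emulated_msr(uint32_t index)
{
	(void)index;
	return true;	/* simplified: accept everything */
}

static bool model_svm_has_emulated_msr(uint32_t index)
{
	switch (index) {
	case MODEL_MSR_IA32_UMWAIT_CONTROL:
		return false;	/* the case added in the proposal */
	default:
		return true;
	}
}

/* Copies the candidates the vendor accepts into 'out'; returns the count. */
static size_t model_filter_emulated(const uint32_t *all, size_t n,
				    has_emulated_msr_fn has_emulated,
				    uint32_t *out)
{
	size_t kept = 0;

	assert(has_emulated != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!has_emulated(all[i]))
			continue;
		out[kept++] = all[i];
	}
	return kept;
}
```

Because no MSR read is involved, this path cannot be fooled by an outer hypervisor that ignores unknown MSRs; the open question in the reply below is whether UMWAIT_CONTROL really belongs in an "emulated" list.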
On Wed, 2020-05-20 at 19:15 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
>
> [...]
>
> Yes, it does that.
>
> We have the kvm_x86_ops.has_emulated_msr() mechanism; can we use it here?
> E.g. (completely untested):
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 38f6aeefeb55..c19a9542e6c3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3471,6 +3471,8 @@ static bool svm_has_emulated_msr(int index)
>  	case MSR_IA32_MCG_EXT_CTL:
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		return false;
> +	case MSR_IA32_UMWAIT_CONTROL:
> +		return false;
>  	default:
>  		break;
>  	}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d786c7d27ce5..f45153ef3b81 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1183,7 +1183,6 @@ static const u32 msrs_to_save_all[] = {
>  	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
>  	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
>  	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
> -	MSR_IA32_UMWAIT_CONTROL,
>  
>  	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
>  	MSR_ARCH_PERFMON_FIXED_CTR0 + 2, MSR_ARCH_PERFMON_FIXED_CTR0 + 3,
> @@ -1266,6 +1265,7 @@ static const u32 emulated_msrs_all[] = {
>  	MSR_IA32_VMX_PROCBASED_CTLS2,
>  	MSR_IA32_VMX_EPT_VPID_CAP,
>  	MSR_IA32_VMX_VMFUNC,
> +	MSR_IA32_UMWAIT_CONTROL,
>  
>  	MSR_K7_HWCR,
>  	MSR_KVM_POLL_CONTROL,
>

I don't see any reason why the above won't work. To be honest, I also took
a look at this, but it wasn't clear to me what the purpose of the emulated
msrs is; this is why I took the approach in the patch I sent.

It 'seems' (although this is not enforced anywhere) that the emulated msr
list is intended for MSRs that are emulated by KVM, which means that KVM
traps these msrs and gives the guest whatever values it thinks the guest
should see.

However, MSR_IA32_UMWAIT_CONTROL appears to be exposed directly to the guest
without any traps, with the virtualization done by the CPU. The only
intervention we do is to set a value to be loaded when guest mode is entered
and a value to be loaded when guest mode is exited (using the VMX entry/exit
MSR-load lists); I see that done by atomic_switch_umwait_control_msr.

So I am not sure if we should add it to the emulated_msrs_all list.

Paolo, what do you think about this?

I personally don't mind how we fix this, as long as it works and everyone
agrees on the patch.

Best regards,
Maxim Levitsky
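The entry/exit switching described above can be modelled as a small list of {MSR, guest value, host value} triples. This is a hypothetical simplification of what atomic_switch_umwait_control_msr() is described as doing here: struct autoload_list and model_switch_umwait() are invented names, and real VMX keeps separate VM-entry and VM-exit MSR-load areas in the VMCS rather than one combined array. The point it shows is that the MSR only needs an entry when the guest and host values differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MSR_IA32_UMWAIT_CONTROL	0xe1u
#define MODEL_MAX_AUTOLOAD		8

/* Hypothetical combined model of the VM-entry/VM-exit MSR-load lists. */
struct autoload_entry {
	uint32_t index;
	uint64_t guest_val;	/* loaded on VM entry */
	uint64_t host_val;	/* loaded on VM exit */
};

struct autoload_list {
	size_t nr;
	struct autoload_entry e[MODEL_MAX_AUTOLOAD];
};

static int find_entry(const struct autoload_list *l, uint32_t index)
{
	for (size_t i = 0; i < l->nr; i++)
		if (l->e[i].index == index)
			return (int)i;
	return -1;
}

/* Keep the MSR in the switch list only while guest and host values differ. */
static void model_switch_umwait(struct autoload_list *l, bool has_waitpkg,
				uint64_t guest_val, uint64_t host_val)
{
	int i;

	if (!has_waitpkg)
		return;

	i = find_entry(l, MODEL_MSR_IA32_UMWAIT_CONTROL);
	if (guest_val == host_val) {
		/* Nothing to switch: drop a stale entry by swapping in the last. */
		if (i >= 0)
			l->e[i] = l->e[--l->nr];
		return;
	}

	if (i < 0) {
		assert(l->nr < MODEL_MAX_AUTOLOAD);
		i = (int)l->nr++;
	}
	l->e[i] = (struct autoload_entry){ MODEL_MSR_IA32_UMWAIT_CONTROL,
					    guest_val, host_val };
}
```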
On 20/05/20 18:07, Maxim Levitsky wrote:
> This msr is only available when the host supports WAITPKG feature.
> [...]
> +		case MSR_IA32_UMWAIT_CONTROL:
> +			if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
> +				continue;
>  		default:
>  			break;
>  		}

The patch is correct, and matches what is done for the other entries of
msrs_to_save_all. However, while looking at it I noticed that
X86_FEATURE_WAITPKG is actually never added, and that is because it was
also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86:
Add support for user wait instructions", 2019-09-24), which was before
the kvm_cpu_cap mechanism was added.

So while at it you should also fix that. The right way to do that is to
add a

	if (vmx_waitpkg_supported())
		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);

in vmx_set_cpu_caps.

Thanks,

Paolo
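The consequence Paolo describes can be seen in a tiny model of KVM's capability word for CPUID.(EAX=7,ECX=0):ECX, where WAITPKG is bit 5. It is a hypothetical sketch: struct cap_model and the model_* helpers are invented, and the waitpkg_supported flag stands in for vmx_waitpkg_supported(). Until something sets the bit, a check like kvm_cpu_cap_has(X86_FEATURE_WAITPKG) is always false, so the patch as posted would drop MSR_IA32_UMWAIT_CONTROL even on Intel hosts that do support WAITPKG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WAITPKG is reported in CPUID.(EAX=7,ECX=0):ECX bit 5. */
#define MODEL_CPUID_7_0_ECX_WAITPKG (1u << 5)

/* Hypothetical model: only the CPUID.7.0:ECX word of each bitmap. */
struct cap_model {
	uint32_t host_ecx;	/* what the host CPU reports (boot_cpu_has) */
	uint32_t kvm_ecx;	/* what KVM advertises to userspace */
};

/* Like kvm_cpu_cap_check_and_set(): set only if the host has the bit. */
static void model_cap_check_and_set(struct cap_model *c, uint32_t bit)
{
	if (c->host_ecx & bit)
		c->kvm_ecx |= bit;
}

/* Stand-in for the suggested vmx_set_cpu_caps() addition. */
static void model_set_cpu_caps(struct cap_model *c, bool waitpkg_supported)
{
	if (waitpkg_supported)
		model_cap_check_and_set(c, MODEL_CPUID_7_0_ECX_WAITPKG);
}

static bool model_cap_has_waitpkg(const struct cap_model *c)
{
	return (c->kvm_ecx & MODEL_CPUID_7_0_ECX_WAITPKG) != 0;
}
```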
On Wed, 2020-05-20 at 23:05 +0200, Paolo Bonzini wrote:
> On 20/05/20 18:07, Maxim Levitsky wrote:
> > This msr is only available when the host supports WAITPKG feature.
> > [...]
>
> The patch is correct, and matches what is done for the other entries of
> msrs_to_save_all. However, while looking at it I noticed that
> X86_FEATURE_WAITPKG is actually never added, and that is because it was
> also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86:
> Add support for user wait instructions", 2019-09-24), which was before
> the kvm_cpu_cap mechanism was added.
>
> So while at it you should also fix that. The right way to do that is to
> add a
>
> 	if (vmx_waitpkg_supported())
> 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
>
> in vmx_set_cpu_caps.
>
> Thanks,

Thank you very much for finding this. I didn't expect this to be broken.
I will send a new version with this fix as well tomorrow.

Best regards,
Maxim Levitsky
On 5/21/2020 5:05 AM, Paolo Bonzini wrote:
> On 20/05/20 18:07, Maxim Levitsky wrote:
>> This msr is only available when the host supports WAITPKG feature.
>> [...]
>
> The patch is correct, and matches what is done for the other entries of
> msrs_to_save_all. However, while looking at it I noticed that
> X86_FEATURE_WAITPKG is actually never added, and that is because it was
> also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86:
> Add support for user wait instructions", 2019-09-24), which was before
> the kvm_cpu_cap mechanism was added.
>
> So while at it you should also fix that. The right way to do that is to
> add a
>
> 	if (vmx_waitpkg_supported())
> 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);

+ Tao

I remember there is certainly some reason why we don't expose WAITPKG to
the guest by default.

Tao, please help clarify it.

Thanks,
-Xiaoyao

>
> in vmx_set_cpu_caps.
On 5/21/2020 12:33 PM, Xiaoyao Li wrote:
> On 5/21/2020 5:05 AM, Paolo Bonzini wrote:
>> [...]
>> So while at it you should also fix that. The right way to do that is to
>> add a
>>
>> 	if (vmx_waitpkg_supported())
>> 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
>
> + Tao
>
> I remember there is certainly some reason why we don't expose WAITPKG to
> the guest by default.
>
> Tao, please help clarify it.
>
> Thanks,
> -Xiaoyao
>

Because in a VM, umwait and tpause can put a (physical) CPU into a power
saving state. So from the host's point of view, this CPU will appear 100%
used by the VM. Although umwait and tpause only cause a short wait (maybe
100 microseconds), we still want to unconditionally expose WAITPKG in VM.
On 5/21/2020 1:28 PM, Tao Xu wrote:
> On 5/21/2020 12:33 PM, Xiaoyao Li wrote:
>> [...]
>> I remember there is certainly some reason why we don't expose WAITPKG to
>> the guest by default.
>>
>> Tao, please help clarify it.
>>
>> Thanks,
>> -Xiaoyao
>>
>
> Because in a VM, umwait and tpause can put a (physical) CPU into a power
> saving state. So from the host's point of view, this CPU will appear 100%
> used by the VM. Although umwait and tpause only cause a short wait (maybe
> 100 microseconds), we still want to unconditionally expose WAITPKG in VM.

I guess you typed "unconditionally" by mistake, and you actually meant
"conditionally"?
On 5/21/2020 2:37 PM, Xiaoyao Li wrote:
> On 5/21/2020 1:28 PM, Tao Xu wrote:
>> [...]
>> Because in a VM, umwait and tpause can put a (physical) CPU into a power
>> saving state. So from the host's point of view, this CPU will appear 100%
>> used by the VM. Although umwait and tpause only cause a short wait (maybe
>> 100 microseconds), we still want to unconditionally expose WAITPKG in VM.
>
> I guess you typed "unconditionally" by mistake, and you actually meant
> "conditionally"?

I am sorry, I meant:
By default, we don't expose WAITPKG to the guest. For QEMU, we can use
"-overcommit cpu-pm=on" to use WAITPKG.
On 5/21/2020 12:56 AM, Maxim Levitsky wrote:
> On Wed, 2020-05-20 at 18:33 +0200, Vitaly Kuznetsov wrote:
>> [...]
>
> Here it fails like this:
>
> 1. KVM_GET_MSR_INDEX_LIST returns this msr, and QEMU notes that
>    it is supported in the 'has_msr_umwait' global variable.

In general, KVM_GET_MSR_INDEX_LIST won't return MSR_IA32_UMWAIT_CONTROL if
KVM cannot read this MSR; see kvm_init_msr_list().

You hit this issue because you used "ignore_msrs".
On 21/05/20 06:33, Xiaoyao Li wrote:
> I remember there is certainly some reason why we don't expose WAITPKG to
> the guest by default.

That's a userspace policy decision. KVM_GET_SUPPORTED_CPUID should still
tell userspace that it's supported.

Paolo

> Tao, please help clarify it.
On 21/05/20 08:44, Tao Xu wrote:
>
> I am sorry, I meant:
> By default, we don't expose WAITPKG to the guest. For QEMU, we can use
> "-overcommit cpu-pm=on" to use WAITPKG.

But UMONITOR, UMWAIT and TPAUSE are not NOPs on older processors (which
I should have checked before committing your patch, I admit). So you
have broken "-cpu host -overcommit cpu-pm=on" on any processor that
doesn't have WAITPKG. I'll send a patch.

Paolo
On Thu, 2020-05-21 at 10:40 +0200, Paolo Bonzini wrote:
> On 21/05/20 08:44, Tao Xu wrote:
> > I am sorry, I meant:
> > By default, we don't expose WAITPKG to the guest. For QEMU, we can use
> > "-overcommit cpu-pm=on" to use WAITPKG.
>
> But UMONITOR, UMWAIT and TPAUSE are not NOPs on older processors (which
> I should have checked before committing your patch, I admit). So you
> have broken "-cpu host -overcommit cpu-pm=on" on any processor that
> doesn't have WAITPKG. I'll send a patch.
>
> Paolo
>

Any update on that? I accidentally hit this today while updating my guest's
kernel. It turns out I had '-overcommit cpu-pm=on' and '-cpu host' enabled,
and waitpkg (which my AMD CPU certainly doesn't have) was silently exposed to
the guest. The guest didn't use it until the mainline kernel started using it,
and so the guest crashes. It took me some time to realize that I was hitting
this issue.

CPUID_7_0_ECX_WAITPKG is indeed cleared in KVM_GET_SUPPORTED_CPUID, and code
in QEMU sets/clears it depending on the 'cpu-pm' value.

This patch (copy-pasted, so it probably won't apply) works for me, regardless
of whether we want to fix the value returned by KVM_GET_SUPPORTED_CPUID
(which I think we should).

It basically detects the presence of UMWAIT by the presence of
MSR_IA32_UMWAIT_CONTROL in the global KVM_GET_MSR_INDEX_LIST, which I
recently fixed too, so that it doesn't return this msr if the host CPUID
doesn't support it. So this works, but it is a bit ugly IMHO.

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 6adbff3d74..e9933d2e68 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -412,7 +412,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
             ret &= ~(CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE);
         }
     } else if (function == 7 && index == 0 && reg == R_ECX) {
-        if (enable_cpu_pm) {
+        if (enable_cpu_pm && has_msr_umwait) {
             ret |= CPUID_7_0_ECX_WAITPKG;
         } else {
             ret &= ~CPUID_7_0_ECX_WAITPKG;
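The effect of the QEMU change can be summarized by putting the two conditions side by side. This is a hypothetical model of only the CPUID.7.0:ECX branch of kvm_arch_get_supported_cpuid() shown in the diff; cpu_pm stands for '-overcommit cpu-pm=on', has_msr_umwait for the result of the MSR index list probe, and the function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 5 is WAITPKG. */
#define MODEL_CPUID_7_0_ECX_WAITPKG (1u << 5)

/* Before the change: WAITPKG follows cpu-pm alone. */
static uint32_t model_adjust_old(uint32_t ret, bool cpu_pm)
{
	if (cpu_pm)
		ret |= MODEL_CPUID_7_0_ECX_WAITPKG;
	else
		ret &= ~MODEL_CPUID_7_0_ECX_WAITPKG;
	return ret;
}

/* After: also require that KVM listed MSR_IA32_UMWAIT_CONTROL. */
static uint32_t model_adjust_new(uint32_t ret, bool cpu_pm,
				 bool has_msr_umwait)
{
	if (cpu_pm && has_msr_umwait)
		ret |= MODEL_CPUID_7_0_ECX_WAITPKG;
	else
		ret &= ~MODEL_CPUID_7_0_ECX_WAITPKG;
	return ret;
}
```

On an AMD host with cpu-pm=on, the old condition sets WAITPKG even though the instructions do not exist there, which matches the guest crash described above; the new condition leaves the bit clear.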
This msr is only available when the host supports WAITPKG feature.

This breaks a nested guest, if the L1 hypervisor is set to ignore
unknown msrs, because the only other safety check that the
kernel does is that it attempts to read the msr and
rejects it if it gets an exception.

Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe3a24fd6b263..9c507b32b1b77 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void)
 			if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
 			    min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
 				continue;
+			break;
+		case MSR_IA32_UMWAIT_CONTROL:
+			if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
+				continue;
 		default:
 			break;
 		}
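The failure mode in the commit message, and what the added case changes, can be reproduced with a small model of the kvm_init_msr_list() loop. It is a hypothetical sketch: struct env_model, model_rdmsr_safe() and model_init_msr_list() are invented, the probe is reduced to "does reading this index fault", and only two MSRs are modelled. The continue inside the switch continues the enclosing for loop, which is how the real function skips an entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MSR_IA32_TSC		0x10u
#define MODEL_MSR_IA32_UMWAIT_CONTROL	0xe1u

/* Hypothetical environment seen by the kernel building its MSR list. */
struct env_model {
	bool cpu_has_umwait_msr;	/* does the (virtual) CPU implement 0xe1? */
	bool ignore_unknown_msrs;	/* outer hypervisor with ignore_msrs=1 */
	bool kvm_cap_waitpkg;		/* kvm_cpu_cap_has(X86_FEATURE_WAITPKG) */
};

/* Stand-in for rdmsr_safe(): 0 on success, nonzero if the read faults. */
static int model_rdmsr_safe(const struct env_model *env, uint32_t index)
{
	if (index != MODEL_MSR_IA32_UMWAIT_CONTROL || env->cpu_has_umwait_msr)
		return 0;
	/* Unknown MSR: faults normally, silently reads 0 under ignore_msrs. */
	return env->ignore_unknown_msrs ? 0 : 1;
}

/* Builds the msrs_to_save list; 'patched' enables the added WAITPKG case. */
static size_t model_init_msr_list(const struct env_model *env, bool patched,
				  const uint32_t *all, size_t n, uint32_t *out)
{
	size_t kept = 0;

	assert(all != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (model_rdmsr_safe(env, all[i]))
			continue;	/* the read faulted: drop the MSR */

		switch (all[i]) {
		case MODEL_MSR_IA32_UMWAIT_CONTROL:
			if (patched && !env->kvm_cap_waitpkg)
				continue;	/* continues the for loop */
			break;
		default:
			break;
		}
		out[kept++] = all[i];
	}
	return kept;
}
```

With the read probe alone, an AMD system whose unknown-MSR reads are silently ignored (the "ignore_msrs" case) keeps 0xe1 in its list; the added WAITPKG check removes it.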