diff mbox series

KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Message ID 20220613161611.3567556-1-anrayabh@linux.microsoft.com (mailing list archive)
State New, archived
Headers show
Series KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V | expand

Commit Message

Anirudh Rayabharam June 13, 2022, 4:16 p.m. UTC
VM entry into an L2 guest on KVM on Hyper-V fails with the following
splat (stripped for brevity) when running cloud-hypervisor tests.

[ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu
[ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021
[ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600447] Call Trace:
[ 1481.600449]  <TASK>
[ 1481.600451]  nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel]
[ 1481.600457]  __vmx_handle_exit+0xef/0x670 [kvm_intel]
[ 1481.600467]  vmx_handle_exit+0x12/0x50 [kvm_intel]
[ 1481.600472]  vcpu_enter_guest+0x83a/0xfd0 [kvm]
[ 1481.600524]  vcpu_run+0x5e/0x240 [kvm]
[ 1481.600560]  kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm]
[ 1481.600597]  kvm_vcpu_ioctl+0x29a/0x6d0 [kvm]
[ 1481.600634]  __x64_sys_ioctl+0x91/0xc0
[ 1481.600637]  do_syscall_64+0x5c/0xc0
[ 1481.600667]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1481.600670] RIP: 0033:0x7f688becdaff
[ 1481.600686]  </TASK>

As per the comments in arch/x86/kvm/vmx/evmcs.h, TSC multiplier field is
currently not supported in EVMCS. As a result, there is no TSC scaling
support when KVM is running on Hyper-V i.e. kvm_has_tsc_control is
false.

However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1.
When L1 tries to launch an L2 guest, vmcs12 has TSC scaling enabled.
This propagates to vmcs02. But KVM doesn't set the TSC multiplier value
because kvm_has_tsc_control is false. Due to this, VM entry for L2 guest
fails. (VM entry fails if "use TSC scaling" is 1 and TSC multiplier is 0.)

To fix, expose TSC scaling to L1 only if kvm_has_tsc_control.

Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Sean Christopherson June 13, 2022, 4:41 p.m. UTC | #1
On Mon, Jun 13, 2022, Anirudh Rayabharam wrote:
> As per the comments in arch/x86/kvm/vmx/evmcs.h, TSC multiplier field is

It's not just the comments, it's also the code.  It would be helpful to call out
in the changelog that KVM clears unsupported controls via evmcs_sanitize_exec_ctrls()
when using eVMCS.  

> currently not supported in EVMCS. As a result, there is no TSC scaling
> support when KVM is running on Hyper-V i.e. kvm_has_tsc_control is
> false.
> 
> However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1.
> When L1 tries to launch an L2 guest, vmcs12 has TSC scaling enabled.
> This propagates to vmcs02. But KVM doesn't set the TSC multiplier value
> because kvm_has_tsc_control is false. Due to this, VM entry for L2 guest
> fails. (VM entry fails if "use TSC scaling" is 1 and TSC multiplier is 0.)
>
> To fix, expose TSC scaling to L1 only if kvm_has_tsc_control.
> 
> Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f5cb18e00e78..d773ddc6422b 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6656,6 +6656,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>  		      msrs->secondary_ctls_low,
>  		      msrs->secondary_ctls_high);
>  
> +	if (!kvm_has_tsc_control)
> +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;

I would much rather we fix the root of the problem and not play whack-a-mole,
e.g. all of the other controls that aren't supported by eVMCS have the same bug,.

nested_vmx_setup_ctls_msrs() should use vmcs_config to get the base MSR values,
not read the MSRs from hardware.  And it's not just eVMCS, e.g. the manipulation
of VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL for a CPU errata isn't handled either.

>  	msrs->secondary_ctls_low = 0;
>  	msrs->secondary_ctls_high &=
>  		SECONDARY_EXEC_DESC |
> @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>  		SECONDARY_EXEC_RDRAND_EXITING |
>  		SECONDARY_EXEC_ENABLE_INVPCID |
>  		SECONDARY_EXEC_RDSEED_EXITING |
> -		SECONDARY_EXEC_XSAVES |
> -		SECONDARY_EXEC_TSC_SCALING;
> +		SECONDARY_EXEC_XSAVES;
>  
>  	/*
>  	 * We can emulate "VMCS shadowing," even if the hardware
> -- 
> 2.34.1
>
Paolo Bonzini June 13, 2022, 4:49 p.m. UTC | #2
On 6/13/22 18:16, Anirudh Rayabharam wrote:
> +	if (!kvm_has_tsc_control)
> +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> +
>   	msrs->secondary_ctls_low = 0;
>   	msrs->secondary_ctls_high &=
>   		SECONDARY_EXEC_DESC |
> @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>   		SECONDARY_EXEC_RDRAND_EXITING |
>   		SECONDARY_EXEC_ENABLE_INVPCID |
>   		SECONDARY_EXEC_RDSEED_EXITING |
> -		SECONDARY_EXEC_XSAVES |
> -		SECONDARY_EXEC_TSC_SCALING;
> +		SECONDARY_EXEC_XSAVES;
>   
>   	/*

This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
even if kvm_has_tsc_control == true.

That said, I think a better implementation of this patch is to just add
a version of evmcs_sanitize_exec_ctrls that takes a struct
nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like

	evmcs_sanitize_nested_vmx_vsrs(msrs);

Even better (but I cannot "mentally test it" offhand) would be just

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e802f71a9e8d..b3425ce835c5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  		 * sanity checking and refuse to boot. Filter all unsupported
  		 * features out.
  		 */
-		if (!msr_info->host_initiated &&
+		if (static_branch_unlikely(&enable_evmcs) ||
  		    vmx->nested.enlightened_vmcs_enabled)
  			nested_evmcs_filter_control_msr(msr_info->index,
  							&msr_info->data);

I cannot quite understand the host_initiated check, so I'll defer to
Vitaly on why it is needed.  Most likely, removing it would cause some
warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
userspace bug and we should remove that part of the condition.  You
don't need to worry about that part, we'll cross that bridge if the
above patch works for your case.

Thanks,

Paolo
Sean Christopherson June 13, 2022, 4:57 p.m. UTC | #3
On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > +	if (!kvm_has_tsc_control)
> > +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > +
> >   	msrs->secondary_ctls_low = 0;
> >   	msrs->secondary_ctls_high &=
> >   		SECONDARY_EXEC_DESC |
> > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> >   		SECONDARY_EXEC_RDRAND_EXITING |
> >   		SECONDARY_EXEC_ENABLE_INVPCID |
> >   		SECONDARY_EXEC_RDSEED_EXITING |
> > -		SECONDARY_EXEC_XSAVES |
> > -		SECONDARY_EXEC_TSC_SCALING;
> > +		SECONDARY_EXEC_XSAVES;
> >   	/*
> 
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.
> 
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> 
> 	evmcs_sanitize_nested_vmx_vsrs(msrs);

Any reason not to use the already sanitized vmcs_config?  I can't think of any
reason why the nested path should blindly use the raw MSR values from hardware.
Anirudh Rayabharam June 14, 2022, 4:55 a.m. UTC | #4
On Mon, Jun 13, 2022 at 06:49:17PM +0200, Paolo Bonzini wrote:
> On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > +	if (!kvm_has_tsc_control)
> > +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > +
> >   	msrs->secondary_ctls_low = 0;
> >   	msrs->secondary_ctls_high &=
> >   		SECONDARY_EXEC_DESC |
> > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> >   		SECONDARY_EXEC_RDRAND_EXITING |
> >   		SECONDARY_EXEC_ENABLE_INVPCID |
> >   		SECONDARY_EXEC_RDSEED_EXITING |
> > -		SECONDARY_EXEC_XSAVES |
> > -		SECONDARY_EXEC_TSC_SCALING;
> > +		SECONDARY_EXEC_XSAVES;
> >   	/*
> 
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.

The MSR actually allows 1-setting of the "use TSC scaling" control. So this
line is redundant anyway.

> 
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> 
> 	evmcs_sanitize_nested_vmx_vsrs(msrs);

Sanitize at the end might not work because I see some cases in
nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
even though the hardware doesn't support it.

> 
> Even better (but I cannot "mentally test it" offhand) would be just
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e802f71a9e8d..b3425ce835c5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * sanity checking and refuse to boot. Filter all unsupported
>  		 * features out.
>  		 */
> -		if (!msr_info->host_initiated &&
> +		if (static_branch_unlikely(&enable_evmcs) ||
>  		    vmx->nested.enlightened_vmcs_enabled)
>  			nested_evmcs_filter_control_msr(msr_info->index,
>  							&msr_info->data);

I will try this.

Thanks,

	Anirudh.

> 
> I cannot quite understand the host_initiated check, so I'll defer to
> Vitaly on why it is needed.  Most likely, removing it would cause some
> warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> userspace bug and we should remove that part of the condition.  You
> don't need to worry about that part, we'll cross that bridge if the
> above patch works for your case.
> 
> Thanks,
> 
> Paolo
Vitaly Kuznetsov June 14, 2022, 12:12 p.m. UTC | #5
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 6/13/22 18:16, Anirudh Rayabharam wrote:
>> +	if (!kvm_has_tsc_control)
>> +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
>> +
>>   	msrs->secondary_ctls_low = 0;
>>   	msrs->secondary_ctls_high &=
>>   		SECONDARY_EXEC_DESC |
>> @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>>   		SECONDARY_EXEC_RDRAND_EXITING |
>>   		SECONDARY_EXEC_ENABLE_INVPCID |
>>   		SECONDARY_EXEC_RDSEED_EXITING |
>> -		SECONDARY_EXEC_XSAVES |
>> -		SECONDARY_EXEC_TSC_SCALING;
>> +		SECONDARY_EXEC_XSAVES;
>>   
>>   	/*
>
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.
>
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>
> 	evmcs_sanitize_nested_vmx_vsrs(msrs);
>
> Even better (but I cannot "mentally test it" offhand) would be just
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e802f71a9e8d..b3425ce835c5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		 * sanity checking and refuse to boot. Filter all unsupported
>   		 * features out.
>   		 */
> -		if (!msr_info->host_initiated &&
> +		if (static_branch_unlikely(&enable_evmcs) ||
>   		    vmx->nested.enlightened_vmcs_enabled)
>   			nested_evmcs_filter_control_msr(msr_info->index,
>   							&msr_info->data);
>
> I cannot quite understand the host_initiated check, so I'll defer to
> Vitaly on why it is needed. Most likely, removing it would cause some
> warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> userspace bug and we should remove that part of the condition. 

I forgot the details, of course, but 31de3d2500e4 says:

```
    With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
    with default (matching CPU model) values and in case eVMCS is also enabled,
    fails.
```

so it certainly was a workaround for QEMU.
Paolo Bonzini June 14, 2022, 12:16 p.m. UTC | #6
On 6/14/22 06:55, Anirudh Rayabharam wrote:
>> That said, I think a better implementation of this patch is to just add
>> a version of evmcs_sanitize_exec_ctrls that takes a struct
>> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>>
>> 	evmcs_sanitize_nested_vmx_vsrs(msrs);
> Sanitize at the end might not work because I see some cases in
> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> even though the hardware doesn't support it.
> 

Yes, but these will never include eVMCS-unsupported features.

Paolo
Vitaly Kuznetsov June 14, 2022, 12:19 p.m. UTC | #7
Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:

...

>
> As per the comments in arch/x86/kvm/vmx/evmcs.h, TSC multiplier field is
> currently not supported in EVMCS.

The latest version:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs

has it, actually. It was missing before (compare with e.g. 6.0b version
here:
https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)

but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
Interestingly enough, eVMCS version didn't change when these fields were
added, it is still '1'.

I even have a patch in my stash (attached). I didn't send it out because
it wasn't properly tested with different Hyper-V versions.
Vitaly Kuznetsov June 14, 2022, 3:01 p.m. UTC | #8
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
>
> ...
>
>>
>> As per the comments in arch/x86/kvm/vmx/evmcs.h, TSC multiplier field is
>> currently not supported in EVMCS.
>
> The latest version:
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs
>
> has it, actually. It was missing before (compare with e.g. 6.0b version
> here:
> https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)
>
> but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
> Interestingly enough, eVMCS version didn't change when these fields were
> added, it is still '1'.
>
> I even have a patch in my stash (attached). I didn't send it out because
> it wasn't properly tested with different Hyper-V versions.

And of course I forgot a pre-requisite patch which updates 'struct
hv_enlightened_vmcs' to the latest:

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0a9407dc0859..038e5ef9b4a6 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -559,9 +559,20 @@ struct hv_enlightened_vmcs {
        u64 partition_assist_page;
        u64 padding64_4[4];
        u64 guest_bndcfgs;
-       u64 padding64_5[7];
+       u64 guest_ia32_perf_global_ctrl;
+       u64 guest_ia32_s_cet;
+       u64 guest_ssp;
+       u64 guest_ia32_int_ssp_table_addr;
+       u64 guest_ia32_lbr_ctl;
+       u64 padding64_5[2];
        u64 xss_exit_bitmap;
-       u64 padding64_6[7];
+       u64 host_ia32_perf_global_ctrl;
+       u64 encls_exiting_bitmap;
+       u64 tsc_multiplier;
+       u64 host_ia32_s_cet;
+       u64 host_ssp;
+       u64 host_ia32_int_ssp_table_addr;
+       u64 padding64_6;
 } __packed;
 
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE                    0
Anirudh Rayabharam June 14, 2022, 3:13 p.m. UTC | #9
On Tue, Jun 14, 2022 at 02:16:00PM +0200, Paolo Bonzini wrote:
> On 6/14/22 06:55, Anirudh Rayabharam wrote:
> > > That said, I think a better implementation of this patch is to just add
> > > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > > 
> > > 	evmcs_sanitize_nested_vmx_vsrs(msrs);
> > Sanitize at the end might not work because I see some cases in
> > nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> > even though the hardware doesn't support it.
> > 
> 
> Yes, but these will never include eVMCS-unsupported features.

How are you so sure?

For example, SECONDARY_EXEC_SHADOW_VMCS is unsupported in eVMCS but in
nested_vmx_setup_ctls_msrs() we do:

6675         /*
6676          * We can emulate "VMCS shadowing," even if the hardware
6677          * doesn't support it.
6678          */
6679         msrs->secondary_ctls_high |=
6680                 SECONDARY_EXEC_SHADOW_VMCS;

If we sanitize this out it might cause some regression right?

Thanks!

	Anirudh.
> 
> Paolo
Anirudh Rayabharam June 14, 2022, 3:17 p.m. UTC | #10
On Tue, Jun 14, 2022 at 10:25:52AM +0530, Anirudh Rayabharam wrote:
> On Mon, Jun 13, 2022 at 06:49:17PM +0200, Paolo Bonzini wrote:
> > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > +	if (!kvm_has_tsc_control)
> > > +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > +
> > >   	msrs->secondary_ctls_low = 0;
> > >   	msrs->secondary_ctls_high &=
> > >   		SECONDARY_EXEC_DESC |
> > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > >   		SECONDARY_EXEC_RDRAND_EXITING |
> > >   		SECONDARY_EXEC_ENABLE_INVPCID |
> > >   		SECONDARY_EXEC_RDSEED_EXITING |
> > > -		SECONDARY_EXEC_XSAVES |
> > > -		SECONDARY_EXEC_TSC_SCALING;
> > > +		SECONDARY_EXEC_XSAVES;
> > >   	/*
> > 
> > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > even if kvm_has_tsc_control == true.
> 
> The MSR actually allows 1-setting of the "use TSC scaling" control. So this
> line is redundant anyway.
> 
> > 
> > That said, I think a better implementation of this patch is to just add
> > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > 
> > 	evmcs_sanitize_nested_vmx_vsrs(msrs);
> 
> Sanitize at the end might not work because I see some cases in
> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> even though the hardware doesn't support it.
> 
> > 
> > Even better (but I cannot "mentally test it" offhand) would be just
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e802f71a9e8d..b3425ce835c5 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		 * sanity checking and refuse to boot. Filter all unsupported
> >  		 * features out.
> >  		 */
> > -		if (!msr_info->host_initiated &&
> > +		if (static_branch_unlikely(&enable_evmcs) ||
> >  		    vmx->nested.enlightened_vmcs_enabled)
> >  			nested_evmcs_filter_control_msr(msr_info->index,
> >  							&msr_info->data);
> 
> I will try this.

This patch fixed the issue for me. But again, this might filter out
things that we wan't to expose to L1 even if not available in hardware.

Thanks,

	Anirudh.

> 
> Thanks,
> 
> 	Anirudh.
> 
> > 
> > I cannot quite understand the host_initiated check, so I'll defer to
> > Vitaly on why it is needed.  Most likely, removing it would cause some
> > warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> > userspace bug and we should remove that part of the condition.  You
> > don't need to worry about that part, we'll cross that bridge if the
> > above patch works for your case.
> > 
> > Thanks,
> > 
> > Paolo
Anirudh Rayabharam June 14, 2022, 3:28 p.m. UTC | #11
On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > +	if (!kvm_has_tsc_control)
> > > +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > +
> > >   	msrs->secondary_ctls_low = 0;
> > >   	msrs->secondary_ctls_high &=
> > >   		SECONDARY_EXEC_DESC |
> > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > >   		SECONDARY_EXEC_RDRAND_EXITING |
> > >   		SECONDARY_EXEC_ENABLE_INVPCID |
> > >   		SECONDARY_EXEC_RDSEED_EXITING |
> > > -		SECONDARY_EXEC_XSAVES |
> > > -		SECONDARY_EXEC_TSC_SCALING;
> > > +		SECONDARY_EXEC_XSAVES;
> > >   	/*
> > 
> > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > even if kvm_has_tsc_control == true.
> > 
> > That said, I think a better implementation of this patch is to just add
> > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > 
> > 	evmcs_sanitize_nested_vmx_vsrs(msrs);
> 
> Any reason not to use the already sanitized vmcs_config?  I can't think of any
> reason why the nested path should blindly use the raw MSR values from hardware.

vmcs_config has the sanitized exec controls. But how do we construct MSR
values using them?

Thanks,

	Anirudh.
Sean Christopherson June 14, 2022, 4 p.m. UTC | #12
On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> > On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> > > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > > +	if (!kvm_has_tsc_control)
> > > > +		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > > +
> > > >   	msrs->secondary_ctls_low = 0;
> > > >   	msrs->secondary_ctls_high &=
> > > >   		SECONDARY_EXEC_DESC |
> > > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > > >   		SECONDARY_EXEC_RDRAND_EXITING |
> > > >   		SECONDARY_EXEC_ENABLE_INVPCID |
> > > >   		SECONDARY_EXEC_RDSEED_EXITING |
> > > > -		SECONDARY_EXEC_XSAVES |
> > > > -		SECONDARY_EXEC_TSC_SCALING;
> > > > +		SECONDARY_EXEC_XSAVES;
> > > >   	/*
> > > 
> > > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > > even if kvm_has_tsc_control == true.
> > > 
> > > That said, I think a better implementation of this patch is to just add
> > > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > > 
> > > 	evmcs_sanitize_nested_vmx_vsrs(msrs);
> > 
> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
> > reason why the nested path should blindly use the raw MSR values from hardware.
> 
> vmcs_config has the sanitized exec controls. But how do we construct MSR
> values using them?

I was thinking we could use the sanitized controls for the allowed-1 bits, and then
take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.

Hastily constructed and compile-tested only, proceed with caution :-)

---
 arch/x86/kvm/vmx/nested.c | 35 ++++++++++++++++++++---------------
 arch/x86/kvm/vmx/nested.h |  2 +-
 arch/x86/kvm/vmx/vmx.c    |  5 ++---
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..67cbb6643efa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6541,8 +6541,13 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
  * bit in the high half is on if the corresponding bit in the control field
  * may be on. See also vmx_control_verify().
  */
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 {
+	struct nested_vmx_msrs *msrs = &vmcs_config.nested;
+
+	/* Take the allowed-1 bits from KVM's sanitized VMCS configuration. */
+	u32 ignore_high;
+
 	/*
 	 * Note that as a general rule, the high half of the MSRs (bits in
 	 * the control fields which may be 1) should be initialized by the
@@ -6559,11 +6564,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	 */

 	/* pin-based controls */
-	rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
-		msrs->pinbased_ctls_low,
-		msrs->pinbased_ctls_high);
+	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, msrs->pinbased_ctls_low, ignore_high);
 	msrs->pinbased_ctls_low |=
 		PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->pinbased_ctls_high = vmcs_conf->pin_based_exec_ctrl;
 	msrs->pinbased_ctls_high &=
 		PIN_BASED_EXT_INTR_MASK |
 		PIN_BASED_NMI_EXITING |
@@ -6574,12 +6579,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		PIN_BASED_VMX_PREEMPTION_TIMER;

 	/* exit controls */
-	rdmsr(MSR_IA32_VMX_EXIT_CTLS,
-		msrs->exit_ctls_low,
-		msrs->exit_ctls_high);
+	rdmsr(MSR_IA32_VMX_EXIT_CTLS, msrs->exit_ctls_low, ignore_high);
 	msrs->exit_ctls_low =
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

+	msrs->exit_ctls_high = vmcs_conf->vmexit_ctrl;
 	msrs->exit_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6595,11 +6599,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;

 	/* entry controls */
-	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
-		msrs->entry_ctls_low,
-		msrs->entry_ctls_high);
+	rdmsr(MSR_IA32_VMX_ENTRY_CTLS, msrs->entry_ctls_low, ignore_high);
 	msrs->entry_ctls_low =
 		VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->entry_ctls_high = vmcs_conf->vmentry_ctrl;
 	msrs->entry_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
@@ -6613,11 +6617,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;

 	/* cpu-based controls */
-	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
-		msrs->procbased_ctls_low,
-		msrs->procbased_ctls_high);
+	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, msrs->procbased_ctls_low, ignore_high);
 	msrs->procbased_ctls_low =
 		CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->procbased_ctls_high = vmcs_conf->cpu_based_exec_ctrl;
 	msrs->procbased_ctls_high &=
 		CPU_BASED_INTR_WINDOW_EXITING |
 		CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
@@ -6653,10 +6657,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	 */
 	if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)
 		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
-		      msrs->secondary_ctls_low,
-		      msrs->secondary_ctls_high);
+		      msrs->secondary_ctls_low, ignore_high);

 	msrs->secondary_ctls_low = 0;
+
+	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
 	msrs->secondary_ctls_high &=
 		SECONDARY_EXEC_DESC |
 		SECONDARY_EXEC_ENABLE_RDTSCP |
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..fae047c6204b 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,7 +17,7 @@ enum nvmx_vmentry_status {
 };

 void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
 void nested_vmx_hardware_unsetup(void);
 __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
 void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9bd86ecccdab..cd0d0ffae0bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7139,7 +7139,7 @@ static int __init vmx_check_processor_compat(void)
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
 		return -EIO;
 	if (nested)
-		nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
+		nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
 	if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
 		printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
 				smp_processor_id());
@@ -8079,8 +8079,7 @@ static __init int hardware_setup(void)
 	setup_default_sgx_lepubkeyhash();

 	if (nested) {
-		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
-					   vmx_capability.ept);
+		nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);

 		r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
 		if (r)

base-commit: b821e4ff9e35a8fc999685e8d44c0644cfeaa228
--
Paolo Bonzini June 14, 2022, 5:20 p.m. UTC | #13
On 6/14/22 14:19, Vitaly Kuznetsov wrote:
> The latest version:
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs
> 
> has it, actually. It was missing before (compare with e.g. 6.0b version
> here:
> https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)
> 
> but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
> Interestingly enough, eVMCS version didn't change when these fields were
> added, it is still '1'.
> 
> I even have a patch in my stash (attached). I didn't send it out because
> it wasn't properly tested with different Hyper-V versions.
> 
> -- Vitaly

Anirudh, can you check if Vitaly's patches work for you?

Paolo
Paolo Bonzini June 14, 2022, 5:28 p.m. UTC | #14
On 6/14/22 17:13, Anirudh Rayabharam wrote:
>>> Sanitize at the end might not work because I see some cases in
>>> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
>>> even though the hardware doesn't support it.
>>
>> Yes, but these will never include eVMCS-unsupported features.
>
> How are you so sure?
> 
> For example, SECONDARY_EXEC_SHADOW_VMCS is unsupported in eVMCS but in
> nested_vmx_setup_ctls_msrs() we do:
> 
> 6675         /*
> 6676          * We can emulate "VMCS shadowing," even if the hardware
> 6677          * doesn't support it.
> 6678          */
> 6679         msrs->secondary_ctls_high |=
> 6680                 SECONDARY_EXEC_SHADOW_VMCS;
> 
> If we sanitize this out it might cause some regression right?

Yes, you're right, shadow VMCS is special: it is not supported by 
enlightened VMCS, but it is emulated rather than virtualized. 
Therefore, if L1 does not use the enlightened VMCS, it can indeed use 
shadow VMCS.

Paolo
Anirudh Rayabharam June 15, 2022, 9:01 a.m. UTC | #15
On Tue, Jun 14, 2022 at 07:20:34PM +0200, Paolo Bonzini wrote:
> On 6/14/22 14:19, Vitaly Kuznetsov wrote:
> > The latest version:
> > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs
> > 
> > has it, actually. It was missing before (compare with e.g. 6.0b version
> > here:
> > https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)
> > 
> > but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
> > Interestingly enough, eVMCS version didn't change when these fields were
> > added, it is still '1'.
> > 
> > I even have a patch in my stash (attached). I didn't send it out because
> > it wasn't properly tested with different Hyper-V versions.
> > 
> > -- Vitaly
> 
> Anirudh, can you check if Vitaly's patches work for you?

I will check it. But I wonder if they fit the criteria for inclusion in
stable trees...

It is important for the fix to land in the stable trees since this issue
is a regression that was introduced _after_ 5.13. (I probably should've
mentioned this in the changelog.)

Thanks,

	Anirudh.

> 
> Paolo
Vitaly Kuznetsov June 15, 2022, 9:36 a.m. UTC | #16
Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:

> On Tue, Jun 14, 2022 at 07:20:34PM +0200, Paolo Bonzini wrote:
>> On 6/14/22 14:19, Vitaly Kuznetsov wrote:
>> > The latest version:
>> > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs
>> > 
>> > has it, actually. It was missing before (compare with e.g. 6.0b version
>> > here:
>> > https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)
>> > 
>> > but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
>> > Interestingly enough, eVMCS version didn't change when these fields were
>> > added, it is still '1'.
>> > 
>> > I even have a patch in my stash (attached). I didn't send it out because
>> > it wasn't properly tested with different Hyper-V versions.
>> > 
>> > -- Vitaly
>> 
>> Anirudh, can you check if Vitaly's patches work for you?
>
> I will check it. But I wonder if they fit the criteria for inclusion in
> stable trees...
>
> It is important for the fix to land in the stable trees since this issue
> is a regression that was introduced _after_ 5.13. (I probably should've
> mentioned this in the changelog.)
>

Personally, I see no problem with splitting off TscMultiplier part from
my patch and marking it for stable@ fixing d041b5ea93352. I'm going to
run some tests too.
Vitaly Kuznetsov June 15, 2022, 11:30 a.m. UTC | #17
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
>>
>> ...
>>
>>>
>>> As per the comments in arch/x86/kvm/vmx/evmcs.h, TSC multiplier field is
>>> currently not supported in EVMCS.
>>
>> The latest version:
>> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_vmx_enlightened_vmcs
>>
>> has it, actually. It was missing before (compare with e.g. 6.0b version
>> here:
>> https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf)
>>
>> but AFAIR TSC scaling wasn't advertised by genuine Hyper-V either.
>> Interestingly enough, eVMCS version didn't change when these fields were
>> added, it is still '1'.
>>
>> I even have a patch in my stash (attached). I didn't send it out because
>> it wasn't properly tested with different Hyper-V versions.
>
> And of course I forgot a pre-requisite patch which updates 'struct
> hv_enlightened_vmcs' to the latest:
>

The good news is that TscMultiplies seems to work fine for me, at least
with an Azure Dv5 instance where I can see Tsc scaling exposed. The bad
news is that a few more patches are needed:

1) Fix 'struct hv_enlightened_vmcs' definition:
https://lore.kernel.org/kvm/20220613133922.2875594-20-vkuznets@redhat.com/

2) Define VMCS-to-EVMCS conversion for the new fields :

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 6a61b1ae7942..707a8de11802 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -28,6 +28,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
        EVMCS1_FIELD(HOST_IA32_EFER, host_ia32_efer,
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+       EVMCS1_FIELD(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl,
+                    HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
        EVMCS1_FIELD(HOST_CR0, host_cr0,
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
        EVMCS1_FIELD(HOST_CR3, host_cr3,
@@ -78,6 +80,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
        EVMCS1_FIELD(GUEST_IA32_EFER, guest_ia32_efer,
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+       EVMCS1_FIELD(GUEST_IA32_PERF_GLOBAL_CTRL, guest_ia32_perf_global_ctrl,
+                    HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
        EVMCS1_FIELD(GUEST_PDPTR0, guest_pdptr0,
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
        EVMCS1_FIELD(GUEST_PDPTR1, guest_pdptr1,
@@ -126,24 +130,47 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
        EVMCS1_FIELD(XSS_EXIT_BITMAP, xss_exit_bitmap,
                     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+       EVMCS1_FIELD(ENCLS_EXITING_BITMAP, encls_exiting_bitmap,
+                    HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+       EVMCS1_FIELD(TSC_MULTIPLIER, tsc_multiplier,
+                    HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),

...

so it is becoming more and more complicated to assemble for testing. Let
me finish my testing and I'll put a series together and send it out to
simplify the process. Stay tuned!
Vitaly Kuznetsov June 22, 2022, 8 a.m. UTC | #18
Sean Christopherson <seanjc@google.com> writes:

> On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
>> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:

...

>> > 
>> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
>> > reason why the nested path should blindly use the raw MSR values from hardware.
>> 
>> vmcs_config has the sanitized exec controls. But how do we construct MSR
>> values using them?
>
> I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
> RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
>
> Hastily constructed and compile-tested only, proceed with caution :-)

Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
the suggested patch and try to construct something for required-1 bits.

Thanks!
Anirudh Rayabharam June 22, 2022, 1:52 p.m. UTC | #19
On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> 
> ...
> 
> >> > 
> >> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
> >> > reason why the nested path should blindly use the raw MSR values from hardware.
> >> 
> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
> >> values using them?
> >
> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> > take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
> >
> > Hastily constructed and compile-tested only, proceed with caution :-)
> 
> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
> the suggested patch and try to construct something for required-1 bits.

I tried this patch today but it causes some regression which causes
/dev/kvm to be unavailable in L1. I didn't get a chance to look into it
closely but I am guessing it has something to do with the fact that
vmcs_config reflects the config that L0 chose to use rather than what is
available to use. So constructing allowed-1 MSR bits based on what bits
are set in exec controls maybe isn't correct.


Thanks!

	- Anirudh.
Vitaly Kuznetsov June 22, 2022, 2:35 p.m. UTC | #20
Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:

> On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
>> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
>> 
>> ...
>> 
>> >> > 
>> >> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
>> >> > reason why the nested path should blindly use the raw MSR values from hardware.
>> >> 
>> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
>> >> values using them?
>> >
>> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
>> > take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
>> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
>> >
>> > Hastily constructed and compile-tested only, proceed with caution :-)
>> 
>> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
>> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
>> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
>> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
>> the suggested patch and try to construct something for required-1 bits.
>
> I tried this patch today but it causes some regression which causes
> /dev/kvm to be unavailable in L1. I didn't get a chance to look into it
> closely but I am guessing it has something to do with the fact that
> vmcs_config reflects the config that L0 chose to use rather than what is
> available to use. So constructing allowed-1 MSR bits based on what bits
> are set in exec controls maybe isn't correct.

I've tried to pick it up but it's actually much harder than I think. The
patch has some minor issues ('&vmcs_config.nested' needs to be switched
to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
identify at least:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e14e4c40007..8076352174ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
              CPU_BASED_INVLPG_EXITING |
              CPU_BASED_RDPMC_EXITING;
 
-       opt = CPU_BASED_TPR_SHADOW |
+       opt = CPU_BASED_INTR_WINDOW_EXITING |
+             CPU_BASED_NMI_WINDOW_EXITING |
+             CPU_BASED_TPR_SHADOW |
+             CPU_BASED_USE_IO_BITMAPS |
              CPU_BASED_USE_MSR_BITMAPS |
+             CPU_BASED_MONITOR_TRAP_FLAG |
+             CPU_BASED_RDTSC_EXITING |
+             CPU_BASED_PAUSE_EXITING |
              CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
              CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
        if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
@@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 #endif
        opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
              VM_EXIT_LOAD_IA32_PAT |
+             VM_EXIT_SAVE_IA32_PAT |
              VM_EXIT_LOAD_IA32_EFER |
              VM_EXIT_CLEAR_BNDCFGS |
              VM_EXIT_PT_CONCEAL_PIP |
@@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
                _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
        min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
-       opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+       opt =
+#ifdef CONFIG_X86_64
+             VM_ENTRY_IA32E_MODE |
+#endif
+             VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
              VM_ENTRY_LOAD_IA32_PAT |
              VM_ENTRY_LOAD_IA32_EFER |
              VM_ENTRY_LOAD_BNDCFGS |

but it is 1) not sufficient because some controls are smartly filtered
out just because we don't want them for L1 -- and this doesn't mean that
L2 doesn't need them and 2) because if we add some 'opt' controls to
setup_vmcs_config() we need to filter them out somewhere else.

I'm starting to think we may just want to store raw VMX MSR values in
vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug,
perf_ctrl bug,..) and then do the adjust_vmx_controls() magic. 

I'm not giving up yet but don't expect something small and backportable
to stable :-)
Anirudh Rayabharam June 22, 2022, 4:19 p.m. UTC | #21
On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
> 
> > On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <seanjc@google.com> writes:
> >> 
> >> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> >> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> >> 
> >> ...
> >> 
> >> >> > 
> >> >> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
> >> >> > reason why the nested path should blindly use the raw MSR values from hardware.
> >> >> 
> >> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
> >> >> values using them?
> >> >
> >> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> >> > take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
> >> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
> >> >
> >> > Hastily constructed and compile-tested only, proceed with caution :-)
> >> 
> >> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
> >> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
> >> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
> >> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
> >> the suggested patch and try to construct something for required-1 bits.
> >
> > I tried this patch today but it causes some regression which causes
> > /dev/kvm to be unavailable in L1. I didn't get a chance to look into it
> > closely but I am guessing it has something to do with the fact that
> > vmcs_config reflects the config that L0 chose to use rather than what is
> > available to use. So constructing allowed-1 MSR bits based on what bits
> > are set in exec controls maybe isn't correct.
> 
> I've tried to pick it up but it's actually much harder than I think. The
> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> identify at least:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5e14e4c40007..8076352174ad 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>               CPU_BASED_INVLPG_EXITING |
>               CPU_BASED_RDPMC_EXITING;
>  
> -       opt = CPU_BASED_TPR_SHADOW |
> +       opt = CPU_BASED_INTR_WINDOW_EXITING |
> +             CPU_BASED_NMI_WINDOW_EXITING |
> +             CPU_BASED_TPR_SHADOW |
> +             CPU_BASED_USE_IO_BITMAPS |
>               CPU_BASED_USE_MSR_BITMAPS |
> +             CPU_BASED_MONITOR_TRAP_FLAG |
> +             CPU_BASED_RDTSC_EXITING |
> +             CPU_BASED_PAUSE_EXITING |
>               CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
>               CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
>         if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
> @@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  #endif
>         opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
>               VM_EXIT_LOAD_IA32_PAT |
> +             VM_EXIT_SAVE_IA32_PAT |
>               VM_EXIT_LOAD_IA32_EFER |
>               VM_EXIT_CLEAR_BNDCFGS |
>               VM_EXIT_PT_CONCEAL_PIP |
> @@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>                 _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
>  
>         min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
> -       opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
> +       opt =
> +#ifdef CONFIG_X86_64
> +             VM_ENTRY_IA32E_MODE |
> +#endif
> +             VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
>               VM_ENTRY_LOAD_IA32_PAT |
>               VM_ENTRY_LOAD_IA32_EFER |
>               VM_ENTRY_LOAD_BNDCFGS |
> 
> but it is 1) not sufficient because some controls are smartly filtered
> out just because we don't want them for L1 -- and this doesn't mean that
> L2 doesn't need them and 2) because if we add some 'opt' controls to
> setup_vmcs_config() we need to filter them out somewhere else.
> 
> I'm starting to think we may just want to store raw VMX MSR values in
> vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug,
> perf_ctrl bug,..) and then do the adjust_vmx_controls() magic. 
> 
> I'm not giving up yet but don't expect something small and backportable
> to stable :-) 

How about we do something simple like the patch below to start with?
This will easily apply to stable and we can continue improving upon
it with follow up patches on mainline.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..f88d748c7cc6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		msrs->pinbased_ctls_high);
 	msrs->pinbased_ctls_low |=
 		PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (static_branch_unlikely(&enable_evmcs))
+		msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+#endif
 	msrs->pinbased_ctls_high &=
 		PIN_BASED_EXT_INTR_MASK |
 		PIN_BASED_NMI_EXITING |
@@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	msrs->exit_ctls_low =
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
 
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (static_branch_unlikely(&enable_evmcs))
+		msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+#endif
 	msrs->exit_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		msrs->entry_ctls_high);
 	msrs->entry_ctls_low =
 		VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (static_branch_unlikely(&enable_evmcs))
+		msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+#endif
 	msrs->entry_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
@@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		      msrs->secondary_ctls_high);
 
 	msrs->secondary_ctls_low = 0;
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (static_branch_unlikely(&enable_evmcs))
+		msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+#endif
 	msrs->secondary_ctls_high &=
 		SECONDARY_EXEC_DESC |
 		SECONDARY_EXEC_ENABLE_RDTSCP |
Vitaly Kuznetsov June 22, 2022, 4:48 p.m. UTC | #22
Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:

> On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:

...

>> 
>> I've tried to pick it up but it's actually much harder than I think. The
>> patch has some minor issues ('&vmcs_config.nested' needs to be switched
>> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
>> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
>> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
>> identify at least:

...

I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
setting up nested VMX MSRs" which implements Sean's suggestion. Hope
this is the way to go for mainline.

>
> How about we do something simple like the patch below to start with?
> This will easily apply to stable and we can continue improving upon
> it with follow up patches on mainline.
>

Personally, I'm not against this for @stable. Alternatively, in case the
only observed issue is with TSC scaling, we can add support for it for
KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
eVMCS"). I can prepare patches if needed.
Anirudh Rayabharam June 23, 2022, 10:17 a.m. UTC | #23
On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
> 
> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
> 
> ...
> 
> >> 
> >> I've tried to pick it up but it's actually much harder than I think. The
> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> >> identify at least:
> 
> ...
> 
> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
> this is the way to go for mainline.
> 
> >
> > How about we do something simple like the patch below to start with?
> > This will easily apply to stable and we can continue improving upon
> > it with follow up patches on mainline.
> >
> 
> Personally, I'm not against this for @stable. Alternatively, in case the

I think it's a good intermediate fix for mainline too. It is easier to land
it in stable if it already exists in mainline. It can stay in mainline
until your series lands and replaces it with the vmcs_config approach.

What do you think?

> only observed issue is with TSC scaling, we can add support for it for
> KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
> 00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
> eVMCS"). I can prepare patches if needed.

Will it fit in stable's 100 line rule?

Thanks!

	- Anirudh.
Vitaly Kuznetsov June 23, 2022, 11:49 a.m. UTC | #24
Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:

> On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
>> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
>> 
>> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
>> 
>> ...
>> 
>> >> 
>> >> I've tried to pick it up but it's actually much harder than I think. The
>> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
>> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
>> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
>> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
>> >> identify at least:
>> 
>> ...
>> 
>> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
>> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
>> this is the way to go for mainline.
>> 
>> >
>> > How about we do something simple like the patch below to start with?
>> > This will easily apply to stable and we can continue improving upon
>> > it with follow up patches on mainline.
>> >
>> 
>> Personally, I'm not against this for @stable. Alternatively, in case the
>
> I think it's a good intermediate fix for mainline too. It is easier to land
> it in stable if it already exists in mainline. It can stay in mainline
> until your series lands and replaces it with the vmcs_config approach.
>
> What do you think?
>

Paolo's call but personally I think both series can make 5.20 so there's
no need for an intermediate solution.

>> only observed issue is with TSC scaling, we can add support for it for
>> KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
>> 00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
>> eVMCS"). I can prepare patches if needed.
>
> Will it fit in stable's 100 line rule?
>

Yes, please take a look at the attached patches (5.18.y based). First 3
are identical to what I've sent for mainline, the last one is reduced to
only support TSC scaling for KVM on Hyper-V (but not Hyper-V on
KVM). Compile tested only, proceed with caution)
Anirudh Rayabharam June 28, 2022, 10:30 a.m. UTC | #25
On Thu, Jun 23, 2022 at 01:49:30PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
> 
> > On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
> >> Anirudh Rayabharam <anrayabh@linux.microsoft.com> writes:
> >> 
> >> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
> >> 
> >> ...
> >> 
> >> >> 
> >> >> I've tried to pick it up but it's actually much harder than I think. The
> >> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> >> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> >> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> >> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> >> >> identify at least:
> >> 
> >> ...
> >> 
> >> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
> >> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
> >> this is the way to go for mainline.
> >> 
> >> >
> >> > How about we do something simple like the patch below to start with?
> >> > This will easily apply to stable and we can continue improving upon
> >> > it with follow up patches on mainline.
> >> >
> >> 
> >> Personally, I'm not against this for @stable. Alternatively, in case the
> >
> > I think it's a good intermediate fix for mainline too. It is easier to land
> > it in stable if it already exists in mainline. It can stay in mainline
> > until your series lands and replaces it with the vmcs_config approach.
> >
> > What do you think?
> >
> 
> Paolo's call but personally I think both series can make 5.20 so there's
> no need for an intermediate solution.

Only reason I see for this intermediate solution is to automatically
land the fix in stable without bothering to write a special backport.

I will send it as a proper patch and see if there is any interest in
taking it.

	- Anirudh.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..d773ddc6422b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6656,6 +6656,9 @@  void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		      msrs->secondary_ctls_low,
 		      msrs->secondary_ctls_high);
 
+	if (!kvm_has_tsc_control)
+		msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
+
 	msrs->secondary_ctls_low = 0;
 	msrs->secondary_ctls_high &=
 		SECONDARY_EXEC_DESC |
@@ -6667,8 +6670,7 @@  void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		SECONDARY_EXEC_RDRAND_EXITING |
 		SECONDARY_EXEC_ENABLE_INVPCID |
 		SECONDARY_EXEC_RDSEED_EXITING |
-		SECONDARY_EXEC_XSAVES |
-		SECONDARY_EXEC_TSC_SCALING;
+		SECONDARY_EXEC_XSAVES;
 
 	/*
 	 * We can emulate "VMCS shadowing," even if the hardware