diff mbox series

[v6,06/13] KVM: nVMX: optimize prepare_vmcs02{,_full} for Enlightened VMCS case

Message ID 20181016165011.6607-7-vkuznets@redhat.com (mailing list archive)
State New, archived
Series KVM: nVMX: Enlightened VMCS for Hyper-V on KVM

Commit Message

Vitaly Kuznetsov Oct. 16, 2018, 4:50 p.m. UTC
When Enlightened VMCS is in use by the L1 hypervisor we can avoid
vmwriting VMCS fields which did not change.

Our first goal is to achieve minimal impact on the traditional VMCS case,
so we're not wrapping each vmwrite() with an if-changed check. We also
can't use static keys as Enlightened VMCS usage is per-guest.

This patch implements the simplest solution: checking fields in groups.
We skip single vmwrite() statements as doing the check would cost us
something even in the non-evmcs case and the win is tiny. Unfortunately,
this makes prepare_vmcs02{,_full}() code Enlightened VMCS-dependent (and
a bit ugly).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx.c | 118 +++++++++++++++++++++++++--------------------
 1 file changed, 65 insertions(+), 53 deletions(-)

Comments

Paolo Bonzini Oct. 16, 2018, 9:55 p.m. UTC | #1
On 16/10/2018 18:50, Vitaly Kuznetsov wrote:
> +	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
> +			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
> +		vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> +		vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
> +		vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
> +		vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
> +		vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
> +	}

For what it's worth, I suspect that these can be moved to
prepare_vmcs02_full.  The initial implementation of shadow VMCS did not
expose "unrestricted guest" to the L1 hypervisor, and emulation does a
lot of accesses to CS (of course).  Not sure how ES base ended up in
there and not DS base, though...

Paolo
Vitaly Kuznetsov Oct. 17, 2018, 2:47 p.m. UTC | #2
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 16/10/2018 18:50, Vitaly Kuznetsov wrote:
>> +	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
>> +			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
>> +		vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> +		vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
>> +		vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
>> +		vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
>> +		vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
>> +	}
>
> For what it's worth, I suspect that these can be moved to
> prepare_vmcs02_full.  The initial implementation of shadow VMCS did not
> expose "unrestricted guest" to the L1 hypervisor, and emulation does a
> lot of accesses to CS (of course).  Not sure how ES base ended up in
> there and not DS base, though...

I tried unshadowing all these fields and at least Hyper-V on KVM
(without using eVMCS of course) experiences a 1200-1300 cpu cycles
regression during tight cpuid loop test. I checked and this happens
because it likes vmreading GUEST_CS_AR_BYTES a lot.
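For reference, a "tight cpuid loop" microbenchmark is roughly this shape: every cpuid executed in a guest forces a VM-exit, so averaging rdtsc cycles over many iterations approximates the exit/entry round-trip cost. This is a sketch of such a harness, not the actual one behind the numbers above:

```c
#include <stdint.h>

static inline uint64_t rdtsc_now(void)
{
	uint32_t lo, hi;
	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Average cycles per cpuid.  Run inside a guest, each cpuid causes a
 * VM-exit, so this measures the full exit/entry path including any
 * extra vmreads/vmwrites the hypervisor does per exit. */
static uint64_t cpuid_loop_cycles(unsigned int iters)
{
	uint64_t start = rdtsc_now();
	for (unsigned int i = 0; i < iters; i++) {
		uint32_t a = 0, b, c, d;
		__asm__ volatile("cpuid"
				 : "+a"(a), "=b"(b), "=c"(c), "=d"(d)
				 :: "memory");
	}
	return (rdtsc_now() - start) / iters;
}
```

A ~1200-1300 cycle delta on this loop is thus directly attributable to the cost of the extra vmreads taken on every exit.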
Paolo Bonzini Oct. 17, 2018, 5:02 p.m. UTC | #3
On 17/10/2018 16:47, Vitaly Kuznetsov wrote:
>>> +	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
>>> +			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
>>> +		vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>>> +		vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
>>> +		vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
>>> +		vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
>>> +		vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
>>> +	}
>> For what it's worth, I suspect that these can be moved to
>> prepare_vmcs02_full.  The initial implementation of shadow VMCS did not
>> expose "unrestricted guest" to the L1 hypervisor, and emulation does a
>> lot of accesses to CS (of course).  Not sure how ES base ended up in
>> there and not DS base, though...
> I tried unshadowing all these fields and at least Hyper-V on KVM
> (without using eVMCS of course) experiences a 1200-1300 cpu cycles
> regression during tight cpuid loop test. I checked and this happens
> because it likes vmreading GUEST_CS_AR_BYTES a lot.

Go figure. :)  Liran, do you happen to know if ESX does something
similar with CS descriptor cache fields?

Paolo
Jim Mattson Oct. 17, 2018, 5:08 p.m. UTC | #4
I believe that ESXi reads GUEST_CS_AR_BYTES on every VM-exit to
determine code size.
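That decode only needs the L and D/B bits. In the VMX segment access-rights encoding these sit at bits 13 and 14 respectively, so a hypervisor can derive the default code size from a single vmread of GUEST_CS_AR_BYTES, roughly like this (illustrative helper, not ESXi's actual code):

```c
#include <stdint.h>

#define AR_L_BIT  (1u << 13)	/* 64-bit code segment (CS.L) */
#define AR_DB_BIT (1u << 14)	/* default operand size (CS.D) */

/* Decode the default code size from a GUEST_CS_AR_BYTES value:
 * L=1 means 64-bit mode; otherwise D selects 32- vs 16-bit. */
static int code_size_from_cs_ar(uint32_t cs_ar)
{
	if (cs_ar & AR_L_BIT)
		return 64;
	return (cs_ar & AR_DB_BIT) ? 32 : 16;
}
```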

On Wed, Oct 17, 2018 at 10:02 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 17/10/2018 16:47, Vitaly Kuznetsov wrote:
>>>> +   if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
>>>> +                      HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
>>>> +           vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>>>> +           vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
>>>> +           vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
>>>> +           vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
>>>> +           vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
>>>> +   }
>>> For what it's worth, I suspect that these can be moved to
>>> prepare_vmcs02_full.  The initial implementation of shadow VMCS did not
>>> expose "unrestricted guest" to the L1 hypervisor, and emulation does a
>>> lot of accesses to CS (of course).  Not sure how ES base ended up in
>>> there and not DS base, though...
>> I tried unshadowing all these fields and at least Hyper-V on KVM
>> (without using eVMCS of course) experiences a 1200-1300 cpu cycles
>> regression during tight cpuid loop test. I checked and this happens
>> because it likes vmreading GUEST_CS_AR_BYTES a lot.
>
> Go figure. :)  Liran, do you happen to know if ESX does something
> similar with CS descriptor cache fields?
>
> Paolo
Paolo Bonzini Oct. 17, 2018, 5:17 p.m. UTC | #5
On 17/10/2018 19:08, Jim Mattson wrote:
> I believe that ESXi reads GUEST_CS_AR_BYTES on every VM-exit to
> determine code size.

Which makes me wonder, maybe we should add GUEST_SS_AR_BYTES which is
where the CPL lives.  But then your tests from last year didn't find it.

Paolo
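Since CPL = SS.DPL, and DPL occupies bits 5:6 of the VMX access-rights encoding, extracting the guest's privilege level from GUEST_SS_AR_BYTES is a one-liner (illustrative helper):

```c
#include <stdint.h>

/* In the VMX segment access-rights layout, DPL is bits 5:6, and on
 * VMX-capable CPUs CPL equals SS.DPL, so one vmread of
 * GUEST_SS_AR_BYTES is enough to learn the guest's privilege level. */
static unsigned int cpl_from_ss_ar(uint32_t ss_ar)
{
	return (ss_ar >> 5) & 3;
}
```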

> On Wed, Oct 17, 2018 at 10:02 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 17/10/2018 16:47, Vitaly Kuznetsov wrote:
>>>>> +   if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
>>>>> +                      HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
>>>>> +           vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>>>>> +           vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
>>>>> +           vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
>>>>> +           vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
>>>>> +           vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
>>>>> +   }
>>>> For what it's worth, I suspect that these can be moved to
>>>> prepare_vmcs02_full.  The initial implementation of shadow VMCS did not
>>>> expose "unrestricted guest" to the L1 hypervisor, and emulation does a
>>>> lot of accesses to CS (of course).  Not sure how ES base ended up in
>>>> there and not DS base, though...
>>> I tried unshadowing all these fields and at least Hyper-V on KVM
>>> (without using eVMCS of course) experiences a 1200-1300 cpu cycles
>>> regression during tight cpuid loop test. I checked and this happens
>>> because it likes vmreading GUEST_CS_AR_BYTES a lot.
>>
>> Go figure. :)  Liran, do you happen to know if ESX does something
>> similar with CS descriptor cache fields?
>>
>> Paolo
Vitaly Kuznetsov Oct. 18, 2018, 11:14 a.m. UTC | #6
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 17/10/2018 19:08, Jim Mattson wrote:
>> I believe that ESXi reads GUEST_CS_AR_BYTES on every VM-exit to
>> determine code size.
>
> Which makes me wonder, maybe we should add GUEST_SS_AR_BYTES which is
> where the CPL lives.  But then your tests from last year didn't find it.
>

Hyper-V does read GUEST_SS_AR_BYTES, way less frequently than
GUEST_CS_AR_BYTES, but still.

Based on that my suggestion would be to shadow GUEST_SS_AR_BYTES, keep
GUEST_CS_AR_BYTES and unshadow the rest (GUEST_ES_BASE,
GUEST_CS_SELECTOR, GUEST_CS_LIMIT, GUEST_CS_BASE). I can do this as a
separate patch as I see this series is already in kvm/queue.
Paolo Bonzini Oct. 18, 2018, 12:42 p.m. UTC | #7
On 18/10/2018 13:14, Vitaly Kuznetsov wrote:
> 
> Based on that my suggestion would be to shadow GUEST_SS_AR_BYTES, keep
> GUEST_CS_AR_BYTES and unshadow the rest (GUEST_ES_BASE,
> GUEST_CS_SELECTOR, GUEST_CS_LIMIT, GUEST_CS_BASE). I can do this as a
> separate patch as I see this series is already in kvm/queue.

Yes, it should be a separate patch anyway.

GUEST_CS_BASE and GUEST_CS_LIMIT probably matter for 32-bit guests, but
I guess it's okay to remove them.  GUEST_CS_SELECTOR probably dates back
to when we were incorrectly using CPL=CS.RPL instead of CPL=SS.DPL, and
can be removed too.

GUEST_ES_BASE alone is quite useless, so it can go.

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cfb44acd4291..0b665c74fadc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -12699,43 +12699,62 @@  static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 
 static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
-	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
-	vmcs_write16(GUEST_SS_SELECTOR, vmcs12->guest_ss_selector);
-	vmcs_write16(GUEST_DS_SELECTOR, vmcs12->guest_ds_selector);
-	vmcs_write16(GUEST_FS_SELECTOR, vmcs12->guest_fs_selector);
-	vmcs_write16(GUEST_GS_SELECTOR, vmcs12->guest_gs_selector);
-	vmcs_write16(GUEST_LDTR_SELECTOR, vmcs12->guest_ldtr_selector);
-	vmcs_write16(GUEST_TR_SELECTOR, vmcs12->guest_tr_selector);
-	vmcs_write32(GUEST_ES_LIMIT, vmcs12->guest_es_limit);
-	vmcs_write32(GUEST_SS_LIMIT, vmcs12->guest_ss_limit);
-	vmcs_write32(GUEST_DS_LIMIT, vmcs12->guest_ds_limit);
-	vmcs_write32(GUEST_FS_LIMIT, vmcs12->guest_fs_limit);
-	vmcs_write32(GUEST_GS_LIMIT, vmcs12->guest_gs_limit);
-	vmcs_write32(GUEST_LDTR_LIMIT, vmcs12->guest_ldtr_limit);
-	vmcs_write32(GUEST_TR_LIMIT, vmcs12->guest_tr_limit);
-	vmcs_write32(GUEST_GDTR_LIMIT, vmcs12->guest_gdtr_limit);
-	vmcs_write32(GUEST_IDTR_LIMIT, vmcs12->guest_idtr_limit);
-	vmcs_write32(GUEST_ES_AR_BYTES, vmcs12->guest_es_ar_bytes);
-	vmcs_write32(GUEST_SS_AR_BYTES, vmcs12->guest_ss_ar_bytes);
-	vmcs_write32(GUEST_DS_AR_BYTES, vmcs12->guest_ds_ar_bytes);
-	vmcs_write32(GUEST_FS_AR_BYTES, vmcs12->guest_fs_ar_bytes);
-	vmcs_write32(GUEST_GS_AR_BYTES, vmcs12->guest_gs_ar_bytes);
-	vmcs_write32(GUEST_LDTR_AR_BYTES, vmcs12->guest_ldtr_ar_bytes);
-	vmcs_write32(GUEST_TR_AR_BYTES, vmcs12->guest_tr_ar_bytes);
-	vmcs_writel(GUEST_SS_BASE, vmcs12->guest_ss_base);
-	vmcs_writel(GUEST_DS_BASE, vmcs12->guest_ds_base);
-	vmcs_writel(GUEST_FS_BASE, vmcs12->guest_fs_base);
-	vmcs_writel(GUEST_GS_BASE, vmcs12->guest_gs_base);
-	vmcs_writel(GUEST_LDTR_BASE, vmcs12->guest_ldtr_base);
-	vmcs_writel(GUEST_TR_BASE, vmcs12->guest_tr_base);
-	vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
-	vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
-
-	vmcs_write32(GUEST_SYSENTER_CS, vmcs12->guest_sysenter_cs);
-	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
-		vmcs12->guest_pending_dbg_exceptions);
-	vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
-	vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
+	struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs;
+
+	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
+			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
+		vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
+		vmcs_write16(GUEST_SS_SELECTOR, vmcs12->guest_ss_selector);
+		vmcs_write16(GUEST_DS_SELECTOR, vmcs12->guest_ds_selector);
+		vmcs_write16(GUEST_FS_SELECTOR, vmcs12->guest_fs_selector);
+		vmcs_write16(GUEST_GS_SELECTOR, vmcs12->guest_gs_selector);
+		vmcs_write16(GUEST_LDTR_SELECTOR, vmcs12->guest_ldtr_selector);
+		vmcs_write16(GUEST_TR_SELECTOR, vmcs12->guest_tr_selector);
+		vmcs_write32(GUEST_ES_LIMIT, vmcs12->guest_es_limit);
+		vmcs_write32(GUEST_SS_LIMIT, vmcs12->guest_ss_limit);
+		vmcs_write32(GUEST_DS_LIMIT, vmcs12->guest_ds_limit);
+		vmcs_write32(GUEST_FS_LIMIT, vmcs12->guest_fs_limit);
+		vmcs_write32(GUEST_GS_LIMIT, vmcs12->guest_gs_limit);
+		vmcs_write32(GUEST_LDTR_LIMIT, vmcs12->guest_ldtr_limit);
+		vmcs_write32(GUEST_TR_LIMIT, vmcs12->guest_tr_limit);
+		vmcs_write32(GUEST_GDTR_LIMIT, vmcs12->guest_gdtr_limit);
+		vmcs_write32(GUEST_IDTR_LIMIT, vmcs12->guest_idtr_limit);
+		vmcs_write32(GUEST_ES_AR_BYTES, vmcs12->guest_es_ar_bytes);
+		vmcs_write32(GUEST_SS_AR_BYTES, vmcs12->guest_ss_ar_bytes);
+		vmcs_write32(GUEST_DS_AR_BYTES, vmcs12->guest_ds_ar_bytes);
+		vmcs_write32(GUEST_FS_AR_BYTES, vmcs12->guest_fs_ar_bytes);
+		vmcs_write32(GUEST_GS_AR_BYTES, vmcs12->guest_gs_ar_bytes);
+		vmcs_write32(GUEST_LDTR_AR_BYTES, vmcs12->guest_ldtr_ar_bytes);
+		vmcs_write32(GUEST_TR_AR_BYTES, vmcs12->guest_tr_ar_bytes);
+		vmcs_writel(GUEST_SS_BASE, vmcs12->guest_ss_base);
+		vmcs_writel(GUEST_DS_BASE, vmcs12->guest_ds_base);
+		vmcs_writel(GUEST_FS_BASE, vmcs12->guest_fs_base);
+		vmcs_writel(GUEST_GS_BASE, vmcs12->guest_gs_base);
+		vmcs_writel(GUEST_LDTR_BASE, vmcs12->guest_ldtr_base);
+		vmcs_writel(GUEST_TR_BASE, vmcs12->guest_tr_base);
+		vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
+		vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
+	}
+
+	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
+			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1)) {
+		vmcs_write32(GUEST_SYSENTER_CS, vmcs12->guest_sysenter_cs);
+		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+			    vmcs12->guest_pending_dbg_exceptions);
+		vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
+		vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
+
+		/*
+		 * L1 may access the L2's PDPTR, so save them to construct
+		 * vmcs12
+		 */
+		if (enable_ept) {
+			vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
+			vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
+			vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
+			vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
+		}
+	}
 
 	if (nested_cpu_has_xsaves(vmcs12))
 		vmcs_write64(XSS_EXIT_BITMAP, vmcs12->xss_exit_bitmap);
@@ -12778,16 +12797,6 @@  static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 		else
 			vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
 	}
-
-	/*
-	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
-	 */
-	if (enable_ept) {
-		vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
-		vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
-		vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
-		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
-	}
 }
 
 /*
@@ -12805,6 +12814,7 @@  static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			  u32 *entry_failure_code)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs;
 
 	if (vmx->nested.dirty_vmcs12 || vmx->nested.hv_evmcs) {
 		prepare_vmcs02_full(vmx, vmcs12);
@@ -12815,12 +12825,14 @@  static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	 * First, the fields that are shadowed.  This must be kept in sync
 	 * with vmx_shadow_fields.h.
 	 */
-
-	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
-	vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
-	vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
-	vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
-	vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
+	if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
+			   HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
+		vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
+		vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
+		vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
+		vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
+		vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
+	}
 
 	if (vmx->nested.nested_run_pending &&
 	    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {