Message ID | 20190507153629.3681-2-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: nVMX: Optimize VMCS data copying |
On 07/05/19 17:36, Sean Christopherson wrote:
> Note, "writable" in this context means
> + * "writable by the guest", i.e. tagged SHADOW_FIELD_RW. Note #2, the set of
> + * fields tagged SHADOW_FIELD_RO may or may not align with the "read-only"
> + * VM-exit information fields (which are actually writable if the vCPU is
> + * configured to support "VMWRITE to any supported field in the VMCS").

"There's a few more pages of P.S.'s here"
(https://youtu.be/rKlrTJ7WN18?t=170)

Paolo
On Tue, May 7, 2019 at 8:36 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> Not intercepting fields tagged read-only also allows for additional
> optimizations, e.g. marking GUEST_{CS,SS}_AR_BYTES as SHADOW_FIELD_RO
> since those fields are rarely written by VMMs, but read frequently.

Do you have data to support this, or is this just a gut feeling? The last
time I looked at VirtualBox (which was admittedly a long time ago), it
liked to read and write just about every VMCS guest-state field it could
find on every VM-exit.

The decision of which fields to shadow is really something that should be
done dynamically, depending on the behavior of the guest hypervisor (which
may vary depending on the L2 guest it's running!). Making the decision
statically is bound to result in a poor outcome for some scenarios. When I
measured this several years ago, taking one VM-exit for a VMREAD or
VMWRITE was more expensive than needlessly shadowing it ~35-40 times.
On 13/06/19 19:02, Jim Mattson wrote:
> On Tue, May 7, 2019 at 8:36 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
>
>> Not intercepting fields tagged read-only also allows for additional
>> optimizations, e.g. marking GUEST_{CS,SS}_AR_BYTES as SHADOW_FIELD_RO
>> since those fields are rarely written by VMMs, but read frequently.
>
> Do you have data to support this, or is this just a gut feeling? The
> last time I looked at VirtualBox (which was admittedly a long time
> ago), it liked to read and write just about every VMCS guest-state
> field it could find on every VM-exit.

I have never looked at VirtualBox, but most other hypervisors do have a
common set of fields (give or take a couple) that they like to read
and/or write on most if not every vmexit. Also, while this may vary
dynamically based on the L2 guest that is running, this is much less
true for unrestricted-guest processors. Without data on _which_
scenarios are bad for a static set of shadowed fields, I'm not really
happy to add even more complexity.

Paolo

> The decision of which fields to shadow is really something that should
> be done dynamically, depending on the behavior of the guest hypervisor
> (which may vary depending on the L2 guest it's running!) Making the
> decision statically is bound to result in a poor outcome for some
> scenarios.
On Thu, Jun 13, 2019 at 10:18 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> Also, while this may vary dynamically based on the L2 guest that is
> running, this is much less true for unrestricted-guest processors.
> Without data on _which_ scenarios are bad for a static set of shadowed
> fields, I'm not really happy to add even more complexity.

Data supporting which scenarios would lead you to entertain more
complexity? Is it even worth collecting data on L3 performance, for
example? :-)
On 13/06/19 19:36, Jim Mattson wrote:
>> Also, while this may vary dynamically based on the L2 guest that is
>> running, this is much less true for unrestricted-guest processors.
>> Without data on _which_ scenarios are bad for a static set of shadowed
>> fields, I'm not really happy to add even more complexity.
>
> Data supporting which scenarios would lead you to entertain more
> complexity?

For example it would be interesting if some L1 hypervisor had 2x slower
vmexits on some L2 guests, but otherwise fits the current set of
shadowed fields.

Paolo

> Is it even worth collecting data on L3 performance, for
> example? :-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 04b40a98f60b..1f7c4af70903 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1102,14 +1102,6 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
 	vmx->nested.msrs.misc_low = data;
 	vmx->nested.msrs.misc_high = data >> 32;
 
-	/*
-	 * If L1 has read-only VM-exit information fields, use the
-	 * less permissive vmx_vmwrite_bitmap to specify write
-	 * permissions for the shadow VMCS.
-	 */
-	if (enable_shadow_vmcs && !nested_cpu_has_vmwrite_any_field(&vmx->vcpu))
-		vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
-
 	return 0;
 }
 
@@ -1298,41 +1290,27 @@ int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata)
 }
 
 /*
- * Copy the writable VMCS shadow fields back to the VMCS12, in case
- * they have been modified by the L1 guest. Note that the "read-only"
- * VM-exit information fields are actually writable if the vCPU is
- * configured to support "VMWRITE to any supported field in the VMCS."
+ * Copy the writable VMCS shadow fields back to the VMCS12, in case they have
+ * been modified by the L1 guest. Note, "writable" in this context means
+ * "writable by the guest", i.e. tagged SHADOW_FIELD_RW. Note #2, the set of
+ * fields tagged SHADOW_FIELD_RO may or may not align with the "read-only"
+ * VM-exit information fields (which are actually writable if the vCPU is
+ * configured to support "VMWRITE to any supported field in the VMCS").
  */
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 {
-	const u16 *fields[] = {
-		shadow_read_write_fields,
-		shadow_read_only_fields
-	};
-	const int max_fields[] = {
-		max_shadow_read_write_fields,
-		max_shadow_read_only_fields
-	};
-	int i, q;
-	unsigned long field;
-	u64 field_value;
 	struct vmcs *shadow_vmcs = vmx->vmcs01.shadow_vmcs;
+	struct vmcs12 *vmcs12 = get_vmcs12(&vmx->vcpu);
+	unsigned long field;
+	int i;
 
 	preempt_disable();
 
 	vmcs_load(shadow_vmcs);
 
-	for (q = 0; q < ARRAY_SIZE(fields); q++) {
-		for (i = 0; i < max_fields[q]; i++) {
-			field = fields[q][i];
-			field_value = __vmcs_readl(field);
-			vmcs12_write_any(get_vmcs12(&vmx->vcpu), field, field_value);
-		}
-		/*
-		 * Skip the VM-exit information fields if they are read-only.
-		 */
-		if (!nested_cpu_has_vmwrite_any_field(&vmx->vcpu))
-			break;
+	for (i = 0; i < max_shadow_read_write_fields; i++) {
+		field = shadow_read_write_fields[i];
+		vmcs12_write_any(vmcs12, field, __vmcs_readl(field));
 	}
 
 	vmcs_clear(shadow_vmcs);
@@ -4511,6 +4489,24 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 		 * path of prepare_vmcs02.
 		 */
 		break;
+
+#define SHADOW_FIELD_RO(x) case x:
+#include "vmcs_shadow_fields.h"
+		/*
+		 * L1 can read these fields without exiting, ensure the
+		 * shadow VMCS is up-to-date.
+		 */
+		if (enable_shadow_vmcs) {
+			preempt_disable();
+			vmcs_load(vmx->vmcs01.shadow_vmcs);
+
+			__vmcs_writel(field, field_value);
+
+			vmcs_clear(vmx->vmcs01.shadow_vmcs);
+			vmcs_load(vmx->loaded_vmcs->vmcs);
+			preempt_enable();
+		}
+		/* fall through */
 	default:
 		vmx->nested.dirty_vmcs12 = true;
 		break;
@@ -5456,14 +5452,8 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 void nested_vmx_vcpu_setup(void)
 {
 	if (enable_shadow_vmcs) {
-		/*
-		 * At vCPU creation, "VMWRITE to any supported field
-		 * in the VMCS" is supported, so use the more
-		 * permissive vmx_vmread_bitmap to specify both read
-		 * and write permissions for the shadow VMCS.
-		 */
 		vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
-		vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmread_bitmap));
+		vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
 	}
 }
Allowing L1 to VMWRITE read-only fields is only beneficial in a double
nesting scenario, e.g. no sane VMM will VMWRITE VM_EXIT_REASON in normal
non-nested operation. Intercepting RO fields means KVM doesn't need to
sync them from the shadow VMCS to vmcs12 when running L2. The obvious
downside is that L1 will VM-Exit more often when running L3, but it's
likely safe to assume most folks would happily sacrifice a bit of L3
performance, which may not even be noticeable in the grand scheme, to
improve L2 performance across the board.

Not intercepting fields tagged read-only also allows for additional
optimizations, e.g. marking GUEST_{CS,SS}_AR_BYTES as SHADOW_FIELD_RO
since those fields are rarely written by VMMs, but read frequently.

When utilizing a shadow VMCS with asymmetric R/W and R/O bitmaps, fields
that cause VM-Exit on VMWRITE but not VMREAD need to be propagated to
the shadow VMCS during VMWRITE emulation, otherwise a subsequent VMREAD
from L1 will consume a stale value. Note, KVM currently utilizes
asymmetric bitmaps when "VMWRITE any field" is not exposed to L1, but
only so that it can reject the VMWRITE, i.e. propagating the VMWRITE to
the shadow VMCS is a new requirement, not a bug fix.

Eliminating the copying of RO fields reduces the latency of nested
VM-Entry (copy_shadow_to_vmcs12()) by ~100 cycles (plus 40-50 cycles
if/when the AR_BYTES fields are exposed RO).

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 72 +++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 41 deletions(-)