Message ID: 1517404231-22406-3-git-send-email-karahmed@amazon.de (mailing list archive)
State: New, archived
On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +				      MSR_TYPE_W);

Why not disable this intercept eagerly, rather than lazily? Unlike
MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
is no cost to disabling the intercept if the guest cpuid info declares
support for it.

> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> +		nested_vmx_disable_intercept_for_msr(
> +					msr_bitmap_l1, msr_bitmap_l0,
> +					MSR_IA32_PRED_CMD,
> +					MSR_TYPE_R);
> +	}

I don't think this should be predicated on
"to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
"guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
MSR_TYPE_R.
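For context, vmx_disable_intercept_for_msr() works by clearing bits in the per-VMCS MSR bitmap, where a set bit means "exit on access". Below is a rough user-space model of that 4 KiB layout (read-low, read-high, write-low, write-high quarters); the helper names are made up for illustration and this is not KVM's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSR_IA32_PRED_CMD 0x49
#define MSR_TYPE_R 1
#define MSR_TYPE_W 2

/* A 4 KiB VMX MSR bitmap: bytes 0-1023 cover reads of MSRs 0x0-0x1fff,
 * bytes 1024-2047 reads of 0xc0000000-0xc0001fff, and the second half
 * covers writes of the same two ranges.  A set bit means "intercept". */
typedef struct { uint8_t b[4096]; } msr_bitmap;

static void bitmap_init(msr_bitmap *m)
{
	memset(m->b, 0xff, sizeof(m->b));	/* intercept everything */
}

static void disable_intercept(msr_bitmap *m, uint32_t msr, int type)
{
	uint32_t base;

	if (msr <= 0x1fff) {
		base = 0;			/* low MSR range */
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = 1024;			/* high MSR range */
		msr &= 0x1fff;
	} else {
		return;				/* outside the bitmap: always intercepted */
	}

	if (type & MSR_TYPE_R)
		m->b[base + msr / 8] &= (uint8_t)~(1u << (msr % 8));
	if (type & MSR_TYPE_W)
		m->b[2048 + base + msr / 8] &= (uint8_t)~(1u << (msr % 8));
}

/* Query helpers (low MSR range only, which is where PRED_CMD lives). */
static int read_intercepted(const msr_bitmap *m, uint32_t msr)
{
	return (m->b[msr / 8] >> (msr % 8)) & 1;
}

static int write_intercepted(const msr_bitmap *m, uint32_t msr)
{
	return (m->b[2048 + msr / 8] >> (msr % 8)) & 1;
}
```

Passing MSR_TYPE_W alone matches the point above: PRED_CMD is a write-only command MSR, so only the write half of the bitmap matters.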
On 31/01/2018 11:50, Jim Mattson wrote:
>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> +		nested_vmx_disable_intercept_for_msr(
>> +					msr_bitmap_l1, msr_bitmap_l0,
>> +					MSR_IA32_PRED_CMD,
>> +					MSR_TYPE_R);
>> +	}
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.

In fact this MSR can even be passed down unconditionally, since it needs
no save/restore and has no ill performance effect on the sibling
hyperthread.

Only MSR_IA32_SPEC_CTRL needs to be conditional on
"to_vmx(vcpu)->save_spec_ctrl_on_exit".

Paolo
On 01/31/2018 05:50 PM, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de>
> wrote:
>
>> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
>> +				      MSR_TYPE_W);
>
> Why not disable this intercept eagerly, rather than lazily? Unlike
> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
> is no cost to disabling the intercept if the guest cpuid info declares
> support for it.
>
>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> +		nested_vmx_disable_intercept_for_msr(
>> +					msr_bitmap_l1, msr_bitmap_l0,
>> +					MSR_IA32_PRED_CMD,
>> +					MSR_TYPE_R);
>> +	}
>
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?

Paolo suggested this on the previous revision because guest_cpuid_has()
would be slow.

> Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.

Oops! will fix!

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
On 31/01/2018 12:11, KarimAllah Ahmed wrote:
> On 01/31/2018 05:50 PM, Jim Mattson wrote:
>> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de>
>> wrote:
>>
>>> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>>> MSR_IA32_PRED_CMD,
>>> +				      MSR_TYPE_W);
>>
>> Why not disable this intercept eagerly, rather than lazily? Unlike
>> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
>> is no cost to disabling the intercept if the guest cpuid info declares
>> support for it.
>>
>>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> +		nested_vmx_disable_intercept_for_msr(
>>> +					msr_bitmap_l1, msr_bitmap_l0,
>>> +					MSR_IA32_PRED_CMD,
>>> +					MSR_TYPE_R);
>>> +	}
>>
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?
>
> Paolo suggested this on the previous revision because guest_cpuid_has()
> would be slow.

Sorry, that was for spec_ctrl. Here there's no need to do any kind of
conditional check.

Paolo

>> Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>>
> Oops! will fix!
On 01/31/2018 05:55 PM, Paolo Bonzini wrote:
> On 31/01/2018 11:50, Jim Mattson wrote:
>>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> +		nested_vmx_disable_intercept_for_msr(
>>> +					msr_bitmap_l1, msr_bitmap_l0,
>>> +					MSR_IA32_PRED_CMD,
>>> +					MSR_TYPE_R);
>>> +	}
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>
> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.
>
> Only MSR_IA32_SPEC_CTRL needs to be conditional on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit".

That used to be the case in an earlier version. There seem to be two
opinions here:

1) Pass it only if CPUID for the guest has it.
2) Pass it unconditionally.

I do not really have a preference.
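The nested_vmx_disable_intercept_for_msr() call being debated only opens an MSR up for L2 when L1's own bitmap also leaves it open, so L1's policy is always honored. A minimal user-space model of that merge rule, using toy byte-per-MSR bitmaps rather than KVM's real bit-packed page:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy bitmaps, one byte per MSR: 1 = intercept, 0 = pass through.  The
 * vmcs02 (l0) bitmap starts out intercepting everything; an MSR becomes
 * pass-through for L2 only if L1's bitmap (l1) does not intercept it. */
enum { NMSRS = 64 };

static void merge_disable(const uint8_t *l1, uint8_t *l0, unsigned int msr)
{
	if (msr >= NMSRS)
		return;
	if (!l1[msr])		/* L1 does not want to intercept it either... */
		l0[msr] = 0;	/* ...so L2 may access it without an exit */
}
```

With this rule, L0 can never grant L2 more direct MSR access than L1 itself configured.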
Looking at SVM now...

On 31/01/2018 08:10, KarimAllah Ahmed wrote:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f40d0da..89495cf 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -529,6 +529,7 @@ struct svm_cpu_data {
>  	struct kvm_ldttss_desc *tss_desc;
>  
>  	struct page *save_area;
> +	struct vmcb *current_vmcb;
>  };
>  
>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
> +	/*
> +	 * The vmcb page can be recycled, causing a false negative in
> +	 * svm_vcpu_load(). So do a full IBPB now.
> +	 */
> +	indirect_branch_prediction_barrier();
>  }
>  
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>  	int i;
>  
>  	if (unlikely(cpu != vcpu->cpu)) {
> @@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (static_cpu_has(X86_FEATURE_RDTSCP))
>  		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>  
> +	if (sd->current_vmcb != svm->vmcb) {
> +		sd->current_vmcb = svm->vmcb;
> +		indirect_branch_prediction_barrier();
> +	}
>  	avic_vcpu_load(vcpu, cpu);
>  }
>  
> @@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr);
>  		break;
> +	case MSR_IA32_PRED_CMD:
> +		if (!msr->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +			return 1;
> +
> +		if (data & ~PRED_CMD_IBPB)
> +			return 1;
> +
> +		if (!data)
> +			break;
> +
> +		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

This is missing an

	{ .index = MSR_IA32_PRED_CMD,			.always = false },

in the direct_access_msrs array. Once you do this, nested is taken care
of automatically. Likewise for MSR_IA32_SPEC_CTRL in patch 5.

Paolo

> +		if (is_guest_mode(vcpu))
> +			break;
> +		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> +		break;
>  	case MSR_STAR:
>  		svm->vmcb->save.star = data;
>  		break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..96e672e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>  		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>  		vmcs_load(vmx->loaded_vmcs->vmcs);
> +		indirect_branch_prediction_barrier();
>  	}
>  
>  	if (!already_loaded) {
> @@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr_info);
>  		break;
> +	case MSR_IA32_PRED_CMD:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +			return 1;
> +
> +		if (data & ~PRED_CMD_IBPB)
> +			return 1;
> +
> +		if (!data)
> +			break;
> +
> +		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> +		if (is_guest_mode(vcpu))
> +			break;
> +
> +		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +					      MSR_TYPE_W);
> +		break;
>  	case MSR_IA32_CR_PAT:
>  		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>  			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> @@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>  	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
>  
>  	/* This shortcut is ok because we support only x2APIC MSRs so far. */
> -	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> +	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
> +	    !to_vmx(vcpu)->save_spec_ctrl_on_exit)
>  		return false;
>  
>  	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> @@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>  						MSR_TYPE_W);
>  		}
>  	}
> +
> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> +		nested_vmx_disable_intercept_for_msr(
> +					msr_bitmap_l1, msr_bitmap_l0,
> +					MSR_IA32_PRED_CMD,
> +					MSR_TYPE_R);
> +	}
> +
>  	kunmap(page);
>  	kvm_release_page_clean(page);
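The recycling hazard described in the quoted svm_free_vcpu() comment is easy to model in user space: a per-CPU pointer comparison cannot distinguish a new guest whose vmcb happens to be allocated at a freed guest's old address. In this sketch, ibpb() is a hypothetical stand-in for indirect_branch_prediction_barrier(), counting invocations so the logic can be checked:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the per-CPU "last vmcb seen" check from the SVM hunk. */
static const void *current_vmcb;
static int ibpb_count;

static void ibpb(void)
{
	ibpb_count++;		/* would be a real barrier in the kernel */
}

static void vcpu_load(const void *vmcb)
{
	if (current_vmcb != vmcb) {	/* a different guest is scheduled in */
		current_vmcb = vmcb;
		ibpb();
	}
}

static void vcpu_free(const void *vmcb)
{
	(void)vmcb;
	/* The freed page may be reallocated for a brand-new vCPU at the
	 * same address; the pointer comparison in vcpu_load() would then
	 * be a false negative, so flush predictions eagerly here. */
	ibpb();
}
```

The barrier at free time is what makes it safe to skip the barrier on a load that sees the same (possibly recycled) pointer.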
On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.

I'm a bit surprised to hear that IBPB has no ill performance impact on
the sibling hyperthread. On current CPUs, this has to flush the BTB,
doesn't it? And since the BTB is shared between hyperthreads, doesn't
the sibling lose all of its branch predictions?
> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> > In fact this MSR can even be passed down unconditionally, since it needs
> > no save/restore and has no ill performance effect on the sibling
> > hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

IBPB most definitely (in current implementations) will stop both
hyperthreads for the flush. IBPB is not a cheap operation on a system
level.
On 31/01/2018 12:39, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> In fact this MSR can even be passed down unconditionally, since it needs
>> no save/restore and has no ill performance effect on the sibling
>> hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

Yeah, I knew about that, but I hadn't heard yet that it also blocks the
hyperthread while you do the write. It makes sense in retrospect.

In any case, there's no difference (unlike IBRS) in the vmexit cost if
the MSR is passed through.

Paolo
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..89495cf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..96e672e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		if (is_guest_mode(vcpu))
+			break;
+
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
 	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !to_vmx(vcpu)->save_spec_ctrl_on_exit)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 						MSR_TYPE_W);
 		}
 	}
+
+	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_R);
+	}
+
 	kunmap(page);
 	kvm_release_page_clean(page);
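The MSR_IA32_PRED_CMD cases added to vmx_set_msr() and svm_set_msr() in the patch share the same validation steps. A standalone sketch of that logic follows; write_pred_cmd() and wrmsr_count are illustrative names for this model, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define PRED_CMD_IBPB (1ULL << 0)

static int wrmsr_count;			/* stand-in for the real wrmsrl() */

/* Returns 1 to signal #GP to the guest, 0 on success, mirroring the
 * MSR_IA32_PRED_CMD handling in the patch's set_msr paths. */
static int write_pred_cmd(uint64_t data, int guest_has_ibpb, int host_initiated)
{
	if (!host_initiated && !guest_has_ibpb)
		return 1;		/* feature not exposed in guest CPUID */

	if (data & ~PRED_CMD_IBPB)
		return 1;		/* reserved bits must be zero */

	if (!data)
		return 0;		/* writing 0 is a legal no-op */

	wrmsr_count++;			/* issue the barrier on the host */
	return 0;
}
```

Note the ordering: the CPUID and reserved-bit checks fault before anything reaches the hardware, and only a write of exactly PRED_CMD_IBPB triggers the host-side barrier.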