Message ID: 38d613dc-5b46-1f4f-04d1-53c01932e6d6@redhat.com (mailing list archive)
State: New, archived
On Mon, Feb 20, 2017 at 3:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 18/02/2017 04:29, Andy Lutomirski wrote:
>> There's no code here because the patch is trivial, but I want to run
>> the idea by you all first to see if there are any issues.
>>
>> VMX is silly and forces the TSS limit to the minimum on VM exits. KVM
>> wastes lots of cycles bumping it back up to accommodate the io bitmap.
>
> Actually looked at the code now...
>
> reload_tss is only invoked for userspace exits, so it is a nice-to-have
> but it wouldn't show on most workloads. Still it does save 150-200
> clock cycles to remove it (I just commented out reload_tss() from
> __vmx_load_host_state to test).

That's for anything involving userspace or preemption, right?

> Another 100-150 could be saved if we could just use rdgsbase/wrgsbase,
> instead of rdmsr/wrmsr, to read and write the kernel GS. Super hacky
> patch after sig.

I have a Real Patch Series (tm) to do that, but it has a couple of
unresolved corner cases so far.

That being said, vmx_save_host_state() is, um, poorly optimized. I'll
try to find some time to fix the obvious things.

Meanwhile, I'll send real patches for TR.

> +	cr4_set_bits(X86_CR4_FSGSBASE);

Nice root hole :-p

--Andy
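For context on what is being optimized away: every VM exit reloads the host
TR with its limit forced to 0x67, which is too small to cover the TSS's I/O
permission bitmap, so KVM has to reload TR to restore the full limit. Below
is a minimal sketch of that reload, modeled on KVM's reload_tss() of the
era; the helper name reload_tss_sketch and the exact details are
illustrative, not necessarily the precise tree under discussion.

	#include <asm/desc.h>

	static void reload_tss_sketch(void)
	{
		/*
		 * After a VM exit, hardware has left TR's limit at 0x67,
		 * which excludes the I/O permission bitmap. Reload the
		 * selector from the GDT to restore the full limit.
		 */
		struct desc_ptr gdt;
		struct desc_struct *descs;

		native_store_gdt(&gdt);
		descs = (void *)gdt.address;

		/* ltr #GPs on a busy TSS descriptor, so mark it available first. */
		descs[GDT_ENTRY_TSS].type = 9;	/* available 64-bit TSS */
		asm volatile("ltr %w0" : : "q" (GDT_ENTRY_TSS * 8));
	}

Andy's proposal amounts to making this reload unnecessary on most exits.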
On 20/02/2017 17:46, Andy Lutomirski wrote:
> On Mon, Feb 20, 2017 at 3:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 18/02/2017 04:29, Andy Lutomirski wrote:
>>> There's no code here because the patch is trivial, but I want to run
>>> the idea by you all first to see if there are any issues.
>>>
>>> VMX is silly and forces the TSS limit to the minimum on VM exits. KVM
>>> wastes lots of cycles bumping it back up to accommodate the io bitmap.
>>
>> Actually looked at the code now...
>>
>> reload_tss is only invoked for userspace exits, so it is a nice-to-have
>> but it wouldn't show on most workloads. Still it does save 150-200
>> clock cycles to remove it (I just commented out reload_tss() from
>> __vmx_load_host_state to test).
>
> That's for anything involving userspace or preemption, right?

Yes. But 150-200 clock cycles are nothing compared to the cache misses
you get from preemption, so I'd ignore that. Saving 300 clock cycles on
userspace exits from TR+GSBASE would be about 5% on my Haswell.

> That being said, vmx_save_host_state() is, um, poorly optimized.

Yeah, but again it doesn't run that often in practical cases.

Paolo
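(For scale: if 300 saved cycles come to roughly 5%, the full userspace-exit
round trip being measured works out to about 300 / 0.05 = 6000 cycles on
that Haswell.)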
On Mon, Feb 20, 2017 at 8:51 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 20/02/2017 17:46, Andy Lutomirski wrote:
>> On Mon, Feb 20, 2017 at 3:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>
>>> On 18/02/2017 04:29, Andy Lutomirski wrote:
>>>> There's no code here because the patch is trivial, but I want to run
>>>> the idea by you all first to see if there are any issues.
>>>>
>>>> VMX is silly and forces the TSS limit to the minimum on VM exits. KVM
>>>> wastes lots of cycles bumping it back up to accommodate the io bitmap.
>>>
>>> Actually looked at the code now...
>>>
>>> reload_tss is only invoked for userspace exits, so it is a nice-to-have
>>> but it wouldn't show on most workloads. Still it does save 150-200
>>> clock cycles to remove it (I just commented out reload_tss() from
>>> __vmx_load_host_state to test).
>>
>> That's for anything involving userspace or preemption, right?
>
> Yes. But 150-200 clock cycles are nothing compared to the cache misses
> you get from preemption, so I'd ignore that. Saving 300 clock cycles on
> userspace exits from TR+GSBASE would be about 5% on my Haswell.

That's still 5% :)

>
>> That being said, vmx_save_host_state() is, um, poorly optimized.
>
> Yeah, but again it doesn't run that often in practical cases.

But it will if someone ever tries to upstream Google's approach of not
emulating anything in the kernel. :)
> > Yes. But 150-200 clock cycles are nothing compared to the cache misses
> > you get from preemption, so I'd ignore that. Saving 300 clock cycles on
> > userspace exits from TR+GSBASE would be about 5% on my Haswell.
>
> That's still 5% :)

Yes, 5% on userspace exits is good (though they're rare).

OTOH, 300 clock cycles on preemption are nothing to write home about.

Paolo
On February 20, 2017 2:02:53 PM PST, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> > Yes. But 150-200 clock cycles are nothing compared to the cache misses
>> > you get from preemption, so I'd ignore that. Saving 300 clock cycles on
>> > userspace exits from TR+GSBASE would be about 5% on my Haswell.
>>
>> That's still 5% :)
>
> Yes, 5% on userspace exits is good (though they're rare).
>
> OTOH, 300 clock cycles on preemption are nothing to write home about.
>
> Paolo

Oh, I would say it is... especially when it's basically free.
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9856b73a21ad..e76bfec463bc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2138,9 +2138,12 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif

 #ifdef CONFIG_X86_64
-	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	local_irq_disable();
+	asm volatile("swapgs; rdgsbase %0" : "=r" (vmx->msr_host_kernel_gs_base));
 	if (is_long_mode(&vmx->vcpu))
-		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+		asm volatile("wrgsbase %0" : : "r" (vmx->msr_guest_kernel_gs_base));
+	asm volatile("swapgs");
+	local_irq_enable();
 #endif
 	if (boot_cpu_has(X86_FEATURE_MPX))
 		rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
@@ -2152,14 +2155,20 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)

 static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 {
+	unsigned long flags;
+
 	if (!vmx->host_state.loaded)
 		return;

 	++vmx->vcpu.stat.host_state_reload;
 	vmx->host_state.loaded = 0;
 #ifdef CONFIG_X86_64
+	local_irq_save(flags);
+	asm("swapgs");
 	if (is_long_mode(&vmx->vcpu))
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+		asm volatile("rdgsbase %0" : "=r" (vmx->msr_guest_kernel_gs_base));
+	asm volatile("wrgsbase %0; swapgs" : : "r" (vmx->msr_host_kernel_gs_base));
+	local_irq_restore(flags);
 #endif
 	if (vmx->host_state.gs_ldt_reload_needed) {
 		kvm_load_ldt(vmx->host_state.ldt_sel);
@@ -2177,10 +2186,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		loadsegment(es, vmx->host_state.es_sel);
 	}
 #endif
-	reload_tss();
-#ifdef CONFIG_X86_64
-	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
-#endif
+	//reload_tss();
 	if (vmx->host_state.msr_host_bndcfgs)
 		wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
 	load_gdt(this_cpu_ptr(&host_gdt));
@@ -3469,6 +3475,7 @@ static int hardware_enable(void)
 		wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
 	}
 	cr4_set_bits(X86_CR4_VMXE);
+	cr4_set_bits(X86_CR4_FSGSBASE);

 	if (vmm_exclusive) {
 		kvm_cpu_vmxon(phys_addr);
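The last hunk's cr4_set_bits(X86_CR4_FSGSBASE) is what Andy flags as a root
hole above: CR4.FSGSBASE makes the RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE
instructions legal at CPL 3, so flipping the bit on without auditing the
rest of the kernel lets any unprivileged task load an arbitrary value,
including a kernel-range one, into its GS base behind the kernel's back. A
hypothetical userspace sketch of the exposure (compile with -mfsgsbase; it
only does anything interesting on a kernel that set the bit without
otherwise supporting it):

	#include <immintrin.h>

	int main(void)
	{
		/*
		 * With CR4.FSGSBASE set, this executes at CPL 3 with no
		 * kernel mediation. Kernel entry paths of this era assumed
		 * GS base was only ever changed via MSRs under kernel
		 * control, so a kernel-range value planted here can confuse
		 * the swapgs logic on the next entry -- hence "root hole".
		 */
		_writegsbase_u64(0xffff880000000000ULL);	/* arbitrary kernel-range value */
		return 0;
	}

This is why upstream FSGSBASE support could only set the bit after the
entry paths were reworked to tolerate arbitrary user-set base values.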