[00/10] KVM: x86: Misc anti-retpoline optimizations

Message ID 20200502043234.12481-1-sean.j.christopherson@intel.com (mailing list archive)

Sean Christopherson May 2, 2020, 4:32 a.m. UTC
A smattering of optimizations geared toward avoiding retpolines, though
IMO most of the patches are worthwhile changes irrespective of retpolines.
I can split this up into separate patches if desired, outside of the
obvious combos there are no dependencies.

I was mainly coming at this from a nVMX angle.  On a Haswell, this reduces
the best case latency for a nested VMX roundtrip by ~750 cycles, though I
doubt that much benefit will be realized in practice.


The CR0 and CR4 caching changes in particular are keepers, if only because
they get rid of the awful cache vs. decache naming.  And the CR4 change
can eliminate multiple VMREADs in the nested VMX paths.
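
For readers unfamiliar with the idea, here is a minimal userspace sketch of
lazy register caching with an availability bitmap. It is not code from the
series: the names (vcpu_cache, slow_hw_read, read_reg, invalidate_all) and
the returned values are hypothetical, and slow_hw_read only stands in for an
expensive read such as VMREAD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices; not KVM's actual enum. */
enum reg_id { REG_CR0, REG_CR4, NR_REGS };

struct vcpu_cache {
	uint32_t avail;          /* bit N set => value[N] is valid */
	uint64_t value[NR_REGS];
	unsigned int hw_reads;   /* counts simulated slow reads */
};

/* Stand-in for an expensive hardware read (assumption, not a real VMREAD). */
static uint64_t slow_hw_read(struct vcpu_cache *c, enum reg_id reg)
{
	c->hw_reads++;
	return 0x1000u + (uint64_t)reg;
}

/* Read through the cache; only the first read after invalidation is slow. */
static uint64_t read_reg(struct vcpu_cache *c, enum reg_id reg)
{
	assert(reg < NR_REGS);
	if (!(c->avail & (1u << reg))) {
		c->value[reg] = slow_hw_read(c, reg);
		c->avail |= 1u << reg;
	}
	return c->value[reg];
}

/* Drop all cached values, e.g. once the guest may have changed them. */
static void invalidate_all(struct vcpu_cache *c)
{
	c->avail = 0;
}
```

With this shape, repeated reads of the same register between invalidations
cost one slow read in total, which is the kind of saving described above.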

Ditto for the CR3 validation patch; it's arguably more readable and can
avoid a VMREAD.

I like the L1 TSC offset patch because VMX and SVM have inverted logic for
how they track the L1 offset, and because math is hard.

I think I like the TDP level change even if retpolines didn't exist?  The
nested EPT behavior is a bit scary, but it was already scary, this just
makes it more obvious.

The RIP/RSP accessors are definitely obsoleted by static calls, but IMO
the noise is worth the benefit unless static calls are imminent.
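
As a rough illustration of the trade-off, the sketch below contrasts a read
that goes through an ops table (an indirect call, which a retpoline-enabled
build routes through a thunk) with a vendor-specific accessor that is a
direct, inlinable call. All names here (struct ops, generic_rip_read,
vmx_rip_read) are hypothetical and simplified; this is not the series' code.

```c
#include <assert.h>
#include <stdint.h>

struct vcpu {
	uint64_t rip;
};

/* Hypothetical, heavily simplified ops table. */
struct ops {
	uint64_t (*get_rip)(struct vcpu *v);
};

static uint64_t vmx_get_rip(struct vcpu *v)
{
	return v->rip;
}

static const struct ops vmx_ops = { .get_rip = vmx_get_rip };

/* Generic path: the call target is loaded from the table at run time. */
static uint64_t generic_rip_read(const struct ops *ops, struct vcpu *v)
{
	assert(ops && ops->get_rip);
	return ops->get_rip(v);
}

/* Vendor path: the callee is known at compile time, so no indirect call. */
static inline uint64_t vmx_rip_read(struct vcpu *v)
{
	return vmx_get_rip(v);
}
```

Both paths return the same value; the difference is only in how the call is
dispatched. In a toy program like this the compiler may devirtualize the
generic path on its own, which is generally not possible when the table is
filled in at run time.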

The DR6 change is gratuitous, though I do like not having to dive into the
VMX code when I inevitably forget the VMX implementations are nops.
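
A minimal sketch of what an optional hook can look like, assuming a
hypothetical ops table in which a NULL pointer means "nothing to do for this
vendor". The names (struct ops, set_dr6, svm_like_set_dr6) are illustrative
only and do not reflect the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu {
	uint64_t dr6;
	unsigned int hook_calls; /* counts vendor hook invocations */
};

/* Hypothetical ops table; set_dr6 may be left NULL. */
struct ops {
	void (*set_dr6)(struct vcpu *v, uint64_t val);
};

/* Example of a vendor that does need to act on DR6 writes. */
static void svm_like_set_dr6(struct vcpu *v, uint64_t val)
{
	(void)val;
	v->hook_calls++;
}

/* Common code updates its own state, then calls the hook only if present. */
static void set_dr6(const struct ops *ops, struct vcpu *v, uint64_t val)
{
	assert(ops != NULL);
	v->dr6 = val;
	if (ops->set_dr6)
		ops->set_dr6(v, val);
}
```

A vendor with nothing to do simply leaves the pointer NULL, so the reader
can see at the call site that the hook may be absent, and the indirect call
is skipped entirely in that case.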

Sean Christopherson (10):
  KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
  KVM: nVMX: Unconditionally validate CR3 during nested transitions
  KVM: x86: Make kvm_x86_ops' {g,s}et_dr6() hooks optional
  KVM: x86: Split guts of kvm_update_dr7() to separate helper
  KVM: nVMX: Avoid retpoline when writing DR7 during nested transitions
  KVM: VMX: Add proper cache tracking for CR4
  KVM: VMX: Add proper cache tracking for CR0
  KVM: VMX: Add anti-retpoline accessors for RIP and RSP
  KVM: VMX: Move nested EPT out of kvm_x86_ops.get_tdp_level() hook
  KVM: x86/mmu: Capture TDP level when updating CPUID

 arch/x86/include/asm/kvm_host.h |  7 +--
 arch/x86/kvm/cpuid.c            |  3 +-
 arch/x86/kvm/kvm_cache_regs.h   | 10 +++--
 arch/x86/kvm/mmu/mmu.c          |  6 +--
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/svm/svm.c          | 21 ---------
 arch/x86/kvm/vmx/nested.c       | 49 +++++++++++----------
 arch/x86/kvm/vmx/vmx.c          | 77 +++++++++++++--------------------
 arch/x86/kvm/vmx/vmx.h          | 30 +++++++++++++
 arch/x86/kvm/x86.c              | 26 ++++-------
 arch/x86/kvm/x86.h              | 14 ++++++
 11 files changed, 125 insertions(+), 120 deletions(-)

Comments

Paolo Bonzini May 4, 2020, 1:25 p.m. UTC | #1
On 02/05/20 06:32, Sean Christopherson wrote:
> A smattering of optimizations geared toward avoiding retpolines, though
> IMO most of the patches are worthwhile changes irrespective of retpolines.
> I can split this up into separate patches if desired, outside of the
> obvious combos there are no dependencies.

Most of them are good stuff anyway, I agree.

Since I like to believe that static calls _are_ close, I queued these:

      KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
      KVM: nVMX: Unconditionally validate CR3 during nested transitions
      KVM: VMX: Add proper cache tracking for CR4
      KVM: VMX: Add proper cache tracking for CR0
      KVM: VMX: Move nested EPT out of kvm_x86_ops.get_tdp_level() hook
      KVM: x86/mmu: Capture TDP level when updating CPUID

and I don't disagree with the DR6 one though it can be even improved a
bit so I'll send a patch myself.

Paolo

Sean Christopherson May 4, 2020, 3:09 p.m. UTC | #2
On Mon, May 04, 2020 at 03:25:58PM +0200, Paolo Bonzini wrote:
> On 02/05/20 06:32, Sean Christopherson wrote:
> > A smattering of optimizations geared toward avoiding retpolines, though
> > IMO most of the patches are worthwhile changes irrespective of retpolines.
> > I can split this up into separate patches if desired, outside of the
> > obvious combos there are no dependencies.
> 
> Most of them are good stuff anyway, I agree.
> 
> Since I like to believe that static calls _are_ close, I queued these:
> 
>       KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
>       KVM: nVMX: Unconditionally validate CR3 during nested transitions
>       KVM: VMX: Add proper cache tracking for CR4
>       KVM: VMX: Add proper cache tracking for CR0
>       KVM: VMX: Move nested EPT out of kvm_x86_ops.get_tdp_level() hook
>       KVM: x86/mmu: Capture TDP level when updating CPUID
> 
> and I don't disagree with the DR6 one though it can be even improved a
> bit so I'll send a patch myself.

Sounds good, thanks!