[2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()

Message ID 20250414081131.97374-2-ubizjak@gmail.com (mailing list archive)
State New
Series [1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]()

Commit Message

Uros Bizjak April 14, 2025, 8:10 a.m. UTC
Micro-optimize vmx_do_interrupt_irqoff() by replacing the
MOV %RBP,%RSP; POP %RBP instruction sequence with the equivalent
LEAVE instruction. GCC does this by default for generic tuning
and for all modern processors:

DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
	  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
	  | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)

The new code also saves three bytes on x86-64, from:

  27:	48 89 ec             	mov    %rbp,%rsp
  2a:	5d                   	pop    %rbp

to:

  27:	c9                   	leave

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kvm/vmx/vmenter.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
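
For readers who want to check the equivalence claim, here is a minimal
sketch in Python (not kernel code) that models both epilogues on a toy
stack. The function names, addresses and values are hypothetical and chosen
only for illustration. The byte strings are taken from the disassembly in
the commit message. The model assumes 8-byte stack slots (x86-64) and
relies on the architectural definition of LEAVE as "copy RBP to RSP, then
pop RBP".

```python
# Toy model of the x86-64 stack: "mem" maps an address to an 8-byte slot.
# All names and addresses are hypothetical, for illustration only.

# Encodings copied from the disassembly in the commit message.
MOV_POP_BYTES = bytes.fromhex("4889ec" "5d")  # mov %rbp,%rsp ; pop %rbp
LEAVE_BYTES = bytes.fromhex("c9")             # leave


def epilogue_mov_pop(rsp, rbp, mem):
    """Model of: mov %rbp,%rsp ; pop %rbp"""
    rsp = rbp         # mov %rbp,%rsp: discard everything below the frame
    rbp = mem[rsp]    # pop %rbp: reload the caller's frame pointer ...
    rsp += 8          # ... and release its stack slot
    return rsp, rbp


def epilogue_leave(rsp, rbp, mem):
    """Model of: leave (RSP <- RBP, then RBP <- pop)"""
    saved_rbp = mem[rbp]
    return rbp + 8, saved_rbp


def demo():
    # Saved RBP lives at 0x1000, the return address right above it.
    mem = {0x1000: 0x2000, 0x1008: 0xDEADBEEF}
    # RSP sits below the frame, e.g. after dynamic stack realignment.
    rsp, rbp = 0x0FC0, 0x1000
    return epilogue_mov_pop(rsp, rbp, mem), epilogue_leave(rsp, rbp, mem)


def bytes_saved():
    return len(MOV_POP_BYTES) - len(LEAVE_BYTES)
```

Both models leave RSP pointing at the return address and restore the
caller's RBP, whatever RSP was on entry to the epilogue, which is the
property the macro relies on before RET. The model says nothing about
performance; it only illustrates the architectural state and the size
difference.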

Comments

Sean Christopherson April 15, 2025, 1:05 a.m. UTC | #1
On Mon, Apr 14, 2025, Uros Bizjak wrote:
> Micro-optimize vmx_do_interrupt_irqoff() by substituting
> MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
> LEAVE instruction. GCC compiler does this by default for
> a generic tuning and for all modern processors:

Out of curiosity, is LEAVE actually a performance win, or is the benefit essentially
just the few code bytes saved?

> DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
> 	  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
> 	  | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)
> 
> The new code also saves a couple of bytes, from:
> 
>   27:	48 89 ec             	mov    %rbp,%rsp
>   2a:	5d                   	pop    %rbp
> 
> to:
> 
>   27:	c9                   	leave
> 
> No functional change intended.
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/kvm/vmx/vmenter.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index f6986dee6f8c..0a6cf5bff2aa 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -59,8 +59,7 @@
>  	 * without the explicit restore, thinks the stack is getting walloped.
>  	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
>  	 */
> -	mov %_ASM_BP, %_ASM_SP
> -	pop %_ASM_BP
> +	leave
>  	RET
>  .endm
>  
> -- 
> 2.49.0
>
Uros Bizjak April 15, 2025, 7:42 a.m. UTC | #2
On Tue, Apr 15, 2025 at 3:05 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Apr 14, 2025, Uros Bizjak wrote:
> > Micro-optimize vmx_do_interrupt_irqoff() by substituting
> > MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
> > LEAVE instruction. GCC compiler does this by default for
> > a generic tuning and for all modern processors:
>
> Out of curiosity, is LEAVE actually a performance win, or is the benefit essentially
> just the few code bytes saved?

It is hard to say for out-of-order execution cores, especially when
the stack engine is thrown into the mix (these two instructions, plus
the following RET, all update %rsp).

The pragmatic solution was to do what the compiler does and use the
compiler's choice, based on the tuning below.

> > DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
> >         m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
> >         | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)

The tuning is updated when a new target is introduced to the compiler
and is based on various measurements by the processor manufacturer.
The above covers the majority of recent processors (plus generic
tuning), so I guess we won't go wrong by following suit. OTOH, any
performance difference will be negligible.

> > The new code also saves a couple of bytes, from:
> >
> >   27: 48 89 ec                mov    %rbp,%rsp
> >   2a: 5d                      pop    %rbp
> >
> > to:
> >
> >   27: c9                      leave

Thanks,
Uros.

Patch

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index f6986dee6f8c..0a6cf5bff2aa 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -59,8 +59,7 @@ 
 	 * without the explicit restore, thinks the stack is getting walloped.
 	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
 	 */
-	mov %_ASM_BP, %_ASM_SP
-	pop %_ASM_BP
+	leave
 	RET
 .endm