diff mbox series

[14/17] KVM: monolithic: x86: inline more exit handlers in vmx.c

Message ID 20190920212509.2578-15-aarcange@redhat.com (mailing list archive)
State New, archived
Headers show
Series KVM monolithic v1 | expand

Commit Message

Andrea Arcangeli Sept. 20, 2019, 9:25 p.m. UTC
They can be called directly more efficiently, so we can as well mark
some of them inline in case gcc doesn't decide to inline them.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Sept. 23, 2019, 10:19 a.m. UTC | #1
On 20/09/19 23:25, Andrea Arcangeli wrote:
> They can be called directly more efficiently, so we can as well mark
> some of them inline in case gcc doesn't decide to inline them.

What is the output of size(1) before and after?

Paolo

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ff46008dc514..a6e597025011 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4588,7 +4588,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> -static int handle_external_interrupt(struct kvm_vcpu *vcpu)
> +static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.irq_exits;
>  	return 1;
> @@ -4860,7 +4860,7 @@ static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
>  	vmcs_writel(GUEST_DR7, val);
>  }
>  
> -static int handle_cpuid(struct kvm_vcpu *vcpu)
> +static __always_inline int handle_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_emulate_cpuid(vcpu);
>  }
> @@ -4891,7 +4891,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> -static int handle_halt(struct kvm_vcpu *vcpu)
> +static __always_inline int handle_halt(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_emulate_halt(vcpu);
>  }
>
Andrea Arcangeli Sept. 24, 2019, 1 a.m. UTC | #2
On Mon, Sep 23, 2019 at 12:19:12PM +0200, Paolo Bonzini wrote:
> On 20/09/19 23:25, Andrea Arcangeli wrote:
> > They can be called directly more efficiently, so we can as well mark
> > some of them inline in case gcc doesn't decide to inline them.
> 
> What is the output of size(1) before and after?

Before and after this specific commit there is a difference with gcc 8.3.

full patchset applied

 753699   87971    9616  851286   cfd56 build/arch/x86/kvm/kvm-intel.ko

git revert

 753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko

git reset --hard HEAD^

 753699   87971    9616  851286   cfd56  build/arch/x86/kvm/kvm-intel.ko

git revert

 753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko
Paolo Bonzini Sept. 24, 2019, 1:25 a.m. UTC | #3
On 24/09/19 03:00, Andrea Arcangeli wrote:
> Before and after this specific commit there is a difference with gcc 8.3.
> 
> full patchset applied
> 
>  753699   87971    9616  851286   cfd56 build/arch/x86/kvm/kvm-intel.ko
> 
> git revert
> 
>  753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko
> 
> git reset --hard HEAD^
> 
>  753699   87971    9616  851286   cfd56  build/arch/x86/kvm/kvm-intel.ko
> 
> git revert
> 
>  753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko

So it's forty bytes.  I think we can leave this out.

Paolo
Andrea Arcangeli Sept. 24, 2019, 1:55 a.m. UTC | #4
On Tue, Sep 24, 2019 at 03:25:34AM +0200, Paolo Bonzini wrote:
> On 24/09/19 03:00, Andrea Arcangeli wrote:
> > Before and after this specific commit there is a difference with gcc 8.3.
> > 
> > full patchset applied
> > 
> >  753699   87971    9616  851286   cfd56 build/arch/x86/kvm/kvm-intel.ko
> > 
> > git revert
> > 
> >  753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko
> > 
> > git reset --hard HEAD^
> > 
> >  753699   87971    9616  851286   cfd56  build/arch/x86/kvm/kvm-intel.ko
> > 
> > git revert
> > 
> >  753739   87971    9616  851326   cfd7e  build/arch/x86/kvm/kvm-intel.ko
> 
> So it's forty bytes.  I think we can leave this out.

This commit I reverted adds literally 3 inlines called by 3 functions,
in a very fast path, how many bytes of .text difference did you expect
by dropping some call/ret from a very fast path when you asked me to
test it? I mean it's just a couple of insn each.

I thought the question was if gcc was already inlining without the
hint or not or if it actually grew in size in case I got it wrong and
there were many callers and it was better off not inline, so now I
don't get what was the point of this test if with the result that
confirms it's needed, the patch should be dropped.

It's possible that this patch may not be relevant anymore with the
rename in place of the vmx/svm functions, but if this patch is to be
dropped with the optimal result, then I recommend you to go ahead and
submit a patch to drop __always_inline from the whole kernel because
if it's not good to use it here in a extreme fast path like
handle_external_interrupt and handle_halt, then I don't know what
__always_inline is good for anywhere else in the kernel.

Thanks,
Andrea
Andrea Arcangeli Sept. 24, 2019, 2:56 a.m. UTC | #5
On Mon, Sep 23, 2019 at 09:55:27PM -0400, Andrea Arcangeli wrote:
> This commit I reverted adds literally 3 inlines called by 3 functions,
> in a very fast path, how many bytes of .text difference did you expect
> by dropping some call/ret from a very fast path when you asked me to
> test it? I mean it's just a couple of insn each.
> 
> I thought the question was if gcc was already inlining without the
> hint or not or if it actually grew in size in case I got it wrong and
> there were many callers and it was better off not inline, so now I
> don't get what was the point of this test if with the result that
> confirms it's needed, the patch should be dropped.
> 
> It's possible that this patch may not be relevant anymore with the
> rename in place of the vmx/svm functions, but if this patch is to be
> dropped with the optimal result, then I recommend you to go ahead and
> submit a patch to drop __always_inline from the whole kernel because
> if it's not good to use it here in a extreme fast path like
> handle_external_interrupt and handle_halt, then I don't know what
> __always_inline is good for anywhere else in the kernel.

Thinking more at this I don't think the result of the size check was
nearly enough to come to any conclusion. The only probably conclusion
one can take is that if the size didn't change it was a fail, because
there would be high probability that it wouldn't be beneficial because
it was a noop (even that wouldn't be 100% certain).

One needs to look at why it changed to take any conclusion, and the
reason it got smaller is that all dynamic tracing for ftrace was
dropped, the functions were already inlined fine in the RETPOLINE
case.

Those are tiny functions so it looks a positive thing to make them as
small as possible, there are already proper tracepoints in
kvm_exit/kvm_entry for bpf tracing all all KVM events so there's no
need of the fentry in those tiny functions with just an instruction
that are only ever compiled as static because of the pointer to
function.

This however also means it helps the CONFIG_RETPOLINE=n case and it
doesn't help at all the CONFIG_RETPOLINE=y case, so it's fully
orthogonal to the patchset and can be splitted off but I think it's
worth it.

Thanks,
Andrea
Paolo Bonzini Sept. 25, 2019, 7:52 a.m. UTC | #6
On 24/09/19 03:55, Andrea Arcangeli wrote:
>> So it's forty bytes.  I think we can leave this out.
> This commit I reverted adds literally 3 inlines called by 3 functions,
> in a very fast path, how many bytes of .text difference did you expect
> by dropping some call/ret from a very fast path when you asked me to
> test it? I mean it's just a couple of insn each.

Actually I was either expecting the difference to be zero, meaning GCC
was already inlining them.

I think it is not inlining the functions because they are still
referenced by vmx_exit_handlers.  After patch 15 you could drop them
from the array, and then GCC should inline them.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ff46008dc514..a6e597025011 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4588,7 +4588,7 @@  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int handle_external_interrupt(struct kvm_vcpu *vcpu)
+static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.irq_exits;
 	return 1;
@@ -4860,7 +4860,7 @@  static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 	vmcs_writel(GUEST_DR7, val);
 }
 
-static int handle_cpuid(struct kvm_vcpu *vcpu)
+static __always_inline int handle_cpuid(struct kvm_vcpu *vcpu)
 {
 	return kvm_emulate_cpuid(vcpu);
 }
@@ -4891,7 +4891,7 @@  static int handle_interrupt_window(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static int handle_halt(struct kvm_vcpu *vcpu)
+static __always_inline int handle_halt(struct kvm_vcpu *vcpu)
 {
 	return kvm_emulate_halt(vcpu);
 }