Message ID | 20190920212509.2578-15-aarcange@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM monolithic v1 | expand |
On 20/09/19 23:25, Andrea Arcangeli wrote: > They can be called directly more efficiently, so we can as well mark > some of them inline in case gcc doesn't decide to inline them. What is the output of size(1) before and after? Paolo > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > arch/x86/kvm/vmx/vmx.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index ff46008dc514..a6e597025011 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -4588,7 +4588,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) > return 0; > } > > -static int handle_external_interrupt(struct kvm_vcpu *vcpu) > +static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu) > { > ++vcpu->stat.irq_exits; > return 1; > @@ -4860,7 +4860,7 @@ static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) > vmcs_writel(GUEST_DR7, val); > } > > -static int handle_cpuid(struct kvm_vcpu *vcpu) > +static __always_inline int handle_cpuid(struct kvm_vcpu *vcpu) > { > return kvm_emulate_cpuid(vcpu); > } > @@ -4891,7 +4891,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu) > return 1; > } > > -static int handle_halt(struct kvm_vcpu *vcpu) > +static __always_inline int handle_halt(struct kvm_vcpu *vcpu) > { > return kvm_emulate_halt(vcpu); > } >
On Mon, Sep 23, 2019 at 12:19:12PM +0200, Paolo Bonzini wrote: > On 20/09/19 23:25, Andrea Arcangeli wrote: > > They can be called directly more efficiently, so we can as well mark > > some of them inline in case gcc doesn't decide to inline them. > > What is the output of size(1) before and after? Before and after this specific commit there is a difference with gcc 8.3. full patchset applied 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko git revert 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko git reset --hard HEAD^ 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko git revert 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko
On 24/09/19 03:00, Andrea Arcangeli wrote: > Before and after this specific commit there is a difference with gcc 8.3. > > full patchset applied > > 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko > > git revert > > 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko > > git reset --hard HEAD^ > > 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko > > git revert > > 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko So it's forty bytes. I think we can leave this out. Paolo
On Tue, Sep 24, 2019 at 03:25:34AM +0200, Paolo Bonzini wrote: > On 24/09/19 03:00, Andrea Arcangeli wrote: > > Before and after this specific commit there is a difference with gcc 8.3. > > > > full patchset applied > > > > 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko > > > > git revert > > > > 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko > > > > git reset --hard HEAD^ > > > > 753699 87971 9616 851286 cfd56 build/arch/x86/kvm/kvm-intel.ko > > > > git revert > > > > 753739 87971 9616 851326 cfd7e build/arch/x86/kvm/kvm-intel.ko > > So it's forty bytes. I think we can leave this out. This commit I reverted adds literally 3 inlines called by 3 functions, in a very fast path, how many bytes of .text difference did you expect by dropping some call/ret from a very fast path when you asked me to test it? I mean it's just a couple of insn each. I thought the question was if gcc was already inlining without the hint or not or if it actually grew in size in case I got it wrong and there were many callers and it was better off not inline, so now I don't get what was the point of this test if with the result that confirms it's needed, the patch should be dropped. It's possible that this patch may not be relevant anymore with the rename in place of the vmx/svm functions, but if this patch is to be dropped with the optimal result, then I recommend you to go ahead and submit a patch to drop __always_inline from the whole kernel because if it's not good to use it here in a extreme fast path like handle_external_interrupt and handle_halt, then I don't know what __always_inline is good for anywhere else in the kernel. Thanks, Andrea
On Mon, Sep 23, 2019 at 09:55:27PM -0400, Andrea Arcangeli wrote: > This commit I reverted adds literally 3 inlines called by 3 functions, > in a very fast path, how many bytes of .text difference did you expect > by dropping some call/ret from a very fast path when you asked me to > test it? I mean it's just a couple of insn each. > > I thought the question was if gcc was already inlining without the > hint or not or if it actually grew in size in case I got it wrong and > there were many callers and it was better off not inline, so now I > don't get what was the point of this test if with the result that > confirms it's needed, the patch should be dropped. > > It's possible that this patch may not be relevant anymore with the > rename in place of the vmx/svm functions, but if this patch is to be > dropped with the optimal result, then I recommend you to go ahead and > submit a patch to drop __always_inline from the whole kernel because > if it's not good to use it here in a extreme fast path like > handle_external_interrupt and handle_halt, then I don't know what > __always_inline is good for anywhere else in the kernel. Thinking more at this I don't think the result of the size check was nearly enough to come to any conclusion. The only probably conclusion one can take is that if the size didn't change it was a fail, because there would be high probability that it wouldn't be beneficial because it was a noop (even that wouldn't be 100% certain). One needs to look at why it changed to take any conclusion, and the reason it got smaller is that all dynamic tracing for ftrace was dropped, the functions were already inlined fine in the RETPOLINE case. Those are tiny functions so it looks a positive thing to make them as small as possible, there are already proper tracepoints in kvm_exit/kvm_entry for bpf tracing all all KVM events so there's no need of the fentry in those tiny functions with just an instruction that are only ever compiled as static because of the pointer to function. This however also means it helps the CONFIG_RETPOLINE=n case and it doesn't help at all the CONFIG_RETPOLINE=y case, so it's fully orthogonal to the patchset and can be splitted off but I think it's worth it. Thanks, Andrea
On 24/09/19 03:55, Andrea Arcangeli wrote: >> So it's forty bytes. I think we can leave this out. > This commit I reverted adds literally 3 inlines called by 3 functions, > in a very fast path, how many bytes of .text difference did you expect > by dropping some call/ret from a very fast path when you asked me to > test it? I mean it's just a couple of insn each. Actually I was either expecting the difference to be zero, meaning GCC was already inlining them. I think it is not inlining the functions because they are still referenced by vmx_exit_handlers. After patch 15 you could drop them from the array, and then GCC should inline them. Paolo
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index ff46008dc514..a6e597025011 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4588,7 +4588,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) return 0; } -static int handle_external_interrupt(struct kvm_vcpu *vcpu) +static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu) { ++vcpu->stat.irq_exits; return 1; @@ -4860,7 +4860,7 @@ static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) vmcs_writel(GUEST_DR7, val); } -static int handle_cpuid(struct kvm_vcpu *vcpu) +static __always_inline int handle_cpuid(struct kvm_vcpu *vcpu) { return kvm_emulate_cpuid(vcpu); } @@ -4891,7 +4891,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu) return 1; } -static int handle_halt(struct kvm_vcpu *vcpu) +static __always_inline int handle_halt(struct kvm_vcpu *vcpu) { return kvm_emulate_halt(vcpu); }
They can be called directly more efficiently, so we can as well mark some of them inline in case gcc doesn't decide to inline them. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- arch/x86/kvm/vmx/vmx.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)