| Message ID | 1648872113-24329-1-git-send-email-lirongqing@baidu.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: VMX: optimize pi_wakeup_handler |
On 4/2/22 06:01, Li RongQing wrote:
> pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> list_for_each_entry is used in it, and whose input is other function
> per_cpu(), That cause that per_cpu() be invoked at least twice when
> there is one sleep vCPU
>
> so optimize pi_wakeup_handler it by reading once which is safe in
> spinlock protection
>
> and same to per CPU spinlock

What's the difference in the generated code?

Paolo

> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  arch/x86/kvm/vmx/posted_intr.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 5fdabf3..0dae431 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -214,17 +214,21 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
>   */
>  void pi_wakeup_handler(void)
>  {
> +	struct list_head *wakeup_list;
>  	int cpu = smp_processor_id();
> +	raw_spinlock_t *spinlock;
>  	struct vcpu_vmx *vmx;
>
> -	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> -	list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
> -			    pi_wakeup_list) {
> +	spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
> +
> +	raw_spin_lock(spinlock);
> +	wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
> +	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
>
>  		if (pi_test_on(&vmx->pi_desc))
>  			kvm_vcpu_wake_up(&vmx->vcpu);
>  	}
> -	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> +	raw_spin_unlock(spinlock);
>  }
>
>  void __init pi_init_cpu(int cpu)
> -----Original Message-----
> From: Paolo Bonzini <paolo.bonzini@gmail.com> On Behalf Of Paolo Bonzini
> Sent: April 2, 2022 16:14
> To: Li,Rongqing <lirongqing@baidu.com>; kvm@vger.kernel.org;
> seanjc@google.com; vkuznets@redhat.com
> Subject: Re: [PATCH] KVM: VMX: optimize pi_wakeup_handler
>
> On 4/2/22 06:01, Li RongQing wrote:
> > pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> > list_for_each_entry is used in it, and whose input is other function
> > per_cpu(), That cause that per_cpu() be invoked at least twice when
> > there is one sleep vCPU
> >
> > so optimize pi_wakeup_handler it by reading once which is safe in
> > spinlock protection
> >
> > and same to per CPU spinlock
>
> What's the difference in the generated code?

This reduces the generated code by about one fifth.

Without this patch:

0000000000000400 <pi_wakeup_handler>:
 400:	e8 00 00 00 00       	callq  405 <pi_wakeup_handler+0x5>
 405:	55                   	push   %rbp
 406:	48 89 e5             	mov    %rsp,%rbp
 409:	41 57                	push   %r15
 40b:	41 56                	push   %r14
 40d:	41 55                	push   %r13
 40f:	41 54                	push   %r12
 411:	49 c7 c4 00 00 00 00 	mov    $0x0,%r12
 418:	65 44 8b 2d 00 00 00 	mov    %gs:0x0(%rip),%r13d        # 420 <pi_wakeup_handler+0x20>
 41f:	00
 420:	4d 63 ed             	movslq %r13d,%r13
 423:	4c 89 e7             	mov    %r12,%rdi
 426:	53                   	push   %rbx
 427:	4a 03 3c ed 00 00 00 	add    0x0(,%r13,8),%rdi
 42e:	00
 42f:	49 c7 c6 00 00 00 00 	mov    $0x0,%r14
 436:	e8 00 00 00 00       	callq  43b <pi_wakeup_handler+0x3b>
 43b:	4a 8b 0c ed 00 00 00 	mov    0x0(,%r13,8),%rcx
 442:	00
 443:	4c 89 f0             	mov    %r14,%rax
 446:	48 8b 14 01          	mov    (%rcx,%rax,1),%rdx
 44a:	48 01 c8             	add    %rcx,%rax
 44d:	48 39 c2             	cmp    %rax,%rdx
 450:	74 3e                	je     490 <pi_wakeup_handler+0x90>
 452:	48 8d 9a 40 d9 ff ff 	lea    -0x26c0(%rdx),%rbx
 459:	49 c7 c7 00 00 00 00 	mov    $0x0,%r15
 460:	48 8b 83 a0 26 00 00 	mov    0x26a0(%rbx),%rax
 467:	a8 01                	test   $0x1,%al
 469:	74 08                	je     473 <pi_wakeup_handler+0x73>
 46b:	48 89 df             	mov    %rbx,%rdi
 46e:	e8 00 00 00 00       	callq  473 <pi_wakeup_handler+0x73>
 473:	4b 8b 0c ef          	mov    (%r15,%r13,8),%rcx
 477:	48 8b 93 c0 26 00 00 	mov    0x26c0(%rbx),%rdx
 47e:	4c 89 f0             	mov    %r14,%rax
 481:	48 01 c8             	add    %rcx,%rax
 484:	48 8d 9a 40 d9 ff ff 	lea    -0x26c0(%rdx),%rbx
 48b:	48 39 c2             	cmp    %rax,%rdx
 48e:	75 d0                	jne    460 <pi_wakeup_handler+0x60>
 490:	49 01 cc             	add    %rcx,%r12
 493:	41 c6 04 24 00       	movb   $0x0,(%r12)
 498:	5b                   	pop    %rbx
 499:	41 5c                	pop    %r12
 49b:	41 5d                	pop    %r13
 49d:	41 5e                	pop    %r14
 49f:	41 5f                	pop    %r15
 4a1:	5d                   	pop    %rbp
 4a2:	c3                   	retq
 4a3:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
 4aa:	00 00
 4ac:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
 4b3:	00 00 00
 4b6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
 4bd:	00 00 00

With this patch:

 400:	e8 00 00 00 00       	callq  405 <pi_wakeup_handler+0x5>
 405:	55                   	push   %rbp
 406:	48 89 e5             	mov    %rsp,%rbp
 409:	41 55                	push   %r13
 40b:	41 54                	push   %r12
 40d:	53                   	push   %rbx
 40e:	49 c7 c5 00 00 00 00 	mov    $0x0,%r13
 415:	49 c7 c4 00 00 00 00 	mov    $0x0,%r12
 41c:	65 8b 1d 00 00 00 00 	mov    %gs:0x0(%rip),%ebx        # 423 <pi_wakeup_handler+0x23>
 423:	48 63 db             	movslq %ebx,%rbx
 426:	4c 03 2c dd 00 00 00 	add    0x0(,%rbx,8),%r13
 42d:	00
 42e:	4c 89 ef             	mov    %r13,%rdi
 431:	e8 00 00 00 00       	callq  436 <pi_wakeup_handler+0x36>
 436:	4c 03 24 dd 00 00 00 	add    0x0(,%rbx,8),%r12
 43d:	00
 43e:	49 8b 04 24          	mov    (%r12),%rax
 442:	49 39 c4             	cmp    %rax,%r12
 445:	74 2d                	je     474 <pi_wakeup_handler+0x74>
 447:	48 8d 98 40 d9 ff ff 	lea    -0x26c0(%rax),%rbx
 44e:	48 8b 83 a0 26 00 00 	mov    0x26a0(%rbx),%rax
 455:	a8 01                	test   $0x1,%al
 457:	74 08                	je     461 <pi_wakeup_handler+0x61>
 459:	48 89 df             	mov    %rbx,%rdi
 45c:	e8 00 00 00 00       	callq  461 <pi_wakeup_handler+0x61>
 461:	48 8b 83 c0 26 00 00 	mov    0x26c0(%rbx),%rax
 468:	49 39 c4             	cmp    %rax,%r12
 46b:	48 8d 98 40 d9 ff ff 	lea    -0x26c0(%rax),%rbx
 472:	75 da                	jne    44e <pi_wakeup_handler+0x4e>
 474:	41 c6 45 00 00       	movb   $0x0,0x0(%r13)
 479:	5b                   	pop    %rbx
 47a:	41 5c                	pop    %r12
 47c:	41 5d                	pop    %r13
 47e:	5d                   	pop    %rbp
 47f:	c3                   	retq

There is a similar patch: 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7

-Li
On Sat, Apr 02, 2022, Li,Rongqing wrote:
> > From: Paolo Bonzini <paolo.bonzini@gmail.com> On Behalf Of Paolo Bonzini
> > On 4/2/22 06:01, Li RongQing wrote:
> > > pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> > > list_for_each_entry is used in it, and whose input is other function
> > > per_cpu(), That cause that per_cpu() be invoked at least twice when
> > > there is one sleep vCPU
> > >
> > > so optimize pi_wakeup_handler it by reading once which is safe in
> > > spinlock protection

There's no need to protect reading the per-cpu variable with the spinlock,
only walking the list needs to be protected.  E.g. the code can be compacted
to:

	int cpu = smp_processor_id();
	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
	struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
	struct vcpu_vmx *vmx;

	raw_spin_lock(spinlock);
	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
		if (pi_test_on(&vmx->pi_desc))
			kvm_vcpu_wake_up(&vmx->vcpu);
	}
	raw_spin_unlock(spinlock);

> > > and same to per CPU spinlock
> >
> > What's the difference in the generated code?
>
> This reduces one fifth asm codes

...

> these is a similar patch 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7

Is there a measurable performance improvement though?  I don't dislike the
patch, but it probably falls into the "technically an optimization but no one
will ever notice" category.
> There's no need to protect reading the per-cpu variable with the spinlock,
> only walking the list needs to be protected. E.g. the code can be compacted
> to:

Thanks

> > > What's the difference in the generated code?
> >
> > This reduces one fifth asm codes
>
> ...
>
> > these is a similar patch 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
>
> Is there a measurable performance improvement though? I don't dislike the
> patch, but it probably falls into the "technically an optimization but no
> one will ever notice" category.

There is a small performance improvement when "perf bench sched pipe" is run
on a CPU that supports IPI virtualization, with halt/mwait not passed through
to the VM (with
https://patchwork.kernel.org/project/kvm/patch/20220304080725.18135-9-guang.zeng@intel.com/
applied).

Thanks

-Li
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5fdabf3..0dae431 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -214,17 +214,21 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
  */
 void pi_wakeup_handler(void)
 {
+	struct list_head *wakeup_list;
 	int cpu = smp_processor_id();
+	raw_spinlock_t *spinlock;
 	struct vcpu_vmx *vmx;
 
-	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
-	list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
-			    pi_wakeup_list) {
+	spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
+
+	raw_spin_lock(spinlock);
+	wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
+	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
 
 		if (pi_test_on(&vmx->pi_desc))
 			kvm_vcpu_wake_up(&vmx->vcpu);
 	}
-	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_unlock(spinlock);
 }
 
 void __init pi_init_cpu(int cpu)
pi_wakeup_handler() wakes sleeping vCPUs via posted interrupts. It walks the
per-CPU wakeup list with list_for_each_entry(), whose head argument is the
per_cpu() expression; as a result, per_cpu() is evaluated at least twice
whenever there is a sleeping vCPU.

Optimize pi_wakeup_handler() by reading the per-CPU list head once, which is
safe under the spinlock's protection, and do the same for the per-CPU
spinlock.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/x86/kvm/vmx/posted_intr.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)