
[RESEND] KVM: Boosting vCPUs that are delivering interrupts

Message ID 1562915730-9490-1-git-send-email-wanpengli@tencent.com (mailing list archive)
State New, archived
Series [RESEND] KVM: Boosting vCPUs that are delivering interrupts

Commit Message

Wanpeng Li July 12, 2019, 7:15 a.m. UTC
From: Wanpeng Li <wanpengli@tencent.com>

Inspired by commit 9cac38dd5d ("KVM/s390: Set preempted flag during vcpu
wakeup and interrupt delivery"), we want to boost not only the lock holder
but also vCPUs that are delivering interrupts. Most smp_call_function_many
calls are synchronous IPI calls, so the IPI target vCPUs are also good
yield candidates. This patch sets the preempted flag at wakeup and
interrupt delivery time.
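
For context, kvm_vcpu_on_spin() only considers vCPUs whose preempted flag is
set as directed-yield candidates, so a vCPU that went to sleep in
kvm_vcpu_block() and is then kicked by an interrupt is never boosted. The
candidate loop looks roughly like this (an abridged sketch of
virt/kvm/kvm_main.c around this time; details vary by kernel version):

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!READ_ONCE(vcpu->preempted))
			continue;	/* skipped unless marked preempted */
		if (vcpu == me)
			continue;
		if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu))
			continue;
		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
			continue;

		yielded = kvm_vcpu_yield_to(vcpu);
		if (yielded > 0)
			break;
	}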

Testing on an 80-HT, 2-socket Xeon Skylake server; each VM has 80 vCPUs and 80 GB RAM:
ebizzy -M

            vanilla     boosting    improved
1VM          23000       21232        -9%                      
2VM           2800        8000       180%
3VM           1800        3100        72%

Testing on my 8-HT Haswell desktop with two VMs, each with 8 vCPUs and 8 GB RAM;
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)   

            vanilla     boosting   improved 
              1570         4000       55%

w/ boosting + w/ pv sched yield(vanilla)

            vanilla     boosting   improved 
              1844         5157       79%   

w/o boosting, perf top in VM:

 72.33%  [kernel]       [k] smp_call_function_many
  4.22%  [kernel]       [k] call_function_i
  3.71%  [kernel]       [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]       [k] smp_call_function_many
  6.31%  [kernel]       [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]       [k] call_function_interrupt

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 virt/kvm/kvm_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Wanpeng Li July 12, 2019, 7:25 a.m. UTC | #1
On Fri, 12 Jul 2019 at 15:15, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup
> and interrupt delivery), except the lock holder, we want to also boost vCPUs
> that are delivering interrupts. Actually most smp_call_function_many calls are
> synchronous ipi calls, the ipi target vCPUs are also good yield candidates.
> This patch sets preempted flag during wakeup and interrupt delivery time.
>

I forgot to mention that I disabled pv tlb shootdown during testing;
function call interrupts are not easy to trigger directly from userspace
workloads. In addition, distro guest kernels without pv tlb shootdown
support can also benefit in both the tlb shootdown and function call
interrupt scenarios.
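
The reason IPI targets matter here is that these function call interrupts
come from smp_call_function_many() with wait=true, so the sending vCPU
spins until every target has run the callback; a preempted target vCPU
therefore stalls the sender. A minimal illustration of the generic API
(my_flush_cb and its info argument are hypothetical, for illustration only):

	static void my_flush_cb(void *info)
	{
		/* e.g. flush this CPU's TLB for the range described by info */
	}

	/* wait == true: the caller busy-waits until every CPU in 'mask' has
	 * executed my_flush_cb, so a descheduled target vCPU stalls it. */
	smp_call_function_many(mask, my_flush_cb, info, true);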

> Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
> ebizzy -M
>
>             vanilla     boosting    improved
> 1VM          23000       21232        -9%
> 2VM           2800        8000       180%
> 3VM           1800        3100        72%
>
> Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
> one running ebizzy -M, the other running 'stress --cpu 2':
>
> w/ boosting + w/o pv sched yield(vanilla)
>
>             vanilla     boosting   improved
>               1570         4000       55%
>
> w/ boosting + w/ pv sched yield(vanilla)
>
>             vanilla     boosting   improved
>               1844         5157       79%
>
> w/o boosting, perf top in VM:
>
>  72.33%  [kernel]       [k] smp_call_function_many
>   4.22%  [kernel]       [k] call_function_i
>   3.71%  [kernel]       [k] async_page_fault
>
> w/ boosting, perf top in VM:
>
>  38.43%  [kernel]       [k] smp_call_function_many
>   6.31%  [kernel]       [k] async_page_fault
>   6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
>   4.88%  [kernel]       [k] call_function_interrupt
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  virt/kvm/kvm_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ab59d..2c46705 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>         int me;
>         int cpu = vcpu->cpu;
>
> -       if (kvm_vcpu_wake_up(vcpu))
> +       if (kvm_vcpu_wake_up(vcpu)) {
> +               vcpu->preempted = true;
>                 return;
> +       }
>
>         me = get_cpu();
>         if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> --
> 2.7.4
>
Wanpeng Li July 18, 2019, 6:09 a.m. UTC | #2
Cc'ing the Arm maintainer's latest email address.
On Fri, 12 Jul 2019 at 15:15, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup
> and interrupt delivery), except the lock holder, we want to also boost vCPUs
> that are delivering interrupts. Actually most smp_call_function_many calls are
> synchronous ipi calls, the ipi target vCPUs are also good yield candidates.
> This patch sets preempted flag during wakeup and interrupt delivery time.
>
> Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
> ebizzy -M
>
>             vanilla     boosting    improved
> 1VM          23000       21232        -9%
> 2VM           2800        8000       180%
> 3VM           1800        3100        72%
>
> Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
> one running ebizzy -M, the other running 'stress --cpu 2':
>
> w/ boosting + w/o pv sched yield(vanilla)
>
>             vanilla     boosting   improved
>               1570         4000       55%
>
> w/ boosting + w/ pv sched yield(vanilla)
>
>             vanilla     boosting   improved
>               1844         5157       79%
>
> w/o boosting, perf top in VM:
>
>  72.33%  [kernel]       [k] smp_call_function_many
>   4.22%  [kernel]       [k] call_function_i
>   3.71%  [kernel]       [k] async_page_fault
>
> w/ boosting, perf top in VM:
>
>  38.43%  [kernel]       [k] smp_call_function_many
>   6.31%  [kernel]       [k] async_page_fault
>   6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
>   4.88%  [kernel]       [k] call_function_interrupt
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  virt/kvm/kvm_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ab59d..2c46705 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>         int me;
>         int cpu = vcpu->cpu;
>
> -       if (kvm_vcpu_wake_up(vcpu))
> +       if (kvm_vcpu_wake_up(vcpu)) {
> +               vcpu->preempted = true;
>                 return;
> +       }
>
>         me = get_cpu();
>         if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> --
> 2.7.4
>
Paolo Bonzini July 18, 2019, 7:59 a.m. UTC | #3
On 12/07/19 09:15, Wanpeng Li wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ab59d..2c46705 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>  	int me;
>  	int cpu = vcpu->cpu;
>  
> -	if (kvm_vcpu_wake_up(vcpu))
> +	if (kvm_vcpu_wake_up(vcpu)) {
> +		vcpu->preempted = true;
>  		return;
> +	}
>  
>  	me = get_cpu();
>  	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> 

Who is resetting vcpu->preempted to false in this case?  This also
applies to s390 in fact.

Paolo
Christian Borntraeger July 18, 2019, 8:15 a.m. UTC | #4
On 18.07.19 09:59, Paolo Bonzini wrote:
> On 12/07/19 09:15, Wanpeng Li wrote:
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index b4ab59d..2c46705 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>>  	int me;
>>  	int cpu = vcpu->cpu;
>>  
>> -	if (kvm_vcpu_wake_up(vcpu))
>> +	if (kvm_vcpu_wake_up(vcpu)) {
>> +		vcpu->preempted = true;
>>  		return;
>> +	}
>>  
>>  	me = get_cpu();
>>  	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
>>
> 
> Who is resetting vcpu->preempted to false in this case?  This also
> applies to s390 in fact.

Isn't that done by the sched_in handler?
Paolo Bonzini July 18, 2019, 8:34 a.m. UTC | #5
On 18/07/19 10:15, Christian Borntraeger wrote:
> 
> 
> On 18.07.19 09:59, Paolo Bonzini wrote:
>> On 12/07/19 09:15, Wanpeng Li wrote:
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index b4ab59d..2c46705 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>>>  	int me;
>>>  	int cpu = vcpu->cpu;
>>>  
>>> -	if (kvm_vcpu_wake_up(vcpu))
>>> +	if (kvm_vcpu_wake_up(vcpu)) {
>>> +		vcpu->preempted = true;
>>>  		return;
>>> +	}
>>>  
>>>  	me = get_cpu();
>>>  	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
>>>
>>
>> Who is resetting vcpu->preempted to false in this case?  This also
>> applies to s390 in fact.
> 
> Isnt that done by the sched_in handler?

I am a bit confused because, if it is done by the sched_in later, I
don't understand why the sched_out handler hasn't set vcpu->preempted
already.

The s390 commit message is not very clear, but it talks about "a former
sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
mean that it is in kvm_vcpu_block?  But then at least for x86 it would
be after vcpu_load so the preempt notifiers have been registered, and
for s390 too (kvm_arch_vcpu_ioctl_run -> __vcpu_run -> vcpu_post_run ->
kvm_handle_sie_intercept etc.).

Paolo
Wanpeng Li July 18, 2019, 8:43 a.m. UTC | #6
On Thu, 18 Jul 2019 at 16:34, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 18/07/19 10:15, Christian Borntraeger wrote:
> >
> >
> > On 18.07.19 09:59, Paolo Bonzini wrote:
> >> On 12/07/19 09:15, Wanpeng Li wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index b4ab59d..2c46705 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> >>>     int me;
> >>>     int cpu = vcpu->cpu;
> >>>
> >>> -   if (kvm_vcpu_wake_up(vcpu))
> >>> +   if (kvm_vcpu_wake_up(vcpu)) {
> >>> +           vcpu->preempted = true;
> >>>             return;
> >>> +   }
> >>>
> >>>     me = get_cpu();
> >>>     if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> >>>
> >>
> >> Who is resetting vcpu->preempted to false in this case?  This also
> >> applies to s390 in fact.
> >
> > Isnt that done by the sched_in handler?
>
> I am a bit confused because, if it is done by the sched_in later, I
> don't understand why the sched_out handler hasn't set vcpu->preempted
> already.
>
> The s390 commit message is not very clear, but it talks about "a former
> sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
> that mean it is in kvm_vcpu_block?  But then at least for x86 it would

See prepare_to_swait_exclusive() in kvm_vcpu_block(): the task is put into
TASK_INTERRUPTIBLE state there, and kvm_sched_out sets
vcpu->preempted to true iff current->state == TASK_RUNNING.

Regards,
Wanpeng Li
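
For reference, the preempt notifier hooks being discussed look roughly like
this in virt/kvm/kvm_main.c of this era (simplified sketch; arch callbacks
kept, locking and corner cases elided):

	static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		vcpu->preempted = false;	/* the reset Christian refers to */
		kvm_arch_sched_in(vcpu, cpu);
		kvm_arch_vcpu_load(vcpu, cpu);
	}

	static void kvm_sched_out(struct preempt_notifier *pn,
				  struct task_struct *next)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		/* Only a task that is still TASK_RUNNING (i.e. involuntarily
		 * preempted) is marked; a vCPU sleeping in kvm_vcpu_block() is
		 * TASK_INTERRUPTIBLE, so the kick path has to set the flag itself. */
		if (current->state == TASK_RUNNING)
			vcpu->preempted = true;
		kvm_arch_vcpu_put(vcpu);
	}
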
Paolo Bonzini July 18, 2019, 9:07 a.m. UTC | #7
On 18/07/19 10:43, Wanpeng Li wrote:
>>> Isnt that done by the sched_in handler?
>>
>> I am a bit confused because, if it is done by the sched_in later, I
>> don't understand why the sched_out handler hasn't set vcpu->preempted
>> already.
>>
>> The s390 commit message is not very clear, but it talks about "a former
>> sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
>> that mean it is in kvm_vcpu_block?  But then at least for x86 it would
> 
> see the prepare_to_swait_exlusive() in kvm_vcpu_block(), the task will
> be set in TASK_INTERRUPTIBLE state, kvm_sched_out will set
> vcpu->preempted to true iff current->state == TASK_RUNNING.

Ok, I was totally blind to that "if" around vcpu->preempted = true, it's
obvious now.

I think we need two flags then, for example vcpu->preempted and vcpu->ready:

- kvm_sched_out sets both of them to true iff current->state == TASK_RUNNING

- kvm_vcpu_kick sets vcpu->ready to true

- kvm_sched_in clears both of them

This way, vmx_vcpu_pi_load can keep looking at preempted only (it
handles voluntary preemption in pi_pre_block/pi_post_block).

Also, kvm_s390_vcpu_wakeup can be changed to use kvm_vcpu_wake_up, which
is nice.

Paolo
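
A sketch of the two-flag scheme described above (illustrative only; the
actual v2 patch may differ in detail):

	static void kvm_sched_out(struct preempt_notifier *pn,
				  struct task_struct *next)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		if (current->state == TASK_RUNNING) {
			vcpu->preempted = true;	/* involuntary preemption only */
			vcpu->ready = true;
		}
		kvm_arch_vcpu_put(vcpu);
	}

	void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
	{
		/* ... */
		if (kvm_vcpu_wake_up(vcpu)) {
			vcpu->ready = true;	/* boost the interrupt target */
			return;
		}
		/* ... existing IPI path ... */
	}

	static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		vcpu->preempted = false;	/* clear both once it runs again */
		vcpu->ready = false;
		kvm_arch_sched_in(vcpu, cpu);
		kvm_arch_vcpu_load(vcpu, cpu);
	}
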
Wanpeng Li July 18, 2019, 9:29 a.m. UTC | #8
On Thu, 18 Jul 2019 at 17:07, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 18/07/19 10:43, Wanpeng Li wrote:
> >>> Isnt that done by the sched_in handler?
> >>
> >> I am a bit confused because, if it is done by the sched_in later, I
> >> don't understand why the sched_out handler hasn't set vcpu->preempted
> >> already.
> >>
> >> The s390 commit message is not very clear, but it talks about "a former
> >> sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
> >> that mean it is in kvm_vcpu_block?  But then at least for x86 it would
> >
> > see the prepare_to_swait_exlusive() in kvm_vcpu_block(), the task will
> > be set in TASK_INTERRUPTIBLE state, kvm_sched_out will set
> > vcpu->preempted to true iff current->state == TASK_RUNNING.
>
> Ok, I was totally blind to that "if" around vcpu->preempted = true, it's
> obvious now.
>
> I think we need two flags then, for example vcpu->preempted and vcpu->ready:
>
> - kvm_sched_out sets both of them to true iff current->state == TASK_RUNNING
>
> - kvm_vcpu_kick sets vcpu->ready to true
>
> - kvm_sched_in clears both of them
>
> This way, vmx_vcpu_pi_load can keep looking at preempted only (it
> handles voluntary preemption in pi_pre_block/pi_post_block).
>
> Also, kvm_s390_vcpu_wakeup can be changed to use kvm_vcpu_wake_up, which
> is nice.

Will do. :)

Wanpeng
Paolo Bonzini July 18, 2019, 9:39 a.m. UTC | #9
On 18/07/19 11:29, Wanpeng Li wrote:
> On Thu, 18 Jul 2019 at 17:07, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 18/07/19 10:43, Wanpeng Li wrote:
>>>>> Isnt that done by the sched_in handler?
>>>>
>>>> I am a bit confused because, if it is done by the sched_in later, I
>>>> don't understand why the sched_out handler hasn't set vcpu->preempted
>>>> already.
>>>>
>>>> The s390 commit message is not very clear, but it talks about "a former
>>>> sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
>>>> that mean it is in kvm_vcpu_block?  But then at least for x86 it would
>>>
>>> see the prepare_to_swait_exlusive() in kvm_vcpu_block(), the task will
>>> be set in TASK_INTERRUPTIBLE state, kvm_sched_out will set
>>> vcpu->preempted to true iff current->state == TASK_RUNNING.
>>
>> Ok, I was totally blind to that "if" around vcpu->preempted = true, it's
>> obvious now.
>>
>> I think we need two flags then, for example vcpu->preempted and vcpu->ready:
>>
>> - kvm_sched_out sets both of them to true iff current->state == TASK_RUNNING
>>
>> - kvm_vcpu_kick sets vcpu->ready to true
>>
>> - kvm_sched_in clears both of them

... and also kvm_vcpu_on_spin should check vcpu->ready.  vcpu->preempted
remains only for use by vmx_vcpu_pi_put.

Later we could think of removing vcpu->preempted.  For example,
kvm_arch_sched_out and kvm_x86_ops->sched_out could get the code
currently in vmx_vcpu_pi_put (testing current->state == TASK_RUNNING
instead of vcpu->preempted).  But for now there's no need and I'm not
sure it's an improvement at all.

Paolo

>> This way, vmx_vcpu_pi_load can keep looking at preempted only (it
>> handles voluntary preemption in pi_pre_block/pi_post_block).
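
With that change, the directed-yield candidate filter in kvm_vcpu_on_spin()
would simply check the new flag instead (sketch):

	if (!READ_ONCE(vcpu->ready))	/* was: vcpu->preempted */
		continue;
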
Wanpeng Li July 18, 2019, 11:40 a.m. UTC | #10
On Thu, 18 Jul 2019 at 17:39, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 18/07/19 11:29, Wanpeng Li wrote:
> > On Thu, 18 Jul 2019 at 17:07, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 18/07/19 10:43, Wanpeng Li wrote:
> >>>>> Isnt that done by the sched_in handler?
> >>>>
> >>>> I am a bit confused because, if it is done by the sched_in later, I
> >>>> don't understand why the sched_out handler hasn't set vcpu->preempted
> >>>> already.
> >>>>
> >>>> The s390 commit message is not very clear, but it talks about "a former
> >>>> sleeping cpu" that "gave up the cpu voluntarily".  Does "voluntarily"
> >>>> that mean it is in kvm_vcpu_block?  But then at least for x86 it would
> >>>
> >>> see the prepare_to_swait_exlusive() in kvm_vcpu_block(), the task will
> >>> be set in TASK_INTERRUPTIBLE state, kvm_sched_out will set
> >>> vcpu->preempted to true iff current->state == TASK_RUNNING.
> >>
> >> Ok, I was totally blind to that "if" around vcpu->preempted = true, it's
> >> obvious now.
> >>
> >> I think we need two flags then, for example vcpu->preempted and vcpu->ready:
> >>
> >> - kvm_sched_out sets both of them to true iff current->state == TASK_RUNNING
> >>
> >> - kvm_vcpu_kick sets vcpu->ready to true
> >>
> >> - kvm_sched_in clears both of them
>
> ... and also kvm_vcpu_on_spin should check vcpu->ready.  vcpu->preempted
> remains only for use by vmx_vcpu_pi_put.

Done in v2, please have a look. :)

Regards,
Wanpeng Li

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b4ab59d..2c46705 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2404,8 +2404,10 @@  void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 	int me;
 	int cpu = vcpu->cpu;
 
-	if (kvm_vcpu_wake_up(vcpu))
+	if (kvm_vcpu_wake_up(vcpu)) {
+		vcpu->preempted = true;
 		return;
+	}
 
 	me = get_cpu();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))