
[RFC] KVM: optimize the kvm_vcpu_on_spin

Message ID 1501309377-195256-1-git-send-email-longpeng2@huawei.com (mailing list archive)
State New, archived

Commit Message

Longpeng(Mike) July 29, 2017, 6:22 a.m. UTC
We have discussed the idea here:
https://www.spinics.net/lists/kvm/msg140593.html

I think it's also suitable for other architectures.

If the vcpu (me) exits because it is spinning on a usermode spinlock, then
the spinlock holder may have been preempted in either usermode or kernel
mode. But if the vcpu (me) is in kernel mode, then the holder must have
been preempted in kernel mode, so we should choose a vcpu in kernel mode
as the most eligible candidate.

PS: I only implement the x86 arch currently because I'm not familiar
    with the other architectures.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 arch/mips/kvm/mips.c       | 5 +++++
 arch/powerpc/kvm/powerpc.c | 5 +++++
 arch/s390/kvm/kvm-s390.c   | 5 +++++
 arch/x86/kvm/x86.c         | 5 +++++
 include/linux/kvm_host.h   | 4 ++++
 virt/kvm/arm/arm.c         | 5 +++++
 virt/kvm/kvm_main.c        | 9 ++++++++-
 7 files changed, 37 insertions(+), 1 deletion(-)

Comments

David Hildenbrand July 31, 2017, 11:31 a.m. UTC | #1
[no idea if this change makes sense (and especially if it has any bad
side effects), do you have performance numbers? I'll just have a look at
the general structure of the patch in the meanwhile]

> +bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)

kvm_arch_vcpu_in_kernel() ?

> +{
> +	return kvm_x86_ops->get_cpl(vcpu) == 0;
> +}
> +
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 648b34c..f8f0d74 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>  	} spin_loop;
>  #endif
>  	bool preempted;
> +	/* If vcpu is in kernel-mode when preempted */
> +	bool in_kernmode;
> +

Why do you have to store that ...

[...]> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>  	kvm_vcpu_set_in_spin_loop(me, true);
>  	/*
>  	 * We boost the priority of a VCPU that is runnable but not
> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  				continue;
>  			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>  				continue;
> +			if (me->in_kernmode && !vcpu->in_kernmode)

Wouldn't it be easier to simply have

in_kernel = kvm_arch_vcpu_in_kernel(me);
...
if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
...

> +				continue;
>  			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>  				continue;
>  
> @@ -4009,8 +4013,11 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>  
> -	if (current->state == TASK_RUNNING)
> +	if (current->state == TASK_RUNNING) {
>  		vcpu->preempted = true;
> +		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
> +	}
> +

so you don't have to do this change, too.

>  	kvm_arch_vcpu_put(vcpu);
>  }
>  
>
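
A minimal sketch of David's suggestion applied to the directed-yield loop,
querying the arch helper directly instead of caching a per-vcpu flag. This
is illustrative only: it omits the two-pass/last_boosted_vcpu structure of
the real kvm_vcpu_on_spin(), and (as Paolo notes further down) get_cpl
requires vcpu_load on Intel, so querying a remote vcpu is not free there.

void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	bool in_kernel = kvm_arch_vcpu_in_kernel(me);
	int i;

	kvm_vcpu_set_in_spin_loop(me, true);
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu == me)
			continue;
		if (swait_active(&vcpu->wq) &&
		    !kvm_arch_vcpu_runnable(vcpu))
			continue;
		/* only boost vcpus preempted in kernel mode when we are
		 * spinning in kernel mode ourselves */
		if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
			continue;
		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
			continue;
		if (kvm_vcpu_yield_to(vcpu) > 0)
			break;
	}
	kvm_vcpu_set_in_spin_loop(me, false);
}
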
Longpeng(Mike) July 31, 2017, 12:08 p.m. UTC | #2
Hi David,

On 2017/7/31 19:31, David Hildenbrand wrote:

> [no idea if this change makes sense (and especially if it has any bad
> side effects), do you have performance numbers? I'll just have a look at
> the general structure of the patch in the meanwhile]
> 

I don't have any test results yet. Could you suggest some suitable
benchmarks?

>> +bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
> 
> kvm_arch_vcpu_in_kernel() ?
> 

Um...yes, I'll correct this.

>> +{
>> +	return kvm_x86_ops->get_cpl(vcpu) == 0;
>> +}
>> +
>>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>>  {
>>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 648b34c..f8f0d74 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>>  	} spin_loop;
>>  #endif
>>  	bool preempted;
>> +	/* If vcpu is in kernel-mode when preempted */
>> +	bool in_kernmode;
>> +
> 
> Why do you have to store that ...
> 

> [...]> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>>  	kvm_vcpu_set_in_spin_loop(me, true);
>>  	/*
>>  	 * We boost the priority of a VCPU that is runnable but not
>> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  				continue;
>>  			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>>  				continue;
>> +			if (me->in_kernmode && !vcpu->in_kernmode)
> 
> Wouldn't it be easier to simply have
> 
> in_kernel = kvm_arch_vcpu_in_kernel(me);
> ...
> if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
> ...
> 

I'm not sure whether getting the vcpu's privilege level is
expensive on all architectures, so I record it in kvm_sched_out() to
minimize the extra cycles spent in kvm_vcpu_on_spin().

>> +				continue;
>>  			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>  				continue;
>>  
>> @@ -4009,8 +4013,11 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>>  {
>>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>>  
>> -	if (current->state == TASK_RUNNING)
>> +	if (current->state == TASK_RUNNING) {
>>  		vcpu->preempted = true;
>> +		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
>> +	}
>> +
> 
> so you don't have to do this change, too.
> 
>>  	kvm_arch_vcpu_put(vcpu);
>>  }
>>  
>>
> 
>
David Hildenbrand July 31, 2017, 12:27 p.m. UTC | #3
> I'm not sure whether getting the vcpu's privilege level is
> expensive on all architectures, so I record it in kvm_sched_out() to
> minimize the extra cycles spent in kvm_vcpu_on_spin().
> 

as you only care for x86 right now either way, you can directly optimize
here for the good (here: x86) case (keeping changes and therefore
possible bugs minimal).
Cornelia Huck July 31, 2017, 12:31 p.m. UTC | #4
On Mon, 31 Jul 2017 20:08:14 +0800
"Longpeng (Mike)" <longpeng2@huawei.com> wrote:

> Hi David,
> 
> On 2017/7/31 19:31, David Hildenbrand wrote:

> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 648b34c..f8f0d74 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -272,6 +272,9 @@ struct kvm_vcpu {
> >>  	} spin_loop;
> >>  #endif
> >>  	bool preempted;
> >> +	/* If vcpu is in kernel-mode when preempted */
> >> +	bool in_kernmode;
> >> +  
> > 
> > Why do you have to store that ...
> >   
> 
> > [...]> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
> >>  	kvm_vcpu_set_in_spin_loop(me, true);
> >>  	/*
> >>  	 * We boost the priority of a VCPU that is runnable but not
> >> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>  				continue;
> >>  			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
> >>  				continue;
> >> +			if (me->in_kernmode && !vcpu->in_kernmode)  
> > 
> > Wouldn't it be easier to simply have
> > 
> > in_kernel = kvm_arch_vcpu_in_kernel(me);
> > ...
> > if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
> > ...
> >   
> 
> I'm not sure whether getting the vcpu's privilege level is
> expensive on all architectures, so I record it in kvm_sched_out() to
> minimize the extra cycles spent in kvm_vcpu_on_spin().

As it is now, this handling looks a bit inconsistent. You only update
the field on sched-out via preemption _or_ if kvm_vcpu_on_spin is
called for the vcpu. In most contexts, this field will have stale
content.

Also, would checking for kernel mode be more expensive than the various
other checks already done in this function?

[I like David's suggestion.]

> 
> >> +				continue;
> >>  			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> >>  				continue;
> >>
Longpeng(Mike) July 31, 2017, 1:01 p.m. UTC | #5
On 2017/7/31 20:31, Cornelia Huck wrote:

> On Mon, 31 Jul 2017 20:08:14 +0800
> "Longpeng (Mike)" <longpeng2@huawei.com> wrote:
> 
>> Hi David,
>>
>> On 2017/7/31 19:31, David Hildenbrand wrote:
> 
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index 648b34c..f8f0d74 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>>>>  	} spin_loop;
>>>>  #endif
>>>>  	bool preempted;
>>>> +	/* If vcpu is in kernel-mode when preempted */
>>>> +	bool in_kernmode;
>>>> +  
>>>
>>> Why do you have to store that ...
>>>   
>>
>>> [...]> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>>>>  	kvm_vcpu_set_in_spin_loop(me, true);
>>>>  	/*
>>>>  	 * We boost the priority of a VCPU that is runnable but not
>>>> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>>>  				continue;
>>>>  			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>>>>  				continue;
>>>> +			if (me->in_kernmode && !vcpu->in_kernmode)  
>>>
>>> Wouldn't it be easier to simply have
>>>
>>> in_kernel = kvm_arch_vcpu_in_kernel(me);
>>> ...
>>> if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
>>> ...
>>>   
>>
>> I'm not sure whether getting the vcpu's privilege level is
>> expensive on all architectures, so I record it in kvm_sched_out() to
>> minimize the extra cycles spent in kvm_vcpu_on_spin().
> 
> As it is now, this handling looks a bit inconsistent. You only update
> the field on sched-out via preemption _or_ if kvm_vcpu_on_spin is
> called for the vcpu. In most contexts, this field will have stale
> content.
> 
> Also, would checking for kernel mode be more expensive than the various
> other checks already done in this function?
> 

> [I like David's suggestion.]
> 


Hi Cornelia & David,

I'll take your suggestion, thanks :)

>>
>>>> +				continue;
>>>>  			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>>>  				continue;
>>>>  
> 
> .
>
Paolo Bonzini July 31, 2017, 1:20 p.m. UTC | #6
On 31/07/2017 14:27, David Hildenbrand wrote:
>> I'm not sure whether getting the vcpu's privilege level is
>> expensive on all architectures, so I record it in kvm_sched_out() to
>> minimize the extra cycles spent in kvm_vcpu_on_spin().
>>
> as you only care for x86 right now either way, you can directly optimize
> here for the good (here: x86) case (keeping changes and therefore
> possible bugs minimal).

I agree with Cornelia that this is inconsistent, so you shouldn't update
me->in_kernmode in kvm_vcpu_on_spin.  However, get_cpl requires
vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
vmx_read_guest_seg_ar -> vmcs_read32).

Alternatively, we can add a new callback kvm_x86_ops->sched_out to x86
KVM, and call vmx_get_cpl from the Intel implementation (vmx_sched_out).
 This will cache the result until the next sched_in, so that
kvm_vcpu_on_spin can use it.

Paolo
Marc Zyngier July 31, 2017, 1:42 p.m. UTC | #7
On 29/07/17 07:22, Longpeng(Mike) wrote:
> We have discussed the idea here:
> https://www.spinics.net/lists/kvm/msg140593.html
> 
> I think it's also suitable for other architectures.
> 
> If the vcpu (me) exits because it is spinning on a usermode spinlock, then
> the spinlock holder may have been preempted in either usermode or kernel
> mode. But if the vcpu (me) is in kernel mode, then the holder must have
> been preempted in kernel mode, so we should choose a vcpu in kernel mode
> as the most eligible candidate.

That seems to preclude any form of locking between userspace and kernel
(which probably wouldn't be Linux). Are you sure that this form of
construct is not used anywhere? I have the feeling this patch could
break this scenario...

Thanks,

	M.
Paolo Bonzini July 31, 2017, 2:23 p.m. UTC | #8
On 31/07/2017 15:42, Marc Zyngier wrote:
>> If the vcpu (me) exits because it is spinning on a usermode spinlock, then
>> the spinlock holder may have been preempted in either usermode or kernel
>> mode. But if the vcpu (me) is in kernel mode, then the holder must have
>> been preempted in kernel mode, so we should choose a vcpu in kernel mode
>> as the most eligible candidate.
>
> That seems to preclude any form of locking between userspace and kernel
> (which probably wouldn't be Linux). Are you sure that this form of
> construct is not used anywhere? I have the feeling this patch could
> break this scenario...

It's just a heuristic; it would only be broken if you overcommit, and it
would be just as broken as if KVM didn't implement directed yield at all.

Paolo
Longpeng(Mike) Aug. 1, 2017, 3:26 a.m. UTC | #9
On 2017/7/31 21:20, Paolo Bonzini wrote:

> On 31/07/2017 14:27, David Hildenbrand wrote:
>>> I'm not sure whether getting the vcpu's privilege level is
>>> expensive on all architectures, so I record it in kvm_sched_out() to
>>> minimize the extra cycles spent in kvm_vcpu_on_spin().
>>>
>> as you only care for x86 right now either way, you can directly optimize
>> here for the good (here: x86) case (keeping changes and therefore
>> possible bugs minimal).
> 
> I agree with Cornelia that this is inconsistent, so you shouldn't update
> me->in_kernmode in kvm_vcpu_on_spin.  However, get_cpl requires
> vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
> vmx_read_guest_seg_ar -> vmcs_read32).
> 

Hi Paolo,

It seems that other architectures (e.g. arm/s390) don't need to cache the result,
but x86 does, so I should move 'in_kernmode' into kvm_vcpu_arch and only add
this field to x86, right?

> Alternatively, we can add a new callback kvm_x86_ops->sched_out to x86
> KVM, and call vmx_get_cpl from the Intel implementation (vmx_sched_out).


In this approach, vmx_sched_out would only call vmx_get_cpl; isn't that a bit
redundant, since we could just call kvm_x86_ops->get_cpl at the right place instead?

>  This will cache the result until the next sched_in, so that


'until the next sched_in' --> Do we need to clear the result in sched_in?

> kvm_vcpu_on_spin can use it.
> 
> Paolo
> 
> .
>
Paolo Bonzini Aug. 1, 2017, 8:32 a.m. UTC | #10
On 01/08/2017 05:26, Longpeng (Mike) wrote:
> 
> 
> On 2017/7/31 21:20, Paolo Bonzini wrote:
> 
>> On 31/07/2017 14:27, David Hildenbrand wrote:
>>>> I'm not sure whether getting the vcpu's privilege level is
>>>> expensive on all architectures, so I record it in kvm_sched_out() to
>>>> minimize the extra cycles spent in kvm_vcpu_on_spin().
>>>>
>>> as you only care for x86 right now either way, you can directly optimize
>>> here for the good (here: x86) case (keeping changes and therefore
>>> possible bugs minimal).
>>
>> I agree with Cornelia that this is inconsistent, so you shouldn't update
>> me->in_kernmode in kvm_vcpu_on_spin.  However, get_cpl requires
>> vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
>> vmx_read_guest_seg_ar -> vmcs_read32).
>>
> 
> Hi Paolo,
> 
> It seems that other architectures (e.g. arm/s390) don't need to cache the result,
> but x86 does, so I should move 'in_kernmode' into kvm_vcpu_arch and only add
> this field to x86, right?

That's another way to do it, and I like it.

>>  This will cache the result until the next sched_in, so that
> 
> 'until the next sched_in' --> Do we need to clear the result in sched_in?

No, thanks.

Paolo
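
A rough sketch of the direction agreed above: make the cached flag x86-only
by moving it into struct kvm_vcpu_arch, refresh it when the vcpu is scheduled
out (while its state is still loaded, so reading the CPL is cheap), and let
the generic code see it only through kvm_arch_vcpu_in_kernel(). The field
name and the exact update point are illustrative, not taken from the posted
patch.

/* arch/x86/include/asm/kvm_host.h (sketch) */
struct kvm_vcpu_arch {
	/* ... existing fields ... */
	bool preempted_in_kernel;	/* was CPL 0 when last scheduled out */
};

/* arch/x86/kvm/x86.c (sketch) */
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
	/*
	 * The vcpu is still loaded at this point, so get_cpl() does not
	 * need an extra vcpu_load; whether this lives here or in a new
	 * kvm_x86_ops->sched_out hook was left open in the thread.
	 */
	vcpu->arch.preempted_in_kernel = kvm_x86_ops->get_cpl(vcpu) == 0;

	/* ... existing put path (kvm_x86_ops->vcpu_put, etc.) ... */
}

bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.preempted_in_kernel;
}

The yield loop in kvm_vcpu_on_spin() can then check
kvm_arch_vcpu_in_kernel(vcpu) for each candidate, as in David's suggestion,
without touching the candidate's VMCS.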

Patch

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index d4b2ad1..2e2701d 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -98,6 +98,11 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return !!(vcpu->arch.pending_exceptions);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1a75c0b..2489f64 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -58,6 +58,11 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3f2884e..9d7c42e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2443,6 +2443,11 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_s390_vcpu_has_irq(vcpu, 0);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 void kvm_s390_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	atomic_or(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82a63c5..b5a2e53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8435,6 +8435,11 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return kvm_x86_ops->get_cpl(vcpu) == 0;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 648b34c..f8f0d74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -272,6 +272,9 @@  struct kvm_vcpu {
 	} spin_loop;
 #endif
 	bool preempted;
+	/* If vcpu is in kernel-mode when preempted */
+	bool in_kernmode;
+
 	struct kvm_vcpu_arch arch;
 	struct dentry *debugfs_dentry;
 };
@@ -797,6 +800,7 @@  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..ca6a394 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -416,6 +416,11 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 		&& !v->arch.power_off && !v->arch.pause);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 /* Just ensure a guest exit from a particular CPU */
 static void exit_vm_noop(void *info)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 82987d4..8d83caa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -290,6 +290,7 @@  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
 	vcpu->preempted = false;
+	vcpu->in_kernmode = false;
 
 	r = kvm_arch_vcpu_init(vcpu);
 	if (r < 0)
@@ -2330,6 +2331,7 @@  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	int pass;
 	int i;
 
+	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
 	kvm_vcpu_set_in_spin_loop(me, true);
 	/*
 	 * We boost the priority of a VCPU that is runnable but not
@@ -2351,6 +2353,8 @@  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
 				continue;
+			if (me->in_kernmode && !vcpu->in_kernmode)
+				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
 
@@ -4009,8 +4013,11 @@  static void kvm_sched_out(struct preempt_notifier *pn,
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-	if (current->state == TASK_RUNNING)
+	if (current->state == TASK_RUNNING) {
 		vcpu->preempted = true;
+		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
+	}
+
 	kvm_arch_vcpu_put(vcpu);
 }