[RFC,1/2] kvm vcpu: Note down pause loop exit

Message ID 20120709062032.24030.10454.sendpatchset@codeblue (mailing list archive)
State New, archived

Commit Message

Raghavendra K T July 9, 2012, 6:20 a.m. UTC
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Noting the pause-loop-exited vcpu helps in filtering the right candidate
to yield to. Yielding to the same vcpu may result in more wastage of cpu.

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    7 +++++++
 arch/x86/kvm/svm.c              |    1 +
 arch/x86/kvm/vmx.c              |    1 +
 arch/x86/kvm/x86.c              |    4 +++-
 4 files changed, 12 insertions(+), 1 deletions(-)



Comments

Raghavendra K T July 9, 2012, 6:33 a.m. UTC | #1
On 07/09/2012 11:50 AM, Raghavendra K T wrote:
> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>
> Noting pause loop exited vcpu helps in filtering right candidate to yield.
> Yielding to same vcpu may result in more wastage of cpu.
>
> From: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
> ---

Oops, sorry. Somehow the Sign-off and From lines got interchanged.

Rik van Riel July 9, 2012, 10:39 p.m. UTC | #2
On 07/09/2012 02:20 AM, Raghavendra K T wrote:

> @@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
>   		u64 length;
>   		u64 status;
>   	} osvw;
> +
> +	/* Pause loop exit optimization */
> +	struct {
> +		bool pause_loop_exited;
> +		bool dy_eligible;
> +	} plo;

I know kvm_vcpu_arch is traditionally not a well documented
structure, but it would be really nice if each variable inside
this sub-structure could get some documentation.

Also, do we really want to introduce another acronym here?

Or would we be better off simply calling this struct .ple,
since that is a name people are already familiar with.
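
A minimal sketch of what the documented, renamed sub-structure could look
like (the field names come from the patch; the .ple name and the comments
follow the suggestions above and are illustrative, not the committed form):

	/* Pause-loop-exit (PLE) bookkeeping, per vcpu. */
	struct {
		/* Set in the pause intercept, cleared before the next
		 * guest entry. */
		bool pause_loop_exited;
		/* False when this vcpu is unlikely to make progress if
		 * yielded to; used to filter yield_to() candidates. */
		bool dy_eligible;
	} ple;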
Raghavendra K T July 10, 2012, 11:22 a.m. UTC | #3
On 07/10/2012 04:09 AM, Rik van Riel wrote:
> On 07/09/2012 02:20 AM, Raghavendra K T wrote:
>
>> @@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
>>  		u64 length;
>>  		u64 status;
>>  	} osvw;
>> +
>> +	/* Pause loop exit optimization */
>> +	struct {
>> +		bool pause_loop_exited;
>> +		bool dy_eligible;
>> +	} plo;
>
> I know kvm_vcpu_arch is traditionally not a well documented
> structure, but it would be really nice if each variable inside
> this sub-structure could get some documentation.

Sure. Will document it.

>
> Also, do we really want to introduce another acronym here?
>
> Or would we be better off simply calling this struct .ple,
> since that is a name people are already familiar with.

Yes, it makes sense to have .ple.


Avi Kivity July 11, 2012, 8:53 a.m. UTC | #4
On 07/09/2012 09:20 AM, Raghavendra K T wrote:
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> 
> Noting pause loop exited vcpu helps in filtering right candidate to yield.
> Yielding to same vcpu may result in more wastage of cpu.
> 
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f75af40..a492f5d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
>  
>  static int pause_interception(struct vcpu_svm *svm)
>  {
> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>  	kvm_vcpu_on_spin(&(svm->vcpu));
>  	return 1;
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 32eb588..600fb3c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4945,6 +4945,7 @@ out:
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
>  	skip_emulated_instruction(vcpu);
> +	vcpu->arch.plo.pause_loop_exited = true;
>  	kvm_vcpu_on_spin(vcpu);
>  

This code is duplicated.  Should we move it to kvm_vcpu_on_spin?

That means the .plo structure needs to be in common code, but that's not
too bad perhaps.
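
For illustration, a hedged sketch of that suggestion: set the flag once in
the common kvm_vcpu_on_spin() (virt/kvm/kvm_main.c) instead of in each arch
intercept. The placement of the .ple field in the common struct kvm_vcpu is
an assumption of this sketch:

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		/* Note the pause-loop exit here, once, rather than in
		 * both pause_interception() and handle_pause(). */
		me->ple.pause_loop_exited = true;

		/* ... existing directed-yield candidate loop ... */
	}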

> index be6d549..07dbd14 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	if (req_immediate_exit)
>  		smp_send_reschedule(vcpu->cpu);
> -
> +	vcpu->arch.plo.pause_loop_exited = false;

This adds some tiny overhead to vcpu entry.  You could remove it by
using the vcpu->requests mechanism to clear the flag, since
vcpu->requests is already checked on every entry.
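
A minimal sketch of that vcpu->requests variant (KVM_REQ_PLE is a
hypothetical request bit; kvm_make_request()/kvm_check_request() are the
existing helpers, and checking a request also clears it):

	/* in pause_interception()/handle_pause(), instead of the arch flag: */
	kvm_make_request(KVM_REQ_PLE, vcpu);
	kvm_vcpu_on_spin(vcpu);

	/* in vcpu_enter_guest(), on the vcpu->requests path that is
	 * already taken on every entry: */
	if (kvm_check_request(KVM_REQ_PLE, vcpu))
		; /* nothing to do -- the check itself clears the bit */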

>  	kvm_guest_enter();
>
Raghavendra K T July 11, 2012, 10:52 a.m. UTC | #5
On 07/11/2012 02:23 PM, Avi Kivity wrote:
> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>
>> Noting pause loop exited vcpu helps in filtering right candidate to yield.
>> Yielding to same vcpu may result in more wastage of cpu.
>>
>>
>>   struct kvm_lpage_info {
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index f75af40..a492f5d 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
>>
>>   static int pause_interception(struct vcpu_svm *svm)
>>   {
>> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>>   	kvm_vcpu_on_spin(&(svm->vcpu));
>>   	return 1;
>>   }
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 32eb588..600fb3c 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4945,6 +4945,7 @@ out:
>>   static int handle_pause(struct kvm_vcpu *vcpu)
>>   {
>>   	skip_emulated_instruction(vcpu);
>> +	vcpu->arch.plo.pause_loop_exited = true;
>>   	kvm_vcpu_on_spin(vcpu);
>>
>
> This code is duplicated.  Should we move it to kvm_vcpu_on_spin?
>
> That means the .plo structure needs to be in common code, but that's not
> too bad perhaps.
>

Since PLE is very much tied to x86, and the proposed changes are very much
specific to the PLE handler, I thought it better to make them arch-specific.

So do you think it is good to move this inside vcpu_on_spin and make the
ple structure belong to common code?

>> index be6d549..07dbd14 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>
>>   	if (req_immediate_exit)
>>   		smp_send_reschedule(vcpu->cpu);
>> -
>> +	vcpu->arch.plo.pause_loop_exited = false;
>
> This adds some tiny overhead to vcpu entry.  You could remove it by
> using the vcpu->requests mechanism to clear the flag, since
> vcpu->requests is already checked on every entry.

So IIUC, let's have a request bit for indicating PLE:

pause_interception() /handle_pause()
{
  make_request(PLE_REQUEST)
  vcpu_on_spin()

}

check_eligibility()
  {
  !test_request(PLE_REQUEST) || ( test_request(PLE_REQUEST)  && 
dy_eligible())
.
.
}

vcpu_run()
{

check_request(PLE_REQUEST)
.
.
}

Is this the expected flow you had in mind?

[But my only concern was not resetting the flag for cases where we do not
do guest_enter(). Will test how that goes.]

>
>>   	kvm_guest_enter();
>>

Avi Kivity July 11, 2012, 11:18 a.m. UTC | #6
On 07/11/2012 01:52 PM, Raghavendra K T wrote:
> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>>
>>> Noting pause loop exited vcpu helps in filtering right candidate to
>>> yield.
>>> Yielding to same vcpu may result in more wastage of cpu.
>>>
>>>
>>>   struct kvm_lpage_info {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index f75af40..a492f5d 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct
>>> vcpu_svm *svm)
>>>
>>>   static int pause_interception(struct vcpu_svm *svm)
>>>   {
>>> +    svm->vcpu.arch.plo.pause_loop_exited = true;
>>>       kvm_vcpu_on_spin(&(svm->vcpu));
>>>       return 1;
>>>   }
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 32eb588..600fb3c 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -4945,6 +4945,7 @@ out:
>>>   static int handle_pause(struct kvm_vcpu *vcpu)
>>>   {
>>>       skip_emulated_instruction(vcpu);
>>> +    vcpu->arch.plo.pause_loop_exited = true;
>>>       kvm_vcpu_on_spin(vcpu);
>>>
>>
>> This code is duplicated.  Should we move it to kvm_vcpu_on_spin?
>>
>> That means the .plo structure needs to be in common code, but that's not
>> too bad perhaps.
>>
> 
> Since PLE is very much tied to x86, and proposed changes are very much
> specific to PLE handler, I thought it is better to make arch specific.
> 
> So do you think it is good to move inside vcpu_on_spin and make ple
> structure belong to common code?

See the discussion with Christian.  PLE is tied to x86, but cpu_relax()
and facilities to trap it are not.

>>
>> This adds some tiny overhead to vcpu entry.  You could remove it by
>> using the vcpu->requests mechanism to clear the flag, since
>> vcpu->requests is already checked on every entry.
> 
> So IIUC,  let's have request bit for indicating PLE,
> 
> pause_interception() /handle_pause()
> {
>  make_request(PLE_REQUEST)
>  vcpu_on_spin()
> 
> }
> 
> check_eligibility()
>  {
>  !test_request(PLE_REQUEST) || ( test_request(PLE_REQUEST)  &&
> dy_eligible())
> .
> .
> }
> 
> vcpu_run()
> {
> 
> check_request(PLE_REQUEST)
> .
> .
> }
> 
> Is this is the expected flow you had in mind?

Yes, something like that.

> 
> [ But my only concern was not resetting for cases where we do not do
> guest_enter(). will test how that goes].

Hm, suppose we're the next-in-line for a ticket lock and exit due to
PLE.  The lock holder completes and unlocks, which really assigns the
lock to us.  So now we are the lock owner, yet we are marked as don't
yield-to-us in the PLE code.
Raghavendra K T July 11, 2012, 11:56 a.m. UTC | #7
On 07/11/2012 04:48 PM, Avi Kivity wrote:
> On 07/11/2012 01:52 PM, Raghavendra K T wrote:
>> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>>> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>>>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>>>
>>>> Noting pause loop exited vcpu helps in filtering right candidate to
>>>> yield.
>>>> Yielding to same vcpu may result in more wastage of cpu.
>>>>
>>>>
>>>>    struct kvm_lpage_info {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index f75af40..a492f5d 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct
>>>> vcpu_svm *svm)
>>>>
>>>>    static int pause_interception(struct vcpu_svm *svm)
>>>>    {
>>>> +    svm->vcpu.arch.plo.pause_loop_exited = true;
>>>>        kvm_vcpu_on_spin(&(svm->vcpu));
>>>>        return 1;
>>>>    }
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 32eb588..600fb3c 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -4945,6 +4945,7 @@ out:
>>>>    static int handle_pause(struct kvm_vcpu *vcpu)
>>>>    {
>>>>        skip_emulated_instruction(vcpu);
>>>> +    vcpu->arch.plo.pause_loop_exited = true;
>>>>        kvm_vcpu_on_spin(vcpu);
>>>>
>>>
>>> This code is duplicated.  Should we move it to kvm_vcpu_on_spin?
>>>
>>> That means the .plo structure needs to be in common code, but that's not
>>> too bad perhaps.
>>>
>>
>> Since PLE is very much tied to x86, and proposed changes are very much
>> specific to PLE handler, I thought it is better to make arch specific.
>>
>> So do you think it is good to move inside vcpu_on_spin and make ple
>> structure belong to common code?
>
> See the discussion with Christian.  PLE is tied to x86, but cpu_relax()
> and facilities to trap it are not.

Yep.

>
>>>
>>> This adds some tiny overhead to vcpu entry.  You could remove it by
>>> using the vcpu->requests mechanism to clear the flag, since
>>> vcpu->requests is already checked on every entry.
>>
>> So IIUC,  let's have request bit for indicating PLE,
>>
>> pause_interception() /handle_pause()
>> {
>>   make_request(PLE_REQUEST)
>>   vcpu_on_spin()
>>
>> }
>>
>> check_eligibility()
>>   {
>>   !test_request(PLE_REQUEST) || ( test_request(PLE_REQUEST)&&
>> dy_eligible())
>> .
>> .
>> }
>>
>> vcpu_run()
>> {
>>
>> check_request(PLE_REQUEST)
>> .
>> .
>> }
>>
>> Is this is the expected flow you had in mind?
>
> Yes, something like that.

ok..

>
>>
>> [ But my only concern was not resetting for cases where we do not do
>> guest_enter(). will test how that goes].
>
> Hm, suppose we're the next-in-line for a ticket lock and exit due to
> PLE.  The lock holder completes and unlocks, which really assigns the
> lock to us.  So now we are the lock owner, yet we are marked as don't
> yield-to-us in the PLE code.

Yes.. off-topic, but that is solved by the kicked flag in PV spinlocks.

Andrew Jones July 11, 2012, 12:41 p.m. UTC | #8
----- Original Message -----
> >
> > Hm, suppose we're the next-in-line for a ticket lock and exit due
> > to
> > PLE.  The lock holder completes and unlocks, which really assigns
> > the
> > lock to us.  So now we are the lock owner, yet we are marked as
> > don't
> > yield-to-us in the PLE code.
> 
> Yes.. off-topic but that is solved by kicked flag in PV spinlocks.
> 

Yeah, this is a different, but related, topic. pvticketlocks not
only help yield spinning vcpus, but also allow the use of ticketlocks
for the guaranteed fairness. If we want that fairness we'll always
need some pv-ness, even if it's a mostly PLE solution. I see that as
a reasonable reason to take the pvticketlock series, assuming it
performs at least as well as PLE. The following options have all been
brought up at some point by various people, and all have their own
pluses and minuses:

PLE-only best-effort:
   + hardware-only, algorithm improvements can be made independent of
     guest OSes
   - has limited info about spinning vcpus, making it hard to improve
     the algorithm (improved with an auto-adjusting ple_window?)
   - perf enhancement/degradation is workload/ple_window dependent
   - impossible? to guarantee FIFO order

pvticketlocks:
   - have to maintain pv code, both hosts and guests
   + perf is only workload dependent (should disable ple to avoid
     interference?)
   + guarantees FIFO order
   + can fall-back on PLE-only if the guest doesn't support it

hybrid:
   + takes advantage of the hw support
   - still requires host and guest pv code
   - will likely make perf dependent on ple_window again
   + guarantees FIFO order

???: did I miss any?

I think more benchmarking of PLE vs. pvticketlocks is needed, which
I'm working on. If we see that it performs just as well or better,
then IMHO, we should consider committing Raghu's latest version of
the pvticketlock series, perhaps with an additional patch that
auto-disables PLE when pvticketlocks are enabled.

Drew
Nikunj A. Dadhania July 12, 2012, 10:58 a.m. UTC | #9
On Wed, 11 Jul 2012 16:22:29 +0530, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
> On 07/11/2012 02:23 PM, Avi Kivity wrote:
> >
> > This adds some tiny overhead to vcpu entry.  You could remove it by
> > using the vcpu->requests mechanism to clear the flag, since
> > vcpu->requests is already checked on every entry.
> 
> So IIUC,  let's have request bit for indicating PLE,
> 
> pause_interception() /handle_pause()
> {
>   make_request(PLE_REQUEST)
>   vcpu_on_spin()
> 
> }
> 
> check_eligibility()
>   {
>   !test_request(PLE_REQUEST) || ( test_request(PLE_REQUEST)  && 
> dy_eligible())
> .
> .
> }
> 
> vcpu_run()
> {
> 
> check_request(PLE_REQUEST)
>
I know check_request will clear PLE_REQUEST, but you just need a
clear_request here, right?
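
For reference, a plain clear without the test could be a one-liner along
these lines (KVM_REQ_PLE remains hypothetical; vcpu->requests was an
unsigned long bitmap at the time):

	clear_bit(KVM_REQ_PLE, &vcpu->requests);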


Raghavendra K T July 12, 2012, 11:02 a.m. UTC | #10
On 07/12/2012 04:28 PM, Nikunj A Dadhania wrote:
> On Wed, 11 Jul 2012 16:22:29 +0530, Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>  wrote:
>> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>>>
>>> This adds some tiny overhead to vcpu entry.  You could remove it by
>>> using the vcpu->requests mechanism to clear the flag, since
>>> vcpu->requests is already checked on every entry.
>>
>> So IIUC,  let's have request bit for indicating PLE,
>>
>> pause_interception() /handle_pause()
>> {
>>    make_request(PLE_REQUEST)
>>    vcpu_on_spin()
>>
>> }
>>
>> check_eligibility()
>>    {
>>    !test_request(PLE_REQUEST) || ( test_request(PLE_REQUEST)&&
>> dy_eligible())
>> .
>> .
>> }
>>
>> vcpu_run()
>> {
>>
>> check_request(PLE_REQUEST)
>>
> I know check_request will clear PLE_REQUEST, but you just need a
> clear_request here, right?
>

Yes. I tried to use check_request for clearing, but I ended up with a
different implementation (see the latest thread).


Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..857ca68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,6 +484,13 @@  struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	/* Pause loop exit optimization */
+	struct {
+		bool pause_loop_exited;
+		bool dy_eligible;
+	} plo;
+
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f75af40..a492f5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3264,6 +3264,7 @@  static int interrupt_window_interception(struct vcpu_svm *svm)
 
 static int pause_interception(struct vcpu_svm *svm)
 {
+	svm->vcpu.arch.plo.pause_loop_exited = true;
 	kvm_vcpu_on_spin(&(svm->vcpu));
 	return 1;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..600fb3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4945,6 +4945,7 @@  out:
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
+	vcpu->arch.plo.pause_loop_exited = true;
 	kvm_vcpu_on_spin(vcpu);
 
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..07dbd14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5331,7 +5331,7 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (req_immediate_exit)
 		smp_send_reschedule(vcpu->cpu);
-
+	vcpu->arch.plo.pause_loop_exited = false;
 	kvm_guest_enter();
 
 	if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6168,6 +6168,8 @@  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	BUG_ON(vcpu->kvm == NULL);
 	kvm = vcpu->kvm;
 
+	vcpu->arch.plo.pause_loop_exited = false;
+	vcpu->arch.plo.dy_eligible = true;
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;