[RFC,1/1] kvm: Use vcpu_id as pivot instead of last boosted vcpu in PLE handler

Message ID 20120829192100.22412.92575.sendpatchset@codeblue (mailing list archive)
State New, archived

Commit Message

Raghavendra K T Aug. 29, 2012, 7:21 p.m. UTC
The idea of starting from the next vcpu (source of yield_to + 1) seems to
 work well for overcommitted guests, rather than using the last boosted vcpu.
 We can also remove the per-VM variable with this approach.
 
 After this patch, the iteration for an eligible candidate starts from vcpu
 source+1 and ends at source-1 (after wrapping).
 
 Thanks to Nikunj for his quick verification of the patch.
 
 Please let me know if this patch is interesting and makes sense.
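
To make the wrap-around order concrete, here is a minimal standalone sketch
in plain C (not the kernel code; NR_VCPUS and the source id below are
made-up example values) of the candidate order the two-pass loop visits:

#include <stdio.h>

#define NR_VCPUS 8

int main(void)
{
	int source = 5;	/* vcpu_id of the yielding vcpu */
	int pass, i;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < NR_VCPUS; i++) {
			if (!pass && i <= source) {
				i = source;	/* first pass: jump ahead to source+1 */
				continue;
			} else if (pass && i >= source)
				break;		/* second pass: stop at source-1 */
			printf("try vcpu %d\n", i);
		}
	}
	return 0;	/* visits 6 7 0 1 2 3 4 for source = 5 */
}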



Comments

Gleb Natapov Sept. 2, 2012, 10:12 a.m. UTC | #1
On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>  The idea of starting from the next vcpu (source of yield_to + 1) seems to
>  work well for overcommitted guests, rather than using the last boosted vcpu.
>  We can also remove the per-VM variable with this approach.
>  
>  After this patch, the iteration for an eligible candidate starts from vcpu
>  source+1 and ends at source-1 (after wrapping).
>  
>  Thanks to Nikunj for his quick verification of the patch.
>  
>  Please let me know if this patch is interesting and makes sense.
> 
This last_boosted_vcpu thing caused us trouble during an attempt to
implement vcpu destruction. It is good to see it removed from this POV.

> ====8<====
> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> 
>  Currently we use the vcpu next to the last boosted vcpu as the starting
>  point when deciding the eligible vcpu for directed yield.
> 
>  In overcommitted scenarios, if multiple vcpus try to do directed yield,
>  they start from the same vcpu, resulting in wasted cpu time (because of
>  failing yields and the double runqueue lock).
>  
>  Since the case of the same vcpu trying to do directed yield is already
>  prevented by the improved PLE handler, we can start from the vcpu next to
>  the source of yield_to.
> 
> Suggested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
> 
>  include/linux/kvm_host.h |    1 -
>  virt/kvm/kvm_main.c      |   12 ++++--------
>  2 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b70b48b..64a090d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -275,7 +275,6 @@ struct kvm {
>  #endif
>  	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
>  	atomic_t online_vcpus;
> -	int last_boosted_vcpu;
>  	struct list_head vm_list;
>  	struct mutex lock;
>  	struct kvm_io_bus *buses[KVM_NR_BUSES];
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2468523..65a6c83 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1584,7 +1584,6 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
>  	struct kvm *kvm = me->kvm;
>  	struct kvm_vcpu *vcpu;
> -	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>  	int yielded = 0;
>  	int pass;
>  	int i;
> @@ -1594,21 +1593,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 * currently running, because it got preempted by something
>  	 * else and called schedule in __vcpu_run.  Hopefully that
>  	 * VCPU is holding the lock that we need and will release it.
> -	 * We approximate round-robin by starting at the last boosted VCPU.
> +	 * We approximate round-robin by starting at the next VCPU.
>  	 */
>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!pass && i <= last_boosted_vcpu) {
> -				i = last_boosted_vcpu;
> +			if (!pass && i <= me->vcpu_id) {
> +				i = me->vcpu_id;
>  				continue;
> -			} else if (pass && i > last_boosted_vcpu)
> +			} else if (pass && i >= me->vcpu_id)
>  				break;
> -			if (vcpu == me)
> -				continue;
>  			if (waitqueue_active(&vcpu->wq))
>  				continue;
>  			if (kvm_vcpu_yield_to(vcpu)) {
> -				kvm->last_boosted_vcpu = i;
>  				yielded = 1;
>  				break;
>  			}

--
			Gleb.
Rik van Riel Sept. 2, 2012, 4:29 p.m. UTC | #2
On 09/02/2012 06:12 AM, Gleb Natapov wrote:
> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>   The idea of starting from the next vcpu (source of yield_to + 1) seems to
>>   work well for overcommitted guests, rather than using the last boosted vcpu.
>>   We can also remove the per-VM variable with this approach.
>>
>>   After this patch, the iteration for an eligible candidate starts from vcpu
>>   source+1 and ends at source-1 (after wrapping).
>>
>>   Thanks to Nikunj for his quick verification of the patch.
>>
>>   Please let me know if this patch is interesting and makes sense.
>>
> This last_boosted_vcpu thing caused us trouble during an attempt to
> implement vcpu destruction. It is good to see it removed from this POV.

I like this implementation.  It should achieve pretty much
the same as my old code, but without the downsides and without
having to keep the same amount of global state.
Raghavendra K T Sept. 4, 2012, 11:57 a.m. UTC | #3
On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1) seems
>>> to work well for overcommitted guests, rather than using the last
>>> boosted vcpu. We can also remove the per-VM variable with this approach.
>>>
>>> After this patch, the iteration for an eligible candidate starts from
>>> vcpu source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks to Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. It is good to see it removed from this POV.
>
> I like this implementation. It should achieve pretty much
> the same as my old code, but without the downsides and without
> having to keep the same amount of global state.
>

My theoretical understanding of how it would help is:

       |
       V
T0 ------- T1

suppose 4 vcpus (v1..v4) out of 32/64 vcpus simultaneously enter the
directed yield handler,

if last_boosted_vcpu = i, then v1..v4 will all start from i, and there may
be some unnecessary attempts at directed yield.

We may not see such attempts with the above patch. But again, I agree that
the whole directed_yield stuff itself is very complicated, because each
vcpu may be in a different state (running / pause-loop exited while
spinning / eligible) and because of how they are located w.r.t. each other.
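
As a standalone illustration in plain C (not kernel code; the vcpu ids and
the pivot value below are made-up example values), compare the first
candidate each of the four yielders would probe under the old shared pivot
versus the new per-vcpu pivot:

#include <stdio.h>

#define NR_VCPUS 32

int main(void)
{
	int yielders[] = { 3, 9, 17, 25 };	/* v1..v4 entering the handler */
	int last_boosted_vcpu = 7;		/* shared per-VM pivot (old scheme) */
	int n;

	for (n = 0; n < 4; n++) {
		int old_first = (last_boosted_vcpu + 1) % NR_VCPUS;
		int new_first = (yielders[n] + 1) % NR_VCPUS;	/* per-vcpu pivot */
		printf("v%d: old pivot -> vcpu %d, new pivot -> vcpu %d\n",
		       n + 1, old_first, new_first);
	}
	/* old scheme: all four probe vcpu 8 first (contended candidate);
	 * new scheme: they probe vcpus 4, 10, 18, 26 (spread out). */
	return 0;
}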

Here is the result I got for ebizzy with a 32 vcpu guest on a 32 core PLE
machine, for 1x, 2x and 3x overcommit.

base = 3.5-rc5 kernel with PLE handler improvement patches applied
patched = base + vcpuid patch

      base         stdev       patched     stdev       %improvement
1x  1955.6250    39.8961    1863.3750    37.8302    -4.71716
2x  2475.3750   165.0307    3078.8750   341.9500    24.38014
3x  2071.5556    91.5370    2112.6667    56.6171     1.98455
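
For reference, assuming the %improvement column is the relative change over
base (my reading of the numbers, not something stated above), the 1x row
works out as:

  %improvement = (patched - base) / base * 100
               = (1863.3750 - 1955.6250) / 1955.6250 * 100
               ~= -4.71716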

Note:
I have to admit that I am seeing very inconsistent results while
experimenting with the 3.6-rc kernel (not specific to the vcpuid patch, but
as a whole), and I am not sure whether it is something wrong in my config
or whether I should spend some time debugging. Has anybody observed the same?

Raghavendra K T Sept. 15, 2012, 2:22 a.m. UTC | #4
On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1) seems
>>> to work well for overcommitted guests, rather than using the last
>>> boosted vcpu. We can also remove the per-VM variable with this approach.
>>>
>>> After this patch, the iteration for an eligible candidate starts from
>>> vcpu source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks to Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. It is good to see it removed from this POV.
>
> I like this implementation. It should achieve pretty much
> the same as my old code, but without the downsides and without
> having to keep the same amount of global state.
>

I was able to test this on 3.6-rc5 (where I do not see the inconsistency;
maybe it was my bad to go with rc1), with a 32 vcpu guest in 1x and 2x
overcommit scenarios.

Here is the result on a 16 core PLE x240 machine (32 threads with HT):

base = 3.6-rc5 + PLE handler improvement patch
patched = base + vcpuid usage patch

+-----------+-----------+-----------+------------+-----------+
        ebizzy (records/sec, higher is better)
+-----------+-----------+-----------+------------+-----------+
       base       stdev      patched       stdev    %improve
+-----------+-----------+-----------+------------+-----------+
1x  11293.3750   624.4378  11242.8750   583.1757    -0.44716
2x   3641.8750   468.9400   4088.8750   290.5470    12.27390
+-----------+-----------+-----------+------------+-----------+


Avi, Marcelo.. any comments on this?


Patch

====8<====
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

 Currently we use the vcpu next to the last boosted vcpu as the starting
 point when deciding the eligible vcpu for directed yield.

 In overcommitted scenarios, if multiple vcpus try to do directed yield,
 they start from the same vcpu, resulting in wasted cpu time (because of
 failing yields and the double runqueue lock).
 
 Since the case of the same vcpu trying to do directed yield is already
 prevented by the improved PLE handler, we can start from the vcpu next to
 the source of yield_to.

Suggested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---

 include/linux/kvm_host.h |    1 -
 virt/kvm/kvm_main.c      |   12 ++++--------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..64a090d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -275,7 +275,6 @@ struct kvm {
 #endif
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
-	int last_boosted_vcpu;
 	struct list_head vm_list;
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468523..65a6c83 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1584,7 +1584,6 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
-	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
 	int yielded = 0;
 	int pass;
 	int i;
@@ -1594,21 +1593,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 * currently running, because it got preempted by something
 	 * else and called schedule in __vcpu_run.  Hopefully that
 	 * VCPU is holding the lock that we need and will release it.
-	 * We approximate round-robin by starting at the last boosted VCPU.
+	 * We approximate round-robin by starting at the next VCPU.
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i <= last_boosted_vcpu) {
-				i = last_boosted_vcpu;
+			if (!pass && i <= me->vcpu_id) {
+				i = me->vcpu_id;
 				continue;
-			} else if (pass && i > last_boosted_vcpu)
+			} else if (pass && i >= me->vcpu_id)
 				break;
-			if (vcpu == me)
-				continue;
 			if (waitqueue_active(&vcpu->wq))
 				continue;
 			if (kvm_vcpu_yield_to(vcpu)) {
-				kvm->last_boosted_vcpu = i;
 				yielded = 1;
 				break;
 			}