Message ID | 20120829192100.22412.92575.sendpatchset@codeblue (mailing list archive)
---|---
State | New, archived
On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
> The idea of starting from the next vcpu (source of yield_to + 1) seems to
> work well for overcommitted guests rather than using the last boosted vcpu.
> We can also remove the per-VM variable with this approach.
>
> Iteration for an eligible candidate after this patch starts from vcpu
> source+1 and ends at source-1 (after wrapping).
>
> Thanks to Nikunj for his quick verification of the patch.
>
> Please let me know if this patch is interesting and makes sense.
>
This last_boosted_vcpu thing caused us trouble during an attempt to
implement vcpu destruction. It is good to see it removed from this POV.

> ====8<====
> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>
> Currently we use the vcpu after the last boosted vcpu as the starting
> point when deciding the eligible vcpu for directed yield.
>
> In overcommitted scenarios, if more vcpus try to do directed yield,
> they all start from the same vcpu, resulting in wasted cpu time (because
> of failing yields and double runqueue locks).
>
> Since the improved PLE handler already makes it unlikely that the same
> vcpu tries to do directed yield, we can start from the vcpu after the
> source of yield_to.
>
> Suggested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
>
>  include/linux/kvm_host.h |    1 -
>  virt/kvm/kvm_main.c      |   12 ++++--------
>  2 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b70b48b..64a090d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -275,7 +275,6 @@ struct kvm {
>  #endif
>  	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
>  	atomic_t online_vcpus;
> -	int last_boosted_vcpu;
>  	struct list_head vm_list;
>  	struct mutex lock;
>  	struct kvm_io_bus *buses[KVM_NR_BUSES];
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2468523..65a6c83 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1584,7 +1584,6 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
>  	struct kvm *kvm = me->kvm;
>  	struct kvm_vcpu *vcpu;
> -	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>  	int yielded = 0;
>  	int pass;
>  	int i;
> @@ -1594,21 +1593,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 * currently running, because it got preempted by something
>  	 * else and called schedule in __vcpu_run.  Hopefully that
>  	 * VCPU is holding the lock that we need and will release it.
> -	 * We approximate round-robin by starting at the last boosted VCPU.
> +	 * We approximate round-robin by starting at the next VCPU.
>  	 */
>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!pass && i <= last_boosted_vcpu) {
> -				i = last_boosted_vcpu;
> +			if (!pass && i <= me->vcpu_id) {
> +				i = me->vcpu_id;
>  				continue;
> -			} else if (pass && i > last_boosted_vcpu)
> +			} else if (pass && i >= me->vcpu_id)
>  				break;
> -			if (vcpu == me)
> -				continue;
>  			if (waitqueue_active(&vcpu->wq))
>  				continue;
>  			if (kvm_vcpu_yield_to(vcpu)) {
> -				kvm->last_boosted_vcpu = i;
>  				yielded = 1;
>  				break;
>  			}

--
Gleb.
-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/02/2012 06:12 AM, Gleb Natapov wrote:
> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>> The idea of starting from the next vcpu (source of yield_to + 1) seems to
>> work well for overcommitted guests rather than using the last boosted
>> vcpu. We can also remove the per-VM variable with this approach.
>>
>> Iteration for an eligible candidate after this patch starts from vcpu
>> source+1 and ends at source-1 (after wrapping).
>>
>> Thanks to Nikunj for his quick verification of the patch.
>>
>> Please let me know if this patch is interesting and makes sense.
>>
> This last_boosted_vcpu thing caused us trouble during an attempt to
> implement vcpu destruction. It is good to see it removed from this POV.

I like this implementation. It should achieve pretty much the same as my
old code, but without the downsides and without having to keep the same
amount of global state.
On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1) seems
>>> to work well for overcommitted guests rather than using the last boosted
>>> vcpu. We can also remove the per-VM variable with this approach.
>>>
>>> Iteration for an eligible candidate after this patch starts from vcpu
>>> source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks to Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. It is good to see it removed from this POV.
>
> I like this implementation. It should achieve pretty much the same as my
> old code, but without the downsides and without having to keep the same
> amount of global state.
>

My theoretical understanding of how it would help is:

      |
      V
  T0 ------- T1

Suppose 4 vcpus (v1..v4) out of 32/64 vcpus simultaneously enter the
directed yield handler. If last_boosted_vcpu = i, then v1..v4 will all
start from i, and there may be some unnecessary attempts at directed
yield. We should not see such attempts with the above patch.

But again, I agree that the whole directed_yield stuff itself is very
complicated because of the possibility of each vcpu being in a different
state (running / pause-loop exited while spinning / eligible) and how they
are located w.r.t. each other.

Here is the result I got for ebizzy: 32 vcpu guest on a 32 core PLE
machine, for 1x, 2x and 3x overcommit.
base    = 3.5-rc5 kernel with ple handler improvement patches applied
patched = base + vcpuid patch

          base       stdev     patched       stdev   %improvement
1x   1955.6250     39.8961   1863.3750     37.8302       -4.71716
2x   2475.3750    165.0307   3078.8750    341.9500       24.38014
3x   2071.5556     91.5370   2112.6667     56.6171        1.98455

Note: I have to admit that I am seeing very inconsistent results while
experimenting with the 3.6-rc kernel (not specific to the vcpuid patch,
but as a whole), and I am not sure whether something is wrong in my config
or I should spend some time debugging. Has anybody observed the same?
On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1) seems
>>> to work well for overcommitted guests rather than using the last boosted
>>> vcpu. We can also remove the per-VM variable with this approach.
>>>
>>> Iteration for an eligible candidate after this patch starts from vcpu
>>> source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks to Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. It is good to see it removed from this POV.
>
> I like this implementation. It should achieve pretty much the same as my
> old code, but without the downsides and without having to keep the same
> amount of global state.
>

I was able to test this on 3.6-rc5 (where I do not see the inconsistency;
maybe it was a mistake on my part to go with rc1), with a 32 vcpu guest in
1x and 2x overcommit scenarios.

Here is the result on a 16 core PLE machine (with HT, 32 threads), an x240
machine.

base    = 3.6-rc5 + ple handler improvement patch
patched = base + vcpuid usage patch

+-----------+-----------+-----------+------------+-----------+
              ebizzy (records/sec, higher is better)
+-----------+-----------+-----------+------------+-----------+
          base       stdev     patched       stdev   %improve
+-----------+-----------+-----------+------------+-----------+
1x  11293.3750    624.4378  11242.8750    583.1757   -0.44716
2x   3641.8750    468.9400   4088.8750    290.5470   12.27390
+-----------+-----------+-----------+------------+-----------+

Avi, Marcelo.. any comments on this?
====8<====
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Currently we use the vcpu after the last boosted vcpu as the starting
point when deciding the eligible vcpu for directed yield.

In overcommitted scenarios, if more vcpus try to do directed yield,
they all start from the same vcpu, resulting in wasted cpu time (because
of failing yields and double runqueue locks).

Since the improved PLE handler already makes it unlikely that the same
vcpu tries to do directed yield, we can start from the vcpu after the
source of yield_to.

Suggested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---

 include/linux/kvm_host.h |    1 -
 virt/kvm/kvm_main.c      |   12 ++++--------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..64a090d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -275,7 +275,6 @@ struct kvm {
 #endif
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
-	int last_boosted_vcpu;
 	struct list_head vm_list;
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468523..65a6c83 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1584,7 +1584,6 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
-	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
 	int yielded = 0;
 	int pass;
 	int i;
@@ -1594,21 +1593,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 * currently running, because it got preempted by something
 	 * else and called schedule in __vcpu_run.  Hopefully that
 	 * VCPU is holding the lock that we need and will release it.
-	 * We approximate round-robin by starting at the last boosted VCPU.
+	 * We approximate round-robin by starting at the next VCPU.
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i <= last_boosted_vcpu) {
-				i = last_boosted_vcpu;
+			if (!pass && i <= me->vcpu_id) {
+				i = me->vcpu_id;
 				continue;
-			} else if (pass && i > last_boosted_vcpu)
+			} else if (pass && i >= me->vcpu_id)
 				break;
-			if (vcpu == me)
-				continue;
 			if (waitqueue_active(&vcpu->wq))
 				continue;
 			if (kvm_vcpu_yield_to(vcpu)) {
-				kvm->last_boosted_vcpu = i;
 				yielded = 1;
 				break;
 			}