Message ID | 20120709062032.24030.10454.sendpatchset@codeblue (mailing list archive)
---|---
State | New, archived
On 07/09/2012 11:50 AM, Raghavendra K T wrote:
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>
> Noting pause loop exited vcpu helps in filtering right candidate to yield.
> Yielding to same vcpu may result in more wastage of cpu.
>
> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---

Oops. Sorry, somehow the Signed-off-by and From lines got interchanged.
On 07/09/2012 02:20 AM, Raghavendra K T wrote:

> @@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
>  		u64 length;
>  		u64 status;
>  	} osvw;
> +
> +	/* Pause loop exit optimization */
> +	struct {
> +		bool pause_loop_exited;
> +		bool dy_eligible;
> +	} plo;

I know kvm_vcpu_arch is traditionally not a well documented
structure, but it would be really nice if each variable inside
this sub-structure could get some documentation.

Also, do we really want to introduce another acronym here?

Or would we be better off simply calling this struct .ple,
since that is a name people are already familiar with.
On 07/10/2012 04:09 AM, Rik van Riel wrote:
> On 07/09/2012 02:20 AM, Raghavendra K T wrote:
>
>> @@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
>>  		u64 length;
>>  		u64 status;
>>  	} osvw;
>> +
>> +	/* Pause loop exit optimization */
>> +	struct {
>> +		bool pause_loop_exited;
>> +		bool dy_eligible;
>> +	} plo;
>
> I know kvm_vcpu_arch is traditionally not a well documented
> structure, but it would be really nice if each variable inside
> this sub-structure could get some documentation.

Sure. Will document it.

> Also, do we really want to introduce another acronym here?
>
> Or would we be better off simply calling this struct .ple,
> since that is a name people are already familiar with.

Yes, it makes sense to have .ple.
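Putting both review comments together, the sub-structure might end up looking something like the sketch below. The rename to .ple follows the agreement above; the per-field comments are illustrative assumptions based on this thread, not text from the actual series:

    /* Pause loop exit (PLE) bookkeeping -- sketch of the documented,
     * renamed variant; comments here are assumptions from the thread. */
    struct {
        /* Set when this vcpu takes a PAUSE-loop exit, cleared before
         * each guest entry; a vcpu with this set is likely spinning
         * and is a poor target for a directed yield. */
        bool pause_loop_exited;
        /* Whether this vcpu may currently be chosen as the target of
         * a directed yield from another spinning vcpu. */
        bool dy_eligible;
    } ple;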
On 07/09/2012 09:20 AM, Raghavendra K T wrote:
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>
> Noting pause loop exited vcpu helps in filtering right candidate to yield.
> Yielding to same vcpu may result in more wastage of cpu.
>
>
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f75af40..a492f5d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
>
>  static int pause_interception(struct vcpu_svm *svm)
>  {
> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>  	kvm_vcpu_on_spin(&(svm->vcpu));
>  	return 1;
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 32eb588..600fb3c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4945,6 +4945,7 @@ out:
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
>  	skip_emulated_instruction(vcpu);
> +	vcpu->arch.plo.pause_loop_exited = true;
>  	kvm_vcpu_on_spin(vcpu);
>

This code is duplicated. Should we move it to kvm_vcpu_on_spin?

That means the .plo structure needs to be in common code, but that's not
too bad perhaps.

> index be6d549..07dbd14 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
>  	if (req_immediate_exit)
>  		smp_send_reschedule(vcpu->cpu);
> -
> +	vcpu->arch.plo.pause_loop_exited = false;

This adds some tiny overhead to vcpu entry. You could remove it by
using the vcpu->requests mechanism to clear the flag, since
vcpu->requests is already checked on every entry.

>  	kvm_guest_enter();
>
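A minimal sketch of the deduplication being suggested, assuming the flag (and its containing structure) moves from kvm_vcpu_arch into common code next to kvm_vcpu_on_spin(); this illustrates the suggestion rather than reproducing any actual follow-up patch:

    /* virt/kvm/kvm_main.c -- sketch only: set the flag once here instead
     * of in svm.c's pause_interception() and vmx.c's handle_pause().
     * Assumes the structure moves into struct kvm_vcpu as 'ple'. */
    void kvm_vcpu_on_spin(struct kvm_vcpu *me)
    {
        me->ple.pause_loop_exited = true;
        /* ... existing directed-yield candidate search follows ... */
    }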
On 07/11/2012 02:23 PM, Avi Kivity wrote:
> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>
>> Noting pause loop exited vcpu helps in filtering right candidate to yield.
>> Yielding to same vcpu may result in more wastage of cpu.
>>
>>
>>  struct kvm_lpage_info {
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index f75af40..a492f5d 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
>>
>>  static int pause_interception(struct vcpu_svm *svm)
>>  {
>> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>>  	kvm_vcpu_on_spin(&(svm->vcpu));
>>  	return 1;
>>  }
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 32eb588..600fb3c 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4945,6 +4945,7 @@ out:
>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>  {
>>  	skip_emulated_instruction(vcpu);
>> +	vcpu->arch.plo.pause_loop_exited = true;
>>  	kvm_vcpu_on_spin(vcpu);
>>
>
> This code is duplicated. Should we move it to kvm_vcpu_on_spin?
>
> That means the .plo structure needs to be in common code, but that's not
> too bad perhaps.

Since PLE is very much tied to x86, and the proposed changes are very much
specific to the PLE handler, I thought it is better to make it arch specific.

So do you think it is good to move it inside vcpu_on_spin and make the ple
structure belong to common code?

>> index be6d549..07dbd14 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>
>>  	if (req_immediate_exit)
>>  		smp_send_reschedule(vcpu->cpu);
>> -
>> +	vcpu->arch.plo.pause_loop_exited = false;
>
> This adds some tiny overhead to vcpu entry. You could remove it by
> using the vcpu->requests mechanism to clear the flag, since
> vcpu->requests is already checked on every entry.

So IIUC, let's have a request bit for indicating PLE:

pause_interception() / handle_pause()
{
    make_request(PLE_REQUEST)
    vcpu_on_spin()
}

check_eligibility()
{
    !test_request(PLE_REQUEST) || (test_request(PLE_REQUEST) && dy_eligible())
    .
    .
}

vcpu_run()
{
    check_request(PLE_REQUEST)
    .
    .
}

Is this the expected flow you had in mind?

[But my only concern was not resetting in cases where we do not do
guest_enter(). Will test how that goes.]

>
>>  	kvm_guest_enter();
>>
On 07/11/2012 01:52 PM, Raghavendra K T wrote:
> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>>
>>> Noting pause loop exited vcpu helps in filtering right candidate to
>>> yield.
>>> Yielding to same vcpu may result in more wastage of cpu.
>>>
>>>
>>>  struct kvm_lpage_info {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index f75af40..a492f5d 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct
>>> vcpu_svm *svm)
>>>
>>>  static int pause_interception(struct vcpu_svm *svm)
>>>  {
>>> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>>>  	kvm_vcpu_on_spin(&(svm->vcpu));
>>>  	return 1;
>>>  }
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 32eb588..600fb3c 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -4945,6 +4945,7 @@ out:
>>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>>  {
>>>  	skip_emulated_instruction(vcpu);
>>> +	vcpu->arch.plo.pause_loop_exited = true;
>>>  	kvm_vcpu_on_spin(vcpu);
>>>
>>
>> This code is duplicated. Should we move it to kvm_vcpu_on_spin?
>>
>> That means the .plo structure needs to be in common code, but that's not
>> too bad perhaps.
>>
>
> Since PLE is very much tied to x86, and the proposed changes are very much
> specific to the PLE handler, I thought it is better to make it arch specific.
>
> So do you think it is good to move it inside vcpu_on_spin and make the ple
> structure belong to common code?

See the discussion with Christian. PLE is tied to x86, but cpu_relax()
and facilities to trap it are not.

>>
>> This adds some tiny overhead to vcpu entry. You could remove it by
>> using the vcpu->requests mechanism to clear the flag, since
>> vcpu->requests is already checked on every entry.
>
> So IIUC, let's have a request bit for indicating PLE:
>
> pause_interception() / handle_pause()
> {
>     make_request(PLE_REQUEST)
>     vcpu_on_spin()
> }
>
> check_eligibility()
> {
>     !test_request(PLE_REQUEST) || (test_request(PLE_REQUEST) && dy_eligible())
>     .
>     .
> }
>
> vcpu_run()
> {
>     check_request(PLE_REQUEST)
>     .
>     .
> }
>
> Is this the expected flow you had in mind?

Yes, something like that.

> [But my only concern was not resetting in cases where we do not do
> guest_enter(). Will test how that goes.]

Hm, suppose we're the next-in-line for a ticket lock and exit due to
PLE. The lock holder completes and unlocks, which really assigns the
lock to us. So now we are the lock owner, yet we are marked as don't
yield-to-us in the PLE code.
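To pin down the agreed flow, here is a minimal C sketch. KVM_REQ_PLE is a hypothetical request bit invented for illustration (it is not part of the posted patch); kvm_make_request() and kvm_check_request() are the existing vcpu->requests helpers:

    /* In the PLE handler: raise the (hypothetical) request instead of
     * writing a flag directly. */
    static int handle_pause(struct kvm_vcpu *vcpu)
    {
        skip_emulated_instruction(vcpu);
        kvm_make_request(KVM_REQ_PLE, vcpu);    /* hypothetical bit */
        kvm_vcpu_on_spin(vcpu);
        return 1;
    }

    /* In vcpu_enter_guest(), inside the existing vcpu->requests check:
     * kvm_check_request() tests and clears the bit, so no unconditional
     * store is added to the entry path. Readers that only want to test
     * the flag (e.g. the eligibility filter) would use
     * test_bit(KVM_REQ_PLE, &vcpu->requests), which does not clear. */
    if (vcpu->requests)
        kvm_check_request(KVM_REQ_PLE, vcpu);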
On 07/11/2012 04:48 PM, Avi Kivity wrote:
> On 07/11/2012 01:52 PM, Raghavendra K T wrote:
>> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>>> On 07/09/2012 09:20 AM, Raghavendra K T wrote:
>>>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>>>
>>>> Noting pause loop exited vcpu helps in filtering right candidate to
>>>> yield.
>>>> Yielding to same vcpu may result in more wastage of cpu.
>>>>
>>>>
>>>>  struct kvm_lpage_info {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index f75af40..a492f5d 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct
>>>> vcpu_svm *svm)
>>>>
>>>>  static int pause_interception(struct vcpu_svm *svm)
>>>>  {
>>>> +	svm->vcpu.arch.plo.pause_loop_exited = true;
>>>>  	kvm_vcpu_on_spin(&(svm->vcpu));
>>>>  	return 1;
>>>>  }
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 32eb588..600fb3c 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -4945,6 +4945,7 @@ out:
>>>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	skip_emulated_instruction(vcpu);
>>>> +	vcpu->arch.plo.pause_loop_exited = true;
>>>>  	kvm_vcpu_on_spin(vcpu);
>>>>
>>>
>>> This code is duplicated. Should we move it to kvm_vcpu_on_spin?
>>>
>>> That means the .plo structure needs to be in common code, but that's not
>>> too bad perhaps.
>>>
>>
>> Since PLE is very much tied to x86, and the proposed changes are very much
>> specific to the PLE handler, I thought it is better to make it arch specific.
>>
>> So do you think it is good to move it inside vcpu_on_spin and make the ple
>> structure belong to common code?
>
> See the discussion with Christian. PLE is tied to x86, but cpu_relax()
> and facilities to trap it are not.

Yep.

>>>
>>> This adds some tiny overhead to vcpu entry. You could remove it by
>>> using the vcpu->requests mechanism to clear the flag, since
>>> vcpu->requests is already checked on every entry.
>>
>> So IIUC, let's have a request bit for indicating PLE:
>>
>> pause_interception() / handle_pause()
>> {
>>     make_request(PLE_REQUEST)
>>     vcpu_on_spin()
>> }
>>
>> check_eligibility()
>> {
>>     !test_request(PLE_REQUEST) || (test_request(PLE_REQUEST) && dy_eligible())
>>     .
>>     .
>> }
>>
>> vcpu_run()
>> {
>>     check_request(PLE_REQUEST)
>>     .
>>     .
>> }
>>
>> Is this the expected flow you had in mind?
>
> Yes, something like that.

ok..

>> [But my only concern was not resetting in cases where we do not do
>> guest_enter(). Will test how that goes.]
>
> Hm, suppose we're the next-in-line for a ticket lock and exit due to
> PLE. The lock holder completes and unlocks, which really assigns the
> lock to us. So now we are the lock owner, yet we are marked as don't
> yield-to-us in the PLE code.

Yes.. off-topic, but that is solved by the kicked flag in PV spinlocks.
----- Original Message -----
>>
>> Hm, suppose we're the next-in-line for a ticket lock and exit due to
>> PLE. The lock holder completes and unlocks, which really assigns the
>> lock to us. So now we are the lock owner, yet we are marked as don't
>> yield-to-us in the PLE code.
>
> Yes.. off-topic, but that is solved by the kicked flag in PV spinlocks.

Yeah, this is a different, but related, topic. pvticketlocks not only
help yield spinning vcpus, but also allow the use of ticketlocks for the
guaranteed fairness. If we want that fairness we'll always need some
pv-ness, even if it's a mostly PLE solution. I see that as a reasonable
reason to take the pvticketlock series, assuming it performs at least as
well as PLE.

The following options have all been brought up at some point by various
people, and all have their own pluses and minuses:

PLE-only best-effort:
 + hardware-only, algorithm improvements can be made independent of guest OSes
 - has limited info about spinning vcpus, making it hard to improve the
   algorithm (improved with an auto-adjusting ple_window?)
 - perf enhancement/degradation is workload/ple_window dependent
 - impossible? to guarantee FIFO order

pvticketlocks:
 - have to maintain pv code, both hosts and guests
 + perf is only workload dependent (should disable ple to avoid interference?)
 + guarantees FIFO order
 + can fall back on PLE-only if the guest doesn't support it

hybrid:
 + takes advantage of the hw support
 - still requires host and guest pv code
 - will likely make perf dependent on ple_window again
 + guarantees FIFO order

???: did I miss any?

I think more benchmarking of PLE vs. pvticketlocks is needed, which I'm
working on. If we see that it performs just as well or better, then IMHO
we should consider committing Raghu's latest version of the pvticketlock
series, perhaps with an additional patch that auto-disables PLE when
pvticketlocks are enabled.

Drew
On Wed, 11 Jul 2012 16:22:29 +0530, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>>
>> This adds some tiny overhead to vcpu entry. You could remove it by
>> using the vcpu->requests mechanism to clear the flag, since
>> vcpu->requests is already checked on every entry.
>
> So IIUC, let's have a request bit for indicating PLE:
>
> pause_interception() / handle_pause()
> {
>     make_request(PLE_REQUEST)
>     vcpu_on_spin()
> }
>
> check_eligibility()
> {
>     !test_request(PLE_REQUEST) || (test_request(PLE_REQUEST) && dy_eligible())
>     .
>     .
> }
>
> vcpu_run()
> {
>     check_request(PLE_REQUEST)
>

I know check_request will clear PLE_REQUEST, but you just need a
clear_request here, right?
On 07/12/2012 04:28 PM, Nikunj A Dadhania wrote:
> On Wed, 11 Jul 2012 16:22:29 +0530, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>> On 07/11/2012 02:23 PM, Avi Kivity wrote:
>>>
>>> This adds some tiny overhead to vcpu entry. You could remove it by
>>> using the vcpu->requests mechanism to clear the flag, since
>>> vcpu->requests is already checked on every entry.
>>
>> So IIUC, let's have a request bit for indicating PLE:
>>
>> pause_interception() / handle_pause()
>> {
>>     make_request(PLE_REQUEST)
>>     vcpu_on_spin()
>> }
>>
>> check_eligibility()
>> {
>>     !test_request(PLE_REQUEST) || (test_request(PLE_REQUEST) && dy_eligible())
>>     .
>>     .
>> }
>>
>> vcpu_run()
>> {
>>     check_request(PLE_REQUEST)
>>
> I know check_request will clear PLE_REQUEST, but you just need a
> clear_request here, right?

Yes. I tried to use check_request for clearing, but ended up with a
different implementation (see the latest thread).
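For context, kvm_check_request() at the time was a test-and-clear over the vcpu->requests bitmap, roughly as sketched below; there was no dedicated clear_request() helper, so a bare clear_bit() is the "clear only" option Nikunj is alluding to (PLE_REQUEST being the hypothetical bit from this thread):

    /* Roughly the 3.5-era helper from include/linux/kvm_host.h: */
    static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
    {
        if (test_bit(req, &vcpu->requests)) {
            clear_bit(req, &vcpu->requests);
            return true;
        }
        return false;
    }

    /* If only the clearing side effect is wanted, the (unused) return
     * value can be skipped with a plain clear: */
    clear_bit(PLE_REQUEST, &vcpu->requests);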
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..857ca68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	/* Pause loop exit optimization */
+	struct {
+		bool pause_loop_exited;
+		bool dy_eligible;
+	} plo;
+
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f75af40..a492f5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 
 static int pause_interception(struct vcpu_svm *svm)
 {
+	svm->vcpu.arch.plo.pause_loop_exited = true;
 	kvm_vcpu_on_spin(&(svm->vcpu));
 	return 1;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..600fb3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4945,6 +4945,7 @@ out:
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
+	vcpu->arch.plo.pause_loop_exited = true;
 	kvm_vcpu_on_spin(vcpu);
 
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..07dbd14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (req_immediate_exit)
 		smp_send_reschedule(vcpu->cpu);
-
+	vcpu->arch.plo.pause_loop_exited = false;
 	kvm_guest_enter();
 
 	if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6168,6 +6168,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	BUG_ON(vcpu->kvm == NULL);
 	kvm = vcpu->kvm;
 
+	vcpu->arch.plo.pause_loop_exited = false;
+	vcpu->arch.plo.dy_eligible = true;
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
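The hunks above only introduce and reset the flags; nothing in this patch consumes dy_eligible yet. As a rough illustration of the "filtering right candidate" idea from the changelog, modeled on the check_eligibility() pseudocode earlier in the thread, a consumer might look like this (the function name and placement are assumptions, not part of the posted patch):

    /* Hypothetical consumer of the new flags -- sketch only. */
    static bool check_eligibility(struct kvm_vcpu *vcpu)
    {
        /* A vcpu that did not pause-loop exit is always a fair yield
         * target; one that did must additionally be marked eligible
         * for a directed yield. */
        return !vcpu->arch.plo.pause_loop_exited ||
               vcpu->arch.plo.dy_eligible;
    }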