
[RFC,0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

Message ID 506537C7.9070909@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Raghavendra K T Sept. 28, 2012, 5:38 a.m. UTC
On 09/27/2012 05:33 PM, Avi Kivity wrote:
> On 09/27/2012 01:23 PM, Raghavendra K T wrote:
>>>
>>> This gives us a good case for tracking preemption on a per-vm basis.  As
>>> long as we aren't preempted, we can keep the PLE window high, and also
>>> return immediately from the handler without looking for candidates.
>>
>> 1) So do you think, deferring preemption patch ( Vatsa was mentioning
>> long back)  is also another thing worth trying, so we reduce the chance
>> of LHP.
>
> Yes, we have to keep it in mind.  It will be useful for fine grained
> locks, not so much so coarse locks or IPIs.
>

Agree.
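
For illustration of the per-vm tracking idea quoted above: KVM already receives
sched_in/sched_out callbacks through preempt notifiers, so the marking could in
principle hang off those. This is only a rough sketch; last_preempt is a
hypothetical field that does not exist in struct kvm:

	/* Hypothetical sketch of per-VM preemption tracking. */
	static void kvm_sched_out(struct preempt_notifier *pn,
				  struct task_struct *next)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		vcpu->kvm->last_preempt = jiffies;	/* hypothetical field */
		kvm_arch_vcpu_put(vcpu);
	}

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		/* No vcpu of this VM was preempted recently: skip the scan. */
		if (time_after(jiffies, me->kvm->last_preempt + HZ / 100))
			return;
		/* ... existing candidate search ... */
	}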

> I would still of course prefer a PLE solution, but if we can't get it to
> work we can consider preemption deferral.
>

Okay.

>>
>> IIRC, with defer preemption:
>> we will have a hook in the spinlock/unlock path to measure the depth of
>> locks held, shared with the host scheduler (maybe via MSRs now).
>> The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather
>> gives it, say, one chance).
>
> A downside is that we have to do that even when undercommitted.
>
> Also there may be a lot of false positives (deferred preemptions even
> when there is no contention).

Yes. That is a worry.

>
>>
>> 2) Looking at the result (comparing A & C), I do feel we have
>> significant overhead in iterating over vcpus (even compared to the vmexit),
>> so we would still need the undercommit fix suggested by PeterZ (improving
>> by 140%)?
>
> Looking only at the current runqueue?  My worry is that it misses a lot
> of cases.  Maybe try the current runqueue first and then others.
>
> Or were you referring to something else?

No. I was referring to the same thing.

However, I had also tried the following (which works well for checking the
undercommitted scenario). But I am thinking of using it only for yielding in
the overcommit case (yield-in-overcommit, as suggested by Rik) and keeping
the undercommit patch as suggested by PeterZ.

[ The patch below is not in proper diff form, I suppose. ]

Will test them.

Peter, Can I post your patch with your from/sob.. in V2?
Please let me know..

---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 28f00bc..9ed3759 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1620,6 +1620,21 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
  	return eligible;
  }
  #endif
+
+bool kvm_overcommitted()
+{
+	unsigned long load;
+
+	load = avenrun[0] + FIXED_1/200;
+	load = load >> FSHIFT;
+	load = (load << 7) / num_online_cpus();
+
+	if (load > 128)
+		return true;
+
+	return false;
+}
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
  	struct kvm *kvm = me->kvm;
@@ -1629,6 +1644,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  	int pass;
  	int i;

+	if (!kvm_overcommitted())
+		return;
+
  	kvm_vcpu_set_in_spin_loop(me, true);
  	/*
  	 * We boost the priority of a VCPU that is runnable but not


Comments

H. Peter Anvin Sept. 28, 2012, 5:45 a.m. UTC | #1
On 09/27/2012 10:38 PM, Raghavendra K T wrote:
> +
> +bool kvm_overcommitted()
> +{

This better not be C...

	-hpa
Raghavendra K T Sept. 28, 2012, 6:03 a.m. UTC | #2
On 09/28/2012 11:15 AM, H. Peter Anvin wrote:
> On 09/27/2012 10:38 PM, Raghavendra K T wrote:
>> +
>> +bool kvm_overcommitted()
>> +{
>
> This better not be C...

I think you meant I should have had something like kvm_overcommitted(void)
(and perhaps a different function name),

or is it the body of the function?
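
Presumably the objection is to the empty parameter list: in C, unlike C++,
'bool kvm_overcommitted()' leaves the parameters unspecified rather than
declaring a function that takes none, and kernel code is normally built with
-Wstrict-prototypes. Something like this would be the prototyped form:

	/* "(void)" means "takes no arguments"; "()" is an old-style,
	 * unprototyped declaration in C.
	 */
	bool kvm_overcommitted(void);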


Peter Zijlstra Sept. 28, 2012, 8:38 a.m. UTC | #3
On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
> 
> Peter, Can I post your patch with your from/sob.. in V2?
> Please let me know.. 

Yeah I guess ;-)
Andrew Theurer Sept. 28, 2012, 11:40 a.m. UTC | #4
On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
> On 09/27/2012 05:33 PM, Avi Kivity wrote:
> > On 09/27/2012 01:23 PM, Raghavendra K T wrote:
> >>>
> >>> This gives us a good case for tracking preemption on a per-vm basis.  As
> >>> long as we aren't preempted, we can keep the PLE window high, and also
> >>> return immediately from the handler without looking for candidates.
> >>
> >> 1) So do you think, deferring preemption patch ( Vatsa was mentioning
> >> long back)  is also another thing worth trying, so we reduce the chance
> >> of LHP.
> >
> > Yes, we have to keep it in mind.  It will be useful for fine grained
> > locks, not so much so coarse locks or IPIs.
> >
> 
> Agree.
> 
> > I would still of course prefer a PLE solution, but if we can't get it to
> > work we can consider preemption deferral.
> >
> 
> Okay.
> 
> >>
> >> IIRC, with defer preemption:
> >> we will have a hook in the spinlock/unlock path to measure the depth of
> >> locks held, shared with the host scheduler (maybe via MSRs now).
> >> The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather
> >> gives it, say, one chance).
> >
> > A downside is that we have to do that even when undercommitted.

Hopefully vcpu preemption is very rare when undercommitted, so it should
not happen much at all.
> >
> > Also there may be a lot of false positives (deferred preemptions even
> > when there is no contention).

It will be interesting to see how this behaves with a very high lock
activity in a guest.  Once the scheduler defers preemption, is it for a
fixed amount of time, or does it know to cut the deferral short as soon
as the lock depth is reduced [by x]?
Raghavendra K T Sept. 28, 2012, 2:11 p.m. UTC | #5
On 09/28/2012 05:10 PM, Andrew Theurer wrote:
> On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
>> On 09/27/2012 05:33 PM, Avi Kivity wrote:
>>> On 09/27/2012 01:23 PM, Raghavendra K T wrote:
>>>>>
[...]
>>>
>>> Also there may be a lot of false positives (deferred preemptions even
>>> when there is no contention).
>
> It will be interesting to see how this behaves with a very high lock
> activity in a guest.  Once the scheduler defers preemption, is it for a
> fixed amount of time, or does it know to cut the deferral short as soon
> as the lock depth is reduced [by x]?

The design/protocol that Vatsa had in mind was something like this:

- The scheduler does not let a lock-holding vcpu run forever; it may give it
one extra chance, worth only a few ticks. In addition to granting that
chance, the scheduler also sets some indication that the chance has been
given.

- Once the vcpu has released (all) the lock(s), if it had been given such a
chance, it should clear that indication (ACK) and relinquish the cpu.
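
A minimal sketch of that handshake, purely to make the protocol concrete;
every name here (pv_sched_info, lock_depth, chance_given) is hypothetical
and no such interface exists today:

	/* Per-vcpu state shared between guest and host, e.g. on a registered page. */
	struct pv_sched_info {
		unsigned long lock_depth;	/* guest: ++ on lock, -- on unlock */
		unsigned long chance_given;	/* host: set when preemption was deferred */
	};

	/* Guest side: lock/unlock hooks. */
	static inline void pv_lock_acquired(struct pv_sched_info *si)
	{
		si->lock_depth++;
	}

	static inline void pv_lock_released(struct pv_sched_info *si)
	{
		if (--si->lock_depth == 0 && si->chance_given) {
			si->chance_given = 0;	/* ACK the extra chance */
			/* ...and relinquish the cpu, e.g. via a yield hypercall */
		}
	}

	/* Host side: consulted when the scheduler wants to preempt the vcpu task. */
	static bool defer_preemption(struct pv_sched_info *si)
	{
		if (si->lock_depth > 0 && !si->chance_given) {
			si->chance_given = 1;	/* grant one short extension (a few ticks) */
			return true;		/* skip this preemption */
		}
		return false;
	}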




Peter Zijlstra Sept. 28, 2012, 2:13 p.m. UTC | #6
On Fri, 2012-09-28 at 06:40 -0500, Andrew Theurer wrote:
> It will be interesting to see how this behaves with a very high lock
> activity in a guest.  Once the scheduler defers preemption, is it for a
> fixed amount of time, or does it know to cut the deferral short as soon
> as the lock depth is reduced [by x]?

Since the locks live in a guest/userspace, we don't even know they're
held at all, let alone when state changes.

Also, AFAIK PLE simply exits the guest whenever you do a busy-wait; there's
no guarantee it's due to a lock at all, we could be waiting for a
'virtual' hardware resource or whatnot.


Avi Kivity Sept. 30, 2012, 8:24 a.m. UTC | #7
On 09/28/2012 01:40 PM, Andrew Theurer wrote:
>> 
>> >>
>> >> IIRC, with defer preemption:
>> >> we will have a hook in the spinlock/unlock path to measure the depth of
>> >> locks held, shared with the host scheduler (maybe via MSRs now).
>> >> The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather
>> >> gives it, say, one chance).
>> >
>> > A downside is that we have to do that even when undercommitted.
> 
> Hopefully vcpu preemption is very rare when undercommitted, so it should
> not happen much at all.

As soon as you're preempted, you're effectively overcommitted (even if
the system as a whole is undercommitted).  What I meant was that you
need to communicate your lock state to the host, and with fine-grained
locking this can happen a lot.  It may be as simple as an
increment/decrement instruction though.

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 28f00bc..9ed3759 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1620,6 +1620,21 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
  	return eligible;
  }
  #endif
+
+bool kvm_overcommitted()
+{
+	unsigned long load;
+
+	load = avenrun[0] + FIXED_1/200;
+	load = load >> FSHIFT;
+	load = (load << 7) / num_online_cpus();
+
+	if (load > 128)
+		return true;
+
+	return false;
+}
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {