
[3/4] arm64: KVM: let other tasks run when hitting WFE

Message ID 1374242035-13199-4-git-send-email-marc.zyngier@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier July 19, 2013, 1:53 p.m. UTC
So far, when a guest executes WFE (like when waiting for a spinlock
to become unlocked), we don't do a thing and let it run uninterrupted.

Another option is to trap a blocking WFE and offer the opportunity
to the scheduler to switch to another task, potentially giving the
vcpu holding the spinlock a chance to run sooner.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |  6 ++++--
 arch/arm64/kvm/handle_exit.c     | 18 +++++++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)

Comments

Will Deacon July 19, 2013, 2:25 p.m. UTC | #1
On Fri, Jul 19, 2013 at 02:53:54PM +0100, Marc Zyngier wrote:
> So far, when a guest executes WFE (like when waiting for a spinlock
> to become unlocked), we don't do a thing and let it run uninterrupted.
> 
> Another option is to trap a blocking WFE and offer the opportunity
> to the scheduler to switch to another task, potentially giving the
> vcpu holding the spinlock a chance to run sooner.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

[...]

> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	kvm_vcpu_block(vcpu);
> +	if (kvm_vcpu_get_hsr(vcpu) & 1)
> +		cond_resched();

The hardcoded `1' doesn't make it obvious that we're treating wfi and wfe
differently. Could you use a #define for that bit in hsr.iss please?

Will
Marc Zyngier July 19, 2013, 2:29 p.m. UTC | #2
On 19/07/13 15:25, Will Deacon wrote:
> On Fri, Jul 19, 2013 at 02:53:54PM +0100, Marc Zyngier wrote:
>> So far, when a guest executes WFE (like when waiting for a spinlock
>> to become unlocked), we don't do a thing and let it run uninterrupted.
>>
>> Another option is to trap a blocking WFE and offer the opportunity
>> to the scheduler to switch to another task, potentially giving the
>> vcpu holding the spinlock a chance to run sooner.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> [...]
> 
>> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	kvm_vcpu_block(vcpu);
>> +	if (kvm_vcpu_get_hsr(vcpu) & 1)
>> +		cond_resched();
> 
> The hardcoded `1' doesn't make it obvious that we're treating wfi and wfe
> differently. Could you use a #define for that bit in hsr.iss please?

Good point. I'll rework that part.
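
For illustration, the reworked check might end up looking something
like this (just a sketch; the macro name is a placeholder, not
necessarily what the final version will use):

/* Bit 0 of the HSR ISS for this trap encodes the instruction:
 * 1 for WFE, 0 for WFI. The macro name here is illustrative only. */
#define ESR_EL2_EC_WFI_ISS_WFE	(1 << 0)

static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	if (kvm_vcpu_get_hsr(vcpu) & ESR_EL2_EC_WFI_ISS_WFE)
		cond_resched();
	else
		kvm_vcpu_block(vcpu);

	return 1;
}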

	M.
Christoffer Dall July 20, 2013, 10:04 p.m. UTC | #3
On Fri, Jul 19, 2013 at 02:53:54PM +0100, Marc Zyngier wrote:
> So far, when a guest executes WFE (like when waiting for a spinlock
> to become unlocked), we don't do a thing and let it run uninterrupted.
> 
> Another option is to trap a blocking WFE and offer the opportunity
> to the scheduler to switch to another task, potentially giving the
> vcpu holding the spinlock a chance to run sooner.
> 

I'm curious whether we have any data showing this to be a good idea.

My intuition here is that waiting for a spinlock really shouldn't be
something a guest is doing for a long time - we always try to avoid too
much contention on spinlocks, no?  The theory that it would unlock the
spinlock sooner is really only supported if the CPU resources are
grossly oversubscribed - are we optimizing for this case?

So, how many cycles do we anticipate a world-switch back and forward
between a VM and the host to be compared to the average number of spin
cycles for a spinlock?

Finally, for the case where a waiting vcpu is only going to spin for a
couple of cycles, aren't we adding significant overhead?  I would expect
this to be the most common case.


> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  6 ++++--
>  arch/arm64/kvm/handle_exit.c     | 18 +++++++++++++-----
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index a5f28e2..ac1ea05 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -63,6 +63,7 @@
>   * TAC:		Trap ACTLR
>   * TSC:		Trap SMC
>   * TSW:		Trap cache operations by set/way
> + * TWE:		Trap WFE
>   * TWI:		Trap WFI
>   * TIDCP:	Trap L2CTLR/L2ECTLR
>   * BSU_IS:	Upgrade barriers to the inner shareable domain
> @@ -72,8 +73,9 @@
>   * FMO:		Override CPSR.F and enable signaling with VF
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
> -			 HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
> +#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
> +			 HCR_AMO | HCR_IMO | HCR_FMO | \
>  			 HCR_SWIO | HCR_TIDCP | HCR_RW)
>  #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>  
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 9beaca03..b0098c2 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  }
>  
>  /**
> - * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
> + * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
> + *		    instruction executed by a guest
> + *
>   * @vcpu:	the vcpu pointer
>   *
> - * Simply call kvm_vcpu_block(), which will halt execution of
> + * WFE: Yield the CPU and come back to this vcpu when the scheduler
> + * decides to.
> + * WFI: Simply call kvm_vcpu_block(), which will halt execution of
>   * world-switches and schedule other host processes until there is an
>   * incoming IRQ or FIQ to the VM.
>   */
> -static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	kvm_vcpu_block(vcpu);
> +	if (kvm_vcpu_get_hsr(vcpu) & 1)
> +		cond_resched();
> +	else
> +		kvm_vcpu_block(vcpu);
> +
>  	return 1;
>  }
>  
>  static exit_handle_fn arm_exit_handlers[] = {
> -	[ESR_EL2_EC_WFI]	= kvm_handle_wfi,
> +	[ESR_EL2_EC_WFI]	= kvm_handle_wfx,
>  	[ESR_EL2_EC_CP15_32]	= kvm_handle_cp15_32,
>  	[ESR_EL2_EC_CP15_64]	= kvm_handle_cp15_64,
>  	[ESR_EL2_EC_CP14_MR]	= kvm_handle_cp14_access,
> -- 
> 1.8.2.3
> 
>
Gleb Natapov July 22, 2013, 7:36 a.m. UTC | #4
On Fri, Jul 19, 2013 at 02:53:54PM +0100, Marc Zyngier wrote:
> So far, when a guest executes WFE (like when waiting for a spinlock
> to become unlocked), we don't do a thing and let it run uninterrupted.
> 
> Another option is to trap a blocking WFE and offer the opportunity
> to the scheduler to switch to another task, potentially giving the
> vcpu holding the spinlock a chance to run sooner.
> 
Why is this targeted at 3.11 as opposed to 3.12? It looks like an
optimization.

> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  6 ++++--
>  arch/arm64/kvm/handle_exit.c     | 18 +++++++++++++-----
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index a5f28e2..ac1ea05 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -63,6 +63,7 @@
>   * TAC:		Trap ACTLR
>   * TSC:		Trap SMC
>   * TSW:		Trap cache operations by set/way
> + * TWE:		Trap WFE
>   * TWI:		Trap WFI
>   * TIDCP:	Trap L2CTLR/L2ECTLR
>   * BSU_IS:	Upgrade barriers to the inner shareable domain
> @@ -72,8 +73,9 @@
>   * FMO:		Override CPSR.F and enable signaling with VF
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
> -			 HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
> +#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
> +			 HCR_AMO | HCR_IMO | HCR_FMO | \
>  			 HCR_SWIO | HCR_TIDCP | HCR_RW)
>  #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>  
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 9beaca03..b0098c2 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  }
>  
>  /**
> - * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
> + * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
> + *		    instruction executed by a guest
> + *
>   * @vcpu:	the vcpu pointer
>   *
> - * Simply call kvm_vcpu_block(), which will halt execution of
> + * WFE: Yield the CPU and come back to this vcpu when the scheduler
> + * decides to.
> + * WFI: Simply call kvm_vcpu_block(), which will halt execution of
>   * world-switches and schedule other host processes until there is an
>   * incoming IRQ or FIQ to the VM.
>   */
> -static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	kvm_vcpu_block(vcpu);
> +	if (kvm_vcpu_get_hsr(vcpu) & 1)
> +		cond_resched();
> +	else
> +		kvm_vcpu_block(vcpu);
> +
>  	return 1;
>  }
>  
>  static exit_handle_fn arm_exit_handlers[] = {
> -	[ESR_EL2_EC_WFI]	= kvm_handle_wfi,
> +	[ESR_EL2_EC_WFI]	= kvm_handle_wfx,
>  	[ESR_EL2_EC_CP15_32]	= kvm_handle_cp15_32,
>  	[ESR_EL2_EC_CP15_64]	= kvm_handle_cp15_64,
>  	[ESR_EL2_EC_CP14_MR]	= kvm_handle_cp14_access,
> -- 
> 1.8.2.3
> 
> 

--
			Gleb.
Raghavendra KT July 22, 2013, 8:53 a.m. UTC | #5
On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> So far, when a guest executes WFE (like when waiting for a spinlock
> to become unlocked), we don't do a thing and let it run uninterrupted.
>
> Another option is to trap a blocking WFE and offer the opportunity
> to the scheduler to switch to another task, potentially giving the
> vcpu holding the spinlock a chance to run sooner.
>

The idea looks correct from my experiments on x86. It does bring some
percentage of benefit in overcommitted guests. In fact,

https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
(this results in using the PLE handler heuristics in the vcpu_block patch).
Christoffer Dall July 22, 2013, 12:51 p.m. UTC | #6
On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
> On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> So far, when a guest executes WFE (like when waiting for a spinlock
>> to become unlocked), we don't do a thing and let it run uninterrupted.
>>
>> Another option is to trap a blocking WFE and offer the opportunity
>> to the scheduler to switch to another task, potentially giving the
>> vcpu holding the spinlock a chance to run sooner.
>>
>
> The idea looks correct from my experiments on x86. It does bring some
> percentage of benefit in overcommitted guests. In fact,
>
> https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
> (this results in using the PLE handler heuristics in the vcpu_block patch).

What about the adverse effect in the non-overcommitted case?
Will Deacon July 22, 2013, 1:01 p.m. UTC | #7
On Mon, Jul 22, 2013 at 01:51:52PM +0100, Christoffer Dall wrote:
> On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
> > On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> So far, when a guest executes WFE (like when waiting for a spinlock
> >> to become unlocked), we don't do a thing and let it run uninterrupted.
> >>
> >> Another option is to trap a blocking WFE and offer the opportunity
> >> to the scheduler to switch to another task, potentially giving the
> >> vcpu holding the spinlock a chance to run sooner.
> >>
> >
> > The idea looks correct from my experiments on x86. It does bring some
> > percentage of benefit in overcommitted guests. In fact,
> >
> > https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
> > (this results in using the PLE handler heuristics in the vcpu_block patch).
> 
> What about the adverse effect in the non-overcommitted case?

You might hope to reduce overall system latency, in the same way that
PREEMPT kernels implement spin_lock in terms of spin_trylock. Marc might
have some ideas about the practical performance of this, but he's currently
on holiday and (presumably) not reading the list!
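
For context, that pattern looks roughly like this (simplified from the
lock-building macros in kernel/spinlock.c; shown only to illustrate the
analogy):

void __lockfunc __raw_spin_lock(raw_spinlock_t *lock)
{
	for (;;) {
		preempt_disable();
		if (likely(do_raw_spin_trylock(lock)))
			break;
		/* Lock is contended: briefly re-enable preemption so
		 * the scheduler gets a chance to run something else,
		 * then try again. */
		preempt_enable();
		arch_spin_relax(&lock->raw_lock);
	}
}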

Will
Raghavendra K T July 22, 2013, 1:57 p.m. UTC | #8
On 07/22/2013 06:21 PM, Christoffer Dall wrote:
> On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
>> On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> So far, when a guest executes WFE (like when waiting for a spinlock
>>> to become unlocked), we don't do a thing and let it run uninterrupted.
>>>
>>> Another option is to trap a blocking WFE and offer the opportunity
>>> to the scheduler to switch to another task, potentially giving the
>>> vcpu holding the spinlock a chance to run sooner.
>>>
>>
>> The idea looks correct from my experiments on x86. It does bring some
>> percentage of benefit in overcommitted guests. In fact,
>>
>> https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
>> (this results in using the PLE handler heuristics in the vcpu_block patch).
>
> What about the adverse effect in the non-overcommitted case?
>

Ideally it should fail to schedule any other task and come back to the
halt loop. This should not hurt, AFAICS. But I agree that numbers are
needed to support this argument.

For x86, I saw no side effects in my experiments.

Catalin Marinas July 23, 2013, 10:41 a.m. UTC | #9
On Mon, Jul 22, 2013 at 01:51:52PM +0100, Christoffer Dall wrote:
> On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
> > On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> So far, when a guest executes WFE (like when waiting for a spinlock
> >> to become unlocked), we don't do a thing and let it run uninterrupted.
> >>
> >> Another option is to trap a blocking WFE and offer the opportunity
> >> to the scheduler to switch to another task, potentially giving the
> >> vcpu holding the spinlock a chance to run sooner.
> >>
> >
> > The idea looks correct from my experiments on x86. It does bring some
> > percentage of benefit in overcommitted guests. In fact,
> >
> > https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
> > (this results in using the PLE handler heuristics in the vcpu_block patch).
> 
> What about the adverse effect in the non-overcommitted case?

Could we not detect overcommitment and only set the TWE bit in this case
(per VCPU scheduled to run)?
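
A hypothetical sketch of that suggestion (both the overcommit test and
the per-vCPU HCR_EL2 copy are assumed here, not existing kernel APIs):

/* On vcpu_load: trap WFE (and potentially yield) only when the
 * physical CPU actually has other runnable work. */
static void kvm_update_wfe_trap(struct kvm_vcpu *vcpu)
{
	if (pcpu_is_overcommitted())
		vcpu->arch.hcr_el2 |= HCR_TWE;
	else
		vcpu->arch.hcr_el2 &= ~HCR_TWE;
}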

The advantage of a properly para-virtualised lock is that you can target
which VCPU to wake up. I don't think we can do this currently (without
assumptions about the underlying OS and how the compiler allocates
registers in the ticket spinlocks).

There have been suggestions to use WFE in user space for a more
power-efficient busy loop (usually in user-only locking; I think
PostgreSQL does that) together with an automatic event stream for waking
up the WFE (Sudeep posted some patches recently for enabling a 100us
event stream). If such a feature gets used, WFE takes on a new meaning
and we may not want to trap it all the time.
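
For reference, such a user-space busy-wait might look like the sketch
below; it assumes the kernel has enabled the WFE event stream, which
periodically wakes the core so the flag gets re-checked even if no SEV
is ever sent:

#include <stdatomic.h>

static void wait_for_flag(atomic_int *flag)
{
	while (!atomic_load_explicit(flag, memory_order_acquire))
		asm volatile("wfe" ::: "memory");	/* doze until an event */
}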

Question for Will - do we have a PMU counter for WFE? (don't ask why ;))
Will Deacon July 23, 2013, 4:04 p.m. UTC | #10
On Tue, Jul 23, 2013 at 11:41:14AM +0100, Catalin Marinas wrote:
> On Mon, Jul 22, 2013 at 01:51:52PM +0100, Christoffer Dall wrote:
> > On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
> > > On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > >> So far, when a guest executes WFE (like when waiting for a spinlock
> > >> to become unlocked), we don't do a thing and let it run uninterrupted.
> > >>
> > >> Another option is to trap a blocking WFE and offer the opportunity
> > >> to the scheduler to switch to another task, potentially giving the
> > >> vcpu holding the spinlock a chance to run sooner.
> > >>
> > >
> > > The idea looks correct from my experiments on x86. It does bring some
> > > percentage of benefit in overcommitted guests. In fact,
> > >
> > > https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
> > > (this results in using the PLE handler heuristics in the vcpu_block patch).
> > 
> > What about the adverse effect in the non-overcommitted case?
> 
> Could we not detect overcommitment and only set the TWE bit in this case
> (per VCPU scheduled to run)?
> 
> The advantage of a properly para-virtualised lock is that you can target
> which VCPU to wake up. I don't think we can do this currently (without
> assumptions about the underlying OS and how the compiler allocates
> registers in the ticket spinlocks).
> 
> There have been suggestions to use WFE in user space for a more
> power-efficient busy loop (usually in user-only locking; I think
> PostgreSQL does that) together with an automatic event stream for waking
> up the WFE (Sudeep posted some patches recently for enabling a 100us
> event stream). If such a feature gets used, WFE takes on a new meaning
> and we may not want to trap it all the time.
> 
> Question for Will - do we have a PMU counter for WFE? (don't ask why ;))

Step away from the CPU! There's nothing architected for counting wfe
instructions, so although a CPU might expose such an event, software
can't rely on it.

Will
Christoffer Dall July 28, 2013, 8:55 p.m. UTC | #11
On Mon, Jul 22, 2013 at 07:27:58PM +0530, Raghavendra K T wrote:
> On 07/22/2013 06:21 PM, Christoffer Dall wrote:
> >On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
> >>On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>>So far, when a guest executes WFE (like when waiting for a spinlock
> >>>to become unlocked), we don't do a thing and let it run uninterrupted.
> >>>
> >>>Another option is to trap a blocking WFE and offer the opportunity
> >>>to the scheduler to switch to another task, potentially giving the
> >>>vcpu holding the spinlock a chance to run sooner.
> >>>
> >>
> >>The idea looks correct from my experiments on x86. It does bring some
> >>percentage of benefit in overcommitted guests. In fact,
> >>
> >>https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
> >>(this results in using the PLE handler heuristics in the vcpu_block patch).
> >
> >What about the adverse effect in the non-overcommitted case?
> >
> 
> Ideally it should fail to schedule any other task and come back to the
> halt loop. This should not hurt, AFAICS. But I agree that numbers are
> needed to support this argument.

So if two VCPUs are scheduled on two PCPUs and the waiting VCPU would
normally wait, say, 1000 cycles to grab the lock, the latency for
grabbing the lock will now be (at least) a couple of thousand cycles
even for a tight switch back into the host and back into the guest (on
currently available hardware).

> 
> For x86, I saw no side effects in my experiments.
> 

I suspect some workloads on x86 would indeed show some side effects, but
much smaller than on ARM, since x86 has a much more hardware-optimized
VMEXIT cycle time on relatively recent CPUs.

-Christoffer
Raghavendra K T July 29, 2013, 7:35 a.m. UTC | #12
On 07/29/2013 02:25 AM, Christoffer Dall wrote:
> On Mon, Jul 22, 2013 at 07:27:58PM +0530, Raghavendra K T wrote:
>> On 07/22/2013 06:21 PM, Christoffer Dall wrote:
>>> On 22 July 2013 10:53, Raghavendra KT <raghavendra.kt.linux@gmail.com> wrote:
>>>> On Fri, Jul 19, 2013 at 7:23 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>> So far, when a guest executes WFE (like when waiting for a spinlock
>>>>> to become unlocked), we don't do a thing and let it run uninterrupted.
>>>>>
>>>>> Another option is to trap a blocking WFE and offer the opportunity
>>>>> to the scheduler to switch to another task, potentially giving the
>>>>> vcpu holding the spinlock a chance to run sooner.
>>>>>
>>>>
>>>> The idea looks correct from my experiments on x86. It does bring some
>>>> percentage of benefit in overcommitted guests. In fact,
>>>>
>>>> https://lkml.org/lkml/2013/7/22/41 tries to do the same thing for x86
>>>> (this results in using the PLE handler heuristics in the vcpu_block patch).
>>>
>>> What about the adverse effect in the non-overcommitted case?
>>>
>>
>> Ideally it should fail to schedule any other task and come back to the
>> halt loop. This should not hurt, AFAICS. But I agree that numbers are
>> needed to support this argument.
>
> So if two VCPUs are scheduled on two PCPUs and the waiting VCPU would
> normally wait, say, 1000 cycles to grab the lock, the latency for
> grabbing the lock will now be (at least) a couple of thousand cycles
> even for a tight switch back into the host and back into the guest (on
> currently available hardware).
>

I agree that unnecessary vmexits increase the latency.

>>
>> For x86, I saw no side effects in my experiments.
>>
>
> I suspect some workloads on x86 would indeed show some side effects, but
> much smaller than on ARM, since x86 has a much more hardware-optimized
> VMEXIT cycle time on relatively recent CPUs.
>

I think I should have explained more clearly what was tried on x86;
sorry for the confusion.

On x86, what I tried was: in the halt handler, instead of doing a plain
schedule(), do intelligent directed yields using the already available
PLE handler. The PLE handler does have some undercommit detection logic
so that it returns quickly in that case. Also, the halt() was triggered
by the guest only after spinning long enough in pv-spinlocks (which
would not happen otherwise). So there was around a 2-3% improvement
overall on x86.

But yes, I am no expert on the ARM ecosystem, though I liked the idea.
And in the end, only the numbers can prove it, as always. :)
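
A rough sketch of that x86 experiment (kvm_vcpu_on_spin() is the
existing PLE directed-yield entry point; the function below is an
illustration, not the actual patch from the link above):

static void pv_halt_experiment(struct kvm_vcpu *vcpu)
{
	/* The guest only issues halt() after its pv-spinlock slow
	 * path has spun for a while, so a preempted lock holder is
	 * the likely cause. */
	if (kvm_arch_vcpu_runnable(vcpu))
		return;		/* work pending, keep running */

	/* Directed yield: scan the VM's other vCPUs and yield to a
	 * runnable one; the heuristics back off when undercommitted. */
	kvm_vcpu_on_spin(vcpu);
}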



Patch

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a5f28e2..ac1ea05 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -63,6 +63,7 @@ 
  * TAC:		Trap ACTLR
  * TSC:		Trap SMC
  * TSW:		Trap cache operations by set/way
+ * TWE:		Trap WFE
  * TWI:		Trap WFI
  * TIDCP:	Trap L2CTLR/L2ECTLR
  * BSU_IS:	Upgrade barriers to the inner shareable domain
@@ -72,8 +73,9 @@ 
  * FMO:		Override CPSR.F and enable signaling with VF
  * SWIO:	Turn set/way invalidates into set/way clean+invalidate
  */
-#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
-			 HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
+			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
+			 HCR_AMO | HCR_IMO | HCR_FMO | \
 			 HCR_SWIO | HCR_TIDCP | HCR_RW)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 9beaca03..b0098c2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -47,21 +47,29 @@  static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 }
 
 /**
- * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
+ *		    instruction executed by a guest
+ *
  * @vcpu:	the vcpu pointer
  *
- * Simply call kvm_vcpu_block(), which will halt execution of
+ * WFE: Yield the CPU and come back to this vcpu when the scheduler
+ * decides to.
+ * WFI: Simply call kvm_vcpu_block(), which will halt execution of
  * world-switches and schedule other host processes until there is an
  * incoming IRQ or FIQ to the VM.
  */
-static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	kvm_vcpu_block(vcpu);
+	if (kvm_vcpu_get_hsr(vcpu) & 1)
+		cond_resched();
+	else
+		kvm_vcpu_block(vcpu);
+
 	return 1;
 }
 
 static exit_handle_fn arm_exit_handlers[] = {
-	[ESR_EL2_EC_WFI]	= kvm_handle_wfi,
+	[ESR_EL2_EC_WFI]	= kvm_handle_wfx,
 	[ESR_EL2_EC_CP15_32]	= kvm_handle_cp15_32,
 	[ESR_EL2_EC_CP15_64]	= kvm_handle_cp15_64,
 	[ESR_EL2_EC_CP14_MR]	= kvm_handle_cp14_access,