[2/2] KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence

Message ID 20190802103709.70148-3-maz@kernel.org (mailing list archive)
State New, archived
Series KVM: arm/arm64: Fix guest's PMR synchronization when blocking on WFI

Commit Message

Marc Zyngier Aug. 2, 2019, 10:37 a.m. UTC
When a vcpu is about to block by calling kvm_vcpu_block(), we call
back into the arch code to allow any form of synchronization that
may be required at this point (SVM stops the AVIC, ARM synchronises
the VMCR and enables GICv4 doorbells). But this synchronization
comes in quite late, as we've potentially waited for halt_poll_ns
to expire.

Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
kvm_vcpu_block(), which on ARM has several benefits:

- VMCR gets synchronised early, meaning that any interrupt delivered
  during the polling window will be evaluated with the correct guest
  PMR
- GICv4 doorbells are enabled, which means that any guest interrupt
  directly injected during that window will be immediately recognised

Tang Nianyao ran some tests on a GICv4 machine to evaluate this
change, and reported up to a 10% improvement for netperf:

<quote>
	netperf result:
	D06 as server, intel 8180 server as client
	with change:
	package 512 bytes - 5500 Mbits/s
	package 64 bytes - 760 Mbits/s
	without change:
	package 512 bytes - 5000 Mbits/s
	package 64 bytes - 710 Mbits/s
</quote>

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 virt/kvm/kvm_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
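
As a quick check on the "up to a 10%" figure, the relative gain can be
recomputed from the quoted netperf numbers. The helper below is only an
illustrative sketch; gain_percent() is a hypothetical name and is not part
of the patch or of netperf.

```c
#include <assert.h>

/*
 * Relative throughput gain, in percent, when going from `before`
 * to `after` (both in the same unit, here Mbits/s).
 * Hypothetical helper, for illustration only.
 */
static double gain_percent(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (after - before) / before;
}
```

With the quoted results, gain_percent(5000.0, 5500.0) gives 10% for
512-byte packets and gain_percent(710.0, 760.0) gives roughly 7% for
64-byte packets, which is consistent with "up to" 10%.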

Comments

Paolo Bonzini Aug. 2, 2019, 10:46 a.m. UTC | #1
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Marc Zyngier Aug. 18, 2019, 5:53 p.m. UTC | #2
On Fri, 2 Aug 2019 12:46:33 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks for that. I've pushed this patch into -next so that it gets a
bit of exposure (I haven't heard from the AMD folks, and I'd like to
make sure it doesn't regress their platforms).

	M.

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0c2b60..90d429c703cb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2322,6 +2322,8 @@  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	bool waited = false;
 	u64 block_ns;
 
+	kvm_arch_vcpu_blocking(vcpu);
+
 	start = cur = ktime_get();
 	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
@@ -2342,8 +2344,6 @@  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
-	kvm_arch_vcpu_blocking(vcpu);
-
 	for (;;) {
 		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
@@ -2356,9 +2356,8 @@  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 
 	finish_swait(&vcpu->wq, &wait);
 	cur = ktime_get();
-
-	kvm_arch_vcpu_unblocking(vcpu);
 out:
+	kvm_arch_vcpu_unblocking(vcpu);
 	block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
 	if (!vcpu_valid_wakeup(vcpu))
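
The ordering that results from the hunks above can be modelled in user
space. This is a simplified sketch, not the kernel code: the event names,
struct trace and model_vcpu_block() are all made up for illustration, and
the poll loop is reduced to a single step. It shows two things: the arch
blocking hook now runs before the polling window, and because the
unblocking hook sits after the "out:" label, it also runs when polling
finds work and skips the sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps of the blocking sequence, recorded in call order. */
enum event {
	EV_ARCH_BLOCKING,	/* stands in for kvm_arch_vcpu_blocking() */
	EV_POLL,		/* the halt_poll_ns polling window */
	EV_SLEEP,		/* the swait loop */
	EV_ARCH_UNBLOCKING,	/* stands in for kvm_arch_vcpu_unblocking() */
};

struct trace {
	enum event ev[8];
	size_t len;
};

static void record(struct trace *t, enum event e)
{
	assert(t->len < sizeof(t->ev) / sizeof(t->ev[0]));
	t->ev[t->len++] = e;
}

/*
 * Hypothetical model of kvm_vcpu_block() after the patch.
 * `irq_during_poll` simulates an interrupt arriving in the polling
 * window, which makes the real function jump to "out".
 */
static void model_vcpu_block(struct trace *t, bool poll_enabled,
			     bool irq_during_poll)
{
	record(t, EV_ARCH_BLOCKING);	/* moved to the very beginning */

	if (poll_enabled) {
		record(t, EV_POLL);
		if (irq_during_poll)
			goto out;	/* skip the sleep entirely */
	}

	record(t, EV_SLEEP);
out:
	record(t, EV_ARCH_UNBLOCKING);	/* now reached on every path */
}
```

In this model, a call with polling enabled and an interrupt during the
poll records blocking, poll, unblocking; without the interrupt it records
blocking, poll, sleep, unblocking. Either way the two arch hooks stay
paired, which is what moving kvm_arch_vcpu_unblocking() below the "out:"
label achieves in the patch.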