[PATCHv2/RFC] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

Message ID 1389876260-46636-1-git-send-email-borntraeger@de.ibm.com (mailing list archive)
State New, archived

Commit Message

Christian Borntraeger Jan. 16, 2014, 12:44 p.m. UTC
When starting lots of dataplane devices, the bootup takes very long on my
s390 system (prototype irqfd code). With larger setups we are even able to
trigger timeouts in some components.
It turns out that the KVM_SET_GSI_ROUTING ioctl takes very
long (strace claims up to 0.1 sec) when there are multiple CPUs.
This is caused by synchronize_rcu and the HZ=100 of s390.
By changing the code to use a private srcu we can speed things up.

This patch reduces the boot time till mounting root from 8 to 2
seconds on my s390 guest with 100 disks.
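
To put the numbers above in proportion, here is a small arithmetic sketch. It assumes only the figures quoted in this message (HZ=100, up to 0.1 sec per ioctl, 8 vs. 2 seconds for 100 disks); the helper names are made up for illustration and are not kernel functions. The per-device figure is an average over the whole boot, not a measurement of a single ioctl.

```c
#include <assert.h>   /* for the checks in the usage note below */

/* Length of one scheduler tick in milliseconds for a given HZ. */
static unsigned int tick_ms(unsigned int hz)
{
	return 1000u / hz;
}

/* Number of whole ticks that fit into a duration given in milliseconds. */
static unsigned int ticks_in(unsigned int duration_ms, unsigned int hz)
{
	return duration_ms / tick_ms(hz);
}

/* Average time saved per device, in milliseconds. */
static unsigned int saved_ms_per_device(unsigned int before_ms,
					unsigned int after_ms,
					unsigned int devices)
{
	return (before_ms - after_ms) / devices;
}
```

With HZ=100, tick_ms(100) is 10, so the 0.1 sec reported by strace corresponds to ticks_in(100, 100) = 10 ticks, and saved_ms_per_device(8000, 2000, 100) gives an average of 60 ms saved per disk.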

I converted most of the rcu routines to srcu. Review for the unconverted
use of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
is necessary, though. They look fine to me since they are protected by
outer functions.

In addition, we should also discuss if a global srcu (for all guests) is
fine.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 virt/kvm/irqchip.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

Comments

Paolo Bonzini Jan. 16, 2014, 12:59 p.m. UTC | #1
On 16/01/2014 13:44, Christian Borntraeger wrote:
> I converted most of the rcu routines to srcu. Review for the unconverted
> use of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
> is necessary, though. They look fine to me since they are protected by
> outer functions.

They are fine because they do not have lockdep checks
(hlist_for_each_entry_rcu uses rcu_dereference_raw rather than
rcu_dereference, and write-sides do not do rcu lockdep at all).

> In addition, we should also discuss if a global srcu (for all guests) is
> fine.

I think it is.  It is already way cheaper than it used to be, and we're
hardly relying on the "sleepable" part of srcu.  We just want its faster
detection of grace periods.  One instance should be fine because our
read sides are so small and mostly they are not even preemptable.

Thanks for writing the patch!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christian Borntraeger Jan. 16, 2014, 1:06 p.m. UTC | #2
On 16/01/14 13:59, Paolo Bonzini wrote:
> On 16/01/2014 13:44, Christian Borntraeger wrote:
>> I converted most of the rcu routines to srcu. Review for the unconverted
>> use of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
>> is necessary, though. They look fine to me since they are protected by
>> outer functions.
> 
> They are fine because they do not have lockdep checks
> (hlist_for_each_entry_rcu uses rcu_dereference_raw rather than
> rcu_dereference, and write-sides do not do rcu lockdep at all).
> 
>> In addition, we should also discuss if a global srcu (for all guests) is
>> fine.
> 
> I think it is.  It is already way cheaper than it used to be, and we're
> hardly relying on the "sleepable" part of srcu.  We just want its faster
> detection of grace periods.  One instance should be fine because our
> read sides are so small and mostly they are not even preemptable.
> 
> Thanks for writing the patch!
> 
> Paolo
> 

Will you edit the patch description or shall I resend the patch?


Paolo Bonzini Jan. 16, 2014, 1:07 p.m. UTC | #3
On 16/01/2014 14:06, Christian Borntraeger wrote:
> Will you edit the patch description or shall I resend the patch?

I can edit the commit message.

Paolo
Michael S. Tsirkin Jan. 16, 2014, 6:55 p.m. UTC | #4
On Thu, Jan 16, 2014 at 01:44:20PM +0100, Christian Borntraeger wrote:
> When starting lots of dataplane devices the bootup takes very long on my
> s390 system (prototype irqfd code). With larger setups we are even able to
> trigger some timeouts in some components.
> Turns out that the KVM_SET_GSI_ROUTING ioctl takes very
> long (strace claims up to 0.1 sec) when having multiple CPUs.
> This is caused by the  synchronize_rcu and the HZ=100 of s390.
> By changing the code to use a private srcu we can speed things up.
> 
> This patch reduces the boot time till mounting root from 8 to 2
> seconds on my s390 guest with 100 disks.
> 
> I converted most of the rcu routines to srcu. Review for the unconverted
> use of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
> is necessary, though. They look fine to me since they are protected by
> outer functions.
> 
> In addition, we should also discuss if a global srcu (for all guests) is
> fine.
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>


That's nice but did you try to measure the overhead
on some interrupt-intensive workloads, such as RX with 10G ethernet?
srcu locks aren't free like rcu ones.

> ---
>  virt/kvm/irqchip.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 20dc9e4..5283eb8 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -26,17 +26,20 @@
>  
>  #include <linux/kvm_host.h>
>  #include <linux/slab.h>
> +#include <linux/srcu.h>
>  #include <linux/export.h>
>  #include <trace/events/kvm.h>
>  #include "irq.h"
>  
> +DEFINE_STATIC_SRCU(irq_srcu);
> +
>  bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  {
>  	struct kvm_irq_ack_notifier *kian;
> -	int gsi;
> +	int gsi, idx;
>  
> -	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	idx = srcu_read_lock(&irq_srcu);
> +	gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
>  					 link)
> @@ -45,7 +48,7 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  				return true;
>  			}
>  
> -	rcu_read_unlock();
> +	srcu_read_unlock(&irq_srcu, idx);
>  
>  	return false;
>  }
> @@ -54,18 +57,18 @@ EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  {
>  	struct kvm_irq_ack_notifier *kian;
> -	int gsi;
> +	int gsi, idx;
>  
>  	trace_kvm_ack_irq(irqchip, pin);
>  
> -	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	idx = srcu_read_lock(&irq_srcu);
> +	gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
>  					 link)
>  			if (kian->gsi == gsi)
>  				kian->irq_acked(kian);
> -	rcu_read_unlock();
> +	srcu_read_unlock(&irq_srcu, idx);
>  }
>  
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> @@ -85,7 +88,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>  	mutex_lock(&kvm->irq_lock);
>  	hlist_del_init_rcu(&kian->link);
>  	mutex_unlock(&kvm->irq_lock);
> -	synchronize_rcu();
> +	synchronize_srcu_expedited(&irq_srcu);
>  #ifdef __KVM_HAVE_IOAPIC
>  	kvm_vcpu_request_scan_ioapic(kvm);
>  #endif
> @@ -115,7 +118,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
>  		bool line_status)
>  {
>  	struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
> -	int ret = -1, i = 0;
> +	int ret = -1, i = 0, idx;
>  	struct kvm_irq_routing_table *irq_rt;
>  
>  	trace_kvm_set_irq(irq, level, irq_source_id);
> @@ -124,12 +127,12 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
>  	 * IOAPIC.  So set the bit in both. The guest will ignore
>  	 * writes to the unused one.
>  	 */
> -	rcu_read_lock();
> -	irq_rt = rcu_dereference(kvm->irq_routing);
> +	idx = srcu_read_lock(&irq_srcu);
> +	irq_rt = srcu_dereference(kvm->irq_routing, &irq_srcu);
>  	if (irq < irq_rt->nr_rt_entries)
>  		hlist_for_each_entry(e, &irq_rt->map[irq], link)
>  			irq_set[i++] = *e;
> -	rcu_read_unlock();
> +	srcu_read_unlock(&irq_srcu, idx);
>  
>  	while(i--) {
>  		int r;
> @@ -226,7 +229,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
>  	kvm_irq_routing_update(kvm, new);
>  	mutex_unlock(&kvm->irq_lock);
>  
> -	synchronize_rcu();
> +	synchronize_srcu_expedited(&irq_srcu);

Hmm, it's a bit strange that you also use _expedited here.
What if this synchronize_rcu is replaced by synchronize_rcu_expedited
and no other changes are made?
Maybe that's enough?

>  
>  	new = old;
>  	r = 0;
> -- 
> 1.8.4.2
Michael S. Tsirkin Jan. 16, 2014, 6:56 p.m. UTC | #5
On Thu, Jan 16, 2014 at 02:07:19PM +0100, Paolo Bonzini wrote:
> On 16/01/2014 14:06, Christian Borntraeger wrote:
> > Will you edit the patch description or shall I resend the patch?
> 
> I can edit the commit message.
> 
> Paolo

I think we really need to see the effect adding srcu has on interrupt
injection.
Christian Borntraeger Jan. 17, 2014, 8:29 a.m. UTC | #6
On 16/01/14 19:56, Michael S. Tsirkin wrote:
> On Thu, Jan 16, 2014 at 02:07:19PM +0100, Paolo Bonzini wrote:
>> On 16/01/2014 14:06, Christian Borntraeger wrote:
>>> Will you edit the patch description or shall I resend the patch?
>>
>> I can edit the commit message.
>>
>> Paolo
> 
> I think we really need to see the effect adding srcu has on interrupt
> injection.

Michael,
do you have a quick way to check whether srcu has a noticeable impact on
interrupt injection on your systems? I am happy with either v2 or v3 of the
patch, but synchronize_srcu_expedited seems to have less latency impact on
the full system than synchronize_rcu_expedited. This might give Paolo a hint
as to which of the patches is the right way to go.

Christian

Paolo Bonzini Jan. 17, 2014, 9:19 a.m. UTC | #7
On 17/01/2014 09:29, Christian Borntraeger wrote:
> Michael,
> do you have a quick way to check whether srcu has a noticeable impact on
> interrupt injection on your systems? I am happy with either v2 or v3 of the
> patch, but synchronize_srcu_expedited seems to have less latency impact on
> the full system than synchronize_rcu_expedited. This might give Paolo a hint
> as to which of the patches is the right way to go.

For 3.14, I'll definitely pick v3.

Paolo
Paolo Bonzini Feb. 19, 2014, 10:23 p.m. UTC | #8
On 17/01/2014 09:29, Christian Borntraeger wrote:
> Michael,
> do you have a quick way to check whether srcu has a noticeable impact on
> interrupt injection on your systems? I am happy with either v2 or v3 of the
> patch, but synchronize_srcu_expedited seems to have less latency impact on
> the full system than synchronize_rcu_expedited. This might give Paolo a hint
> as to which of the patches is the right way to go.

Hi all,

I've asked Andrew Theurer to run network tests on a 10G connection (TCP 
request/response to check for performance, TCP streaming for host CPU 
utilization).

Paolo
Andrew Theurer Feb. 21, 2014, 4:59 a.m. UTC | #9
> On 17/01/2014 09:29, Christian Borntraeger wrote:
> > Michael,
> > do you have a quick way to check whether srcu has a noticeable impact on
> > interrupt injection on your systems? I am happy with either v2 or v3 of the
> > patch, but synchronize_srcu_expedited seems to have less latency impact on
> > the full system than synchronize_rcu_expedited. This might give Paolo a hint
> > as to which of the patches is the right way to go.
> 
> Hi all,
> 
> I've asked Andrew Theurer to run network tests on a 10G connection (TCP
> request/response to check for performance, TCP streaming for host CPU
> utilization).

I am hoping to have some results some time tomorrow (Friday).

-Andrew

Paolo Bonzini Feb. 21, 2014, 5:35 p.m. UTC | #10
On 16/01/2014 13:44, Christian Borntraeger wrote:
> +DEFINE_STATIC_SRCU(irq_srcu);
> +
>  bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  {
>  	struct kvm_irq_ack_notifier *kian;
> -	int gsi;
> +	int gsi, idx;
>  
> -	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	idx = srcu_read_lock(&irq_srcu);
> +	gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
>  					 link)
> @@ -45,7 +48,7 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  				return true;
>  			}
>  
> -	rcu_read_unlock();
> +	srcu_read_unlock(&irq_srcu, idx);

Missing hunk here:

@@ -44,7 +44,7 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
 		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
 					 link)
 			if (kian->gsi == gsi) {
-				rcu_read_unlock();
+				srcu_read_unlock(&irq_srcu, idx);
 				return true;
 			}
 

Paolo
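
To illustrate why the hunk above matters, here is a minimal user-space sketch, not kernel code. The mock_srcu_read_lock()/mock_srcu_read_unlock() helpers and the has_notifier_*() functions are hypothetical stand-ins that only count nesting depth; they provide no real synchronization and are not the SRCU API. The sketch assumes the only relevant effect of the v2 early-return path is that it never calls srcu_read_unlock() (in the posted v2 it still calls the stale rcu_read_unlock()).

```c
#include <assert.h>   /* for the checks in the usage note below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock: current read-side nesting depth. */
static int mock_read_depth;

/* Stand-in for srcu_read_lock(): enters a read-side section. */
static int mock_srcu_read_lock(void)
{
	return mock_read_depth++;
}

/* Stand-in for srcu_read_unlock(): leaves a read-side section. */
static void mock_srcu_read_unlock(int idx)
{
	(void)idx;
	mock_read_depth--;
}

/* Shape of the v2 posting: the early return skips the SRCU unlock. */
static bool has_notifier_v2(const int *gsis, size_t n, int gsi)
{
	int idx = mock_srcu_read_lock();

	for (size_t i = 0; i < n; i++)
		if (gsis[i] == gsi)
			return true;	/* read-side section is leaked here */

	mock_srcu_read_unlock(idx);
	return false;
}

/* Shape with the missing hunk applied: every exit path unlocks. */
static bool has_notifier_fixed(const int *gsis, size_t n, int gsi)
{
	int idx = mock_srcu_read_lock();

	for (size_t i = 0; i < n; i++)
		if (gsis[i] == gsi) {
			mock_srcu_read_unlock(idx);
			return true;
		}

	mock_srcu_read_unlock(idx);
	return false;
}
```

Calling has_notifier_fixed() with a matching gsi leaves mock_read_depth at 0, while has_notifier_v2() leaves it at 1. In the real kernel, an SRCU read-side section that is never released would prevent later synchronize_srcu_expedited(&irq_srcu) calls from completing.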

Patch

diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 20dc9e4..5283eb8 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -26,17 +26,20 @@ 
 
 #include <linux/kvm_host.h>
 #include <linux/slab.h>
+#include <linux/srcu.h>
 #include <linux/export.h>
 #include <trace/events/kvm.h>
 #include "irq.h"
 
+DEFINE_STATIC_SRCU(irq_srcu);
+
 bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
-	int gsi;
+	int gsi, idx;
 
-	rcu_read_lock();
-	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	idx = srcu_read_lock(&irq_srcu);
+	gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
 					 link)
@@ -45,7 +48,7 @@  bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
 				return true;
 			}
 
-	rcu_read_unlock();
+	srcu_read_unlock(&irq_srcu, idx);
 
 	return false;
 }
@@ -54,18 +57,18 @@  EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
-	int gsi;
+	int gsi, idx;
 
 	trace_kvm_ack_irq(irqchip, pin);
 
-	rcu_read_lock();
-	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	idx = srcu_read_lock(&irq_srcu);
+	gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
 					 link)
 			if (kian->gsi == gsi)
 				kian->irq_acked(kian);
-	rcu_read_unlock();
+	srcu_read_unlock(&irq_srcu, idx);
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -85,7 +88,7 @@  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 	mutex_lock(&kvm->irq_lock);
 	hlist_del_init_rcu(&kian->link);
 	mutex_unlock(&kvm->irq_lock);
-	synchronize_rcu();
+	synchronize_srcu_expedited(&irq_srcu);
 #ifdef __KVM_HAVE_IOAPIC
 	kvm_vcpu_request_scan_ioapic(kvm);
 #endif
@@ -115,7 +118,7 @@  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
 		bool line_status)
 {
 	struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
-	int ret = -1, i = 0;
+	int ret = -1, i = 0, idx;
 	struct kvm_irq_routing_table *irq_rt;
 
 	trace_kvm_set_irq(irq, level, irq_source_id);
@@ -124,12 +127,12 @@  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
 	 * IOAPIC.  So set the bit in both. The guest will ignore
 	 * writes to the unused one.
 	 */
-	rcu_read_lock();
-	irq_rt = rcu_dereference(kvm->irq_routing);
+	idx = srcu_read_lock(&irq_srcu);
+	irq_rt = srcu_dereference(kvm->irq_routing, &irq_srcu);
 	if (irq < irq_rt->nr_rt_entries)
 		hlist_for_each_entry(e, &irq_rt->map[irq], link)
 			irq_set[i++] = *e;
-	rcu_read_unlock();
+	srcu_read_unlock(&irq_srcu, idx);
 
 	while(i--) {
 		int r;
@@ -226,7 +229,7 @@  int kvm_set_irq_routing(struct kvm *kvm,
 	kvm_irq_routing_update(kvm, new);
 	mutex_unlock(&kvm->irq_lock);
 
-	synchronize_rcu();
+	synchronize_srcu_expedited(&irq_srcu);
 
 	new = old;
 	r = 0;