diff mbox

[2/6] kvm-s390: use hrtimer for clock wakeup from idle

Message ID 1241534358-32172-3-git-send-email-ehrhardt@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

ehrhardt@linux.vnet.ibm.com May 5, 2009, 2:39 p.m. UTC
From: Christian Borntraeger <borntraeger@de.ibm.com>

This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
comparator is a per-cpu value that is compared against the TOD clock. If
ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
and the TOD clock have a much higher resolution than jiffies we should use
hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
values.

Since hrtimers callbacks run in hard-irq context, I added a tasklet to do
the actual work with enabled interrupts. 

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 include/asm/kvm_host.h |    5 ++++-
 kvm/interrupt.c        |   35 +++++++++++++++++++++++++----------
 kvm/kvm-s390.c         |    7 +++++--
 kvm/kvm-s390.h         |    4 +++-
 4 files changed, 37 insertions(+), 14 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Avi Kivity May 6, 2009, 12:10 p.m. UTC | #1
ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
> comparator is a per-cpu value that is compared against the TOD clock. If
> ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
> and the TOD clock have a much higher resolution than jiffies we should use
> hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
> values.
>
> Since hrtimers callbacks run in hard-irq context, I added a tasklet to do
> the actual work with enabled interrupts. 
>
>  
> -void kvm_s390_idle_wakeup(unsigned long data)
> +void kvm_s390_tasklet(unsigned long parm)
>  {
> -	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
>  
> -	spin_lock_bh(&vcpu->arch.local_int.lock);
> +	spin_lock(&vcpu->arch.local_int.lock);
>  	vcpu->arch.local_int.timer_due = 1;
>  	if (waitqueue_active(&vcpu->arch.local_int.wq))
>  		wake_up_interruptible(&vcpu->arch.local_int.wq);
> -	spin_unlock_bh(&vcpu->arch.local_int.lock);
> +	spin_unlock(&vcpu->arch.local_int.lock);
>  }
>   

Why can't this be done from the timer context (after adjusting the locks)?
Christian Borntraeger May 6, 2009, 12:36 p.m. UTC | #2
Am Wednesday 06 May 2009 14:10:07 schrieb Avi Kivity:
> ehrhardt@linux.vnet.ibm.com wrote:
> > From: Christian Borntraeger <borntraeger@de.ibm.com>
> >
> > This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
> > comparator is a per-cpu value that is compared against the TOD clock. If
> > ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
> > and the TOD clock have a much higher resolution than jiffies we should use
> > hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
> > values.
> >
> > Since hrtimers callbacks run in hard-irq context, I added a tasklet to do
> > the actual work with enabled interrupts. 
> >
> >  
> > -void kvm_s390_idle_wakeup(unsigned long data)
> > +void kvm_s390_tasklet(unsigned long parm)
> >  {
> > -	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> > +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
> >  
> > -	spin_lock_bh(&vcpu->arch.local_int.lock);
> > +	spin_lock(&vcpu->arch.local_int.lock);
> >  	vcpu->arch.local_int.timer_due = 1;
> >  	if (waitqueue_active(&vcpu->arch.local_int.wq))
> >  		wake_up_interruptible(&vcpu->arch.local_int.wq);
> > -	spin_unlock_bh(&vcpu->arch.local_int.lock);
> > +	spin_unlock(&vcpu->arch.local_int.lock);
> >  }
> >   
> 
> Why can't this be done from the timer context (after adjusting the locks)?

It can be done, in fact I did that in my first version. The thing is, we would need to change all local_int.locks to spinlock_irqs, since standard timers are softirqs and hrtimers and hardirqs. Enabling and disabling irqs is a relatively expensive operating on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places this lock is on some hot pathes. The idle wakeup on the other hand is not that critical.

Christian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity May 7, 2009, 10:19 a.m. UTC | #3
Christian Borntraeger wrote:
>> Why can't this be done from the timer context (after adjusting the locks)?
>>     
>
> It can be done, in fact I did that in my first version. The thing is, we would need to change all local_int.locks to spinlock_irqs, since standard timers are softirqs and hrtimers and hardirqs. Enabling and disabling irqs is a relatively expensive operating on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places this lock is on some hot pathes. The idle wakeup on the other hand is not that critical.
>   

Makes sense.

Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
was ~100 cycles) there was talk of using a software irq disable flag 
instead of using the hardware support.  So

- local_irq_disable() sets a flag

- interrupt handlers check the flag, if set they schedule the interrupt 
handler and return immediately

- local_irq_restore() clears the flag and fires and scheduled handlers

Now these operations are pretty cheap on x86, but maybe that can apply 
to s390.
Christian Borntraeger May 7, 2009, 10:34 a.m. UTC | #4
Am Thursday 07 May 2009 12:19:32 schrieb Avi Kivity:
> Makes sense.
> 
> Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
> was ~100 cycles) there was talk of using a software irq disable flag 
> instead of using the hardware support.  So
> 
> - local_irq_disable() sets a flag
> 
> - interrupt handlers check the flag, if set they schedule the interrupt 
> handler and return immediately
> 
> - local_irq_restore() clears the flag and fires and scheduled handlers
> 
> Now these operations are pretty cheap on x86, but maybe that can apply 
> to s390.

Interesting idea. Nevertheless, I dont think it improve our situation. The affected instructions (stosm and stnsm) are more expensive than compare and swap, but its nowhere near the ~100 cycles of P4.

Christian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hollis Blanchard May 20, 2009, 3:48 p.m. UTC | #5
On Thu, 2009-05-07 at 13:19 +0300, Avi Kivity wrote:
> Christian Borntraeger wrote:
> >> Why can't this be done from the timer context (after adjusting the locks)?
> >>     
> >
> > It can be done, in fact I did that in my first version. The thing is, we would need to change all local_int.locks to spinlock_irqs, since standard timers are softirqs and hrtimers and hardirqs. Enabling and disabling irqs is a relatively expensive operating on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places this lock is on some hot pathes. The idle wakeup on the other hand is not that critical.
> >   
> 
> Makes sense.
> 
> Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
> was ~100 cycles) there was talk of using a software irq disable flag 
> instead of using the hardware support.  So
> 
> - local_irq_disable() sets a flag
> 
> - interrupt handlers check the flag, if set they schedule the interrupt 
> handler and return immediately
> 
> - local_irq_restore() clears the flag and fires and scheduled handlers
> 
> Now these operations are pretty cheap on x86, but maybe that can apply 
> to s390.

I don't know how the cycle counts compare, but FWIW normal ppc64 Linux
does almost exactly this (and calls it "lazy irq disabling"). The only
difference is a step 2.5 that really disables interrupts, because some
are level-triggered.
diff mbox

Patch

Index: kvm/arch/s390/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/s390/include/asm/kvm_host.h	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/include/asm/kvm_host.h	2009-05-05 16:16:49.000000000 +0200
@@ -13,6 +13,8 @@ 
 
 #ifndef ASM_KVM_HOST_H
 #define ASM_KVM_HOST_H
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/kvm_host.h>
 #include <asm/debug.h>
 #include <asm/cpuid.h>
@@ -210,7 +212,8 @@ 
 	s390_fp_regs      guest_fpregs;
 	unsigned int      guest_acrs[NUM_ACRS];
 	struct kvm_s390_local_interrupt local_int;
-	struct timer_list ckc_timer;
+	struct hrtimer    ckc_timer;
+	struct tasklet_struct tasklet;
 	union  {
 		cpuid_t	  cpu_id;
 		u64	  stidp_data;
Index: kvm/arch/s390/kvm/interrupt.c
===================================================================
--- kvm.orig/arch/s390/kvm/interrupt.c	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/kvm/interrupt.c	2009-05-05 16:18:02.000000000 +0200
@@ -12,6 +12,8 @@ 
 
 #include <asm/lowcore.h>
 #include <asm/uaccess.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/kvm_host.h>
 #include <linux/signal.h>
 #include "kvm-s390.h"
@@ -361,12 +363,12 @@ 
 		return 0;
 	}
 
-	sltime = (vcpu->arch.sie_block->ckc - now) / (0xf4240000ul / HZ) + 1;
+	sltime = (vcpu->arch.sie_block->ckc - now)/4096*1000;
 
-	vcpu->arch.ckc_timer.expires = jiffies + sltime;
-
-	add_timer(&vcpu->arch.ckc_timer);
-	VCPU_EVENT(vcpu, 5, "enabled wait timer:%llx jiffies", sltime);
+	hrtimer_start(&vcpu->arch.ckc_timer, ktime_set(0, sltime),
+		      HRTIMER_MODE_REL);
+	VCPU_EVENT(vcpu, 5, "enabled wait via clock comparator: %llx ns",
+		   sltime);
 no_timer:
 	spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
 	spin_lock_bh(&vcpu->arch.local_int.lock);
@@ -389,21 +391,34 @@ 
 	remove_wait_queue(&vcpu->wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
 	spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
-	del_timer(&vcpu->arch.ckc_timer);
+	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
 	return 0;
 }
 
-void kvm_s390_idle_wakeup(unsigned long data)
+void kvm_s390_tasklet(unsigned long parm)
 {
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
 
-	spin_lock_bh(&vcpu->arch.local_int.lock);
+	spin_lock(&vcpu->arch.local_int.lock);
 	vcpu->arch.local_int.timer_due = 1;
 	if (waitqueue_active(&vcpu->arch.local_int.wq))
 		wake_up_interruptible(&vcpu->arch.local_int.wq);
-	spin_unlock_bh(&vcpu->arch.local_int.lock);
+	spin_unlock(&vcpu->arch.local_int.lock);
 }
 
+/*
+ * low level hrtimer wake routine. Because this runs in hardirq context
+ * we schedule a tasklet to do the real work.
+ */
+enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer);
+	tasklet_schedule(&vcpu->arch.tasklet);
+
+	return HRTIMER_NORESTART;
+}
 
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
 {
Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c	2009-05-05 16:16:48.000000000 +0200
+++ kvm/arch/s390/kvm/kvm-s390.c	2009-05-05 16:16:49.000000000 +0200
@@ -15,6 +15,7 @@ 
 #include <linux/compiler.h>
 #include <linux/err.h>
 #include <linux/fs.h>
+#include <linux/hrtimer.h>
 #include <linux/init.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
@@ -286,8 +287,10 @@ 
 	vcpu->arch.sie_block->gmsor = vcpu->kvm->arch.guest_origin;
 	vcpu->arch.sie_block->ecb   = 2;
 	vcpu->arch.sie_block->eca   = 0xC1002001U;
-	setup_timer(&vcpu->arch.ckc_timer, kvm_s390_idle_wakeup,
-		 (unsigned long) vcpu);
+	hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet,
+		     (unsigned long) vcpu);
+	vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup;
 	get_cpu_id(&vcpu->arch.cpu_id);
 	vcpu->arch.cpu_id.version = 0xff;
 	return 0;
Index: kvm/arch/s390/kvm/kvm-s390.h
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.h	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/kvm/kvm-s390.h	2009-05-05 16:16:49.000000000 +0200
@@ -14,6 +14,7 @@ 
 #ifndef ARCH_S390_KVM_S390_H
 #define ARCH_S390_KVM_S390_H
 
+#include <linux/hrtimer.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
@@ -41,7 +42,8 @@ 
 }
 
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
-void kvm_s390_idle_wakeup(unsigned long data);
+enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
+void kvm_s390_tasklet(unsigned long parm);
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
 int kvm_s390_inject_vm(struct kvm *kvm,
 		struct kvm_s390_interrupt *s390int);