[v2] KVM: x86: Remove support for reporting coalesced APIC IRQs

Message ID 517D0F69.5080902@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka April 28, 2013, noon UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

Since the arrival of posted interrupt support we can no longer guarantee
that coalesced IRQs are always reported to the IRQ source. Moreover,
accumulated APIC timer events could cause a busy loop when a VCPU should
rather be halted. The consensus is to remove coalesced tracking from the
LAPIC.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v2:
 - preserve return values where we need to count to how many VCPUs an
   IRQ was delivered

 arch/x86/kvm/lapic.c |   57 ++++++++++++++++++++-----------------------------
 arch/x86/kvm/lapic.h |    6 ++--
 virt/kvm/irq_comm.c  |    9 +++++--
 3 files changed, 32 insertions(+), 40 deletions(-)
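As a reading aid for the commit message and the v2 note, here is a minimal, self-contained sketch of what "coalesced" means and of how the delivery count changes. It is not kernel code: `toy_lapic` and every `toy_*` function are hypothetical stand-ins, and the IRR is modelled as a plain 256-bit bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical stand-in for the LAPIC interrupt request register (IRR). */
struct toy_lapic {
	uint32_t irr[NR_VECTORS / 32];
};

/* Set the IRR bit for a vector; return true if it was already pending. */
static bool toy_test_and_set_irr(struct toy_lapic *apic, unsigned int vector)
{
	uint32_t mask = UINT32_C(1) << (vector % 32);
	uint32_t *word;
	bool was_set;

	assert(vector < NR_VECTORS);
	word = &apic->irr[vector / 32];
	was_set = (*word & mask) != 0;
	*word |= mask;
	return was_set;
}

/* Old behaviour: 1 if newly queued, 0 if coalesced with a pending IRQ. */
static int toy_accept_irq_old(struct toy_lapic *apic, unsigned int vector)
{
	return !toy_test_and_set_irr(apic, vector);
}

/* New behaviour: the IRQ is set pending and nothing is reported back. */
static void toy_accept_irq_new(struct toy_lapic *apic, unsigned int vector)
{
	(void)toy_test_and_set_irr(apic, vector);
}

/* New delivery loop: count target VCPUs, not freshly set IRR bits. */
static int toy_deliver_to_all(struct toy_lapic *apics, int nr,
			      unsigned int vector)
{
	int r = -1;	/* -1 means "no destination matched" */
	int i;

	for (i = 0; i < nr; i++) {
		if (r < 0)
			r = 0;
		toy_accept_irq_new(&apics[i], vector);
		r++;
	}
	return r;
}
```

In this model, calling `toy_accept_irq_old()` twice for the same vector yields 1 and then 0, which is the coalescing report the patch removes. `toy_deliver_to_all()` returns the number of targets even when some of them already had the vector pending.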

Comments

Marcelo Tosatti May 3, 2013, 1:14 a.m. UTC | #1
On Sun, Apr 28, 2013 at 02:00:41PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Since the arrival of posted interrupt support we can no longer guarantee
> that coalesced IRQs are always reported to the IRQ source. Moreover,
> accumulated APIC timer events could cause a busy loop when a VCPU should
> rather be halted. The consensus is to remove coalesced tracking from the
> LAPIC.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> Changes in v2:
>  - preserve return values where we need to count to how many VCPUs an
>    IRQ was delivered

It would be best to confirm first that no guest depends on LAPIC timer
reinjection (rather than wait for guests to break and spend time
debugging), especially since this kind of bug is not easy to pinpoint.
Or, if there is evidence to invalidate such reasoning, please point it
out.

Honestly, I don't recall: I would have to check RHEL5.32 UP/SMP, RHEL5.64
UP/SMP, RHEL4.32, RHEL4.64, RHEL6. I can do that in a week or so.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti May 14, 2013, 1:30 a.m. UTC | #2
On Thu, May 02, 2013 at 10:14:32PM -0300, Marcelo Tosatti wrote:
> On Sun, Apr 28, 2013 at 02:00:41PM +0200, Jan Kiszka wrote:
> > From: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Since the arrival of posted interrupt support we can no longer guarantee
> > that coalesced IRQs are always reported to the IRQ source. Moreover,
> > accumulated APIC timer events could cause a busy loop when a VCPU should
> > rather be halted. The consensus is to remove coalesced tracking from the
> > LAPIC.
> > 
> > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > ---
> > 
> > Changes in v2:
> >  - preserve return values where we need to count to how many VCPUs an
> >    IRQ was delivered
> 
> It would be best to confirm first that no guest depends on LAPIC timer
> reinjection (rather than wait for guests to break and spend time
> debugging), especially since this kind of bug is not easy to pinpoint.
> Or, if there is evidence to invalidate such reasoning, please point it
> out.
> 
> Honestly, I don't recall: I would have to check RHEL5.32 UP/SMP, RHEL5.64
> UP/SMP, RHEL4.32, RHEL4.64, RHEL6. I can do that in a week or so.

It's OK to remove LAPIC timer reinjection.

Gleb Natapov May 14, 2013, 9:10 a.m. UTC | #3
On Mon, May 13, 2013 at 10:30:33PM -0300, Marcelo Tosatti wrote:
> On Thu, May 02, 2013 at 10:14:32PM -0300, Marcelo Tosatti wrote:
> > On Sun, Apr 28, 2013 at 02:00:41PM +0200, Jan Kiszka wrote:
> > > From: Jan Kiszka <jan.kiszka@siemens.com>
> > > 
> > > Since the arrival of posted interrupt support we can no longer guarantee
> > > that coalesced IRQs are always reported to the IRQ source. Moreover,
> > > accumulated APIC timer events could cause a busy loop when a VCPU should
> > > rather be halted. The consensus is to remove coalesced tracking from the
> > > LAPIC.
> > > 
> > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > > ---
> > > 
> > > Changes in v2:
> > >  - preserve return values where we need to count to how many VCPUs an
> > >    IRQ was delivered
> > 
> > It would be best to confirm first that no guest depends on LAPIC timer
> > reinjection (rather than wait for guests to break and spend time
> > debugging), especially since this kind of bug is not easy to pinpoint.
> > Or, if there is evidence to invalidate such reasoning, please point it
> > out.
> > 
> > Honestly, I don't recall: I would have to check RHEL5.32 UP/SMP, RHEL5.64
> > UP/SMP, RHEL4.32, RHEL4.64, RHEL6. I can do that in a week or so.
> 
> It's OK to remove LAPIC timer reinjection.

Applied. Thanks for checking!

--
			Gleb.
Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29883c..3bc58cd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -405,17 +405,17 @@  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 	return highest_irr;
 }
 
-static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
-			     int vector, int level, int trig_mode,
-			     unsigned long *dest_map);
+static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+			      int vector, int level, int trig_mode,
+			      unsigned long *dest_map);
 
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
-		unsigned long *dest_map)
+void kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
+		      unsigned long *dest_map)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	return __apic_accept_irq(apic, irq->delivery_mode, irq->vector,
-			irq->level, irq->trig_mode, dest_map);
+	__apic_accept_irq(apic, irq->delivery_mode, irq->vector,
+			  irq->level, irq->trig_mode, dest_map);
 }
 
 static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
@@ -608,7 +608,8 @@  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	*r = -1;
 
 	if (irq->shorthand == APIC_DEST_SELF) {
-		*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
+		kvm_apic_set_irq(src->vcpu, irq, dest_map);
+		*r = 1;
 		return true;
 	}
 
@@ -653,7 +654,8 @@  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 			continue;
 		if (*r < 0)
 			*r = 0;
-		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+		kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
+		*r += 1;
 	}
 
 	ret = true;
@@ -662,15 +664,11 @@  out:
 	return ret;
 }
 
-/*
- * Add a pending IRQ into lapic.
- * Return 1 if successfully added and 0 if discarded.
- */
-static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
-			     int vector, int level, int trig_mode,
-			     unsigned long *dest_map)
+/* Set an IRQ pending in the lapic. */
+static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+			      int vector, int level, int trig_mode,
+			      unsigned long *dest_map)
 {
-	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;
 
 	switch (delivery_mode) {
@@ -684,13 +682,10 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		if (dest_map)
 			__set_bit(vcpu->vcpu_id, dest_map);
 
-		if (kvm_x86_ops->deliver_posted_interrupt) {
-			result = 1;
+		if (kvm_x86_ops->deliver_posted_interrupt)
 			kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
-		} else {
-			result = !apic_test_and_set_irr(vector, apic);
-
-			if (!result) {
+		else {
+			if (apic_test_and_set_irr(vector, apic)) {
 				if (trig_mode)
 					apic_debug("level trig mode repeatedly "
 						"for vector %d", vector);
@@ -702,7 +697,7 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		}
 out:
 		trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
-				trig_mode, vector, !result);
+					  trig_mode, vector, false);
 		break;
 
 	case APIC_DM_REMRD:
@@ -714,14 +709,12 @@  out:
 		break;
 
 	case APIC_DM_NMI:
-		result = 1;
 		kvm_inject_nmi(vcpu);
 		kvm_vcpu_kick(vcpu);
 		break;
 
 	case APIC_DM_INIT:
 		if (!trig_mode || level) {
-			result = 1;
 			/* assumes that there are only KVM_APIC_INIT/SIPI */
 			apic->pending_events = (1UL << KVM_APIC_INIT);
 			/* make sure pending_events is visible before sending
@@ -738,7 +731,6 @@  out:
 	case APIC_DM_STARTUP:
 		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
 			   vcpu->vcpu_id, vector);
-		result = 1;
 		apic->sipi_vector = vector;
 		/* make sure sipi_vector is visible for the receiver */
 		smp_wmb();
@@ -760,7 +752,6 @@  out:
 		       delivery_mode);
 		break;
 	}
-	return result;
 }
 
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
@@ -1470,7 +1461,7 @@  int apic_has_pending_timer(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
+void kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 {
 	u32 reg = kvm_apic_get_reg(apic, lvt_type);
 	int vector, mode, trig_mode;
@@ -1479,10 +1470,8 @@  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 		vector = reg & APIC_VECTOR_MASK;
 		mode = reg & APIC_MODE_MASK;
 		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
-		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
-					NULL);
+		__apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
 	}
-	return 0;
 }
 
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
@@ -1608,8 +1597,8 @@  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 		return;
 
 	if (atomic_read(&apic->lapic_timer.pending) > 0) {
-		if (kvm_apic_local_deliver(apic, APIC_LVTT))
-			atomic_dec(&apic->lapic_timer.pending);
+		kvm_apic_local_deliver(apic, APIC_LVTT);
+		atomic_set(&apic->lapic_timer.pending, 0);
 	}
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c730ac9..61a73a0 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -57,9 +57,9 @@  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
-		unsigned long *dest_map);
-int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
+void kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
+		      unsigned long *dest_map);
+void kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 25ab480..d60e1f1 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -91,7 +91,8 @@  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		if (!kvm_is_dm_lowest_prio(irq)) {
 			if (r < 0)
 				r = 0;
-			r += kvm_apic_set_irq(vcpu, irq, dest_map);
+			kvm_apic_set_irq(vcpu, irq, dest_map);
+			r++;
 		} else if (kvm_lapic_enabled(vcpu)) {
 			if (!lowest)
 				lowest = vcpu;
@@ -100,8 +101,10 @@  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		}
 	}
 
-	if (lowest)
-		r = kvm_apic_set_irq(lowest, irq, dest_map);
+	if (lowest) {
+		kvm_apic_set_irq(lowest, irq, dest_map);
+		r = 1;
+	}
 
 	return r;
 }
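
The `kvm_inject_apic_timer_irqs()` hunk above replaces a conditional `atomic_dec()` with an unconditional `atomic_set(..., 0)`. The sketch below models only that counter, using C11 atomics instead of the kernel's `atomic_t`. All `toy_*` names are hypothetical. The `delivered` flag is an assumption that stands in for the old return value of `kvm_apic_local_deliver()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lapic_timer.pending counter. */
struct toy_timer {
	atomic_int pending;
};

/*
 * Old behaviour: consume one accumulated tick, and only if the
 * delivery was not coalesced.  Remaining ticks stay queued.
 */
static void toy_inject_old(struct toy_timer *t, bool delivered)
{
	assert(t);
	if (atomic_load(&t->pending) > 0 && delivered)
		atomic_fetch_sub(&t->pending, 1);
}

/* New behaviour: one injection attempt clears all accumulated ticks. */
static void toy_inject_new(struct toy_timer *t)
{
	assert(t);
	if (atomic_load(&t->pending) > 0)
		atomic_store(&t->pending, 0);
}

/* A nonzero count is what would keep a VCPU from being halted. */
static bool toy_has_pending_timer(struct toy_timer *t)
{
	assert(t);
	return atomic_load(&t->pending) > 0;
}
```

With three ticks accumulated, `toy_inject_old()` leaves at least two queued, so `toy_has_pending_timer()` keeps returning true. A single `toy_inject_new()` call drops the count to zero. This appears to be the accumulated-events situation the commit message describes.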