Patchworkβ [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms

login
register
about
Submitter Suresh Siddha
Date 2009-11-07 07:27:50
Message ID <1257578870.2536.52.camel@sbs-t61>
Download mbox | patch
Permalink /patch/58307/
State New
Headers show

Comments

Suresh Siddha - 2009-11-07 07:27:50
On Thu, 2009-11-05 at 22:53 -0800, Maciej W. Rozycki wrote:
>  What I mean is if the serial delivery type is used, then an interrupt 
> will be acked twice -- once via an EOI message sent from the local APIC 
> over the serial bus and then again via the write to the EOI register.  

Maciej, Case where we are doing an explicit EOI to the io-apic (using
EOI register) is when the level triggered interrupt gets registered at
the cpu as an edge interrupt (in the local apic's trigger mode
register).

It will arrive as an edge interrupt for two cases.
a) for corner conditions which hit the 82093AA (io-apic version 0x11)
erratum
b) with my recent patch in -tip, during a cpu offline, when we send an
ipi (IPI is always registered as an edge triggered at the cpu) to
service the interrupt at the new cpu destination, instead of servicing
at it's old destination cpu (as it has already disabled interrupts and
going down -- like not being on the cpu_online_map etc).

So we are not acking the io-apic twice in this case, as the eoi to the
local apic won't brodcast the eoi to the io-apic (because of the edge
mode in trigger mode register of the local apic).

> There is a race condition here, so if the IRQ line is still/again 
> asserted, then most likely two consecutive IRQ messages will be sent by 
> the I/O APIC and they may be accepted by two different local units and 
> eventually delivered to the OS.  Linux will cope, but still this is sloppy 
> programming, so it should be best avoided -- if possible that is.

I hope the above clarified.

> > 
> > I am not sure if I follow. With the recent changes (tip commit
> > 5231a68614b94f60e8f6c56bc6e3d75955b9e75e), we use ipi on a new cpu to
> > handle a pending level-triggered  interrupt on the cpu that is going
> > offline. It's just not only in the case of io-apic erratum for 0x11, we
> > see level triggered interrupt as edge interrupt at the cpu.
> 
>  I don't get it, sorry -- an interrupt has its trigger mode implied by the 
> IRQ_LEVEL status flag being set or clear in the IRQ's descriptor.  What's 
> set in the local APIC's TMR does not (or should not) matter IMO.

Same, did the above clarify?

> 
>  Has the trigger mode mismatch erratum been seen for FSB delivered 
> interrupts anyway? 

No.

>  Complete the EOI dance for the trigger mode mismatch APIC erratum before 
> proceeding to IRQ migration. 

I am ok with what you are tying to fix, but not your patch. please see
below.

> +	if (use_eoi_reg) {
> +		ack_APIC_irq();
> +		eoi_ioapic_irq(desc);

We shouldn't do this unconditionally. i.e., we should do this double ack
only when the level-triggered interrupt is seen as an edge triggered
interrupt at the  cpu (specified by the trigger mode register in local
apic).

Otherwise as you mentioned above, we will see two EOI msg's at the
io-apic (one by the cpu's eoi broadcast and another by explicit eoi to
the io-apic).

Best way to fix the issue you noticed is by simply moving the tail end
of that erratum fix before we try to migrate the irq to a new place.

Do you agree?

---
From: "Maciej W. Rozycki" <macro@linux-mips.org>
Subject: x86, io-apic: move the effort of clearing remoteIRR explicitly
before migrating the irq

When the level-triggered interrupt is seen as an edge interrupt, we try
to clear the remoteIRR explicitly (using either an io-apic eoi register
when present or through the idea of changing trigger mode of the io-apic
RTE to edge and then back to level). But this explicit try also needs to
happen before we try to migrate the irq. Otherwise irq migration attempt
will fail anyhow, as it postpones the irq migration to a later attempt
when it sees the remoteIRR in the io-apic RTE still set.

Signed-off-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Maciej W. Rozycki - 2009-11-08 19:06:19
On Fri, 6 Nov 2009, Suresh Siddha wrote:

> >  What I mean is if the serial delivery type is used, then an interrupt 
> > will be acked twice -- once via an EOI message sent from the local APIC 
> > over the serial bus and then again via the write to the EOI register.  
> 
> Maciej, Case where we are doing an explicit EOI to the io-apic (using
> EOI register) is when the level triggered interrupt gets registered at
> the cpu as an edge interrupt (in the local apic's trigger mode
> register).
> 
> It will arrive as an edge interrupt for two cases.
> a) for corner conditions which hit the 82093AA (io-apic version 0x11)
> erratum
> b) with my recent patch in -tip, during a cpu offline, when we send an
> ipi (IPI is always registered as an edge triggered at the cpu) to
> service the interrupt at the new cpu destination, instead of servicing
> at it's old destination cpu (as it has already disabled interrupts and
> going down -- like not being on the cpu_online_map etc).
> 
> So we are not acking the io-apic twice in this case, as the eoi to the
> local apic won't brodcast the eoi to the io-apic (because of the edge
> mode in trigger mode register of the local apic).

 OK, I see what you mean, but that makes me wonder why you are going 
through such contortions.  In the case of a CPU going offline I would 
expect it to be done more or less in such a way:

1. Write all-zeroes to its local APIC's LDR register and set its TPR to 
   0xef.  This will take this APIC out from LoPri arbitration and thus 
   from accepting any I/O APIC interrupts.  Fixed delivery mode IPIs will 
   still be accepted (if that's not needed then the TPR can be set to 
   0xff; any received IPIs will be lost).

2. Service any outstanding interrupts that have already been accepted by 
   the local APIC (you may have to poll on the local IRR register with 
   interrupts enabled for a short while).

3. Disable the local APIC via the SVR register, mask local interrupts in 
   the processor's EFLAGS register and start the offline procedure.  This 
   is the point of no-return, further IPIs won't be accepted and the CPU 
   has to be put through an INIT-IPI+StartUp-IPI cycle to get in control 
   again.

   If IPI reception was not needed through stage #2 above, then the local 
   APIC could have been disabled at #1 instead -- interrupts pending in 
   the local APIC as recorded in the IRR or marked as in-progress in the 
   ISR are guaranteed to be delivered to the CPU and EOIed (as 
   appropriate) normally even in the disabled state of the local APIC.

> Do you agree?

 If the scenario I have outlined above cannot be made to work for some 
reason, then please do me and the others a favour and since with this 
change you are tying new functionality to code originally meant as a 
workaround for an obscure erratum only, do write a proper explanation and 
place it next to the original comment describing previous use of the code. 
With your change as it is, it is all but obvious what this piece of code 
is meant to do.

 Your change is OK with me once accompanied with said comment, but please 
investigate my scenario first -- your approach looks like a horrible hack 
to me, sorry.

  Maciej
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 47d95c3..9a26ea1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2583,6 +2583,20 @@  static void ack_apic_level(unsigned int irq)
 	 */
 	ack_APIC_irq();
 
+	/* Tail end of version 0x11 I/O APIC bug workaround */
+	if (!(v & (1 << (i & 0x1f)))) {
+		atomic_inc(&irq_mis_count);
+
+		if (use_eoi_reg)
+			eoi_ioapic_irq(desc);
+		else {
+			spin_lock(&ioapic_lock);
+			__mask_and_edge_IO_APIC_irq(cfg);
+			__unmask_and_level_IO_APIC_irq(cfg);
+			spin_unlock(&ioapic_lock);
+		}
+	}
+
 	/* Now we can move and renable the irq */
 	if (unlikely(do_unmask_irq)) {
 		/* Only migrate the irq if the ack has been received.
@@ -2616,20 +2630,6 @@  static void ack_apic_level(unsigned int irq)
 			move_masked_irq(irq);
 		unmask_IO_APIC_irq_desc(desc);
 	}
-
-	/* Tail end of version 0x11 I/O APIC bug workaround */
-	if (!(v & (1 << (i & 0x1f)))) {
-		atomic_inc(&irq_mis_count);
-
-		if (use_eoi_reg)
-			eoi_ioapic_irq(desc);
-		else {
-			spin_lock(&ioapic_lock);
-			__mask_and_edge_IO_APIC_irq(cfg);
-			__unmask_and_level_IO_APIC_irq(cfg);
-			spin_unlock(&ioapic_lock);
-		}
-	}
 }
 
 #ifdef CONFIG_INTR_REMAP