diff mbox

[GIT,PULL] KVM fixes for 3.5-rc6

Message ID alpine.LFD.2.02.1207131928000.32033@ionos (mailing list archive)
State New, archived
Headers show

Commit Message

Thomas Gleixner July 13, 2012, 6:28 p.m. UTC
On Fri, 13 Jul 2012, Linus Torvalds wrote:

> On Fri, Jul 13, 2012 at 8:45 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> At the same time, I do wonder if maybe MSI + IRQF_ONESHOT couldn't be
> improved. The fact that the KVM people think that the extra overhead
> of IRQF_ONESHOT is a bad thing for MSI interrupts makes me wonder if
> maybe this wouldn't be an area the irq layer couldn't be improved on.
> Maybe the MSI+IRQF_ONESHOT case could be improved. Because MSI is kind
> of fundamentally one-shot, since it's a message-based irq scheme.  So
> maybe the extra overhead is unnecessary in general, not just in this
> particular KVM case. Hmm?
> 
> Thomas, see the commentary of a76beb14123a ("KVM: Fix device
> assignment threaded irq handler").

Groan.

We already discussed to let the irq chip (in this case MSI) tell the
core that it does not need the extra oneshot handling. That way the
code which requests an threaded irq with the NULL primary handler
works on both MSI and normal interrupts.

Untested patch below.

Thanks,

	tglx

-----
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Linus Torvalds July 13, 2012, 6:53 p.m. UTC | #1
On Fri, Jul 13, 2012 at 11:28 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> We already discussed to let the irq chip (in this case MSI) tell the
> core that it does not need the extra oneshot handling. That way the
> code which requests an threaded irq with the NULL primary handler
> works on both MSI and normal interrupts.

So I  don't think your patch is quite right.

If you want to clear the IRQF_ONESHOT for MSI irq's (and other ones
where the interrupt controller is fundamentally ONESHOT), I think you
should do it a few lines higher up - *before* you check the "does the
IRQF_ONESHOT mask match other shared interrupts"?

Now, irq sharing presumably doesn't happen with MSI, but there's
nothing fundamentally wrong with message-based irq schemes that have
shared interrupt handlers.

I think. Hmm?

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner July 14, 2012, 2:25 a.m. UTC | #2
On Fri, 13 Jul 2012, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Linus Torvalds wrote:
> > On Fri, Jul 13, 2012 at 8:45 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > At the same time, I do wonder if maybe MSI + IRQF_ONESHOT couldn't be
> > improved. The fact that the KVM people think that the extra overhead
> > of IRQF_ONESHOT is a bad thing for MSI interrupts makes me wonder if
> > maybe this wouldn't be an area the irq layer couldn't be improved on.
> > Maybe the MSI+IRQF_ONESHOT case could be improved. Because MSI is kind
> > of fundamentally one-shot, since it's a message-based irq scheme.  So
> > maybe the extra overhead is unnecessary in general, not just in this
> > particular KVM case. Hmm?
> > 
> > Thomas, see the commentary of a76beb14123a ("KVM: Fix device
> > assignment threaded irq handler").
> 
> Groan.
> 
> We already discussed to let the irq chip (in this case MSI) tell the
> core that it does not need the extra oneshot handling. That way the
> code which requests an threaded irq with the NULL primary handler
> works on both MSI and normal interrupts.

That's the kind of stuff which makes me go berserk, and just for the
record: the most complaints I get for going berserk are coming from
the virt folks.

I really can't understand why the virt folks think they are
special and do not have to talk to core maintainers.

Back then when I was doing the big irq cleanup, virt crap stood out by
far with the most idiotic^Wcreative "workarounds". Of course nobody
complained about me being moronic enough to come up with generic
solutions for their problems.

Though especially that commit including its changelog proves once
again the ignorance and desinterest of the virt crowd in solving
problems which are not only relevant to themself.

I whish you'd just refused to pull that nonsense and instead made them
talk to those folks who had a proper solution in mind already.

In fact we could have solved that proper weeks ago, if only people
would have raised the issue.

I'm tired of that kind of crap, really.

    Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka July 14, 2012, 7 a.m. UTC | #3
On 2012-07-14 04:25, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Thomas Gleixner wrote:
>> On Fri, 13 Jul 2012, Linus Torvalds wrote:
>>> On Fri, Jul 13, 2012 at 8:45 AM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>> At the same time, I do wonder if maybe MSI + IRQF_ONESHOT couldn't be
>>> improved. The fact that the KVM people think that the extra overhead
>>> of IRQF_ONESHOT is a bad thing for MSI interrupts makes me wonder if
>>> maybe this wouldn't be an area the irq layer couldn't be improved on.
>>> Maybe the MSI+IRQF_ONESHOT case could be improved. Because MSI is kind
>>> of fundamentally one-shot, since it's a message-based irq scheme.  So
>>> maybe the extra overhead is unnecessary in general, not just in this
>>> particular KVM case. Hmm?
>>>
>>> Thomas, see the commentary of a76beb14123a ("KVM: Fix device
>>> assignment threaded irq handler").
>>
>> Groan.
>>
>> We already discussed to let the irq chip (in this case MSI) tell the
>> core that it does not need the extra oneshot handling. That way the
>> code which requests an threaded irq with the NULL primary handler
>> works on both MSI and normal interrupts.
> 
> That's the kind of stuff which makes me go berserk, and just for the
> record: the most complaints I get for going berserk are coming from
> the virt folks.
> 
> I really can't understand why the virt folks think they are
> special and do not have to talk to core maintainers.
> 
> Back then when I was doing the big irq cleanup, virt crap stood out by
> far with the most idiotic^Wcreative "workarounds". Of course nobody
> complained about me being moronic enough to come up with generic
> solutions for their problems.
> 
> Though especially that commit including its changelog proves once
> again the ignorance and desinterest of the virt crowd in solving
> problems which are not only relevant to themself.
> 
> I whish you'd just refused to pull that nonsense and instead made them
> talk to those folks who had a proper solution in mind already.
> 
> In fact we could have solved that proper weeks ago, if only people
> would have raised the issue.

June 1: http://thread.gmane.org/gmane.linux.kernel/1306742

The proper solution for us will be conditional direct IRQ delivery
anyway [1], but those patches were not considered ready for 3.5.

This patch here is a workaround to unbreak devices assignment in 3.5
after the IRQ layer changes without regressing noticeable /wrt overhead.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/92249
diff mbox

Patch

Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -3109,6 +3109,7 @@  static struct irq_chip msi_chip = {
 	.irq_set_affinity	= msi_set_affinity,
 #endif
 	.irq_retrigger		= ioapic_retrigger_irq,
+	.flags			= IRQCHIP_ONESHOT_SAFE,
 };
 
 static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -351,6 +351,7 @@  enum {
 	IRQCHIP_MASK_ON_SUSPEND		= (1 <<  2),
 	IRQCHIP_ONOFFLINE_ENABLED	= (1 <<  3),
 	IRQCHIP_SKIP_SET_WAKE		= (1 <<  4),
+	IRQCHIP_ONESHOT_SAFE		= (1 <<  5),
 };
 
 /* This include will go away once we isolated irq_desc usage to core code */
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -1004,35 +1004,48 @@  __setup_irq(unsigned int irq, struct irq
 	 */
 	if (new->flags & IRQF_ONESHOT) {
 		/*
-		 * Unlikely to have 32 resp 64 irqs sharing one line,
-		 * but who knows.
+		 * Drivers are often written to work w/o knowledge
+		 * about the underlying irq chip implementation, so a
+		 * request for a threaded irq without a primary hard
+		 * irq context handler requires the ONESHOT flag to be
+		 * set. Some irq chips like MSI based interrupts are
+		 * per se one shot safe. Check the chip flags, so we
+		 * can avoid the unmask dance at the end of the
+		 * threaded handler for those.
 		 */
-		if (thread_mask == ~0UL) {
-			ret = -EBUSY;
-			goto out_mask;
+		if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE) {
+			new->flags &= ~IRQF_ONESHOT;
+		} else {
+			/*
+			 * Unlikely to have 32 resp 64 irqs sharing one line,
+			 * but who knows.
+			 */
+			if (thread_mask == ~0UL) {
+				ret = -EBUSY;
+				goto out_mask;
+			}
+			/*
+			 * The thread_mask for the action is or'ed to
+			 * desc->thread_active to indicate that the
+			 * IRQF_ONESHOT thread handler has been woken, but not
+			 * yet finished. The bit is cleared when a thread
+			 * completes. When all threads of a shared interrupt
+			 * line have completed desc->threads_active becomes
+			 * zero and the interrupt line is unmasked. See
+			 * handle.c:irq_wake_thread() for further information.
+			 *
+			 * If no thread is woken by primary (hard irq context)
+			 * interrupt handlers, then desc->threads_active is
+			 * also checked for zero to unmask the irq line in the
+			 * affected hard irq flow handlers
+			 * (handle_[fasteoi|level]_irq).
+			 *
+			 * The new action gets the first zero bit of
+			 * thread_mask assigned. See the loop above which or's
+			 * all existing action->thread_mask bits.
+			 */
+			new->thread_mask = 1 << ffz(thread_mask);
 		}
-		/*
-		 * The thread_mask for the action is or'ed to
-		 * desc->thread_active to indicate that the
-		 * IRQF_ONESHOT thread handler has been woken, but not
-		 * yet finished. The bit is cleared when a thread
-		 * completes. When all threads of a shared interrupt
-		 * line have completed desc->threads_active becomes
-		 * zero and the interrupt line is unmasked. See
-		 * handle.c:irq_wake_thread() for further information.
-		 *
-		 * If no thread is woken by primary (hard irq context)
-		 * interrupt handlers, then desc->threads_active is
-		 * also checked for zero to unmask the irq line in the
-		 * affected hard irq flow handlers
-		 * (handle_[fasteoi|level]_irq).
-		 *
-		 * The new action gets the first zero bit of
-		 * thread_mask assigned. See the loop above which or's
-		 * all existing action->thread_mask bits.
-		 */
-		new->thread_mask = 1 << ffz(thread_mask);
-
 	} else if (new->handler == irq_default_primary_handler) {
 		/*
 		 * The interrupt was requested with handler = NULL, so