Message ID | 1427641227-7574-5-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> "pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
>
> It's un-necessary now that we disable msi at start, and it actually
> turns out to cause problems: some device drivers don't register a level
> interrupt handler when they detect msi/msix capability, switching off
> msi while device is going causes device to assert a level interrupt
> which is never de-asserted, causing a kernel hang.
>
> In particular, this was observed with virtio.
>
> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Reported-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>

> ---
>  drivers/pci/pci-driver.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3cb2210..38a602c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
>
>  	if (drv && drv->shutdown)
>  		drv->shutdown(pci_dev);
> -	pci_msi_shutdown(pci_dev);
> -	pci_msix_shutdown(pci_dev);
>
>  #ifdef CONFIG_KEXEC
>  	/*
> --
> MST
> --
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Michael,

On Sun, Mar 29, 2015 at 05:04:11PM +0200, Michael S. Tsirkin wrote:
> This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> "pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
>
> It's un-necessary now that we disable msi at start, and it actually
> turns out to cause problems: some device drivers don't register a level
> interrupt handler when they detect msi/msix capability, switching off
> msi while device is going causes device to assert a level interrupt
> which is never de-asserted, causing a kernel hang.
>
> In particular, this was observed with virtio.

I'm not questioning that this hang happens, but would you mind outlining
*how* it happens in a little more detail? I'm not an IRQ expert, so I
expected an "irq %d: nobody cared" message or something similar. It seems
like a kernel hang is a pretty severe way to deal with an unexpected
interrupt.

Is virtio the only way the hang could happen, or is it just coincidence
that it was involved?

It'd be really nice if we could reference the bug report here. I think you
said the original report was private. Can we open a kernel.org bugzilla
that contains just the public information?

> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Reported-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/pci/pci-driver.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3cb2210..38a602c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
>
>  	if (drv && drv->shutdown)
>  		drv->shutdown(pci_dev);
> -	pci_msi_shutdown(pci_dev);
> -	pci_msix_shutdown(pci_dev);
>
>  #ifdef CONFIG_KEXEC
>  	/*
> --
> MST
On Fri, Apr 10, 2015 at 01:33:04PM -0500, Bjorn Helgaas wrote:
> Hi Michael,
>
> On Sun, Mar 29, 2015 at 05:04:11PM +0200, Michael S. Tsirkin wrote:
> > This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> > "pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
> >
> > It's un-necessary now that we disable msi at start, and it actually
> > turns out to cause problems: some device drivers don't register a level
> > interrupt handler when they detect msi/msix capability, switching off
> > msi while device is going causes device to assert a level interrupt
> > which is never de-asserted, causing a kernel hang.
> >
> > In particular, this was observed with virtio.
>
> I'm not questioning that this hang happens, but would you mind outlining
> *how* it happens in a little more detail? I'm not an IRQ expert, so I
> expected an "irq %d: nobody cared" message or something similar. It seems
> like a kernel hang is a pretty severe way to deal with an unexpected
> interrupt.

True. I intend to look into how this interacts with spurious interrupt
detection some more. Avoiding spurious interrupts seems like a
worthwhile goal in any case, right?

It seems clear how this will cause hangs when noirqdebug is set (later
leads to softlockup detected messages, or crash if softlockup_panic=1
is set).

> Is virtio the only way the hang could happen, or is it just coincidence
> that it was involved?

Well, you need a driver which doesn't handle level IRQs when it enables
MSI. virtio is one such driver.

> It'd be really nice if we could reference the bug report here. I think you
> said the original report was private. Can we open a kernel.org bugzilla
> that contains just the public information?

Ulrich Obergfell did most of the work on reproducing this, Fam Zheng did
most debugging, so I'd like one of them to do this, so they get the
appropriate credit.

Fam, Ulrich?

> > Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> > Cc: Ulrich Obergfell <uobergfe@redhat.com>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Reported-by: Fam Zheng <famz@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/pci/pci-driver.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 3cb2210..38a602c 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
> >
> >  	if (drv && drv->shutdown)
> >  		drv->shutdown(pci_dev);
> > -	pci_msi_shutdown(pci_dev);
> > -	pci_msix_shutdown(pci_dev);
> >
> >  #ifdef CONFIG_KEXEC
> >  	/*
> > --
> > MST
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3cb2210..38a602c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
 
 	if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
-	pci_msi_shutdown(pci_dev);
-	pci_msix_shutdown(pci_dev);
 
 #ifdef CONFIG_KEXEC
 	/*
This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
"pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"

It's un-necessary now that we disable msi at start, and it actually
turns out to cause problems: some device drivers don't register a level
interrupt handler when they detect msi/msix capability, switching off
msi while device is going causes device to assert a level interrupt
which is never de-asserted, causing a kernel hang.

In particular, this was observed with virtio.

Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/pci/pci-driver.c | 2 --
 1 file changed, 2 deletions(-)
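
For readers following the failure mode described in the commit message, here is a rough user-space sketch of why an unhandled level-triggered interrupt can keep the CPU busy. It is a toy model only, not kernel code: every name in it (`toy_irq_line`, `toy_deliver`, `toy_run`) is hypothetical, the unhandled-interrupt limit is an arbitrary parameter rather than the kernel's real threshold, and the `irq_debug` flag merely stands in for the spurious-interrupt detection that `noirqdebug` turns off.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line. Hypothetical, not kernel API. */
struct toy_irq_line {
	bool asserted;     /* device is holding the line active */
	bool masked;       /* line was disabled by the modelled spurious-IRQ detector */
	bool has_handler;  /* driver registered a handler for the legacy (INTx) line */
};

/*
 * Deliver one interrupt. A registered handler acknowledges the device,
 * which de-asserts the line. With no handler, nobody touches the device,
 * so the line stays asserted. Returns true if the interrupt was handled.
 */
static bool toy_deliver(struct toy_irq_line *line)
{
	if (!line->has_handler)
		return false;
	line->asserted = false;
	return true;
}

/*
 * Keep delivering while the line is asserted and not masked.
 * If irq_debug is true, mask the line after unhandled_limit unhandled
 * deliveries (a stand-in for "irq %d: nobody cared").
 * Returns the number of deliveries, or -1 if the line was still firing
 * after `budget` deliveries, which models the hang.
 */
static long toy_run(struct toy_irq_line *line, bool irq_debug,
		    long unhandled_limit, long budget)
{
	long deliveries = 0;
	long unhandled = 0;

	assert(unhandled_limit > 0 && budget > 0);

	while (line->asserted && !line->masked) {
		if (deliveries == budget)
			return -1;
		deliveries++;
		if (!toy_deliver(line)) {
			unhandled++;
			if (irq_debug && unhandled >= unhandled_limit)
				line->masked = true;
		}
	}
	return deliveries;
}
```

In this model, a line with a handler goes quiet after a single delivery. A line with no handler is only silenced once the modelled detector masks it, and with detection off (the `noirqdebug` case raised in the thread) `toy_run` exhausts its budget and returns -1, which corresponds to the softlockup discussed above. The patch sidesteps the situation by no longer switching off MSI/MSI-X while the device is still active.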