diff mbox

[v5,04/10] pci: don't disable msi/msix at shutdown

Message ID 1427641227-7574-5-git-send-email-mst@redhat.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Michael S. Tsirkin March 29, 2015, 3:04 p.m. UTC
This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
	"pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"

It's un-necessary now that we disable msi at start, and it actually
turns out to cause problems: some device drivers don't register a level
interrupt handler when they detect msi/msix capability, switching off
msi while device is going causes device to assert a level interrupt
which is never de-asserted, causing a kernel hang.

In particular, this was observed with virtio.

Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/pci/pci-driver.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Fam Zheng April 2, 2015, 3:54 a.m. UTC | #1
On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> 	"pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
> 
> It's un-necessary now that we disable msi at start, and it actually
> turns out to cause problems: some device drivers don't register a level
> interrupt handler when they detect msi/msix capability, switching off
> msi while device is going causes device to assert a level interrupt
> which is never de-asserted, causing a kernel hang.
> 
> In particular, this was observed with virtio.
> 
> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Reported-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>

> ---
>  drivers/pci/pci-driver.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3cb2210..38a602c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
>  
>  	if (drv && drv->shutdown)
>  		drv->shutdown(pci_dev);
> -	pci_msi_shutdown(pci_dev);
> -	pci_msix_shutdown(pci_dev);
>  
>  #ifdef CONFIG_KEXEC
>  	/*
> -- 
> MST
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 10, 2015, 6:33 p.m. UTC | #2
Hi Michael,

On Sun, Mar 29, 2015 at 05:04:11PM +0200, Michael S. Tsirkin wrote:
> This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> 	"pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
> 
> It's un-necessary now that we disable msi at start, and it actually
> turns out to cause problems: some device drivers don't register a level
> interrupt handler when they detect msi/msix capability, switching off
> msi while device is going causes device to assert a level interrupt
> which is never de-asserted, causing a kernel hang.
> 
> In particular, this was observed with virtio.

I'm not questioning that this hang happens, but would you mind outlining
*how* it happens in a little more detail?  I'm not an IRQ expert, so I
expected an "irq %d: nobody cared" message or something similar.  It seems
like a kernel hang is a pretty severe way to deal with an unexpected
interrupt.

Is virtio the only way the hang could happen, or is it just coincidence
that it was involved?

It'd be really nice if we could reference the bug report here.  I think you
said the original report was private.  Can we open a kernel.org bugzilla
that contains just the public information?

> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Reported-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/pci/pci-driver.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3cb2210..38a602c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
>  
>  	if (drv && drv->shutdown)
>  		drv->shutdown(pci_dev);
> -	pci_msi_shutdown(pci_dev);
> -	pci_msix_shutdown(pci_dev);
>  
>  #ifdef CONFIG_KEXEC
>  	/*
> -- 
> MST
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin April 12, 2015, 8:52 a.m. UTC | #3
On Fri, Apr 10, 2015 at 01:33:04PM -0500, Bjorn Helgaas wrote:
> Hi Michael,
> 
> On Sun, Mar 29, 2015 at 05:04:11PM +0200, Michael S. Tsirkin wrote:
> > This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
> > 	"pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
> > 
> > It's un-necessary now that we disable msi at start, and it actually
> > turns out to cause problems: some device drivers don't register a level
> > interrupt handler when they detect msi/msix capability, switching off
> > msi while device is going causes device to assert a level interrupt
> > which is never de-asserted, causing a kernel hang.
> > 
> > In particular, this was observed with virtio.
> 
> I'm not questioning that this hang happens, but would you mind outlining
> *how* it happens in a little more detail?  I'm not an IRQ expert, so I
> expected an "irq %d: nobody cared" message or something similar.  It seems
> like a kernel hang is a pretty severe way to deal with an unexpected
> interrupt.

True. I intend to look into how this interacts with spurious
interrupt detection some more. Avoiding spurious interrupts
seems like a worthwhile goal in any case, right?

It seems clear how this will cause hangs when noirqdebug is set (later leads
to softlockup detected messages, or crash if softlockup_panic=1 is set).

> Is virtio the only way the hang could happen, or is it just coincidence
> that it was involved?

Well, you need a driver which doesn't handle level IRQs
when it enables MSI. virtio is one such driver.


> It'd be really nice if we could reference the bug report here.  I think you
> said the original report was private.  Can we open a kernel.org bugzilla
> that contains just the public information?

Ulrich Obergfell did most of the work on reproducing this,
Fam Zheng did most debugging, so I'd like one of them
to do this, so they get the appropriate credit.
Fam, Ulrich?

> > Cc: Yinghai Lu <yhlu.kernel.send@gmail.com>
> > Cc: Ulrich Obergfell <uobergfe@redhat.com>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Reported-by: Fam Zheng <famz@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/pci/pci-driver.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 3cb2210..38a602c 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
> >  
> >  	if (drv && drv->shutdown)
> >  		drv->shutdown(pci_dev);
> > -	pci_msi_shutdown(pci_dev);
> > -	pci_msix_shutdown(pci_dev);
> >  
> >  #ifdef CONFIG_KEXEC
> >  	/*
> > -- 
> > MST
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3cb2210..38a602c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -450,8 +450,6 @@  static void pci_device_shutdown(struct device *dev)
 
 	if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
-	pci_msi_shutdown(pci_dev);
-	pci_msix_shutdown(pci_dev);
 
 #ifdef CONFIG_KEXEC
 	/*