Message ID | 1441553385-3810-1-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > On some hypervisors, virtio devices tend to generate spurious interrupts > when switching between MSI and non-MSI mode. Normally, either MSI or > non-MSI is used and all is well, but during shutdown, linux disables MSI > which then causes an "irq %d: nobody cared" message, with irq being > subsequently disabled. > > Since bus mastering is already disabled at this point, disabling MSI > isn't actually useful for spec compliant devices: MSI interrupts are > in-bus memory writes so disabling Bus Master (which is already done) > disables these as well: after some research, it appears to be there for > the benefit of devices that ignore the bus master bit. > > As it's not clear how common this kind of bug is, this patch simply adds > a quirk, to be set by devices that wish to skip disabling msi on > shutdown, relying on bus master instead. > > We set this quirk in virtio core. > > Reported-by: Fam Zheng <famz@redhat.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: Yinghai Lu <yhlu.kernel.send@gmail.com> > Cc: Ulrich Obergfell <uobergfe@redhat.com> > Cc: Rusty Russell <rusty@rustcorp.com.au> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Eric, what do you think of this? You had many comments on previous versions. Minor comment on a comment below. > --- > > > changes from v6: > limit changes to virtio only > changes from v5: > rebased on top of pci/msi > fixed commit log, including comments by Bjorn > and adding explanation to address comments/questions by Eric > dropped stable Cc, this patch does not seem to qualify for stable > changes from v4: > Yijing Wang <wangyijing@huawei.com> noted that > early fixups rely on pci_msi_off. > Split out the functionality and move off the > required part to run early during pci_device_setup. > Changes from v3: > fix a copy-and-paste error in > pci: drop some duplicate code > other patches are unchanged > drop Cc stable for now > Changes from v2: > move code from probe to device enumeration > add patches to unexport pci_msi_off > > > include/linux/pci.h | 2 ++ > drivers/pci/pci-driver.c | 6 ++++-- > drivers/virtio/virtio_pci_common.c | 4 ++++ > 3 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 860c751..80f3494 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -180,6 +180,8 @@ enum pci_dev_flags { > PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6), > /* Do not use PM reset even if device advertises NoSoftRst- */ > PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7), > + /* Do not disable MSI on shutdown, disable bus master instead */ This comment doesn't really match what the patch does. The patch merely does "Do not disable MSI on shutdown." It doesn't "disable bus master instead." Bus master may be disabled elsewhere, but that is independent of the PCI_DEV_FLAGS_NO_MSI_SHUTDOWN flag. > + PCI_DEV_FLAGS_NO_MSI_SHUTDOWN = (__force pci_dev_flags_t) (1 << 8), > }; > > enum pci_irq_reroute_variant { > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 3cb2210..59d9e40 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -450,8 +450,10 @@ static void pci_device_shutdown(struct device *dev) > > if (drv && drv->shutdown) > drv->shutdown(pci_dev); > - pci_msi_shutdown(pci_dev); > - pci_msix_shutdown(pci_dev); > + if (!(pci_dev->dev_flags & PCI_DEV_FLAGS_NO_MSI_SHUTDOWN)) { > + pci_msi_shutdown(pci_dev); > + pci_msix_shutdown(pci_dev); > + } > > #ifdef CONFIG_KEXEC > /* > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c > index 78f804a..26f46c3 100644 > --- a/drivers/virtio/virtio_pci_common.c > +++ b/drivers/virtio/virtio_pci_common.c > @@ -528,6 +528,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, > if (rc) > goto err_register; > > + pci_dev->dev_flags |= PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > + > return 0; > > err_register: > @@ -546,6 +548,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) > { > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); > > + pci_dev->dev_flags &= ~PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > + > unregister_virtio_device(&vp_dev->vdev); > > if (vp_dev->ioaddr) > -- > MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas <bhelgaas@google.com> writes: > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: >> On some hypervisors, virtio devices tend to generate spurious interrupts >> when switching between MSI and non-MSI mode. Normally, either MSI or >> non-MSI is used and all is well, but during shutdown, linux disables MSI >> which then causes an "irq %d: nobody cared" message, with irq being >> subsequently disabled. >> >> Since bus mastering is already disabled at this point, disabling MSI >> isn't actually useful for spec compliant devices: MSI interrupts are >> in-bus memory writes so disabling Bus Master (which is already done) >> disables these as well: after some research, it appears to be there for >> the benefit of devices that ignore the bus master bit. >> >> As it's not clear how common this kind of bug is, this patch simply adds >> a quirk, to be set by devices that wish to skip disabling msi on >> shutdown, relying on bus master instead. >> >> We set this quirk in virtio core. >> >> Reported-by: Fam Zheng <famz@redhat.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com> >> Cc: Ulrich Obergfell <uobergfe@redhat.com> >> Cc: Rusty Russell <rusty@rustcorp.com.au> >> Cc: "Eric W. Biederman" <ebiederm@xmission.com> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > Eric, what do you think of this? You had many comments on previous > versions. I still think it is a hack to avoid actually fixing a driver. I think the hack is better as a quirk. I think the quirk would tend to be better if it was limited to whatever set of hypervisors where this is actually a problem. And of course given that on any sane configuration we have the irq watchdog I don't see what this is fixing in practice. Other than suggesting this hack become find a way to limit itself to the driver that is actually having problems I don't see a way to improve it. > Minor comment on a comment below. > >> --- >> >> >> changes from v6: >> limit changes to virtio only >> changes from v5: >> rebased on top of pci/msi >> fixed commit log, including comments by Bjorn >> and adding explanation to address comments/questions by Eric >> dropped stable Cc, this patch does not seem to qualify for stable >> changes from v4: >> Yijing Wang <wangyijing@huawei.com> noted that >> early fixups rely on pci_msi_off. >> Split out the functionality and move off the >> required part to run early during pci_device_setup. >> Changes from v3: >> fix a copy-and-paste error in >> pci: drop some duplicate code >> other patches are unchanged >> drop Cc stable for now >> Changes from v2: >> move code from probe to device enumeration >> add patches to unexport pci_msi_off >> >> >> include/linux/pci.h | 2 ++ >> drivers/pci/pci-driver.c | 6 ++++-- >> drivers/virtio/virtio_pci_common.c | 4 ++++ >> 3 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 860c751..80f3494 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -180,6 +180,8 @@ enum pci_dev_flags { >> PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6), >> /* Do not use PM reset even if device advertises NoSoftRst- */ >> PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7), >> + /* Do not disable MSI on shutdown, disable bus master instead */ > > This comment doesn't really match what the patch does. The patch merely > does "Do not disable MSI on shutdown." It doesn't "disable bus master > instead." > > Bus master may be disabled elsewhere, but that is independent of the > PCI_DEV_FLAGS_NO_MSI_SHUTDOWN flag. > >> + PCI_DEV_FLAGS_NO_MSI_SHUTDOWN = (__force pci_dev_flags_t) (1 << 8), >> }; >> >> enum pci_irq_reroute_variant { >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >> index 3cb2210..59d9e40 100644 >> --- a/drivers/pci/pci-driver.c >> +++ b/drivers/pci/pci-driver.c >> @@ -450,8 +450,10 @@ static void pci_device_shutdown(struct device *dev) >> >> if (drv && drv->shutdown) >> drv->shutdown(pci_dev); >> - pci_msi_shutdown(pci_dev); >> - pci_msix_shutdown(pci_dev); >> + if (!(pci_dev->dev_flags & PCI_DEV_FLAGS_NO_MSI_SHUTDOWN)) { >> + pci_msi_shutdown(pci_dev); >> + pci_msix_shutdown(pci_dev); >> + } >> >> #ifdef CONFIG_KEXEC >> /* >> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c >> index 78f804a..26f46c3 100644 >> --- a/drivers/virtio/virtio_pci_common.c >> +++ b/drivers/virtio/virtio_pci_common.c >> @@ -528,6 +528,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, >> if (rc) >> goto err_register; >> >> + pci_dev->dev_flags |= PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; >> + >> return 0; >> >> err_register: >> @@ -546,6 +548,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) >> { >> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); >> >> + pci_dev->dev_flags &= ~PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; >> + >> unregister_virtio_device(&vp_dev->vdev); >> >> if (vp_dev->ioaddr) >> -- >> MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 17, 2015 at 10:44:24AM -0500, Bjorn Helgaas wrote: > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > On some hypervisors, virtio devices tend to generate spurious interrupts > > when switching between MSI and non-MSI mode. Normally, either MSI or > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > which then causes an "irq %d: nobody cared" message, with irq being > > subsequently disabled. > > > > Since bus mastering is already disabled at this point, disabling MSI > > isn't actually useful for spec compliant devices: MSI interrupts are > > in-bus memory writes so disabling Bus Master (which is already done) > > disables these as well: after some research, it appears to be there for > > the benefit of devices that ignore the bus master bit. > > > > As it's not clear how common this kind of bug is, this patch simply adds > > a quirk, to be set by devices that wish to skip disabling msi on > > shutdown, relying on bus master instead. > > > > We set this quirk in virtio core. > > > > Reported-by: Fam Zheng <famz@redhat.com> > > Cc: Bjorn Helgaas <bhelgaas@google.com> > > Cc: Yinghai Lu <yhlu.kernel.send@gmail.com> > > Cc: Ulrich Obergfell <uobergfe@redhat.com> > > Cc: Rusty Russell <rusty@rustcorp.com.au> > > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > Eric, what do you think of this? You had many comments on previous > versions. > > Minor comment on a comment below. > > > --- > > > > > > changes from v6: > > limit changes to virtio only > > changes from v5: > > rebased on top of pci/msi > > fixed commit log, including comments by Bjorn > > and adding explanation to address comments/questions by Eric > > dropped stable Cc, this patch does not seem to qualify for stable > > changes from v4: > > Yijing Wang <wangyijing@huawei.com> noted that > > early fixups rely on pci_msi_off. > > Split out the functionality and move off the > > required part to run early during pci_device_setup. > > Changes from v3: > > fix a copy-and-paste error in > > pci: drop some duplicate code > > other patches are unchanged > > drop Cc stable for now > > Changes from v2: > > move code from probe to device enumeration > > add patches to unexport pci_msi_off > > > > > > include/linux/pci.h | 2 ++ > > drivers/pci/pci-driver.c | 6 ++++-- > > drivers/virtio/virtio_pci_common.c | 4 ++++ > > 3 files changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 860c751..80f3494 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -180,6 +180,8 @@ enum pci_dev_flags { > > PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6), > > /* Do not use PM reset even if device advertises NoSoftRst- */ > > PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7), > > + /* Do not disable MSI on shutdown, disable bus master instead */ > > This comment doesn't really match what the patch does. The patch merely > does "Do not disable MSI on shutdown." It doesn't "disable bus master > instead." > > Bus master may be disabled elsewhere, but that is independent of the > PCI_DEV_FLAGS_NO_MSI_SHUTDOWN flag. Yes but I wanted to point out that bus master needs to work. One of the reasons we don't do this patch for all devices is because some builtin devices might have a broken bus master. How about "Some other means (e.g. disabling bus master) suppress interrupts." > > + PCI_DEV_FLAGS_NO_MSI_SHUTDOWN = (__force pci_dev_flags_t) (1 << 8), > > }; > > > > enum pci_irq_reroute_variant { > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > > index 3cb2210..59d9e40 100644 > > --- a/drivers/pci/pci-driver.c > > +++ b/drivers/pci/pci-driver.c > > @@ -450,8 +450,10 @@ static void pci_device_shutdown(struct device *dev) > > > > if (drv && drv->shutdown) > > drv->shutdown(pci_dev); > > - pci_msi_shutdown(pci_dev); > > - pci_msix_shutdown(pci_dev); > > + if (!(pci_dev->dev_flags & PCI_DEV_FLAGS_NO_MSI_SHUTDOWN)) { > > + pci_msi_shutdown(pci_dev); > > + pci_msix_shutdown(pci_dev); > > + } > > > > #ifdef CONFIG_KEXEC > > /* > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c > > index 78f804a..26f46c3 100644 > > --- a/drivers/virtio/virtio_pci_common.c > > +++ b/drivers/virtio/virtio_pci_common.c > > @@ -528,6 +528,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, > > if (rc) > > goto err_register; > > > > + pci_dev->dev_flags |= PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > > + > > return 0; > > > > err_register: > > @@ -546,6 +548,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) > > { > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); > > > > + pci_dev->dev_flags &= ~PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > > + > > unregister_virtio_device(&vp_dev->vdev); > > > > if (vp_dev->ioaddr) > > -- > > MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 17, 2015 at 10:49:06AM -0500, Eric W. Biederman wrote: > Bjorn Helgaas <bhelgaas@google.com> writes: > > > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > >> On some hypervisors, virtio devices tend to generate spurious interrupts > >> when switching between MSI and non-MSI mode. Normally, either MSI or > >> non-MSI is used and all is well, but during shutdown, linux disables MSI > >> which then causes an "irq %d: nobody cared" message, with irq being > >> subsequently disabled. > >> > >> Since bus mastering is already disabled at this point, disabling MSI > >> isn't actually useful for spec compliant devices: MSI interrupts are > >> in-bus memory writes so disabling Bus Master (which is already done) > >> disables these as well: after some research, it appears to be there for > >> the benefit of devices that ignore the bus master bit. > >> > >> As it's not clear how common this kind of bug is, this patch simply adds > >> a quirk, to be set by devices that wish to skip disabling msi on > >> shutdown, relying on bus master instead. > >> > >> We set this quirk in virtio core. > >> > >> Reported-by: Fam Zheng <famz@redhat.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com> > >> Cc: Ulrich Obergfell <uobergfe@redhat.com> > >> Cc: Rusty Russell <rusty@rustcorp.com.au> > >> Cc: "Eric W. Biederman" <ebiederm@xmission.com> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > Eric, what do you think of this? You had many comments on previous > > versions. > > I still think it is a hack to avoid actually fixing a driver. The bug's in the hypervisor though. > I think the hack is better as a quirk. You mean put it in drivers/pci/quirks.c? > I think the quirk would tend to > be better if it was limited to whatever set of hypervisors where this is > actually a problem. I guess we might add a new flag "msi disable at shutdown is safe" in future hypervisors. Until such hypervisors ship I don't see a way to detect this. > And of course given that on any sane configuration we have the irq > watchdog I don't see what this is fixing in practice. > > Other than suggesting this hack become find a way to limit itself to the > driver that is actually having problems I don't see a way to improve it. It's already limited: virtio is the only one that sets PCI_DEV_FLAGS_NO_MSI_SHUTDOWN. Did you notice this? > > Minor comment on a comment below. > > > >> --- > >> > >> > >> changes from v6: > >> limit changes to virtio only > >> changes from v5: > >> rebased on top of pci/msi > >> fixed commit log, including comments by Bjorn > >> and adding explanation to address comments/questions by Eric > >> dropped stable Cc, this patch does not seem to qualify for stable > >> changes from v4: > >> Yijing Wang <wangyijing@huawei.com> noted that > >> early fixups rely on pci_msi_off. > >> Split out the functionality and move off the > >> required part to run early during pci_device_setup. > >> Changes from v3: > >> fix a copy-and-paste error in > >> pci: drop some duplicate code > >> other patches are unchanged > >> drop Cc stable for now > >> Changes from v2: > >> move code from probe to device enumeration > >> add patches to unexport pci_msi_off > >> > >> > >> include/linux/pci.h | 2 ++ > >> drivers/pci/pci-driver.c | 6 ++++-- > >> drivers/virtio/virtio_pci_common.c | 4 ++++ > >> 3 files changed, 10 insertions(+), 2 deletions(-) > >> > >> diff --git a/include/linux/pci.h b/include/linux/pci.h > >> index 860c751..80f3494 100644 > >> --- a/include/linux/pci.h > >> +++ b/include/linux/pci.h > >> @@ -180,6 +180,8 @@ enum pci_dev_flags { > >> PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6), > >> /* Do not use PM reset even if device advertises NoSoftRst- */ > >> PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7), > >> + /* Do not disable MSI on shutdown, disable bus master instead */ > > > > This comment doesn't really match what the patch does. The patch merely > > does "Do not disable MSI on shutdown." It doesn't "disable bus master > > instead." > > > > Bus master may be disabled elsewhere, but that is independent of the > > PCI_DEV_FLAGS_NO_MSI_SHUTDOWN flag. > > > >> + PCI_DEV_FLAGS_NO_MSI_SHUTDOWN = (__force pci_dev_flags_t) (1 << 8), > >> }; > >> > >> enum pci_irq_reroute_variant { > >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > >> index 3cb2210..59d9e40 100644 > >> --- a/drivers/pci/pci-driver.c > >> +++ b/drivers/pci/pci-driver.c > >> @@ -450,8 +450,10 @@ static void pci_device_shutdown(struct device *dev) > >> > >> if (drv && drv->shutdown) > >> drv->shutdown(pci_dev); > >> - pci_msi_shutdown(pci_dev); > >> - pci_msix_shutdown(pci_dev); > >> + if (!(pci_dev->dev_flags & PCI_DEV_FLAGS_NO_MSI_SHUTDOWN)) { > >> + pci_msi_shutdown(pci_dev); > >> + pci_msix_shutdown(pci_dev); > >> + } > >> > >> #ifdef CONFIG_KEXEC > >> /* > >> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c > >> index 78f804a..26f46c3 100644 > >> --- a/drivers/virtio/virtio_pci_common.c > >> +++ b/drivers/virtio/virtio_pci_common.c > >> @@ -528,6 +528,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, > >> if (rc) > >> goto err_register; > >> > >> + pci_dev->dev_flags |= PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > >> + > >> return 0; > >> > >> err_register: > >> @@ -546,6 +548,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) > >> { > >> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); > >> > >> + pci_dev->dev_flags &= ~PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; > >> + > >> unregister_virtio_device(&vp_dev->vdev); > >> > >> if (vp_dev->ioaddr) > >> -- > >> MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > On some hypervisors, virtio devices tend to generate spurious interrupts > when switching between MSI and non-MSI mode. Normally, either MSI or > non-MSI is used and all is well, but during shutdown, linux disables MSI > which then causes an "irq %d: nobody cared" message, with irq being > subsequently disabled. My understanding is: Linux disables MSI/MSI-X during device shutdown. If the device signals an interrupt after that, it may use INTx. This INTx interrupt is not necessarily spurious. Using INTx to signal an interrupt that occurs when MSI is disabled seems like reasonable behavior for any PCI device. And it doesn't seem related to switching between MSI and non-MSI mode. Yes, the INTx happens *after* disabling MSI, but it is not at all *because* we disabled MSI. So I wouldn't say "they generate spurious interrupts when switching between MSI and non-MSI." Why doesn't virtio-pci just register an INTx handler in addition to an MSI handler? Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 21, 2015 at 01:21:47PM -0500, Bjorn Helgaas wrote: > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > On some hypervisors, virtio devices tend to generate spurious interrupts > > when switching between MSI and non-MSI mode. Normally, either MSI or > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > which then causes an "irq %d: nobody cared" message, with irq being > > subsequently disabled. > > My understanding is: > > Linux disables MSI/MSI-X during device shutdown. If the device > signals an interrupt after that, it may use INTx. > > This INTx interrupt is not necessarily spurious. Using INTx to signal an > interrupt that occurs when MSI is disabled seems like reasonable behavior > for any PCI device. > And it doesn't seem related to switching between MSI and non-MSI mode. > Yes, the INTx happens *after* disabling MSI, but it is not at all > *because* we disabled MSI. So I wouldn't say "they generate spurious > interrupts when switching between MSI and non-MSI." > > Why doesn't virtio-pci just register an INTx handler in addition to an MSI > handler? > > Bjorn The handler causes an expensive exit to the hypervisor, and the INTx lines are shared with other devices. Seems silly to slow them down just so we can do something that triggers the device bug. The bus master is disabled by that time, if linux can just desist from touching MSI enable device won't send either INTx (because MSI is on) or MSI (because bus master is on) and all will be well.
On Mon, Sep 21, 2015 at 10:42:13PM +0300, Michael S. Tsirkin wrote: > On Mon, Sep 21, 2015 at 01:21:47PM -0500, Bjorn Helgaas wrote: > > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > > On some hypervisors, virtio devices tend to generate spurious interrupts > > > when switching between MSI and non-MSI mode. Normally, either MSI or > > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > > which then causes an "irq %d: nobody cared" message, with irq being > > > subsequently disabled. > > > > My understanding is: > > > > Linux disables MSI/MSI-X during device shutdown. If the device > > signals an interrupt after that, it may use INTx. > > > > This INTx interrupt is not necessarily spurious. Using INTx to signal an > > interrupt that occurs when MSI is disabled seems like reasonable behavior > > for any PCI device. > > And it doesn't seem related to switching between MSI and non-MSI mode. > > Yes, the INTx happens *after* disabling MSI, but it is not at all > > *because* we disabled MSI. So I wouldn't say "they generate spurious > > interrupts when switching between MSI and non-MSI." > > > > Why doesn't virtio-pci just register an INTx handler in addition to an MSI > > handler? > > The handler causes an expensive exit to the hypervisor, > and the INTx lines are shared with other devices. Do we care? Is this a performance path? I thought we were in a kexec shutdown path. > Seems silly to slow them down just so we can do something > that triggers the device bug. The bus master is disabled by that time, > if linux can just desist from touching MSI enable device won't > send either INTx (because MSI is on) or MSI > (because bus master is on) and all will be well. It would also be silly to put special-purpose code in the PCI core if there's a reasonable way to handle this in a driver. Can you describe exactly what the device bug is? Apparently you're saying that if we shut down MSI, it triggers the bug? And I guess you're talking about a virtio device as implemented in qemu or other hypervisors? If we leave MSI enabled (as your patch does), then the device has MSI enabled and Bus Master disabled. I can see these possibilities: 1) the device never recognizes an interrupt condition 2) the device sets the pending bit but doesn't issue the MSI write, so the OS doesn't see the interrupt unless it polls for it 3) the device signals MSI and we still have an MSI handler registered, so we silently handle it 4) the device signals INTx You seem to suggest that if we leave MSI enabled (as your patch does), we're in case 1. But I doubt that disabling MSI causes the device to interrupt. Case 2 seems more likely to me: the device recognized an interrupt condition, e.g., an event occurred, and the OS simply doesn't see the interrupt because the device can't issue the MSI message. Case 3 does seem like it would be a device bug, because the device shouldn't do an MSI write when Bus Master is disabled. I don't see this case mentioned explicitly in the PCI spec, but PCIe r3.0 spec sec 7.5.1.1 does make it clear that disabling Bus Master also disables MSI messages. I don't know whether case 4 would be legal or not. But apparently it doesn't happen with the virtio device anyway, so it's not really a concern here. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 21, 2015 at 05:10:43PM -0500, Bjorn Helgaas wrote: > On Mon, Sep 21, 2015 at 10:42:13PM +0300, Michael S. Tsirkin wrote: > > On Mon, Sep 21, 2015 at 01:21:47PM -0500, Bjorn Helgaas wrote: > > > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > > > On some hypervisors, virtio devices tend to generate spurious interrupts > > > > when switching between MSI and non-MSI mode. Normally, either MSI or > > > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > > > which then causes an "irq %d: nobody cared" message, with irq being > > > > subsequently disabled. > > > > > > My understanding is: > > > > > > Linux disables MSI/MSI-X during device shutdown. If the device > > > signals an interrupt after that, it may use INTx. > > > > > > This INTx interrupt is not necessarily spurious. Using INTx to signal an > > > interrupt that occurs when MSI is disabled seems like reasonable behavior > > > for any PCI device. > > > And it doesn't seem related to switching between MSI and non-MSI mode. > > > Yes, the INTx happens *after* disabling MSI, but it is not at all > > > *because* we disabled MSI. So I wouldn't say "they generate spurious > > > interrupts when switching between MSI and non-MSI." > > > > > > Why doesn't virtio-pci just register an INTx handler in addition to an MSI > > > handler? > > > > The handler causes an expensive exit to the hypervisor, > > and the INTx lines are shared with other devices. > > Do we care? Is this a performance path? I thought we were in a kexec > shutdown path. Yes but the handler would always have to be registered, right? > > Seems silly to slow them down just so we can do something > > that triggers the device bug. The bus master is disabled by that time, > > if linux can just desist from touching MSI enable device won't > > send either INTx (because MSI is on) or MSI > > (because bus master is on) and all will be well. > > It would also be silly to put special-purpose code in the PCI core > if there's a reasonable way to handle this in a driver. > > Can you describe exactly what the device bug is? Apparently you're > saying that if we shut down MSI, it triggers the bug? And I guess > you're talking about a virtio device as implemented in qemu or other > hypervisors? Yes. Basically depending on an internal device state, disabling MSI sometimes wedges it. The most easy to debug effect is if it starts sending INTx interrupts, for which there's no handler currently. Full system reset always gets us out of the bad state. > If we leave MSI enabled (as your patch does), then the device has MSI > enabled and Bus Master disabled. I can see these possibilities: > > 1) the device never recognizes an interrupt condition > 2) the device sets the pending bit but doesn't issue the MSI write, > so the OS doesn't see the interrupt unless it polls for it > 3) the device signals MSI and we still have an MSI handler > registered, so we silently handle it > 4) the device signals INTx > You seem to suggest that if we leave MSI enabled (as your patch does), > we're in case 1. But I doubt that disabling MSI causes the device to > interrupt. > > Case 2 seems more likely to me: the device recognized an interrupt > condition, e.g., an event occurred, and the OS simply doesn't see the > interrupt because the device can't issue the MSI message. > > Case 3 does seem like it would be a device bug, because the device > shouldn't do an MSI write when Bus Master is disabled. I don't see > this case mentioned explicitly in the PCI spec, but PCIe r3.0 spec sec > 7.5.1.1 does make it clear that disabling Bus Master also disables MSI > messages. > > I don't know whether case 4 would be legal or not. But apparently it > doesn't happen with the virtio device anyway, so it's not really a > concern here. > > Bjorn It's case 2. Except it doesn't actually set the pending bit, because PCI spec has language like "While a vector is masked, the function is prohibited from sending the associated message, and the function must set the associated Pending bit" but doesn't actually require to set pending if bus master enable prevents sending MSI.
On Tue, Sep 22, 2015 at 02:29:03PM +0300, Michael S. Tsirkin wrote: > On Mon, Sep 21, 2015 at 05:10:43PM -0500, Bjorn Helgaas wrote: > > On Mon, Sep 21, 2015 at 10:42:13PM +0300, Michael S. Tsirkin wrote: > > > On Mon, Sep 21, 2015 at 01:21:47PM -0500, Bjorn Helgaas wrote: > > > > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > > > > On some hypervisors, virtio devices tend to generate spurious interrupts > > > > > when switching between MSI and non-MSI mode. Normally, either MSI or > > > > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > > > > which then causes an "irq %d: nobody cared" message, with irq being > > > > > subsequently disabled. > > > > > > > > My understanding is: > > > > > > > > Linux disables MSI/MSI-X during device shutdown. If the device > > > > signals an interrupt after that, it may use INTx. > > > > > > > > This INTx interrupt is not necessarily spurious. Using INTx to signal an > > > > interrupt that occurs when MSI is disabled seems like reasonable behavior > > > > for any PCI device. > > > > And it doesn't seem related to switching between MSI and non-MSI mode. > > > > Yes, the INTx happens *after* disabling MSI, but it is not at all > > > > *because* we disabled MSI. So I wouldn't say "they generate spurious > > > > interrupts when switching between MSI and non-MSI." > > > > > > > > Why doesn't virtio-pci just register an INTx handler in addition to an MSI > > > > handler? > > > > > > The handler causes an expensive exit to the hypervisor, > > > and the INTx lines are shared with other devices. > > > > Do we care? Is this a performance path? I thought we were in a kexec > > shutdown path. > > Yes but the handler would always have to be registered, right? The pci_device_shutdown() path you're modifying calls drv->shutdown() immediately before disabling MSI, so I suppose you could register a handler in a virtio shutdown method. > > > Seems silly to slow them down just so we can do something > > > that triggers the device bug. The bus master is disabled by that time, > > > if linux can just desist from touching MSI enable device won't > > > send either INTx (because MSI is on) or MSI > > > (because bus master is on) and all will be well. > > > > It would also be silly to put special-purpose code in the PCI core > > if there's a reasonable way to handle this in a driver. > > > > Can you describe exactly what the device bug is? Apparently you're > > saying that if we shut down MSI, it triggers the bug? And I guess > > you're talking about a virtio device as implemented in qemu or other > > hypervisors? > > Yes. Basically depending on an internal device state, disabling MSI > sometimes wedges it. The most easy to debug effect is if it starts > sending INTx interrupts, for which there's no handler currently. > Full system reset always gets us out of the bad state. If disabling MSI causes the device to use INTx interrupts, that sounds perfectly normal to me. If disabling MSI causes the device to hang, *that* sounds like a bug. Since this is virtio, we should be able to figure out exactly where that happens. Do you have a pointer to a virtio bug report, or even a QEMU commit that fixes this virtio bug? I understand that even if there is a virtio fix in QEMU, we want a solution that works even with an old QEMU that doesn't contain the fix. But a pointer to a QEMU fix would really help understand and document the Linux fix. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 22, 2015 at 07:36:40AM -0500, Bjorn Helgaas wrote: > On Tue, Sep 22, 2015 at 02:29:03PM +0300, Michael S. Tsirkin wrote: > > On Mon, Sep 21, 2015 at 05:10:43PM -0500, Bjorn Helgaas wrote: > > > On Mon, Sep 21, 2015 at 10:42:13PM +0300, Michael S. Tsirkin wrote: > > > > On Mon, Sep 21, 2015 at 01:21:47PM -0500, Bjorn Helgaas wrote: > > > > > On Sun, Sep 06, 2015 at 06:32:35PM +0300, Michael S. Tsirkin wrote: > > > > > > On some hypervisors, virtio devices tend to generate spurious interrupts > > > > > > when switching between MSI and non-MSI mode. Normally, either MSI or > > > > > > non-MSI is used and all is well, but during shutdown, linux disables MSI > > > > > > which then causes an "irq %d: nobody cared" message, with irq being > > > > > > subsequently disabled. > > > > > > > > > > My understanding is: > > > > > > > > > > Linux disables MSI/MSI-X during device shutdown. If the device > > > > > signals an interrupt after that, it may use INTx. > > > > > > > > > > This INTx interrupt is not necessarily spurious. Using INTx to signal an > > > > > interrupt that occurs when MSI is disabled seems like reasonable behavior > > > > > for any PCI device. > > > > > And it doesn't seem related to switching between MSI and non-MSI mode. > > > > > Yes, the INTx happens *after* disabling MSI, but it is not at all > > > > > *because* we disabled MSI. So I wouldn't say "they generate spurious > > > > > interrupts when switching between MSI and non-MSI." > > > > > > > > > > Why doesn't virtio-pci just register an INTx handler in addition to an MSI > > > > > handler? > > > > > > > > The handler causes an expensive exit to the hypervisor, > > > > and the INTx lines are shared with other devices. > > > > > > Do we care? Is this a performance path? I thought we were in a kexec > > > shutdown path. > > > > Yes but the handler would always have to be registered, right? > > The pci_device_shutdown() path you're modifying calls drv->shutdown() > immediately before disabling MSI, so I suppose you could register a > handler in a virtio shutdown method. I guess we could. Not sure what we'd do e.g. if that fails. > > > > Seems silly to slow them down just so we can do something > > > > that triggers the device bug. The bus master is disabled by that time, > > > > if linux can just desist from touching MSI enable device won't > > > > send either INTx (because MSI is on) or MSI > > > > (because bus master is on) and all will be well. > > > > > > It would also be silly to put special-purpose code in the PCI core > > > if there's a reasonable way to handle this in a driver. > > > > > > Can you describe exactly what the device bug is? Apparently you're > > > saying that if we shut down MSI, it triggers the bug? And I guess > > > you're talking about a virtio device as implemented in qemu or other > > > hypervisors? > > > > Yes. Basically depending on an internal device state, disabling MSI > > sometimes wedges it. The most easy to debug effect is if it starts > > sending INTx interrupts, for which there's no handler currently. > > Full system reset always gets us out of the bad state. > > If disabling MSI causes the device to use INTx interrupts, that sounds > perfectly normal to me. > > If disabling MSI causes the device to hang, *that* sounds like a bug. > Since this is virtio, we should be able to figure out exactly where > that happens. Do you have a pointer to a virtio bug report, or even a > QEMU commit that fixes this virtio bug? > > I understand that even if there is a virtio fix in QEMU, we want a > solution that works even with an old QEMU that doesn't contain the > fix. But a pointer to a QEMU fix would really help understand and > document the Linux fix. > > Bjorn I'm not sure we ever understood it completely. I think some of it has to do with the way the whole virtio 0 device register layout changes when you enable/disable MSI. So should be ok when using the modern virtio 1 model since we fixed this thing. I was hoping that since disabling MSI in pci core is only useful as a work-around (for devices with a broken bus master enable - even though I don't think we know what these are exactly), a flag for not disabling it won't be held to such a high standard.
On Tue, Sep 22, 2015 at 05:07:19PM +0300, Michael S. Tsirkin wrote: > On Tue, Sep 22, 2015 at 07:36:40AM -0500, Bjorn Helgaas wrote: > > On Tue, Sep 22, 2015 at 02:29:03PM +0300, Michael S. Tsirkin wrote: > > > On Mon, Sep 21, 2015 at 05:10:43PM -0500, Bjorn Helgaas wrote: > > > > Can you describe exactly what the device bug is? Apparently you're > > > > saying that if we shut down MSI, it triggers the bug? And I guess > > > > you're talking about a virtio device as implemented in qemu or other > > > > hypervisors? > > > > > > Yes. Basically depending on an internal device state, disabling MSI > > > sometimes wedges it. The most easy to debug effect is if it starts > > > sending INTx interrupts, for which there's no handler currently. > > > Full system reset always gets us out of the bad state. > > > > If disabling MSI causes the device to use INTx interrupts, that sounds > > perfectly normal to me. > > > > If disabling MSI causes the device to hang, *that* sounds like a bug. > > Since this is virtio, we should be able to figure out exactly where > > that happens. Do you have a pointer to a virtio bug report, or even a > > QEMU commit that fixes this virtio bug? > > > > I understand that even if there is a virtio fix in QEMU, we want a > > solution that works even with an old QEMU that doesn't contain the > > fix. But a pointer to a QEMU fix would really help understand and > > document the Linux fix. > > I'm not sure we ever understood it completely. > > I think some of it has to do with the way the whole virtio 0 > device register layout changes when you enable/disable MSI. So should > be ok when using the modern virtio 1 model since we fixed this thing. If you know that: - pci_device_shutdown() disables MSI, - disabling MSI changes the virtio register layout, and - the driver may touch the device after pci_device_shutdown(), it seems like you might want a virtio shutdown method so you can find out when the register layout changes. Changing the register layout sounds completely broken, and dealing with it sounds racy, but it sounds like a mess that should be handled by the driver. > I was hoping that since disabling MSI in pci core is only useful as a > work-around (for devices with a broken bus master enable - even though I > don't think we know what these are exactly), a flag for not disabling it > won't be held to such a high standard. Well, I suppose you see the problem with adding workarounds on top of workarounds, especially when we don't have a clear understanding of why we need them. The "disable MSI on shutdown" code was added here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d52877c7b1af and there is no information in that patch about it being a workaround for devices with broken bus master enable. The cumulative effect of stuff like this is that it becomes impossible to do any meaningful restructuring in the core without breaking some corner case. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/pci.h b/include/linux/pci.h index 860c751..80f3494 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -180,6 +180,8 @@ enum pci_dev_flags { PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6), /* Do not use PM reset even if device advertises NoSoftRst- */ PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7), + /* Do not disable MSI on shutdown, disable bus master instead */ + PCI_DEV_FLAGS_NO_MSI_SHUTDOWN = (__force pci_dev_flags_t) (1 << 8), }; enum pci_irq_reroute_variant { diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 3cb2210..59d9e40 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -450,8 +450,10 @@ static void pci_device_shutdown(struct device *dev) if (drv && drv->shutdown) drv->shutdown(pci_dev); - pci_msi_shutdown(pci_dev); - pci_msix_shutdown(pci_dev); + if (!(pci_dev->dev_flags & PCI_DEV_FLAGS_NO_MSI_SHUTDOWN)) { + pci_msi_shutdown(pci_dev); + pci_msix_shutdown(pci_dev); + } #ifdef CONFIG_KEXEC /* diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 78f804a..26f46c3 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -528,6 +528,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, if (rc) goto err_register; + pci_dev->dev_flags |= PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; + return 0; err_register: @@ -546,6 +548,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) { struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); + pci_dev->dev_flags &= ~PCI_DEV_FLAGS_NO_MSI_SHUTDOWN; + unregister_virtio_device(&vp_dev->vdev); if (vp_dev->ioaddr)
On some hypervisors, virtio devices tend to generate spurious interrupts when switching between MSI and non-MSI mode. Normally, either MSI or non-MSI is used and all is well, but during shutdown, linux disables MSI which then causes an "irq %d: nobody cared" message, with irq being subsequently disabled. Since bus mastering is already disabled at this point, disabling MSI isn't actually useful for spec compliant devices: MSI interrupts are in-bus memory writes so disabling Bus Master (which is already done) disables these as well: after some research, it appears to be there for the benefit of devices that ignore the bus master bit. As it's not clear how common this kind of bug is, this patch simply adds a quirk, to be set by devices that wish to skip disabling msi on shutdown, relying on bus master instead. We set this quirk in virtio core. Reported-by: Fam Zheng <famz@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yhlu.kernel.send@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- changes from v6: limit changes to virtio only changes from v5: rebased on top of pci/msi fixed commit log, including comments by Bjorn and adding explanation to address comments/questions by Eric dropped stable Cc, this patch does not seem to qualify for stable changes from v4: Yijing Wang <wangyijing@huawei.com> noted that early fixups rely on pci_msi_off. Split out the functionality and move off the required part to run early during pci_device_setup. Changes from v3: fix a copy-and-paste error in pci: drop some duplicate code other patches are unchanged drop Cc stable for now Changes from v2: move code from probe to device enumeration add patches to unexport pci_msi_off include/linux/pci.h | 2 ++ drivers/pci/pci-driver.c | 6 ++++-- drivers/virtio/virtio_pci_common.c | 4 ++++ 3 files changed, 10 insertions(+), 2 deletions(-)