Message ID | 1431431730-25164-2-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
"Michael S. Tsirkin" <mst@redhat.com> writes: > d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown > v2") disabled MSI/MSI-X at device shutdown to address a kexec problem. > > The change made by the above commit is no longer necessary: it was > superceded by commit d5dea7d95c48 ("PCI: msi: Disable msi interrupts > when we initialize a pci device") which makes sure the original kexec > problem is solved in the new kernel, and commit b566a22c23 ("PCI: > disable Bus Master on PCI device shutdown") which makes sure device does > not send MSI interrupts (MSI is a memory write and so is suppressed when > bus master is cleared). > > On the contrary, disabling MSI makes it *more* likely that the device > will cause mischief since unlike MSI, INT#x interrupts are not > suppressed by clearing bus mastering. > > One example of such mischief is that after we disable MSI, the device > may assert INT#x (remember, cleaning bus mastering does not disable > INT#x interrupts), and if the driver hasn't registered an interrupt > handler for it, the interrupt is never deasserted which causes an "irq > %d: nobody cared" message, with irq being subsequently disabled. This is > actually not hard to observe with virtio devices. Not a huge problem, > but ugly, and might in theory cause other problems, e.g. if the irq > being disabled is shared with another device that attempts to use it in > its shutdown callback, or if irq debugging was explicitly disabled on > the kernel command line. And disabling irq debugging is stupid, we should probably delete that option. We poll disabled irqs periodically to keep the disabled ones from completely killing your system even if they are shared. > Another theoretical problem is that if the driver does not flush MSI > interrupts in the shutdown callback, MSI interrupt handler will run at > the same time as the INT#x handler, which doesn't normally happen > outside the shutdown path; Depending on the driver, the two handlers > might conflict. I did not go looking for such driver bugs but this seems > plausible. Again a buggy driver issue. > virtio specifically has another issue: register functions change between > msi enabled and disabled modes, so disabling MSI while driver is doing > things (e.g. from a kthread) will make the device confused. Huh??? Virtio is virtual hardware. We are talking about real hardware. How do the two mix? Is there a pass through going on and we let the virtual machine get away with putting the hardware in a nasty state, and don't clean it up in the real driver? This sounds like a case for fixing whatever insanity is happening in the virtio driver. This also sounds like a case for implementing a shutdown callback and disabling things properly. A properly shutdown driver should have already disabled MSI's. A driver is responsible for enabling MSIs so it should be responsible for disabling it. The core disabling MSIs is mostly to catch the handful of lazy drivers that forget. The bottom line is that there are a few things that are standard behavior that we can do in the generic code, but at the end of the day it is the responsibility of the driver to shut things down and whatever driver you are dealing with clearly has a bunch of bugs and you aren't fixing it. > Of course, some of the above specific issues can be solved by > implementing a shutdown callback in the driver: this callback would have > to suppress further activity from both the driver and the device, and > flush outstanding handlers. This is a non-trivial amount of code that > can trigger at any time, so needs careful thought to avoid race > conditions causing bugs, and that's only running on shutdown, so isn't > well tested. Agreed. However the code should already exist in your drivers remove method. So with a little careful factoring you should be able to use a well tested code path. If you aren't happy with the separation between remove and shutdown please take it up with whoever maintains the driver API these days. I lost that fight the first time and I haven't had the energy to take it any time since. Also we aren't talking about kexec-on-panic here. So no, it can not trigger at any time. We are talking about kexec in as an orderly transition to another kernel. So it should be possible to perform an orderly shutdown. If you can't figure out in the driver how to create an orderly shutdown I know we can't figure out how to do it in generic code outside of the driver. Especially not for every device. > Instead, stop disabling MSIs in PCI core at shutdown: it's simple, safe, > removes code instead of adding more code, and needs no driver support. I can maybe almost see an argument for disabling intx as well. I really can not buy the argument you are sending. AKA "One day I met a bugggy driver. The driver was so buggy that my system did not work properly. Instead of fixing the driver let's do crazy things, to maybe avoid some of the bugs in the driver" The thing is this is the same path that we use to transition to firmware and the goal is to put the hardware in something closely resembly the state it was in when the firmware gave us the hardware. A state where linux and potentially other OS's can initialize the hardware and make it work. The whole bus-master disable that happens on the kexec path is just a best attempt to make things happen and it works in enough cases it is worth doing. Especially when the normal state of the hardware from the boot firmware is with bus master disabled, and a good shutdown routine will clear bus-master enable anyway. Interrupts and intx are in a very different situation. Typically interrupts and intx are enabled when they come from the boot firmware. Interrupts should not fire unless there is device activity. Most devices are just passive and don't have on-going activity so a shutdown method is not particularly needed. But clearly you have a device that if you don't tell it to do something goes around sending interrupts anyway. So that driver needs to be told to shut-up and stop. As far as leaving interrupts running and causing problems switch from msi to intx is not particularly significant. There is a window when interrupts can fire and cause havoc. By leaving msi enabled I don't see the problem going away, just the kind of havoc changing. I would be a lot more inclined to figure out how to get the code that calls shutdown methods to call a drivers remove method instead. That would simplify things. Unfortunately the architects of the driver callbacks wanted something complex that would not get well tested so we have a bit of a mess. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 12, 2015 at 02:22:05PM -0500, Eric W. Biederman wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown > > v2") disabled MSI/MSI-X at device shutdown to address a kexec problem. > > > > The change made by the above commit is no longer necessary: it was > > superceded by commit d5dea7d95c48 ("PCI: msi: Disable msi interrupts > > when we initialize a pci device") which makes sure the original kexec > > problem is solved in the new kernel, and commit b566a22c23 ("PCI: > > disable Bus Master on PCI device shutdown") which makes sure device does > > not send MSI interrupts (MSI is a memory write and so is suppressed when > > bus master is cleared). > > > > On the contrary, disabling MSI makes it *more* likely that the device > > will cause mischief since unlike MSI, INT#x interrupts are not > > suppressed by clearing bus mastering. > > > > One example of such mischief is that after we disable MSI, the device > > may assert INT#x (remember, cleaning bus mastering does not disable > > INT#x interrupts), and if the driver hasn't registered an interrupt > > handler for it, the interrupt is never deasserted which causes an "irq > > %d: nobody cared" message, with irq being subsequently disabled. This is > > actually not hard to observe with virtio devices. Not a huge problem, > > but ugly, and might in theory cause other problems, e.g. if the irq > > being disabled is shared with another device that attempts to use it in > > its shutdown callback, or if irq debugging was explicitly disabled on > > the kernel command line. > > And disabling irq debugging is stupid, we should probably delete that > option. We poll disabled irqs periodically to keep the disabled ones > from completely killing your system even if they are shared. > > > Another theoretical problem is that if the driver does not flush MSI > > interrupts in the shutdown callback, MSI interrupt handler will run at > > the same time as the INT#x handler, which doesn't normally happen > > outside the shutdown path; Depending on the driver, the two handlers > > might conflict. I did not go looking for such driver bugs but this seems > > plausible. > > Again a buggy driver issue. > > > virtio specifically has another issue: register functions change between > > msi enabled and disabled modes, so disabling MSI while driver is doing > > things (e.g. from a kthread) will make the device confused. > > Huh??? Virtio is virtual hardware. We are talking about real hardware. > How do the two mix? Is there a pass through going on and we let the > virtual machine get away with putting the hardware in a nasty state, > and don't clean it up in the real driver? Yes pass through can mix real hardware and virtio. We are talking about shutdown within the VM so the hypervisor does not get a chance to run and clean up anything. > This sounds like a case for fixing whatever insanity is happening in the > virtio driver. > > This also sounds like a case for implementing a shutdown callback and > disabling things properly. A properly shutdown driver should have > already disabled MSI's. A driver is responsible for enabling MSIs so it > should be responsible for disabling it. The core disabling MSIs is > mostly to catch the handful of lazy drivers that forget. Okay! And I am saying that if the driver did forget, we are better off not disabling it - leave it enabled until kexec starts and disables it. > The bottom line is that there are a few things that are standard > behavior that we can do in the generic code, but at the end of the day > it is the responsibility of the driver to shut things down and whatever > driver you are dealing with clearly has a bunch of bugs and you aren't > fixing it. So please let us get on with fixing it in driver and stop playing with device in core. > > Of course, some of the above specific issues can be solved by > > implementing a shutdown callback in the driver: this callback would have > > to suppress further activity from both the driver and the device, and > > flush outstanding handlers. This is a non-trivial amount of code that > > can trigger at any time, so needs careful thought to avoid race > > conditions causing bugs, and that's only running on shutdown, so isn't > > well tested. > > Agreed. However the code should already exist in your drivers remove > method. So with a little careful factoring you should be able to > use a well tested code path. > > If you aren't happy with the separation between remove and shutdown > please take it up with whoever maintains the driver API these days. > I lost that fight the first time and I haven't had the energy to take it > any time since. > > Also we aren't talking about kexec-on-panic here. So no, it can not > trigger at any time. We are talking about kexec in as an orderly > transition to another kernel. So it should be possible to perform an > orderly shutdown. Why aren't we talking about kexec-on-panic? That one doesn't trigger shutdown callbacks? > If you can't figure out in the driver how to create an orderly shutdown > I know we can't figure out how to do it in generic code outside of the > driver. Especially not for every device. Right! So please let's stop pretending we do, touch the device as little as possible. This is exactly what my patches do: don't touch msi since we don't have to. > > Instead, stop disabling MSIs in PCI core at shutdown: it's simple, safe, > > removes code instead of adding more code, and needs no driver support. > > I can maybe almost see an argument for disabling intx as well. > > I really can not buy the argument you are sending. AKA "One day I met a > bugggy driver. The driver was so buggy that my system did not work > properly. Instead of fixing the driver let's do crazy things, to maybe > avoid some of the bugs in the driver" I am saying it is driver's responsibility to disable msi on shutdown. If it doesn't then it knows best. Don't second-guess it. > The thing is this is the same path that we use to transition to firmware > and the goal is to put the hardware in something closely resembly the > state it was in when the firmware gave us the hardware. A state where > linux and potentially other OS's can initialize the hardware and make it > work. > > The whole bus-master disable that happens on the kexec path is just > a best attempt to make things happen and it works in enough cases > it is worth doing. Especially when the normal state of the hardware > from the boot firmware is with bus master disabled, and a good shutdown > routine will clear bus-master enable anyway. > > Interrupts and intx are in a very different situation. Typically > interrupts and intx are enabled when they come from the boot firmware. > Interrupts should not fire unless there is device activity. > > Most devices are just passive and don't have on-going activity so a > shutdown method is not particularly needed. But clearly you have a > device that if you don't tell it to do something goes around sending > interrupts anyway. So that driver needs to be told to shut-up and stop. So the same argument should have been applied instead of d52877c7b1af. Apparently for fusion io if you just disable msi and don't do anything else, things get rosy. But it's a device specific thing. All I am saying doing this in pci core is wrong, pci core has no idea how to properly tell the device to shut-up and stop, it is driver's responsibility. > As far as leaving interrupts running and causing problems switch from > msi to intx is not particularly significant. There is a window when > interrupts can fire and cause havoc. By leaving msi enabled I don't see > the problem going away, just the kind of havoc changing. How can havoc happen? We disable bus master, no msi will be sent. > I would be a lot more inclined to figure out how to get the code that > calls shutdown methods to call a drivers remove method instead. That > would simplify things. Unfortunately the architects of the driver > callbacks wanted something complex that would not get well tested so we > have a bit of a mess. > > Eric Really my patch is all about doing the minimal thing in core since we don't really know how to cleanup the device.
On Wed, May 13, 2015 at 08:41:55AM +0200, Michael S. Tsirkin wrote: > > This also sounds like a case for implementing a shutdown callback and > > disabling things properly. A properly shutdown driver should have > > already disabled MSI's. A driver is responsible for enabling MSIs so it > > should be responsible for disabling it. The core disabling MSIs is > > mostly to catch the handful of lazy drivers that forget. > > > Okay! And I am saying that if the driver did forget, > we are better off not disabling it - leave it enabled > until kexec starts and disables it. > > > > The bottom line is that there are a few things that are standard > > behavior that we can do in the generic code, but at the end of the day > > it is the responsibility of the driver to shut things down and whatever > > driver you are dealing with clearly has a bunch of bugs and you aren't > > fixing it. > > So please let us get on with fixing it in driver and stop > playing with device in core. Eric, does this argument make sense? Drivers should do the right thing in their shutdown callback, let's not try to work around their bugs in core.
On May 14, 2015 1:06:00 AM CDT, "Michael S. Tsirkin" <mst@redhat.com> wrote: >On Wed, May 13, 2015 at 08:41:55AM +0200, Michael S. Tsirkin wrote: >> > This also sounds like a case for implementing a shutdown callback >and >> > disabling things properly. A properly shutdown driver should have >> > already disabled MSI's. A driver is responsible for enabling MSIs >so it >> > should be responsible for disabling it. The core disabling MSIs is >> > mostly to catch the handful of lazy drivers that forget. >> >> >> Okay! And I am saying that if the driver did forget, >> we are better off not disabling it - leave it enabled >> until kexec starts and disables it. >> >> >> > The bottom line is that there are a few things that are standard >> > behavior that we can do in the generic code, but at the end of the >day >> > it is the responsibility of the driver to shut things down and >whatever >> > driver you are dealing with clearly has a bunch of bugs and you >aren't >> > fixing it. >> >> So please let us get on with fixing it in driver and stop >> playing with device in core. > >Eric, does this argument make sense? Drivers should do the right thing >in their shutdown callback, let's not try to work around their bugs in >core. Not in context of this patch, as this change appears to be to avoid fixing the driver. Further this behavior in the core has existed for the better part of a decade. Who knows what weird behavior (called regressions) this will trigger with other drivers in other situations. I do not see any reason to change the existing behavior here. Especially as if you try and boot a non-linux kernel with kexec you are almost certainly going to subject it to a screaming MSI interrupt and there almost certainly will not be code to disable MSIs as they are disabled by at boot up by default. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 14, 2015 at 02:58:59AM -0500, Eric W. Biederman wrote: > > > On May 14, 2015 1:06:00 AM CDT, "Michael S. Tsirkin" <mst@redhat.com> wrote: > >On Wed, May 13, 2015 at 08:41:55AM +0200, Michael S. Tsirkin wrote: > >> > This also sounds like a case for implementing a shutdown callback > >and > >> > disabling things properly. A properly shutdown driver should have > >> > already disabled MSI's. A driver is responsible for enabling MSIs > >so it > >> > should be responsible for disabling it. The core disabling MSIs is > >> > mostly to catch the handful of lazy drivers that forget. > >> > >> > >> Okay! And I am saying that if the driver did forget, > >> we are better off not disabling it - leave it enabled > >> until kexec starts and disables it. > >> > >> > >> > The bottom line is that there are a few things that are standard > >> > behavior that we can do in the generic code, but at the end of the > >day > >> > it is the responsibility of the driver to shut things down and > >whatever > >> > driver you are dealing with clearly has a bunch of bugs and you > >aren't > >> > fixing it. > >> > >> So please let us get on with fixing it in driver and stop > >> playing with device in core. > > > >Eric, does this argument make sense? Drivers should do the right thing > >in their shutdown callback, let's not try to work around their bugs in > >core. > > Not in context of this patch, as this change appears to > be to avoid fixing the driver. Not really. I do intend to work on adding shutdown to virtio: e.g. we really must change avoid running config change interrupts. but I would like the core to do the right thing, so I argue about what's best for general core to do, and what has the least chance to cause hangs. > Further this behavior in the core has existed for the > better part of a decade. > Who knows what weird > behavior (called regressions) this will trigger with > other drivers in other situations. That is also part of the argument. It used to be so, but now it really can not trigger any regressions because we have since added a bigger hammer: disabling bus mastering. Do you disagree with this? How can leaving msi on cause harm? > I do not see any reason to change the existing behavior here. > > Especially as if you try and boot a non-linux kernel with > kexec First time I hear about this requirement. Does anyone do this? I find it hard to believe it works ... > you are almost certainly going to subject it to a > screaming MSI interrupt and there almost certainly will > not be code to disable MSIs as they are disabled by at > boot up by default. > > Eric OTOH if you do disable MSI but leave device functioning you will just get screaming INT#x which is even worse because it will end up disabling INT#x which is shared, and so breaking multiple devices and not just this one.
On Tue, May 12, 2015 at 03:03:32PM +0200, Michael S. Tsirkin wrote: > d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown > v2") disabled MSI/MSI-X at device shutdown to address a kexec problem. > ... I know you're trying to put all the justification in the changelog, and that's great if it can be done. But would you please just add the single link here to https://bugzilla.kernel.org/show_bug.cgi?id=96571 ? And please attach the dmesg log and instructions for reproducing the problem to the bugzilla. I've asked for this before, and it seems like a simple request, but maybe there's a reason it's more complicated than it seems to me. It's obvious to you how all this fits together, but I'd like it to be more concrete to the rest of us, too. The bugzilla says Ulrich Obergfell noticed this in RHEL7. If there's a RHEL bugzilla, it'd be nice to have a link to it in the kernel.org bugzilla. Informal hints are great right now, but they'll be useless after six months. Bjorn > Reported-by: Fam Zheng <famz@redhat.com> > Tested-by: Fam Zheng <famz@redhat.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > CC: Yinghai Lu <yhlu.kernel.send@gmail.com> > CC: Ulrich Obergfell <uobergfe@redhat.com> > CC: Rusty Russell <rusty@rustcorp.com.au> > --- > drivers/pci/pci-driver.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 3cb2210..38a602c 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev) > > if (drv && drv->shutdown) > drv->shutdown(pci_dev); > - pci_msi_shutdown(pci_dev); > - pci_msix_shutdown(pci_dev); > > #ifdef CONFIG_KEXEC > /* > -- > MST > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 05/19 09:58, Bjorn Helgaas wrote: > On Tue, May 12, 2015 at 03:03:32PM +0200, Michael S. Tsirkin wrote: > > d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown > > v2") disabled MSI/MSI-X at device shutdown to address a kexec problem. > > ... Hi Bjorn, > > I know you're trying to put all the justification in the changelog, and > that's great if it can be done. But would you please just add the single > link here to https://bugzilla.kernel.org/show_bug.cgi?id=96571 ? > > And please attach the dmesg log and instructions for reproducing the > problem to the bugzilla. I've asked for this before, and it seems like a > simple request, but maybe there's a reason it's more complicated than it > seems to me. It's obvious to you how all this fits together, but I'd like > it to be more concrete to the rest of us, too. I've updated what I could see with current master build as much as I can, the softlockup is not observed in my tests, but the dmesg log already shows the spurious IRQ. > > The bugzilla says Ulrich Obergfell noticed this in RHEL7. If there's a > RHEL bugzilla, it'd be nice to have a link to it in the kernel.org > bugzilla. Informal hints are great right now, but they'll be useless after > six months. The original bug is not public accessible, so I didn't include the link. Fam -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 14, 2015 at 11:53:34AM +0200, Michael S. Tsirkin wrote: > > you are almost certainly going to subject it to a > > screaming MSI interrupt and there almost certainly will > > not be code to disable MSIs as they are disabled by at > > boot up by default. > > > > Eric > > OTOH if you do disable MSI but leave device functioning you will just > get screaming INT#x which is even worse because it will end up disabling > INT#x which is shared, and so breaking multiple devices and not just > this one. Eric, can you comment on this please? To list the points again: 1- if device is properly shut down there's no need to disable MSIs 2- if device is not properly shut down disabling MSIs will cause screaming INT#x interrupts 3- in both cases if we leave MSI on then we can be sure we won't get screaming interrupts since we now disable bus mastering which suppresses MSIs (but not INT#x) Looking at the commit that added the disable, that is d52877c7b1af, you will see that historically it preceded b566a22c23. So the screaming MSI problem that d52877c7b1af tried to address is now gone, addressed by b566a22c23. What do you think? > -- > MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 28, 2015 at 06:36:22PM +0200, Michael S. Tsirkin wrote: > On Thu, May 14, 2015 at 11:53:34AM +0200, Michael S. Tsirkin wrote: > > > you are almost certainly going to subject it to a > > > screaming MSI interrupt and there almost certainly will > > > not be code to disable MSIs as they are disabled by at > > > boot up by default. > > > > > > Eric > > > > OTOH if you do disable MSI but leave device functioning you will just > > get screaming INT#x which is even worse because it will end up disabling > > INT#x which is shared, and so breaking multiple devices and not just > > this one. > > Eric, can you comment on this please? > To list the points again: > 1- if device is properly shut down there's no need to disable MSIs > 2- if device is not properly shut down disabling MSIs will cause > screaming INT#x interrupts > 3- in both cases if we leave MSI on then we can be sure we won't > get screaming interrupts since we now disable bus mastering > which suppresses MSIs (but not INT#x) > > Looking at the commit that added the disable, that is d52877c7b1af, you > will see that historically it preceded b566a22c23. So the screaming MSI > problem that d52877c7b1af tried to address is now gone, addressed by > b566a22c23. > > What do you think? Eric, Bjor, could you comment? I'm sorry if these repeating questions are annoying to you, but the comment that not disabling msi can cause screaming interrupts seems to be based on state of the kernel before 2012. Ever since commit b566a22c23327f18ce941ffad0ca907e50a53d41 PCI: disable Bus Master on PCI device shutdown even broken devices that don't disable msi on shutdown can not cause screaming interrupts even if core leaves msi alone. So I don't see how can removing that code cause regressions, and it might improve the situation since it reduces the chance of getting screaming INT#x. > > -- > > MST -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 3cb2210..38a602c 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev) if (drv && drv->shutdown) drv->shutdown(pci_dev); - pci_msi_shutdown(pci_dev); - pci_msix_shutdown(pci_dev); #ifdef CONFIG_KEXEC /*