Message ID | 1393817042-13758-3-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: > The problem is specific to the case of BIST issued applied to IPR > adapter on the guest side. After BIST reset, we lose everything > in MSIx table and we never have chance update MSIx messages for > those enabled interrupts to MSIx table. > > The patch fixes it by writing MSIx message to MSIx table before > reenabling them. That needs a better explanation... the guest does try to restore the MSI-X (in a way similar to resuming from suspend) by writing the message value back into the table. My understanding is that we trap that and turn that into a call to vfio_msi_set_vector_signal, is that correct ? In that case, it does indeed make sense to have that function rewrite the cached message. (Just trying to understand how this patch fixes the problem we saw) I suppose other drivers would have similar issues if they have some kind of internal "reset" path and try to save and restore the MSI-X table. Cheers, Ben. > Reported-by: Wen Xiong <wenxiong@us.ibm.com> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > --- > drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index 2103576..279ebd0 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -17,6 +17,7 @@ > #include <linux/interrupt.h> > #include <linux/eventfd.h> > #include <linux/pci.h> > +#include <linux/msi.h> > #include <linux/file.h> > #include <linux/poll.h> > #include <linux/vfio.h> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > struct pci_dev *pdev = vdev->pdev; > int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; > char *name = msix ? "vfio-msix" : "vfio-msi"; > + struct msi_msg msg; > struct eventfd_ctx *trigger; > int ret; > > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > return PTR_ERR(trigger); > } > > + /* We possiblly lose the MSI/MSIx message in some cases. > + * For example, BIST reset on IPR adapter. The MSIx table > + * is cleaned out. However, we never get chance to put > + * MSIx messages to MSIx table because all MSIx stuff is > + * being cached in QEMU. Here, we had the trick to put the > + * MSI/MSIx message back. > + * > + * Basically, we needn't worry about MSI messages. However, > + * it's not harmful and there might be cases of PCI config data > + * lost because of cached PCI config data in QEMU again. > + * > + * Note that we should flash the message prior to enabling > + * the corresponding interrupt by request_irq(). > + */ > + get_cached_msi_msg(irq, &msg); > + write_msi_msg(irq, &msg); > + > ret = request_irq(irq, vfio_msihandler, 0, > vdev->ctx[vector].name, trigger); > if (ret) { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/03/2014 02:51 PM, Benjamin Herrenschmidt wrote: > On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: >> The problem is specific to the case of BIST issued applied to IPR >> adapter on the guest side. After BIST reset, we lose everything >> in MSIx table and we never have chance update MSIx messages for >> those enabled interrupts to MSIx table. >> >> The patch fixes it by writing MSIx message to MSIx table before >> reenabling them. > > That needs a better explanation... the guest does try to restore the > MSI-X (in a way similar to resuming from suspend) by writing the > message value back into the table. > > My understanding is that we trap that and turn that into a call to > vfio_msi_set_vector_signal, is that correct ? Yep. Gavin sent me a backtrace of what happens when the guest tries writing to a BAR: qemu/hw/pci/msix.c::msix_table_mmio_write msix_handle_mask_update msix_fire_vector_notifier qemu/hw/misc/vfio.c::vfio_msix_vector_use vfio_msix_vector_do_use IOCTL-to-VFIO: VFIO_DEVICE_SET_IRQS host/drivers/vfio/pci/vfio_pci.c::vfio_pci_ioctl host/drivers/vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl vfio_pci_set_msi_trigger vfio_msi_set_block vfio_msi_set_vector_signal While it works for our particular problem and seems correct, it has one flaw - hw/pci/msix.c will not generate this backtrace if masking bit does not change which can happen in general: === static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked) { bool is_masked = msix_is_masked(dev, vector); if (is_masked == was_masked) { return; } === Or if masking bit is the same, nothing bad is expected?... > In that case, it does indeed make sense to have that function rewrite > the cached message. > > (Just trying to understand how this patch fixes the problem we saw) > > I suppose other drivers would have similar issues if they have some > kind of internal "reset" path and try to save and restore the MSI-X > table. > Cheers, > Ben. > >> Reported-by: Wen Xiong <wenxiong@us.ibm.com> >> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> >> --- >> drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >> index 2103576..279ebd0 100644 >> --- a/drivers/vfio/pci/vfio_pci_intrs.c >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >> @@ -17,6 +17,7 @@ >> #include <linux/interrupt.h> >> #include <linux/eventfd.h> >> #include <linux/pci.h> >> +#include <linux/msi.h> >> #include <linux/file.h> >> #include <linux/poll.h> >> #include <linux/vfio.h> >> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, >> struct pci_dev *pdev = vdev->pdev; >> int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; >> char *name = msix ? "vfio-msix" : "vfio-msi"; >> + struct msi_msg msg; >> struct eventfd_ctx *trigger; >> int ret; >> >> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, >> return PTR_ERR(trigger); >> } >> >> + /* We possiblly lose the MSI/MSIx message in some cases. >> + * For example, BIST reset on IPR adapter. The MSIx table >> + * is cleaned out. However, we never get chance to put >> + * MSIx messages to MSIx table because all MSIx stuff is >> + * being cached in QEMU. Here, we had the trick to put the >> + * MSI/MSIx message back. >> + * >> + * Basically, we needn't worry about MSI messages. However, >> + * it's not harmful and there might be cases of PCI config data >> + * lost because of cached PCI config data in QEMU again. >> + * >> + * Note that we should flash the message prior to enabling >> + * the corresponding interrupt by request_irq(). >> + */ >> + get_cached_msi_msg(irq, &msg); >> + write_msi_msg(irq, &msg); >> + >> ret = request_irq(irq, vfio_msihandler, 0, >> vdev->ctx[vector].name, trigger); >> if (ret) { > >
On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote: > On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: > > The problem is specific to the case of BIST issued applied to IPR > > adapter on the guest side. After BIST reset, we lose everything > > in MSIx table and we never have chance update MSIx messages for > > those enabled interrupts to MSIx table. > > > > The patch fixes it by writing MSIx message to MSIx table before > > reenabling them. > > That needs a better explanation... the guest does try to restore the > MSI-X (in a way similar to resuming from suspend) by writing the > message value back into the table. > > My understanding is that we trap that and turn that into a call to > vfio_msi_set_vector_signal, is that correct ? > > In that case, it does indeed make sense to have that function rewrite > the cached message. > > (Just trying to understand how this patch fixes the problem we saw) > > I suppose other drivers would have similar issues if they have some > kind of internal "reset" path and try to save and restore the MSI-X > table. > > Cheers, > Ben. > > > Reported-by: Wen Xiong <wenxiong@us.ibm.com> > > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > > --- > > drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > > index 2103576..279ebd0 100644 > > --- a/drivers/vfio/pci/vfio_pci_intrs.c > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > > @@ -17,6 +17,7 @@ > > #include <linux/interrupt.h> > > #include <linux/eventfd.h> > > #include <linux/pci.h> > > +#include <linux/msi.h> > > #include <linux/file.h> > > #include <linux/poll.h> > > #include <linux/vfio.h> > > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > > struct pci_dev *pdev = vdev->pdev; > > int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; > > char *name = msix ? "vfio-msix" : "vfio-msi"; > > + struct msi_msg msg; > > struct eventfd_ctx *trigger; > > int ret; > > > > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > > return PTR_ERR(trigger); > > } > > > > + /* We possiblly lose the MSI/MSIx message in some cases. > > + * For example, BIST reset on IPR adapter. The MSIx table > > + * is cleaned out. However, we never get chance to put > > + * MSIx messages to MSIx table because all MSIx stuff is > > + * being cached in QEMU. Here, we had the trick to put the > > + * MSI/MSIx message back. > > + * > > + * Basically, we needn't worry about MSI messages. However, > > + * it's not harmful and there might be cases of PCI config data > > + * lost because of cached PCI config data in QEMU again. > > + * > > + * Note that we should flash the message prior to enabling > > + * the corresponding interrupt by request_irq(). > > + */ > > + get_cached_msi_msg(irq, &msg); > > + write_msi_msg(irq, &msg); > > + > > ret = request_irq(irq, vfio_msihandler, 0, > > vdev->ctx[vector].name, trigger); > > if (ret) { I understand from talking to Alexey that the BARs are reset by this BIST, but what about the MSIX capability? If that gets reset to be disabled, where does it get re-enabled? QEMU won't know about the BIST, so probably assumes nothing changed when the guest writes the enable bit. VFIO doesn't allow userspace writes to the MSIX capability. So it seems like the above would restore the table entry, but not necessarily whether MSIX is enabled on the device. When I talked with Alexey I noted that we already have a BAR restore function, vfio_bar_restore(), that tries to do some fixup when backdoor resets are noticed. If we were to trigger that upon noting the user writing BAR values to what we think they're already set to, we could extend it to trigger an interrupt restore that could fixup both the capability and the table entries. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-03-03 at 15:43 +1100, Alexey Kardashevskiy wrote: > While it works for our particular problem and seems correct, it has one > flaw - hw/pci/msix.c will not generate this backtrace if masking bit does > not change which can happen in general: > === > static void msix_handle_mask_update(PCIDevice *dev, int vector, bool > was_masked) > { > bool is_masked = msix_is_masked(dev, vector); > > if (is_masked == was_masked) { > return; > } > === > > Or if masking bit is the same, nothing bad is expected?... Hrm ok, so it will work in this specific case but might not in the general case of a driver triggering some kind of local reset on the device requiring the MSI-X to be restored. The guest will write but qemu will swallow them ... I think that needs to be fixed but it might be hard without introducing a new ioctl from what I can see of the way the code is structured... Unless qemu turns that into a disable/enable pair I suppose. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2014-03-02 at 21:49 -0700, Alex Williamson wrote: > I understand from talking to Alexey that the BARs are reset by this > BIST, but what about the MSIX capability? If that gets reset to be > disabled, where does it get re-enabled? The guest will do pci_save/restore_state iirc which will attempt to save and restore everything, but qemu might get in the way here... I'm not sure whether the IPR BIST clears the config space but we probably should account for the general case of the driver in the guest doing some kind of reset (BIST or otherwise) and trying to save and restore state itself. > QEMU won't know about the BIST, > so probably assumes nothing changed when the guest writes the enable > bit. VFIO doesn't allow userspace writes to the MSIX capability. So it > seems like the above would restore the table entry, but not necessarily > whether MSIX is enabled on the device. Maybe qemu shouldn't trust its cache and upon guest writes, do a read first to check the HW state ? It's not like any of this is performance sensitive anyway... > When I talked with Alexey I noted that we already have a BAR restore > function, vfio_bar_restore(), that tries to do some fixup when backdoor > resets are noticed. If we were to trigger that upon noting the user > writing BAR values to what we think they're already set to, we could > extend it to trigger an interrupt restore that could fixup both the > capability and the table entries. Thanks, Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote: > On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote: > >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote: > >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: > > .../... > > >> > >> > Reported-by: Wen Xiong <wenxiong@us.ibm.com> > >> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > >> > --- > >> > drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ > >> > 1 file changed, 19 insertions(+) > >> > > >> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > >> > index 2103576..279ebd0 100644 > >> > --- a/drivers/vfio/pci/vfio_pci_intrs.c > >> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > >> > @@ -17,6 +17,7 @@ > >> > #include <linux/interrupt.h> > >> > #include <linux/eventfd.h> > >> > #include <linux/pci.h> > >> > +#include <linux/msi.h> > >> > #include <linux/file.h> > >> > #include <linux/poll.h> > >> > #include <linux/vfio.h> > >> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > >> > struct pci_dev *pdev = vdev->pdev; > >> > int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; > >> > char *name = msix ? "vfio-msix" : "vfio-msi"; > >> > + struct msi_msg msg; > >> > struct eventfd_ctx *trigger; > >> > int ret; > >> > > >> > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, > >> > return PTR_ERR(trigger); > >> > } > >> > > >> > + /* We possiblly lose the MSI/MSIx message in some cases. > >> > + * For example, BIST reset on IPR adapter. The MSIx table > >> > + * is cleaned out. However, we never get chance to put > >> > + * MSIx messages to MSIx table because all MSIx stuff is > >> > + * being cached in QEMU. Here, we had the trick to put the > >> > + * MSI/MSIx message back. > >> > + * > >> > + * Basically, we needn't worry about MSI messages. However, > >> > + * it's not harmful and there might be cases of PCI config data > >> > + * lost because of cached PCI config data in QEMU again. > >> > + * > >> > + * Note that we should flash the message prior to enabling > >> > + * the corresponding interrupt by request_irq(). > >> > + */ > >> > + get_cached_msi_msg(irq, &msg); > >> > + write_msi_msg(irq, &msg); > >> > + > >> > ret = request_irq(irq, vfio_msihandler, 0, > >> > vdev->ctx[vector].name, trigger); > >> > if (ret) { > > > > > >I understand from talking to Alexey that the BARs are reset by this > >BIST, but what about the MSIX capability? If that gets reset to be > >disabled, where does it get re-enabled? QEMU won't know about the BIST, > >so probably assumes nothing changed when the guest writes the enable > >bit. VFIO doesn't allow userspace writes to the MSIX capability. So it > >seems like the above would restore the table entry, but not necessarily > >whether MSIX is enabled on the device. > > > > Yes, It's not necessarily, but not harmful to restore the message into > device even though the PCI device has individual enabled. Prior to > request_irq(), the interrupt should have been masked on PIC/CPU side or > race condition there. > > >When I talked with Alexey I noted that we already have a BAR restore > >function, vfio_bar_restore(), that tries to do some fixup when backdoor > >resets are noticed. If we were to trigger that upon noting the user > >writing BAR values to what we think they're already set to, we could > >extend it to trigger an interrupt restore that could fixup both the > >capability and the table entries. Thanks, > > > > Various PCI devices could have different ways to do BIST. It's a bit > hard to capture all of them, I think. Alex, please advise which way > to fix the issue would be better. Or we can have this patch and figure > out the way using vfio_bar_restore() later? :-) This approach arguably has a very niche application. I'd rather see this device fall into the vfio_bar_restore() and add a callout to the vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the surprise reset occurred with either enabled. That seems like it would be generally useful to any device with a backdoor reset. Besides, if the BARs on the device are being deprogrammed by BIST, then it must already be hitting the vfio_bar_restore code or just re-writing the vector table entry wouldn't be enough to get it running again. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-03-04 at 10:30 +0800, Gavin Shan wrote: > On Mon, Mar 03, 2014 at 12:36:16PM -0700, Alex Williamson wrote: > >On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote: > >> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote: > >> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote: > >> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: > > ../... > > >> > >> Various PCI devices could have different ways to do BIST. It's a bit > >> hard to capture all of them, I think. Alex, please advise which way > >> to fix the issue would be better. Or we can have this patch and figure > >> out the way using vfio_bar_restore() later? :-) > > > >This approach arguably has a very niche application. I'd rather see > >this device fall into the vfio_bar_restore() and add a callout to the > >vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the > >surprise reset occurred with either enabled. That seems like it would > >be generally useful to any device with a backdoor reset. Besides, if > >the BARs on the device are being deprogrammed by BIST, then it must > >already be hitting the vfio_bar_restore code or just re-writing the > >vector table entry wouldn't be enough to get it running again. Thanks, > > > > I had some debugging output in vfio_bar_restore() for my case, which is > pci_save_state(), BIST and then pci_restore_state() on guest side. I never > saw those debugging output from vfio_bar_restore(). Host VFIO-PCI has cached > PCI config space, which was built in advance, and BIST on guest side didn't > get altered PCI_COMMAND (memory/IO enable). So we didn't call into the function > where we expect to restore MSIx stuff, Or I missed your point? :-) Ok, I think I'm slowly wrapping my head around exactly what's going on, let me summarize to make sure I understand. The driver for this device does some degree of configuration, including enabling MSI-X for the device. It then initiates a BIST via non-standard memory mapped registers which resets some parts of the device, including the MMIO space containing the MSI-X vector table, but not including PCI config space of the device. In particular, the COMMAND register and BAR values in PCI config space are left intact (and apparently the MSI-X capability as well). If the driver wanted to be really evil, it could do this with vectors enabled and rely on the reset value of the MSI-X vectors being masked. However, we get lucky and the driver has all of the vectors masked prior to reset so that after reset we see the masked to unmasked transition and use that to program the eventfd, which is where you'd like to re-write the vector message. Is that all correct? If the rest of config space is unaffected by this reset then my suggestion to tie it into vfio_bar_restore() probably doesn't make sense. In fact, it doesn't seem like there's any config space interaction we could count on to reliably detect this. Thinking out loud, I wonder if there's any value to reading the MSI-X message from the device and comparing to the cached value so that we can at least detect that we've encountered this situation. If we did that, we could restore all of the vectors in the table. However, it seems like the only value that would add would be to write a message to syslog since the guest will need to re-write any vectors it intends to use (and assuming we can count on vectors being masked after reset as the spec indicates). So, I guess I can't think of anything better than what you proposed (with better comments to detail the situation for later). Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 2103576..279ebd0 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -17,6 +17,7 @@ #include <linux/interrupt.h> #include <linux/eventfd.h> #include <linux/pci.h> +#include <linux/msi.h> #include <linux/file.h> #include <linux/poll.h> #include <linux/vfio.h> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, struct pci_dev *pdev = vdev->pdev; int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; char *name = msix ? "vfio-msix" : "vfio-msi"; + struct msi_msg msg; struct eventfd_ctx *trigger; int ret; @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, return PTR_ERR(trigger); } + /* We possiblly lose the MSI/MSIx message in some cases. + * For example, BIST reset on IPR adapter. The MSIx table + * is cleaned out. However, we never get chance to put + * MSIx messages to MSIx table because all MSIx stuff is + * being cached in QEMU. Here, we had the trick to put the + * MSI/MSIx message back. + * + * Basically, we needn't worry about MSI messages. However, + * it's not harmful and there might be cases of PCI config data + * lost because of cached PCI config data in QEMU again. + * + * Note that we should flash the message prior to enabling + * the corresponding interrupt by request_irq(). + */ + get_cached_msi_msg(irq, &msg); + write_msi_msg(irq, &msg); + ret = request_irq(irq, vfio_msihandler, 0, vdev->ctx[vector].name, trigger); if (ret) {
The problem is specific to the case of BIST issued applied to IPR adapter on the guest side. After BIST reset, we lose everything in MSIx table and we never have chance update MSIx messages for those enabled interrupts to MSIx table. The patch fixes it by writing MSIx message to MSIx table before reenabling them. Reported-by: Wen Xiong <wenxiong@us.ibm.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> --- drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)