diff mbox

[3/3] drivers/vfio/pci: Fix MSIx message lost

Message ID 1393817042-13758-3-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gavin Shan March 3, 2014, 3:24 a.m. UTC
The problem is specific to the case of BIST issued applied to IPR
adapter on the guest side. After BIST reset, we lose everything
in MSIx table and we never have chance update MSIx messages for
those enabled interrupts to MSIx table.

The patch fixes it by writing MSIx message to MSIx table before
reenabling them.

Reported-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Benjamin Herrenschmidt March 3, 2014, 3:51 a.m. UTC | #1
On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> The problem is specific to the case of BIST issued applied to IPR
> adapter on the guest side. After BIST reset, we lose everything
> in MSIx table and we never have chance update MSIx messages for
> those enabled interrupts to MSIx table.
> 
> The patch fixes it by writing MSIx message to MSIx table before
> reenabling them.

That needs a better explanation... the guest does try to restore the
MSI-X (in a way similar to resuming from suspend) by writing the
message value back into the table.

My understanding is that we trap that and turn that into a call to
vfio_msi_set_vector_signal, is that correct ?

In that case, it does indeed make sense to have that function rewrite
the cached message.

(Just trying to understand how this patch fixes the problem we saw)

I suppose other drivers would have similar issues if they have some
kind of internal "reset" path and try to save and restore the MSI-X
table.

Cheers,
Ben.

> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 2103576..279ebd0 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -17,6 +17,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/eventfd.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/file.h>
>  #include <linux/poll.h>
>  #include <linux/vfio.h>
> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	struct pci_dev *pdev = vdev->pdev;
>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
> +	struct msi_msg msg;
>  	struct eventfd_ctx *trigger;
>  	int ret;
>  
> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return PTR_ERR(trigger);
>  	}
>  
> +	/* We possiblly lose the MSI/MSIx message in some cases.
> +	 * For example, BIST reset on IPR adapter. The MSIx table
> +	 * is cleaned out. However, we never get chance to put
> +	 * MSIx messages to MSIx table because all MSIx stuff is
> +	 * being cached in QEMU. Here, we had the trick to put the
> +	 * MSI/MSIx message back.
> +	 *
> +	 * Basically, we needn't worry about MSI messages. However,
> +	 * it's not harmful and there might be cases of PCI config data
> +	 * lost because of cached PCI config data in QEMU again.
> +	 *
> +	 * Note that we should flash the message prior to enabling
> +	 * the corresponding interrupt by request_irq().
> +	 */
> +	 get_cached_msi_msg(irq, &msg);
> +	 write_msi_msg(irq, &msg);
> +
>  	ret = request_irq(irq, vfio_msihandler, 0,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Kardashevskiy March 3, 2014, 4:43 a.m. UTC | #2
On 03/03/2014 02:51 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
>> The problem is specific to the case of BIST issued applied to IPR
>> adapter on the guest side. After BIST reset, we lose everything
>> in MSIx table and we never have chance update MSIx messages for
>> those enabled interrupts to MSIx table.
>>
>> The patch fixes it by writing MSIx message to MSIx table before
>> reenabling them.
> 
> That needs a better explanation... the guest does try to restore the
> MSI-X (in a way similar to resuming from suspend) by writing the
> message value back into the table.
> 
> My understanding is that we trap that and turn that into a call to
> vfio_msi_set_vector_signal, is that correct ?


Yep.

Gavin sent me a backtrace of what happens when the guest tries writing to a
BAR:

qemu/hw/pci/msix.c::msix_table_mmio_write
	msix_handle_mask_update
	msix_fire_vector_notifier
qemu/hw/misc/vfio.c::vfio_msix_vector_use
	vfio_msix_vector_do_use

IOCTL-to-VFIO: VFIO_DEVICE_SET_IRQS
host/drivers/vfio/pci/vfio_pci.c::vfio_pci_ioctl
host/drivers/vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
	vfio_pci_set_msi_trigger
	vfio_msi_set_block
	vfio_msi_set_vector_signal



While it works for our particular problem and seems correct, it has one
flaw - hw/pci/msix.c will not generate this backtrace if masking bit does
not change which can happen in general:
===
static void msix_handle_mask_update(PCIDevice *dev, int vector, bool
was_masked)
{
    bool is_masked = msix_is_masked(dev, vector);

    if (is_masked == was_masked) {
        return;
    }
===

Or if masking bit is the same, nothing bad is expected?...


> In that case, it does indeed make sense to have that function rewrite
> the cached message.
> 
> (Just trying to understand how this patch fixes the problem we saw)
> 
> I suppose other drivers would have similar issues if they have some
> kind of internal "reset" path and try to save and restore the MSI-X
> table.





> Cheers,
> Ben.
> 
>> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 2103576..279ebd0 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/eventfd.h>
>>  #include <linux/pci.h>
>> +#include <linux/msi.h>
>>  #include <linux/file.h>
>>  #include <linux/poll.h>
>>  #include <linux/vfio.h>
>> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  	struct pci_dev *pdev = vdev->pdev;
>>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>>  	char *name = msix ? "vfio-msix" : "vfio-msi";
>> +	struct msi_msg msg;
>>  	struct eventfd_ctx *trigger;
>>  	int ret;
>>  
>> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  		return PTR_ERR(trigger);
>>  	}
>>  
>> +	/* We possiblly lose the MSI/MSIx message in some cases.
>> +	 * For example, BIST reset on IPR adapter. The MSIx table
>> +	 * is cleaned out. However, we never get chance to put
>> +	 * MSIx messages to MSIx table because all MSIx stuff is
>> +	 * being cached in QEMU. Here, we had the trick to put the
>> +	 * MSI/MSIx message back.
>> +	 *
>> +	 * Basically, we needn't worry about MSI messages. However,
>> +	 * it's not harmful and there might be cases of PCI config data
>> +	 * lost because of cached PCI config data in QEMU again.
>> +	 *
>> +	 * Note that we should flash the message prior to enabling
>> +	 * the corresponding interrupt by request_irq().
>> +	 */
>> +	 get_cached_msi_msg(irq, &msg);
>> +	 write_msi_msg(irq, &msg);
>> +
>>  	ret = request_irq(irq, vfio_msihandler, 0,
>>  			  vdev->ctx[vector].name, trigger);
>>  	if (ret) {
> 
>
Alex Williamson March 3, 2014, 4:49 a.m. UTC | #3
On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> > The problem is specific to the case of BIST issued applied to IPR
> > adapter on the guest side. After BIST reset, we lose everything
> > in MSIx table and we never have chance update MSIx messages for
> > those enabled interrupts to MSIx table.
> > 
> > The patch fixes it by writing MSIx message to MSIx table before
> > reenabling them.
> 
> That needs a better explanation... the guest does try to restore the
> MSI-X (in a way similar to resuming from suspend) by writing the
> message value back into the table.
> 
> My understanding is that we trap that and turn that into a call to
> vfio_msi_set_vector_signal, is that correct ?
> 
> In that case, it does indeed make sense to have that function rewrite
> the cached message.
> 
> (Just trying to understand how this patch fixes the problem we saw)
> 
> I suppose other drivers would have similar issues if they have some
> kind of internal "reset" path and try to save and restore the MSI-X
> table.
> 
> Cheers,
> Ben.
> 
> > Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 2103576..279ebd0 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/eventfd.h>
> >  #include <linux/pci.h>
> > +#include <linux/msi.h>
> >  #include <linux/file.h>
> >  #include <linux/poll.h>
> >  #include <linux/vfio.h>
> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > +	struct msi_msg msg;
> >  	struct eventfd_ctx *trigger;
> >  	int ret;
> >  
> > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		return PTR_ERR(trigger);
> >  	}
> >  
> > +	/* We possiblly lose the MSI/MSIx message in some cases.
> > +	 * For example, BIST reset on IPR adapter. The MSIx table
> > +	 * is cleaned out. However, we never get chance to put
> > +	 * MSIx messages to MSIx table because all MSIx stuff is
> > +	 * being cached in QEMU. Here, we had the trick to put the
> > +	 * MSI/MSIx message back.
> > +	 *
> > +	 * Basically, we needn't worry about MSI messages. However,
> > +	 * it's not harmful and there might be cases of PCI config data
> > +	 * lost because of cached PCI config data in QEMU again.
> > +	 *
> > +	 * Note that we should flash the message prior to enabling
> > +	 * the corresponding interrupt by request_irq().
> > +	 */
> > +	 get_cached_msi_msg(irq, &msg);
> > +	 write_msi_msg(irq, &msg);
> > +
> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {


I understand from talking to Alexey that the BARs are reset by this
BIST, but what about the MSIX capability?  If that gets reset to be
disabled, where does it get re-enabled?  QEMU won't know about the BIST,
so probably assumes nothing changed when the guest writes the enable
bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
seems like the above would restore the table entry, but not necessarily
whether MSIX is enabled on the device.

When I talked with Alexey I noted that we already have a BAR restore
function, vfio_bar_restore(), that tries to do some fixup when backdoor
resets are noticed.  If we were to trigger that upon noting the user
writing BAR values to what we think they're already set to, we could
extend it to trigger an interrupt restore that could fixup both the
capability and the table entries.  Thanks,

Alex


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt March 3, 2014, 5 a.m. UTC | #4
On Mon, 2014-03-03 at 15:43 +1100, Alexey Kardashevskiy wrote:
> While it works for our particular problem and seems correct, it has one
> flaw - hw/pci/msix.c will not generate this backtrace if masking bit does
> not change which can happen in general:
> ===
> static void msix_handle_mask_update(PCIDevice *dev, int vector, bool
> was_masked)
> {
>     bool is_masked = msix_is_masked(dev, vector);
> 
>     if (is_masked == was_masked) {
>         return;
>     }
> ===
> 
> Or if masking bit is the same, nothing bad is expected?...

Hrm ok, so it will work in this specific case but might not in the general
case of a driver triggering some kind of local reset on the device requiring
the MSI-X to be restored. The guest will write but qemu will swallow them ...

I think that needs to be fixed but it might be hard without introducing
a new ioctl from what I can see of the way the code is structured... Unless
qemu turns that into a disable/enable pair I suppose.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt March 3, 2014, 5:05 a.m. UTC | #5
On Sun, 2014-03-02 at 21:49 -0700, Alex Williamson wrote:

> I understand from talking to Alexey that the BARs are reset by this
> BIST, but what about the MSIX capability?  If that gets reset to be
> disabled, where does it get re-enabled?

The guest will do pci_save/restore_state iirc which will attempt
to save and restore everything, but qemu might get in the way here...

I'm not sure whether the IPR BIST clears the config space but we
probably should account for the general case of the driver in the guest
doing some kind of reset (BIST or otherwise) and trying to save and
restore state itself.

>   QEMU won't know about the BIST,
> so probably assumes nothing changed when the guest writes the enable
> bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
> seems like the above would restore the table entry, but not necessarily
> whether MSIX is enabled on the device.

Maybe qemu shouldn't trust its cache and upon guest writes, do a read
first to check the HW state ? It's not like any of this is performance
sensitive anyway...

> When I talked with Alexey I noted that we already have a BAR restore
> function, vfio_bar_restore(), that tries to do some fixup when backdoor
> resets are noticed.  If we were to trigger that upon noting the user
> writing BAR values to what we think they're already set to, we could
> extend it to trigger an interrupt restore that could fixup both the
> capability and the table entries.  Thanks,

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson March 3, 2014, 7:36 p.m. UTC | #6
On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote:
> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote:
> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> 
> .../...
> 
> >> 
> >> > Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> >> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> >> > ---
> >> >  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
> >> >  1 file changed, 19 insertions(+)
> >> > 
> >> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >> > index 2103576..279ebd0 100644
> >> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> >> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >> > @@ -17,6 +17,7 @@
> >> >  #include <linux/interrupt.h>
> >> >  #include <linux/eventfd.h>
> >> >  #include <linux/pci.h>
> >> > +#include <linux/msi.h>
> >> >  #include <linux/file.h>
> >> >  #include <linux/poll.h>
> >> >  #include <linux/vfio.h>
> >> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >> >  	struct pci_dev *pdev = vdev->pdev;
> >> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >> > +	struct msi_msg msg;
> >> >  	struct eventfd_ctx *trigger;
> >> >  	int ret;
> >> >  
> >> > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >> >  		return PTR_ERR(trigger);
> >> >  	}
> >> >  
> >> > +	/* We possiblly lose the MSI/MSIx message in some cases.
> >> > +	 * For example, BIST reset on IPR adapter. The MSIx table
> >> > +	 * is cleaned out. However, we never get chance to put
> >> > +	 * MSIx messages to MSIx table because all MSIx stuff is
> >> > +	 * being cached in QEMU. Here, we had the trick to put the
> >> > +	 * MSI/MSIx message back.
> >> > +	 *
> >> > +	 * Basically, we needn't worry about MSI messages. However,
> >> > +	 * it's not harmful and there might be cases of PCI config data
> >> > +	 * lost because of cached PCI config data in QEMU again.
> >> > +	 *
> >> > +	 * Note that we should flash the message prior to enabling
> >> > +	 * the corresponding interrupt by request_irq().
> >> > +	 */
> >> > +	 get_cached_msi_msg(irq, &msg);
> >> > +	 write_msi_msg(irq, &msg);
> >> > +
> >> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >> >  			  vdev->ctx[vector].name, trigger);
> >> >  	if (ret) {
> >
> >
> >I understand from talking to Alexey that the BARs are reset by this
> >BIST, but what about the MSIX capability?  If that gets reset to be
> >disabled, where does it get re-enabled?  QEMU won't know about the BIST,
> >so probably assumes nothing changed when the guest writes the enable
> >bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
> >seems like the above would restore the table entry, but not necessarily
> >whether MSIX is enabled on the device.
> >
> 
> Yes, It's not necessarily, but not harmful to restore the message into
> device even though the PCI device has individual enabled. Prior to
> request_irq(), the interrupt should have been masked on PIC/CPU side or
> race condition there.
> 
> >When I talked with Alexey I noted that we already have a BAR restore
> >function, vfio_bar_restore(), that tries to do some fixup when backdoor
> >resets are noticed.  If we were to trigger that upon noting the user
> >writing BAR values to what we think they're already set to, we could
> >extend it to trigger an interrupt restore that could fixup both the
> >capability and the table entries.  Thanks,
> >
> 
> Various PCI devices could have different ways to do BIST. It's a bit
> hard to capture all of them, I think. Alex, please advise which way
> to fix the issue would be better. Or we can have this patch and figure
> out the way using vfio_bar_restore() later? :-)

This approach arguably has a very niche application.  I'd rather see
this device fall into the vfio_bar_restore() and add a callout to the
vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the
surprise reset occurred with either enabled.  That seems like it would
be generally useful to any device with a backdoor reset.  Besides, if
the BARs on the device are being deprogrammed by BIST, then it must
already be hitting the vfio_bar_restore code or just re-writing the
vector table entry wouldn't be enough to get it running again.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson March 4, 2014, 8:45 p.m. UTC | #7
On Tue, 2014-03-04 at 10:30 +0800, Gavin Shan wrote:
> On Mon, Mar 03, 2014 at 12:36:16PM -0700, Alex Williamson wrote:
> >On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote:
> >> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote:
> >> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> >> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> 
> ../...
> 
> >> 
> >> Various PCI devices could have different ways to do BIST. It's a bit
> >> hard to capture all of them, I think. Alex, please advise which way
> >> to fix the issue would be better. Or we can have this patch and figure
> >> out the way using vfio_bar_restore() later? :-)
> >
> >This approach arguably has a very niche application.  I'd rather see
> >this device fall into the vfio_bar_restore() and add a callout to the
> >vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the
> >surprise reset occurred with either enabled.  That seems like it would
> >be generally useful to any device with a backdoor reset.  Besides, if
> >the BARs on the device are being deprogrammed by BIST, then it must
> >already be hitting the vfio_bar_restore code or just re-writing the
> >vector table entry wouldn't be enough to get it running again.  Thanks,
> >
> 
> I had some debugging output in vfio_bar_restore() for my case, which is
> pci_save_state(), BIST and then pci_restore_state() on guest side. I never
> saw those debugging output from vfio_bar_restore(). Host VFIO-PCI has cached
> PCI config space, which was built in advance, and BIST on guest side didn't
> get altered PCI_COMMAND (memory/IO enable). So we didn't call into the function
> where we expect to restore MSIx stuff, Or I missed your point? :-)

Ok, I think I'm slowly wrapping my head around exactly what's going on,
let me summarize to make sure I understand.  The driver for this device
does some degree of configuration, including enabling MSI-X for the
device.  It then initiates a BIST via non-standard memory mapped
registers which resets some parts of the device, including the MMIO
space containing the MSI-X vector table, but not including PCI config
space of the device.  In particular, the COMMAND register and BAR values
in PCI config space are left intact (and apparently the MSI-X capability
as well).

If the driver wanted to be really evil, it could do this with vectors
enabled and rely on the reset value of the MSI-X vectors being masked.
However, we get lucky and the driver has all of the vectors masked prior
to reset so that after reset we see the masked to unmasked transition
and use that to program the eventfd, which is where you'd like to
re-write the vector message.  Is that all correct?

If the rest of config space is unaffected by this reset then my
suggestion to tie it into vfio_bar_restore() probably doesn't make
sense.  In fact, it doesn't seem like there's any config space
interaction we could count on to reliably detect this.

Thinking out loud, I wonder if there's any value to reading the MSI-X
message from the device and comparing to the cached value so that we can
at least detect that we've encountered this situation.  If we did that,
we could restore all of the vectors in the table.  However, it seems
like the only value that would add would be to write a message to syslog
since the guest will need to re-write any vectors it intends to use (and
assuming we can count on vectors being masked after reset as the spec
indicates).  So, I guess I can't think of anything better than what you
proposed (with better comments to detail the situation for later).
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 2103576..279ebd0 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -17,6 +17,7 @@ 
 #include <linux/interrupt.h>
 #include <linux/eventfd.h>
 #include <linux/pci.h>
+#include <linux/msi.h>
 #include <linux/file.h>
 #include <linux/poll.h>
 #include <linux/vfio.h>
@@ -517,6 +518,7 @@  static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	struct pci_dev *pdev = vdev->pdev;
 	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
 	char *name = msix ? "vfio-msix" : "vfio-msi";
+	struct msi_msg msg;
 	struct eventfd_ctx *trigger;
 	int ret;
 
@@ -544,6 +546,23 @@  static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return PTR_ERR(trigger);
 	}
 
+	/* We possiblly lose the MSI/MSIx message in some cases.
+	 * For example, BIST reset on IPR adapter. The MSIx table
+	 * is cleaned out. However, we never get chance to put
+	 * MSIx messages to MSIx table because all MSIx stuff is
+	 * being cached in QEMU. Here, we had the trick to put the
+	 * MSI/MSIx message back.
+	 *
+	 * Basically, we needn't worry about MSI messages. However,
+	 * it's not harmful and there might be cases of PCI config data
+	 * lost because of cached PCI config data in QEMU again.
+	 *
+	 * Note that we should flash the message prior to enabling
+	 * the corresponding interrupt by request_irq().
+	 */
+	 get_cached_msi_msg(irq, &msg);
+	 write_msi_msg(irq, &msg);
+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {