Biweekly KVM Test report, kernel 7a7ada1b... qemu df85c051...

Message ID 1302632571.3589.115.camel@x201 (mailing list archive)
State New, archived

Commit Message

Alex Williamson April 12, 2011, 6:22 p.m. UTC
On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
> > Hi All,
> > This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
> >
> > We found 1 bug about "NIC cannot work when it had been used before ". 
> > The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
> >
> > New issue:
> > 1.[VT-d] NIC cannot work when it had been used before
> >   https://bugs.launchpad.net/qemu/+bug/754591
> >
> 
> += Alex.
> 

This is caused by the patch below.  When we do a reset via PCI sysfs,
the device state is saved and restored around the reset.  When the state
is restored, the saved state is invalidated.  Now when we go to free the
device, we call the "I know what I'm doing" __pci_reset_function(),
which doesn't save/restore state, then try to do a restore, but there's
nothing saved, so the device only has reset values... oops.
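
For illustration only, the sequence described above can be modeled in
userspace C. This is a toy sketch, not kernel code: struct model_dev and
every model_* name are invented here, a single 16-bit register stands in
for config space, and the only behavior carried over from the description
is that a restore invalidates the saved state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RESET_VALUE 0x0000u /* assumed register value after a reset */

/* Toy stand-in for a PCI device: one register plus its saved copy. */
struct model_dev {
    uint16_t command;       /* live "config space" value */
    uint16_t saved_command; /* snapshot taken by model_save_state() */
    bool     state_saved;   /* true only while the snapshot is valid */
};

static void model_save_state(struct model_dev *d)
{
    assert(d != NULL);
    d->saved_command = d->command;
    d->state_saved = true;
}

/* Restoring consumes the snapshot; a second restore does nothing. */
static void model_restore_state(struct model_dev *d)
{
    assert(d != NULL);
    if (!d->state_saved)
        return;
    d->command = d->saved_command;
    d->state_saved = false;
}

/* Bare reset: no save/restore around it. */
static void model_reset_bare(struct model_dev *d)
{
    assert(d != NULL);
    d->command = MODEL_RESET_VALUE;
}

/* Reset with save/restore around it, like the sysfs path described above. */
static void model_reset_with_save(struct model_dev *d)
{
    model_save_state(d);
    model_reset_bare(d);
    model_restore_state(d);
}

/* Replays assign -> (optional VM reset) -> free; returns the final value. */
static uint16_t model_replay(uint16_t host_command, bool vm_reset)
{
    struct model_dev d = { .command = host_command };

    model_save_state(&d);          /* assign: state saved once */
    if (vm_reset)
        model_reset_with_save(&d); /* consumes the saved state */
    model_reset_bare(&d);          /* free: bare reset */
    model_restore_state(&d);       /* free: restore, possibly a no-op */
    return d.command;
}
```

In this model, model_replay(0x0007, false) ends with the original value,
while model_replay(0x0007, true) ends with the reset value, which is the
failure being reported.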

Jan, do you actually have a test case where you can see a difference
restoring the original saved state?  I'm tempted to suggest we just
revert this patch.  Otherwise it seems like we need an interface to extract
and reload the original saved state for the device.  Thanks,

Alex

commit ed78661f2614d3c9f69c23e280db3bafdabdf5bb
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Tue Nov 16 22:30:05 2010 +0100

    KVM: Save/restore state of assigned PCI device
    
    The guest may change states that pci_reset_function does not touch. So
    we better save/restore the assigned device across guest usage.
    
    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kiszka April 12, 2011, 9:02 p.m. UTC | #1
On 2011-04-12 20:22, Alex Williamson wrote:
> On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
>> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
>>> Hi All,
>>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
>>>
>>> We found 1 bug about "NIC cannot work when it had been used before ". 
>>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
>>>
>>> New issue:
>>> 1.[VT-d] NIC cannot work when it had been used before
>>>   https://bugs.launchpad.net/qemu/+bug/754591
>>>
>>
>> += Alex.
>>
> 
> This is caused by the patch below.  When we do a reset via PCI sysfs,
> the device state is saved and restored around the reset.  When the state
> is restored, the saved state is invalidated.  Now when we go to free the
> device, we call the "I know what I'm doing" __pci_reset_function(),
> which doesn't save/restore state, then try to do a restore, but there's
> nothing saved, so the device only has reset values... oops.
> 
> Jan, do you actually have a test case where you can see a difference
> restoring the original saved state?  I'm tempted to suggest we just
> revert this patch.  Otherwise it seems like we need an interface to extract
> and reload the original saved state for the device.  Thanks,

I've no test case, but the issue is clear: we used to leak guest
manipulations of the config space to the host or the new owner.

However, I'm first of all wondering why the heck libvirt should issue a
sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
this is actually the case? Then it should be fixed independently as it
would be a bug (proper pattern would be: deassign, reset, reassign).

That said, our way of relying on the consistency of the saved state
between assignment and release is in fact a bit fragile. We should
probably make it more robust as you suggested.
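
One possible shape for such an "extract and reload" interface, sketched as
a userspace toy rather than kernel code. All toy_* names are hypothetical;
the one real constant is the INTx disable bit (bit 10) of the PCI command
register. The owner keeps a private copy taken at assignment, so release
no longer depends on the device's consumable saved state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INTX_DISABLE 0x0400u /* PCI command register, bit 10 */
#define TOY_RESET_VALUE  0x0000u /* assumed register value after a reset */

struct toy_dev {
    uint16_t command;       /* live register value */
    uint16_t saved_command; /* device-level saved state */
    bool     state_saved;   /* cleared by every restore */
};

static void toy_save(struct toy_dev *d)
{
    assert(d != NULL);
    d->saved_command = d->command;
    d->state_saved = true;
}

static void toy_restore(struct toy_dev *d)
{
    assert(d != NULL);
    if (!d->state_saved)
        return;
    d->command = d->saved_command;
    d->state_saved = false;
}

/* Hypothetical "extract": hand the saved state to the owner. */
static uint16_t toy_store_saved(const struct toy_dev *d)
{
    assert(d != NULL && d->state_saved);
    return d->saved_command;
}

/* Hypothetical "reload": put the owner's copy back and mark it valid. */
static void toy_load_saved(struct toy_dev *d, uint16_t copy)
{
    assert(d != NULL);
    d->saved_command = copy;
    d->state_saved = true;
}

/* Replays assign -> guest change -> VM reset -> release. */
static uint16_t toy_replay(uint16_t host_command)
{
    struct toy_dev d = { .command = host_command };
    uint16_t owner_copy;

    toy_save(&d);                     /* assign: save state */
    owner_copy = toy_store_saved(&d); /* assign: keep a private copy */

    d.command |= TOY_INTX_DISABLE;    /* guest disables INTx */

    toy_save(&d);                     /* VM reset: save, */
    d.command = TOY_RESET_VALUE;      /* reset, */
    toy_restore(&d);                  /* restore (consumes saved state) */

    d.command = TOY_RESET_VALUE;      /* release: bare reset */
    toy_load_saved(&d, owner_copy);   /* release: reload the original */
    toy_restore(&d);
    return d.command;
}
```

Here toy_replay(0x0007) returns 0x0007 even though a reset with its own
save/restore happened in between and the guest had set the INTx disable
bit.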

Jan

Alex Williamson April 12, 2011, 9:20 p.m. UTC | #2
On Tue, 2011-04-12 at 23:02 +0200, Jan Kiszka wrote:
> On 2011-04-12 20:22, Alex Williamson wrote:
> > On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
> >> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
> >>> Hi All,
> >>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
> >>>
> >>> We found 1 bug about "NIC cannot work when it had been used before ". 
> >>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
> >>>
> >>> New issue:
> >>> 1.[VT-d] NIC cannot work when it had been used before
> >>>   https://bugs.launchpad.net/qemu/+bug/754591
> >>>
> >>
> >> += Alex.
> >>
> > 
> > This is caused by the patch below.  When we do a reset via PCI sysfs,
> > the device state is saved and restored around the reset.  When the state
> > is restored, the saved state is invalidated.  Now when we go to free the
> > device, we call the "I know what I'm doing" __pci_reset_function(),
> > which doesn't save/restore state, then try to do a restore, but there's
> > nothing saved, so the device only has reset values... oops.
> > 
> > Jan, do you actually have a test case where you can see a difference
> > restoring the original saved state?  I'm tempted to suggest we just
> > revert this patch.  Otherwise it seems like we need an interface to extract
> > and reload the original saved state for the device.  Thanks,
> 
> I've no test case, but the issue is clear: we used to leak guest
> manipulations of the config space to the host or the new owner.

But is there any state that we care about that isn't flushed by the
device reset?  Sure there might be some subtle config space changes, but
the guest can't change fundamental mappings or anything.

> However, I'm first of all wondering why the heck libvirt should issue a
> sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
> this is actually the case? Then it should be fixed independently as it
> would be a bug (proper pattern would be: deassign, reset, reassign).

libvirt does its own weird device resets, but that's prior to handing
the device off to qemu, so doesn't come into play.  The change is
d9488459 where qemu resets the device via PCI sysfs on VM reset.
Without this, a device can continue to DMA across the reset, trashing
guest memory.  I think qemu, as the owner of the device, has every right
to do this, and it's the only reasonable way to quiesce the device.
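
For readers unfamiliar with the sysfs path: a function reset is requested
by writing "1" to the device's reset attribute under /sys/bus/pci/devices.
The following is a minimal userspace sketch of that write, not qemu's
actual code; the helper names pci_reset_path() and pci_sysfs_reset() are
made up for this example, and the device address is supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "/sys/bus/pci/devices/<bdf>/reset" into buf.
 * Returns 0 on success, -1 if bdf is empty or buf is too small.
 */
static int pci_reset_path(const char *bdf, char *buf, size_t len)
{
    int n;

    assert(bdf != NULL && buf != NULL);
    if (strlen(bdf) == 0)
        return -1;
    n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Writes "1" to the reset attribute. Needs sufficient privileges and a
 * device that exposes the attribute; returns 0 on success, -1 otherwise.
 */
static int pci_sysfs_reset(const char *bdf)
{
    char path[128];
    FILE *f;
    int rc;

    if (pci_reset_path(bdf, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    rc = (fputs("1", f) == EOF) ? -1 : 0;
    if (fclose(f) != 0) /* the write may only fail when flushed */
        rc = -1;
    return rc;
}
```

A caller would pass a full address such as "0000:00:19.0" (an arbitrary
example) and treat a -1 return as "device could not be reset".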

> That said, our way of relying on the consistency of the saved state
> between assignment and release is in fact a bit fragile. We should
> probably make it more robust as you suggested.

That may take some time, and since we don't actually have a test case or
known issue, I vote to revert ed78661f.  Thanks,

Alex


Jan Kiszka April 12, 2011, 10:35 p.m. UTC | #3
On 2011-04-12 23:20, Alex Williamson wrote:
> On Tue, 2011-04-12 at 23:02 +0200, Jan Kiszka wrote:
>> On 2011-04-12 20:22, Alex Williamson wrote:
>>> On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
>>>> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
>>>>> Hi All,
>>>>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
>>>>>
>>>>> We found 1 bug about "NIC cannot work when it had been used before ". 
>>>>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
>>>>>
>>>>> New issue:
>>>>> 1.[VT-d] NIC cannot work when it had been used before
>>>>>   https://bugs.launchpad.net/qemu/+bug/754591
>>>>>
>>>>
>>>> += Alex.
>>>>
>>>
>>> This is caused by the patch below.  When we do a reset via PCI sysfs,
>>> the device state is saved and restored around the reset.  When the state
>>> is restored, the saved state is invalidated.  Now when we go to free the
>>> device, we call the "I know what I'm doing" __pci_reset_function(),
>>> which doesn't save/restore state, then try to do a restore, but there's
>>> nothing saved, so the device only has reset values... oops.
>>>
>>> Jan, do you actually have a test case where you can see a difference
>>> restoring the original saved state?  I'm tempted to suggest we just
>>> revert this patch.  Otherwise it seems like we need an interface to extract
>>> and reload the original saved state for the device.  Thanks,
>>
>> I've no test case, but the issue is clear: we used to leak guest
>> manipulations of the config space to the host or the new owner.
> 
> But is there any state that we care about that isn't flushed by the
> device reset?  Sure there might be some subtle config space changes, but
> the guest can't change fundamental mappings or anything.

I haven't written a "malicious" guest driver yet, but one of the easiest
ways to confuse the host is to disable the device's INTx before the
guest terminates. That would survive the old way of releasing and
resetting the device.

> 
>> However, I'm first of all wondering why the heck libvirt should issue a
>> sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
>> this is actually the case? Then it should be fixed independently as it
>> would be a bug (proper pattern would be: deassign, reset, reassign).
> 
> libvirt does its own weird device resets, but that's prior to handing
> the device off to qemu, so doesn't come into play.  The change is
> d9488459 where qemu resets the device via PCI sysfs on VM reset.
> Without this, a device can continue to DMA across the reset, trashing
> guest memory.  I think qemu, as the owner of the device, has every right
> to do this, and it's the only reasonable way to quiesce the device.
> 
>> That said, our way of relying on the consistency of the saved state
>> between assignment and release is in fact a bit fragile. We should
>> probably make it more robust as you suggested.
> 
> That may take some time, and since we don't actually have a test case or
> known issue, I vote to revert ed78661f.

Some alternative ways to address the issue in qemu:

- save/restore the full config space in qemu (won't help if qemu
  terminates prematurely)
- drop&reacquire the device before/after the reset
- export a pure reset service via kvm to userspace
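
A rough sketch of the first option, snapshotting config space from
userspace before a reset and writing it back afterwards. This is an
assumption-laden illustration, not qemu code: the cfg_* helpers and
CFG_SNAPSHOT_LEN are invented here, the stream is assumed to be the
device's sysfs config file opened read/write by the caller, and a real
implementation would have to restore registers selectively rather than
blindly rewriting the whole header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_SNAPSHOT_LEN 64 /* standard PCI header size; illustrative choice */

/* Reads the first CFG_SNAPSHOT_LEN bytes of an open config file. */
static int cfg_snapshot(FILE *cfg, uint8_t out[CFG_SNAPSHOT_LEN])
{
    assert(cfg != NULL);
    if (fseek(cfg, 0, SEEK_SET) != 0)
        return -1;
    return fread(out, 1, CFG_SNAPSHOT_LEN, cfg) == CFG_SNAPSHOT_LEN ? 0 : -1;
}

/* Writes a previously taken snapshot back to the start of the file. */
static int cfg_write_back(FILE *cfg, const uint8_t in[CFG_SNAPSHOT_LEN])
{
    assert(cfg != NULL);
    if (fseek(cfg, 0, SEEK_SET) != 0)
        return -1;
    if (fwrite(in, 1, CFG_SNAPSHOT_LEN, cfg) != CFG_SNAPSHOT_LEN)
        return -1;
    return fflush(cfg) == 0 ? 0 : -1;
}
```

As noted in the list, the snapshot lives in the qemu process, so it is
lost if qemu terminates before writing it back.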

Jan

Patch

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7623408..d389207 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,7 +197,8 @@  static void kvm_free_assigned_device(struct kvm *kvm,
 {
        kvm_free_assigned_irq(kvm, assigned_dev);
 
-       pci_reset_function(assigned_dev->dev);
+       __pci_reset_function(assigned_dev->dev);
+       pci_restore_state(assigned_dev->dev);
 
        pci_release_regions(assigned_dev->dev);
        pci_disable_device(assigned_dev->dev);
@@ -514,6 +515,7 @@  static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
        }
 
        pci_reset_function(dev);
+       pci_save_state(dev);
 
        match->assigned_dev_id = assigned_dev->assigned_dev_id;
        match->host_segnr = assigned_dev->segnr;
@@ -544,6 +546,7 @@  out:
        mutex_unlock(&kvm->lock);
        return r;
 out_list_del:
+       pci_restore_state(dev);
        list_del(&match->list);
        pci_release_regions(dev);
 out_disable: