Message ID | 1302632571.3589.115.camel@x201 (mailing list archive) |
---|---|
State | New, archived |
On 2011-04-12 20:22, Alex Williamson wrote:
> On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
>> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
>>> Hi All,
>>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
>>>
>>> We found 1 bug about "NIC cannot work when it had been used before ".
>>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
>>>
>>> New issue:
>>> 1.[VT-d] NIC cannot work when it had been used before
>>> https://bugs.launchpad.net/qemu/+bug/754591
>>>
>>
>> += Alex.
>>
>
> This is caused by the patch below. When we do a reset via PCI sysfs,
> the device state is saved and restored around the reset. When the state
> is restored, the saved state is invalidated. Now when we go to free the
> device, we call the "I know what I'm doing" __pci_reset_function(),
> which doesn't save/restore state, then try to do a restore, but there's
> nothing saved, so the device only has reset values... oops.
>
> Jan, do you actually have a test case where you can see a difference
> restoring the original saved state? I'm tempted to suggest we just
> revert this patch. Otherwise it seems like we need an interface to extract
> and reload the original saved state for the device. Thanks,

I've no test case, but the issue is clear: we used to leak guest
manipulations of the config space to the host or the new owner.

However, I'm first of all wondering why the heck libvirt should issue a
sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
this is actually the case? Then it should be fixed independently as it
would be a bug (proper pattern would be: deassign, reset, reassign).

That said, our way of relying on the consistency of the saved state
between assignment and release is in fact a bit fragile. We should
probably make it more robust as you suggested.

Jan
On Tue, 2011-04-12 at 23:02 +0200, Jan Kiszka wrote:
> On 2011-04-12 20:22, Alex Williamson wrote:
> > On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
> >> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
> >>> Hi All,
> >>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
> >>>
> >>> We found 1 bug about "NIC cannot work when it had been used before ".
> >>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
> >>>
> >>> New issue:
> >>> 1.[VT-d] NIC cannot work when it had been used before
> >>> https://bugs.launchpad.net/qemu/+bug/754591
> >>>
> >>
> >> += Alex.
> >>
> >
> > This is caused by the patch below. When we do a reset via PCI sysfs,
> > the device state is saved and restored around the reset. When the state
> > is restored, the saved state is invalidated. Now when we go to free the
> > device, we call the "I know what I'm doing" __pci_reset_function(),
> > which doesn't save/restore state, then try to do a restore, but there's
> > nothing saved, so the device only has reset values... oops.
> >
> > Jan, do you actually have a test case where you can see a difference
> > restoring the original saved state? I'm tempted to suggest we just
> > revert this patch. Otherwise it seems like we need an interface to extract
> > and reload the original saved state for the device. Thanks,
>
> I've no test case, but the issue is clear: we used to leak guest
> manipulations of the config space to the host or the new owner.

But is there any state that we care about that isn't flushed by the
device reset? Sure there might be some subtle config space changes, but
the guest can't change fundamental mappings or anything.

> However, I'm first of all wondering why the heck libvirt should issue a
> sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
> this is actually the case? Then it should be fixed independently as it
> would be a bug (proper pattern would be: deassign, reset, reassign).

libvirt does its own weird device resets, but that's prior to handing
the device off to qemu, so doesn't come into play. The change is
d9488459 where qemu resets the device via PCI sysfs on VM reset.
Without this, a device can continue to DMA across the reset, trashing
guest memory. I think qemu, as the owner of the device, has every right
to do this, and it's the only reasonable way to quiesce the device.

> That said, our way of relying on the consistency of the saved state
> between assignment and release is in fact a bit fragile. We should
> probably make it more robust as you suggested.

That may take some time, and since we don't actually have a test case or
known issue, I vote to revert ed78661f. Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
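
For readers unfamiliar with the sysfs reset mentioned above: a function reset can be requested from userspace by writing "1" to the device's `reset` attribute under `/sys/bus/pci/devices/<address>/`. The sketch below is illustrative only and is not the qemu code from d9488459; the function name `pci_sysfs_reset` and its error convention are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from qemu): request a function reset by writing
 * "1" to a PCI sysfs reset attribute, e.g.
 * /sys/bus/pci/devices/<domain:bus:slot.func>/reset.
 * Returns 0 on success or a negative errno value on failure.
 */
int pci_sysfs_reset(const char *reset_path)
{
    assert(reset_path != NULL);

    int fd = open(reset_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* The kernel performs the reset when a non-zero value is written. */
    ssize_t n = write(fd, "1", 1);
    int err = 0;
    if (n < 0)
        err = -errno;
    else if (n != 1)
        err = -EIO;

    close(fd);
    return err;
}
```

On the kernel discussed in this thread, a reset requested this way goes through the save/reset/restore sequence Alex describes, which is what invalidates the state KVM saved at assignment time.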
On 2011-04-12 23:20, Alex Williamson wrote:
> On Tue, 2011-04-12 at 23:02 +0200, Jan Kiszka wrote:
>> On 2011-04-12 20:22, Alex Williamson wrote:
>>> On Tue, 2011-04-12 at 11:02 +0300, Avi Kivity wrote:
>>>> On 04/12/2011 10:48 AM, Ren, Yongjie wrote:
>>>>> Hi All,
>>>>> This is KVM test result against kvm.git 7a7ada1bfb958d2ad722d0df9299f1b0136ec1d4 based on kernel 2.6.39-rc2+, and qemu-kvm.git df85c051d780bca0ee2462cfeb8ef6d9552a19b0.
>>>>>
>>>>> We found 1 bug about "NIC cannot work when it had been used before ".
>>>>> The VT-d bug 730441 (qemu bugzilla) concerning "nomsi NIC" is fixed.
>>>>>
>>>>> New issue:
>>>>> 1.[VT-d] NIC cannot work when it had been used before
>>>>> https://bugs.launchpad.net/qemu/+bug/754591
>>>>>
>>>>
>>>> += Alex.
>>>>
>>>
>>> This is caused by the patch below. When we do a reset via PCI sysfs,
>>> the device state is saved and restored around the reset. When the state
>>> is restored, the saved state is invalidated. Now when we go to free the
>>> device, we call the "I know what I'm doing" __pci_reset_function(),
>>> which doesn't save/restore state, then try to do a restore, but there's
>>> nothing saved, so the device only has reset values... oops.
>>>
>>> Jan, do you actually have a test case where you can see a difference
>>> restoring the original saved state? I'm tempted to suggest we just
>>> revert this patch. Otherwise it seems like we need an interface to extract
>>> and reload the original saved state for the device. Thanks,
>>
>> I've no test case, but the issue is clear: we used to leak guest
>> manipulations of the config space to the host or the new owner.
>
> But is there any state that we care about that isn't flushed by the
> device reset? Sure there might be some subtle config space changes, but
> the guest can't change fundamental mappings or anything.

I haven't written a "malicious" guest driver yet, but one of the easiest
ways to confuse the host is to disable the device's INTx before the
guest terminates. That would survive the old way of releasing and
resetting the device.

>
>> However, I'm first of all wondering why the heck libvirt should issue a
>> sysfs PCI reset while the device is in KVM/guest hands? Is it clear that
>> this is actually the case? Then it should be fixed independently as it
>> would be a bug (proper pattern would be: deassign, reset, reassign).
>
> libvirt does its own weird device resets, but that's prior to handing
> the device off to qemu, so doesn't come into play. The change is
> d9488459 where qemu resets the device via PCI sysfs on VM reset.
> Without this, a device can continue to DMA across the reset, trashing
> guest memory. I think qemu, as the owner of the device, has every right
> to do this, and it's the only reasonable way to quiesce the device.
>
>> That said, our way of relying on the consistency of the saved state
>> between assignment and release is in fact a bit fragile. We should
>> probably make it more robust as you suggested.
>
> That may take some time, and since we don't actually have a test case or
> known issue, I vote to revert ed78661f.

Some alternative ways to address the issue in qemu:
 - save/restore the full config space in qemu (won't help if qemu
   terminates prematurely)
 - drop&reacquire the device before/after the reset
 - export a pure reset service via kvm to userspace

Jan
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7623408..d389207 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,7 +197,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
-	pci_reset_function(assigned_dev->dev);
+	__pci_reset_function(assigned_dev->dev);
+	pci_restore_state(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
@@ -514,6 +515,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	}
 
 	pci_reset_function(dev);
+	pci_save_state(dev);
 
 	match->assigned_dev_id = assigned_dev->assigned_dev_id;
 	match->host_segnr = assigned_dev->segnr;
@@ -544,6 +546,7 @@ out:
 	mutex_unlock(&kvm->lock);
 	return r;
 out_list_del:
+	pci_restore_state(dev);
 	list_del(&match->list);
 	pci_release_regions(dev);
 out_disable:
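
The interaction discussed in the thread can be replayed with a small userspace model. This is only a sketch under stated assumptions: it is not kernel code, every `model_*` name and both `release_*` functions are hypothetical, the device is reduced to a single command register whose reset value is assumed to be 0, and the "restore consumes the saved state" rule is taken from Alex's description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model values; these are not kernel definitions. */
#define MODEL_INTX_DISABLE 0x0400u /* stands in for the INTx Disable command bit */
#define MODEL_HOST_CMD     0x0007u /* assumed host setup before assignment */

struct model_dev {
    uint16_t cmd;       /* live command register */
    uint16_t saved_cmd; /* snapshot taken by model_save_state() */
    bool state_saved;   /* true while the snapshot is valid */
};

static void model_save_state(struct model_dev *d)
{
    d->saved_cmd = d->cmd;
    d->state_saved = true;
}

/* Restoring consumes the snapshot; with nothing saved it is a no-op. */
static void model_restore_state(struct model_dev *d)
{
    if (!d->state_saved)
        return;
    d->cmd = d->saved_cmd;
    d->state_saved = false;
}

/* Bare reset without save/restore, standing in for __pci_reset_function(). */
static void model_raw_reset(struct model_dev *d)
{
    d->cmd = 0;
}

/* Save, reset, restore: stands in for pci_reset_function() and the sysfs reset. */
static void model_reset(struct model_dev *d)
{
    model_save_state(d);
    model_raw_reset(d);
    model_restore_state(d);
}

/* Release path before the patch: the reset preserves whatever the guest left. */
uint16_t release_old(bool guest_disables_intx)
{
    struct model_dev d = { .cmd = MODEL_HOST_CMD };

    model_reset(&d);                    /* assign */
    if (guest_disables_intx)
        d.cmd |= MODEL_INTX_DISABLE;    /* guest manipulation */
    model_reset(&d);                    /* free */
    return d.cmd;
}

/* Release path with the patch, optionally with a reset while assigned. */
uint16_t release_patched(bool guest_disables_intx, bool reset_while_assigned)
{
    struct model_dev d = { .cmd = MODEL_HOST_CMD };

    model_reset(&d);                    /* assign ... */
    model_save_state(&d);               /* ... and keep the host state */
    if (guest_disables_intx)
        d.cmd |= MODEL_INTX_DISABLE;
    if (reset_while_assigned)
        model_reset(&d);                /* consumes the assign-time snapshot */
    model_raw_reset(&d);                /* free */
    model_restore_state(&d);
    return d.cmd;
}

/* Checks the three outcomes discussed in the thread. */
void model_selfcheck(void)
{
    assert(release_old(true) == 0x0407);            /* guest's INTx disable leaks */
    assert(release_patched(true, false) == 0x0007); /* host state comes back */
    assert(release_patched(true, true) == 0x0000);  /* only reset values remain */
}
```

Calling `model_selfcheck()` from a `main()` exercises all three cases: the old release path carries the guest's INTx disable over to the next owner, the patched path restores the host state, and the patched path combined with a reset during assignment leaves the device with reset values only, which matches the reported NIC failure.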