pci-assign: Do not reset the device unless the kernel supports it

Message ID 4DED470F.4020203@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka June 6, 2011, 9:30 p.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
reset on an assigned device and corrupt its config space. Prevent
this by checking for a host kernel with the required support, tagged by
the to-be-introduced KVM_CAP_DEVICE_RESET.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

PS: What's the state of those KVM patches? Will they make it into 3.0?

 hw/device-assignment.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
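
The capability check described in the commit message can be sketched outside QEMU as a direct `KVM_CHECK_EXTENSION` ioctl, which is, as far as I understand, what QEMU's `kvm_check_extension()` wraps. This is a minimal sketch under stated assumptions: `host_has_kvm_cap` is a hypothetical helper name, and `EXAMPLE_CAP_DEVICE_RESET` is an arbitrary placeholder, since the thread does not give a number for the to-be-introduced `KVM_CAP_DEVICE_RESET`.

```c
#include <linux/kvm.h>   /* KVM_CHECK_EXTENSION */
#include <sys/ioctl.h>

/*
 * Placeholder value only: KVM_CAP_DEVICE_RESET is merely proposed in this
 * thread, so no real capability number is known here.
 */
#define EXAMPLE_CAP_DEVICE_RESET 0x7fff

/*
 * Hypothetical helper: ask the host kernel whether a KVM capability is
 * available. kvm_fd is a descriptor obtained by opening /dev/kvm.
 * KVM_CHECK_EXTENSION returns 0 for "unsupported", a positive value for
 * "supported", and -1 on error; errors are treated as "unsupported".
 */
static int host_has_kvm_cap(int kvm_fd, long cap)
{
    int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

    return ret > 0;
}
```

A caller would open `/dev/kvm`, pass the descriptor together with the capability number, and skip the device reset whenever the helper returns 0.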

Comments

Alex Williamson June 6, 2011, 9:48 p.m. UTC | #1
On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
> reset on an assigned device and corrupt its config space. Prevent
> this by checking for a host kernel with the required support, tagged by
> the to-be-introduced KVM_CAP_DEVICE_RESET.

Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
we don't have an option to do that for .38 since stable is done there,
but there are also some intel-iommu breakages that won't make stable for
that release.  It seems like the userspace invoked reset resolves known,
demonstrable issues of devices continuing to DMA into guest memory while
ed78661f is mostly a theoretical change.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> PS: What's the state of those KVM patches? Will they make it into 3.0?

The PCI save/restore ones are in:

f8fcfd775523347afe460dc3a0f45d0479e784a2
ffbdd3f7931fb7cb7e36d00d16303ec433be5145
24a4742f0be6226eb0106fbb17caf4d711d1ad43

Thanks,
Alex

> 
>  hw/device-assignment.c |   33 +++++++++++++++++++--------------
>  1 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 57d8dc0..97a1450 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1689,26 +1689,31 @@ static const VMStateDescription vmstate_assigned_device = {
>  static void reset_assigned_device(DeviceState *dev)
>  {
>      PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
> +#ifdef KVM_CAP_DEVICE_RESET
>      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
>      char reset_file[64];
>      const char reset[] = "1";
>      int fd, ret;
>  
> -    snprintf(reset_file, sizeof(reset_file),
> -             "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
> -             adev->host.seg, adev->host.bus, adev->host.dev, adev->host.func);
> -
> -    /*
> -     * Issue a device reset via pci-sysfs.  Note that we use write(2) here
> -     * and ignore the return value because some kernels have a bug that
> -     * returns 0 rather than bytes written on success, sending us into an
> -     * infinite retry loop using other write mechanisms.
> -     */
> -    fd = open(reset_file, O_WRONLY);
> -    if (fd != -1) {
> -        ret = write(fd, reset, strlen(reset));
> -        close(fd);
> +    if (kvm_check_extension(kvm_state, KVM_CAP_DEVICE_RESET)) {
> +        snprintf(reset_file, sizeof(reset_file),
> +                 "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
> +                 adev->host.seg, adev->host.bus, adev->host.dev,
> +                 adev->host.func);
> +
> +        /*
> +         * Issue a device reset via pci-sysfs.  Note that we use write(2) here
> +         * and ignore the return value because some kernels have a bug that
> +         * returns 0 rather than bytes written on success, sending us into an
> +         * infinite retry loop using other write mechanisms.
> +         */
> +        fd = open(reset_file, O_WRONLY);
> +        if (fd != -1) {
> +            ret = write(fd, reset, strlen(reset));
> +            close(fd);
> +        }
>      }
> +#endif /* KVM_CAP_DEVICE_RESET */
>  
>      /*
>       * When a 0 is written to the command register, the device is logically



Jan Kiszka June 6, 2011, 10:04 p.m. UTC | #2
On 2011-06-06 23:48, Alex Williamson wrote:
> On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
>> reset on an assigned device and corrupt its config space. Prevent
>> this by checking for a host kernel with the required support, tagged by
>> the to-be-introduced KVM_CAP_DEVICE_RESET.
> 
> Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
> we don't have an option to do that for .38 since stable is done there,
> but there are also some intel-iommu breakages that won't make stable for
> that release.  It seems like the userspace invoked reset resolves known,
> demonstrable issues of devices continuing to DMA into guest memory while
> ed78661f is mostly a theoretical change.

Easier would be this patch. But I don't mind reverting the problematic
commit in 39, whatever is preferred. We should just resolve the issue
finally.

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> PS: What's the state of those KVM patches? Will they make it into 3.0?
> 
> The PCI save/restore ones are in:
> 
> f8fcfd775523347afe460dc3a0f45d0479e784a2
> ffbdd3f7931fb7cb7e36d00d16303ec433be5145
> 24a4742f0be6226eb0106fbb17caf4d711d1ad43

Oh, they are just missing in kvm.git so far.

Jan

Avi Kivity June 7, 2011, 8:06 a.m. UTC | #3
On 06/07/2011 01:04 AM, Jan Kiszka wrote:
> On 2011-06-06 23:48, Alex Williamson wrote:
> >  On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
> >>  reset on an assigned device and corrupt its config space. Prevent
> >>  this by checking for a host kernel with the required support, tagged by
> >>  the to-be-introduced KVM_CAP_DEVICE_RESET.
> >
> >  Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
> >  we don't have an option to do that for .38 since stable is done there,
> >  but there are also some intel-iommu breakages that won't make stable for
> >  that release.  It seems like the userspace invoked reset resolves known,
> >  demonstrable issues of devices continuing to DMA into guest memory while
> >  ed78661f is mostly a theoretical change.
>
> Easier would be this patch. But I don't mind reverting the problematic
> commit in 39, whatever is preferred. We should just resolve the issue
> finally.

Kernel problems should be solved in the kernel (with exceptions of 
course, but don't see the need here).

Jan Kiszka June 7, 2011, 8:14 a.m. UTC | #4
On 2011-06-07 10:06, Avi Kivity wrote:
> On 06/07/2011 01:04 AM, Jan Kiszka wrote:
>> On 2011-06-06 23:48, Alex Williamson wrote:
>> >  On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
>> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
>> >>
>> >>  At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
>> >>  reset on an assigned device and corrupt its config space. Prevent
>> >>  this by checking for a host kernel with the required support, tagged by
>> >>  the to-be-introduced KVM_CAP_DEVICE_RESET.
>> >
>> >  Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
>> >  we don't have an option to do that for .38 since stable is done there,
>> >  but there are also some intel-iommu breakages that won't make stable for
>> >  that release.  It seems like the userspace invoked reset resolves known,
>> >  demonstrable issues of devices continuing to DMA into guest memory while
>> >  ed78661f is mostly a theoretical change.
>>
>> Easier would be this patch. But I don't mind reverting the problematic
>> commit in 39, whatever is preferred. We should just resolve the issue
>> finally.
> 
> Kernel problems should be solved in the kernel (with exceptions of
> course, but don't see the need here).

Then please file a revert for stable ASAP.

Jan

Patch

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 57d8dc0..97a1450 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1689,26 +1689,31 @@  static const VMStateDescription vmstate_assigned_device = {
 static void reset_assigned_device(DeviceState *dev)
 {
     PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
+#ifdef KVM_CAP_DEVICE_RESET
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     char reset_file[64];
     const char reset[] = "1";
     int fd, ret;
 
-    snprintf(reset_file, sizeof(reset_file),
-             "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
-             adev->host.seg, adev->host.bus, adev->host.dev, adev->host.func);
-
-    /*
-     * Issue a device reset via pci-sysfs.  Note that we use write(2) here
-     * and ignore the return value because some kernels have a bug that
-     * returns 0 rather than bytes written on success, sending us into an
-     * infinite retry loop using other write mechanisms.
-     */
-    fd = open(reset_file, O_WRONLY);
-    if (fd != -1) {
-        ret = write(fd, reset, strlen(reset));
-        close(fd);
+    if (kvm_check_extension(kvm_state, KVM_CAP_DEVICE_RESET)) {
+        snprintf(reset_file, sizeof(reset_file),
+                 "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
+                 adev->host.seg, adev->host.bus, adev->host.dev,
+                 adev->host.func);
+
+        /*
+         * Issue a device reset via pci-sysfs.  Note that we use write(2) here
+         * and ignore the return value because some kernels have a bug that
+         * returns 0 rather than bytes written on success, sending us into an
+         * infinite retry loop using other write mechanisms.
+         */
+        fd = open(reset_file, O_WRONLY);
+        if (fd != -1) {
+            ret = write(fd, reset, strlen(reset));
+            close(fd);
+        }
     }
+#endif /* KVM_CAP_DEVICE_RESET */
 
     /*
      * When a 0 is written to the command register, the device is logically
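
For readers who want to try the pci-sysfs reset step outside QEMU, the following is a minimal standalone sketch of what the hunk above does: build the `reset` attribute path from the host address and write "1" exactly once. The helper names `format_reset_path` and `issue_sysfs_reset` are hypothetical and are not part of `hw/device-assignment.c`; the path format string is taken from the patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: format the pci-sysfs reset path for a host device
 * given as segment:bus:device.function.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int format_reset_path(char *buf, size_t len, unsigned seg,
                             unsigned bus, unsigned dev, unsigned func)
{
    int n = snprintf(buf, len,
                     "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
                     seg, bus, dev, func);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: issue the reset with a single write(2).
 * As in the patch, the return value of write() is deliberately ignored and
 * the write is not retried, because some kernels report 0 instead of the
 * number of bytes written even when the reset succeeded.
 * Returns 0 if the file could be opened, -1 otherwise.
 */
static int issue_sysfs_reset(const char *path)
{
    const char one[] = "1";
    ssize_t ret;
    int fd = open(path, O_WRONLY);

    if (fd == -1) {
        return -1;
    }
    ret = write(fd, one, strlen(one));
    (void)ret; /* intentionally unchecked, see comment above */
    close(fd);
    return 0;
}
```

Writing to a real `reset` attribute requires root privileges and actually resets the device, so the second helper should only be pointed at hardware that is safe to reset.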