diff mbox series

vdpa: Avoid reset when stop device

Message ID 1648024966-5170-1-git-send-email-08005325@163.com (mailing list archive)
State New, archived
Headers show
Series vdpa: Avoid reset when stop device | expand

Commit Message

08005325@163.com March 23, 2022, 8:42 a.m. UTC
From: Michael Qiu <qiudayu@archeros.com>

Currently, when VM poweroff, it will trigger vdpa
device(such as mlx bluefield2 VF) reset twice, this leads
to below issue:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This because in vhost_dev_stop(), qemu tries to stop the device,
then stop the queue: vhost_virtqueue_stop().
In vhost_dev_stop(), it resets the device, which clear some flags
in low level driver, and the driver finds
that the VQ is invalied, this is the root cause.

Actually, device reset will be called within func release()

To solve the issue, vdpa should set vring unready, and
remove reset ops in device stop: vhost_dev_start(hdev, false).

Signed-off-by: Michael Qiu<qiudayu@archeros.com>
---
 hw/virtio/vhost-vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Jason Wang March 23, 2022, 9:20 a.m. UTC | #1
Adding Eugenio,  and Ling Shan.

On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>
> From: Michael Qiu <qiudayu@archeros.com>
>
> Currently, when VM poweroff, it will trigger vdpa
> device(such as mlx bluefield2 VF) reset twice, this leads
> to below issue:
>
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>
> This because in vhost_dev_stop(), qemu tries to stop the device,
> then stop the queue: vhost_virtqueue_stop().
> In vhost_dev_stop(), it resets the device, which clear some flags
> in low level driver, and the driver finds
> that the VQ is invalied, this is the root cause.
>
> Actually, device reset will be called within func release()
>
> To solve the issue, vdpa should set vring unready, and
> remove reset ops in device stop: vhost_dev_start(hdev, false).

This is an interesting issue. Do you see a real issue except for the
above warnings.

The reason we "abuse" reset is that we don't have a stop uAPI for
vhost. We plan to add a status bit to stop the whole device in the
virtio spec, but considering it may take a while maybe we can first
introduce a new uAPI/ioctl for that.

Note that the stop doesn't just work for virtqueue but others like,
e.g config space. But considering we don't have config interrupt
support right now, we're probably fine.

Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
problematic, Ling Shan, please have a check. And we probably need a
workaround for vp_vdpa as well.

Anyhow, this seems to be better than reset. So for 7.1:

Acked-by: Jason Wang <jasowang@redhat.com>

>
> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> ---
>  hw/virtio/vhost-vdpa.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..d858b4f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>      return idx;
>  }
>
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>  {
>      int i;
>      trace_vhost_vdpa_set_vring_ready(dev);
>      for (i = 0; i < dev->nvqs; ++i) {
>          struct vhost_vring_state state = {
>              .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>          };
>          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>      }
> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          if (unlikely(!ok)) {
>              return -1;
>          }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>      } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>          ok = vhost_vdpa_svqs_stop(dev);
>          if (unlikely(!ok)) {
>              return -1;
> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          memory_listener_register(&v->listener, &address_space_memory);
>          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>      } else {
> -        vhost_vdpa_reset_device(dev);
>          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                     VIRTIO_CONFIG_S_DRIVER);
>          memory_listener_unregister(&v->listener);
> --
> 1.8.3.1
>
Si-Wei Liu March 25, 2022, 6:32 a.m. UTC | #2
On 3/23/2022 2:20 AM, Jason Wang wrote:
> Adding Eugenio,  and Ling Shan.
>
> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>> From: Michael Qiu <qiudayu@archeros.com>
>>
>> Currently, when VM poweroff, it will trigger vdpa
>> device(such as mlx bluefield2 VF) reset twice, this leads
>> to below issue:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This because in vhost_dev_stop(), qemu tries to stop the device,
>> then stop the queue: vhost_virtqueue_stop().
>> In vhost_dev_stop(), it resets the device, which clear some flags
>> in low level driver, and the driver finds
>> that the VQ is invalied, this is the root cause.
>>
>> Actually, device reset will be called within func release()
>>
>> To solve the issue, vdpa should set vring unready, and
>> remove reset ops in device stop: vhost_dev_start(hdev, false).
> This is an interesting issue. Do you see a real issue except for the
> above warnings.
>
> The reason we "abuse" reset is that we don't have a stop uAPI for
> vhost. We plan to add a status bit to stop the whole device in the
> virtio spec, but considering it may take a while maybe we can first
> introduce a new uAPI/ioctl for that.
Yep. What was missing here is a vdpa specific uAPI for per-virtqueue 
stop/suspend rather than spec level amendment to stop the whole device 
(including both vq and config space). For now we can have vDPA specific 
means to control the vq, something vDPA hardware vendor must support for 
live migration, e.g. datapath switching to shadow vq. I believe the spec 
amendment may follow to define a bit for virtio feature negotiation 
later on if needed (FWIW virtio-vdpa already does set_vq_ready(..., 0) 
to stop the vq).

However, there's a flaw in this patch, see below.
>
> Note that the stop doesn't just work for virtqueue but others like,
> e.g config space. But considering we don't have config interrupt
> support right now, we're probably fine.
>
> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
> problematic, Ling Shan, please have a check. And we probably need a
> workaround for vp_vdpa as well.
>
> Anyhow, this seems to be better than reset. So for 7.1:
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..d858b4f 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
Unfortunately, the reset can't be be removed from here as this code path 
usually involves virtio reset or status change for e.g. invoked via 
virtio_net_set_status(... , 0). Ideally we should use the 
VhostOps.vhost_reset_device() to reset the vhost-vdpa device where 
status change is involved after vhost_dev_stop() is done, but this 
distinction is not there yet as of today in all of the virtio devices 
except vhost_user_scsi.

Alternatively we may be able to do something like below, stop the 
virtqueue in vhost_vdpa_get_vring_base() in the vhost_virtqueue_stop() 
context. Only until the hardware vq is stopped, svq can stop and unmap 
then vhost-vdpa would reset the device status. It kinda works, but not 
in a perfect way...

--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct 
vhost_dev *dev, int idx)
      return idx;
  }

-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int enable)
  {
      int i;
      trace_vhost_vdpa_set_vring_ready(dev);
      for (i = 0; i < dev->nvqs; ++i) {
          struct vhost_vring_state state = {
              .index = dev->vq_index + i,
-            .num = 1,
+            .num = enable,
          };
          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
      }
@@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
*dev, bool started)

      if (started) {
          vhost_vdpa_host_notifiers_init(dev);
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
      } else {
          vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
      }
@@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct 
vhost_dev *dev,
  {
      int ret;

+    /* Deactivate the queue (best effort) */
+    vhost_vdpa_set_vring_ready(dev, 0);
+
      ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
      trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
      return ret;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 437347a..2e917d8 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
VirtIODevice *vdev)
      /* should only be called after backend is connected */
      assert(hdev->vhost_ops);

-    if (hdev->vhost_ops->vhost_dev_start) {
-        hdev->vhost_ops->vhost_dev_start(hdev, false);
-    }
      for (i = 0; i < hdev->nvqs; ++i) {
          vhost_virtqueue_stop(hdev,
                               vdev,
                               hdev->vqs + i,
                               hdev->vq_index + i);
      }
+    if (hdev->vhost_ops->vhost_dev_start) {
+        hdev->vhost_ops->vhost_dev_start(hdev, false);
+    }

      if (vhost_dev_has_iommu(hdev)) {
          if (hdev->vhost_ops->vhost_set_iotlb_callback) {

Regards,
-Siwei

>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>                                      VIRTIO_CONFIG_S_DRIVER);
>>           memory_listener_unregister(&v->listener);
>> --
>> 1.8.3.1
>>
>
Si-Wei Liu March 25, 2022, 7:19 p.m. UTC | #3
On 3/25/2022 2:00 AM, Michael Qiu wrote:
>
>
> On 2022/3/25 14:32, Si-Wei Liu wrote:
>>
>>
>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>> Adding Eugenio,  and Ling Shan.
>>>
>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>
>>>> Currently, when VM poweroff, it will trigger vdpa
>>>> device(such as mlx bluefield2 VF) reset twice, this leads
>>>> to below issue:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This because in vhost_dev_stop(), qemu tries to stop the device,
>>>> then stop the queue: vhost_virtqueue_stop().
>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>> in low level driver, and the driver finds
>>>> that the VQ is invalied, this is the root cause.
>>>>
>>>> Actually, device reset will be called within func release()
>>>>
>>>> To solve the issue, vdpa should set vring unready, and
>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>> This is an interesting issue. Do you see a real issue except for the
>>> above warnings.
>>>
>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>> vhost. We plan to add a status bit to stop the whole device in the
>>> virtio spec, but considering it may take a while maybe we can first
>>> introduce a new uAPI/ioctl for that.
>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue 
>> stop/suspend rather than spec level amendment to stop the whole 
>> device (including both vq and config space). For now we can have vDPA 
>> specific means to control the vq, something vDPA hardware vendor must 
>> support for live migration, e.g. datapath switching to shadow vq. I 
>> believe the spec amendment may follow to define a bit for virtio 
>> feature negotiation later on if needed (FWIW virtio-vdpa already does 
>> set_vq_ready(..., 0) to stop the vq).
>>
>> However, there's a flaw in this patch, see below.
>>>
>>> Note that the stop doesn't just work for virtqueue but others like,
>>> e.g config space. But considering we don't have config interrupt
>>> support right now, we're probably fine.
>>>
>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>> problematic, Ling Shan, please have a check. And we probably need a
>>> workaround for vp_vdpa as well.
>>>
>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> ---
>>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..d858b4f 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct 
>>>> vhost_dev *dev, int idx)
>>>>       return idx;
>>>>   }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, 
>>>> unsigned int ready)
>>>>   {
>>>>       int i;
>>>>       trace_vhost_vdpa_set_vring_ready(dev);
>>>>       for (i = 0; i < dev->nvqs; ++i) {
>>>>           struct vhost_vring_state state = {
>>>>               .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>           };
>>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>       }
>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct 
>>>> vhost_dev *dev, bool started)
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>>           }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>       } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>           ok = vhost_vdpa_svqs_stop(dev);
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct 
>>>> vhost_dev *dev, bool started)
>>>>           memory_listener_register(&v->listener, 
>>>> &address_space_memory);
>>>>           return vhost_vdpa_add_status(dev, 
>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>       } else {
>>>> -        vhost_vdpa_reset_device(dev);
>> Unfortunately, the reset can't be be removed from here as this code 
>> path usually involves virtio reset or status change for e.g. invoked 
>> via virtio_net_set_status(... , 0). Ideally we should use the 
>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where 
>> status change is involved after vhost_dev_stop() is done, but this 
>> distinction is not there yet as of today in all of the virtio devices 
>> except vhost_user_scsi.
>>
>
> Actually, we may not care about virtio_net_set_status(... , 0), 
> because in virtio_net_device_unrealize() will finnally call 
> qemu_del_nic(),
The reset is needed because guest can write 0 to the device status 
register to initiate device reset while VM is running, that's a very 
common scenario where virtio_net_set_status(... , 0) has to be invoked. 
Quoting the spec:

-----------------%<-----------------

2.1.2 Device Requirements: Device Status Field
The device MUST initialize device status to 0 upon reset.
...
device_status
The driver writes the device status here (see 2.1). Writing 0 into this 
field resets the device.

-----------------%<-----------------

That being said, remove vhost_vdpa_reset_device() will introduce severe 
regression to vdpa functionality, for e.g. you may see weird error or 
panic once guest is rebooted as the device state may have been messed 
up. As indicated earlier, to fix it in a clean way it would need to 
involve serious code refactoring to all callers of vhost_dev_stop, and 
converting those which require device reset to explicitly call 
VhostOps.vhost_reset_device().

> see below:
>
> qemu_del_nic()
>     -->qemu_cleanup_net_client()
>         -->cleanup/vhost_vdpa_cleanup()
>             -->qemu_close(s->vhost_vdpa.device_fd)
>
> In kernel space, close() action triggered release(),
> release()/vhost_vdpa_release()
>     --> vhost_vdpa_reset()
>
> So it will finnally do vdpa_reset, that's why I said reset will be 
> called twice in current qemu code.

That's a minor problem as nobody cares about the extra reset while guest 
is being shut off.


Regards,
-Siwei
>
> Thanks,
> Michael
>
>> Alternatively we may be able to do something like below, stop the 
>> virtqueue in vhost_vdpa_get_vring_base() in the 
>> vhost_virtqueue_stop() context. Only until the hardware vq is 
>> stopped, svq can stop and unmap then vhost-vdpa would reset the 
>> device status. It kinda works, but not in a perfect way...
>>
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct 
>> vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int 
>> enable)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = enable,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>
>>       if (started) {
>>           vhost_vdpa_host_notifiers_init(dev);
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct 
>> vhost_dev *dev,
>>   {
>>       int ret;
>>
>> +    /* Deactivate the queue (best effort) */
>> +    vhost_vdpa_set_vring_ready(dev, 0);
>> +
>>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>       return ret;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 437347a..2e917d8 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
>> VirtIODevice *vdev)
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> -    if (hdev->vhost_ops->vhost_dev_start) {
>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> -    }
>>       for (i = 0; i < hdev->nvqs; ++i) {
>>           vhost_virtqueue_stop(hdev,
>>                                vdev,
>>                                hdev->vqs + i,
>>                                hdev->vq_index + i);
>>       }
>> +    if (hdev->vhost_ops->vhost_dev_start) {
>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> +    }
>>
>>       if (vhost_dev_has_iommu(hdev)) {
>>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>
>> Regards,
>> -Siwei
>>
>>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>           memory_listener_unregister(&v->listener);
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>
>>
>
Si-Wei Liu March 25, 2022, 7:59 p.m. UTC | #4
On 3/25/2022 12:18 AM, Michael Qiu wrote:
>
>
> On 2022/3/25 14:32, Si-Wei Liu Wrote:
>>
>>
>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>> Adding Eugenio,  and Ling Shan.
>>>
>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>
>>>> Currently, when VM poweroff, it will trigger vdpa
>>>> device(such as mlx bluefield2 VF) reset twice, this leads
>>>> to below issue:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This because in vhost_dev_stop(), qemu tries to stop the device,
>>>> then stop the queue: vhost_virtqueue_stop().
>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>> in low level driver, and the driver finds
>>>> that the VQ is invalied, this is the root cause.
>>>>
>>>> Actually, device reset will be called within func release()
>>>>
>>>> To solve the issue, vdpa should set vring unready, and
>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>> This is an interesting issue. Do you see a real issue except for the
>>> above warnings.
>>>
>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>> vhost. We plan to add a status bit to stop the whole device in the
>>> virtio spec, but considering it may take a while maybe we can first
>>> introduce a new uAPI/ioctl for that.
>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue 
>> stop/suspend rather than spec level amendment to stop the whole 
>> device (including both vq and config space). For now we can have vDPA 
>> specific means to control the vq, something vDPA hardware vendor must 
>> support for live migration, e.g. datapath switching to shadow vq. I 
>> believe the spec amendment may follow to define a bit for virtio 
>> feature negotiation later on if needed (FWIW virtio-vdpa already does 
>> set_vq_ready(..., 0) to stop the vq).
>>
>> However, there's a flaw in this patch, see below.
>>>
>>> Note that the stop doesn't just work for virtqueue but others like,
>>> e.g config space. But considering we don't have config interrupt
>>> support right now, we're probably fine.
>>>
>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>> problematic, Ling Shan, please have a check. And we probably need a
>>> workaround for vp_vdpa as well.
>>>
>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> ---
>>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..d858b4f 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct 
>>>> vhost_dev *dev, int idx)
>>>>       return idx;
>>>>   }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, 
>>>> unsigned int ready)
>>>>   {
>>>>       int i;
>>>>       trace_vhost_vdpa_set_vring_ready(dev);
>>>>       for (i = 0; i < dev->nvqs; ++i) {
>>>>           struct vhost_vring_state state = {
>>>>               .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>           };
>>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>       }
>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct 
>>>> vhost_dev *dev, bool started)
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>>           }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>       } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>           ok = vhost_vdpa_svqs_stop(dev);
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct 
>>>> vhost_dev *dev, bool started)
>>>>           memory_listener_register(&v->listener, 
>>>> &address_space_memory);
>>>>           return vhost_vdpa_add_status(dev, 
>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>       } else {
>>>> -        vhost_vdpa_reset_device(dev);
>> Unfortunately, the reset can't be be removed from here as this code 
>> path usually involves virtio reset or status change for e.g. invoked 
>> via virtio_net_set_status(... , 0). Ideally we should use the 
>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where 
>> status change is involved after vhost_dev_stop() is done, but this 
>> distinction is not there yet as of today in all of the virtio devices 
>> except vhost_user_scsi.
>>
>> Alternatively we may be able to do something like below, stop the 
>> virtqueue in vhost_vdpa_get_vring_base() in the 
>> vhost_virtqueue_stop() context. Only until the hardware vq is 
>> stopped, svq can stop and unmap then vhost-vdpa would reset the 
>> device status. It kinda works, but not in a perfect way...
As I indicated above, this is an less ideal way to address the issue you 
came across about, without losing functionality or introducing 
regression to the code. Ideally it'd be best to get fixed in a clean 
way, though that would a little more effort in code refactoring. 
Personally I feel that the error message you saw is somewhat benign and 
don't think it caused any real problem. Did you see trouble if living 
with the bogus error message for the moment?

>>
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct 
>> vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int 
>> enable)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = enable,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>
>>       if (started) {
>>           vhost_vdpa_host_notifiers_init(dev);
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct 
>> vhost_dev *dev,
>>   {
>>       int ret;
>>
>> +    /* Deactivate the queue (best effort) */
>> +    vhost_vdpa_set_vring_ready(dev, 0);
>> +
>
> I don't think it's a good idea to change the state in "get" function,
>
> get means "read" not "write".
Well, if you look at the context of vhost_vdpa_get_vring_base(), the 
only caller is vhost_virtqueue_stop(). Without stopping the hardware 
ahead, it doesn't make sense to effectively get a used_index snapshot 
for resuming/restarting the vq. It might be more obvious and sensible to 
see that were to introduce another Vhost op to suspend the vq right 
before the get_vring_base() call, though I wouldn't bother doing it.

>
>>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>       return ret;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 437347a..2e917d8 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
>> VirtIODevice *vdev)
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> -    if (hdev->vhost_ops->vhost_dev_start) {
>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> -    }
>>       for (i = 0; i < hdev->nvqs; ++i) {
>>           vhost_virtqueue_stop(hdev,
>>                                vdev,
>>                                hdev->vqs + i,
>>                                hdev->vq_index + i);
>>       }
>> +    if (hdev->vhost_ops->vhost_dev_start) {
>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> +    }
>>
>
> This first idea comes to me is just like this, but at last I don't 
> choose this solution.
>
> When we start a device, first we start the virtqueue then 
> vhost_ops->vhost_dev_start.
>
> So in stop stage, in my opinion, we should just do the opposite, do as 
> the orignal code do. Change the sequential is a dangerous action.
I don't see any danger yet, would you please elaborate the specific 
problem you see? I think this sequence is as expected:

1. suspend each individual vq i.e. stop processing upcoming request, and 
possibly complete inflight requests  -> get_vring_base()
2. tear down virtio resources from VMM for e.g. unmap guest memory 
mappings, remove host notifiers, and et al
3. reset device -> vhost_vdpa_reset_device()

Regards,
-Siwei

>
> Thanks,
> Michael
>>       if (vhost_dev_has_iommu(hdev)) {
>>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>
>> Regards,
>> -Siwei
>>
>>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>           memory_listener_unregister(&v->listener);
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>
>>
Jason Wang March 30, 2022, 8:52 a.m. UTC | #5
On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/25/2022 12:18 AM, Michael Qiu wrote:
> >
> >
> > On 2022/3/25 14:32, Si-Wei Liu Wrote:
> >>
> >>
> >> On 3/23/2022 2:20 AM, Jason Wang wrote:
> >>> Adding Eugenio,  and Ling Shan.
> >>>
> >>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
> >>>> From: Michael Qiu <qiudayu@archeros.com>
> >>>>
> >>>> Currently, when VM poweroff, it will trigger vdpa
> >>>> device(such as mlx bluefield2 VF) reset twice, this leads
> >>>> to below issue:
> >>>>
> >>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>>>
> >>>> This because in vhost_dev_stop(), qemu tries to stop the device,
> >>>> then stop the queue: vhost_virtqueue_stop().
> >>>> In vhost_dev_stop(), it resets the device, which clear some flags
> >>>> in low level driver, and the driver finds
> >>>> that the VQ is invalied, this is the root cause.
> >>>>
> >>>> Actually, device reset will be called within func release()
> >>>>
> >>>> To solve the issue, vdpa should set vring unready, and
> >>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
> >>> This is an interesting issue. Do you see a real issue except for the
> >>> above warnings.
> >>>
> >>> The reason we "abuse" reset is that we don't have a stop uAPI for
> >>> vhost. We plan to add a status bit to stop the whole device in the
> >>> virtio spec, but considering it may take a while maybe we can first
> >>> introduce a new uAPI/ioctl for that.
> >> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue
> >> stop/suspend rather than spec level amendment to stop the whole
> >> device (including both vq and config space). For now we can have vDPA
> >> specific means to control the vq, something vDPA hardware vendor must
> >> support for live migration, e.g. datapath switching to shadow vq. I
> >> believe the spec amendment may follow to define a bit for virtio
> >> feature negotiation later on if needed (FWIW virtio-vdpa already does
> >> set_vq_ready(..., 0) to stop the vq).
> >>
> >> However, there's a flaw in this patch, see below.
> >>>
> >>> Note that the stop doesn't just work for virtqueue but others like,
> >>> e.g config space. But considering we don't have config interrupt
> >>> support right now, we're probably fine.
> >>>
> >>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
> >>> problematic, Ling Shan, please have a check. And we probably need a
> >>> workaround for vp_vdpa as well.
> >>>
> >>> Anyhow, this seems to be better than reset. So for 7.1:
> >>>
> >>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>>
> >>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> >>>> ---
> >>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
> >>>>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>> index c5ed7a3..d858b4f 100644
> >>>> --- a/hw/virtio/vhost-vdpa.c
> >>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct
> >>>> vhost_dev *dev, int idx)
> >>>>       return idx;
> >>>>   }
> >>>>
> >>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev,
> >>>> unsigned int ready)
> >>>>   {
> >>>>       int i;
> >>>>       trace_vhost_vdpa_set_vring_ready(dev);
> >>>>       for (i = 0; i < dev->nvqs; ++i) {
> >>>>           struct vhost_vring_state state = {
> >>>>               .index = dev->vq_index + i,
> >>>> -            .num = 1,
> >>>> +            .num = ready,
> >>>>           };
> >>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>>>       }
> >>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>>           }
> >>>> -        vhost_vdpa_set_vring_ready(dev);
> >>>> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>>>       } else {
> >>>> +        vhost_vdpa_set_vring_ready(dev, 0);
> >>>>           ok = vhost_vdpa_svqs_stop(dev);
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           memory_listener_register(&v->listener,
> >>>> &address_space_memory);
> >>>>           return vhost_vdpa_add_status(dev,
> >>>> VIRTIO_CONFIG_S_DRIVER_OK);
> >>>>       } else {
> >>>> -        vhost_vdpa_reset_device(dev);
> >> Unfortunately, the reset can't be be removed from here as this code
> >> path usually involves virtio reset or status change for e.g. invoked
> >> via virtio_net_set_status(... , 0). Ideally we should use the
> >> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where
> >> status change is involved after vhost_dev_stop() is done, but this
> >> distinction is not there yet as of today in all of the virtio devices
> >> except vhost_user_scsi.
> >>
> >> Alternatively we may be able to do something like below, stop the
> >> virtqueue in vhost_vdpa_get_vring_base() in the
> >> vhost_virtqueue_stop() context. Only until the hardware vq is
> >> stopped, svq can stop and unmap then vhost-vdpa would reset the
> >> device status. It kinda works, but not in a perfect way...
> As I indicated above, this is an less ideal way to address the issue you
> came across about, without losing functionality or introducing
> regression to the code. Ideally it'd be best to get fixed in a clean
> way, though that would a little more effort in code refactoring.
> Personally I feel that the error message you saw is somewhat benign and
> don't think it caused any real problem. Did you see trouble if living
> with the bogus error message for the moment?

Should be fine for networking devices at least since we don't care
about duplicated packets (restore last_avail_idx as used_idx).

But it might be problematic for types of devices.

Thanks


>
> >>
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct
> >> vhost_dev *dev, int idx)
> >>       return idx;
> >>   }
> >>
> >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int
> >> enable)
> >>   {
> >>       int i;
> >>       trace_vhost_vdpa_set_vring_ready(dev);
> >>       for (i = 0; i < dev->nvqs; ++i) {
> >>           struct vhost_vring_state state = {
> >>               .index = dev->vq_index + i,
> >> -            .num = 1,
> >> +            .num = enable,
> >>           };
> >>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>       }
> >> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev
> >> *dev, bool started)
> >>
> >>       if (started) {
> >>           vhost_vdpa_host_notifiers_init(dev);
> >> -        vhost_vdpa_set_vring_ready(dev);
> >> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>       } else {
> >>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>       }
> >> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct
> >> vhost_dev *dev,
> >>   {
> >>       int ret;
> >>
> >> +    /* Deactivate the queue (best effort) */
> >> +    vhost_vdpa_set_vring_ready(dev, 0);
> >> +
> >
> > I don't think it's a good idea to change the state in "get" function,
> >
> > get means "read" not "write".
> Well, if you look at the context of vhost_vdpa_get_vring_base(), the
> only caller is vhost_virtqueue_stop(). Without stopping the hardware
> ahead, it doesn't make sense to effectively get a used_index snapshot
> for resuming/restarting the vq. It might be more obvious and sensible to
> see that were to introduce another Vhost op to suspend the vq right
> before the get_vring_base() call, though I wouldn't bother doing it.
>
> >
> >>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> >>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
> >>       return ret;
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 437347a..2e917d8 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev,
> >> VirtIODevice *vdev)
> >>       /* should only be called after backend is connected */
> >>       assert(hdev->vhost_ops);
> >>
> >> -    if (hdev->vhost_ops->vhost_dev_start) {
> >> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> -    }
> >>       for (i = 0; i < hdev->nvqs; ++i) {
> >>           vhost_virtqueue_stop(hdev,
> >>                                vdev,
> >>                                hdev->vqs + i,
> >>                                hdev->vq_index + i);
> >>       }
> >> +    if (hdev->vhost_ops->vhost_dev_start) {
> >> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> +    }
> >>
> >
> > This first idea comes to me is just like this, but at last I don't
> > choose this solution.
> >
> > When we start a device, first we start the virtqueue then
> > vhost_ops->vhost_dev_start.
> >
> > So in stop stage, in my opinion, we should just do the opposite, do as
> > the orignal code do. Change the sequential is a dangerous action.
> I don't see any danger yet, would you please elaborate the specific
> problem you see? I think this sequence is as expected:
>
> 1. suspend each individual vq i.e. stop processing upcoming request, and
> possibly complete inflight requests  -> get_vring_base()
> 2. tear down virtio resources from VMM for e.g. unmap guest memory
> mappings, remove host notifiers, and et al
> 3. reset device -> vhost_vdpa_reset_device()
>
> Regards,
> -Siwei
>
> >
> > Thanks,
> > Michael
> >>       if (vhost_dev_has_iommu(hdev)) {
> >>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >>
> >> Regards,
> >> -Siwei
> >>
> >>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>>> VIRTIO_CONFIG_S_DRIVER);
> >>>>           memory_listener_unregister(&v->listener);
> >>>> --
> >>>> 1.8.3.1
> >>>>
> >>>
> >>
> >>
>
Michael Qiu March 30, 2022, 9:53 a.m. UTC | #6
On 2022/3/30 16:52, Jason Wang wrote:
> On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>>
>> On 3/25/2022 12:18 AM, Michael Qiu wrote:
>>>
>>>
>>> On 2022/3/25 14:32, Si-Wei Liu Wrote:
>>>>
>>>>
>>>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>>>> Adding Eugenio,  and Ling Shan.
>>>>>
>>>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>>>
>>>>>> Currently, when VM poweroff, it will trigger vdpa
>>>>>> device(such as mlx bluefield2 VF) reset twice, this leads
>>>>>> to below issue:
>>>>>>
>>>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>>>
>>>>>> This because in vhost_dev_stop(), qemu tries to stop the device,
>>>>>> then stop the queue: vhost_virtqueue_stop().
>>>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>>>> in low level driver, and the driver finds
>>>>>> that the VQ is invalied, this is the root cause.
>>>>>>
>>>>>> Actually, device reset will be called within func release()
>>>>>>
>>>>>> To solve the issue, vdpa should set vring unready, and
>>>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>> This is an interesting issue. Do you see a real issue except for the
>>>>> above warnings.
>>>>>
>>>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>>>> vhost. We plan to add a status bit to stop the whole device in the
>>>>> virtio spec, but considering it may take a while maybe we can first
>>>>> introduce a new uAPI/ioctl for that.
>>>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue
>>>> stop/suspend rather than spec level amendment to stop the whole
>>>> device (including both vq and config space). For now we can have vDPA
>>>> specific means to control the vq, something vDPA hardware vendor must
>>>> support for live migration, e.g. datapath switching to shadow vq. I
>>>> believe the spec amendment may follow to define a bit for virtio
>>>> feature negotiation later on if needed (FWIW virtio-vdpa already does
>>>> set_vq_ready(..., 0) to stop the vq).
>>>>
>>>> However, there's a flaw in this patch, see below.
>>>>>
>>>>> Note that the stop doesn't just work for virtqueue but others like,
>>>>> e.g config space. But considering we don't have config interrupt
>>>>> support right now, we're probably fine.
>>>>>
>>>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>>>> problematic, Ling Shan, please have a check. And we probably need a
>>>>> workaround for vp_vdpa as well.
>>>>>
>>>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>>>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>>
>>>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>>>> ---
>>>>>>    hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>> index c5ed7a3..d858b4f 100644
>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct
>>>>>> vhost_dev *dev, int idx)
>>>>>>        return idx;
>>>>>>    }
>>>>>>
>>>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev,
>>>>>> unsigned int ready)
>>>>>>    {
>>>>>>        int i;
>>>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>>>            struct vhost_vring_state state = {
>>>>>>                .index = dev->vq_index + i,
>>>>>> -            .num = 1,
>>>>>> +            .num = ready,
>>>>>>            };
>>>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>>>        }
>>>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct
>>>>>> vhost_dev *dev, bool started)
>>>>>>            if (unlikely(!ok)) {
>>>>>>                return -1;
>>>>>>            }
>>>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>>>        } else {
>>>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>>>            if (unlikely(!ok)) {
>>>>>>                return -1;
>>>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct
>>>>>> vhost_dev *dev, bool started)
>>>>>>            memory_listener_register(&v->listener,
>>>>>> &address_space_memory);
>>>>>>            return vhost_vdpa_add_status(dev,
>>>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>>        } else {
>>>>>> -        vhost_vdpa_reset_device(dev);
>>>> Unfortunately, the reset can't be be removed from here as this code
>>>> path usually involves virtio reset or status change for e.g. invoked
>>>> via virtio_net_set_status(... , 0). Ideally we should use the
>>>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where
>>>> status change is involved after vhost_dev_stop() is done, but this
>>>> distinction is not there yet as of today in all of the virtio devices
>>>> except vhost_user_scsi.
>>>>
>>>> Alternatively we may be able to do something like below, stop the
>>>> virtqueue in vhost_vdpa_get_vring_base() in the
>>>> vhost_virtqueue_stop() context. Only until the hardware vq is
>>>> stopped, svq can stop and unmap then vhost-vdpa would reset the
>>>> device status. It kinda works, but not in a perfect way...
>> As I indicated above, this is an less ideal way to address the issue you
>> came across about, without losing functionality or introducing
>> regression to the code. Ideally it'd be best to get fixed in a clean
>> way, though that would a little more effort in code refactoring.
>> Personally I feel that the error message you saw is somewhat benign and
>> don't think it caused any real problem. Did you see trouble if living
>> with the bogus error message for the moment?
> 
> Should be fine for networking devices at least since we don't care
> about duplicated packets (restore last_avail_idx as used_idx).
> 
> But it might be problematic for types of devices.
> 
> Thanks
> 

I find that the second reset does not triggered by device qemu_del_nic, 
although it will trigger device reset, the rootcause is that,
we try to stop the vhost devices in each queue pair and control queue, 
each queue pair or control queue needs one vhost device. And all vhost 
devices share one kernel vdpa device.

In my case, only enable 1 datapath queue pair and one control queue.

the first time vhost_net_stop_one is for datapath queue pair, but
at this time the backend kernel vdpa device has been reset, in the
second loop, when stop control queue backend vhost device, the queue
already disappeared.

I will send V2 to fix the issue, to reset the device in the last vhost 
device stop stage.

> 
>>
>>>>
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct
>>>> vhost_dev *dev, int idx)
>>>>        return idx;
>>>>    }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int
>>>> enable)
>>>>    {
>>>>        int i;
>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>            struct vhost_vring_state state = {
>>>>                .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = enable,
>>>>            };
>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>        }
>>>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev
>>>> *dev, bool started)
>>>>
>>>>        if (started) {
>>>>            vhost_vdpa_host_notifiers_init(dev);
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>        } else {
>>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>>        }
>>>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct
>>>> vhost_dev *dev,
>>>>    {
>>>>        int ret;
>>>>
>>>> +    /* Deactivate the queue (best effort) */
>>>> +    vhost_vdpa_set_vring_ready(dev, 0);
>>>> +
>>>
>>> I don't think it's a good idea to change the state in "get" function,
>>>
>>> get means "read" not "write".
>> Well, if you look at the context of vhost_vdpa_get_vring_base(), the
>> only caller is vhost_virtqueue_stop(). Without stopping the hardware
>> ahead, it doesn't make sense to effectively get a used_index snapshot
>> for resuming/restarting the vq. It might be more obvious and sensible to
>> see that were to introduce another Vhost op to suspend the vq right
>> before the get_vring_base() call, though I wouldn't bother doing it.
>>
>>>
>>>>        ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>>>        trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>>>        return ret;
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index 437347a..2e917d8 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev,
>>>> VirtIODevice *vdev)
>>>>        /* should only be called after backend is connected */
>>>>        assert(hdev->vhost_ops);
>>>>
>>>> -    if (hdev->vhost_ops->vhost_dev_start) {
>>>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>>>> -    }
>>>>        for (i = 0; i < hdev->nvqs; ++i) {
>>>>            vhost_virtqueue_stop(hdev,
>>>>                                 vdev,
>>>>                                 hdev->vqs + i,
>>>>                                 hdev->vq_index + i);
>>>>        }
>>>> +    if (hdev->vhost_ops->vhost_dev_start) {
>>>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>>>> +    }
>>>>
>>>
>>> This first idea comes to me is just like this, but at last I don't
>>> choose this solution.
>>>
>>> When we start a device, first we start the virtqueue then
>>> vhost_ops->vhost_dev_start.
>>>
>>> So in stop stage, in my opinion, we should just do the opposite, do as
>>> the orignal code do. Change the sequential is a dangerous action.
>> I don't see any danger yet, would you please elaborate the specific
>> problem you see? I think this sequence is as expected:
>>
>> 1. suspend each individual vq i.e. stop processing upcoming request, and
>> possibly complete inflight requests  -> get_vring_base()
>> 2. tear down virtio resources from VMM for e.g. unmap guest memory
>> mappings, remove host notifiers, and et al
>> 3. reset device -> vhost_vdpa_reset_device()
>>
>> Regards,
>> -Siwei
>>
>>>
>>> Thanks,
>>> Michael
>>>>        if (vhost_dev_has_iommu(hdev)) {
>>>>            if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>>>            vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>>>            memory_listener_unregister(&v->listener);
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>>>>>
>>>>
>>>>
>>
> 
>
diff mbox series

Patch

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..d858b4f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -719,14 +719,14 @@  static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1088,9 @@  static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,7 +1106,6 @@  static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
         vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                    VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);