diff mbox series

[12/12] vdpa: fix network breakage after cancelling migration

Message ID 1707910082-10243-13-git-send-email-si-wei.liu@oracle.com (mailing list archive)
State New, archived
Headers show
Series Preparatory patches for live migration downtime improvement | expand

Commit Message

Si-Wei Liu Feb. 14, 2024, 11:28 a.m. UTC
Fix an issue where cancellation of ongoing migration ends up
with no network connectivity.

When canceling migration, SVQ will be switched back to the
passthrough mode, but the right call fd is not programed to
the device and the svq's own call fd is still used. At the
point of this transitioning period, the shadow_vqs_enabled
hadn't been set back to false yet, causing the installation
of call fd inadvertently bypassed.

Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
Cc: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Eugenio Perez Martin Feb. 15, 2024, 4:15 p.m. UTC | #1
On Wed, Feb 14, 2024 at 1:39 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Fix an issue where cancellation of ongoing migration ends up
> with no network connectivity.
>
> When canceling migration, SVQ will be switched back to the
> passthrough mode, but the right call fd is not programed to
> the device and the svq's own call fd is still used. At the
> point of this transitioning period, the shadow_vqs_enabled
> hadn't been set back to false yet, causing the installation
> of call fd inadvertently bypassed.
>
> Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
> Cc: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 004110f..dfeca8b 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1468,7 +1468,15 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>
>      /* Remember last call fd because we can switch to SVQ anytime. */
>      vhost_svq_set_svq_call_fd(svq, file->fd);
> -    if (v->shadow_vqs_enabled) {
> +    /*
> +     * When SVQ is transitioning to off, shadow_vqs_enabled has
> +     * not been set back to false yet, but the underlying call fd
> +     * will have to switch back to the guest notifier to signal the
> +     * passthrough virtqueues. In other situations, SVQ's own call
> +     * fd shall be used to signal the device model.
> +     */
> +    if (v->shadow_vqs_enabled &&
> +        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {

I think it would be great to not need to add more status variables to
vhost_vdpa (or any struct).

What if we recover the call file descriptor at vhost_vdpa_svqs_stop?
This way everything is more symmetrical as kick and call are set by
vhost_vdpa_svqs_start.

Thanks!

>          return 0;
>      }
>
> --
> 1.8.3.1
>
Eugenio Perez Martin Feb. 15, 2024, 5:10 p.m. UTC | #2
On Thu, Feb 15, 2024 at 5:15 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Feb 14, 2024 at 1:39 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> > Fix an issue where cancellation of ongoing migration ends up
> > with no network connectivity.
> >
> > When canceling migration, SVQ will be switched back to the
> > passthrough mode, but the right call fd is not programed to
> > the device and the svq's own call fd is still used. At the
> > point of this transitioning period, the shadow_vqs_enabled
> > hadn't been set back to false yet, causing the installation
> > of call fd inadvertently bypassed.
> >
> > Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
> > Cc: Eugenio Pérez <eperezma@redhat.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > ---
> >  hw/virtio/vhost-vdpa.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 004110f..dfeca8b 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1468,7 +1468,15 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >
> >      /* Remember last call fd because we can switch to SVQ anytime. */
> >      vhost_svq_set_svq_call_fd(svq, file->fd);
> > -    if (v->shadow_vqs_enabled) {
> > +    /*
> > +     * When SVQ is transitioning to off, shadow_vqs_enabled has
> > +     * not been set back to false yet, but the underlying call fd
> > +     * will have to switch back to the guest notifier to signal the
> > +     * passthrough virtqueues. In other situations, SVQ's own call
> > +     * fd shall be used to signal the device model.
> > +     */
> > +    if (v->shadow_vqs_enabled &&
> > +        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {
>
> I think it would be great to not need to add more status variables to
> vhost_vdpa (or any struct).
>
> What if we recover the call file descriptor at vhost_vdpa_svqs_stop?
> This way everything is more symmetrical as kick and call are set by
> vhost_vdpa_svqs_start.
>

Also, we could check for dev->started.

> Thanks!
>
> >          return 0;
> >      }
> >
> > --
> > 1.8.3.1
> >
Michael Tokarev March 13, 2024, 6:12 p.m. UTC | #3
14.02.2024 14:28, Si-Wei Liu wrote:
> Fix an issue where cancellation of ongoing migration ends up
> with no network connectivity.
> 
> When canceling migration, SVQ will be switched back to the
> passthrough mode, but the right call fd is not programed to
> the device and the svq's own call fd is still used. At the
> point of this transitioning period, the shadow_vqs_enabled
> hadn't been set back to false yet, causing the installation
> of call fd inadvertently bypassed.
> 
> Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
> Cc: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>   hw/virtio/vhost-vdpa.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)

Is this a -stable material?

If yes, is it also applicable for stable-7.2 (mentioned commit is in 7.2.0),
which lacks v7.2.0-2327-gb276524386 "vdpa: Remember last call fd set",
or should this one also be picked up?

Thanks,

/mjt

> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 004110f..dfeca8b 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1468,7 +1468,15 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>   
>       /* Remember last call fd because we can switch to SVQ anytime. */
>       vhost_svq_set_svq_call_fd(svq, file->fd);
> -    if (v->shadow_vqs_enabled) {
> +    /*
> +     * When SVQ is transitioning to off, shadow_vqs_enabled has
> +     * not been set back to false yet, but the underlying call fd
> +     * will have to switch back to the guest notifier to signal the
> +     * passthrough virtqueues. In other situations, SVQ's own call
> +     * fd shall be used to signal the device model.
> +     */
> +    if (v->shadow_vqs_enabled &&
> +        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {
>           return 0;
>       }
>
Michael Tokarev March 13, 2024, 6:52 p.m. UTC | #4
13.03.2024 21:12, Michael Tokarev пишет:
> 14.02.2024 14:28, Si-Wei Liu wrote:
>> Fix an issue where cancellation of ongoing migration ends up
>> with no network connectivity.
>>
>> When canceling migration, SVQ will be switched back to the
>> passthrough mode, but the right call fd is not programed to
>> the device and the svq's own call fd is still used. At the
>> point of this transitioning period, the shadow_vqs_enabled
>> hadn't been set back to false yet, causing the installation
>> of call fd inadvertently bypassed.
>>
>> Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
>> Cc: Eugenio Pérez <eperezma@redhat.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> Is this a -stable material?
> 
> If yes, is it also applicable for stable-7.2 (mentioned commit is in 7.2.0),
> which lacks v7.2.0-2327-gb276524386 "vdpa: Remember last call fd set",
> or should this one also be picked up?

Aha, this does not actually work without v8.2.0-171-gf6fe3e333f
"vdpa: move memory listener to vhost_vdpa_shared".

/mjt

>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index 004110f..dfeca8b 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -1468,7 +1468,15 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>       /* Remember last call fd because we can switch to SVQ anytime. */
>>       vhost_svq_set_svq_call_fd(svq, file->fd);
>> -    if (v->shadow_vqs_enabled) {
>> +    /*
>> +     * When SVQ is transitioning to off, shadow_vqs_enabled has
>> +     * not been set back to false yet, but the underlying call fd
>> +     * will have to switch back to the guest notifier to signal the
>> +     * passthrough virtqueues. In other situations, SVQ's own call
>> +     * fd shall be used to signal the device model.
>> +     */
>> +    if (v->shadow_vqs_enabled &&
>> +        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {
>>           return 0;
>>       }
>
Si-Wei Liu March 13, 2024, 7:10 p.m. UTC | #5
On 3/13/2024 11:12 AM, Michael Tokarev wrote:
> 14.02.2024 14:28, Si-Wei Liu wrote:
>> Fix an issue where cancellation of ongoing migration ends up
>> with no network connectivity.
>>
>> When canceling migration, SVQ will be switched back to the
>> passthrough mode, but the right call fd is not programed to
>> the device and the svq's own call fd is still used. At the
>> point of this transitioning period, the shadow_vqs_enabled
>> hadn't been set back to false yet, causing the installation
>> of call fd inadvertently bypassed.
>>
>> Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding 
>> capabilities")
>> Cc: Eugenio Pérez <eperezma@redhat.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> Is this a -stable material?
Probably yes, the pre-requisites of this patch are PATCH #10 and #11 
from this series (where SVQ_TSTATE_DISABLING gets defined and set).

>
> If yes, is it also applicable for stable-7.2 (mentioned commit is in 
> 7.2.0),
> which lacks v7.2.0-2327-gb276524386 "vdpa: Remember last call fd set",
> or should this one also be picked up?
Eugenio can judge, but seems to me the relevant code path cannot be 
effectively called as the dynamic SVQ feature (switching over to SVQ 
dynamically when migration is started) is not supported from 7.2. Maybe 
not worth it to cherry-pick this one to 7.2. Cherry-pick to stable-8.0 
and above should be applicable though (it needs some tweaks on patch #10 
to move svq_switching from @struct VhostVDPAShared to @struct vhost_vdpa).

Regards,
-Siwei

>
> Thanks,
>
> /mjt
>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index 004110f..dfeca8b 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -1468,7 +1468,15 @@ static int vhost_vdpa_set_vring_call(struct 
>> vhost_dev *dev,
>>         /* Remember last call fd because we can switch to SVQ 
>> anytime. */
>>       vhost_svq_set_svq_call_fd(svq, file->fd);
>> -    if (v->shadow_vqs_enabled) {
>> +    /*
>> +     * When SVQ is transitioning to off, shadow_vqs_enabled has
>> +     * not been set back to false yet, but the underlying call fd
>> +     * will have to switch back to the guest notifier to signal the
>> +     * passthrough virtqueues. In other situations, SVQ's own call
>> +     * fd shall be used to signal the device model.
>> +     */
>> +    if (v->shadow_vqs_enabled &&
>> +        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {
>>           return 0;
>>       }
>
Michael Tokarev March 13, 2024, 8:41 p.m. UTC | #6
13.03.2024 22:10, Si-Wei Liu wrote:
> On 3/13/2024 11:12 AM, Michael Tokarev wrote:
..
>> Is this a -stable material?
> Probably yes, the pre-requisites of this patch are PATCH #10 and #11 from this series (where SVQ_TSTATE_DISABLING gets defined and set).
> 
>>
>> If yes, is it also applicable for stable-7.2 (mentioned commit is in 7.2.0),
>> which lacks v7.2.0-2327-gb276524386 "vdpa: Remember last call fd set",
>> or should this one also be picked up?
> Eugenio can judge, but seems to me the relevant code path cannot be effectively called as the dynamic SVQ feature (switching over to SVQ dynamically 
> when migration is started) is not supported from 7.2. Maybe not worth it to cherry-pick this one to 7.2. Cherry-pick to stable-8.0 and above should be 
> applicable though (it needs some tweaks on patch #10 to move svq_switching from @struct VhostVDPAShared to @struct vhost_vdpa).

Uhh.  That's a bit larger than I thought.  With this amount of prepreqs
and manual tweaking, and with the fact this whole thing introduces Yet
Another State value, - let's omit this one.  Or at the very least I'd
love to have actual real-life testcase and a clean backport to 8.2 of
all the required changes.

Thanks,

/mjt
diff mbox series

Patch

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 004110f..dfeca8b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1468,7 +1468,15 @@  static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 
     /* Remember last call fd because we can switch to SVQ anytime. */
     vhost_svq_set_svq_call_fd(svq, file->fd);
-    if (v->shadow_vqs_enabled) {
+    /*
+     * When SVQ is transitioning to off, shadow_vqs_enabled has
+     * not been set back to false yet, but the underlying call fd
+     * will have to switch back to the guest notifier to signal the
+     * passthrough virtqueues. In other situations, SVQ's own call
+     * fd shall be used to signal the device model.
+     */
+    if (v->shadow_vqs_enabled &&
+        v->shared->svq_switching != SVQ_TSTATE_DISABLING) {
         return 0;
     }