diff mbox series

[v3,14/17] vfio/migration: Reset device if setting recover state fails

Message ID 20221103161620.13120-15-avihaih@nvidia.com (mailing list archive)
State New, archived
Headers show
Series vfio/migration: Implement VFIO migration protocol v2 | expand

Commit Message

Avihai Horon Nov. 3, 2022, 4:16 p.m. UTC
If vfio_migration_set_state() fails to set the device in the requested
state it tries to put it in a recover state. If setting the device in
the recover state fails as well, hw_error is triggered and the VM is
aborted.

To improve user experience and avoid VM data loss, reset the device with
VFIO_RESET_DEVICE instead of aborting the VM.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Alex Williamson Nov. 16, 2022, 6:36 p.m. UTC | #1
On Thu, 3 Nov 2022 18:16:17 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> If vfio_migration_set_state() fails to set the device in the requested
> state it tries to put it in a recover state. If setting the device in
> the recover state fails as well, hw_error is triggered and the VM is
> aborted.
> 
> To improve user experience and avoid VM data loss, reset the device with
> VFIO_RESET_DEVICE instead of aborting the VM.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/migration.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index f8c3228314..e8068b9147 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>  
>          mig_state->device_state = recover_state;
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -            hw_error("%s: Failed setting device in recover state, err: %s",
> -                     vbasedev->name, strerror(errno));
> +            error_report(
> +                "%s: Failed setting device in recover state, err: %s. Resetting device",
> +                         vbasedev->name, strerror(errno));
> +
> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> +                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
> +                         strerror(errno));
> +            }
> +
> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> +
> +            return -1;
>          }
>  
>          migration->device_state = recover_state;

This addresses one of my comments on 12/ and should probably be rolled
in there.  Thanks,

Alex
Avihai Horon Nov. 17, 2022, 5:11 p.m. UTC | #2
On 16/11/2022 20:36, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, 3 Nov 2022 18:16:17 +0200
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> If vfio_migration_set_state() fails to set the device in the requested
>> state it tries to put it in a recover state. If setting the device in
>> the recover state fails as well, hw_error is triggered and the VM is
>> aborted.
>>
>> To improve user experience and avoid VM data loss, reset the device with
>> VFIO_RESET_DEVICE instead of aborting the VM.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index f8c3228314..e8068b9147 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>>
>>           mig_state->device_state = recover_state;
>>           if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> -            hw_error("%s: Failed setting device in recover state, err: %s",
>> -                     vbasedev->name, strerror(errno));
>> +            error_report(
>> +                "%s: Failed setting device in recover state, err: %s. Resetting device",
>> +                         vbasedev->name, strerror(errno));
>> +
>> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
>> +                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
>> +                         strerror(errno));
>> +            }
>> +
>> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
>> +
>> +            return -1;
>>           }
>>
>>           migration->device_state = recover_state;
> This addresses one of my comments on 12/ and should probably be rolled
> in there.

Not sure to which comment you refer to. Could you elaborate?

Thanks!

>    Thanks,
>
> Alex
>
Alex Williamson Nov. 17, 2022, 6:18 p.m. UTC | #3
On Thu, 17 Nov 2022 19:11:47 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> On 16/11/2022 20:36, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, 3 Nov 2022 18:16:17 +0200
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >  
> >> If vfio_migration_set_state() fails to set the device in the requested
> >> state it tries to put it in a recover state. If setting the device in
> >> the recover state fails as well, hw_error is triggered and the VM is
> >> aborted.
> >>
> >> To improve user experience and avoid VM data loss, reset the device with
> >> VFIO_RESET_DEVICE instead of aborting the VM.
> >>
> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >> ---
> >>   hw/vfio/migration.c | 14 ++++++++++++--
> >>   1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index f8c3228314..e8068b9147 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
> >>
> >>           mig_state->device_state = recover_state;
> >>           if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> >> -            hw_error("%s: Failed setting device in recover state, err: %s",
> >> -                     vbasedev->name, strerror(errno));
> >> +            error_report(
> >> +                "%s: Failed setting device in recover state, err: %s. Resetting device",
> >> +                         vbasedev->name, strerror(errno));
> >> +
> >> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> >> +                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
> >> +                         strerror(errno));
> >> +            }
> >> +
> >> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> >> +
> >> +            return -1;
> >>           }
> >>
> >>           migration->device_state = recover_state;  
> > This addresses one of my comments on 12/ and should probably be rolled
> > in there.  
> 
> Not sure to which comment you refer to. Could you elaborate?

Hmm, I guess I thought this was in the section immediately following
where I questioned going to recovery state.  I'm still not sure why
this is a separate patch from the initial implementation of the
function in 12/ though.  Thanks,
'
Alex
Avihai Horon Nov. 20, 2022, 9:39 a.m. UTC | #4
On 17/11/2022 20:18, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, 17 Nov 2022 19:11:47 +0200
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> On 16/11/2022 20:36, Alex Williamson wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Thu, 3 Nov 2022 18:16:17 +0200
>>> Avihai Horon <avihaih@nvidia.com> wrote:
>>>
>>>> If vfio_migration_set_state() fails to set the device in the requested
>>>> state it tries to put it in a recover state. If setting the device in
>>>> the recover state fails as well, hw_error is triggered and the VM is
>>>> aborted.
>>>>
>>>> To improve user experience and avoid VM data loss, reset the device with
>>>> VFIO_RESET_DEVICE instead of aborting the VM.
>>>>
>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>> ---
>>>>    hw/vfio/migration.c | 14 ++++++++++++--
>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index f8c3228314..e8068b9147 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>>>>
>>>>            mig_state->device_state = recover_state;
>>>>            if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>>>> -            hw_error("%s: Failed setting device in recover state, err: %s",
>>>> -                     vbasedev->name, strerror(errno));
>>>> +            error_report(
>>>> +                "%s: Failed setting device in recover state, err: %s. Resetting device",
>>>> +                         vbasedev->name, strerror(errno));
>>>> +
>>>> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
>>>> +                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
>>>> +                         strerror(errno));
>>>> +            }
>>>> +
>>>> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
>>>> +
>>>> +            return -1;
>>>>            }
>>>>
>>>>            migration->device_state = recover_state;
>>> This addresses one of my comments on 12/ and should probably be rolled
>>> in there.
>> Not sure to which comment you refer to. Could you elaborate?
> Hmm, I guess I thought this was in the section immediately following
> where I questioned going to recovery state.  I'm still not sure why
> this is a separate patch from the initial implementation of the
> function in 12/ though.

This adds new functionality comparing to v1, so I thought this should be 
in its own patch.

I can squash it to patch 12 if you want.
diff mbox series

Patch

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f8c3228314..e8068b9147 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -92,8 +92,18 @@  static int vfio_migration_set_state(VFIODevice *vbasedev,
 
         mig_state->device_state = recover_state;
         if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-            hw_error("%s: Failed setting device in recover state, err: %s",
-                     vbasedev->name, strerror(errno));
+            error_report(
+                "%s: Failed setting device in recover state, err: %s. Resetting device",
+                         vbasedev->name, strerror(errno));
+
+            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
+                         strerror(errno));
+            }
+
+            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+
+            return -1;
         }
 
         migration->device_state = recover_state;