Message ID | 20221103161620.13120-15-avihaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/migration: Implement VFIO migration protocol v2 | expand |
On Thu, 3 Nov 2022 18:16:17 +0200 Avihai Horon <avihaih@nvidia.com> wrote: > If vfio_migration_set_state() fails to set the device in the requested > state it tries to put it in a recover state. If setting the device in > the recover state fails as well, hw_error is triggered and the VM is > aborted. > > To improve user experience and avoid VM data loss, reset the device with > VFIO_RESET_DEVICE instead of aborting the VM. > > Signed-off-by: Avihai Horon <avihaih@nvidia.com> > --- > hw/vfio/migration.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index f8c3228314..e8068b9147 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, > > mig_state->device_state = recover_state; > if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { > - hw_error("%s: Failed setting device in recover state, err: %s", > - vbasedev->name, strerror(errno)); > + error_report( > + "%s: Failed setting device in recover state, err: %s. Resetting device", > + vbasedev->name, strerror(errno)); > + > + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { > + hw_error("%s: Failed resetting device, err: %s", vbasedev->name, > + strerror(errno)); > + } > + > + migration->device_state = VFIO_DEVICE_STATE_RUNNING; > + > + return -1; > } > > migration->device_state = recover_state; This addresses one of my comments on 12/ and should probably be rolled in there. Thanks, Alex
On 16/11/2022 20:36, Alex Williamson wrote: > External email: Use caution opening links or attachments > > > On Thu, 3 Nov 2022 18:16:17 +0200 > Avihai Horon <avihaih@nvidia.com> wrote: > >> If vfio_migration_set_state() fails to set the device in the requested >> state it tries to put it in a recover state. If setting the device in >> the recover state fails as well, hw_error is triggered and the VM is >> aborted. >> >> To improve user experience and avoid VM data loss, reset the device with >> VFIO_RESET_DEVICE instead of aborting the VM. >> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com> >> --- >> hw/vfio/migration.c | 14 ++++++++++++-- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index f8c3228314..e8068b9147 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, >> >> mig_state->device_state = recover_state; >> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { >> - hw_error("%s: Failed setting device in recover state, err: %s", >> - vbasedev->name, strerror(errno)); >> + error_report( >> + "%s: Failed setting device in recover state, err: %s. Resetting device", >> + vbasedev->name, strerror(errno)); >> + >> + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { >> + hw_error("%s: Failed resetting device, err: %s", vbasedev->name, >> + strerror(errno)); >> + } >> + >> + migration->device_state = VFIO_DEVICE_STATE_RUNNING; >> + >> + return -1; >> } >> >> migration->device_state = recover_state; > This addresses one of my comments on 12/ and should probably be rolled > in there. Not sure to which comment you refer to. Could you elaborate? Thanks! > Thanks, > > Alex >
On Thu, 17 Nov 2022 19:11:47 +0200 Avihai Horon <avihaih@nvidia.com> wrote: > On 16/11/2022 20:36, Alex Williamson wrote: > > External email: Use caution opening links or attachments > > > > > > On Thu, 3 Nov 2022 18:16:17 +0200 > > Avihai Horon <avihaih@nvidia.com> wrote: > > > >> If vfio_migration_set_state() fails to set the device in the requested > >> state it tries to put it in a recover state. If setting the device in > >> the recover state fails as well, hw_error is triggered and the VM is > >> aborted. > >> > >> To improve user experience and avoid VM data loss, reset the device with > >> VFIO_RESET_DEVICE instead of aborting the VM. > >> > >> Signed-off-by: Avihai Horon <avihaih@nvidia.com> > >> --- > >> hw/vfio/migration.c | 14 ++++++++++++-- > >> 1 file changed, 12 insertions(+), 2 deletions(-) > >> > >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > >> index f8c3228314..e8068b9147 100644 > >> --- a/hw/vfio/migration.c > >> +++ b/hw/vfio/migration.c > >> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, > >> > >> mig_state->device_state = recover_state; > >> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { > >> - hw_error("%s: Failed setting device in recover state, err: %s", > >> - vbasedev->name, strerror(errno)); > >> + error_report( > >> + "%s: Failed setting device in recover state, err: %s. Resetting device", > >> + vbasedev->name, strerror(errno)); > >> + > >> + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { > >> + hw_error("%s: Failed resetting device, err: %s", vbasedev->name, > >> + strerror(errno)); > >> + } > >> + > >> + migration->device_state = VFIO_DEVICE_STATE_RUNNING; > >> + > >> + return -1; > >> } > >> > >> migration->device_state = recover_state; > > This addresses one of my comments on 12/ and should probably be rolled > > in there. > > Not sure to which comment you refer to. Could you elaborate? Hmm, I guess I thought this was in the section immediately following where I questioned going to recovery state. I'm still not sure why this is a separate patch from the initial implementation of the function in 12/ though. Thanks, ' Alex
On 17/11/2022 20:18, Alex Williamson wrote: > External email: Use caution opening links or attachments > > > On Thu, 17 Nov 2022 19:11:47 +0200 > Avihai Horon <avihaih@nvidia.com> wrote: > >> On 16/11/2022 20:36, Alex Williamson wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Thu, 3 Nov 2022 18:16:17 +0200 >>> Avihai Horon <avihaih@nvidia.com> wrote: >>> >>>> If vfio_migration_set_state() fails to set the device in the requested >>>> state it tries to put it in a recover state. If setting the device in >>>> the recover state fails as well, hw_error is triggered and the VM is >>>> aborted. >>>> >>>> To improve user experience and avoid VM data loss, reset the device with >>>> VFIO_RESET_DEVICE instead of aborting the VM. >>>> >>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com> >>>> --- >>>> hw/vfio/migration.c | 14 ++++++++++++-- >>>> 1 file changed, 12 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>>> index f8c3228314..e8068b9147 100644 >>>> --- a/hw/vfio/migration.c >>>> +++ b/hw/vfio/migration.c >>>> @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, >>>> >>>> mig_state->device_state = recover_state; >>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { >>>> - hw_error("%s: Failed setting device in recover state, err: %s", >>>> - vbasedev->name, strerror(errno)); >>>> + error_report( >>>> + "%s: Failed setting device in recover state, err: %s. Resetting device", >>>> + vbasedev->name, strerror(errno)); >>>> + >>>> + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { >>>> + hw_error("%s: Failed resetting device, err: %s", vbasedev->name, >>>> + strerror(errno)); >>>> + } >>>> + >>>> + migration->device_state = VFIO_DEVICE_STATE_RUNNING; >>>> + >>>> + return -1; >>>> } >>>> >>>> migration->device_state = recover_state; >>> This addresses one of my comments on 12/ and should probably be rolled >>> in there. >> Not sure to which comment you refer to. Could you elaborate? > Hmm, I guess I thought this was in the section immediately following > where I questioned going to recovery state. I'm still not sure why > this is a separate patch from the initial implementation of the > function in 12/ though. This adds new functionality comparing to v1, so I thought this should be in its own patch. I can squash it to patch 12 if you want.
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index f8c3228314..e8068b9147 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, mig_state->device_state = recover_state; if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { - hw_error("%s: Failed setting device in recover state, err: %s", - vbasedev->name, strerror(errno)); + error_report( + "%s: Failed setting device in recover state, err: %s. Resetting device", + vbasedev->name, strerror(errno)); + + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) { + hw_error("%s: Failed resetting device, err: %s", vbasedev->name, + strerror(errno)); + } + + migration->device_state = VFIO_DEVICE_STATE_RUNNING; + + return -1; } migration->device_state = recover_state;
If vfio_migration_set_state() fails to set the device in the requested state it tries to put it in a recover state. If setting the device in the recover state fails as well, hw_error is triggered and the VM is aborted. To improve user experience and avoid VM data loss, reset the device with VFIO_RESET_DEVICE instead of aborting the VM. Signed-off-by: Avihai Horon <avihaih@nvidia.com> --- hw/vfio/migration.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)