diff mbox series

vfio/migration: Fix return value of vfio_migration_realize()

Message ID 20230615081851.326816-1-zhenzhong.duan@intel.com (mailing list archive)
State New, archived
Headers show
Series vfio/migration: Fix return value of vfio_migration_realize() | expand

Commit Message

Duan, Zhenzhong June 15, 2023, 8:18 a.m. UTC
We should print "Migration disabled" when migration is blocked
in vfio_migration_realize().

Fix it by reverting return value of migrate_add_blocker(),
meanwhile error out directly once migrate_add_blocker() failed.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/common.c    | 4 ++--
 hw/vfio/migration.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

Comments

Joao Martins June 15, 2023, 8:53 a.m. UTC | #1
On 15/06/2023 09:18, Zhenzhong Duan wrote:
> We should print "Migration disabled" when migration is blocked
> in vfio_migration_realize().
> 
> Fix it by reverting return value of migrate_add_blocker(),
> meanwhile error out directly once migrate_add_blocker() failed.
> 

It wasn't immediately obvious from commit message that this is mainly
about just printing an error message when blockers get added and
that well migrate_add_blocker() returns 0 on success and caller
of vfio_migration_realize expects the opposite when blockers are added.

Perhaps better wording would be:

migrate_add_blocker() returns 0 when successfully adding the
migration blocker. However, the caller of vfio_migration_realize()
considers that migration was blocked when the latter returned
an error. Fix it by negating the return value obtained by
migrate_add_blocker(). What matters for migration is that the
blocker is added in core migration, so this cleans up usability
such that user sees "Migrate disabled" when any of the vfio
migration blockers are active.

> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/common.c    | 4 ++--
>  hw/vfio/migration.c | 6 +++---
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index fa8fd949b1cf..8505385798f3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -399,7 +399,7 @@ int vfio_block_multiple_devices_migration(Error **errp)
>          multiple_devices_migration_blocker = NULL;
>      }
>  
> -    return ret;
> +    return !ret;
>  }
>  
>  void vfio_unblock_multiple_devices_migration(void)
> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>          giommu_migration_blocker = NULL;
>      }
>  
> -    return ret;
> +    return !ret;
>  }
>  
>  void vfio_migration_finalize(void)
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 6b58dddb8859..0146521d129a 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>      }
>  
>      ret = vfio_block_multiple_devices_migration(errp);
> -    if (ret) {
> +    if (ret || (errp && *errp)) {

Why do you need this extra clause?

>          return ret;
>      }
>  
>      ret = vfio_block_giommu_migration(errp);
> -    if (ret) {
> +    if (ret || (errp && *errp)) {

Same here?

>          return ret;
>      }
>  
> @@ -667,7 +667,7 @@ add_blocker:
>          error_free(vbasedev->migration_blocker);
>          vbasedev->migration_blocker = NULL;
>      }
> -    return ret;
> +    return !ret;
>  }
>  
>  void vfio_migration_exit(VFIODevice *vbasedev)
Duan, Zhenzhong June 15, 2023, 9:19 a.m. UTC | #2
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Thursday, June 15, 2023 4:54 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>
>
>
>On 15/06/2023 09:18, Zhenzhong Duan wrote:
>> We should print "Migration disabled" when migration is blocked in
>> vfio_migration_realize().
>>
>> Fix it by reverting return value of migrate_add_blocker(), meanwhile
>> error out directly once migrate_add_blocker() failed.
>>
>
>It wasn't immediately obvious from commit message that this is mainly about
>just printing an error message when blockers get added and that well
>migrate_add_blocker() returns 0 on success and caller of
>vfio_migration_realize expects the opposite when blockers are added.
>
>Perhaps better wording would be:
>
>migrate_add_blocker() returns 0 when successfully adding the migration
>blocker. However, the caller of vfio_migration_realize() considers that
>migration was blocked when the latter returned an error. Fix it by negating the
>return value obtained by migrate_add_blocker(). What matters for migration
>is that the blocker is added in core migration, so this cleans up usability such
>that user sees "Migrate disabled" when any of the vfio migration blockers are
>active.

Great, I'll use your words.

>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  hw/vfio/common.c    | 4 ++--
>>  hw/vfio/migration.c | 6 +++---
>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> fa8fd949b1cf..8505385798f3 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -399,7 +399,7 @@ int vfio_block_multiple_devices_migration(Error
>**errp)
>>          multiple_devices_migration_blocker = NULL;
>>      }
>>
>> -    return ret;
>> +    return !ret;
>>  }
>>
>>  void vfio_unblock_multiple_devices_migration(void)
>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>          giommu_migration_blocker = NULL;
>>      }
>>
>> -    return ret;
>> +    return !ret;
>>  }
>>
>>  void vfio_migration_finalize(void)
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index
>> 6b58dddb8859..0146521d129a 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice *vbasedev,
>Error **errp)
>>      }
>>
>>      ret = vfio_block_multiple_devices_migration(errp);
>> -    if (ret) {
>> +    if (ret || (errp && *errp)) {
>
>Why do you need this extra clause?

Now that error happens, no need to add other blockers which will fail for
same reason.

>
>>          return ret;
>>      }
>>
>>      ret = vfio_block_giommu_migration(errp);
>> -    if (ret) {
>> +    if (ret || (errp && *errp)) {
>
>Same here?

To avoid calling into trace_vfio_migration_probe() which hints vfio migration realize succeed.

Thanks
Zhenzhong

>
>>          return ret;
>>      }
>>
>> @@ -667,7 +667,7 @@ add_blocker:
>>          error_free(vbasedev->migration_blocker);
>>          vbasedev->migration_blocker = NULL;
>>      }
>> -    return ret;
>> +    return !ret;
>>  }
>>
>>  void vfio_migration_exit(VFIODevice *vbasedev)
Joao Martins June 15, 2023, 10:22 a.m. UTC | #3
On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Sent: Thursday, June 15, 2023 4:54 PM
>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>> Peng, Chao P <chao.p.peng@intel.com>
>> Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>>
>>
>>
>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>> We should print "Migration disabled" when migration is blocked in
>>> vfio_migration_realize().
>>>
>>> Fix it by reverting return value of migrate_add_blocker(), meanwhile
>>> error out directly once migrate_add_blocker() failed.
>>>
>>
>> It wasn't immediately obvious from commit message that this is mainly about
>> just printing an error message when blockers get added and that well
>> migrate_add_blocker() returns 0 on success and caller of
>> vfio_migration_realize expects the opposite when blockers are added.
>>
>> Perhaps better wording would be:
>>
>> migrate_add_blocker() returns 0 when successfully adding the migration
>> blocker. However, the caller of vfio_migration_realize() considers that
>> migration was blocked when the latter returned an error. Fix it by negating the
>> return value obtained by migrate_add_blocker(). What matters for migration
>> is that the blocker is added in core migration, so this cleans up usability such
>> that user sees "Migrate disabled" when any of the vfio migration blockers are
>> active.
> 
> Great, I'll use your words.
> 
>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>  hw/vfio/common.c    | 4 ++--
>>>  hw/vfio/migration.c | 6 +++---
>>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>> fa8fd949b1cf..8505385798f3 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -399,7 +399,7 @@ int vfio_block_multiple_devices_migration(Error
>> **errp)
>>>          multiple_devices_migration_blocker = NULL;
>>>      }
>>>
>>> -    return ret;
>>> +    return !ret;
>>>  }
>>>
>>>  void vfio_unblock_multiple_devices_migration(void)
>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>          giommu_migration_blocker = NULL;
>>>      }
>>>
>>> -    return ret;
>>> +    return !ret;
>>>  }
>>>
>>>  void vfio_migration_finalize(void)
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>> 6b58dddb8859..0146521d129a 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice *vbasedev,
>> Error **errp)
>>>      }
>>>
>>>      ret = vfio_block_multiple_devices_migration(errp);
>>> -    if (ret) {
>>> +    if (ret || (errp && *errp)) {
>>
>> Why do you need this extra clause?
> 
> Now that error happens, no need to add other blockers which will fail for
> same reason.
> 

But you don't need the (errp && *errp) for that as that's the whole point of
negating the result.

And if there's an error set it means migrate_add_blocker failed to add the
blocker (e.g. snapshotting IIUC), and you would be failing here unnecessarily?

Still it feels strange to use the error pointer to check for that, it's better
to return error appropriately.

>>
>>>          return ret;
>>>      }
>>>
>>>      ret = vfio_block_giommu_migration(errp);
>>> -    if (ret) {
>>> +    if (ret || (errp && *errp)) {
>>
>> Same here?
> 
> To avoid calling into trace_vfio_migration_probe() which hints vfio migration realize succeed.
> 

That was clear, my question was more related to second clause you are adding..

> Thanks
> Zhenzhong
> 
>>
>>>          return ret;
>>>      }
>>>
>>> @@ -667,7 +667,7 @@ add_blocker:
>>>          error_free(vbasedev->migration_blocker);
>>>          vbasedev->migration_blocker = NULL;
>>>      }
>>> -    return ret;
>>> +    return !ret;
>>>  }
>>>
>>>  void vfio_migration_exit(VFIODevice *vbasedev)
Duan, Zhenzhong June 16, 2023, 2:42 a.m. UTC | #4
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Thursday, June 15, 2023 6:23 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>
>On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Sent: Thursday, June 15, 2023 4:54 PM
>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>> vfio_migration_realize()
>>>
>>>
>>>
>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>> We should print "Migration disabled" when migration is blocked in
>>>> vfio_migration_realize().
>>>>
>>>> Fix it by reverting return value of migrate_add_blocker(), meanwhile
>>>> error out directly once migrate_add_blocker() failed.
>>>>
>>>
>>> It wasn't immediately obvious from commit message that this is mainly
>>> about just printing an error message when blockers get added and that
>>> well
>>> migrate_add_blocker() returns 0 on success and caller of
>>> vfio_migration_realize expects the opposite when blockers are added.
>>>
>>> Perhaps better wording would be:
>>>
>>> migrate_add_blocker() returns 0 when successfully adding the
>>> migration blocker. However, the caller of vfio_migration_realize()
>>> considers that migration was blocked when the latter returned an
>>> error. Fix it by negating the return value obtained by
>>> migrate_add_blocker(). What matters for migration is that the blocker
>>> is added in core migration, so this cleans up usability such that
>>> user sees "Migrate disabled" when any of the vfio migration blockers are
>active.
>>
>> Great, I'll use your words.
>>
>>>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>  hw/vfio/common.c    | 4 ++--
>>>>  hw/vfio/migration.c | 6 +++---
>>>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>> fa8fd949b1cf..8505385798f3 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -399,7 +399,7 @@ int vfio_block_multiple_devices_migration(Error
>>> **errp)
>>>>          multiple_devices_migration_blocker = NULL;
>>>>      }
>>>>
>>>> -    return ret;
>>>> +    return !ret;
>>>>  }
>>>>
>>>>  void vfio_unblock_multiple_devices_migration(void)
>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>          giommu_migration_blocker = NULL;
>>>>      }
>>>>
>>>> -    return ret;
>>>> +    return !ret;
>>>>  }
>>>>
>>>>  void vfio_migration_finalize(void)
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>> 6b58dddb8859..0146521d129a 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>> *vbasedev,
>>> Error **errp)
>>>>      }
>>>>
>>>>      ret = vfio_block_multiple_devices_migration(errp);
>>>> -    if (ret) {
>>>> +    if (ret || (errp && *errp)) {
>>>
>>> Why do you need this extra clause?
>>
>> Now that error happens, no need to add other blockers which will fail
>> for same reason.
>>
>
>But you don't need the (errp && *errp) for that as that's the whole point of
>negating the result.
>
>And if there's an error set it means migrate_add_blocker failed to add the
>blocker (e.g. snapshotting IIUC), and you would be failing here unnecessarily?

If there is an error qdev_device_add() will fail, continue execution is meaningless here?
There is ERRP_GUARD in this path, so it looks (*errp) is enough.

If I removed (errp && *errp) to continue, need below change to bypass trace_vfio_migration_probe
Do you prefer this way?

    if (!*errp) {
        trace_vfio_migration_probe(vbasedev->name);
    }

Thanks
Zhenzhong
>
>Still it feels strange to use the error pointer to check for that, it's better to
>return error appropriately.
>
>>>
>>>>          return ret;
>>>>      }
>>>>
>>>>      ret = vfio_block_giommu_migration(errp);
>>>> -    if (ret) {
>>>> +    if (ret || (errp && *errp)) {
>>>
>>> Same here?
>>
>> To avoid calling into trace_vfio_migration_probe() which hints vfio migration
>realize succeed.
>>
>
>That was clear, my question was more related to second clause you are
>adding..
>
>> Thanks
>> Zhenzhong
>>
>>>
>>>>          return ret;
>>>>      }
>>>>
>>>> @@ -667,7 +667,7 @@ add_blocker:
>>>>          error_free(vbasedev->migration_blocker);
>>>>          vbasedev->migration_blocker = NULL;
>>>>      }
>>>> -    return ret;
>>>> +    return !ret;
>>>>  }
>>>>
>>>>  void vfio_migration_exit(VFIODevice *vbasedev)
Joao Martins June 16, 2023, 9:05 a.m. UTC | #5
On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Sent: Thursday, June 15, 2023 6:23 PM
>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>> Peng, Chao P <chao.p.peng@intel.com>
>> Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>>
>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>> vfio_migration_realize()
>>>>
>>>>
>>>>
>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>> We should print "Migration disabled" when migration is blocked in
>>>>> vfio_migration_realize().
>>>>>
>>>>> Fix it by reverting return value of migrate_add_blocker(), meanwhile
>>>>> error out directly once migrate_add_blocker() failed.
>>>>>
>>>>
>>>> It wasn't immediately obvious from commit message that this is mainly
>>>> about just printing an error message when blockers get added and that
>>>> well
>>>> migrate_add_blocker() returns 0 on success and caller of
>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>
>>>> Perhaps better wording would be:
>>>>
>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>> considers that migration was blocked when the latter returned an
>>>> error. Fix it by negating the return value obtained by
>>>> migrate_add_blocker(). What matters for migration is that the blocker
>>>> is added in core migration, so this cleans up usability such that
>>>> user sees "Migrate disabled" when any of the vfio migration blockers are
>> active.
>>>
>>> Great, I'll use your words.
>>>
>>>>
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>  hw/vfio/common.c    | 4 ++--
>>>>>  hw/vfio/migration.c | 6 +++---
>>>>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>> --- a/hw/vfio/common.c
>>>>> +++ b/hw/vfio/common.c
>>>>> @@ -399,7 +399,7 @@ int vfio_block_multiple_devices_migration(Error
>>>> **errp)
>>>>>          multiple_devices_migration_blocker = NULL;
>>>>>      }
>>>>>
>>>>> -    return ret;
>>>>> +    return !ret;
>>>>>  }
>>>>>
>>>>>  void vfio_unblock_multiple_devices_migration(void)
>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>          giommu_migration_blocker = NULL;
>>>>>      }
>>>>>
>>>>> -    return ret;
>>>>> +    return !ret;
>>>>>  }
>>>>>
>>>>>  void vfio_migration_finalize(void)
>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>> 6b58dddb8859..0146521d129a 100644
>>>>> --- a/hw/vfio/migration.c
>>>>> +++ b/hw/vfio/migration.c
>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>> *vbasedev,
>>>> Error **errp)
>>>>>      }
>>>>>
>>>>>      ret = vfio_block_multiple_devices_migration(errp);
>>>>> -    if (ret) {
>>>>> +    if (ret || (errp && *errp)) {
>>>>
>>>> Why do you need this extra clause?
>>>
>>> Now that error happens, no need to add other blockers which will fail
>>> for same reason.
>>>
>>
>> But you don't need the (errp && *errp) for that as that's the whole point of
>> negating the result.
>>
>> And if there's an error set it means migrate_add_blocker failed to add the
>> blocker (e.g. snapshotting IIUC), and you would be failing here unnecessarily?
> 
> If there is an error qdev_device_add() will fail, continue execution is meaningless here?
> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
> 
> If I removed (errp && *errp) to continue, need below change to bypass trace_vfio_migration_probe
> Do you prefer this way?
> 
>     if (!*errp) {
>         trace_vfio_migration_probe(vbasedev->name);
>     }
>

I am mainly questioning that the error testing is correct to test here.

IIUC, the only one that can propagate any *new* error in vfio_migration_realize
is the calls to migrate_add_blocker failing within the vfio_block* (migration
code suggests that this happens on snapshotting). Failing to add migration
blocker just means we haven't installed any blockers. And the current code
presents that as a "Migration disabled" case. If we want to preserve that
behaviour on migration_add_blocker() failures (which seems like that's what you
are doing here) then instead of this:

	return !ret;

you would do this:

	ret = migration_add_blocker(...);
	if (ret < 0) {
		error_free(...);
		blocker = NULL;
+		return ret;
	}

+	return 1;

Or something like that. And then if you return early as you intended?
Duan, Zhenzhong June 16, 2023, 9:53 a.m. UTC | #6
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Friday, June 16, 2023 5:06 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>
>On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Sent: Thursday, June 15, 2023 6:23 PM
>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>> vfio_migration_realize()
>>>
>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>> vfio_migration_realize()
>>>>>
>>>>>
>>>>>
>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>> vfio_migration_realize().
>>>>>>
>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>
>>>>>
>>>>> It wasn't immediately obvious from commit message that this is
>>>>> mainly about just printing an error message when blockers get added
>>>>> and that well
>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>
>>>>> Perhaps better wording would be:
>>>>>
>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>> considers that migration was blocked when the latter returned an
>>>>> error. Fix it by negating the return value obtained by
>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>> blocker is added in core migration, so this cleans up usability
>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>> migration blockers are
>>> active.
>>>>
>>>> Great, I'll use your words.
>>>>
>>>>>
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>>  hw/vfio/common.c    | 4 ++--
>>>>>>  hw/vfio/migration.c | 6 +++---
>>>>>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>> --- a/hw/vfio/common.c
>>>>>> +++ b/hw/vfio/common.c
>>>>>> @@ -399,7 +399,7 @@ int
>>>>>> vfio_block_multiple_devices_migration(Error
>>>>> **errp)
>>>>>>          multiple_devices_migration_blocker = NULL;
>>>>>>      }
>>>>>>
>>>>>> -    return ret;
>>>>>> +    return !ret;
>>>>>>  }
>>>>>>
>>>>>>  void vfio_unblock_multiple_devices_migration(void)
>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>          giommu_migration_blocker = NULL;
>>>>>>      }
>>>>>>
>>>>>> -    return ret;
>>>>>> +    return !ret;
>>>>>>  }
>>>>>>
>>>>>>  void vfio_migration_finalize(void) diff --git
>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>> --- a/hw/vfio/migration.c
>>>>>> +++ b/hw/vfio/migration.c
>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>> *vbasedev,
>>>>> Error **errp)
>>>>>>      }
>>>>>>
>>>>>>      ret = vfio_block_multiple_devices_migration(errp);
>>>>>> -    if (ret) {
>>>>>> +    if (ret || (errp && *errp)) {
>>>>>
>>>>> Why do you need this extra clause?
>>>>
>>>> Now that error happens, no need to add other blockers which will
>>>> fail for same reason.
>>>>
>>>
>>> But you don't need the (errp && *errp) for that as that's the whole
>>> point of negating the result.
>>>
>>> And if there's an error set it means migrate_add_blocker failed to
>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>unnecessarily?
>>
>> If there is an error qdev_device_add() will fail, continue execution is
>meaningless here?
>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>
>> If I removed (errp && *errp) to continue, need below change to bypass
>> trace_vfio_migration_probe Do you prefer this way?
>>
>>     if (!*errp) {
>>         trace_vfio_migration_probe(vbasedev->name);
>>     }
>>
>
>I am mainly questioning that the error testing is correct to test here.
>
>IIUC, the only one that can propagate any *new* error in
>vfio_migration_realize is the calls to migrate_add_blocker failing within the
>vfio_block* (migration code suggests that this happens on snapshotting).
>Failing to add migration blocker just means we haven't installed any blockers.
>And the current code presents that as a "Migration disabled" case. If we want
>to preserve that behaviour on migration_add_blocker() failures (which seems
>like that's what you are doing here) then instead of this:

Current behavior(without my patch):
"Migration disabled" isn't printed if migrate_add_blocker succeed.
"Migration disabled" is printed if migrate_add_blocker fail.

I think this behavior isn't correct, I want to revert it not preserve it, so I used !ret.
Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker failure
will lead to hotplug fail, then the guest is still migratable as no vfio plugged.
But we see "Migration disabled" which will confuse us.

>
>	return !ret;
>
>you would do this:
>
>	ret = migration_add_blocker(...);
>	if (ret < 0) {
>		error_free(...);
>		blocker = NULL;
>+		return ret;
>	}
>
>+	return 1;
>
>Or something like that. And then if you return early as you intended?

Yes, this change make sense, I also want to add below:

     if (!pdev->failover_pair_id) {
         ret = vfio_migration_realize(vbasedev, errp);
-        if (ret) {
+        if (ret > 0) {
             error_report("%s: Migration disabled", vbasedev->name);
         }

Thanks
Zhenzhong
Joao Martins June 16, 2023, 10:12 a.m. UTC | #7
On 16/06/2023 10:53, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Sent: Friday, June 16, 2023 5:06 PM
>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>> Peng, Chao P <chao.p.peng@intel.com>
>> Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>>
>> On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Thursday, June 15, 2023 6:23 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>> vfio_migration_realize()
>>>>
>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>> vfio_migration_realize()
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>>> vfio_migration_realize().
>>>>>>>
>>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>>
>>>>>>
>>>>>> It wasn't immediately obvious from commit message that this is
>>>>>> mainly about just printing an error message when blockers get added
>>>>>> and that well
>>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>>
>>>>>> Perhaps better wording would be:
>>>>>>
>>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>>> considers that migration was blocked when the latter returned an
>>>>>> error. Fix it by negating the return value obtained by
>>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>>> blocker is added in core migration, so this cleans up usability
>>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>>> migration blockers are
>>>> active.
>>>>>
>>>>> Great, I'll use your words.
>>>>>
>>>>>>
>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>> ---
>>>>>>>  hw/vfio/common.c    | 4 ++--
>>>>>>>  hw/vfio/migration.c | 6 +++---
>>>>>>>  2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>>> --- a/hw/vfio/common.c
>>>>>>> +++ b/hw/vfio/common.c
>>>>>>> @@ -399,7 +399,7 @@ int
>>>>>>> vfio_block_multiple_devices_migration(Error
>>>>>> **errp)
>>>>>>>          multiple_devices_migration_blocker = NULL;
>>>>>>>      }
>>>>>>>
>>>>>>> -    return ret;
>>>>>>> +    return !ret;
>>>>>>>  }
>>>>>>>
>>>>>>>  void vfio_unblock_multiple_devices_migration(void)
>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>>          giommu_migration_blocker = NULL;
>>>>>>>      }
>>>>>>>
>>>>>>> -    return ret;
>>>>>>> +    return !ret;
>>>>>>>  }
>>>>>>>
>>>>>>>  void vfio_migration_finalize(void) diff --git
>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>>> --- a/hw/vfio/migration.c
>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>>> *vbasedev,
>>>>>> Error **errp)
>>>>>>>      }
>>>>>>>
>>>>>>>      ret = vfio_block_multiple_devices_migration(errp);
>>>>>>> -    if (ret) {
>>>>>>> +    if (ret || (errp && *errp)) {
>>>>>>
>>>>>> Why do you need this extra clause?
>>>>>
>>>>> Now that error happens, no need to add other blockers which will
>>>>> fail for same reason.
>>>>>
>>>>
>>>> But you don't need the (errp && *errp) for that as that's the whole
>>>> point of negating the result.
>>>>
>>>> And if there's an error set it means migrate_add_blocker failed to
>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>> unnecessarily?
>>>
>>> If there is an error qdev_device_add() will fail, continue execution is
>> meaningless here?
>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>>
>>> If I removed (errp && *errp) to continue, need below change to bypass
>>> trace_vfio_migration_probe Do you prefer this way?
>>>
>>>     if (!*errp) {
>>>         trace_vfio_migration_probe(vbasedev->name);
>>>     }
>>>
>>
>> I am mainly questioning that the error testing is correct to test here.
>>
>> IIUC, the only one that can propagate any *new* error in
>> vfio_migration_realize is the calls to migrate_add_blocker failing within the
>> vfio_block* (migration code suggests that this happens on snapshotting).
>> Failing to add migration blocker just means we haven't installed any blockers.
>> And the current code presents that as a "Migration disabled" case. If we want
>> to preserve that behaviour on migration_add_blocker() failures (which seems
>> like that's what you are doing here) then instead of this:
> 
> Current behavior(without my patch):
> "Migration disabled" isn't printed if migrate_add_blocker succeed.
> "Migration disabled" is printed if migrate_add_blocker fail.
> 
> I think this behavior isn't correct, I want to revert it not preserve it, so I used !ret.
> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker failure
> will lead to hotplug fail, then the guest is still migratable as no vfio plugged.
> But we see "Migration disabled" which will confuse us.
>

/me nods
 >>
>> 	return !ret;
>>
>> you would do this:
>>
>> 	ret = migration_add_blocker(...);
>> 	if (ret < 0) {
>> 		error_free(...);
>> 		blocker = NULL;
>> +		return ret;
>> 	}
>>
>> +	return 1;
>>
>> Or something like that. And then if you return early as you intended?
> 
> Yes, this change make sense, I also want to add below:
> 
>      if (!pdev->failover_pair_id) {
>          ret = vfio_migration_realize(vbasedev, errp);
> -        if (ret) {
> +        if (ret > 0) {
>              error_report("%s: Migration disabled", vbasedev->name);
>          }
> 
Makes sense. Checking errp above before printing the tracepoint (like you
suggested) is also an option taking the discussion so far, but perhaps the
return type to be bool to make it clear to callers that this is not no longer an
error code? Maybe let's wait for others on what style is generally preferred in
error propagation.

	Joao
Cédric Le Goater June 16, 2023, 2:04 p.m. UTC | #8
On 6/16/23 12:12, Joao Martins wrote:
> On 16/06/2023 10:53, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Sent: Friday, June 16, 2023 5:06 PM
>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>>> Peng, Chao P <chao.p.peng@intel.com>
>>> Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
>>>
>>> On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>> Sent: Thursday, June 15, 2023 6:23 PM
>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>> vfio_migration_realize()
>>>>>
>>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>>> vfio_migration_realize()
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>>>> vfio_migration_realize().
>>>>>>>>
>>>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>>>
>>>>>>>
>>>>>>> It wasn't immediately obvious from commit message that this is
>>>>>>> mainly about just printing an error message when blockers get added
>>>>>>> and that well
>>>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>>>
>>>>>>> Perhaps better wording would be:
>>>>>>>
>>>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>>>> considers that migration was blocked when the latter returned an
>>>>>>> error. Fix it by negating the return value obtained by
>>>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>>>> blocker is added in core migration, so this cleans up usability
>>>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>>>> migration blockers are
>>>>> active.
>>>>>>
>>>>>> Great, I'll use your words.
>>>>>>
>>>>>>>
>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>> ---
>>>>>>>>   hw/vfio/common.c    | 4 ++--
>>>>>>>>   hw/vfio/migration.c | 6 +++---
>>>>>>>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>> @@ -399,7 +399,7 @@ int
>>>>>>>> vfio_block_multiple_devices_migration(Error
>>>>>>> **errp)
>>>>>>>>           multiple_devices_migration_blocker = NULL;
>>>>>>>>       }
>>>>>>>>
>>>>>>>> -    return ret;
>>>>>>>> +    return !ret;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   void vfio_unblock_multiple_devices_migration(void)
>>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>>>           giommu_migration_blocker = NULL;
>>>>>>>>       }
>>>>>>>>
>>>>>>>> -    return ret;
>>>>>>>> +    return !ret;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   void vfio_migration_finalize(void) diff --git
>>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>>>> *vbasedev,
>>>>>>> Error **errp)
>>>>>>>>       }
>>>>>>>>
>>>>>>>>       ret = vfio_block_multiple_devices_migration(errp);
>>>>>>>> -    if (ret) {
>>>>>>>> +    if (ret || (errp && *errp)) {
>>>>>>>
>>>>>>> Why do you need this extra clause?
>>>>>>
>>>>>> Now that error happens, no need to add other blockers which will
>>>>>> fail for same reason.
>>>>>>
>>>>>
>>>>> But you don't need the (errp && *errp) for that as that's the whole
>>>>> point of negating the result.
>>>>>
>>>>> And if there's an error set it means migrate_add_blocker failed to
>>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>>> unnecessarily?
>>>>
>>>> If there is an error qdev_device_add() will fail, continue execution is
>>> meaningless here?
>>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>>>
>>>> If I removed (errp && *errp) to continue, need below change to bypass
>>>> trace_vfio_migration_probe Do you prefer this way?
>>>>
>>>>      if (!*errp) {
>>>>          trace_vfio_migration_probe(vbasedev->name);
>>>>      }
>>>>
>>>
>>> I am mainly questioning that the error testing is correct to test here.
>>>
>>> IIUC, the only one that can propagate any *new* error in
>>> vfio_migration_realize is the calls to migrate_add_blocker failing within the
>>> vfio_block* (migration code suggests that this happens on snapshotting).
>>> Failing to add migration blocker just means we haven't installed any blockers.
>>> And the current code presents that as a "Migration disabled" case. If we want
>>> to preserve that behaviour on migration_add_blocker() failures (which seems
>>> like that's what you are doing here) then instead of this:
>>
>> Current behavior(without my patch):
>> "Migration disabled" isn't printed if migrate_add_blocker succeed.
>> "Migration disabled" is printed if migrate_add_blocker fail.
>>
>> I think this behavior isn't correct, I want to revert it not preserve it, so I used !ret.
>> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker failure
>> will lead to hotplug fail, then the guest is still migratable as no vfio plugged.
>> But we see "Migration disabled" which will confuse us.
>>
> 
> /me nods

Yes. Overall, the reason for which migration is not supported or is
disabled is not very clear today (for the user). It might need some
more adjustments, like this one, before we remove the experimental flag.
It will also help QE to define test scenarios and track expected results.

>   >>
>>> 	return !ret;
>>>
>>> you would do this:
>>>
>>> 	ret = migration_add_blocker(...);
>>> 	if (ret < 0) {
>>> 		error_free(...);
>>> 		blocker = NULL;
>>> +		return ret;
>>> 	}
>>>
>>> +	return 1;
>>>
>>> Or something like that. And then if you return early as you intended?
>>
>> Yes, this change make sense, I also want to add below:
>>
>>       if (!pdev->failover_pair_id) {
>>           ret = vfio_migration_realize(vbasedev, errp);
>> -        if (ret) {
>> +        if (ret > 0) {
>>               error_report("%s: Migration disabled", vbasedev->name);
>>           }
>>
> Makes sense. Checking errp above before printing the tracepoint (like you
> suggested) is also an option taking the discussion so far, but perhaps the
> return type to be bool to make it clear to callers that this is not no longer an
> error code? Maybe let's wait for others on what style is generally preferred in
> error propagation.

I think the code should follow the qdev_realize() prototype, which returns
a bool.

Thanks,

C.
Joao Martins June 16, 2023, 6:06 p.m. UTC | #9
On 16/06/2023 15:04, Cédric Le Goater wrote:
> On 6/16/23 12:12, Joao Martins wrote:
>> On 16/06/2023 10:53, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Friday, June 16, 2023 5:06 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>>>> Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>> vfio_migration_realize()
>>>>
>>>> On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Sent: Thursday, June 15, 2023 6:23 PM
>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>> vfio_migration_realize()
>>>>>>
>>>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>>>> vfio_migration_realize()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>>>>> vfio_migration_realize().
>>>>>>>>>
>>>>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't immediately obvious from commit message that this is
>>>>>>>> mainly about just printing an error message when blockers get added
>>>>>>>> and that well
>>>>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>>>>
>>>>>>>> Perhaps better wording would be:
>>>>>>>>
>>>>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>>>>> considers that migration was blocked when the latter returned an
>>>>>>>> error. Fix it by negating the return value obtained by
>>>>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>>>>> blocker is added in core migration, so this cleans up usability
>>>>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>>>>> migration blockers are
>>>>>> active.
>>>>>>>
>>>>>>> Great, I'll use your words.
>>>>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>> ---
>>>>>>>>>   hw/vfio/common.c    | 4 ++--
>>>>>>>>>   hw/vfio/migration.c | 6 +++---
>>>>>>>>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>>> @@ -399,7 +399,7 @@ int
>>>>>>>>> vfio_block_multiple_devices_migration(Error
>>>>>>>> **errp)
>>>>>>>>>           multiple_devices_migration_blocker = NULL;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> -    return ret;
>>>>>>>>> +    return !ret;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   void vfio_unblock_multiple_devices_migration(void)
>>>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>>>>           giommu_migration_blocker = NULL;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> -    return ret;
>>>>>>>>> +    return !ret;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   void vfio_migration_finalize(void) diff --git
>>>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>>>>> *vbasedev,
>>>>>>>> Error **errp)
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>>       ret = vfio_block_multiple_devices_migration(errp);
>>>>>>>>> -    if (ret) {
>>>>>>>>> +    if (ret || (errp && *errp)) {
>>>>>>>>
>>>>>>>> Why do you need this extra clause?
>>>>>>>
>>>>>>> Now that error happens, no need to add other blockers which will
>>>>>>> fail for same reason.
>>>>>>>
>>>>>>
>>>>>> But you don't need the (errp && *errp) for that as that's the whole
>>>>>> point of negating the result.
>>>>>>
>>>>>> And if there's an error set it means migrate_add_blocker failed to
>>>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>>>> unnecessarily?
>>>>>
>>>>> If there is an error qdev_device_add() will fail, continue execution is
>>>> meaningless here?
>>>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>>>>
>>>>> If I removed (errp && *errp) to continue, need below change to bypass
>>>>> trace_vfio_migration_probe Do you prefer this way?
>>>>>
>>>>>      if (!*errp) {
>>>>>          trace_vfio_migration_probe(vbasedev->name);
>>>>>      }
>>>>>
>>>>
>>>> I am mainly questioning that the error testing is correct to test here.
>>>>
>>>> IIUC, the only one that can propagate any *new* error in
>>>> vfio_migration_realize is the calls to migrate_add_blocker failing within the
>>>> vfio_block* (migration code suggests that this happens on snapshotting).
>>>> Failing to add migration blocker just means we haven't installed any blockers.
>>>> And the current code presents that as a "Migration disabled" case. If we want
>>>> to preserve that behaviour on migration_add_blocker() failures (which seems
>>>> like that's what you are doing here) then instead of this:
>>>
>>> Current behavior(without my patch):
>>> "Migration disabled" isn't printed if migrate_add_blocker succeed.
>>> "Migration disabled" is printed if migrate_add_blocker fail.
>>>
>>> I think this behavior isn't correct, I want to revert it not preserve it, so
>>> I used !ret.
>>> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker failure
>>> will lead to hotplug fail, then the guest is still migratable as no vfio
>>> plugged.
>>> But we see "Migration disabled" which will confuse us.
>>>
>>
>> /me nods
> 
> Yes. Overall, the reason for which migration is not supported or is
> disabled is not very clear today (for the user). It might need some
> more adjustments, like this one, before we remove the experimental flag.

I think the "Migration Disabled" was an UX oversight when we refactored blockers
into vfio_migration_realize(). The actual migration is *not* disabled, it is
just that the message or tracepoint shouldn't be printed. The reasons why it is
blocked / not supported are clear in code right now.

> It will also help QE to define test scenarios and track expected results.
> 
>>   >>
>>>>     return !ret;
>>>>
>>>> you would do this:
>>>>
>>>>     ret = migration_add_blocker(...);
>>>>     if (ret < 0) {
>>>>         error_free(...);
>>>>         blocker = NULL;
>>>> +        return ret;
>>>>     }
>>>>
>>>> +    return 1;
>>>>
>>>> Or something like that. And then if you return early as you intended?
>>>
>>> Yes, this change make sense, I also want to add below:
>>>
>>>       if (!pdev->failover_pair_id) {
>>>           ret = vfio_migration_realize(vbasedev, errp);
>>> -        if (ret) {
>>> +        if (ret > 0) {
>>>               error_report("%s: Migration disabled", vbasedev->name);
>>>           }
>>>
>> Makes sense. Checking errp above before printing the tracepoint (like you
>> suggested) is also an option taking the discussion so far, but perhaps the
>> return type to be bool to make it clear to callers that this is not no longer an
>> error code? Maybe let's wait for others on what style is generally preferred in
>> error propagation.
> 
> I think the code should follow the qdev_realize() prototype, which returns
> a bool.
> 

That makes sense, it makes the decision a lot simpler.

> Thanks,
> 
> C.
> 
>
diff mbox series

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fa8fd949b1cf..8505385798f3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -399,7 +399,7 @@  int vfio_block_multiple_devices_migration(Error **errp)
         multiple_devices_migration_blocker = NULL;
     }
 
-    return ret;
+    return !ret;
 }
 
 void vfio_unblock_multiple_devices_migration(void)
@@ -444,7 +444,7 @@  int vfio_block_giommu_migration(Error **errp)
         giommu_migration_blocker = NULL;
     }
 
-    return ret;
+    return !ret;
 }
 
 void vfio_migration_finalize(void)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 6b58dddb8859..0146521d129a 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -646,12 +646,12 @@  int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
     }
 
     ret = vfio_block_multiple_devices_migration(errp);
-    if (ret) {
+    if (ret || (errp && *errp)) {
         return ret;
     }
 
     ret = vfio_block_giommu_migration(errp);
-    if (ret) {
+    if (ret || (errp && *errp)) {
         return ret;
     }
 
@@ -667,7 +667,7 @@  add_blocker:
         error_free(vbasedev->migration_blocker);
         vbasedev->migration_blocker = NULL;
     }
-    return ret;
+    return !ret;
 }
 
 void vfio_migration_exit(VFIODevice *vbasedev)