diff mbox series

[v2,4/5] vfio/migration: Block VFIO migration with postcopy migration

Message ID 20230831125702.11263-5-avihaih@nvidia.com (mailing list archive)
State New, archived
Series vfio/migration: Block VFIO migration with postcopy and background snapshot

Commit Message

Avihai Horon Aug. 31, 2023, 12:57 p.m. UTC
VFIO migration is not compatible with postcopy migration. A VFIO device
in the destination can't handle page faults for pages that have not been
sent yet.

Attempting such a migration crashes the VM on the destination:

qemu-system-x86_64: VFIO_MAP_DMA failed: Bad address
qemu-system-x86_64: vfio_dma_map(0x55a28c7659d0, 0xc0000, 0xb000, 0x7f1b11a00000) = -14 (Bad address)
qemu: hardware error: vfio: DMA mapping failed, unable to continue

To prevent this, block VFIO migration when postcopy migration is enabled.

Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

YangHang Liu Sept. 1, 2023, 10:14 a.m. UTC | #1
When trying to do a VFIO postcopy migration, we now get the expected
error: "unable to execute QEMU command 'migrate':
0000:b1:00.2: VFIO migration is not supported with postcopy migration"

Tested-by: Yanghang Liu <yanghliu@redhat.com>

Best Regards,
YangHang Liu


Peter Xu Sept. 1, 2023, 3:51 p.m. UTC | #2
On Thu, Aug 31, 2023 at 03:57:01PM +0300, Avihai Horon wrote:
> VFIO migration is not compatible with postcopy migration. A VFIO device
> in the destination can't handle page faults for pages that have not been
> sent yet.
> 
> Doing such migration will cause the VM to crash in the destination:
> 
> qemu-system-x86_64: VFIO_MAP_DMA failed: Bad address
> qemu-system-x86_64: vfio_dma_map(0x55a28c7659d0, 0xc0000, 0xb000, 0x7f1b11a00000) = -14 (Bad address)
> qemu: hardware error: vfio: DMA mapping failed, unable to continue
> 
> To prevent this, block VFIO migration with postcopy migration.
> 
> Reported-by: Yanghang Liu <yanghliu@redhat.com>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/migration.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 71855468fe..20994dc1d6 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -335,6 +335,27 @@ static bool vfio_precopy_supported(VFIODevice *vbasedev)
>  
>  /* ---------------------------------------------------------------------- */
>  
> +static int vfio_save_prepare(void *opaque, Error **errp)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    /*
> +     * Snapshot doesn't use postcopy, so allow snapshot even if postcopy is on.
> +     */
> +    if (runstate_check(RUN_STATE_SAVE_VM)) {
> +        return 0;
> +    }

Just purely curious: will it really work to save a snapshot for the GPU
assigned use case?

> +
> +    if (migrate_postcopy_ram()) {
> +        error_setg(
> +            errp, "%s: VFIO migration is not supported with postcopy migration",
> +            vbasedev->name);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    return 0;
> +}
Avihai Horon Sept. 3, 2023, 7:52 a.m. UTC | #3
On 01/09/2023 18:51, Peter Xu wrote:
> On Thu, Aug 31, 2023 at 03:57:01PM +0300, Avihai Horon wrote:
>> VFIO migration is not compatible with postcopy migration. A VFIO device
>> in the destination can't handle page faults for pages that have not been
>> sent yet.
>>
>> Doing such migration will cause the VM to crash in the destination:
>>
>> qemu-system-x86_64: VFIO_MAP_DMA failed: Bad address
>> qemu-system-x86_64: vfio_dma_map(0x55a28c7659d0, 0xc0000, 0xb000, 0x7f1b11a00000) = -14 (Bad address)
>> qemu: hardware error: vfio: DMA mapping failed, unable to continue
>>
>> To prevent this, block VFIO migration with postcopy migration.
>>
>> Reported-by: Yanghang Liu <yanghliu@redhat.com>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c | 22 ++++++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 71855468fe..20994dc1d6 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -335,6 +335,27 @@ static bool vfio_precopy_supported(VFIODevice *vbasedev)
>>
>>   /* ---------------------------------------------------------------------- */
>>
>> +static int vfio_save_prepare(void *opaque, Error **errp)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +
>> +    /*
>> +     * Snapshot doesn't use postcopy, so allow snapshot even if postcopy is on.
>> +     */
>> +    if (runstate_check(RUN_STATE_SAVE_VM)) {
>> +        return 0;
>> +    }
> Just purely curious: will it really work to save a snapshot for the GPU
> assigned use case?

I have never tried that.
Adding Tarun, maybe he can answer that.

Thanks.

>> +
>> +    if (migrate_postcopy_ram()) {
>> +        error_setg(
>> +            errp, "%s: VFIO migration is not supported with postcopy migration",
>> +            vbasedev->name);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    return 0;
>> +}
> --
> Peter Xu
>

Patch

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 71855468fe..20994dc1d6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -335,6 +335,27 @@  static bool vfio_precopy_supported(VFIODevice *vbasedev)
 
 /* ---------------------------------------------------------------------- */
 
+static int vfio_save_prepare(void *opaque, Error **errp)
+{
+    VFIODevice *vbasedev = opaque;
+
+    /*
+     * Snapshot doesn't use postcopy, so allow snapshot even if postcopy is on.
+     */
+    if (runstate_check(RUN_STATE_SAVE_VM)) {
+        return 0;
+    }
+
+    if (migrate_postcopy_ram()) {
+        error_setg(
+            errp, "%s: VFIO migration is not supported with postcopy migration",
+            vbasedev->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
 static int vfio_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -640,6 +661,7 @@  static bool vfio_switchover_ack_needed(void *opaque)
 }
 
 static const SaveVMHandlers savevm_vfio_handlers = {
+    .save_prepare = vfio_save_prepare,
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
     .state_pending_estimate = vfio_state_pending_estimate,
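
The check added by this patch can be modelled outside QEMU. The following is a minimal, standalone sketch, not QEMU code: MigConfig, DeviceHandlers, vfio_like_save_prepare() and run_save_prepare() are hypothetical names, the two booleans stand in for runstate_check(RUN_STATE_SAVE_VM) and migrate_postcopy_ram(), and a plain character buffer stands in for Error **errp. It also assumes that the migration core calls every registered save_prepare hook before migration starts and stops at the first failure, which this patch on its own does not show.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the QEMU state that the patch queries. */
typedef struct {
    bool saving_snapshot; /* models runstate_check(RUN_STATE_SAVE_VM) */
    bool postcopy_ram;    /* models migrate_postcopy_ram() */
} MigConfig;

/* Hypothetical, reduced analogue of a per-device handler table. */
typedef struct {
    const char *name;
    /* Optional hook: returns 0 on success, or a negative errno and
     * writes a message into err on failure. */
    int (*save_prepare)(const char *name, const MigConfig *cfg,
                        char *err, size_t errlen);
} DeviceHandlers;

/* Mirrors the order of the checks in the patch's vfio_save_prepare(). */
static int vfio_like_save_prepare(const char *name, const MigConfig *cfg,
                                  char *err, size_t errlen)
{
    /* A snapshot does not use postcopy, so it is always allowed. */
    if (cfg->saving_snapshot) {
        return 0;
    }

    if (cfg->postcopy_ram) {
        snprintf(err, errlen,
                 "%s: VFIO migration is not supported with postcopy migration",
                 name);
        return -EOPNOTSUPP;
    }

    return 0;
}

/* Assumed core behaviour: run every hook, stop at the first failure. */
static int run_save_prepare(const DeviceHandlers *devs, size_t n,
                            const MigConfig *cfg, char *err, size_t errlen)
{
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].save_prepare) {
            continue; /* devices without a hook never block migration */
        }
        int ret = devs[i].save_prepare(devs[i].name, cfg, err, errlen);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

In this model, calling run_save_prepare() with postcopy_ram set and saving_snapshot clear fails with -EOPNOTSUPP and fills the buffer with the device-prefixed message reported in comment #1, while the same configuration with saving_snapshot set passes. Doing the check up front, before any state is sent, turns the destination-side crash into an error on the source.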