
[RFC,0/1] Removing RAMBlocks during migration

Message ID 20191209074102.5926-1-yury-kotov@yandex-team.ru (mailing list archive)

Message

Yury Kotov Dec. 9, 2019, 7:41 a.m. UTC
Hi,

I found that it's possible to remove a RAMBlock during migration,
e.g. via device hot-unplug initiated by the guest (steps to reproduce are below).
I want to clarify whether removing (or even adding) a RAMBlock during
migration is a valid operation or a bug.

Currently, it can race with the migration thread and cause migration to
fail. For instance, vmstate_unregister_ram, which is called during PCIe
device removal, does the following:
- Memsets the idstr -> the target may receive an unknown/zeroed idstr -> migration fails
- Marks the RAMBlock as non-migratable -> migration fails

RAMBlock removal itself seems safe for the migration thread because of RCU.
But it seems to me there are other possible race conditions (I didn't test them):
- qemu_put_buffer_async -> saves a pointer to the RAMBlock's memory
   -> the block is freed outside the RCU critical section (between ram save iterations)
   -> qemu_fflush -> access to freed memory.
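The suspected ordering can be sketched with a toy model (a deliberate simplification with invented names, not QEMU's actual QEMUFile code): the async put only records a pointer, so a free that lands between the put and the flush leaves the flush reading released memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the suspected race.  All names are illustrative only;
 * this is not QEMU's real API.  toy_put_buffer_async() records a
 * pointer, and the bytes are only read later, at flush time. */

static const unsigned char *queued_ptr; /* pointer saved by the async put */
static size_t queued_len;
static bool block_freed; /* stands in for the RAMBlock being released */

void toy_put_buffer_async(const unsigned char *p, size_t len)
{
    queued_ptr = p; /* pointer saved, bytes NOT copied */
    queued_len = len;
}

void toy_ram_block_free(void)
{
    /* In the real code this would happen after the RCU grace period,
     * which can fall between two ram-save iterations, i.e. before the
     * next flush. */
    block_freed = true;
}

/* Returns true if flushing now would touch freed memory. */
bool toy_fflush_is_use_after_free(void)
{
    return queued_ptr != NULL && block_freed;
}
```

The model only flags the hazard instead of executing the use-after-free, but the ordering matches the scenario above: put, free, flush.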

So, I have the following questions:
1. Is RAMBlock removing/adding OK during migration?
2. If yes then what should we do with vmstate_unregister_ram?
   - Just remove vmstate_unregister_ram (my RFC patch)
   - Refcount RAMBlock's migratable/non-migratable state
   - Something else?
3. If it must not be possible, should we add
   assert(migration_is_idle()) to qemu_ram_free?

P.S.
I'm working on a fix for the problem below and trying to choose the better way:
allow device removal and fix all problems like this one, or fix this particular device.

--------
How to reproduce device removing during migration:

1. Source QEMU command line (target is similar)
  $ x86_64-softmmu/qemu-system-x86_64 \
    -nodefaults -no-user-config -m 1024 -M q35 \
    -qmp unix:./src.sock,server,nowait \
    -drive file=./image,format=raw,if=virtio \
    -device ioh3420,id=pcie.1 \
    -device virtio-net,bus=pcie.1
2. Start migration with slow speed (to simplify reproducing)
3. From inside the guest, power off the device on the hot-pluggable pcie.1 bus:
  $ echo 0 > /sys/bus/pci/slots/0/power
4. Increase migration speed and wait until fail

Most likely you will get something like this:
  qemu-system-x86_64: get_pci_config_device: Bad config data:
          i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
  qemu-system-x86_64: Failed to load PCIDevice:config
  qemu-system-x86_64: Failed to load
          ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
  qemu-system-x86_64: error while loading state for instance 0x0 of device
          '0000:00:03.0/ioh-3240-express-root-port'
  qemu-system-x86_64: load of migration failed: Invalid argument

This error merely illustrates that a device can be removed during migration;
it does not itself illustrate the race conditions of removing a RAMBlock.

Regards,
Yury

Yury Kotov (1):
  migration: Remove vmstate_unregister_ram

 hw/block/pflash_cfi01.c     | 1 -
 hw/block/pflash_cfi02.c     | 1 -
 hw/mem/pc-dimm.c            | 5 -----
 hw/misc/ivshmem.c           | 2 --
 hw/pci/pci.c                | 1 -
 include/migration/vmstate.h | 1 -
 migration/savevm.c          | 6 ------
 7 files changed, 17 deletions(-)

Comments

Dr. David Alan Gilbert Dec. 11, 2019, 11:16 a.m. UTC | #1
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Hi,
> 
> I found that it's possible to remove a RAMBlock during migration.
> E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
> And I want to clarify whether RAMBlock removing (or even adding) during
> migration is valid operation or it's a bug.
> 
> Currently, it may cause some race conditions with migration thread and
> migration may fail because of them. For instance, vmstate_unregister_ram
> function which is called during PCIe device removing does these:
> - Memset idstr -> target may receive unknown/zeroed idstr -> migration fail
> - Set RAMBlock flags as non-migratable -> migration fail
> 
> RAMBlock removing itself seems safe for migration thread because of RCU.
> But it seems to me there are other possible race conditions (didn't test it):
> - qemu_put_buffer_async -> saves pointer to RAMBlock's memory
>    -> block will be freed out of RCU (between ram save iterations)
>    -> qemu_fflush -> access to freed memory.
> 
> So, I have the following questions:
> 1. Is RAMBlock removing/adding OK during migration?

I don't think that any hot(un)plug is safe during migration.
While it's true we hold RCUs as we walk lists, we can't hold the RCU
around the entire migration.

There are lots of other problems; for example, we call the .save_setup
methods on devices at the start of migration but call the iterate methods
on those devices later - if a device is added/removed between the stages
we'll end up either having done a setup without calling the actual save,
or the other way around.

Juan added checks to qdev_device_add/qdev_unplug in b06424d ~2.5 years
ago.

> 2. If yes then what should we do with vmstate_unregister_ram?
>    - Just remove vmstate_unregister_ram (my RFC patch)
>    - Refcount RAMBlock's migratable/non-migratable state
>    - Something else?
> 3. If it mustn't be possible, so may be
>    assert(migration_is_idle()) in qemu_ram_free?
> 
> P.S.
> I'm working on a fix of below problem and trying to choose better way:
> allow device removing and fix all problem like this or fix a particular device.
> 
> --------
> How to reproduce device removing during migration:
> 
> 1. Source QEMU command line (target is similar)
>   $ x86_64-softmmu/qemu-system-x86_64 \
>     -nodefaults -no-user-config -m 1024 -M q35 \
>     -qmp unix:./src.sock,server,nowait \
>     -drive file=./image,format=raw,if=virtio \
>     -device ioh3420,id=pcie.1 \
>     -device virtio-net,bus=pcie.1
> 2. Start migration with slow speed (to simplify reproducing)
> 3. Power off a device on the hotplug pcie.1 bus:
>   $ echo 0 > /sys/bus/pci/slots/0/power
> 4. Increase migration speed and wait until fail
> 
> Most likely you will get something like this:
>   qemu-system-x86_64: get_pci_config_device: Bad config data:
>           i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
>   qemu-system-x86_64: Failed to load PCIDevice:config
>   qemu-system-x86_64: Failed to load
>           ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
>   qemu-system-x86_64: error while loading state for instance 0x0 of device
>           '0000:00:03.0/ioh-3240-express-root-port'
>   qemu-system-x86_64: load of migration failed: Invalid argument
> 
> This error is just an illustration of the removing device possibility,
> but not actually an illustration of the race conditions for removing RAMBlock.

What path does this actually take - does it end up going via qdev_unplug
or some other way?

Dave

> Regards,
> Yury
> 
> Yury Kotov (1):
>   migration: Remove vmstate_unregister_ram
> 
>  hw/block/pflash_cfi01.c     | 1 -
>  hw/block/pflash_cfi02.c     | 1 -
>  hw/mem/pc-dimm.c            | 5 -----
>  hw/misc/ivshmem.c           | 2 --
>  hw/pci/pci.c                | 1 -
>  include/migration/vmstate.h | 1 -
>  migration/savevm.c          | 6 ------
>  7 files changed, 17 deletions(-)
> 
> -- 
> 2.24.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Yury Kotov Dec. 23, 2019, 8:51 a.m. UTC | #2
Hi!

11.12.2019, 14:17, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  Hi,
>>
>>  I found that it's possible to remove a RAMBlock during migration.
>>  E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
>>  And I want to clarify whether RAMBlock removing (or even adding) during
>>  migration is valid operation or it's a bug.
>>
>>  Currently, it may cause some race conditions with migration thread and
>>  migration may fail because of them. For instance, vmstate_unregister_ram
>>  function which is called during PCIe device removing does these:
>>  - Memset idstr -> target may receive unknown/zeroed idstr -> migration fail
>>  - Set RAMBlock flags as non-migratable -> migration fail
>>
>>  RAMBlock removing itself seems safe for migration thread because of RCU.
>>  But it seems to me there are other possible race conditions (didn't test it):
>>  - qemu_put_buffer_async -> saves pointer to RAMBlock's memory
>>     -> block will be freed out of RCU (between ram save iterations)
>>     -> qemu_fflush -> access to freed memory.
>>
>>  So, I have the following questions:
>>  1. Is RAMBlock removing/adding OK during migration?
>
> I don't think that any hot(un)plug is safe during migration.
> While it's true we hold RCUs as we walk lists, we can't hold the RCU
> around the entire migration.

I agree. Currently, it's unsafe to do any hot(un)plug.
But I thought (and wanted to clarify) that it would be nice to make it safe.
Holding the RCU around the entire migration is not the only way, actually.
For example, we could defer RAMBlock deletion: take a reference on each
RAMBlock before migration and unref them all after migration.
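The deferral idea could look roughly like this (a sketch; the ToyRAMBlock type and helpers are hypothetical names, not existing QEMU API): migration takes an extra reference at setup, so a guest-initiated free during migration only drops the owner's reference, and the memory survives until migration releases its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of deferred RAMBlock deletion via refcounting.
 * All names here are hypothetical, not existing QEMU API. */

typedef struct ToyRAMBlock {
    int refcount;
    bool freed;   /* set when the backing memory is really released */
} ToyRAMBlock;

ToyRAMBlock *toy_block_new(void)
{
    ToyRAMBlock *b = calloc(1, sizeof(*b));
    b->refcount = 1; /* owner's reference */
    return b;
}

void toy_block_ref(ToyRAMBlock *b) { b->refcount++; }

void toy_block_unref(ToyRAMBlock *b)
{
    if (--b->refcount == 0) {
        b->freed = true; /* real code would munmap()/free() here */
    }
}

/* qemu_ram_free() equivalent: the owner drops its reference, but the
 * memory survives while migration still holds one. */
void toy_ram_free(ToyRAMBlock *b) { toy_block_unref(b); }
```

Migration setup would call toy_block_ref() on every block, and migration completion would unref them, releasing any blocks the guest unplugged in between.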

>
> There's lots of other problems; for example we call the .save_setup
> methods on devices at the start of migration, but then call the iterate
> on those devices later - if the device is added/removed between stages
> we'll end up either having done a setup and not calling the actual save,
> or the other way around.

Hm... Yeah, that's a problem, thanks for mentioning it!

>
> Juan added checks to qdev_device_add/qdev_unplug in b06424d ~2.5 years
> ago.

I see that hot(un)plug during migration has many issues.
But generally they fall into three groups (if I didn't miss something):
1) RAMBlock add/del
2) Device add/del
3) VMState add/del

IIUC, RAMBlocks are not always tied to a device.
So, in theory, it might become possible to hot(un)plug a RAMBlock
without hot-adding/removing a device. That's why I wanted to clarify
whether it makes sense to fix the RAMBlock-related problems separately.

But, if you think there is no sense in fixing all the related problems
to allow hot(un)plugging during migration, I think we can add
an assert(migration_is_idle()) in qemu_ram_free.

>>  2. If yes then what should we do with vmstate_unregister_ram?
>>     - Just remove vmstate_unregister_ram (my RFC patch)
>>     - Refcount RAMBlock's migratable/non-migratable state
>>     - Something else?
>>  3. If it mustn't be possible, so may be
>>     assert(migration_is_idle()) in qemu_ram_free?
>>
>>  P.S.
>>  I'm working on a fix of below problem and trying to choose better way:
>>  allow device removing and fix all problem like this or fix a particular device.
>>
>>  --------
>>  How to reproduce device removing during migration:
>>
>>  1. Source QEMU command line (target is similar)
>>    $ x86_64-softmmu/qemu-system-x86_64 \
>>      -nodefaults -no-user-config -m 1024 -M q35 \
>>      -qmp unix:./src.sock,server,nowait \
>>      -drive file=./image,format=raw,if=virtio \
>>      -device ioh3420,id=pcie.1 \
>>      -device virtio-net,bus=pcie.1
>>  2. Start migration with slow speed (to simplify reproducing)
>>  3. Power off a device on the hotplug pcie.1 bus:
>>    $ echo 0 > /sys/bus/pci/slots/0/power
>>  4. Increase migration speed and wait until fail
>>
>>  Most likely you will get something like this:
>>    qemu-system-x86_64: get_pci_config_device: Bad config data:
>>            i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
>>    qemu-system-x86_64: Failed to load PCIDevice:config
>>    qemu-system-x86_64: Failed to load
>>            ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
>>    qemu-system-x86_64: error while loading state for instance 0x0 of device
>>            '0000:00:03.0/ioh-3240-express-root-port'
>>    qemu-system-x86_64: load of migration failed: Invalid argument
>>
>>  This error is just an illustration of the removing device possibility,
>>  but not actually an illustration of the race conditions for removing RAMBlock.
>
> What path does this actually take - does it end up going via qdev_unplug
> or some other way?

1) Guest: writes to slot's pci config
2) QEMU: pcie_cap_slot_write_config -> pcie_unplug_device

So it's a purely guest-driven action, and the qdev_unplug checks don't help here.

>
> Dave
>
>>  Regards,
>>  Yury
>>
>>  Yury Kotov (1):
>>    migration: Remove vmstate_unregister_ram
>>
>>   hw/block/pflash_cfi01.c | 1 -
>>   hw/block/pflash_cfi02.c | 1 -
>>   hw/mem/pc-dimm.c | 5 -----
>>   hw/misc/ivshmem.c | 2 --
>>   hw/pci/pci.c | 1 -
>>   include/migration/vmstate.h | 1 -
>>   migration/savevm.c | 6 ------
>>   7 files changed, 17 deletions(-)
>>
>>  --
>>  2.24.0
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Yury
Dr. David Alan Gilbert Jan. 3, 2020, 11:44 a.m. UTC | #3
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Hi!
> 
> 11.12.2019, 14:17, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> > * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> >>  Hi,
> >>
> >>  I found that it's possible to remove a RAMBlock during migration.
> >>  E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
> >>  And I want to clarify whether RAMBlock removing (or even adding) during
> >>  migration is valid operation or it's a bug.
> >>
> >>  Currently, it may cause some race conditions with migration thread and
> >>  migration may fail because of them. For instance, vmstate_unregister_ram
> >>  function which is called during PCIe device removing does these:
> >>  - Memset idstr -> target may receive unknown/zeroed idstr -> migration fail
> >>  - Set RAMBlock flags as non-migratable -> migration fail
> >>
> >>  RAMBlock removing itself seems safe for migration thread because of RCU.
> >>  But it seems to me there are other possible race conditions (didn't test it):
> >>  - qemu_put_buffer_async -> saves pointer to RAMBlock's memory
> >>     -> block will be freed out of RCU (between ram save iterations)
> >>     -> qemu_fflush -> access to freed memory.
> >>
> >>  So, I have the following questions:
> >>  1. Is RAMBlock removing/adding OK during migration?
> >
> > I don't think that any hot(un)plug is safe during migration.
> > While it's true we hold RCUs as we walk lists, we can't hold the RCU
> > around the entire migration.
> 
> I agree. Currently, it's unsafe to do any hot(un)plug.
> But I thought (and wanted to clarify) it would be nice to make it safe.
> Hold the RCU around the entire migration is not the only way actually.
> For example, we can defer RAMBlock deletion: refcount RAMBlocks before
> migration and unref them after migration.

Yes, that might work.

> >
> > There's lots of other problems; for example we call the .save_setup
> > methods on devices at the start of migration, but then call the iterate
> > on those devices later - if the device is added/removed between stages
> > we'll end up either having done a setup and not calling the actual save,
> > or the other way around.
> 
> Hm... Yeah, that's a problem, thanks for mentioning it!
> 
> >
> > Juan added checks to qdev_device_add/qdev_unplug in b06424d ~2.5 years
> > ago.
> 
> I see that hot(un)plug during migration has many issues.
> But generally it has three groups (if I didn't miss something):
> 1) RAMBlock add/del
> 2) Device add/del
> 3) VMState add/del
> 
> IIUC, RAMBlocks are not always connected to some devices.
> So, in theory, it might become possible to hot(un)plug a block
> without hot adding/removing a device. It's why I wanted to clarify
> is there a sense to fix separately the problems related to RAMBlocks.

Possibly true.

> But, if you think there is no sense to fix all related problems
> to let hot(un)plugging during migration be allowed, I think we can add
> an assert(!migrate_is_idle()) in qemu_ram_free.

is_idle is probably the wrong thing, because of the new WAIT_UNPLUG
state that happens just after setup and is designed to allow vfio
devices to be unplugged before we actually start the guts of the
migration; however the idea makes sense.

> >>  2. If yes then what should we do with vmstate_unregister_ram?
> >>     - Just remove vmstate_unregister_ram (my RFC patch)
> >>     - Refcount RAMBlock's migratable/non-migratable state
> >>     - Something else?
> >>  3. If it mustn't be possible, so may be
> >>     assert(migration_is_idle()) in qemu_ram_free?
> >>
> >>  P.S.
> >>  I'm working on a fix of below problem and trying to choose better way:
> >>  allow device removing and fix all problem like this or fix a particular device.
> >>
> >>  --------
> >>  How to reproduce device removing during migration:
> >>
> >>  1. Source QEMU command line (target is similar)
> >>    $ x86_64-softmmu/qemu-system-x86_64 \
> >>      -nodefaults -no-user-config -m 1024 -M q35 \
> >>      -qmp unix:./src.sock,server,nowait \
> >>      -drive file=./image,format=raw,if=virtio \
> >>      -device ioh3420,id=pcie.1 \
> >>      -device virtio-net,bus=pcie.1
> >>  2. Start migration with slow speed (to simplify reproducing)
> >>  3. Power off a device on the hotplug pcie.1 bus:
> >>    $ echo 0 > /sys/bus/pci/slots/0/power
> >>  4. Increase migration speed and wait until fail
> >>
> >>  Most likely you will get something like this:
> >>    qemu-system-x86_64: get_pci_config_device: Bad config data:
> >>            i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> >>    qemu-system-x86_64: Failed to load PCIDevice:config
> >>    qemu-system-x86_64: Failed to load
> >>            ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
> >>    qemu-system-x86_64: error while loading state for instance 0x0 of device
> >>            '0000:00:03.0/ioh-3240-express-root-port'
> >>    qemu-system-x86_64: load of migration failed: Invalid argument
> >>
> >>  This error is just an illustration of the removing device possibility,
> >>  but not actually an illustration of the race conditions for removing RAMBlock.
> >
> > What path does this actually take - does it end up going via qdev_unplug
> > or some other way?
> 
> 1) Guest: writes to slot's pci config
> 2) QEMU: pcie_cap_slot_write_config -> pcie_unplug_device
> 
> So, it's only guest driven action and qdev_unplug doesn't help here.

Hmm, we need to find a way to stop that; let's see if Michael Tsirkin has
any ideas (cc'd). I'm thinking that if we could defer the unplug until the
end of the migration we'd be OK; but it feels racy as to whether the
destination is started with the device that the guest is unplugging.

Dave

> >
> > Dave
> >
> >>  Regards,
> >>  Yury
> >>
> >>  Yury Kotov (1):
> >>    migration: Remove vmstate_unregister_ram
> >>
> >>   hw/block/pflash_cfi01.c | 1 -
> >>   hw/block/pflash_cfi02.c | 1 -
> >>   hw/mem/pc-dimm.c | 5 -----
> >>   hw/misc/ivshmem.c | 2 --
> >>   hw/pci/pci.c | 1 -
> >>   include/migration/vmstate.h | 1 -
> >>   migration/savevm.c | 6 ------
> >>   7 files changed, 17 deletions(-)
> >>
> >>  --
> >>  2.24.0
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> Regards,
> Yury
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Michael S. Tsirkin Jan. 7, 2020, 8:02 p.m. UTC | #4
On Mon, Dec 09, 2019 at 10:41:01AM +0300, Yury Kotov wrote:
> Hi,
> 
> I found that it's possible to remove a RAMBlock during migration.
> E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
> And I want to clarify whether RAMBlock removing (or even adding) during
> migration is valid operation or it's a bug.

There's a very basic problem though: the list of RAMBlocks on the source and
destination must match, otherwise the destination will be confused.

It is probably fixable: keep a fake RAMBlock around until migration is
complete, and send some kind of "RAMBlock removed" message to
destination so it knows to remove it there as well.
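That suggestion might be modeled like this (purely illustrative; no such message type exists in the migration stream today): the source keeps a placeholder entry so the block lists stay in sync, and counts the "RAMBlock removed" notices it would send at completion so the destination can drop those blocks too.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of keeping a placeholder RAMBlock plus removal notices.
 * The structures and helpers are invented for illustration. */

#define MAX_BLOCKS 8

typedef struct {
    char idstr[32];
    bool removed;   /* placeholder: gone on the source, kept in the list */
} ToyBlock;

static ToyBlock src_blocks[MAX_BLOCKS];
static int src_nblocks;

int toy_add_block(const char *idstr)
{
    strncpy(src_blocks[src_nblocks].idstr, idstr,
            sizeof(src_blocks[0].idstr) - 1);
    src_blocks[src_nblocks].removed = false;
    return src_nblocks++;
}

/* Guest unplug during migration: mark, don't delete, so the block list
 * still matches what the destination expects. */
void toy_remove_block(int i) { src_blocks[i].removed = true; }

/* At migration completion, count the "RAMBlock removed" notices that
 * would be sent so the destination can drop the same blocks. */
int toy_pending_removal_notices(void)
{
    int n = 0;
    for (int i = 0; i < src_nblocks; i++) {
        n += src_blocks[i].removed;
    }
    return n;
}
```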
Michael S. Tsirkin Jan. 7, 2020, 8:08 p.m. UTC | #5
On Fri, Jan 03, 2020 at 11:44:27AM +0000, Dr. David Alan Gilbert wrote:
> > 1) Guest: writes to slot's pci config
> > 2) QEMU: pcie_cap_slot_write_config -> pcie_unplug_device
> > 
> > So, it's only guest driven action and qdev_unplug doesn't help here.
> 
> Hmm we need to find a way to stop that; lets see if Michael Tsirkin has
> any ideas (cc'd) - I'm thinking if we could defer the unplug until the
> end of the migration we'd be OK; but it feels racy as to whether the
> destination is started with the device that the guest is unplugging.
> 
> Dave

It's true - and the same is possible with plain PCI: the guest can remove
a device at will.

Also, while libvirt learned not to start device del while migration
is active, it's annoying to have to wait for device del
to complete before migration can be started.

I guess we can make the device invisible to the guest - that concept
now exists because of failover, and can maybe be reused here.

Alternatively or additionally - for a while now I've wanted to only remove
the device once the guest has powered it off and removal was requested.
Again, it might be easier now that we have the concept of an invisible device.
Dr. David Alan Gilbert Jan. 8, 2020, 10:24 a.m. UTC | #6
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, Jan 03, 2020 at 11:44:27AM +0000, Dr. David Alan Gilbert wrote:
> > > 1) Guest: writes to slot's pci config
> > > 2) QEMU: pcie_cap_slot_write_config -> pcie_unplug_device
> > > 
> > > So, it's only guest driven action and qdev_unplug doesn't help here.
> > 
> > Hmm we need to find a way to stop that; lets see if Michael Tsirkin has
> > any ideas (cc'd) - I'm thinking if we could defer the unplug until the
> > end of the migration we'd be OK; but it feels racy as to whether the
> > destination is started with the device that the guest is unplugging.
> > 
> > Dave
> 
> It's true - and same is possible with PCI, guest can remove device
> at will.
> 
> Also, while libvirt learned not to start device del while migration
> is active, it's annoying to have to wait for device del
> to complete before migration can be started.
> 
> I guess we can make device invisible to guest - that concept
> now exists because of failover, and can maybe be reused here.
> 
> Alternatively or additionally - for a while now I wanted to only remove
> the device if guest powered it out and removal was requested.  Again it
> might be easier now that we have a concept of an invisible device.

How do invisible devices work? Is that something that each device has to
learn about?
Would we only make them invisible for migration and then do a full
hot-unplug on the destination after the migration finishes?

Dave

> 
> 
> -- 
> MST
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Juan Quintela Jan. 8, 2020, 1:40 p.m. UTC | #7
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 09, 2019 at 10:41:01AM +0300, Yury Kotov wrote:
>> Hi,
>> 
>> I found that it's possible to remove a RAMBlock during migration.
>> E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
>> And I want to clarify whether RAMBlock removing (or even adding) during
>> migration is valid operation or it's a bug.
>
> There's a very basic problem though: list of RAMBlock's on source and
> destination must match otherwise destination will be confused.
>
> It is probably fixable: keep a fake RAMBlock around until migration is
> complete, and send some kind of "RAMBlock removed" message to
> destination so it knows to remove it there as well.

Do we have the data to know that a device is unplugged?
I think it would be easier to just queue that command.  Let the
guest unplug whatever it wants, and on the destination, once we know
that migration has finished, we just do the rest of the stuff.

As stated in the thread, we really want _not_ to have hotplug/unplug
during migration, but if the guest can initiate one, we can only "delay"
the bits that happen inside QEMU.
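Queueing could be sketched like this (illustrative names only, not the real QEMU hotplug code): the config-write path defers guest-requested unplugs while migration is active and replays them when it finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of queueing guest-initiated unplugs during migration and
 * replaying them afterwards.  Names are illustrative only. */

#define MAX_PENDING 8

static int pending_slots[MAX_PENDING]; /* slots the guest powered off */
static int npending;
static bool migration_active;
static int unplugged_count; /* devices actually torn down */

void toy_migration_start(void) { migration_active = true; }

/* Called from the config-write path instead of unplugging immediately. */
void toy_request_unplug(int slot)
{
    if (migration_active) {
        pending_slots[npending++] = slot; /* defer until migration ends */
    } else {
        unplugged_count++; /* unplug right away */
    }
}

/* Migration finished: replay every deferred unplug. */
void toy_migration_finish(void)
{
    migration_active = false;
    while (npending > 0) {
        npending--;
        unplugged_count++;
    }
}
```

The open question from the thread still applies: the destination must agree on whether it starts with the device the guest was in the middle of unplugging.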

Later, Juan.
Yury Kotov Jan. 13, 2020, 2:18 p.m. UTC | #8
Hi!

07.01.2020, 23:08, "Michael S. Tsirkin" <mst@redhat.com>:
> On Fri, Jan 03, 2020 at 11:44:27AM +0000, Dr. David Alan Gilbert wrote:
>>  > 1) Guest: writes to slot's pci config
>>  > 2) QEMU: pcie_cap_slot_write_config -> pcie_unplug_device
>>  >
>>  > So, it's only guest driven action and qdev_unplug doesn't help here.
>>
>>  Hmm we need to find a way to stop that; lets see if Michael Tsirkin has
>>  any ideas (cc'd) - I'm thinking if we could defer the unplug until the
>>  end of the migration we'd be OK; but it feels racy as to whether the
>>  destination is started with the device that the guest is unplugging.
>>
>>  Dave
>
> It's true - and same is possible with PCI, guest can remove device
> at will.
>
> Also, while libvirt learned not to start device del while migration
> is active, it's annoying to have to wait for device del
> to complete before migration can be started.
>
> I guess we can make device invisible to guest - that concept
> now exists because of failover, and can maybe be reused here.
>
> Alternatively or additionally - for a while now I wanted to only remove
> the device if guest powered it out and removal was requested. Again it
> might be easier now that we have a concept of an invisible device.
>

I sent an RFC patch that implements deferred device unplug:
  pcie: Defer hot unplug until migration is complete

Please take a look at it.

Regards,
Yury