[v3,0/10] add failover feature for assigned network devices

Message ID	20191011112015.11785-1-jfreimann@redhat.com (mailing list archive)
Headers	Return-Path: <SRS0=9NZu=YE=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org>
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 43C472190F
From: Jens Freimann <jfreimann@redhat.com>
To: qemu-devel@nongnu.org
Subject: [PATCH v3 0/10] add failover feature for assigned network devices
Date: Fri, 11 Oct 2019 13:20:05 +0200
Message-Id: <20191011112015.11785-1-jfreimann@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: list
Cc: parav@mellanox.com, mst@redhat.com, aadam@redhat.com, dgilbert@redhat.com, alex.williamson@redhat.com, laine@redhat.com, ailan@redhat.com, ehabkost@redhat.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	add failover feature for assigned network devices

Jens Freimann Oct. 11, 2019, 11:20 a.m. UTC

This series implements the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since v2:
* back out of creating failover pair when it is a non-networking
  vfio-pci device (Alex W)
* handle migration state change from within the migration thread. I do a
  timed wait on a semaphore and then check whether all unplugs were
  successful. Added a new function to each device that checks whether the
  unplug for it has happened. When all devices report a successful
  unplug *or* the time/retries are up, continue with the migration or
  cancel. When not all devices could be unplugged, I am cancelling at the
  moment. It is likely that we can't plug it back at the destination, which
  would result in degraded network performance.
* fix a few bugs regarding re-plug on migration source and target 
* run full set of tests including migration tests
* add patch for libqos to tolerate new migration state
* squashed patch 1 and 2, added patch 8 
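
As a rough illustration of the timed-wait-and-recheck loop described above (not code from this series, which is C inside QEMU), here is a minimal Python sketch. The function name `wait_for_unplug`, its parameters, and the default timeout and retry count are all assumptions made for the example.

```python
import threading
from typing import Callable, Iterable


def wait_for_unplug(
    unplug_checks: Iterable[Callable[[], bool]],
    unplug_sem: threading.Semaphore,
    timeout_s: float = 0.25,
    max_retries: int = 5,
) -> bool:
    """Return True if every device reports its unplug done, False otherwise.

    Hypothetical helper: each entry in unplug_checks stands in for the
    per-device "has the unplug happened?" function mentioned above.
    """
    checks = list(unplug_checks)
    for _ in range(max_retries):
        if all(check() for check in checks):
            return True
        # Timed wait: returns early if something posts the semaphore
        # (e.g. an unplug completion), otherwise after timeout_s.
        unplug_sem.acquire(timeout=timeout_s)
    # One final check after the last wait.
    return all(check() for check in checks)
```

A caller would continue with the migration on `True` and cancel it on `False`, matching the behaviour described in the changelog entry.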
 
The general idea is that we have a pair of devices, a vfio-pci and a
virtio-net device. Before migration the vfio device is unplugged and data
flows to the virtio-net device. On the target side another vfio-pci device
is plugged in to take over the data-path. In the guest the net_failover
module will pair net devices with the same MAC address.
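
To make the MAC-based pairing concrete, here is a small Python sketch of the idea only. It is not the guest kernel's net_failover code; the `NetDev` type, the "standby"/"primary" labels, and `pair_by_mac` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class NetDev(NamedTuple):
    name: str
    kind: str  # "standby" (virtio-net) or "primary" (vfio-pci), assumed labels
    mac: str


def pair_by_mac(devices: List[NetDev]) -> Dict[str, Tuple[str, str]]:
    """Map each MAC to (standby name, primary name) when both are present."""
    by_mac: Dict[str, Dict[str, str]] = defaultdict(dict)
    for dev in devices:
        # Compare MACs case-insensitively.
        by_mac[dev.mac.lower()][dev.kind] = dev.name
    return {
        mac: (roles["standby"], roles["primary"])
        for mac, roles in by_mac.items()
        if "standby" in roles and "primary" in roles
    }
```

With the device ids from the command line example further down, `net1` and `hostdev0` would form a pair as long as the guest sees the same MAC address on both.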

* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs

* Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
  use to skip the unrealize code path when doing an unplug of the primary
  device

* Patch 3 sets the pending_deleted_event before triggering the guest
  unplug request

* Patches 4 and 5 add new qmp events: one sends the device id of a device
  that was just requested to be unplugged from the guest, and another one
  lets libvirt know if VIRTIO_NET_F_STANDBY was negotiated

* Patch 6 makes sure that we can unplug the vfio-device before
  migration starts

* Patch 7 adds a new migration state that is entered while we wait for
  devices to be unplugged by guest OS

* Patch 8 just adds the new migration state to a check in libqos code

* Patch 9 makes virtio-net use the API to defer adding the vfio
  device until the VIRTIO_NET_F_STANDBY feature is acked. It also
  implements the migration handler to unplug the device from the guest and
  re-plug in case of migration failure

* Patch 10 allows migration for failover vfio-pci devices, but only when it is
  a networking device 

Previous discussion:
  RFC v1 https://patchwork.ozlabs.org/cover/989098/
  RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
  v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html

To summarize concerns/feedback from previous discussion:
1. The guest OS can reject or, worse, _delay_ unplug by any amount of time.
  Migration might get stuck for an unpredictable time with an unclear reason.
  This approach combines two tricky things, hotplug/unplug and migration.
  -> We need to let libvirt know what's happening. Add new qmp events
     and a new migration state. When a primary device is (partially)
     unplugged (only from guest) we send a qmp event with the device id. When
     it is unplugged from the guest the DEVICE_DELETED event is sent.
     Migration will enter the wait-unplug state while waiting for the guest
     OS to unplug all primary devices and then move on with migration.
2. PCI devices are a precious resource. The primary device should never
  be added to QEMU if it won't be used by the guest, instead of hiding it in
  QEMU.
  -> We only hotplug the device when the standby feature bit was
     negotiated. We save the device cmdline options until we need it for
     qdev_device_add()
     Hiding a device can be a useful concept to model. For example a
     pci device in a powered-off slot could be marked as hidden until the slot is
     powered on (mst).
3. Management layer software should handle this. OpenStack already has
  components/code to handle unplug/replug of VFIO devices and metadata to
  provide to the guest for detecting which devices should be paired.
  -> An approach that includes all software from firmware to
     higher-level management software wasn't tried in the last few years. This
     is an attempt to keep it simple and contained in QEMU as much as possible.
     One of the problems that stopped management software and libvirt from
     implementing this idea is that it can't be sure that it's possible to
     re-plug the primary device. By not freeing the device's resources in QEMU
     and only asking the guest OS to unplug, it is possible to re-plug the
     device in case of a migration failure.
4. Hotplugging a device and then making it part of a failover setup is
   not possible
  -> addressed by extending qdev hotplug functions to check for hidden
     attribute, so e.g. device_add can be used to plug a device.
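
For point 1, a management-side consumer of these events could look roughly like the Python sketch below. It is only one possible reading of the flow above. `DEVICE_DELETED`, the `MIGRATION` event and the `wait-unplug` status are taken as described in this letter and in existing QMP; the event name `UNPLUG_PRIMARY` and its `device-id` field are assumptions, since the names are not given here, and `FailoverTracker` is a hypothetical helper.

```python
import json

# Assumed name of the "unplug primary" event; check qapi/migration.json.
UNPLUG_REQUESTED_EVENT = "UNPLUG_PRIMARY"


class FailoverTracker:
    """Tracks which primary devices the guest has been asked to unplug."""

    def __init__(self) -> None:
        self.pending = set()
        self.migration_status = None

    def handle(self, line: str) -> None:
        """Consume one QMP event, given as a JSON text line."""
        msg = json.loads(line)
        event = msg.get("event")
        data = msg.get("data", {})
        if event == UNPLUG_REQUESTED_EVENT:
            self.pending.add(data["device-id"])  # assumed field name
        elif event == "DEVICE_DELETED":
            self.pending.discard(data.get("device"))
        elif event == "MIGRATION":
            self.migration_status = data.get("status")

    def waiting_on_guest(self) -> bool:
        """True while migration is in wait-unplug with unplugs outstanding."""
        return self.migration_status == "wait-unplug" and bool(self.pending)
```

The point of such a tracker is that a stuck migration can be attributed to a specific device id instead of appearing to hang for no visible reason.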


I have tested this with an mlx5 and an igb NIC and was able to migrate the VM.

Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host   \
        -serial stdio   \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id=net1 \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 


Jens Freimann (10):
  qdev/qbus: add hidden device support
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 hw/core/qdev.c                 |  20 +++
 hw/net/virtio-net.c            | 267 +++++++++++++++++++++++++++++++++
 hw/pci/pci.c                   |   2 +
 hw/pci/pcie.c                  |   6 +
 hw/vfio/pci.c                  |  35 ++++-
 hw/vfio/pci.h                  |   2 +
 include/hw/pci/pci.h           |   1 +
 include/hw/qdev-core.h         |  10 ++
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h     |   1 +
 include/migration/vmstate.h    |   2 +
 migration/migration.c          |  34 +++++
 migration/migration.h          |   3 +
 migration/savevm.c             |  36 +++++
 migration/savevm.h             |   1 +
 qapi/migration.json            |  24 ++-
 qapi/net.json                  |  16 ++
 qdev-monitor.c                 |  43 +++++-
 tests/libqos/libqos.c          |   3 +-
 vl.c                           |   6 +-
 20 files changed, 515 insertions(+), 9 deletions(-)

Michael S. Tsirkin Oct. 11, 2019, 2:28 p.m. UTC | #1

On Fri, Oct 11, 2019 at 01:20:05PM +0200, Jens Freimann wrote:
> This is implementing the host side of the net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)


I posted a comment about the migration changes.
Besides that this looks good to me.
I did not look at VFIO parts at all yet.
Alex, could you pls review/ack the VFIO patch?



no-reply@patchew.org Oct. 11, 2019, 4:04 p.m. UTC | #2

Patchew URL: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 025
  TEST    iotest-qcow2: 027
**
ERROR:/tmp/qemu-test/src/tests/migration-test.c:903:wait_for_migration_fail: assertion failed: (!strcmp(status, "setup") || !strcmp(status, "failed") || (allow_active && !strcmp(status, "active")))
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:903:wait_for_migration_fail: assertion failed: (!strcmp(status, "setup") || !strcmp(status, "failed") || (allow_active && !strcmp(status, "active")))
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 029
  TEST    check-unit: tests/test-qdist
---
  TEST    iotest-qcow2: 172
  TEST    iotest-qcow2: 174
**
ERROR:/tmp/qemu-test/src/tests/migration-test.c:903:wait_for_migration_fail: assertion failed: (!strcmp(status, "setup") || !strcmp(status, "failed") || (allow_active && !strcmp(status, "active")))
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:903:wait_for_migration_fail: assertion failed: (!strcmp(status, "setup") || !strcmp(status, "failed") || (allow_active && !strcmp(status, "active")))
make: *** [check-qtest-x86_64] Error 1
  TEST    iotest-qcow2: 176
  TEST    iotest-qcow2: 177
  TEST    iotest-qcow2: 179
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=8a50cfce678f45bfb554befdb722504e', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-lrrx9ztr/src/docker-src.2019-10-11-11.53.24.30391:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=8a50cfce678f45bfb554befdb722504e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-lrrx9ztr/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    10m49.386s
user    0m8.737s


The full log is available at
http://patchew.org/logs/20191011112015.11785-1-jfreimann@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Alex Williamson Oct. 15, 2019, 7:03 p.m. UTC | #3

On Fri, 11 Oct 2019 13:20:05 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> This is implementing the host side of the net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> 
> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
> 
> * Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
>   use to skip the unrealize code path when doing a unplug of the primary
>   device
> 
> * Patch 3 sets the pending_deleted_event before triggering the guest
>   unplug request

These only cover pcie hotplug; is this feature somehow dependent on
pcie?  There's also ACPI-based PCI hotplug, SHPC hotplug, and it looks
like s390 has its own version (of course) of PCI hotplug.  IMO, we
either need to make an attempt to support this universally or the
option needs to fail if the hotplug controller doesn't support partial
removal.  Thanks,

Alex

Michael S. Tsirkin Oct. 15, 2019, 9:17 p.m. UTC | #4

On Tue, Oct 15, 2019 at 01:03:17PM -0600, Alex Williamson wrote:
> On Fri, 11 Oct 2019 13:20:05 +0200
> Jens Freimann <jfreimann@redhat.com> wrote:
> 
> > This is implementing the host side of the net_failover concept
> > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > 
> > * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
> > 
> > * Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
> >   use to skip the unrealize code path when doing a unplug of the primary
> >   device
> > 
> > * Patch 3 sets the pending_deleted_event before triggering the guest
> >   unplug request
> 
> These only cover pcie hotplug; is this feature somehow dependent on
> pcie?  There's also ACPI-based PCI hotplug, SHPC hotplug, and it looks
> like s390 has its own version (of course) of PCI hotplug.  IMO, we
> either need to make an attempt to support this universally or the
> option needs to fail if the hotplug controller doesn't support partial
> removal.  Thanks,
> 
> Alex


Alex, could you please comment a bit more on vfio patches?
Besides what you point out here, any other issues?

Jens Freimann Oct. 17, 2019, 10:33 a.m. UTC | #5

On Tue, Oct 15, 2019 at 01:03:17PM -0600, Alex Williamson wrote:
>On Fri, 11 Oct 2019 13:20:05 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
>
>> This is implementing the host side of the net_failover concept
>> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>>
>> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
>>
>> * Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
>>   use to skip the unrealize code path when doing a unplug of the primary
>>   device
>>
>> * Patch 3 sets the pending_deleted_event before triggering the guest
>>   unplug request
>
>These only cover pcie hotplug; is this feature somehow dependent on
>pcie?  There's also ACPI-based PCI hotplug, SHPC hotplug, and it looks
>like s390 has its own version (of course) of PCI hotplug.  IMO, we
>either need to make an attempt to support this universally or the
>option needs to fail if the hotplug controller doesn't support partial
>removal.  Thanks,

It is possible to make it work with non-pcie hotplug but as the first
step I want to enable it for pcie only. For that I would add a check
into pci_qdev_realize(), where I also check if the device is an
ethernet device, and fail if it is not a pcie device. Would that work
for you?
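
The check proposed above could be sketched as follows. This is Python for illustration only; the real check would be C in pci_qdev_realize(). `check_failover_primary` and its parameters are hypothetical, while 0x0200 is the standard PCI class code for an Ethernet controller.

```python
PCI_CLASS_NETWORK_ETHERNET = 0x0200  # PCI base class 0x02, subclass 0x00


def check_failover_primary(class_id: int, is_pcie: bool) -> None:
    """Raise ValueError if the device cannot act as a failover primary."""
    if class_id != PCI_CLASS_NETWORK_ETHERNET:
        raise ValueError("failover primary must be an Ethernet device")
    if not is_pcie:
        # First step only: non-PCIe hotplug controllers are rejected.
        raise ValueError("failover primary is only supported on PCIe")
```

Failing at realize time means an unsupported topology is reported when the device is added, rather than showing up later as a failed unplug during migration.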

regards,
Jens

Alex Williamson Oct. 17, 2019, 12:51 p.m. UTC | #6

On Thu, 17 Oct 2019 12:33:47 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Tue, Oct 15, 2019 at 01:03:17PM -0600, Alex Williamson wrote:
> >On Fri, 11 Oct 2019 13:20:05 +0200
> >Jens Freimann <jfreimann@redhat.com> wrote:
> >  
> >> This is implementing the host side of the net_failover concept
> >> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> >>
> >> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
> >>
> >> * Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
> >>   use to skip the unrealize code path when doing a unplug of the primary
> >>   device
> >>
> >> * Patch 3 sets the pending_deleted_event before triggering the guest
> >>   unplug request  
> >
> >These only cover pcie hotplug; is this feature somehow dependent on
> >pcie?  There's also ACPI-based PCI hotplug, SHPC hotplug, and it looks
> >like s390 has its own version (of course) of PCI hotplug.  IMO, we
> >either need to make an attempt to support this universally or the
> >option needs to fail if the hotplug controller doesn't support partial
> >removal.  Thanks,  
> 
> It is possible to make it work with non-pcie hotplug but as the first
> step I want to enable it for pcie only. For that I would add a check
> into pci_qdev_realize(), where I also check if the device is an
> ethernet device, and fail if it is not a pcie device. Would that work
> for you?

How would libvirt introspect what topologies are supported rather than
trial and error?  I think this solves my issue that I get bugs that the
failover pair option doesn't work on vfio-pci depending on the
topology, but it really just pushes the problem up the stack.  Thanks,

Alex

Jens Freimann Oct. 17, 2019, 2:04 p.m. UTC | #7

On Thu, Oct 17, 2019 at 06:51:54AM -0600, Alex Williamson wrote:
>On Thu, 17 Oct 2019 12:33:47 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
>
>> On Tue, Oct 15, 2019 at 01:03:17PM -0600, Alex Williamson wrote:
>> >On Fri, 11 Oct 2019 13:20:05 +0200
>> >Jens Freimann <jfreimann@redhat.com> wrote:
>> >
>> >> This is implementing the host side of the net_failover concept
>> >> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>> >>
>> >> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
>> >>
>> >> * Patch 2 sets a new flag for PCIDevice 'partially_hotplugged' which we
>> >>   use to skip the unrealize code path when doing a unplug of the primary
>> >>   device
>> >>
>> >> * Patch 3 sets the pending_deleted_event before triggering the guest
>> >>   unplug request
>> >
>> >These only cover pcie hotplug; is this feature somehow dependent on
>> >pcie?  There's also ACPI-based PCI hotplug, SHPC hotplug, and it looks
>> >like s390 has its own version (of course) of PCI hotplug.  IMO, we
>> >either need to make an attempt to support this universally or the
>> >option needs to fail if the hotplug controller doesn't support partial
>> >removal.  Thanks,
>>
>> It is possible to make it work with non-pcie hotplug but as the first
>> step I want to enable it for pcie only. For that I would add a check
>> into pci_qdev_realize(), where I also check if the device is an
>> ethernet device, and fail if it is not a pcie device. Would that work
>> for you?
>
>How would libvirt introspect what topologies are supported rather than
>trial and error?  I think this solves my issue that I get bugs that the
>failover pair option doesn't work on vfio-pci depending on the
>topology, but it really just pushes the problem up the stack.  Thanks,

Good point, Alex. Laine, any idea what would be required for this?
How does libvirt introspect other properties of PCI devices, if any?
I only found checks for commandline options in libvirt, but I might be
missing something.

regards,
Jens
