Message ID | cover.1724701542.git.maciej.szmigiero@oracle.com (mailing list archive)
---|---
Series | Multifd
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:

> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> This is an updated v2 patch series of the v1 series located here:
> https://lore.kernel.org/qemu-devel/cover.1718717584.git.maciej.szmigiero@oracle.com/
>
> Changes from v1:
> * Extended the QEMU thread-pool with non-AIO (generic) pool support,
>   implemented automatic memory management support for its work element
>   function argument.
>
> * Introduced a multifd device state save thread pool, ported the VFIO
>   multifd device state save implementation to use this thread pool instead
>   of VFIO internally managed individual threads.
>
> * Re-implemented on top of Fabiano's v4 multifd sender refactor patch set from
>   https://lore.kernel.org/qemu-devel/20240823173911.6712-1-farosas@suse.de/
>
> * Moved device state related multifd code to new multifd-device-state.c
>   file where it made sense.
>
> * Implemented a max in-flight VFIO device state buffer count limit to
>   allow capping the maximum recipient memory usage.
>
> * Removed unnecessary explicit memory barriers from multifd_send().
>
> * A few small changes like updated comments, code formatting,
>   fixed zero-copy RAM multifd bytes transferred counter under-counting, etc.
>
>
> For convenience, this patch set is also available as a git tree:
> https://github.com/maciejsszmigiero/qemu/tree/multifd-device-state-transfer-vfio

With this branch I'm getting:

$ QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/uri/plain/none
...
qemu-system-x86_64: ../util/thread-pool.c:354: thread_pool_set_minmax_threads: Assertion `max_threads > 0' failed.
Broken pipe

$ ./tests/qemu-iotests/check -p -qcow2 068
...
+qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
On 28.08.2024 22:46, Fabiano Rosas wrote:
> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> [...]
>>
>> For convenience, this patch set is also available as a git tree:
>> https://github.com/maciejsszmigiero/qemu/tree/multifd-device-state-transfer-vfio
>
> With this branch I'm getting:
>
> $ QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/uri/plain/none
> ...
> qemu-system-x86_64: ../util/thread-pool.c:354: thread_pool_set_minmax_threads: Assertion `max_threads > 0' failed.
> Broken pipe
>

Oops, I should have tested this patch set in setups without any VFIO devices too.

Fixed this now (together with that RAM tracepoint thing) and updated the GitHub tree -
the above test now passes.

Tomorrow I will test the whole multifd VFIO migration once again to be sure.

> $ ./tests/qemu-iotests/check -p -qcow2 068
> ...
> +qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>

I'm not sure how this can happen - it looks like qemu_loadvm_state() might be called
somehow after migration_incoming_state_destroy() already destroyed the migration state?
Will investigate this in detail tomorrow.

By the way, this test seems to not be run by the default "make check".

Thanks,
Maciej
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:

> On 28.08.2024 22:46, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>>
>>> [...]
>>
>> With this branch I'm getting:
>>
>> $ QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/uri/plain/none
>> ...
>> qemu-system-x86_64: ../util/thread-pool.c:354: thread_pool_set_minmax_threads: Assertion `max_threads > 0' failed.
>> Broken pipe
>>
>
> Oops, I should have tested this patch set in setups without any VFIO devices too.
>
> Fixed this now (together with that RAM tracepoint thing) and updated the GitHub tree -
> the above test now passes.
>
> Tomorrow I will test the whole multifd VFIO migration once again to be sure.
>
>> $ ./tests/qemu-iotests/check -p -qcow2 068
>> ...
>> +qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>>
>
> I'm not sure how this can happen - it looks like qemu_loadvm_state() might be called
> somehow after migration_incoming_state_destroy() already destroyed the migration state?
> Will investigate this in detail tomorrow.

Usually something breaks and then the clean up code rushes and frees
state while other parts are still using it.

We also had issues recently with code not incrementing the migration
state refcount properly:

27eb8499ed ("migration: Fix use-after-free of migration state object")

> By the way, this test seems to not be run by the default "make check".
>
> Thanks,
> Maciej
On 29.08.2024 02:51, Fabiano Rosas wrote:
> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>
>> On 28.08.2024 22:46, Fabiano Rosas wrote:
>>> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>>>
>>>> [...]
>>>
>>> With this branch I'm getting:
>>> (..)
>>> $ ./tests/qemu-iotests/check -p -qcow2 068
>>> ...
>>> +qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>>>
>>
>> I'm not sure how this can happen - it looks like qemu_loadvm_state() might be called
>> somehow after migration_incoming_state_destroy() already destroyed the migration state?
>> Will investigate this in detail tomorrow.
>
> Usually something breaks and then the clean up code rushes and frees
> state while other parts are still using it.
>
> We also had issues recently with code not incrementing the migration
> state refcount properly:
>
> 27eb8499ed ("migration: Fix use-after-free of migration state object")

Looks like MigrationIncomingState is just for "true" incoming migration, which can be
started just once - so it is destroyed after the first attempt and never reinitialized.

On the other hand, MigrationState is for both true incoming migration and also for
snapshot load - the latter of which seems able to be started multiple times.

Moved these variables to MigrationState, updated the GitHub tree and now this test passes.

Thanks,
Maciej
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

This is an updated v2 patch series of the v1 series located here:
https://lore.kernel.org/qemu-devel/cover.1718717584.git.maciej.szmigiero@oracle.com/

Changes from v1:
* Extended the QEMU thread-pool with non-AIO (generic) pool support,
  implemented automatic memory management support for its work element
  function argument.

* Introduced a multifd device state save thread pool, ported the VFIO
  multifd device state save implementation to use this thread pool instead
  of VFIO internally managed individual threads.

* Re-implemented on top of Fabiano's v4 multifd sender refactor patch set from
  https://lore.kernel.org/qemu-devel/20240823173911.6712-1-farosas@suse.de/

* Moved device state related multifd code to new multifd-device-state.c
  file where it made sense.

* Implemented a max in-flight VFIO device state buffer count limit to
  allow capping the maximum recipient memory usage.

* Removed unnecessary explicit memory barriers from multifd_send().

* A few small changes like updated comments, code formatting,
  fixed zero-copy RAM multifd bytes transferred counter under-counting, etc.


For convenience, this patch set is also available as a git tree:
https://github.com/maciejsszmigiero/qemu/tree/multifd-device-state-transfer-vfio

Based-on: <20240823173911.6712-1-farosas@suse.de>

Maciej S. Szmigiero (17):
  vfio/migration: Add save_{iterate,complete_precopy}_started trace events
  migration/ram: Add load start trace event
  migration/multifd: Zero p->flags before starting filling a packet
  thread-pool: Add a DestroyNotify parameter to thread_pool_submit{,_aio}()
  thread-pool: Implement non-AIO (generic) pool support
  migration: Add save_live_complete_precopy_{begin,end} handlers
  migration: Add qemu_loadvm_load_state_buffer() and its handler
  migration: Add load_finish handler and associated functions
  migration/multifd: Device state transfer support - receive side
  migration/multifd: Convert multifd_send()::next_channel to atomic
  migration/multifd: Add an explicit MultiFDSendData destructor
  migration/multifd: Device state transfer support - send side
  migration/multifd: Add migration_has_device_state_support()
  migration: Add save_live_complete_precopy_thread handler
  vfio/migration: Multifd device state transfer support - receive side
  vfio/migration: Add x-migration-multifd-transfer VFIO property
  vfio/migration: Multifd device state transfer support - send side

 backends/tpm/tpm_backend.c       |   2 +-
 block/file-win32.c               |   2 +-
 hw/9pfs/coth.c                   |   3 +-
 hw/ppc/spapr_nvdimm.c            |   4 +-
 hw/vfio/migration.c              | 520 ++++++++++++++++++++++++++++++-
 hw/vfio/pci.c                    |   9 +
 hw/vfio/trace-events             |  14 +-
 hw/virtio/virtio-pmem.c          |   2 +-
 include/block/thread-pool.h      |  12 +-
 include/hw/vfio/vfio-common.h    |  22 ++
 include/migration/misc.h         |  15 +
 include/migration/register.h     |  97 ++++++
 include/qemu/typedefs.h          |   4 +
 migration/meson.build            |   1 +
 migration/migration.c            |   6 +
 migration/migration.h            |   3 +
 migration/multifd-device-state.c | 193 ++++++++++++
 migration/multifd-nocomp.c       |   9 +-
 migration/multifd-qpl.c          |   2 +-
 migration/multifd-uadk.c         |   2 +-
 migration/multifd-zlib.c         |   2 +-
 migration/multifd-zstd.c         |   2 +-
 migration/multifd.c              | 249 ++++++++++++---
 migration/multifd.h              |  65 +++-
 migration/ram.c                  |   1 +
 migration/savevm.c               | 152 ++++++++-
 migration/savevm.h               |   7 +
 migration/trace-events           |   1 +
 tests/unit/test-thread-pool.c    |   8 +-
 util/thread-pool.c               |  83 ++++-
 30 files changed, 1406 insertions(+), 86 deletions(-)
 create mode 100644 migration/multifd-device-state.c