Message ID | 20230117115511.3215273-4-xuchuangxclwt@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration: reduce time of loading non-iterable vmstate | expand |
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. > > Here are the test1 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 150 ms 740+ ms > after about 30 ms 630+ ms > > In test2, we keep the number of the device the same as test1, reduce the > number of queues per device: > > Here are the test2 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 1-queue vhost-net device > - 16 1-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 90 ms about 250 ms > > after about 25 ms about 160 ms > > In test3, we keep the number of queues per device the same as test1, reduce > the number of devices: > > Here are the test3 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 1 16-queue vhost-net device > - 1 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 20 ms about 70 ms > after about 11 ms about 60 ms > > As we can see from the test results above, both the number of queues and > the number of devices have a great impact on the time of loading non-iterable > vmstate. The growth of the number of devices and queues will lead to more > mr commits, and the time consumption caused by the flatview reconstruction > will also increase. > > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> Reviewed-by: Juan Quintela <quintela@redhat.com> And BTW, nice load times improvements.
diff --git a/migration/savevm.c b/migration/savevm.c index a0cdb714f7..8ca6d396f4 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2617,6 +2617,16 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) uint8_t section_type; int ret = 0; + /* + * Call memory_region_transaction_begin() before loading vmstate. + * This call is paired with memory_region_transaction_commit() at + * the end of qemu_loadvm_state_main(), in order to pack all the + * changes to memory region during the period of loading + * non-iterable vmstate in a single memory transaction. + * This operation will reduce time of loading non-iterable vmstate + */ + memory_region_transaction_begin(); + retry: while (true) { section_type = qemu_get_byte(f); @@ -2684,6 +2694,14 @@ out: goto retry; } } + + /* + * Call memory_region_transaction_commit() after loading vmstate. + * At this point, qemu actually completes all the previous memory + * region transactions. + */ + memory_region_transaction_commit(); + return ret; }
The duration of loading non-iterable vmstate accounts for a significant portion of downtime (starting with the timestamp of source qemu stop and ending with the timestamp of target qemu start). Most of the time is spent committing memory region changes repeatedly. This patch packs all the changes to memory region during the period of loading non-iterable vmstate in a single memory transaction. With the increase of devices, this patch will greatly improve the performance. Here are the test1 results: test info: - Host - Intel(R) Xeon(R) Platinum 8260 CPU - NVIDIA Mellanox ConnectX-5 - VM - 32 CPUs 128GB RAM VM - 8 16-queue vhost-net device - 16 4-queue vhost-user-blk device. time of loading non-iterable vmstate downtime before about 150 ms 740+ ms after about 30 ms 630+ ms In test2, we keep the number of the device the same as test1, reduce the number of queues per device: Here are the test2 results: test info: - Host - Intel(R) Xeon(R) Platinum 8260 CPU - NVIDIA Mellanox ConnectX-5 - VM - 32 CPUs 128GB RAM VM - 8 1-queue vhost-net device - 16 1-queue vhost-user-blk device. time of loading non-iterable vmstate downtime before about 90 ms about 250 ms after about 25 ms about 160 ms In test3, we keep the number of queues per device the same as test1, reduce the number of devices: Here are the test3 results: test info: - Host - Intel(R) Xeon(R) Platinum 8260 CPU - NVIDIA Mellanox ConnectX-5 - VM - 32 CPUs 128GB RAM VM - 1 16-queue vhost-net device - 1 4-queue vhost-user-blk device. time of loading non-iterable vmstate downtime before about 20 ms about 70 ms after about 11 ms about 60 ms As we can see from the test results above, both the number of queues and the number of devices have a great impact on the time of loading non-iterable vmstate. The growth of the number of devices and queues will lead to more mr commits, and the time consumption caused by the flatview reconstruction will also increase. Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> --- migration/savevm.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)