Message ID | 20210129101407.103458-1-andrey.gruzdev@virtuozzo.com (mailing list archive) |
---|---|
Series | UFFD write-tracking migration/snapshots |
* Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas he's
> implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.
>
> Currently the only way to make an (external) live VM snapshot is to use the
> existing dirty page logging migration mechanism. The main problem is that it
> tends to produce a lot of page duplicates while the running VM goes on updating
> already saved pages. As a result, the vmstate image is commonly several times
> bigger than the non-zero part of the virtual machine's RSS. The time required
> to converge RAM migration and the size of the snapshot image depend heavily on
> the guest memory write rate, sometimes resulting in unacceptably long snapshot
> creation time and huge image size.
>
> This series proposes a way to solve the aforementioned problems. This is done
> by using a different RAM migration mechanism based on UFFD write protection
> management introduced in the v5.7 kernel. The migration strategy is to 'freeze'
> guest RAM content using write-protection and iteratively release protection
> for memory ranges that have already been saved to the migration stream.
> At the same time we read in pending UFFD write fault events and save those
> pages out-of-order with higher priority.
>
> How to use:
> 1. Enable write-tracking migration capability
>    virsh qemu-monitor-command <domain> --hmp migrate_set_capability
>    background-snapshot on
>
> 2. Start the external migration to a file
>    virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'
>
> 3. Wait for the migration to finish and check that the migration is in the
>    completed state.
>
>
> Changes v13->v14:
>
> * 1. Removed unneeded '#ifdef CONFIG_LINUX' from [PATCH 1/5] where #ifdef'ed
> *    code was originally introduced. In v13 the removed #ifdef's appeared as
> *    a diff in [PATCH 4/5] on top of previous patches.

Thanks!

Dave

> Changes v12->v13:
>
> * 1. Fixed codestyle problem for checkpatch.
>
> Changes v11->v12:
>
> * 1. Consolidated UFFD-related code under single #if defined(__linux__).
> * 2. Abandoned use of pre/post hooks in ram_find_and_save_block() in favour
> *    of more compact code fragment in ram_save_host_page().
> * 3. Refactored/simplified eBPF code in userfaultfd-wrlat.py script.
>
> Changes v10->v11:
>
> * 1. Updated commit messages.
>
> Changes v9->v10:
>
> * 1. Fixed commit message for [PATCH v9 1/5].
>
> Changes v8->v9:
>
> * 1. Fixed wrong cover letter subject.
>
> Changes v7->v8:
>
> * 1. Fixed coding style problems to pass checkpatch.
>
> Changes v6->v7:
>
> * 1. Fixed background snapshot on suspended guest: call qemu_system_wakeup_request()
> *    before stopping VM to make runstate transition valid.
> * 2. Disabled dirty page logging and log sync when 'background-snapshot' is enabled.
> * 3. Introduced 'userfaultfd-wrlat.py' script to analyze UFFD write fault latencies.
>
> Changes v5->v6:
>
> * 1. Consider possible hot plugging/unplugging of memory device - don't use static
> *    for write-tracking support level in migrate_query_write_tracking(), check
> *    each time when one tries to enable 'background-snapshot' capability.
>
> Changes v4->v5:
>
> * 1. Refactored util/userfaultfd.c code to support features required by postcopy.
> * 2. Introduced checks for host kernel and guest memory backend compatibility
> *    to 'background-snapshot' branch in migrate_caps_check().
> * 3. Switched to using trace_xxx instead of info_report()/error_report() for
> *    cases when error message must be hidden (probing UFFD-IO) or info would
> *    clutter the output if it goes to stderr.
> * 4. Added RCU_READ_LOCK_GUARDs to the code dealing with RAM block list.
> * 5. Added memory_region_ref() for each RAM block being wr-protected.
> * 6. Reused qemu_ram_block_from_host() instead of custom RAM block lookup routine.
> * 7. Stopped using specific hwaddr/ram_addr_t in favour of void */uint64_t.
> * 8. Currently dropped 'linear-scan-rate-limiting' patch. The reason is that
> *    the chosen criterion for high-latency fault detection (i.e. timestamp of
> *    UFFD event fetch) is not representative enough for this task.
> *    At the moment it looks somewhat like a premature optimization effort.
> * 9. Dropped some unnecessary/unused code.
>
> Andrey Gruzdev (5):
>   migration: introduce 'background-snapshot' migration capability
>   migration: introduce UFFD-WP low-level interface helpers
>   migration: support UFFD write fault processing in ram_save_iterate()
>   migration: implementation of background snapshot thread
>   migration: introduce 'userfaultfd-wrlat.py' script
>
>  include/exec/memory.h        |   8 +
>  include/qemu/userfaultfd.h   |  35 ++++
>  migration/migration.c        | 357 ++++++++++++++++++++++++++++++++++-
>  migration/migration.h        |   4 +
>  migration/ram.c              | 303 ++++++++++++++++++++++++++++-
>  migration/ram.h              |   6 +
>  migration/savevm.c           |   1 -
>  migration/savevm.h           |   2 +
>  migration/trace-events       |   2 +
>  qapi/migration.json          |   7 +-
>  scripts/userfaultfd-wrlat.py | 122 ++++++++++++
>  util/meson.build             |   1 +
>  util/trace-events            |   9 +
>  util/userfaultfd.c           | 345 +++++++++++++++++++++++++++++++++
>  14 files changed, 1190 insertions(+), 12 deletions(-)
>  create mode 100644 include/qemu/userfaultfd.h
>  create mode 100755 scripts/userfaultfd-wrlat.py
>  create mode 100644 util/userfaultfd.c
>
> --
> 2.25.1
>
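For readers following the thread, the 'freeze and iteratively release' strategy
described in the cover letter can be modelled in a few lines. This is only a toy
sketch in plain Python, not the QEMU implementation and not the userfaultfd API:
the function name background_snapshot, the list-of-pages RAM model and the
one-write-per-step scheduling are all assumptions made for illustration.

```python
from collections import deque


def background_snapshot(ram, guest_writes):
    """Toy model of a write-protect based snapshot (illustrative only).

    ram          -- list of page contents at the moment the snapshot starts
    guest_writes -- iterable of (page_index, new_value) pairs the guest
                    attempts, one per step, while the snapshot is running
    Returns (snapshot, save_order, ram_after).
    """
    ram = list(ram)                    # work on a copy of guest memory
    protected = set(range(len(ram)))   # 'freeze': every page is write-protected
    snapshot = {}                      # page index -> content saved to the stream
    save_order = []                    # order in which pages were saved
    linear = iter(range(len(ram)))     # background linear scan over RAM
    writes = deque(guest_writes)

    def save(page):
        # Save the frozen content, then release protection for that page.
        snapshot[page] = ram[page]
        save_order.append(page)
        protected.discard(page)

    while protected:
        # Pending write faults are served first (higher priority).
        if writes:
            page, value = writes.popleft()
            if page in protected:
                save(page)             # out-of-order save of the old content
            ram[page] = value          # the guest write can now proceed
        # Then the linear scan saves the next still-protected page.
        page = next((p for p in linear if p in protected), None)
        if page is not None:
            save(page)

    # Writes arriving after the snapshot is complete are unaffected.
    for page, value in writes:
        ram[page] = value
    return snapshot, save_order, ram


if __name__ == "__main__":
    snap, order, after = background_snapshot(
        [10, 20, 30, 40], [(3, 99), (0, 11), (3, 77)])
    print(snap, order, after)
```

In this model each page is written to the snapshot exactly once, with the
content it had when protection was applied, no matter how often the guest
rewrites it afterwards. That is the property the cover letter contrasts with
dirty page logging, where a page that keeps changing is sent again and again.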
* Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> [...]
> At the same time we read in pending UFFD write fault events and save those
> pages out-of-order with higher priority.

Queued

> [...]
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> > [...]
> > At the same time we read in pending UFFD write fault events and save those
> > pages out-of-order with higher priority.
>
> Queued

Andrey:
  I've fixed up some 32bit build casts in the pull.
Please check them.

Dave

> > [...]
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04.02.2021 18:01, Dr. David Alan Gilbert wrote:
> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
>> [...]
>> At the same time we read in pending UFFD write fault events and save those
>> pages out-of-order with higher priority.
> Queued

Thanks!

>> [...]
On 04.02.2021 19:53, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
>>> [...]
>> Queued
>>
> Andrey:
>   I've fixed up some 32bit build casts in the pull.
> Please check them.
>
> Dave

Ok, sure.

Andrey

>>> [...]
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04.02.2021 19:53, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
>>> [...]
>> Queued
>>
> Andrey:
>   I've fixed up some 32bit build casts in the pull.
> Please check them.
>
> Dave

Dave, thanks for the fixes, I'm ok with them.

Andrey

>>> [...]
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK