Message ID: 20201217165712.369061-1-andrey.gruzdev@virtuozzo.com
Series: UFFD write-tracking migration/snapshots
On 17.12.2020 19:57, Andrey Gruzdev wrote:
> This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas he
> implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.
>
> Currently the only way to make an (external) live VM snapshot is to use the
> existing dirty page logging migration mechanism. The main problem is that it
> tends to produce a lot of page duplicates while the running VM goes on
> updating already-saved pages. As a result, the vmstate image size is commonly
> several times bigger than the non-zero part of the virtual machine's RSS.
> The time required for the RAM migration to converge and the size of the
> snapshot image depend heavily on the guest memory write rate, sometimes
> resulting in unacceptably long snapshot creation times and huge image sizes.
>
> This series proposes a way to solve the aforementioned problems, using a
> different RAM migration mechanism based on the UFFD write-protection
> management introduced in the v5.7 kernel. The migration strategy is to
> 'freeze' guest RAM content using write protection and iteratively release
> the protection for memory ranges that have already been saved to the
> migration stream. At the same time we read in pending UFFD write fault
> events and save those pages out of order with higher priority.
>
> How to use:
> 1. Enable the write-tracking migration capability:
>    virsh qemu-monitor-command <domain> --hmp migrate_set_capability
>    track-writes-ram on
>
> 2. Start the external migration to a file:
>    virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'
>
> 3. Wait for the migration to finish and check that it has reached the
>    'completed' state.
>
> Changes v9->v10:
>
> 1. Fixed commit message for [PATCH v9 1/5].
>
> Changes v8->v9:
>
> 1. Fixed wrong cover letter subject.
>
> Changes v7->v8:
>
> 1. Fixed coding style problems to pass checkpatch.
>
> Changes v6->v7:
>
> 1. Fixed background snapshot on suspended guest: call
>    qemu_system_wakeup_request() before stopping the VM to make the
>    runstate transition valid.
> 2. Disabled dirty page logging and log sync when 'background-snapshot'
>    is enabled.
> 3. Introduced the 'userfaultfd-wrlat.py' script to analyze UFFD write
>    fault latencies.
>
> Changes v5->v6:
>
> 1. Consider possible hot plugging/unplugging of memory devices: don't use
>    a static variable for the write-tracking support level in
>    migrate_query_write_tracking(); check it each time one tries to enable
>    the 'background-snapshot' capability.
>
> Changes v4->v5:
>
> 1. Refactored util/userfaultfd.c code to support features required by
>    postcopy.
> 2. Introduced checks for host kernel and guest memory backend
>    compatibility in the 'background-snapshot' branch of
>    migrate_caps_check().
> 3. Switched to using trace_xxx instead of info_report()/error_report()
>    for cases when the error message must be hidden (probing UFFD-IO) or
>    when the info would really litter the output if it went to stderr.
> 4. Added RCU_READ_LOCK_GUARDs to the code dealing with the RAM block list.
> 5. Added memory_region_ref() for each RAM block being write-protected.
> 6. Reused qemu_ram_block_from_host() instead of a custom RAM block lookup
>    routine.
> 7. Stopped using the specific hwaddr/ram_addr_t types in favour of
>    void */uint64_t.
> 8. Dropped the 'linear-scan-rate-limiting' patch for now. The reason is
>    that the chosen criterion for high-latency fault detection (i.e. the
>    timestamp of the UFFD event fetch) is not representative enough for
>    this task. At the moment it looks somewhat like a premature
>    optimization effort.
> 9. Dropped some unnecessary/unused code.
>
> Andrey Gruzdev (5):
>   migration: introduce 'background-snapshot' migration capability
>   migration: introduce UFFD-WP low-level interface helpers
>   migration: support UFFD write fault processing in ram_save_iterate()
>   migration: implementation of background snapshot thread
>   migration: introduce 'userfaultfd-wrlat.py' script
>
>  include/exec/memory.h        |   8 +
>  include/qemu/userfaultfd.h   |  35 ++
>  migration/migration.c        | 365 ++++++++++++++++++++++++++++++++++-
>  migration/migration.h        |   4 +
>  migration/ram.c              | 288 ++++++++++++++++++++++++++-
>  migration/ram.h              |   6 +
>  migration/savevm.c           |   1 -
>  migration/savevm.h           |   2 +
>  migration/trace-events       |   2 +
>  qapi/migration.json          |   7 +-
>  scripts/userfaultfd-wrlat.py | 148 ++++++++++++++
>  util/meson.build             |   1 +
>  util/trace-events            |   9 +
>  util/userfaultfd.c           | 345 +++++++++++++++++++++++++++++++++
>  14 files changed, 1211 insertions(+), 10 deletions(-)
>  create mode 100644 include/qemu/userfaultfd.h
>  create mode 100755 scripts/userfaultfd-wrlat.py
>  create mode 100644 util/userfaultfd.c

Hi Peter,

I have a question about the Wiki page you've created,
https://wiki.qemu.org/ToDo/LiveMigration#Features.
May we also add to that page/have access rights?
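As background for the mechanism the cover letter describes, the kernel-level
flow the series builds on can be sketched as a minimal standalone C program
(Linux v5.7+ only). This is an illustration, not QEMU's util/userfaultfd.c:
save_page(), the region size, and the loop structure are hypothetical, and
all error handling is omitted.

    #include <fcntl.h>
    #include <poll.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Hypothetical sink: in QEMU this would feed the migration stream. */
    static void save_page(void *addr, size_t len)
    {
        (void)addr; (void)len;
    }

    int main(void)
    {
        size_t page = sysconf(_SC_PAGESIZE);
        size_t len  = 16 * page;            /* stands in for a RAM block */
        char *area  = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

        /* Handshake: ask the kernel for write-protect fault reporting. */
        struct uffdio_api api = {
            .api      = UFFD_API,
            .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
        };
        ioctl(uffd, UFFDIO_API, &api);

        /* Register the range in write-protect (not missing-page) mode. */
        struct uffdio_register reg = {
            .range = { .start = (uintptr_t)area, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_WP,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        /* 'Freeze' the range: writes now raise UFFD events instead of
         * completing, until protection is removed per page or range. */
        struct uffdio_writeprotect wp = {
            .range = { .start = (uintptr_t)area, .len = len },
            .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

        /*
         * Event loop (schematic): a faulting page is saved first and only
         * then un-write-protected, so the blocked writer resumes with the
         * page content already captured; this is the 'out-of-order, higher
         * priority' path.  In a real setup the writes come from other
         * threads (the running guest); a write from this same thread
         * would block with nobody left to service the event.
         */
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        while (poll(&pfd, 1, 0) > 0) {
            struct uffd_msg msg;
            if (read(uffd, &msg, sizeof(msg)) != sizeof(msg)) {
                break;
            }
            if (msg.event == UFFD_EVENT_PAGEFAULT &&
                (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
                void *fault = (void *)(uintptr_t)
                    (msg.arg.pagefault.address & ~(uint64_t)(page - 1));
                save_page(fault, page);

                struct uffdio_writeprotect unwp = {
                    .range = { .start = (uintptr_t)fault, .len = page },
                    .mode  = 0,     /* 0 = clear write protection */
                };
                ioctl(uffd, UFFDIO_WRITEPROTECT, &unwp);
            }
        }
        return 0;
    }

Two hedged notes on how this maps back to the series: at v5.7 the kernel
implements UFFD-WP for anonymous memory only, which is presumably what the
host-kernel/memory-backend compatibility checks added to migrate_caps_check()
in v5 probe for; and the capability name in the how-to above
('track-writes-ram') dates from earlier revisions, while the changelog
entries from v6 onward call the capability 'background-snapshot'.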
On 17.12.2020 19:57, Andrey Gruzdev wrote:

Ping

> [full cover letter and diffstat quoted, identical to the posting above]
On Mon, Dec 21, 2020 at 03:44:38PM +0300, Andrey Gruzdev wrote:
> Hi Peter,
>
> I have a question about the Wiki page you've created,
> https://wiki.qemu.org/ToDo/LiveMigration#Features.
> May we also add to that page/have access rights?

Yes. I'll send you another email soon for that.

Thanks,
On 21.12.2020 18:17, Peter Xu wrote:
> On Mon, Dec 21, 2020 at 03:44:38PM +0300, Andrey Gruzdev wrote:
>> Hi Peter,
>>
>> I have a question about the Wiki page you've created,
>> https://wiki.qemu.org/ToDo/LiveMigration#Features.
>> May we also add to that page/have access rights?
>
> Yes. I'll send you another email soon for that.
>
> Thanks,

Thanks, Peter!
On Thu, Dec 17, 2020 at 07:57:07PM +0300, Andrey Gruzdev wrote:
> [cover letter quoted; see the original posting above]

For the rest of the patches:

Acked-by: Peter Xu <peterx@redhat.com>

Dave, considering the live snapshot series has been dangling upstream for
quite some time (starting from Denis's work), do you have a plan to
review/merge it in the near future?

I believe there are still quite a few things missing, but IMHO most of them
should be doable on top too.

Thanks!
On 05.01.2021 22:36, Peter Xu wrote:
> On Thu, Dec 17, 2020 at 07:57:07PM +0300, Andrey Gruzdev wrote:
>> [cover letter quoted; see the original posting above]
>
> For the rest of the patches:
>
> Acked-by: Peter Xu <peterx@redhat.com>
>
> Dave, considering the live snapshot series has been dangling upstream for
> quite some time (starting from Denis's work), do you have a plan to
> review/merge it in the near future?
>
> I believe there are still quite a few things missing, but IMHO most of
> them should be doable on top too.
>
> Thanks!

Thanks, Peter!