[v14,0/5] UFFD write-tracking migration/snapshots

Message ID 20210129101407.103458-1-andrey.gruzdev@virtuozzo.com (mailing list archive)

Message

Andrey Gruzdev Jan. 29, 2021, 10:14 a.m. UTC
This patch series is a rethinking of the ideas Denis Plotnikov implemented
in his series '[PATCH v0 0/4] migration: add background snapshot'.

Currently the only way to make an (external) live VM snapshot is to use the
existing dirty page logging migration mechanism. The main problem is that it
tends to produce many duplicate pages, because the running VM keeps updating
pages that have already been saved. As a result, the vmstate image is commonly
several times larger than the non-zero part of the virtual machine's RSS. The
time required to converge RAM migration and the size of the snapshot image
depend heavily on the guest memory write rate, sometimes resulting in an
unacceptably long snapshot creation time and a huge image size.

This series proposes a way to solve these problems by using a different RAM
migration mechanism based on UFFD write protection management, introduced in
the v5.7 kernel. The migration strategy is to 'freeze' guest RAM content using
write protection and iteratively release protection for memory ranges that
have already been saved to the migration stream. At the same time we read
pending UFFD write fault events and save the faulting pages out of order with
higher priority.
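
The strategy above can be illustrated with a small toy model. This is only a
sketch under stated assumptions, not the code from this series: guest RAM is a
Python list of pages, write protection is a set of page indexes, and the
function name write_tracking_snapshot() and the (page, value) write events are
hypothetical. It shows why each page reaches the stream exactly once, with the
content it had when the snapshot started.

```python
from collections import deque


def write_tracking_snapshot(memory, guest_writes):
    """Toy model of a write-protection based snapshot (not QEMU code).

    memory       -- list of page contents; mutated as the guest writes
    guest_writes -- iterable of (page_index, new_value) write attempts,
                    consumed one per loop iteration
    Returns the migration stream as a list of (page_index, content).
    """
    protected = set(range(len(memory)))  # 'freeze': all pages write-protected
    pending = deque(guest_writes)
    stream = []
    scan = 0  # position of the linear background scan

    def save(page):
        # Save the still-original content, then release protection.
        stream.append((page, memory[page]))
        protected.discard(page)

    while protected:
        # 1. Write faults take priority over the linear scan.
        if pending:
            page, value = pending.popleft()
            if page in protected:
                save(page)        # saved out of order, before the write lands
            memory[page] = value  # the guest write proceeds
        # 2. One step of the linear scan over pages not yet saved.
        while scan < len(memory) and scan not in protected:
            scan += 1
        if scan < len(memory):
            save(scan)
    return stream


ram = ["a", "b", "c", "d"]
stream = write_tracking_snapshot(ram, [(2, "C"), (0, "A"), (2, "X")])
print(stream)  # [(2, 'c'), (0, 'a'), (1, 'b'), (3, 'd')]
```

In this model the stream length always equals the number of pages, however
often the guest rewrites a page, which is the property that keeps the image
size bounded compared with re-sending dirtied pages.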

How to use:
1. Enable the write-tracking migration capability
   virsh qemu-monitor-command <domain> --hmp migrate_set_capability background-snapshot on

2. Start the external migration to a file
   virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'

3. Wait for the migration to finish and check that it has reached the
   completed state.
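
One possible way to script the last step is sketched below. It is not part of
this series. It assumes that the HMP command 'info migrate' prints a line of
the form 'Migration status: <state>' and that virsh is available on the host;
the helper names parse_migration_status(), query_info_migrate() and
wait_for_snapshot() are hypothetical.

```python
import subprocess
import time

# States after which the migration will not make further progress.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def parse_migration_status(info_migrate_text):
    """Return the state from a 'Migration status: <state>' line, or None."""
    for line in info_migrate_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "migration status":
            words = value.split()
            # Keep only the first word so trailing details are ignored.
            return words[0] if words else None
    return None


def query_info_migrate(domain):
    """Run 'info migrate' through virsh and return its text output."""
    result = subprocess.run(
        ["virsh", "qemu-monitor-command", domain, "--hmp", "info migrate"],
        check=True, capture_output=True, text=True)
    return result.stdout


def wait_for_snapshot(domain, query=query_info_migrate,
                      poll_seconds=1.0, max_polls=3600):
    """Poll until the migration reaches a terminal state and return it."""
    for _ in range(max_polls):
        status = parse_migration_status(query(domain))
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("migration did not reach a terminal state")
```

A caller would treat any result other than 'completed' as a failed snapshot.
The query argument exists so that the polling logic can be exercised without a
running guest.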


Changes v13->v14:

* 1. Removed unneeded '#ifdef CONFIG_LINUX' from [PATCH 1/5], where the
*    #ifdef'ed code was originally introduced. In v13 the removal of these
*    #ifdefs appeared as a diff in [PATCH 4/5] on top of the previous patches.

Changes v12->v13:

* 1. Fixed a coding style problem reported by checkpatch.

Changes v11->v12:

* 1. Consolidated UFFD-related code under a single #if defined(__linux__).
* 2. Abandoned use of pre/post hooks in ram_find_and_save_block() in favour
*    of a more compact code fragment in ram_save_host_page().
* 3. Refactored/simplified the eBPF code in the userfaultfd-wrlat.py script.

Changes v10->v11:

* 1. Updated commit messages.

Changes v9->v10:

* 1. Fixed commit message for [PATCH v9 1/5].

Changes v8->v9:

* 1. Fixed the wrong cover letter subject.

Changes v7->v8:

* 1. Fixed coding style problems to pass checkpatch.

Changes v6->v7:

* 1. Fixed background snapshot on a suspended guest: call qemu_system_wakeup_request()
*    before stopping the VM to make the runstate transition valid.
* 2. Disabled dirty page logging and log sync when 'background-snapshot' is enabled.
* 3. Introduced the 'userfaultfd-wrlat.py' script to analyze UFFD write fault latencies.

Changes v5->v6:

* 1. Consider possible hot plugging/unplugging of memory devices: don't use a
*    static variable for the write-tracking support level in
*    migrate_query_write_tracking(); check each time one tries to enable the
*    'background-snapshot' capability.

Changes v4->v5:

* 1. Refactored util/userfaultfd.c code to support features required by postcopy.
* 2. Introduced checks for host kernel and guest memory backend compatibility
*    to the 'background-snapshot' branch in migrate_caps_check().
* 3. Switched to using trace_xxx instead of info_report()/error_report() for
*    cases where the error message must be hidden (probing UFFD-IO) or where
*    the info would clutter the output if sent to stderr.
* 4. Added RCU_READ_LOCK_GUARDs to the code dealing with the RAM block list.
* 5. Added memory_region_ref() for each RAM block being wr-protected.
* 6. Reused qemu_ram_block_from_host() instead of a custom RAM block lookup routine.
* 7. Dropped the specific hwaddr/ram_addr_t types in favour of void */uint64_t.
* 8. Dropped the 'linear-scan-rate-limiting' patch for now. The reason is that
*    the chosen criterion for high-latency fault detection (i.e. the timestamp
*    of UFFD event fetch) is not representative enough for this task.
*    At the moment it looks like premature optimization.
* 9. Dropped some unnecessary/unused code.

Andrey Gruzdev (5):
  migration: introduce 'background-snapshot' migration capability
  migration: introduce UFFD-WP low-level interface helpers
  migration: support UFFD write fault processing in ram_save_iterate()
  migration: implementation of background snapshot thread
  migration: introduce 'userfaultfd-wrlat.py' script

 include/exec/memory.h        |   8 +
 include/qemu/userfaultfd.h   |  35 ++++
 migration/migration.c        | 357 ++++++++++++++++++++++++++++++++++-
 migration/migration.h        |   4 +
 migration/ram.c              | 303 ++++++++++++++++++++++++++++-
 migration/ram.h              |   6 +
 migration/savevm.c           |   1 -
 migration/savevm.h           |   2 +
 migration/trace-events       |   2 +
 qapi/migration.json          |   7 +-
 scripts/userfaultfd-wrlat.py | 122 ++++++++++++
 util/meson.build             |   1 +
 util/trace-events            |   9 +
 util/userfaultfd.c           | 345 +++++++++++++++++++++++++++++++++
 14 files changed, 1190 insertions(+), 12 deletions(-)
 create mode 100644 include/qemu/userfaultfd.h
 create mode 100755 scripts/userfaultfd-wrlat.py
 create mode 100644 util/userfaultfd.c

Comments

Dr. David Alan Gilbert Feb. 1, 2021, 12:05 p.m. UTC | #1
* Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> Changes v13->v14:
> 
> * 1. Removed unneeded '#ifdef CONFIG_LINUX' from [PATCH 1/5] where #ifdef'ed
> *    code was originally introduced. In v13 removed #ifdef's appeared to be
> *    a diff in [PATCH 4/5] on top of previous patches.

Thanks!

Dave

Dr. David Alan Gilbert Feb. 4, 2021, 3:01 p.m. UTC | #2
* Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:

Queued

Dr. David Alan Gilbert Feb. 4, 2021, 4:53 p.m. UTC | #3
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> 
> Queued
> 
Andrey:
  I've fixed up some 32bit build casts in the pull.
Please check them.

Dave

Andrey Gruzdev Feb. 4, 2021, 5:30 p.m. UTC | #4
On 04.02.2021 18:01, Dr. David Alan Gilbert wrote:
> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
> Queued

Thanks!

Andrey Gruzdev Feb. 4, 2021, 5:32 p.m. UTC | #5
On 04.02.2021 19:53, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
>> Queued
>>
> Andrey:
>    I've fixed up some 32bit build casts in the pull.
> Please check them.
>
> Dave

Ok, sure.

Andrey

>>> How to use:
>>> 1. Enable write-tracking migration capability
>>>     virsh qemu-monitor-command <domain> --hmp migrate_set_capability
>>>     background-snapshot on
>>>
>>> 2. Start the external migration to a file
>>>     virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'
>>>
>>> 3. Wait for the migration finish and check that the migration has completed.
>>> state.
>>>
>>>
>>> Changes v13->v14:
>>>
>>> * 1. Removed unneeded '#ifdef CONFIG_LINUX' from [PATCH 1/5] where #ifdef'ed
>>> *    code was originally introduced. In v13 removed #ifdef's appeared to be
>>> *    a diff in [PATCH 4/5] on top of previous patches.
>>>
>>> Changes v12->v13:
>>>
>>> * 1. Fixed codestyle problem for checkpatch.
>>>
>>> Changes v11->v12:
>>>
>>> * 1. Consolidated UFFD-related code under single #if defined(__linux__).
>>> * 2. Abandoned use of pre/post hooks in ram_find_and_save_block() in favour
>>> *    of more compact code fragment in ram_save_host_page().
>>> * 3. Refactored/simplified eBPF code in userfaultfd-wrlat.py script.
>>>
>>> Changes v10->v11:
>>>
>>> * 1. Updated commit messages.
>>>
>>> Changes v9->v10:
>>>
>>> * 1. Fixed commit message for [PATCH v9 1/5].
>>>
>>> Changes v8->v9:
>>>
>>> * 1. Fixed wrong cover letter subject.
>>>
>>> Changes v7->v8:
>>>
>>> * 1. Fixed coding style problems to pass checkpatch.
>>>
>>> Changes v6->v7:
>>>
>>> * 1. Fixed background snapshot on suspended guest: call qemu_system_wakeup_request()
>>> *    before stopping VM to make runstate transition valid.
>>> * 2. Disabled dirty page logging and log syn when 'background-snapshot' is enabled.
>>> * 3. Introduced 'userfaultfd-wrlat.py' script to analyze UFFD write fault latencies.
>>>
>>> Changes v5->v6:
>>>
>>> * 1. Consider possible hot-plugging/unplugging of memory devices - don't use static
>>> *    for write-tracking support level in migrate_query_write_tracking(), check
>>> *    each time when one tries to enable 'background-snapshot' capability.
>>>
>>> Changes v4->v5:
>>>
>>> * 1. Refactored util/userfaultfd.c code to support features required by postcopy.
>>> * 2. Introduced checks for host kernel and guest memory backend compatibility
>>> *    to 'background-snapshot' branch in migrate_caps_check().
>>> * 3. Switched to using trace_xxx instead of info_report()/error_report() for
>>> *    cases when error message must be hidden (probing UFFD-IO) or info may be
>>> *    really littering output if it goes to stderr.
>>> * 4. Added RCU_READ_LOCK_GUARDs to the code dealing with RAM block list.
>>> * 5. Added memory_region_ref() for each RAM block being wr-protected.
>>> * 6. Reused qemu_ram_block_from_host() instead of custom RAM block lookup routine.
>>> * 7. Stopped using specific hwaddr/ram_addr_t in favour of void */uint64_t.
>>> * 8. Currently dropped 'linear-scan-rate-limiting' patch. The reason is that
>>> *    the chosen criterion for high-latency fault detection (i.e. timestamp of
>>> *    UFFD event fetch) is not representative enough for this task.
>>> *    At the moment it looks somewhat like a premature optimization effort.
>>> * 9. Dropped some unnecessary/unused code.
>>>
>>> Andrey Gruzdev (5):
>>>    migration: introduce 'background-snapshot' migration capability
>>>    migration: introduce UFFD-WP low-level interface helpers
>>>    migration: support UFFD write fault processing in ram_save_iterate()
>>>    migration: implementation of background snapshot thread
>>>    migration: introduce 'userfaultfd-wrlat.py' script
>>>
>>>   include/exec/memory.h        |   8 +
>>>   include/qemu/userfaultfd.h   |  35 ++++
>>>   migration/migration.c        | 357 ++++++++++++++++++++++++++++++++++-
>>>   migration/migration.h        |   4 +
>>>   migration/ram.c              | 303 ++++++++++++++++++++++++++++-
>>>   migration/ram.h              |   6 +
>>>   migration/savevm.c           |   1 -
>>>   migration/savevm.h           |   2 +
>>>   migration/trace-events       |   2 +
>>>   qapi/migration.json          |   7 +-
>>>   scripts/userfaultfd-wrlat.py | 122 ++++++++++++
>>>   util/meson.build             |   1 +
>>>   util/trace-events            |   9 +
>>>   util/userfaultfd.c           | 345 +++++++++++++++++++++++++++++++++
>>>   14 files changed, 1190 insertions(+), 12 deletions(-)
>>>   create mode 100644 include/qemu/userfaultfd.h
>>>   create mode 100755 scripts/userfaultfd-wrlat.py
>>>   create mode 100644 util/userfaultfd.c
>>>
>>> -- 
>>> 2.25.1
>>>
>>>
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>>
Andrey Gruzdev Feb. 8, 2021, 11:55 a.m. UTC | #6
On 04.02.2021 19:53, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Andrey Gruzdev (andrey.gruzdev@virtuozzo.com) wrote:
>>> This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas he's
>>> implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.
>>>
>>> Currently the only way to make an (external) live VM snapshot is to use the
>>> existing dirty page logging migration mechanism. The main problem is that it
>>> tends to produce a lot of page duplicates while the running VM goes on updating
>>> already saved pages. As a result, the vmstate image size is commonly several
>>> times bigger than the non-zero part of the virtual machine's RSS. The time
>>> required to converge RAM migration and the size of the snapshot image severely
>>> depend on the guest memory write rate, sometimes resulting in unacceptably long
>>> snapshot creation time and huge image size.
>>>
>>> This series proposes a way to solve the aforementioned problems. This is done
>>> by using a different RAM migration mechanism based on UFFD write protection
>>> management introduced in the v5.7 kernel. The migration strategy is to 'freeze'
>>> guest RAM content using write-protection and iteratively release protection
>>> for memory ranges that have already been saved to the migration stream.
>>> At the same time we read in pending UFFD write fault events and save those
>>> pages out-of-order with higher priority.
>> Queued
>>
> Andrey:
>    I've fixed up some 32bit build casts in the pull.
> Please check them.
>
> Dave

Dave, thanks for fixes, ok with them.

Andrey
