diff mbox

[RFC,3/4] A separate thread for the VM migration

Message ID 812725771.1447271.1311134444174.JavaMail.root@zmail01.collab.prod.int.phx2.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Umesh Deshpande July 20, 2011, 4 a.m. UTC
This patch creates a separate thread for the guest migration on the source side. The migration routine is called from the migration clock.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 arch_init.c      |    8 +++++++
 buffered_file.c  |   10 ++++-----
 migration-tcp.c  |   18 ++++++++---------
 migration-unix.c |    7 ++----
 migration.c      |   56 +++++++++++++++++++++++++++++--------------------------
 migration.h      |    4 +--
 6 files changed, 57 insertions(+), 46 deletions(-)

Comments

Marcelo Tosatti July 20, 2011, 7:02 p.m. UTC | #1
On Wed, Jul 20, 2011 at 12:00:44AM -0400, Umesh Deshpande wrote:
> This patch creates a separate thread for the guest migration on the source side. The migration routine is called from the migration clock.
> 
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  arch_init.c      |    8 +++++++
>  buffered_file.c  |   10 ++++-----
>  migration-tcp.c  |   18 ++++++++---------
>  migration-unix.c |    7 ++----
>  migration.c      |   56 +++++++++++++++++++++++++++++--------------------------
>  migration.h      |    4 +--
>  6 files changed, 57 insertions(+), 46 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index f81a729..6d44b72 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -260,6 +260,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>          return 0;
>      }
>  
> +    if (stage != 3) {
> +        qemu_mutex_lock_iothread();
> +    }
> +
>      if (cpu_physical_sync_dirty_bitmap(0, TARGET_PHYS_ADDR_MAX) != 0) {
>          qemu_file_set_error(f);
>          return 0;
> @@ -267,6 +271,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>  
>      sync_migration_bitmap(0, TARGET_PHYS_ADDR_MAX);
>  
> +    if (stage != 3) {
> +        qemu_mutex_unlock_iothread();
> +    }
> +

Many data structures shared by vcpus/iothread and migration thread are
accessed simultaneously without protection. Instead of simply moving
the entire migration routines to a thread, i'd suggest moving only the
time consuming work in ram_save_block (dup_page and put_buffer), after
properly audit for shared access. And send more than one page a time, of
course.

A separate lock for ram_list is probably necessary, so that it can
be accessed from the migration thread.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Umesh Deshpande July 21, 2011, 11:28 p.m. UTC | #2
----- Original Message -----
From: "Marcelo Tosatti" <mtosatti@redhat.com>
To: "Umesh Deshpande" <udeshpan@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org
Sent: Wednesday, July 20, 2011 3:02:46 PM
Subject: Re: [RFC 3/4] A separate thread for the VM migration

On Wed, Jul 20, 2011 at 12:00:44AM -0400, Umesh Deshpande wrote:
> This patch creates a separate thread for the guest migration on the source side. The migration routine is called from the migration clock.
> 
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  arch_init.c      |    8 +++++++
>  buffered_file.c  |   10 ++++-----
>  migration-tcp.c  |   18 ++++++++---------
>  migration-unix.c |    7 ++----
>  migration.c      |   56 +++++++++++++++++++++++++++++--------------------------
>  migration.h      |    4 +--
>  6 files changed, 57 insertions(+), 46 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index f81a729..6d44b72 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -260,6 +260,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>          return 0;
>      }
>  
> +    if (stage != 3) {
> +        qemu_mutex_lock_iothread();
> +    }
> +
>      if (cpu_physical_sync_dirty_bitmap(0, TARGET_PHYS_ADDR_MAX) != 0) {
>          qemu_file_set_error(f);
>          return 0;
> @@ -267,6 +271,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>  
>      sync_migration_bitmap(0, TARGET_PHYS_ADDR_MAX);
>  
> +    if (stage != 3) {
> +        qemu_mutex_unlock_iothread();
> +    }
> +

Many data structures shared by vcpus/iothread and migration thread are
accessed simultaneously without protection. Instead of simply moving
the entire migration routines to a thread, i'd suggest moving only the
time consuming work in ram_save_block (dup_page and put_buffer), after
properly audit for shared access. And send more than one page a time, of
course.

The group of migration routines moved into the thread needs to be executed sequentially, because of the way protocol is designed.
Currently, migration is performed in sections, and we cannot proceed to the next section
until current section has been written to the QEMUFile. A thread for any sub-part would introduce parallelism, breaking the sequential semantics.
(Condition variables will have to be used to ensure sequentiality across new thread and iothread)

Secondly, put_buffer is called from iohandler and timers, currently both are called from iothread.
With a separate thread for dup_page and put_buffer, it will also be called from inside the thread.

Another option with the current implementation could be to hold the qemu_mutex inside the thread for most of the part and releasing it for time consuming part in ram_save_block.

A separate lock for ram_list is probably necessary, so that it can
be accessed from the migration thread.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch_init.c b/arch_init.c
index f81a729..6d44b72 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -260,6 +260,10 @@  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         return 0;
     }
 
+    if (stage != 3) {
+        qemu_mutex_lock_iothread();
+    }
+
     if (cpu_physical_sync_dirty_bitmap(0, TARGET_PHYS_ADDR_MAX) != 0) {
         qemu_file_set_error(f);
         return 0;
@@ -267,6 +271,10 @@  int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 
     sync_migration_bitmap(0, TARGET_PHYS_ADDR_MAX);
 
+    if (stage != 3) {
+        qemu_mutex_unlock_iothread();
+    }
+
     if (stage == 1) {
         RAMBlock *block;
         bytes_transferred = 0;
diff --git a/buffered_file.c b/buffered_file.c
index 41b42c3..e05efe8 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -237,7 +237,7 @@  static void buffered_rate_tick(void *opaque)
         return;
     }
 
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+    qemu_mod_timer(s->timer, qemu_get_clock_ms(migration_clock) + 100);
 
     if (s->freeze_output)
         return;
@@ -246,8 +246,8 @@  static void buffered_rate_tick(void *opaque)
 
     buffered_flush(s);
 
-    /* Add some checks around this */
     s->put_ready(s->opaque);
+    usleep(qemu_timer_difference(s->timer, migration_clock) * 1000);
 }
 
 QEMUFile *qemu_fopen_ops_buffered(void *opaque,
@@ -271,11 +271,11 @@  QEMUFile *qemu_fopen_ops_buffered(void *opaque,
     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
                              buffered_set_rate_limit,
-			     buffered_get_rate_limit);
+                             buffered_get_rate_limit);
 
-    s->timer = qemu_new_timer_ms(rt_clock, buffered_rate_tick, s);
+    s->timer = qemu_new_timer_ms(migration_clock, buffered_rate_tick, s);
 
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+    qemu_mod_timer(s->timer, qemu_get_clock_ms(migration_clock) + 100);
 
     return s->file;
 }
diff --git a/migration-tcp.c b/migration-tcp.c
index d3d80c9..ad1b9d0 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -65,11 +65,9 @@  static void tcp_wait_for_connect(void *opaque)
         return;
     }
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
-    if (val == 0)
+    if (val == 0) {
         migrate_fd_connect(s);
-    else {
+    } else {
         DPRINTF("error connecting %d\n", val);
         migrate_fd_error(s);
     }
@@ -79,8 +77,8 @@  MigrationState *tcp_start_outgoing_migration(Monitor *mon,
                                              const char *host_port,
                                              int64_t bandwidth_limit,
                                              int detach,
-					     int blk,
-					     int inc)
+                                             int blk,
+                                             int inc)
 {
     struct sockaddr_in addr;
     FdMigrationState *s;
@@ -121,15 +119,17 @@  MigrationState *tcp_start_outgoing_migration(Monitor *mon,
         if (ret == -1)
             ret = -(s->get_error(s));
 
-        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK)
-            qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
     } while (ret == -EINTR);
 
     if (ret < 0 && ret != -EINPROGRESS && ret != -EWOULDBLOCK) {
         DPRINTF("connect failed\n");
         migrate_fd_error(s);
-    } else if (ret >= 0)
+    } else if (ret >= 0) {
         migrate_fd_connect(s);
+    } else {
+        migrate_fd_wait_for_unfreeze(s);
+        tcp_wait_for_connect(s);
+    }
 
     return &s->mig_state;
 }
diff --git a/migration-unix.c b/migration-unix.c
index c8625c7..ed57d5a 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -64,8 +64,6 @@  static void unix_wait_for_connect(void *opaque)
         return;
     }
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
     if (val == 0)
         migrate_fd_connect(s);
     else {
@@ -116,13 +114,14 @@  MigrationState *unix_start_outgoing_migration(Monitor *mon,
         if (ret == -1)
 	    ret = -(s->get_error(s));
 
-        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK)
-	    qemu_set_fd_handler2(s->fd, NULL, NULL, unix_wait_for_connect, s);
     } while (ret == -EINTR);
 
     if (ret < 0 && ret != -EINPROGRESS && ret != -EWOULDBLOCK) {
         DPRINTF("connect failed\n");
         goto err_after_open;
+    } else if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+        migrate_fd_wait_for_unfreeze(s);
+        unix_wait_for_connect(s);
     }
 
     if (!detach) {
diff --git a/migration.c b/migration.c
index af3a1f2..34b1aa6 100644
--- a/migration.c
+++ b/migration.c
@@ -12,6 +12,8 @@ 
  */
 
 #include "qemu-common.h"
+#include "qemu-thread.h"
+#include "qemu-timer.h"
 #include "migration.h"
 #include "monitor.h"
 #include "buffered_file.h"
@@ -35,6 +37,7 @@ 
 static int64_t max_throttle = (32 << 20);
 
 static MigrationState *current_migration;
+char host_port[50];
 
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -284,8 +315,6 @@  int migrate_fd_cleanup(FdMigrationState *s)
 {
     int ret = 0;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
     if (s->file) {
         DPRINTF("closing file\n");
         if (qemu_fclose(s->file) != 0) {
@@ -307,14 +336,6 @@  int migrate_fd_cleanup(FdMigrationState *s)
     return ret;
 }
 
-void migrate_fd_put_notify(void *opaque)
-{
-    FdMigrationState *s = opaque;
-
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-    qemu_file_put_notify(s->file);
-}
-
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
     FdMigrationState *s = opaque;
@@ -327,9 +348,7 @@  ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -1)
         ret = -(s->get_error(s));
 
-    if (ret == -EAGAIN) {
-        qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
-    } else if (ret < 0) {
+    if (ret < 0 && ret != -EAGAIN) {
         if (s->mon) {
             monitor_resume(s->mon);
         }
@@ -340,10 +359,29 @@  ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     return ret;
 }
 
-void migrate_fd_connect(FdMigrationState *s)
+void *migrate_run_timers(void *arg)
 {
+    FdMigrationState *s = arg;
     int ret;
+    ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
+            s->mig_state.shared);
+    if (ret < 0) {
+        DPRINTF("failed, %d\n", ret);
+        migrate_fd_error(s);
+        return NULL;
+    }
+
+    migrate_fd_put_ready(s);
+    while (s->state == MIG_STATE_ACTIVE) {
+        qemu_run_timers(migration_clock);
+    }
+
+    return NULL;
+}
 
+void migrate_fd_connect(FdMigrationState *s)
+{
+    struct QemuThread migrate_thread;
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
                                       migrate_fd_put_buffer,
@@ -352,15 +390,7 @@  void migrate_fd_connect(FdMigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
-                                  s->mig_state.shared);
-    if (ret < 0) {
-        DPRINTF("failed, %d\n", ret);
-        migrate_fd_error(s);
-        return;
-    }
-    
-    migrate_fd_put_ready(s);
+    qemu_thread_create(&migrate_thread, migrate_run_timers, s);
 }
 
 void migrate_fd_put_ready(void *opaque)
@@ -376,8 +406,7 @@  void migrate_fd_put_ready(void *opaque)
     if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
         int state;
         int old_vm_running = vm_running;
-
-        DPRINTF("done iterating\n");
+        qemu_mutex_lock_iothread();
         vm_stop(VMSTOP_MIGRATE);
 
         if ((qemu_savevm_state_complete(s->mon, s->file)) < 0) {
@@ -396,6 +425,10 @@  void migrate_fd_put_ready(void *opaque)
         }
         s->state = state;
         notifier_list_notify(&migration_state_notifiers);
+        qemu_mutex_unlock_iothread();
+    } else {
+        migrate_fd_wait_for_unfreeze(s);
+        qemu_file_put_notify(s->file);
     }
 }
 
@@ -458,7 +491,6 @@  int migrate_fd_close(void *opaque)
 {
     FdMigrationState *s = opaque;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }
 
diff --git a/migration.h b/migration.h
index 050c56c..33318d9 100644
--- a/migration.h
+++ b/migration.h
@@ -72,6 +86,8 @@  void do_info_migrate(Monitor *mon, QObject **ret_data);
 
 int exec_start_incoming_migration(const char *host_port);
 
+void *migrate_run_timers(void *);
+
 MigrationState *exec_start_outgoing_migration(Monitor *mon,
                                               const char *host_port,
 					      int64_t bandwidth_limit,
@@ -112,8 +128,6 @@  void migrate_fd_error(FdMigrationState *s);
 
 int migrate_fd_cleanup(FdMigrationState *s);
 
-void migrate_fd_put_notify(void *opaque);
-
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
 void migrate_fd_connect(FdMigrationState *s);