diff mbox series

migration: RDMA registrations interval optimization

Message ID 20210820155756.3899605-1-elish.jiang@ucloud.cn (mailing list archive)
State New, archived

Commit Message

Zhiwei Jiang Aug. 20, 2021, 3:57 p.m. UTC
RDMA migration is very hard to complete when the VM runs a mysql
benchmark backed by 1G host hugepages. I think the interval between
ram_control_before_iterate(f, RAM_CONTROL_ROUND) and
ram_control_after_iterate() is too long with a 1G host page size, so
a 1M buffer size, which matches the mlx driver, works better.
With this patch, migration completes normally in my setup.

Signed-off-by: Zhiwei Jiang <elish.jiang@ucloud.cn>
---
 migration/migration.c | 13 +++++++++++++
 migration/migration.h |  6 ++++++
 migration/ram.c       |  6 +++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

Comments

Juan Quintela Nov. 2, 2021, 4:47 p.m. UTC | #1
Zhiwei Jiang <elish.jiang@ucloud.cn> wrote:
> RDMA migration is very hard to complete when the VM runs a mysql
> benchmark backed by 1G host hugepages. I think the interval between
> ram_control_before_iterate(f, RAM_CONTROL_ROUND) and
> ram_control_after_iterate() is too long with a 1G host page size, so
> a 1M buffer size, which matches the mlx driver, works better.
> With this patch, migration completes normally in my setup.
>
> Signed-off-by: Zhiwei Jiang <elish.jiang@ucloud.cn>

Hi

Nice catch.

Could you split out the migrate_use_rdma() part (which is obvious)?
I can't believe that we don't have that function already. And the rest.

See my comments there.

> diff --git a/migration/migration.c b/migration/migration.c
> index 041b8451a6..934916b161 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -457,6 +457,8 @@ void migrate_add_address(SocketAddress *address)
>  static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p = NULL;
> +    MigrationState *s = migrate_get_current();
> +    s->enabled_rdma_migration = false;

This shouldn't be needed (famous last words).  If it is needed, we are
masking another problem that we need to fix, like not initializing the
migration state properly.

> +    /*
> +     * Enable RDMA migration
> +     */
> +    bool enabled_rdma_migration;


I think that "rdma_enabled" is enough, what do you think?

I think that the comment is not needed.

> diff --git a/migration/ram.c b/migration/ram.c
> index 7a43bfd7af..dc0c0e2565 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2043,7 +2043,11 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
>      unsigned long hostpage_boundary =
>          QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
> +    /* Use a 256-page (256 * 4K = 1M) RDMA boundary; the driver delivers 1M buffers more efficiently */
> +    unsigned long rdma_boundary =
> +        QEMU_ALIGN_UP(pss->page + 1, 256);
>      unsigned long start_page = pss->page;
> +    bool use_rdma = migrate_use_rdma();
>      int res;

To avoid doing the comparison every time, what about:

       unsigned long boundary = migrate_use_rdma() ? rdma_boundary : hostpage_boundary;

>      if (ramblock_is_ignored(pss->block)) {
> @@ -2069,7 +2073,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>              }
>          }
>          pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> -    } while ((pss->page < hostpage_boundary) &&
> +    } while ((pss->page < (use_rdma ? rdma_boundary : hostpage_boundary)) &&
>               offset_in_ramblock(pss->block,
>                                  ((ram_addr_t)pss->page) <<
> TARGET_PAGE_BITS));

    } while ((pss->page < boundary) &&
              offset_in_ramblock(pss->block,
                                 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));

We remove one comparison in every round.

What do you think?

Later, Juan.

PD. Nice achievement being able to migrate with a relatively high load
    using 1GB pages.  Could you share your setup and how fast it goes?
    Thanks.
Dr. David Alan Gilbert Nov. 3, 2021, 12:11 p.m. UTC | #2
* Zhiwei Jiang (elish.jiang@ucloud.cn) wrote:
> RDMA migration is very hard to complete when the VM runs a mysql
> benchmark backed by 1G host hugepages. I think the interval between
> ram_control_before_iterate(f, RAM_CONTROL_ROUND) and
> ram_control_after_iterate() is too long with a 1G host page size, so
> a 1M buffer size, which matches the mlx driver, works better.
> With this patch, migration completes normally in my setup.

Hi,
  Apologies for the delay in responding.

There are a few things I don't understand about this patch:
  a) Splitting up ram_save_host_page is probably wrong; in other
places we rely on it always saving the whole host page.

  b) I don't understand why splitting ram_save_host_page into a smaller
size causes ram_control..(RAM_CONTROL_ROUND) to be called more often
- perhaps it does if you have bandwidth limits set?

  c) I thought RDMA cards preferred large transfers? Can you provide
a pointer to the information on this 1M buffer size and why larger
transfers slow it down?

Dave

> Signed-off-by: Zhiwei Jiang <elish.jiang@ucloud.cn>
> ---
>  migration/migration.c | 13 +++++++++++++
>  migration/migration.h |  6 ++++++
>  migration/ram.c       |  6 +++++-
>  3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 041b8451a6..934916b161 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -457,6 +457,8 @@ void migrate_add_address(SocketAddress *address)
>  static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p = NULL;
> +    MigrationState *s = migrate_get_current();
> +    s->enabled_rdma_migration = false;
>  
>      qapi_event_send_migration(MIGRATION_STATUS_SETUP);
>      if (strstart(uri, "tcp:", &p) ||
> @@ -465,6 +467,7 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>          socket_start_incoming_migration(p ? p : uri, errp);
>  #ifdef CONFIG_RDMA
>      } else if (strstart(uri, "rdma:", &p)) {
> +        s->enabled_rdma_migration = true;
>          rdma_start_incoming_migration(p, errp);
>  #endif
>      } else if (strstart(uri, "exec:", &p)) {
> @@ -2040,6 +2043,7 @@ void migrate_init(MigrationState *s)
>      s->start_postcopy = false;
>      s->postcopy_after_devices = false;
>      s->migration_thread_running = false;
> +    s->enabled_rdma_migration = false;
>      error_free(s->error);
>      s->error = NULL;
>      s->hostname = NULL;
> @@ -2300,6 +2304,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>          socket_start_outgoing_migration(s, p ? p : uri, &local_err);
>  #ifdef CONFIG_RDMA
>      } else if (strstart(uri, "rdma:", &p)) {
> +        s->enabled_rdma_migration = true;
>          rdma_start_outgoing_migration(s, p, &local_err);
>  #endif
>      } else if (strstart(uri, "exec:", &p)) {
> @@ -2475,6 +2480,14 @@ bool migrate_use_events(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS];
>  }
>  
> +bool migrate_use_rdma(void)
> +{
> +    MigrationState *s;
> +    s = migrate_get_current();
> +
> +    return s->enabled_rdma_migration;
> +}
> +
>  bool migrate_use_multifd(void)
>  {
>      MigrationState *s;
> diff --git a/migration/migration.h b/migration/migration.h
> index 7a5aa8c2fd..860dc93df1 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -296,6 +296,11 @@ struct MigrationState {
>       * This save hostname when out-going migration starts
>       */
>      char *hostname;
> +
> +    /*
> +     * Enable RDMA migration
> +     */
> +    bool enabled_rdma_migration;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> @@ -332,6 +337,7 @@ bool migrate_ignore_shared(void);
>  bool migrate_validate_uuid(void);
>  
>  bool migrate_auto_converge(void);
> +bool migrate_use_rdma(void);
>  bool migrate_use_multifd(void);
>  bool migrate_pause_before_switchover(void);
>  int migrate_multifd_channels(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 7a43bfd7af..dc0c0e2565 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2043,7 +2043,11 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
>      unsigned long hostpage_boundary =
>          QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
> +    /* Use a 256-page (256 * 4K = 1M) RDMA boundary; the driver delivers 1M buffers more efficiently */
> +    unsigned long rdma_boundary =
> +        QEMU_ALIGN_UP(pss->page + 1, 256);
>      unsigned long start_page = pss->page;
> +    bool use_rdma = migrate_use_rdma();
>      int res;
>  
>      if (ramblock_is_ignored(pss->block)) {
> @@ -2069,7 +2073,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>              }
>          }
>          pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> -    } while ((pss->page < hostpage_boundary) &&
> +    } while ((pss->page < (use_rdma ? rdma_boundary : hostpage_boundary)) &&
>               offset_in_ramblock(pss->block,
>                                  ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
>      /* The offset we leave with is the min boundary of host page and block */
> -- 
> 2.25.1
> 
>

Patch

diff --git a/migration/migration.c b/migration/migration.c
index 041b8451a6..934916b161 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -457,6 +457,8 @@  void migrate_add_address(SocketAddress *address)
 static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
+    MigrationState *s = migrate_get_current();
+    s->enabled_rdma_migration = false;
 
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
@@ -465,6 +467,7 @@  static void qemu_start_incoming_migration(const char *uri, Error **errp)
         socket_start_incoming_migration(p ? p : uri, errp);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
+        s->enabled_rdma_migration = true;
         rdma_start_incoming_migration(p, errp);
 #endif
     } else if (strstart(uri, "exec:", &p)) {
@@ -2040,6 +2043,7 @@  void migrate_init(MigrationState *s)
     s->start_postcopy = false;
     s->postcopy_after_devices = false;
     s->migration_thread_running = false;
+    s->enabled_rdma_migration = false;
     error_free(s->error);
     s->error = NULL;
     s->hostname = NULL;
@@ -2300,6 +2304,7 @@  void qmp_migrate(const char *uri, bool has_blk, bool blk,
         socket_start_outgoing_migration(s, p ? p : uri, &local_err);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
+        s->enabled_rdma_migration = true;
         rdma_start_outgoing_migration(s, p, &local_err);
 #endif
     } else if (strstart(uri, "exec:", &p)) {
@@ -2475,6 +2480,14 @@  bool migrate_use_events(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS];
 }
 
+bool migrate_use_rdma(void)
+{
+    MigrationState *s;
+    s = migrate_get_current();
+
+    return s->enabled_rdma_migration;
+}
+
 bool migrate_use_multifd(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 7a5aa8c2fd..860dc93df1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -296,6 +296,11 @@  struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /*
+     * Enable RDMA migration
+     */
+    bool enabled_rdma_migration;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
@@ -332,6 +337,7 @@  bool migrate_ignore_shared(void);
 bool migrate_validate_uuid(void);
 
 bool migrate_auto_converge(void);
+bool migrate_use_rdma(void);
 bool migrate_use_multifd(void);
 bool migrate_pause_before_switchover(void);
 int migrate_multifd_channels(void);
diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7af..dc0c0e2565 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2043,7 +2043,11 @@  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
     unsigned long hostpage_boundary =
         QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
+    /* Use a 256-page (256 * 4K = 1M) RDMA boundary; the driver delivers 1M buffers more efficiently */
+    unsigned long rdma_boundary =
+        QEMU_ALIGN_UP(pss->page + 1, 256);
     unsigned long start_page = pss->page;
+    bool use_rdma = migrate_use_rdma();
     int res;
 
     if (ramblock_is_ignored(pss->block)) {
@@ -2069,7 +2073,7 @@  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
             }
         }
         pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
-    } while ((pss->page < hostpage_boundary) &&
+    } while ((pss->page < (use_rdma ? rdma_boundary : hostpage_boundary)) &&
              offset_in_ramblock(pss->block,
                                 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
     /* The offset we leave with is the min boundary of host page and block */