| Message ID | 531750c8d7b6c09f877b5f335a60fab402c168be.1726390098.git.yong.huang@smartx.com |
|---|---|
| State | New, archived |
| Series | migration: auto-converge refinements for huge VM |
Hyman Huang <yong.huang@smartx.com> writes:

> shadow_bmap, iter_bmap and iter_dirty_pages are introduced
> to satisfy the need for background sync.
>
> Meanwhile, introduce enumeration of sync method.
>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  include/exec/ramblock.h | 45 +++++++++++++++++++++++++++++++++++++++++
>  migration/ram.c         |  6 ++++++
>  2 files changed, 51 insertions(+)
>
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 0babd105c0..0e327bc0ae 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -24,6 +24,30 @@
>  #include "qemu/rcu.h"
>  #include "exec/ramlist.h"
>
> +/* Possible bits for cpu_physical_memory_sync_dirty_bitmap */
> +
> +/*
> + * The old-fashioned sync, which is, in turn, used for CPU
> + * throttle and memory transfer.

I'm not sure I follow what "in turn" is supposed to mean in this
sentence. Could you clarify?

> + */
> +#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)

So ITER is as opposed to background? I'm a bit confused with the terms.

> +
> +/*
> + * The modern sync, which is, in turn, used for CPU throttle
> + * and memory transfer.
> + */
> +#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
> +
> +/* The modern sync, which is used for CPU throttle only */
> +#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)

What's the plan for the "legacy" part? To be removed soon? Do we want to
remove it now? Maybe better to not use the modern/legacy terms unless we
want to give the impression that the legacy one is discontinued.

> +
> +#define RAMBLOCK_SYN_MASK (0x7)
> +
> +typedef enum RAMBlockSynMode {
> +    RAMBLOCK_SYN_LEGACY, /* Old-fashined mode */
> +    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
> +} RAMBlockSynMode;

I'm also wondering whether we need this enum + the flags or one of them
would suffice.
I'm looking at code like this in the following patches, for instance:

+    if (sync_mode == RAMBLOCK_SYN_MODERN) {
+        if (background) {
+            flag = RAMBLOCK_SYN_MODERN_BACKGROUND;
+        } else {
+            flag = RAMBLOCK_SYN_MODERN_ITER;
+        }
+    }

Couldn't we use LEGACY/BG/ITER?

> +
>  struct RAMBlock {
>      struct rcu_head rcu;
>      struct MemoryRegion *mr;
> @@ -89,6 +113,27 @@ struct RAMBlock {
>      * could not have been valid on the source.
>      */
>      ram_addr_t postcopy_length;
> +
> +    /*
> +     * Used to backup the bmap during background sync to see whether any dirty
> +     * pages were sent during that time.
> +     */
> +    unsigned long *shadow_bmap;
> +
> +    /*
> +     * The bitmap "bmap," which was initially used for both sync and memory
> +     * transfer, will be replaced by two bitmaps: the previously used "bmap"
> +     * and the recently added "iter_bmap." Only the memory transfer is
> +     * conducted with the previously used "bmap"; the recently added
> +     * "iter_bmap" is utilized for dirty bitmap sync.
> +     */
> +    unsigned long *iter_bmap;
> +
> +    /* Number of new dirty pages during iteration */
> +    uint64_t iter_dirty_pages;
> +
> +    /* If background sync has shown up during iteration */
> +    bool background_sync_shown_up;
>  };
>  #endif
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 67ca3d5d51..f29faa82d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
>          block->bmap = NULL;
>          g_free(block->file_bmap);
>          block->file_bmap = NULL;
> +        g_free(block->shadow_bmap);
> +        block->shadow_bmap = NULL;
> +        g_free(block->iter_bmap);
> +        block->iter_bmap = NULL;
>      }
>  }
>
> @@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
>          }
>          block->clear_bmap_shift = shift;
>          block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
> +        block->shadow_bmap = bitmap_new(pages);
> +        block->iter_bmap = bitmap_new(pages);
>      }
>  }
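(For illustration, the simplification suggested here could look roughly like the following toy sketch; the flat flag names and the helper are invented for this example and are not from the series.)

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flat flag set replacing the mode enum plus per-mode
 * bits: one bit per sync purpose, so callers pick a flag directly.
 */
enum {
    RAMBLOCK_SYN_LEGACY = 1u << 0, /* old path: bmap serves both users */
    RAMBLOCK_SYN_ITER   = 1u << 1, /* iteration sync: throttle + send  */
    RAMBLOCK_SYN_BG     = 1u << 2, /* background sync: throttle only   */
};

/* The nested if/else from the series collapses to one expression. */
static unsigned pick_sync_flag(bool modern, bool background)
{
    if (!modern) {
        return RAMBLOCK_SYN_LEGACY;
    }
    return background ? RAMBLOCK_SYN_BG : RAMBLOCK_SYN_ITER;
}
```

With such a scheme the `RAMBlockSynMode` enum would no longer be needed, since the flag itself already encodes which path is in use.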
On Tue, Sep 17, 2024 at 5:11 AM Fabiano Rosas <farosas@suse.de> wrote:

> Hyman Huang <yong.huang@smartx.com> writes:
>
> > shadow_bmap, iter_bmap and iter_dirty_pages are introduced
> > to satisfy the need for background sync.
> >
> > Meanwhile, introduce enumeration of sync method.
[...]
> > +/*
> > + * The old-fashioned sync, which is, in turn, used for CPU
> > + * throttle and memory transfer.

Using the traditional sync method, the page sending logic iterates
the "bmap" to transfer dirty pages while the CPU throttle logic counts
the amount of new dirty pages and detects convergence. There are two
uses for "bmap".

Using the modern sync method, "bmap" is used to transfer dirty pages
and "iter_bmap" is used to track new dirty pages.

> I'm not sure I follow what "in turn" is supposed to mean in this
> sentence. Could you clarify?

Here I want to express "in sequence". But failed obviously. :(

> > + */
> > +#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)
>
> So ITER is as opposed to background? I'm a bit confused with the terms.

Yes.

> > +
> > +/*
> > + * The modern sync, which is, in turn, used for CPU throttle
> > + * and memory transfer.
> > + */
> > +#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
> > +
> > +/* The modern sync, which is used for CPU throttle only */
> > +#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)
>
> What's the plan for the "legacy" part? To be removed soon? Do we want to
> remove it now? Maybe better to not use the modern/legacy terms unless we
> want to give the impression that the legacy one is discontinued.

The bitmap used to track the dirty page information is the distinction
between the "legacy iteration" and the "modern iteration": the
"iter_bmap" field is used by the "modern iteration" while the "bmap"
field is used by the "legacy iteration".

Since the refinement is now transparent and there is no API available to
change the sync method, I actually want to remove it right now in order
to simplify the logic. I'll include it in the next version.

> > +
> > +#define RAMBLOCK_SYN_MASK (0x7)
> > +
> > +typedef enum RAMBlockSynMode {
> > +    RAMBLOCK_SYN_LEGACY, /* Old-fashined mode */
> > +    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
> > +} RAMBlockSynMode;
>
> I'm also wondering whether we need this enum + the flags or one of them
> would suffice. I'm looking at code like this in the following patches,
> for instance:

If we drop the "legacy mode", we can simplify the following logic too.

> +    if (sync_mode == RAMBLOCK_SYN_MODERN) {
> +        if (background) {
> +            flag = RAMBLOCK_SYN_MODERN_BACKGROUND;
> +        } else {
> +            flag = RAMBLOCK_SYN_MODERN_ITER;
> +        }
> +    }
>
> Couldn't we use LEGACY/BG/ITER?
[...]
On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
> On Tue, Sep 17, 2024 at 5:11 AM Fabiano Rosas <farosas@suse.de> wrote:
[...]
> > What's the plan for the "legacy" part? To be removed soon? Do we want to
> > remove it now? Maybe better to not use the modern/legacy terms unless we
> > want to give the impression that the legacy one is discontinued.
>
> The bitmap used to track the dirty page information is the distinction
> between the "legacy iteration" and the "modern iteration": the
> "iter_bmap" field is used by the "modern iteration" while the "bmap"
> field is used by the "legacy iteration".
>
> Since the refinement is now transparent and there is no API available to
> change the sync method, I actually want to remove it right now in order
> to simplify the logic. I'll include it in the next version.

How confident do we think the new way is better than the old?

If it'll be 100% / always better, I agree we can consider removing the
old. But is it always better? At least it consumes much more resources..

Otherwise, we can still leave that logic as-is but use a migration
property to turn it on only on new machines, I think.

Besides, could you explain why the solution needs to be this complex? My
previous question was that we sync dirty too infrequently, while auto
converge relies on dirty information, so that means auto converge can
also be adjusted too infrequently.

However, I wonder whether that can be achieved in a simpler manner, by
e.g. invoking migration_bitmap_sync_precopy() more frequently during
migration - for example in ram_save_iterate(); not every time, but
iterate() is invoked much more frequently, and maybe we can do the sync
from time to time.

I also don't see why we need a separate thread, plus two new bitmaps, to
achieve this.. I didn't read in-depth yet, but I thought dirty sync
requires the BQL anyway, so I don't yet understand why the two bitmaps
are required. If the bitmaps are introduced in the 1st patch, IMO it'll
be great to explain clearly why they're needed here.

Thanks,
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> How confident do we think the new way is better than the old?
>
> If it'll be 100% / always better, I agree we can consider removing the old.
> But is it always better? At least it consumes much more resources..
>
> Otherwise, we can still leave that logic as-is but use a migration property
> to turn it on only on new machines I think.
>
> Besides, could you explain why the solution needs to be this complex? My
> previous question was that we sync dirty too less, while auto converge
> relies on dirty information, so that means auto converge can be adjusted
> too unfrequently.

The original logic updates "bmap" on every sync, and "bmap" is then used
to conduct the dirty page sending.

In the background sync logic, we do not want each background sync to
update "bmap" and interfere with the page-sending behavior, since the
bitmap we sync there is only used to detect convergence and drive the
CPU throttle.

The iteration sync wants to 1) sync the dirty bitmap, 2) detect
convergence, 3) do the CPU throttle, and 4) use the fetched "bmap" to
conduct the page sending, while the background sync only does 1-3; they
have different purposes.

This logic needs at least two bitmaps: one used for page sending and
another used for CPU throttling. To achieve that, we introduced
"iter_bmap" as a temporary bitmap that stores the dirty page information
during background sync and is copied into "bmap" in the iteration sync
logic. However, the dirty page information in "iter_bmap" may be stale,
since the dirty pages it records could already have been sent after a
background sync, so we introduced "shadow_bmap" to help calculate the
dirty pages that were sent between two background syncs.

> However I wonder whether that can be achieved in a simpler manner by

I have tried my best to make the solution simpler but failed. :(

> e.g. invoke migration_bitmap_sync_precopy() more frequently during

Yes, invoking migration_bitmap_sync_precopy() more frequently was also
my first idea, but it involves bitmap updating and interferes with the
page-sending behavior; it also affects the migration information stats
and interferes with other migration logic such as
migration_update_rates().

> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to
> achieve this.. I didn't read in-depth yet, but I thought dirty sync
> requires bql anyway, then I don't yet understand why the two bitmaps are
> required. If the bitmaps are introduced in the 1st patch, IMO it'll be
> great to explain clearly on why they're needed here.
>
> Thanks,
>
> --
> Peter Xu

Thanks,
Yong
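(The bitmap interplay described above can be modeled with a toy sketch. A single 64-bit word stands in for each whole bitmap; the field names follow the patch, but the helper functions and the exact merge/snapshot points are invented here for illustration and deliberately ignore details such as dirty page counting and locking.)

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit word stands in for each whole bitmap in this toy model. */
typedef struct {
    uint64_t bmap;        /* drives page sending */
    uint64_t iter_bmap;   /* accumulates dirtying seen by syncs */
    uint64_t shadow_bmap; /* snapshot of bmap at last background sync */
} ToyRAMBlock;

/*
 * Background sync: record newly dirtied pages in iter_bmap only,
 * leaving bmap (the sender's view) untouched, and snapshot bmap so a
 * later sync can tell which of those pages were sent in the meantime.
 */
static void background_sync(ToyRAMBlock *rb, uint64_t newly_dirty)
{
    rb->iter_bmap |= newly_dirty;
    rb->shadow_bmap = rb->bmap;
}

/*
 * Iteration sync: fold the accumulated iter_bmap into bmap so the
 * sender will transmit those pages, then reset iter_bmap.
 */
static void iteration_sync(ToyRAMBlock *rb, uint64_t newly_dirty)
{
    rb->iter_bmap |= newly_dirty;
    rb->bmap |= rb->iter_bmap;
    rb->iter_bmap = 0;
}

/* The sender clears a page's bit once it has been transmitted. */
static void page_sent(ToyRAMBlock *rb, unsigned page)
{
    rb->bmap &= ~(UINT64_C(1) << page);
}

/* Pages sent since the snapshot: set in shadow_bmap, now clear in bmap. */
static int pages_sent_since_snapshot(const ToyRAMBlock *rb)
{
    return __builtin_popcountll(rb->shadow_bmap & ~rb->bmap);
}
```

The point of the shadow snapshot is the last helper: without it, pages recorded dirty by a background sync but already retransmitted would be double counted when estimating the remaining dirty workload.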
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> How confident do we think the new way is better than the old?
>
> If it'll be 100% / always better, I agree we can consider removing the old.
> But is it always better? At least it consumes much more resources..

Yes, it introduces an extra bitmap with respect to the old sync logic.

> Otherwise, we can still leave that logic as-is but use a migration property
> to turn it on only on new machines I think.

OK, that's fine.

> Besides, could you explain why the solution needs to be this complex? My
> previous question was that we sync dirty too less, while auto converge
> relies on dirty information, so that means auto converge can be adjusted
> too unfrequently.
>
> However I wonder whether that can be achieved in a simpler manner by
> e.g. invoke migration_bitmap_sync_precopy() more frequently during
> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to
> achieve this.. [...]
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> However I wonder whether that can be achieved in a simpler manner by
> e.g. invoke migration_bitmap_sync_precopy() more frequently during
> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to

You mean we could do the background sync in the migration thread or in
the main thread (e.g., using a timer)?

> achieve this.. I didn't read in-depth yet, but I thought dirty sync
> requires bql anyway, then I don't yet understand why the two bitmaps are
> required. If the bitmaps are introduced in the 1st patch, IMO it'll be
> great to explain clearly on why they're needed here.
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> first idea but it involves bitmap updating and interfere with the behavior
> of page sending, it also affects the migration information stats and
> interfere other migration logic such as migration_update_rates().

Could you elaborate?

For example, what happens if we start to sync in ram_save_iterate() for
some time intervals (e.g. 5 seconds)?

Btw, we shouldn't have this extra sync exist if auto converge is disabled
no matter which way we use, because it's pure overhead when auto converge
is not in use.

Thanks,
On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> > first idea but it involves bitmap updating and interfere with the
> > behavior of page sending, it also affects the migration information
> > stats and interfere other migration logic such as
> > migration_update_rates().
>
> Could you elaborate?
>
> For example, what happens if we start to sync in ram_save_iterate() for
> some time intervals (e.g. 5 seconds)?

I didn't try to sync in ram_save_iterate but in
migration_bitmap_sync_precopy.

If we use migration_bitmap_sync_precopy in the ram_save_iterate
function, this approach seems to be correct. However, the bitmap will be
updated as the migration thread iterates through each dirty page in the
RAMBlock list. Compared to the existing implementation, this is
different but still straightforward; I'll give it a shot soon to see if
it works.

> Btw, we shouldn't have this extra sync exist if auto converge is disabled
> no matter which way we use, because it's pure overhead when auto converge
> is not in use.

Ok, I'll add the check in the next version.

> Thanks,
>
> --
> Peter Xu

Thanks for the comment.
Yong
On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
[...]
> I didn't try to sync in ram_save_iterate but in
> migration_bitmap_sync_precopy.
>
> If we use migration_bitmap_sync_precopy in the ram_save_iterate
> function, this approach seems to be correct. However, the bitmap will be
> updated as the migration thread iterates through each dirty page in the
> RAMBlock list. Compared to the existing implementation, this is
> different but still straightforward; I'll give it a shot soon to see if
> it works.

It's still serialized in the migration thread, so I'd expect it is
similar to e.g. ->state_pending_exact() calls when QEMU flushed most
dirty pages in the current bitmap.

> > Btw, we shouldn't have this extra sync exist if auto converge is
> > disabled no matter which way we use, because it's pure overhead when
> > auto converge is not in use.
>
> Ok, I'll add the check in the next version.

Let's start with simple, and if there's anything unsure we can discuss
upfront, just to avoid coding something and change direction later.
Again, personally I think we shouldn't add too much new code to auto
converge (unless very well justified, but I think it's just hard..
fundamentally with any pure throttling solutions), hopefully something
small can make it start to work for huge VMs.

Thanks,
On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote: > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote: > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote: > > > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote: > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also my > > > > first idea but it involves bitmap updating and interfere with the > > > behavior > > > > of page sending, it also affects the migration information stats and > > > > interfere other migration logic such as migration_update_rates(). > > > > > > Could you elaborate? > > > > > > For example, what happens if we start to sync in ram_save_iterate() for > > > some time intervals (e.g. 5 seconds)? > > > > > > > I didn't try to sync in ram_save_iterate but in the > > migration_bitmap_sync_precopy. > > > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate > > function, > > This approach seems to be correct. However, the bitmap will be updated as > > the > > migration thread iterates through each dirty page in the RAMBlock list. > > Compared > > to the existing implementation, this is different but still > straightforward; > > I'll give it a shot soon to see if it works. > > It's still serialized in the migration thread, so I'd expect it is similar > What does "serialized" mean? How about we: 1. invoke the migration_bitmap_sync_precopy in a timer(bg_sync_timer) hook, every 5 seconds. 2. register the bg_sync_timer in the main loop when the machine starts like throttle_timer 3. activate the timer when ram_save_iterate gets called and deactivate it in the ram_save_cleanup gracefully during migration. I think it is simple enough and also isn't "serialized"? to e.g. ->state_pending_exact() calls when QEMU flushed most dirty pages in > the current bitmap. 
> > > Btw, we shouldn't have this extra sync exist if auto converge is disabled
> > > no matter which way we use, because it's pure overhead when auto converge
> > > is not in use.
> >
> > Ok, I'll add the check in the next version.
>
> Let's start with simple, and if there's anything unsure we can discuss
> upfront, just to avoid coding something and change direction later. Again,
> personally I think we shouldn't add too much new code to auto converge
> (unless very well justified, but I think it's just hard.. fundamentally with
> any pure throttling solutions), hopefully something small can make it start
> to work for huge VMs.
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
> On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
> > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> > > > > first idea but it involves bitmap updating and interferes with the
> > > > > behavior of page sending; it also affects the migration information
> > > > > stats and interferes with other migration logic such as
> > > > > migration_update_rates().
> > > >
> > > > Could you elaborate?
> > > >
> > > > For example, what happens if we start to sync in ram_save_iterate()
> > > > for some time intervals (e.g. 5 seconds)?
> > >
> > > I didn't try to sync in ram_save_iterate but in the
> > > migration_bitmap_sync_precopy.
> > >
> > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate
> > > function, this approach seems to be correct. However, the bitmap will
> > > be updated as the migration thread iterates through each dirty page in
> > > the RAMBlock list. Compared to the existing implementation, this is
> > > different but still straightforward; I'll give it a shot soon to see
> > > if it works.
> >
> > It's still serialized in the migration thread, so I'd expect it is
> > similar
>
> What does "serialized" mean?

I meant sync() never happens concurrently with RAM pages being iterated,
simply because sync() previously only happens in the migration thread,
which is still the same thread that initiates the movement of pages.

> How about we:
> 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
>    hook, every 5 seconds.
> 2. register the bg_sync_timer in the main loop when the machine starts,
>    like throttle_timer.
> 3. activate the timer when ram_save_iterate gets called and deactivate it
>    in the ram_save_cleanup gracefully during migration.
>
> I think it is simple enough and also isn't "serialized"?

If you want to do that with timer that's ok, but then IIUC it doesn't need
to involve ram.c code at all.

You can rely on cpu_throttle_get_percentage() too just like the throttle
timer, and it'll work naturally with migration because outside migration
the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).

Then it also gracefully aligns with the async thread sync() in that it only
happens when auto-converge is enabled. Yeh that may look better.. and
sticking the code together with cpu-throttle.c seems nice.

Side note: one thing regarding sync() is that ram_init_bitmaps() syncs once,
while I don't see why it's necessary. I remember I tried to remove it but
maybe I hit some issues and I didn't dig further. If you're working on
sync() anyway not sure whether you'd like to have a look.
On 2024/9/27 23:35, Peter Xu wrote:
> On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
>> On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
>>> On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
>>>> On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
>>>>> On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
>>>>>> Yes, invoke migration_bitmap_sync_precopy more frequently is also my
>>>>>> first idea but it involves bitmap updating and interferes with the
>>>>>> behavior of page sending; it also affects the migration information
>>>>>> stats and interferes with other migration logic such as
>>>>>> migration_update_rates().
>>>>>
>>>>> Could you elaborate?
>>>>>
>>>>> For example, what happens if we start to sync in ram_save_iterate() for
>>>>> some time intervals (e.g. 5 seconds)?
>>>>
>>>> I didn't try to sync in ram_save_iterate but in the
>>>> migration_bitmap_sync_precopy.
>>>>
>>>> If we use the migration_bitmap_sync_precopy in the ram_save_iterate
>>>> function, this approach seems to be correct. However, the bitmap will be
>>>> updated as the migration thread iterates through each dirty page in the
>>>> RAMBlock list. Compared to the existing implementation, this is
>>>> different but still straightforward; I'll give it a shot soon to see if
>>>> it works.
>>>
>>> It's still serialized in the migration thread, so I'd expect it is
>>> similar
>>
>> What does "serialized" mean?
>
> I meant sync() never happens concurrently with RAM pages being iterated,
> simply because sync() previously only happens in the migration thread,
> which is still the same thread that initiates the movement of pages.
>
>> How about we:
>> 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
>>    hook, every 5 seconds.
>> 2. register the bg_sync_timer in the main loop when the machine starts,
>>    like throttle_timer.
>> 3. activate the timer when ram_save_iterate gets called and deactivate it
>>    in the ram_save_cleanup gracefully during migration.
>>
>> I think it is simple enough and also isn't "serialized"?
>
> If you want to do that with timer that's ok, but then IIUC it doesn't need
> to involve ram.c code at all.
>
> You can rely on cpu_throttle_get_percentage() too just like the throttle
> timer, and it'll work naturally with migration because outside migration
> the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).
>
> Then it also gracefully aligns with the async thread sync() in that it only
> happens when auto-converge is enabled. Yeh that may look better.. and
> sticking the code together with cpu-throttle.c seems nice.

Ok, thanks for the advice, I'll check it and see how it goes.

> Side note: one thing regarding sync() is that ram_init_bitmaps() syncs
> once, while I don't see why it's necessary. I remember I tried to remove it
> but maybe I hit some issues and I didn't dig further. If you're working on
> sync() anyway not sure whether you'd like to have a look.

Agree, I'll try it after working out current series.

Yong
On Fri, Sep 27, 2024 at 11:35 PM Peter Xu <peterx@redhat.com> wrote:
> On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
> > On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
> > > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> > > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also
> > > > > > my first idea but it involves bitmap updating and interferes with
> > > > > > the behavior of page sending; it also affects the migration
> > > > > > information stats and interferes with other migration logic such
> > > > > > as migration_update_rates().
> > > > >
> > > > > Could you elaborate?
> > > > >
> > > > > For example, what happens if we start to sync in ram_save_iterate()
> > > > > for some time intervals (e.g. 5 seconds)?
> > > >
> > > > I didn't try to sync in ram_save_iterate but in the
> > > > migration_bitmap_sync_precopy.
> > > >
> > > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate
> > > > function, this approach seems to be correct. However, the bitmap will
> > > > be updated as the migration thread iterates through each dirty page
> > > > in the RAMBlock list. Compared to the existing implementation, this
> > > > is different but still straightforward; I'll give it a shot soon to
> > > > see if it works.
> > >
> > > It's still serialized in the migration thread, so I'd expect it is
> > > similar
> >
> > What does "serialized" mean?
>
> I meant sync() never happens concurrently with RAM pages being iterated,
> simply because sync() previously only happens in the migration thread,
> which is still the same thread that initiates the movement of pages.
>
> > How about we:
> > 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
> >    hook, every 5 seconds.
> > 2. register the bg_sync_timer in the main loop when the machine starts,
> >    like throttle_timer.
> > 3. activate the timer when ram_save_iterate gets called and deactivate it
> >    in the ram_save_cleanup gracefully during migration.
> >
> > I think it is simple enough and also isn't "serialized"?
>
> If you want to do that with timer that's ok, but then IIUC it doesn't need
> to involve ram.c code at all.

The timer hook will call the migration_bitmap_sync_precopy() which is
implemented in ram.c; maybe we can define the hook function in ram.c and
expose it in ram.h?

> You can rely on cpu_throttle_get_percentage() too just like the throttle
> timer, and it'll work naturally with migration because outside migration
> the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).

Relying on cpu_throttle_get_percentage() may miss the sync time window
during the second iteration when it lasts a long time while the throttle
hasn't started yet. I'll think through your idea and apply it if possible.

> Then it also gracefully aligns with the async thread sync() in that it only
> happens when auto-converge is enabled. Yeh that may look better.. and
> sticking the code together with cpu-throttle.c seems nice.
>
> Side note: one thing regarding sync() is that ram_init_bitmaps() syncs
> once, while I don't see why it's necessary. I remember I tried to remove it
> but maybe I hit some issues and I didn't dig further. If you're working on
> sync() anyway not sure whether you'd like to have a look.
>
> --
> Peter Xu

Thanks,
Yong
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..0e327bc0ae 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -24,6 +24,30 @@
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
 
+/* Possible bits for cpu_physical_memory_sync_dirty_bitmap */
+
+/*
+ * The old-fashioned sync, which is, in turn, used for CPU
+ * throttle and memory transfer.
+ */
+#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)
+
+/*
+ * The modern sync, which is, in turn, used for CPU throttle
+ * and memory transfer.
+ */
+#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
+
+/* The modern sync, which is used for CPU throttle only */
+#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)
+
+#define RAMBLOCK_SYN_MASK (0x7)
+
+typedef enum RAMBlockSynMode {
+    RAMBLOCK_SYN_LEGACY, /* Old-fashioned mode */
+    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
+} RAMBlockSynMode;
+
 struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
@@ -89,6 +113,27 @@ struct RAMBlock {
      * could not have been valid on the source.
      */
     ram_addr_t postcopy_length;
+
+    /*
+     * Used to backup the bmap during background sync to see whether any dirty
+     * pages were sent during that time.
+     */
+    unsigned long *shadow_bmap;
+
+    /*
+     * The bitmap "bmap", which was initially used for both sync and memory
+     * transfer, will be replaced by two bitmaps: the previously used "bmap"
+     * and the recently added "iter_bmap". Only the memory transfer is
+     * conducted with the previously used "bmap"; the recently added
+     * "iter_bmap" is utilized for dirty bitmap sync.
+     */
+    unsigned long *iter_bmap;
+
+    /* Number of new dirty pages during iteration */
+    uint64_t iter_dirty_pages;
+
+    /* If background sync has shown up during iteration */
+    bool background_sync_shown_up;
 };
 #endif
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 67ca3d5d51..f29faa82d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
         block->bmap = NULL;
         g_free(block->file_bmap);
         block->file_bmap = NULL;
+        g_free(block->shadow_bmap);
+        block->shadow_bmap = NULL;
+        g_free(block->iter_bmap);
+        block->iter_bmap = NULL;
     }
 }
 
@@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
         }
         block->clear_bmap_shift = shift;
         block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
+        block->shadow_bmap = bitmap_new(pages);
+        block->iter_bmap = bitmap_new(pages);
     }
 }
shadow_bmap, iter_bmap and iter_dirty_pages are introduced to satisfy
the need for background sync.

Meanwhile, introduce an enumeration of sync methods.

Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
 include/exec/ramblock.h | 45 +++++++++++++++++++++++++++++++++++++++++
 migration/ram.c         |  6 ++++++
 2 files changed, 51 insertions(+)