| Message ID | 531750c8d7b6c09f877b5f335a60fab402c168be.1726390098.git.yong.huang@smartx.com |
|---|---|
| State | New, archived |
| Series | migration: auto-converge refinements for huge VM |
Hyman Huang <yong.huang@smartx.com> writes:

> shadow_bmap, iter_bmap and iter_dirty_pages are introduced
> to satisfy the need for background sync.
>
> Meanwhile, introduce enumeration of sync method.
>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  include/exec/ramblock.h | 45 +++++++++++++++++++++++++++++++++++++++++
>  migration/ram.c         |  6 ++++++
>  2 files changed, 51 insertions(+)
>
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 0babd105c0..0e327bc0ae 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -24,6 +24,30 @@
>  #include "qemu/rcu.h"
>  #include "exec/ramlist.h"
>
> +/* Possible bits for cpu_physical_memory_sync_dirty_bitmap */
> +
> +/*
> + * The old-fashioned sync, which is, in turn, used for CPU
> + * throttle and memory transfer.

I'm not sure I follow what "in turn" is supposed to mean in this
sentence. Could you clarify?

> + */
> +#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)

So ITER is as opposed to background? I'm a bit confused with the terms.

> +
> +/*
> + * The modern sync, which is, in turn, used for CPU throttle
> + * and memory transfer.
> + */
> +#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
> +
> +/* The modern sync, which is used for CPU throttle only */
> +#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)

What's the plan for the "legacy" part? To be removed soon? Do we want to
remove it now? Maybe better to not use the modern/legacy terms unless we
want to give the impression that the legacy one is discontinued.

> +
> +#define RAMBLOCK_SYN_MASK (0x7)
> +
> +typedef enum RAMBlockSynMode {
> +    RAMBLOCK_SYN_LEGACY, /* Old-fashined mode */
> +    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
> +} RAMBlockSynMode;

I'm also wondering whether we need this enum + the flags or one of them
would suffice.
I'm looking at code like this in the following patches, for instance:

+    if (sync_mode == RAMBLOCK_SYN_MODERN) {
+        if (background) {
+            flag = RAMBLOCK_SYN_MODERN_BACKGROUND;
+        } else {
+            flag = RAMBLOCK_SYN_MODERN_ITER;
+        }
+    }

Couldn't we use LEGACY/BG/ITER?

> +
>  struct RAMBlock {
>      struct rcu_head rcu;
>      struct MemoryRegion *mr;
> @@ -89,6 +113,27 @@ struct RAMBlock {
>      * could not have been valid on the source.
>      */
>      ram_addr_t postcopy_length;
> +
> +    /*
> +     * Used to backup the bmap during background sync to see whether any dirty
> +     * pages were sent during that time.
> +     */
> +    unsigned long *shadow_bmap;
> +
> +    /*
> +     * The bitmap "bmap," which was initially used for both sync and memory
> +     * transfer, will be replaced by two bitmaps: the previously used "bmap"
> +     * and the recently added "iter_bmap." Only the memory transfer is
> +     * conducted with the previously used "bmap"; the recently added
> +     * "iter_bmap" is utilized for dirty bitmap sync.
> +     */
> +    unsigned long *iter_bmap;
> +
> +    /* Number of new dirty pages during iteration */
> +    uint64_t iter_dirty_pages;
> +
> +    /* If background sync has shown up during iteration */
> +    bool background_sync_shown_up;
>  };
>  #endif
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 67ca3d5d51..f29faa82d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
>          block->bmap = NULL;
>          g_free(block->file_bmap);
>          block->file_bmap = NULL;
> +        g_free(block->shadow_bmap);
> +        block->shadow_bmap = NULL;
> +        g_free(block->iter_bmap);
> +        block->iter_bmap = NULL;
>      }
>  }
>
> @@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
>          }
>          block->clear_bmap_shift = shift;
>          block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
> +        block->shadow_bmap = bitmap_new(pages);
> +        block->iter_bmap = bitmap_new(pages);
>      }
>  }
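(For illustration, the simplification suggested here could look roughly like the following toy sketch; the flat flag names and the helper are invented for this example and are not from the series.)

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flat flag set replacing the mode enum plus per-mode
 * bits: one bit per sync purpose, so callers pick a flag directly.
 */
enum {
    RAMBLOCK_SYN_LEGACY = 1u << 0, /* old path: bmap serves both users */
    RAMBLOCK_SYN_ITER   = 1u << 1, /* iteration sync: throttle + send  */
    RAMBLOCK_SYN_BG     = 1u << 2, /* background sync: throttle only   */
};

/* The nested if/else from the series collapses to one expression. */
static unsigned pick_sync_flag(bool modern, bool background)
{
    if (!modern) {
        return RAMBLOCK_SYN_LEGACY;
    }
    return background ? RAMBLOCK_SYN_BG : RAMBLOCK_SYN_ITER;
}
```

With such a scheme the `RAMBlockSynMode` enum would no longer be needed, since the flag itself already encodes which path is in use.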
On Tue, Sep 17, 2024 at 5:11 AM Fabiano Rosas <farosas@suse.de> wrote:

> Hyman Huang <yong.huang@smartx.com> writes:
>
> > shadow_bmap, iter_bmap and iter_dirty_pages are introduced
> > to satisfy the need for background sync.
> >
> > Meanwhile, introduce enumeration of sync method.
[...]
> > +/*
> > + * The old-fashioned sync, which is, in turn, used for CPU
> > + * throttle and memory transfer.

Using the traditional sync method, the page sending logic iterates
the "bmap" to transfer dirty pages while the CPU throttle logic counts
the amount of new dirty pages and detects convergence. There are two
uses for "bmap".

Using the modern sync method, "bmap" is used to transfer dirty pages
and "iter_bmap" is used to track new dirty pages.

> I'm not sure I follow what "in turn" is supposed to mean in this
> sentence. Could you clarify?

Here I want to express "in sequence". But failed obviously. :(

> > + */
> > +#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)
>
> So ITER is as opposed to background? I'm a bit confused with the terms.

Yes.

> > +
> > +/*
> > + * The modern sync, which is, in turn, used for CPU throttle
> > + * and memory transfer.
> > + */
> > +#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
> > +
> > +/* The modern sync, which is used for CPU throttle only */
> > +#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)
>
> What's the plan for the "legacy" part? To be removed soon? Do we want to
> remove it now? Maybe better to not use the modern/legacy terms unless we
> want to give the impression that the legacy one is discontinued.

The bitmap used to track the dirty page information is the distinction
between the "legacy iteration" and the "modern iteration": the
"iter_bmap" field is used by the "modern iteration" while the "bmap"
field is used by the "legacy iteration".

Since the refinement is now transparent and there is no API available to
change the sync method, I actually want to remove it right now in order
to simplify the logic. I'll include it in the next version.

> > +
> > +#define RAMBLOCK_SYN_MASK (0x7)
> > +
> > +typedef enum RAMBlockSynMode {
> > +    RAMBLOCK_SYN_LEGACY, /* Old-fashined mode */
> > +    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
> > +} RAMBlockSynMode;
>
> I'm also wondering whether we need this enum + the flags or one of them
> would suffice. I'm looking at code like this in the following patches,
> for instance:

If we drop the "legacy mode", we can simplify the following logic too.

> +    if (sync_mode == RAMBLOCK_SYN_MODERN) {
> +        if (background) {
> +            flag = RAMBLOCK_SYN_MODERN_BACKGROUND;
> +        } else {
> +            flag = RAMBLOCK_SYN_MODERN_ITER;
> +        }
> +    }
>
> Couldn't we use LEGACY/BG/ITER?
[...]
On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
> On Tue, Sep 17, 2024 at 5:11 AM Fabiano Rosas <farosas@suse.de> wrote:
[...]
> > What's the plan for the "legacy" part? To be removed soon? Do we want to
> > remove it now? Maybe better to not use the modern/legacy terms unless we
> > want to give the impression that the legacy one is discontinued.
>
> The bitmap used to track the dirty page information is the distinction
> between the "legacy iteration" and the "modern iteration": the
> "iter_bmap" field is used by the "modern iteration" while the "bmap"
> field is used by the "legacy iteration".
>
> Since the refinement is now transparent and there is no API available to
> change the sync method, I actually want to remove it right now in order
> to simplify the logic. I'll include it in the next version.

How confident do we think the new way is better than the old?

If it'll be 100% / always better, I agree we can consider removing the
old. But is it always better? At least it consumes much more resources..

Otherwise, we can still leave that logic as-is but use a migration
property to turn it on only on new machines, I think.

Besides, could you explain why the solution needs to be this complex? My
previous question was that we sync dirty too infrequently, while auto
converge relies on dirty information, so that means auto converge can
also be adjusted too infrequently.

However, I wonder whether that can be achieved in a simpler manner, by
e.g. invoking migration_bitmap_sync_precopy() more frequently during
migration - for example in ram_save_iterate(); not every time, but
iterate() is invoked much more frequently, and maybe we can do the sync
from time to time.

I also don't see why we need a separate thread, plus two new bitmaps, to
achieve this.. I didn't read in-depth yet, but I thought dirty sync
requires the BQL anyway, so I don't yet understand why the two bitmaps
are required. If the bitmaps are introduced in the 1st patch, IMO it'll
be great to explain clearly why they're needed here.

Thanks,
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> How confident do we think the new way is better than the old?
>
> If it'll be 100% / always better, I agree we can consider removing the old.
> But is it always better? At least it consumes much more resources..
>
> Otherwise, we can still leave that logic as-is but use a migration property
> to turn it on only on new machines I think.
>
> Besides, could you explain why the solution needs to be this complex? My
> previous question was that we sync dirty too less, while auto converge
> relies on dirty information, so that means auto converge can be adjusted
> too unfrequently.

The original logic updates "bmap" on every sync, and "bmap" is then used
to conduct the dirty page sending.

In the background sync logic, we do not want each background sync to
update "bmap" and interfere with the page-sending behavior, since the
bitmap we sync there is only used to detect convergence and drive the
CPU throttle.

The iteration sync wants to 1) sync the dirty bitmap, 2) detect
convergence, 3) do the CPU throttle, and 4) use the fetched "bmap" to
conduct the page sending, while the background sync only does 1-3; they
have different purposes.

This logic needs at least two bitmaps: one used for page sending and
another used for CPU throttling. To achieve that, we introduced
"iter_bmap" as a temporary bitmap that stores the dirty page information
during background sync and is copied into "bmap" in the iteration sync
logic. However, the dirty page information in "iter_bmap" may be stale,
since the dirty pages it records could already have been sent after a
background sync, so we introduced "shadow_bmap" to help calculate the
dirty pages that were sent between two background syncs.

> However I wonder whether that can be achieved in a simpler manner by

I have tried my best to make the solution simpler but failed. :(

> e.g. invoke migration_bitmap_sync_precopy() more frequently during

Yes, invoking migration_bitmap_sync_precopy() more frequently was also
my first idea, but it involves bitmap updating and interferes with the
page-sending behavior; it also affects the migration information stats
and interferes with other migration logic such as
migration_update_rates().

> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to
> achieve this.. I didn't read in-depth yet, but I thought dirty sync
> requires bql anyway, then I don't yet understand why the two bitmaps are
> required. If the bitmaps are introduced in the 1st patch, IMO it'll be
> great to explain clearly on why they're needed here.
>
> Thanks,
>
> --
> Peter Xu

Thanks,
Yong
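(The bitmap interplay described above can be modeled with a toy sketch. A single 64-bit word stands in for each whole bitmap; the field names follow the patch, but the helper functions and the exact merge/snapshot points are invented here for illustration and deliberately ignore details such as dirty page counting and locking.)

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit word stands in for each whole bitmap in this toy model. */
typedef struct {
    uint64_t bmap;        /* drives page sending */
    uint64_t iter_bmap;   /* accumulates dirtying seen by syncs */
    uint64_t shadow_bmap; /* snapshot of bmap at last background sync */
} ToyRAMBlock;

/*
 * Background sync: record newly dirtied pages in iter_bmap only,
 * leaving bmap (the sender's view) untouched, and snapshot bmap so a
 * later sync can tell which of those pages were sent in the meantime.
 */
static void background_sync(ToyRAMBlock *rb, uint64_t newly_dirty)
{
    rb->iter_bmap |= newly_dirty;
    rb->shadow_bmap = rb->bmap;
}

/*
 * Iteration sync: fold the accumulated iter_bmap into bmap so the
 * sender will transmit those pages, then reset iter_bmap.
 */
static void iteration_sync(ToyRAMBlock *rb, uint64_t newly_dirty)
{
    rb->iter_bmap |= newly_dirty;
    rb->bmap |= rb->iter_bmap;
    rb->iter_bmap = 0;
}

/* The sender clears a page's bit once it has been transmitted. */
static void page_sent(ToyRAMBlock *rb, unsigned page)
{
    rb->bmap &= ~(UINT64_C(1) << page);
}

/* Pages sent since the snapshot: set in shadow_bmap, now clear in bmap. */
static int pages_sent_since_snapshot(const ToyRAMBlock *rb)
{
    return __builtin_popcountll(rb->shadow_bmap & ~rb->bmap);
}
```

The point of the shadow snapshot is the last helper: without it, pages recorded dirty by a background sync but already retransmitted would be double counted when estimating the remaining dirty workload.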
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> How confident do we think the new way is better than the old?
>
> If it'll be 100% / always better, I agree we can consider removing the old.
> But is it always better? At least it consumes much more resources..

Yes, it introduces an extra bitmap with respect to the old sync logic.

> Otherwise, we can still leave that logic as-is but use a migration property
> to turn it on only on new machines I think.

OK, that's fine.

> Besides, could you explain why the solution needs to be this complex? My
> previous question was that we sync dirty too less, while auto converge
> relies on dirty information, so that means auto converge can be adjusted
> too unfrequently.
>
> However I wonder whether that can be achieved in a simpler manner by
> e.g. invoke migration_bitmap_sync_precopy() more frequently during
> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to
> achieve this.. [...]
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 20, 2024 at 2:45 AM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 17, 2024 at 02:48:03PM +0800, Yong Huang wrote:
[...]
> However I wonder whether that can be achieved in a simpler manner by
> e.g. invoke migration_bitmap_sync_precopy() more frequently during
> migration, for example, in ram_save_iterate() - not every time but the
> iterate() is invoked much more frequent, and maybe we can do sync from
> time to time.
>
> I also don't see why we need a separate thread, plus two new bitmaps, to

You mean we could do the background sync in the migration thread or in
the main thread (e.g., using a timer)?

> achieve this.. I didn't read in-depth yet, but I thought dirty sync
> requires bql anyway, then I don't yet understand why the two bitmaps are
> required. If the bitmaps are introduced in the 1st patch, IMO it'll be
> great to explain clearly on why they're needed here.
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> first idea but it involves bitmap updating and interfere with the behavior
> of page sending, it also affects the migration information stats and
> interfere other migration logic such as migration_update_rates().

Could you elaborate?

For example, what happens if we start to sync in ram_save_iterate() for
some time intervals (e.g. 5 seconds)?

Btw, we shouldn't have this extra sync exist if auto converge is disabled
no matter which way we use, because it's pure overhead when auto converge
is not in use.

Thanks,
On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> > first idea but it involves bitmap updating and interfere with the
> > behavior of page sending, it also affects the migration information
> > stats and interfere other migration logic such as
> > migration_update_rates().
>
> Could you elaborate?
>
> For example, what happens if we start to sync in ram_save_iterate() for
> some time intervals (e.g. 5 seconds)?

I didn't try to sync in ram_save_iterate but in
migration_bitmap_sync_precopy.

If we use migration_bitmap_sync_precopy in the ram_save_iterate
function, this approach seems to be correct. However, the bitmap will be
updated as the migration thread iterates through each dirty page in the
RAMBlock list. Compared to the existing implementation, this is
different but still straightforward; I'll give it a shot soon to see if
it works.

> Btw, we shouldn't have this extra sync exist if auto converge is disabled
> no matter which way we use, because it's pure overhead when auto converge
> is not in use.

Ok, I'll add the check in the next version.

> Thanks,
>
> --
> Peter Xu

Thanks for the comment.
Yong
On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
[...]
> I didn't try to sync in ram_save_iterate but in
> migration_bitmap_sync_precopy.
>
> If we use migration_bitmap_sync_precopy in the ram_save_iterate
> function, this approach seems to be correct. However, the bitmap will be
> updated as the migration thread iterates through each dirty page in the
> RAMBlock list. Compared to the existing implementation, this is
> different but still straightforward; I'll give it a shot soon to see if
> it works.

It's still serialized in the migration thread, so I'd expect it is
similar to e.g. ->state_pending_exact() calls when QEMU flushed most
dirty pages in the current bitmap.

> > Btw, we shouldn't have this extra sync exist if auto converge is
> > disabled no matter which way we use, because it's pure overhead when
> > auto converge is not in use.
>
> Ok, I'll add the check in the next version.

Let's start with simple, and if there's anything unsure we can discuss
upfront, just to avoid coding something and change direction later.
Again, personally I think we shouldn't add too much new code to auto
converge (unless very well justified, but I think it's just hard..
fundamentally with any pure throttling solutions), hopefully something
small can make it start to work for huge VMs.

Thanks,
On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote: > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote: > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote: > > > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote: > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also my > > > > first idea but it involves bitmap updating and interfere with the > > > behavior > > > > of page sending, it also affects the migration information stats and > > > > interfere other migration logic such as migration_update_rates(). > > > > > > Could you elaborate? > > > > > > For example, what happens if we start to sync in ram_save_iterate() for > > > some time intervals (e.g. 5 seconds)? > > > > > > > I didn't try to sync in ram_save_iterate but in the > > migration_bitmap_sync_precopy. > > > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate > > function, > > This approach seems to be correct. However, the bitmap will be updated as > > the > > migration thread iterates through each dirty page in the RAMBlock list. > > Compared > > to the existing implementation, this is different but still > straightforward; > > I'll give it a shot soon to see if it works. > > It's still serialized in the migration thread, so I'd expect it is similar > What does "serialized" mean? How about we: 1. invoke the migration_bitmap_sync_precopy in a timer(bg_sync_timer) hook, every 5 seconds. 2. register the bg_sync_timer in the main loop when the machine starts like throttle_timer 3. activate the timer when ram_save_iterate gets called and deactivate it in the ram_save_cleanup gracefully during migration. I think it is simple enough and also isn't "serialized"? to e.g. ->state_pending_exact() calls when QEMU flushed most dirty pages in > the current bitmap. 
> > > Btw, we shouldn't have this extra sync exist if auto converge is disabled
> > > no matter which way we use, because it's pure overhead when auto converge
> > > is not in use.
> >
> > Ok, I'll add the check in the next version.
>
> Let's start with simple, and if there's anything unsure we can discuss
> upfront, just to avoid coding something and change direction later. Again,
> personally I think we shouldn't add too much new code to auto converge
> (unless very well justified, but I think it's just hard.. fundamentally with
> any pure throttling solutions), hopefully something small can make it start
> to work for huge VMs.
>
> Thanks,
>
> --
> Peter Xu
On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
> On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
> > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> > > > > first idea but it involves bitmap updating and interferes with the
> > > > > behavior of page sending; it also affects the migration information
> > > > > stats and interferes with other migration logic such as
> > > > > migration_update_rates().
> > > >
> > > > Could you elaborate?
> > > >
> > > > For example, what happens if we start to sync in ram_save_iterate()
> > > > for some time intervals (e.g. 5 seconds)?
> > >
> > > I didn't try to sync in ram_save_iterate but in the
> > > migration_bitmap_sync_precopy.
> > >
> > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate
> > > function, this approach seems to be correct. However, the bitmap will
> > > be updated as the migration thread iterates through each dirty page in
> > > the RAMBlock list. Compared to the existing implementation, this is
> > > different but still straightforward; I'll give it a shot soon to see
> > > if it works.
> >
> > It's still serialized in the migration thread, so I'd expect it is
> > similar
>
> What does "serialized" mean?

I meant sync() never happens concurrently with RAM pages being iterated,
simply because sync() previously only happens in the migration thread,
which is still the same thread that initiates the movement of pages.

> How about we:
> 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
>    hook, every 5 seconds.
> 2. register the bg_sync_timer in the main loop when the machine starts,
>    like throttle_timer.
> 3. activate the timer when ram_save_iterate gets called and deactivate it
>    in the ram_save_cleanup gracefully during migration.
>
> I think it is simple enough and also isn't "serialized"?

If you want to do that with timer that's ok, but then IIUC it doesn't need
to involve ram.c code at all.

You can rely on cpu_throttle_get_percentage() too just like the throttle
timer, and it'll work naturally with migration because outside migration
the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).

Then it also gracefully aligns with the async thread sync() in that it only
happens when auto-converge is enabled. Yeh that may look better.. and
sticking the code together with cpu-throttle.c seems nice.

Side note: one thing regarding sync() is that ram_init_bitmaps() syncs once,
while I don't see why it's necessary. I remember I tried to remove it but
maybe I hit some issues and I didn't dig further. If you're working on
sync() anyway not sure whether you'd like to have a look.
On 2024/9/27 23:35, Peter Xu wrote:
> On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
>> On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
>>> On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
>>>> On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
>>>>> On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
>>>>>> Yes, invoke migration_bitmap_sync_precopy more frequently is also my
>>>>>> first idea but it involves bitmap updating and interferes with the
>>>>>> behavior of page sending; it also affects the migration information
>>>>>> stats and interferes with other migration logic such as
>>>>>> migration_update_rates().
>>>>>
>>>>> Could you elaborate?
>>>>>
>>>>> For example, what happens if we start to sync in ram_save_iterate() for
>>>>> some time intervals (e.g. 5 seconds)?
>>>>
>>>> I didn't try to sync in ram_save_iterate but in the
>>>> migration_bitmap_sync_precopy.
>>>>
>>>> If we use the migration_bitmap_sync_precopy in the ram_save_iterate
>>>> function, this approach seems to be correct. However, the bitmap will be
>>>> updated as the migration thread iterates through each dirty page in the
>>>> RAMBlock list. Compared to the existing implementation, this is
>>>> different but still straightforward; I'll give it a shot soon to see if
>>>> it works.
>>>
>>> It's still serialized in the migration thread, so I'd expect it is
>>> similar
>>
>> What does "serialized" mean?
>
> I meant sync() never happens concurrently with RAM pages being iterated,
> simply because sync() previously only happens in the migration thread,
> which is still the same thread that initiates the movement of pages.
>
>> How about we:
>> 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
>>    hook, every 5 seconds.
>> 2. register the bg_sync_timer in the main loop when the machine starts,
>>    like throttle_timer.
>> 3. activate the timer when ram_save_iterate gets called and deactivate it
>>    in the ram_save_cleanup gracefully during migration.
>>
>> I think it is simple enough and also isn't "serialized"?
>
> If you want to do that with timer that's ok, but then IIUC it doesn't need
> to involve ram.c code at all.
>
> You can rely on cpu_throttle_get_percentage() too just like the throttle
> timer, and it'll work naturally with migration because outside migration
> the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).
>
> Then it also gracefully aligns with the async thread sync() in that it only
> happens when auto-converge is enabled. Yeh that may look better.. and
> sticking the code together with cpu-throttle.c seems nice.

Ok, thanks for the advice, I'll check it and see how it goes.

> Side note: one thing regarding sync() is that ram_init_bitmaps() syncs
> once, while I don't see why it's necessary. I remember I tried to remove it
> but maybe I hit some issues and I didn't dig further. If you're working on
> sync() anyway not sure whether you'd like to have a look.

Agree, I'll try it after working out current series.

Yong
On Fri, Sep 27, 2024 at 11:35 PM Peter Xu <peterx@redhat.com> wrote:
> On Fri, Sep 27, 2024 at 10:50:01AM +0800, Yong Huang wrote:
> > On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:
> > > On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > > > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> > > > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also
> > > > > > my first idea but it involves bitmap updating and interferes with
> > > > > > the behavior of page sending; it also affects the migration
> > > > > > information stats and interferes with other migration logic such
> > > > > > as migration_update_rates().
> > > > >
> > > > > Could you elaborate?
> > > > >
> > > > > For example, what happens if we start to sync in ram_save_iterate()
> > > > > for some time intervals (e.g. 5 seconds)?
> > > >
> > > > I didn't try to sync in ram_save_iterate but in the
> > > > migration_bitmap_sync_precopy.
> > > >
> > > > If we use the migration_bitmap_sync_precopy in the ram_save_iterate
> > > > function, this approach seems to be correct. However, the bitmap will
> > > > be updated as the migration thread iterates through each dirty page
> > > > in the RAMBlock list. Compared to the existing implementation, this
> > > > is different but still straightforward; I'll give it a shot soon to
> > > > see if it works.
> > >
> > > It's still serialized in the migration thread, so I'd expect it is
> > > similar
> >
> > What does "serialized" mean?
>
> I meant sync() never happens concurrently with RAM pages being iterated,
> simply because sync() previously only happens in the migration thread,
> which is still the same thread that initiates the movement of pages.
>
> > How about we:
> > 1. invoke the migration_bitmap_sync_precopy in a timer (bg_sync_timer)
> >    hook, every 5 seconds.
> > 2. register the bg_sync_timer in the main loop when the machine starts,
> >    like throttle_timer.
> > 3. activate the timer when ram_save_iterate gets called and deactivate it
> >    in the ram_save_cleanup gracefully during migration.
> >
> > I think it is simple enough and also isn't "serialized"?
>
> If you want to do that with timer that's ok, but then IIUC it doesn't need
> to involve ram.c code at all.

The timer hook will call the migration_bitmap_sync_precopy() which is
implemented in ram.c; maybe we can define the hook function in ram.c and
expose it in ram.h?

> You can rely on cpu_throttle_get_percentage() too just like the throttle
> timer, and it'll work naturally with migration because outside migration
> the throttle will be cleared (cpu_throttle_stop() at finish/fail/cancel..).

Relying on cpu_throttle_get_percentage() may miss the sync time window
during the second iteration when it lasts a long time while the throttle
hasn't started yet. I'll think through your idea and apply it if possible.

> Then it also gracefully aligns with the async thread sync() in that it only
> happens when auto-converge is enabled. Yeh that may look better.. and
> sticking the code together with cpu-throttle.c seems nice.
>
> Side note: one thing regarding sync() is that ram_init_bitmaps() syncs
> once, while I don't see why it's necessary. I remember I tried to remove it
> but maybe I hit some issues and I didn't dig further. If you're working on
> sync() anyway not sure whether you'd like to have a look.
>
> --
> Peter Xu

Thanks,
Yong
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..0e327bc0ae 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -24,6 +24,30 @@
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
 
+/* Possible bits for cpu_physical_memory_sync_dirty_bitmap */
+
+/*
+ * The old-fashioned sync, which is, in turn, used for CPU
+ * throttle and memory transfer.
+ */
+#define RAMBLOCK_SYN_LEGACY_ITER (1U << 0)
+
+/*
+ * The modern sync, which is, in turn, used for CPU throttle
+ * and memory transfer.
+ */
+#define RAMBLOCK_SYN_MODERN_ITER (1U << 1)
+
+/* The modern sync, which is used for CPU throttle only */
+#define RAMBLOCK_SYN_MODERN_BACKGROUND (1U << 2)
+
+#define RAMBLOCK_SYN_MASK (0x7)
+
+typedef enum RAMBlockSynMode {
+    RAMBLOCK_SYN_LEGACY, /* Old-fashioned mode */
+    RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
+} RAMBlockSynMode;
+
 struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
@@ -89,6 +113,27 @@ struct RAMBlock {
      * could not have been valid on the source.
      */
     ram_addr_t postcopy_length;
+
+    /*
+     * Used to backup the bmap during background sync to see whether any dirty
+     * pages were sent during that time.
+     */
+    unsigned long *shadow_bmap;
+
+    /*
+     * The bitmap "bmap", which was initially used for both sync and memory
+     * transfer, will be replaced by two bitmaps: the previously used "bmap"
+     * and the recently added "iter_bmap". Only the memory transfer is
+     * conducted with the previously used "bmap"; the recently added
+     * "iter_bmap" is utilized for dirty bitmap sync.
+     */
+    unsigned long *iter_bmap;
+
+    /* Number of new dirty pages during iteration */
+    uint64_t iter_dirty_pages;
+
+    /* If background sync has shown up during iteration */
+    bool background_sync_shown_up;
 };
 #endif
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 67ca3d5d51..f29faa82d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
         block->bmap = NULL;
         g_free(block->file_bmap);
         block->file_bmap = NULL;
+        g_free(block->shadow_bmap);
+        block->shadow_bmap = NULL;
+        g_free(block->iter_bmap);
+        block->iter_bmap = NULL;
     }
 }
 
@@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
         }
         block->clear_bmap_shift = shift;
         block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
+        block->shadow_bmap = bitmap_new(pages);
+        block->iter_bmap = bitmap_new(pages);
     }
 }
shadow_bmap, iter_bmap and iter_dirty_pages are introduced to satisfy
the need for background sync.

Meanwhile, introduce an enumeration of sync methods.

Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
 include/exec/ramblock.h | 45 +++++++++++++++++++++++++++++++++++++++++
 migration/ram.c         |  6 ++++++
 2 files changed, 51 insertions(+)