
[v4,2/7] migration/multifd: Implement zero page transmission on the multifd thread.

Message ID 20240301022829.3390548-3-hao.xiang@bytedance.com (mailing list archive)
State New, archived
Series Introduce multifd zero page checking.

Commit Message

Hao Xiang March 1, 2024, 2:28 a.m. UTC
1. Add a zero_pages field in MultiFDPacket_t.
2. Implement zero page detection and handling on the multifd
threads for the non-compression, zlib and zstd compression backends.
3. Add a new value 'multifd' to the ZeroPageDetection enumeration.
4. Handle migration QEMU 9.0 -> QEMU 8.2 compatibility.
5. Add zero page counters and update the multifd send/receive tracing
format to track the newly added counters.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 hw/core/machine.c                |  4 +-
 hw/core/qdev-properties-system.c |  2 +-
 migration/meson.build            |  1 +
 migration/multifd-zero-page.c    | 92 ++++++++++++++++++++++++++++++++
 migration/multifd-zlib.c         | 21 ++++++--
 migration/multifd-zstd.c         | 20 +++++--
 migration/multifd.c              | 83 +++++++++++++++++++++++-----
 migration/multifd.h              | 24 ++++++++-
 migration/ram.c                  |  1 -
 migration/trace-events           |  8 +--
 qapi/migration.json              |  7 ++-
 11 files changed, 230 insertions(+), 33 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

Comments

Markus Armbruster March 1, 2024, 7:28 a.m. UTC | #1
Hao Xiang <hao.xiang@bytedance.com> writes:

> 1. Add zero_pages field in MultiFDPacket_t.
> 2. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 3. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
> 5. Adds zero page counters and updates multifd send/receive tracing
> format to track the newly added counters.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8da05dba47..846d0411d5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -671,10 +671,15 @@
>  #
>  # @legacy: Perform zero page checking in main migration thread.
>  #
> +# @multifd: Perform zero page checking in multifd sender thread.
> +#     This option only takes effect if migration capability multifd
> +#     is set.  Otherwise, it will have the same effect as legacy.

Suggest

   # @multifd: Perform zero page checking in multifd sender thread if
   #     multifd migration is enabled, else in the main migration
   #     thread as for @legacy.

Thoughts?

> +#
>  # Since: 9.0
> +#
>  ##
>  { 'enum': 'ZeroPageDetection',
> -  'data': [ 'none', 'legacy' ] }
> +  'data': [ 'none', 'legacy', 'multifd' ] }
>  
>  ##
>  # @BitmapMigrationBitmapAliasTransform:

QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>
Hao Xiang March 1, 2024, 10:49 p.m. UTC | #2
On Thu, Feb 29, 2024 at 11:28 PM Markus Armbruster <armbru@redhat.com> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > 1. Add zero_pages field in MultiFDPacket_t.
> > 2. Implements the zero page detection and handling on the multifd
> > threads for non-compression, zlib and zstd compression backends.
> > 3. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > 4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
> > 5. Adds zero page counters and updates multifd send/receive tracing
> > format to track the newly added counters.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>
> [...]
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8da05dba47..846d0411d5 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -671,10 +671,15 @@
> >  #
> >  # @legacy: Perform zero page checking in main migration thread.
> >  #
> > +# @multifd: Perform zero page checking in multifd sender thread.
> > +#     This option only takes effect if migration capability multifd
> > +#     is set.  Otherwise, it will have the same effect as legacy.
>
> Suggest
>
>    # @multifd: Perform zero page checking in multifd sender thread if
>    #     multifd migration is enabled, else in the main migration
>    #     thread as for @legacy.
>
> Thoughts?

Sounds good. Will change that.

>
> > +#
> >  # Since: 9.0
> > +#
> >  ##
> >  { 'enum': 'ZeroPageDetection',
> > -  'data': [ 'none', 'legacy' ] }
> > +  'data': [ 'none', 'legacy', 'multifd' ] }
> >
> >  ##
> >  # @BitmapMigrationBitmapAliasTransform:
>
> QAPI schema
> Acked-by: Markus Armbruster <armbru@redhat.com>
>
Peter Xu March 4, 2024, 7:16 a.m. UTC | #3
On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
> -GlobalProperty hw_compat_8_2[] = {};
> +GlobalProperty hw_compat_8_2[] = {
> +    { "migration", "zero-page-detection", "legacy"},
> +};

I hope we can make it for 9.0, then this (and many rest places) can be kept
as-is.  Let's see..  soft-freeze is March 12th.

One thing to mention is I just sent a pull which has mapped-ram feature
merged.  You may need a rebase onto that, and hopefully mapped-ram can also
use your feature too within the same patch when you repost.

https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/

That rebase may or may not need much caution, I apologize for that:
mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
it (actually still partly of it) into QEMU 9.0.

[...]

> +static bool multifd_zero_page(void)

multifd_zero_page_enabled()?

> +{
> +    return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
> +}
> +
> +static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
> +{
> +    ram_addr_t temp;
> +
> +    if (a == b) {
> +        return;
> +    }
> +
> +    temp = pages_offset[a];
> +    pages_offset[a] = pages_offset[b];
> +    pages_offset[b] = temp;
> +}
> +
> +/**
> + * multifd_send_zero_page_check: Perform zero page detection on all pages.
> + *
> + * Sorts normal pages before zero pages in p->pages->offset and updates
> + * p->pages->normal_num.
> + *
> + * @param p A pointer to the send params.

Nit: the majority of doc style in QEMU (it seems to me) is:

  @p: pointer to @MultiFDSendParams.

> + */
> +void multifd_send_zero_page_check(MultiFDSendParams *p)

multifd_send_zero_page_detect()?

This patch used "check" on both sides, but neither of them is a pure check
to me.  For the other side, maybe multifd_recv_zero_page_process()?  As
that one applies the zero pages.

> +{
> +    MultiFDPages_t *pages = p->pages;
> +    RAMBlock *rb = pages->block;
> +    int i = 0;
> +    int j = pages->num - 1;
> +
> +    /*
> +     * QEMU older than 9.0 don't understand zero page
> +     * on multifd channel. This switch is required to
> +     * maintain backward compatibility.
> +     */

IMHO we can drop this comment; it is not accurate as the user can disable
it explicitly through the parameter, then it may not always be about compatibility.

> +    if (multifd_zero_page()) {

Shouldn't this be "!multifd_zero_page_enabled()"?

> +        pages->normal_num = pages->num;
> +        return;
> +    }

The rest looks all sane.

Thanks,
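
Putting the review comments above together (the "_enabled()" rename, the
detect/process naming, and the inverted check), the top of the sender-side
helper would read roughly as follows; this is a sketch of the suggested
shape, not necessarily the code that was eventually merged:

    static bool multifd_zero_page_enabled(void)
    {
        return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
    }

    void multifd_send_zero_page_detect(MultiFDSendParams *p)
    {
        MultiFDPages_t *pages = p->pages;

        if (!multifd_zero_page_enabled()) {
            /* Detection disabled: send every queued page as a normal page. */
            pages->normal_num = pages->num;
            return;
        }

        /* ... sort normal pages before zero pages, as in the patch below ... */
    }
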
Fabiano Rosas March 4, 2024, 1:17 p.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
>> -GlobalProperty hw_compat_8_2[] = {};
>> +GlobalProperty hw_compat_8_2[] = {
>> +    { "migration", "zero-page-detection", "legacy"},
>> +};
>
> I hope we can make it for 9.0, then this (and many rest places) can be kept
> as-is.  Let's see..  soft-freeze is March 12th.
>
> One thing to mention is I just sent a pull which has mapped-ram feature
> merged.  You may need a rebase onto that, and hopefully mapped-ram can also
> use your feature too within the same patch when you repost.

The key points are:

- The socket migration is under "use_packets", the mapped-ram is under
"!use_packets" always.

- mapped-ram doesn't transmit zero-pages, it just clears the
corresponding bit in block->file_bmap (see the sketch after this message).

> https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
>
> That rebase may or may not need much caution, I apologize for that:
> mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
> it (actually still partly of it) into QEMU 9.0.

I started doing that rebase last week and saw issues with a sender
thread always getting -EPIPE at the sendmsg() on the regular socket
migration. Let's hope it was just me being tired. I'll try to get
something ready this week.
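
To make the file_bmap point above concrete, here is a minimal sketch of what
"just clears the corresponding bit" could look like once zero pages have been
sorted to the tail of pages->offset; the helper name and the bit arithmetic
are assumptions for illustration, not the code from the follow-up commits:

    /*
     * Illustrative only: with mapped-ram, pages detected as zero are not
     * written to the file; clearing their bit in the block's file bitmap
     * makes them read back as zero on the destination.
     */
    static void multifd_file_clear_zero_pages(MultiFDPages_t *pages)
    {
        RAMBlock *block = pages->block;

        for (uint32_t i = pages->normal_num; i < pages->num; i++) {
            clear_bit(pages->offset[i] >> TARGET_PAGE_BITS, block->file_bmap);
        }
    }
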
Fabiano Rosas March 4, 2024, 2:31 p.m. UTC | #5
Fabiano Rosas <farosas@suse.de> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
>>> -GlobalProperty hw_compat_8_2[] = {};
>>> +GlobalProperty hw_compat_8_2[] = {
>>> +    { "migration", "zero-page-detection", "legacy"},
>>> +};
>>
>> I hope we can make it for 9.0, then this (and many rest places) can be kept
>> as-is.  Let's see..  soft-freeze is March 12th.
>>
>> One thing to mention is I just sent a pull which has mapped-ram feature
>> merged.  You may need a rebase onto that, and hopefully mapped-ram can also
>> use your feature too within the same patch when you repost.
>
> The key points are:
>
> - The socket migration is under "use_packets", the mapped-ram is under
> "!use_packets" always.
>
> - mapped-ram doesn't trasmit zero-pages, it just clears the
> corresponding bit in block->file_bmap.
>
>> https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
>>
>> That rebase may or may not need much caution, I apologize for that:
>> mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
>> it (actually still partly of it) into QEMU 9.0.
>
> I started doing that rebase last week and saw issues with a sender
> thread always getting -EPIPE at the sendmsg() on the regular socket
> migration. Let's hope it was just me being tired. I'll try to get
> something ready this week.

This was just a rebase mistake.

While debugging it I noticed that migration-test doesn't really test
zero page code properly. The guest workload dirties all memory right
away, so I'm not sure we ever see a zero page. A quick test with
multifd shows p->zero_num=0 all the time.

Any ideas on how to introduce some holes for zero page testing?
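
One way to get such holes, assuming the guest workload in migration-test
dirties one byte per page in a simple loop (the loop below is illustrative,
not the actual tests/migration code), would be to skip alternate pages:

    /* Dirty page N and leave page N+1 untouched so it stays zero. */
    for (uint64_t addr = start; addr < end; addr += 2 * page_size) {
        *(volatile uint8_t *)addr = val;
    }
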
Fabiano Rosas March 4, 2024, 2:39 p.m. UTC | #6
Fabiano Rosas <farosas@suse.de> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
>>>> -GlobalProperty hw_compat_8_2[] = {};
>>>> +GlobalProperty hw_compat_8_2[] = {
>>>> +    { "migration", "zero-page-detection", "legacy"},
>>>> +};
>>>
>>> I hope we can make it for 9.0, then this (and many rest places) can be kept
>>> as-is.  Let's see..  soft-freeze is March 12th.
>>>
>>> One thing to mention is I just sent a pull which has mapped-ram feature
>>> merged.  You may need a rebase onto that, and hopefully mapped-ram can also
>>> use your feature too within the same patch when you repost.
>>
>> The key points are:
>>
>> - The socket migration is under "use_packets", the mapped-ram is under
>> "!use_packets" always.
>>
>> - mapped-ram doesn't trasmit zero-pages, it just clears the
>> corresponding bit in block->file_bmap.
>>
>>> https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
>>>
>>> That rebase may or may not need much caution, I apologize for that:
>>> mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
>>> it (actually still partly of it) into QEMU 9.0.
>>
>> I started doing that rebase last week and saw issues with a sender
>> thread always getting -EPIPE at the sendmsg() on the regular socket
>> migration. Let's hope it was just me being tired. I'll try to get
>> something ready this week.
>
> This was just a rebase mistake.
>
> While debugging it I noticed that migration-test doesn't really test
> zero page code properly. The guest workload dirties all memory right
> away, so I'm not sure we ever see a zero page. A quick test with
> multifd, shows p->zero_num=0 all the time.
>
> Any ideas on how to introduce some holes for zero page testing?

Aaaand that's another mistake on my part. Scratch that. The tests work
just fine.
Fabiano Rosas March 4, 2024, 6:24 p.m. UTC | #7
Fabiano Rosas <farosas@suse.de> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
>>> -GlobalProperty hw_compat_8_2[] = {};
>>> +GlobalProperty hw_compat_8_2[] = {
>>> +    { "migration", "zero-page-detection", "legacy"},
>>> +};
>>
>> I hope we can make it for 9.0, then this (and many rest places) can be kept
>> as-is.  Let's see..  soft-freeze is March 12th.
>>
>> One thing to mention is I just sent a pull which has mapped-ram feature
>> merged.  You may need a rebase onto that, and hopefully mapped-ram can also
>> use your feature too within the same patch when you repost.
>
> The key points are:
>
> - The socket migration is under "use_packets", the mapped-ram is under
> "!use_packets" always.
>
> - mapped-ram doesn't trasmit zero-pages, it just clears the
> corresponding bit in block->file_bmap.
>
>> https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
>>
>> That rebase may or may not need much caution, I apologize for that:
>> mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
>> it (actually still partly of it) into QEMU 9.0.
>
> I started doing that rebase last week and saw issues with a sender
> thread always getting -EPIPE at the sendmsg() on the regular socket
> migration. Let's hope it was just me being tired.

> I'll try to get something ready this week.

@Hao Xiang:

Here's a branch with the rebase. Please include the first two commits
when you repost:

 migration/multifd: Allow clearing of the file_bmap from multifd
 migration/multifd: Allow zero pages in file migration

There are also two small additions and some conflict resolution at the
"Implement zero page..." commit. Make sure you don't miss them.

https://gitlab.com/farosas/qemu/-/commits/migration-multifd-zero-page

Let me know if you encounter any issues.
Hao Xiang March 9, 2024, 2:37 a.m. UTC | #8
> On Sun, Mar 3, 2024 at 11:16 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
> > -GlobalProperty hw_compat_8_2[] = {};
> > +GlobalProperty hw_compat_8_2[] = {
> > +    { "migration", "zero-page-detection", "legacy"},
> > +};
> >
> > I hope we can make it for 9.0, then this (and many rest places) can be kept
> > as-is.  Let's see..  soft-freeze is March 12th.
> >
> > One thing to mention is I just sent a pull which has mapped-ram feature
> > merged.  You may need a rebase onto that, and hopefully mapped-ram can also
> > use your feature too within the same patch when you repost.
> >
> > https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
> >
> > That rebase may or may not need much caution, I apologize for that:
> > mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
> > it (actually still partly of it) into QEMU 9.0.

Let's see if we can catch that.

> > [...]
> >
> > +static bool multifd_zero_page(void)
> >
> > multifd_zero_page_enabled()?

Changed.

> > +{
> > +    return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
> > +}
> > +
> > +static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
> > +{
> > +    ram_addr_t temp;
> > +
> > +    if (a == b) {
> > +        return;
> > +    }
> > +
> > +    temp = pages_offset[a];
> > +    pages_offset[a] = pages_offset[b];
> > +    pages_offset[b] = temp;
> > +}
> > +
> > +/**
> > + * multifd_send_zero_page_check: Perform zero page detection on all pages.
> > + *
> > + * Sorts normal pages before zero pages in p->pages->offset and updates
> > + * p->pages->normal_num.
> > + *
> > + * @param p A pointer to the send params.
> >
> > Nit: the majority of doc style in QEMU (it seems to me) is:
> >
> >   @p: pointer to @MultiFDSendParams.
> >
> > + */
> > +void multifd_send_zero_page_check(MultiFDSendParams *p)
> >
> > multifd_send_zero_page_detect()?
> >
> > This patch used "check" on both sides, but neither of them is a pure check
> > to me.  For the other side, maybe multifd_recv_zero_page_process()?  As
> > that one applies the zero pages.

Renamed.

> > +{
> > +    MultiFDPages_t *pages = p->pages;
> > +    RAMBlock *rb = pages->block;
> > +    int i = 0;
> > +    int j = pages->num - 1;
> > +
> > +    /*
> > +     * QEMU older than 9.0 don't understand zero page
> > +     * on multifd channel. This switch is required to
> > +     * maintain backward compatibility.
> > +     */
> >
> > IMHO we can drop this comment; it is not accurate as the user can disable
> > it explicitly through the parameter, then it may not always about compatibility.

Dropped.

> > +    if (multifd_zero_page()) {
> >
> > Shouldn't this be "!multifd_zero_page_enabled()"?

Thanks for catching this! My bad. Fixed.

> > +        pages->normal_num = pages->num;
> > +        return;
> > +    }
> >
> > The rest looks all sane.
> >
> > Thanks,
> >
> > --
> > Peter Xu
Hao Xiang March 9, 2024, 8:08 a.m. UTC | #9
> On Mon, Mar 4, 2024 at 10:24 AM Fabiano Rosas <farosas@suse.de> wrote:
>
> Fabiano Rosas <farosas@suse.de> writes:
>
> Peter Xu <peterx@redhat.com> writes:
>
> On Fri, Mar 01, 2024 at 02:28:24AM +0000, Hao Xiang wrote:
>
> -GlobalProperty hw_compat_8_2[] = {};
> +GlobalProperty hw_compat_8_2[] = {
> +    { "migration", "zero-page-detection", "legacy"},
> +};
>
> I hope we can make it for 9.0, then this (and many rest places) can be kept
> as-is.  Let's see..  soft-freeze is March 12th.
>
> One thing to mention is I just sent a pull which has mapped-ram feature
> merged.  You may need a rebase onto that, and hopefully mapped-ram can also
> use your feature too within the same patch when you repost.
>
> The key points are:
>
> - The socket migration is under "use_packets", the mapped-ram is under
> "!use_packets" always.
>
> - mapped-ram doesn't trasmit zero-pages, it just clears the
> corresponding bit in block->file_bmap.
>
> https://lore.kernel.org/all/20240229153017.2221-1-farosas@suse.de/
>
> That rebase may or may not need much caution, I apologize for that:
> mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
> it (actually still partly of it) into QEMU 9.0.
>
> I started doing that rebase last week and saw issues with a sender
> thread always getting -EPIPE at the sendmsg() on the regular socket
> migration. Let's hope it was just me being tired.
>
> I'll try to get something ready this week.
>
> @Hao Xiang:
>
> Here's a branch with the rebase. Please include the first two commits
> when you repost:
>
>  migration/multifd: Allow clearing of the file_bmap from multifd
>  migration/multifd: Allow zero pages in file migration
>
> There are also two small additions and some conflict resolution at the
> "Implement zero page..." commit. Make sure you don't miss them.
>
> https://gitlab.com/farosas/qemu/-/commits/migration-multifd-zero-page
>
> Let me know if you encounter any issues.

Sorry about the delay. I have rebased and pulled in the two commits you
mentioned. Test works fine. I just sent out a new version.

I removed the zero/zero-bytes interface changes out of this patchset but
will follow up with a separate one.

Patch

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a..0e9d646b61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,7 +32,9 @@ 
 #include "hw/virtio/virtio-net.h"
 #include "audio/audio.h"
 
-GlobalProperty hw_compat_8_2[] = {};
+GlobalProperty hw_compat_8_2[] = {
+    { "migration", "zero-page-detection", "legacy"},
+};
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
 GlobalProperty hw_compat_8_1[] = {
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 228e685f52..6e6f68ae1b 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -682,7 +682,7 @@  const PropertyInfo qdev_prop_mig_mode = {
 const PropertyInfo qdev_prop_zero_page_detection = {
     .name = "ZeroPageDetection",
     .description = "zero_page_detection values, "
-                   "none,legacy",
+                   "none,legacy,multifd",
     .enum_table = &ZeroPageDetection_lookup,
     .get = qdev_propinfo_get_enum,
     .set = qdev_propinfo_set_enum,
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..1eeb915ff6 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -22,6 +22,7 @@  system_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zero-page.c',
   'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 0000000000..e695f6ff7d
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,92 @@ 
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang <hao.xiang@bytedance.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "exec/ramblock.h"
+#include "migration.h"
+#include "multifd.h"
+#include "options.h"
+#include "ram.h"
+
+static bool multifd_zero_page(void)
+{
+    return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
+}
+
+static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
+{
+    ram_addr_t temp;
+
+    if (a == b) {
+        return;
+    }
+
+    temp = pages_offset[a];
+    pages_offset[a] = pages_offset[b];
+    pages_offset[b] = temp;
+}
+
+/**
+ * multifd_send_zero_page_check: Perform zero page detection on all pages.
+ *
+ * Sorts normal pages before zero pages in p->pages->offset and updates
+ * p->pages->normal_num.
+ *
+ * @param p A pointer to the send params.
+ */
+void multifd_send_zero_page_check(MultiFDSendParams *p)
+{
+    MultiFDPages_t *pages = p->pages;
+    RAMBlock *rb = pages->block;
+    int i = 0;
+    int j = pages->num - 1;
+
+    /*
+     * QEMU older than 9.0 don't understand zero page
+     * on multifd channel. This switch is required to
+     * maintain backward compatibility.
+     */
+    if (multifd_zero_page()) {
+        pages->normal_num = pages->num;
+        return;
+    }
+
+    /*
+     * Sort the page offset array by moving all normal pages to
+     * the left and all zero pages to the right of the array.
+     */
+    while (i <= j) {
+        uint64_t offset = pages->offset[i];
+
+        if (!buffer_is_zero(rb->host + offset, p->page_size)) {
+            i++;
+            continue;
+        }
+
+        swap_page_offset(pages->offset, i, j);
+        ram_release_page(rb->idstr, offset);
+        j--;
+    }
+
+    pages->normal_num = i;
+}
+
+void multifd_recv_zero_page_check(MultiFDRecvParams *p)
+{
+    for (int i = 0; i < p->zero_num; i++) {
+        void *page = p->host + p->zero[i];
+        if (!buffer_is_zero(page, p->page_size)) {
+            memset(page, 0, p->page_size);
+        }
+    }
+}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..f749e703a9 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,13 +123,15 @@  static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
-    multifd_send_prepare_header(p);
+    if (!multifd_send_prepare_common(p)) {
+        goto out;
+    }
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         uint32_t available = z->zbuff_len - out_size;
         int flush = Z_NO_FLUSH;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
@@ -172,10 +174,10 @@  static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = out_size;
     p->iovs_num++;
     p->next_packet_size = out_size;
-    p->flags |= MULTIFD_FLAG_ZLIB;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZLIB;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -261,6 +263,14 @@  static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZLIB);
         return -1;
     }
+
+    multifd_recv_zero_page_check(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
@@ -310,6 +320,7 @@  static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, out_size, expected_size);
         return -1;
     }
+
     return 0;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..80635e19ef 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -118,16 +118,18 @@  static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
-    multifd_send_prepare_header(p);
+    if (!multifd_send_prepare_common(p)) {
+        goto out;
+    }
 
     z->out.dst = z->zbuff;
     z->out.size = z->zbuff_len;
     z->out.pos = 0;
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         ZSTD_EndDirective flush = ZSTD_e_continue;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = ZSTD_e_flush;
         }
         z->in.src = p->pages->block->host + pages->offset[i];
@@ -161,10 +163,10 @@  static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = z->out.pos;
     p->iovs_num++;
     p->next_packet_size = z->out.pos;
-    p->flags |= MULTIFD_FLAG_ZSTD;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZSTD;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -257,6 +259,14 @@  static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZSTD);
         return -1;
     }
+
+    multifd_recv_zero_page_check(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 6c07f19af1..a95c763df8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -11,6 +11,7 @@ 
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
@@ -139,6 +140,8 @@  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     MultiFDPages_t *pages = p->pages;
     int ret;
 
+    multifd_send_zero_page_check(p);
+
     if (!use_zero_copy_send) {
         /*
          * Only !zerocopy needs the header in IOV; zerocopy will
@@ -147,13 +150,13 @@  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
         multifd_send_prepare_header(p);
     }
 
-    for (int i = 0; i < pages->num; i++) {
+    for (int i = 0; i < pages->normal_num; i++) {
         p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
         p->iov[p->iovs_num].iov_len = p->page_size;
         p->iovs_num++;
     }
 
-    p->next_packet_size = pages->num * p->page_size;
+    p->next_packet_size = pages->normal_num * p->page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
 
     multifd_send_fill_packet(p);
@@ -215,6 +218,13 @@  static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
+
+    multifd_recv_zero_page_check(p);
+
+    if (!p->normal_num) {
+        return 0;
+    }
+
     for (int i = 0; i < p->normal_num; i++) {
         p->iov[i].iov_base = p->host + p->normal[i];
         p->iov[i].iov_len = p->page_size;
@@ -249,6 +259,7 @@  static void multifd_pages_reset(MultiFDPages_t *pages)
      * overwritten later when reused.
      */
     pages->num = 0;
+    pages->normal_num = 0;
     pages->block = NULL;
 }
 
@@ -340,11 +351,13 @@  void multifd_send_fill_packet(MultiFDSendParams *p)
     MultiFDPacket_t *packet = p->packet;
     MultiFDPages_t *pages = p->pages;
     uint64_t packet_num;
+    uint32_t zero_num = pages->num - pages->normal_num;
     int i;
 
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-    packet->normal_pages = cpu_to_be32(pages->num);
+    packet->normal_pages = cpu_to_be32(pages->normal_num);
+    packet->zero_pages = cpu_to_be32(zero_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 
     packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
@@ -362,10 +375,11 @@  void multifd_send_fill_packet(MultiFDSendParams *p)
     }
 
     p->packets_sent++;
-    p->total_normal_pages += pages->num;
+    p->total_normal_pages += pages->normal_num;
+    p->total_zero_pages += zero_num;
 
-    trace_multifd_send(p->id, packet_num, pages->num, p->flags,
-                       p->next_packet_size);
+    trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
+                       p->flags, p->next_packet_size);
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -406,20 +420,29 @@  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->normal_num = be32_to_cpu(packet->normal_pages);
     if (p->normal_num > packet->pages_alloc) {
         error_setg(errp, "multifd: received packet "
-                   "with %u pages and expected maximum pages are %u",
+                   "with %u normal pages and expected maximum pages are %u",
                    p->normal_num, packet->pages_alloc) ;
         return -1;
     }
 
+    p->zero_num = be32_to_cpu(packet->zero_pages);
+    if (p->zero_num > packet->pages_alloc - p->normal_num) {
+        error_setg(errp, "multifd: received packet "
+                   "with %u zero pages and expected maximum zero pages are %u",
+                   p->zero_num, packet->pages_alloc - p->normal_num) ;
+        return -1;
+    }
+
     p->next_packet_size = be32_to_cpu(packet->next_packet_size);
     p->packet_num = be64_to_cpu(packet->packet_num);
     p->packets_recved++;
     p->total_normal_pages += p->normal_num;
+    p->total_zero_pages += p->zero_num;
 
-    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
-                       p->next_packet_size);
+    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
+                       p->flags, p->next_packet_size);
 
-    if (p->normal_num == 0) {
+    if (p->normal_num == 0 && p->zero_num == 0) {
         return 0;
     }
 
@@ -445,6 +468,18 @@  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->normal[i] = offset;
     }
 
+    for (i = 0; i < p->zero_num; i++) {
+        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
+
+        if (offset > (p->block->used_length - p->page_size)) {
+            error_setg(errp, "multifd: offset too long %" PRIu64
+                       " (max " RAM_ADDR_FMT ")",
+                       offset, p->block->used_length);
+            return -1;
+        }
+        p->zero[i] = offset;
+    }
+
     return 0;
 }
 
@@ -837,6 +872,8 @@  static void *multifd_send_thread(void *opaque)
 
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
+            stat64_add(&mig_stats.normal_pages, pages->normal_num);
+            stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 
             multifd_pages_reset(p->pages);
             p->next_packet_size = 0;
@@ -880,7 +917,8 @@  out:
 
     rcu_unregister_thread();
     migration_threads_remove(thread);
-    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages);
+    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -1147,6 +1185,8 @@  static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
     p->iov = NULL;
     g_free(p->normal);
     p->normal = NULL;
+    g_free(p->zero);
+    p->zero = NULL;
     multifd_recv_state->ops->recv_cleanup(p);
 }
 
@@ -1241,7 +1281,7 @@  static void *multifd_recv_thread(void *opaque)
         p->flags &= ~MULTIFD_FLAG_SYNC;
         qemu_mutex_unlock(&p->mutex);
 
-        if (p->normal_num) {
+        if (p->normal_num + p->zero_num) {
             ret = multifd_recv_state->ops->recv_pages(p, &local_err);
             if (ret != 0) {
                 break;
@@ -1260,7 +1300,9 @@  static void *multifd_recv_thread(void *opaque)
     }
 
     rcu_unregister_thread();
-    trace_multifd_recv_thread_end(p->id, p->packets_recved, p->total_normal_pages);
+    trace_multifd_recv_thread_end(p->id, p->packets_recved,
+                                  p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -1299,6 +1341,7 @@  int multifd_recv_setup(Error **errp)
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->zero = g_new0(ram_addr_t, page_count);
         p->page_count = page_count;
         p->page_size = qemu_target_page_size();
     }
@@ -1368,3 +1411,17 @@  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
                        QEMU_THREAD_JOINABLE);
     qatomic_inc(&multifd_recv_state->count);
 }
+
+bool multifd_send_prepare_common(MultiFDSendParams *p)
+{
+    multifd_send_zero_page_check(p);
+
+    if (!p->pages->normal_num) {
+        p->next_packet_size = 0;
+        return false;
+    }
+
+    multifd_send_prepare_header(p);
+
+    return true;
+}
diff --git a/migration/multifd.h b/migration/multifd.h
index b3fe27ae93..4b59d5569c 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -48,14 +48,24 @@  typedef struct {
     /* size of the next packet that contains pages */
     uint32_t next_packet_size;
     uint64_t packet_num;
-    uint64_t unused[4];    /* Reserved for future use */
+    /* zero pages */
+    uint32_t zero_pages;
+    uint32_t unused32[1];    /* Reserved for future use */
+    uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
+    /*
+     * This array contains the pointers to:
+     *  - normal pages (initial normal_pages entries)
+     *  - zero pages (following zero_pages entries)
+     */
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
 typedef struct {
     /* number of used pages */
     uint32_t num;
+    /* number of normal pages */
+    uint32_t normal_num;
     /* number of allocated pages */
     uint32_t allocated;
     /* offset of each page */
@@ -122,6 +132,8 @@  typedef struct {
     uint64_t packets_sent;
     /* non zero pages sent through this channel */
     uint64_t total_normal_pages;
+    /* zero pages sent through this channel */
+    uint64_t total_zero_pages;
     /* buffers to send */
     struct iovec *iov;
     /* number of iovs used */
@@ -176,12 +188,18 @@  typedef struct {
     uint8_t *host;
     /* non zero pages recv through this channel */
     uint64_t total_normal_pages;
+    /* zero pages recv through this channel */
+    uint64_t total_zero_pages;
     /* buffers to recv */
     struct iovec *iov;
     /* Pages that are not zero */
     ram_addr_t *normal;
     /* num of non zero pages */
     uint32_t normal_num;
+    /* Pages that are zero */
+    ram_addr_t *zero;
+    /* num of zero pages */
+    uint32_t zero_num;
     /* used for de-compression methods */
     void *data;
 } MultiFDRecvParams;
@@ -203,6 +221,9 @@  typedef struct {
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
 void multifd_send_fill_packet(MultiFDSendParams *p);
+bool multifd_send_prepare_common(MultiFDSendParams *p);
+void multifd_send_zero_page_check(MultiFDSendParams *p);
+void multifd_recv_zero_page_check(MultiFDRecvParams *p);
 
 static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 {
@@ -211,5 +232,4 @@  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
     p->iovs_num++;
 }
 
-
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 2581ee1e04..e1fa229acf 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1258,7 +1258,6 @@  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
     if (!multifd_queue_page(block, offset)) {
         return -1;
     }
-    stat64_add(&mig_stats.normal_pages, 1);
 
     return 1;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 298ad2b0dd..9f1d7ae71a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -128,21 +128,21 @@  postcopy_preempt_reset_channel(void) ""
 # multifd.c
 multifd_new_send_channel_async(uint8_t id) "channel %u"
 multifd_new_send_channel_async_error(uint8_t id, void *err) "channel=%u err=%p"
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_recv_new_channel(uint8_t id) "channel %u"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %u"
 multifd_recv_sync_main_wait(uint8_t id) "channel %u"
 multifd_recv_terminate_threads(bool error) "error %d"
-multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
+multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " zero pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
-multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
+multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal_pages, uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(void) ""
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
+multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64 " zero pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
 multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
diff --git a/qapi/migration.json b/qapi/migration.json
index 8da05dba47..846d0411d5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -671,10 +671,15 @@ 
 #
 # @legacy: Perform zero page checking in main migration thread.
 #
+# @multifd: Perform zero page checking in multifd sender thread.
+#     This option only takes effect if migration capability multifd
+#     is set.  Otherwise, it will have the same effect as legacy.
+#
 # Since: 9.0
+#
 ##
 { 'enum': 'ZeroPageDetection',
-  'data': [ 'none', 'legacy' ] }
+  'data': [ 'none', 'legacy', 'multifd' ] }
 
 ##
 # @BitmapMigrationBitmapAliasTransform: