
[v4,3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)

Message ID 20211009075612.230283-4-leobras@redhat.com (mailing list archive)
State New, archived
Series MSG_ZEROCOPY for multifd

Commit Message

Leonardo Bras Oct. 9, 2021, 7:56 a.m. UTC
Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
zerocopy interface.

Change multifd_send_sync_main() so it can distinguish the last sync from
the setup and per-iteration ones, so a flush_zerocopy() can be called
at the last sync in order to make sure all RAM is sent before finishing
the migration.

Also make it return -1 if flush_zerocopy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually freed,
and there is no problem on changing the pages content between async_send() and
the actual sending of the buffer, because this change will dirty the page and
cause it to be re-sent on a next iteration anyway.

Given a lot of locked memory may be needed in order to use multifd migration
with zerocopy enabled, make it optional by creating a new parameter
multifd-zerocopy on qapi, so low-privileged users can still perform multifd
migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 qapi/migration.json   | 18 ++++++++++++++++++
 migration/migration.h |  1 +
 migration/multifd.h   |  2 +-
 migration/migration.c | 20 ++++++++++++++++++++
 migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
 migration/ram.c       | 20 +++++++++++++-------
 monitor/hmp-cmds.c    |  4 ++++
 7 files changed, 85 insertions(+), 13 deletions(-)

Comments

Eric Blake Oct. 11, 2021, 7:31 p.m. UTC | #1
On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
> 
> Change multifd_send_sync_main() so it can distinguish the last sync from
> the setup and per-iteration ones, so a flush_zerocopy() can be called
> at the last sync in order to make sure all RAM is sent before finishing
> the migration.
> 
> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
> 
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
> 
> Given a lot of locked memory may be needed in order to use multifd migration
> with zerocopy enabled, make it optional by creating a new parameter
> multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> migrations.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 18 ++++++++++++++++++
>  migration/migration.h |  1 +
>  migration/multifd.h   |  2 +-
>  migration/migration.c | 20 ++++++++++++++++++++
>  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
>  migration/ram.c       | 20 +++++++++++++-------
>  monitor/hmp-cmds.c    |  4 ++++
>  7 files changed, 85 insertions(+), 13 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd..c4890cbb54 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -724,6 +724,11 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> +#                    When true, enables a zerocopy mechanism for sending memory
> +#                    pages, if host does support it.

s/does support/supports/ (several instances this patch)

> +#                    Defaults to false. (Since 6.2)
> +#
>  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>  #                        aliases for the purpose of dirty bitmap migration.  Such
>  #                        aliases may for example be the corresponding names on the
> @@ -758,6 +763,7 @@
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>             'max-cpu-throttle', 'multifd-compression',
>             'multifd-zlib-level' ,'multifd-zstd-level',
> +	   'multifd-zerocopy',
>             'block-bitmap-mapping' ] }

Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
the only platform where you even compile the code (even then, it can
still fail if the kernel doesn't support it).
Leonardo Bras Oct. 11, 2021, 7:56 p.m. UTC | #2
Hello Eric,

On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multifd migration
> > with zerocopy enabled, make it optional by creating a new parameter
> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> > migrations.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  qapi/migration.json   | 18 ++++++++++++++++++
> >  migration/migration.h |  1 +
> >  migration/multifd.h   |  2 +-
> >  migration/migration.c | 20 ++++++++++++++++++++
> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
> >  migration/ram.c       | 20 +++++++++++++-------
> >  monitor/hmp-cmds.c    |  4 ++++
> >  7 files changed, 85 insertions(+), 13 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 88f07baedd..c4890cbb54 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -724,6 +724,11 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > +#                    When true, enables a zerocopy mechanism for sending memory
> > +#                    pages, if host does support it.
>
> s/does support/supports/ (several instances this patch)

I will make sure to fix that in v5.

>
> > +#                    Defaults to false. (Since 6.2)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >  #                        aliases may for example be the corresponding names on the
> > @@ -758,6 +763,7 @@
> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >             'max-cpu-throttle', 'multifd-compression',
> >             'multifd-zlib-level' ,'multifd-zstd-level',
> > +        'multifd-zerocopy',
> >             'block-bitmap-mapping' ] }
>
> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
> the only platform where you even compile the code (even then, it can
> still fail if the kernel doesn't support it).

I think it makes sense for the feature to always be available, even
though it's not supported
outside linux > v4.14.

IMHO it makes more sense for the user to get an error when it starts
migration, due to host
not supporting zerocopy, than the error happening in the config step,
which may cause the user
to question if it was the right parameter.

The config message error here could also be ignored, and users can
think zerocopy is working, while it's not.

For automated migrations, this feature should never be enabled  for
non-linux / older linux hosts anyway.

Is there a good argument I am missing for this feature being disabled
on non-linux?

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Best regards,
Leo
Markus Armbruster Oct. 12, 2021, 5:53 a.m. UTC | #3
Leonardo Bras Soares Passos <leobras@redhat.com> writes:

> Hello Eric,
>
> On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
>>
>> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
>> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
>> > zerocopy interface.
>> >
>> > Change multifd_send_sync_main() so it can distinguish the last sync from
>> > the setup and per-iteration ones, so a flush_zerocopy() can be called
>> > at the last sync in order to make sure all RAM is sent before finishing
>> > the migration.
>> >
>> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
>> > the migration process, and avoid resuming the guest in the target host
>> > without receiving all current RAM.
>> >
>> > This will work fine on RAM migration because the RAM pages are not usually freed,
>> > and there is no problem on changing the pages content between async_send() and
>> > the actual sending of the buffer, because this change will dirty the page and
>> > cause it to be re-sent on a next iteration anyway.
>> >
>> > Given a lot of locked memory may be needed in order to use multifd migration
>> > with zerocopy enabled, make it optional by creating a new parameter
>> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
>> > migrations.
>> >
>> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> > ---
>> >  qapi/migration.json   | 18 ++++++++++++++++++
>> >  migration/migration.h |  1 +
>> >  migration/multifd.h   |  2 +-
>> >  migration/migration.c | 20 ++++++++++++++++++++
>> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
>> >  migration/ram.c       | 20 +++++++++++++-------
>> >  monitor/hmp-cmds.c    |  4 ++++
>> >  7 files changed, 85 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index 88f07baedd..c4890cbb54 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -724,6 +724,11 @@
>> >  #                      will consume more CPU.
>> >  #                      Defaults to 1. (Since 5.0)
>> >  #
>> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
>> > +#                    When true, enables a zerocopy mechanism for sending memory
>> > +#                    pages, if host does support it.
>>
>> s/does support/supports/ (several instances this patch)
>
> I will make sure to fix that in v5.
>
>>
>> > +#                    Defaults to false. (Since 6.2)
>> > +#
>> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>> >  #                        aliases for the purpose of dirty bitmap migration.  Such
>> >  #                        aliases may for example be the corresponding names on the
>> > @@ -758,6 +763,7 @@
>> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>> >             'max-cpu-throttle', 'multifd-compression',
>> >             'multifd-zlib-level' ,'multifd-zstd-level',
>> > +        'multifd-zerocopy',
>> >             'block-bitmap-mapping' ] }
>>
>> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
>> the only platform where you even compile the code (even then, it can
>> still fail if the kernel doesn't support it).
>
> I think it makes sense for the feature to always be available, even
> though it's not supported
> outside linux > v4.14.
>
> IMHO it makes more sense for the user to get an error when it starts
> migration, due to host
> not supporting zerocopy, than the error happening in the config step,
> which may cause the user
> to question if it was the right parameter.
>
> The config message error here could also be ignored, and users can
> think zerocopy is working, while it's not.
>
> For automated migrations, this feature should never be enabled  for
> non-linux / older linux hosts anyway.
>
> Is there a good argument I am missing for this feature being disabled
> on non-linux?

The general argument for having QAPI schema 'if' mirror the C
implementation's #if is introspection.  Let me explain why that matters.

Consider a management application that supports a range of QEMU
versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
new in QEMU 6.2.  The sane way to do that is to probe for the command
with query-qmp-schema.  Same for command arguments, and anything else
QMP.

If you doubt "sane", check out Part II of "QEMU interface introspection:
From hacks to solutions"[*].

The same technique works when a QMP command / argument / whatever is
compile-time conditional ('if' in the schema).  The code the management
application needs anyway to deal with older QEMU now also deals with
"compiled out".  Nice.

Of course, a command or argument present in QEMU can still fail, and the
management application still needs to handle failure.  Distinguishing
different failure modes can be bothersome and/or fragile.

By making the QAPI schema conditional mirror the C conditional, you
squash the failure mode "this version of QEMU supports it, but this
build of QEMU does not" into "this version of QEMU does not support
it".  Makes sense, doesn't it?

A minor additional advantage is less generated code.
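
For instance, with the 'if' in place, the generated MigrationParameters
struct roughly becomes (a simplified sketch, not the literal generator
output):

struct MigrationParameters {
    /* ... unconditional members ... */
#if defined(CONFIG_LINUX)
    bool has_multifd_zerocopy;
    bool multifd_zerocopy;
#endif /* CONFIG_LINUX */
    /* ... */
};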



[*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
Peter Xu Oct. 13, 2021, 6:23 a.m. UTC | #4
On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd..c4890cbb54 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -724,6 +724,11 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> +#                    When true, enables a zerocopy mechanism for sending memory
> +#                    pages, if host does support it.
> +#                    Defaults to false. (Since 6.2)
> +#

Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
but it's fine to just fail the configuration if multifd is not set. The thing
is we don't know whether that dependency will last forever, and we probably
don't want to introduce yet another feature bit once we can remove the
dependency, as we can't remove the old one and stay compatible.
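
Something like the following rough sketch, say, as another check in
migrate_params_check() (the check site and parameter name are
illustrative, not from the patch):

/* Fail the configuration when zerocopy is requested without multifd,
 * so the dependency can be relaxed later without a new feature bit. */
if (params->has_zerocopy && params->zerocopy &&
    !migrate_use_multifd()) {
    error_setg(errp, "zerocopy currently requires multifd");
    return false;
}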
Peter Xu Oct. 13, 2021, 6:26 a.m. UTC | #5
On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
>   */
>  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> +    int flags = 0;
> +
> +    if (migrate_multifd_zerocopy()) {
> +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }
> +
> +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
>  }

What if the user specified ZEROCOPY+MULTIFD, but the kernel doesn't support it?

IIUC then the first call to qio_channel_writev_all_flags() above will fail,
then we fail the migration.

It seems fine, but since you've introduced QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY
in the previous patch - I think the cleaner way is: when starting migration,
after we set up the ioc, we sanity check the capability against the ioc, so
that if ZEROCOPY+MULTIFD is specified we fail early when the ioc doesn't have
the QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY bit set?
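
Something like this minimal sketch, reusing qio_channel_has_feature() and
the feature bit from the previous patch (where exactly it runs is
hypothetical):

/* Early sanity check, e.g. while setting up each multifd channel */
if (migrate_multifd_zerocopy() &&
    !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
    error_setg(errp, "zerocopy requested but not supported on this channel");
    return false;
}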
Leonardo Bras Oct. 27, 2021, 6:47 a.m. UTC | #6
On Wed, Oct 13, 2021 at 3:24 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 88f07baedd..c4890cbb54 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -724,6 +724,11 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > +#                    When true, enables a zerocopy mechanism for sending memory
> > +#                    pages, if host does support it.
> > +#                    Defaults to false. (Since 6.2)
> > +#
>
> Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
> but it's fine to just fail the configuration if multifd is not set. The thing
> is we don't know whether that dependency will last forever, and we probably
> don't want to introduce yet another feature bit once we can remove the
> dependency, as we can't remove the old one and stay compatible.

It makes sense not wanting to create a new feature bit in the future,
but if we just add "@zerocopy", wouldn't we need to fail every
migration setup that doesn't support zerocopy?

(Thinking back, even to keep it as is, it would also be necessary that
I find a way to fail other multifd setups that don't support zerocopy,
for v5)

>
> --
> Peter Xu
>
Leonardo Bras Oct. 27, 2021, 6:50 a.m. UTC | #7
On Wed, Oct 13, 2021 at 3:26 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
> >   */
> >  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> >  {
> > -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> > +    int flags = 0;
> > +
> > +    if (migrate_multifd_zerocopy()) {
> > +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
> > +
> > +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
> >  }
>
> What if the user specified ZEROCOPY+MULTIFD, but the kernel doesn't support it?
>
> IIUC then the first call to qio_channel_writev_all_flags() above will fail,
> then we fail the migration.
>
> It seems fine, but since you've introduced QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY
> in the previous patch - I think the cleaner way is: when starting migration,
> after we set up the ioc, we sanity check the capability against the ioc, so
> that if ZEROCOPY+MULTIFD is specified we fail early when the ioc doesn't have
> the QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY bit set?
>
> --
> Peter Xu
>

Failing earlier is always a good idea.
I will try to implement that.

Thanks Peter!

Best regards,
Leo
Peter Xu Oct. 27, 2021, 7:06 a.m. UTC | #8
On Wed, Oct 27, 2021 at 03:47:18AM -0300, Leonardo Bras Soares Passos wrote:
> On Wed, Oct 13, 2021 at 3:24 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88f07baedd..c4890cbb54 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -724,6 +724,11 @@
> > >  #                      will consume more CPU.
> > >  #                      Defaults to 1. (Since 5.0)
> > >  #
> > > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > > +#                    When true, enables a zerocopy mechanism for sending memory
> > > +#                    pages, if host does support it.
> > > +#                    Defaults to false. (Since 6.2)
> > > +#
> >
> > Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
> > but it's fine to just fail the configuration if multifd is not set. The thing
> > is we don't know whether that dependency will last forever, and we probably
> > don't want to introduce yet another feature bit once we can remove the
> > dependency, as we can't remove the old one and stay compatible.
> 
> It makes sense not wanting to create a new feature bit in the future,
> but if we just add "@zerocopy", wouldn't we need to fail every
> migration setup that doesn't support zerocopy?
> 
> (Thinking back, even to keep it as is, it would also be necessary that
> I find a way to fail other multifd setups that don't support zerocopy,
> for v5)

Yes I think so; imho we can fail either when applying the bit, or it's okay too
to fail at the start of migration.  Thanks,
Leonardo Bras Oct. 28, 2021, 1:56 a.m. UTC | #9
Hello Markus!
(comments at the bottom)

On Tue, Oct 12, 2021 at 2:54 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> writes:
>
> > Hello Eric,
> >
> > On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
> >>
> >> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> >> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> >> > zerocopy interface.
> >> >
> >> > Change multifd_send_sync_main() so it can distinguish the last sync from
> >> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> >> > at the last sync in order to make sure all RAM is sent before finishing
> >> > the migration.
> >> >
> >> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> >> > the migration process, and avoid resuming the guest in the target host
> >> > without receiving all current RAM.
> >> >
> >> > This will work fine on RAM migration because the RAM pages are not usually freed,
> >> > and there is no problem on changing the pages content between async_send() and
> >> > the actual sending of the buffer, because this change will dirty the page and
> >> > cause it to be re-sent on a next iteration anyway.
> >> >
> >> > Given a lot of locked memory may be needed in order to use multifd migration
> >> > with zerocopy enabled, make it optional by creating a new parameter
> >> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> >> > migrations.
> >> >
> >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >> > ---
> >> >  qapi/migration.json   | 18 ++++++++++++++++++
> >> >  migration/migration.h |  1 +
> >> >  migration/multifd.h   |  2 +-
> >> >  migration/migration.c | 20 ++++++++++++++++++++
> >> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
> >> >  migration/ram.c       | 20 +++++++++++++-------
> >> >  monitor/hmp-cmds.c    |  4 ++++
> >> >  7 files changed, 85 insertions(+), 13 deletions(-)
> >> >
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index 88f07baedd..c4890cbb54 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -724,6 +724,11 @@
> >> >  #                      will consume more CPU.
> >> >  #                      Defaults to 1. (Since 5.0)
> >> >  #
> >> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> >> > +#                    When true, enables a zerocopy mechanism for sending memory
> >> > +#                    pages, if host does support it.
> >>
> >> s/does support/supports/ (several instances this patch)
> >
> > I will make sure to fix that in v5.
> >
> >>
> >> > +#                    Defaults to false. (Since 6.2)
> >> > +#
> >> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >> >  #                        aliases may for example be the corresponding names on the
> >> > @@ -758,6 +763,7 @@
> >> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >> >             'max-cpu-throttle', 'multifd-compression',
> >> >             'multifd-zlib-level' ,'multifd-zstd-level',
> >> > +        'multifd-zerocopy',
> >> >             'block-bitmap-mapping' ] }
> >>
> >> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
> >> the only platform where you even compile the code (even then, it can
> >> still fail if the kernel doesn't support it).
> >
> > I think it makes sense for the feature to always be available, even
> > though it's not supported
> > outside linux > v4.14.
> >
> > IMHO it makes more sense for the user to get an error when it starts
> > migration, due to host
> > not supporting zerocopy, than the error happening in the config step,
> > which may cause the user
> > to question if it was the right parameter.
> >
> > The config message error here could also be ignored, and users can
> > think zerocopy is working, while it's not.
> >
> > For automated migrations, this feature should never be enabled  for
> > non-linux / older linux hosts anyway.
> >
> > Is there a good argument I am missing for this feature being disabled
> > on non-linux?
>
> The general argument for having QAPI schema 'if' mirror the C
> implementation's #if is introspection.  Let me explain why that matters.
>
> Consider a management application that supports a range of QEMU
> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
> new in QEMU 6.2.  The sane way to do that is to probe for the command
> with query-qmp-schema.  Same for command arguments, and anything else
> QMP.
>
> If you doubt "sane", check out Part II of "QEMU interface introspection:
> From hacks to solutions"[*].
>
> The same technique works when a QMP command / argument / whatever is
> compile-time conditional ('if' in the schema).  The code the management
> application needs anyway to deal with older QEMU now also deals with
> "compiled out".  Nice.
>
> Of course, a command or argument present in QEMU can still fail, and the
> management application still needs to handle failure.  Distinguishing
> different failure modes can be bothersome and/or fragile.
>
> By making the QAPI schema conditional mirror the C conditional, you
> squash the failure mode "this version of QEMU supports it, but this
> build of QEMU does not" into "this version of QEMU does not support
> it".  Makes sense, doesn't it?
>
> A minor additional advantage is less generated code.
>
>
>
> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>

This was very informative, thanks!
I now understand the rationale about this choice.

TBH I am not very used to this syntax.
I took a peek at some other json files, and ended up adding these
lines to the code, which compiled just fine:

for : enum MigrationParameter
{'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},

for : struct MigrateSetParameters and struct MigrationParameters:
'*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },

Is that enough? Is there any other necessary change?

Thanks for reviewing and for helping out with this!

Best regards,
Leonardo Bras
Markus Armbruster Oct. 28, 2021, 4:30 a.m. UTC | #10
Leonardo Bras Soares Passos <leobras@redhat.com> writes:

[...]

>> The general argument for having QAPI schema 'if' mirror the C
>> implementation's #if is introspection.  Let me explain why that matters.
>>
>> Consider a management application that supports a range of QEMU
>> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
>> new in QEMU 6.2.  The sane way to do that is to probe for the command
>> with query-qmp-schema.  Same for command arguments, and anything else
>> QMP.
>>
>> If you doubt "sane", check out Part II of "QEMU interface introspection:
>> From hacks to solutions"[*].
>>
>> The same technique works when a QMP command / argument / whatever is
>> compile-time conditional ('if' in the schema).  The code the management
>> application needs anyway to deal with older QEMU now also deals with
>> "compiled out".  Nice.
>>
>> Of course, a command or argument present in QEMU can still fail, and the
>> management application still needs to handle failure.  Distinguishing
>> different failure modes can be bothersome and/or fragile.
>>
>> By making the QAPI schema conditional mirror the C conditional, you
>> squash the failure mode "this version of QEMU supports it, but this
>> build of QEMU does not" into "this version of QEMU does not support
>> it".  Makes sense, doesn't it?
>>
>> A minor additional advantage is less generated code.
>>
>>
>>
>> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>>
>
> This was very informative, thanks!
> I now understand the rationale about this choice.
>
> TBH I am not very used to this syntax.
> I took a peek at some other json files, and ended up adding these
> lines to the code, which compiled just fine:
>
> for : enum MigrationParameter
> {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
>
> for : struct MigrateSetParameters and struct MigrationParameters:
> '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
>
> Is that enough? Is there any other necessary change?

Looks good to me.

The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

If you're curious, you can diff code generated into qapi/ before and
after adding the 'if'.

> Thanks for reviewing and for helping out with this!

My pleasure!
Leonardo Bras Oct. 28, 2021, 4:37 a.m. UTC | #11
On Thu, Oct 28, 2021 at 1:30 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> writes:
>
> [...]
>
> >> The general argument for having QAPI schema 'if' mirror the C
> >> implementation's #if is introspection.  Let me explain why that matters.
> >>
> >> Consider a management application that supports a range of QEMU
> >> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
> >> new in QEMU 6.2.  The sane way to do that is to probe for the command
> >> with query-qmp-schema.  Same for command arguments, and anything else
> >> QMP.
> >>
> >> If you doubt "sane", check out Part II of "QEMU interface introspection:
> >> From hacks to solutions"[*].
> >>
> >> The same technique works when a QMP command / argument / whatever is
> >> compile-time conditional ('if' in the schema).  The code the management
> >> application needs anyway to deal with older QEMU now also deals with
> >> "compiled out".  Nice.
> >>
> >> Of course, a command or argument present in QEMU can still fail, and the
> >> management application still needs to handle failure.  Distinguishing
> >> different failure modes can be bothersome and/or fragile.
> >>
> >> By making the QAPI schema conditional mirror the C conditional, you
> >> squash the failure mode "this version of QEMU supports it, but this
> >> build of QEMU does not" into "this version of QEMU does not support
> >> it".  Makes sense, doesn't it?
> >>
> >> A minor additional advantage is less generated code.
> >>
> >>
> >>
> >> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
> >>
> >
> > This was very informative, thanks!
> > I now understand the rationale about this choice.
> >
> > TBH I am not very used to this syntax.
> > I took a peek at some other json files, and ended up adding these
> > lines to the code, which compiled just fine:
> >
> > for : enum MigrationParameter
> > {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
> >
> > for : struct MigrateSetParameters and struct MigrationParameters:
> > '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> >
> > Is that enough? Is there any other necessary change?
>
> Looks good to me.
>
> The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

Thanks for reviewing and for pointing this docs!

>
> If you're curious, you can diff code generated into qapi/ before and
> after adding the 'if'.

Good idea!

>
> > Thanks for reviewing and for helping out with this!
>
> My pleasure!
>

:)

Best regards,
Leo
Juan Quintela Nov. 2, 2021, 12:32 p.m. UTC | #12
Leonardo Bras <leobras@redhat.com> wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
>
> Change multifd_send_sync_main() so it can distinguish the last sync from
> the setup and per-iteration ones, so a flush_zerocopy() can be called
> at the last sync in order to make sure all RAM is sent before finishing
> the migration.

You need to do this after each iteration.  Otherwise it can happen that:

channel 1:               channel 2:

   send page 11

next iteration
                         send page 11

                         this page arrives

now arrives this old copy.

After each iteration, one needs to be sure that no ram is inflight.

This means that I think you don't need the last_sync parameter at all,
as you have to do the flush() in every iteration.
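
I.e., roughly this shape, reusing qio_channel_flush_zerocopy() from the
patch (a sketch of the suggestion, with the last_sync parameter dropped):

/* In multifd_send_sync_main(), per channel, on every sync: */
if (migrate_multifd_zerocopy()) {
    Error *err = NULL;

    if (qio_channel_flush_zerocopy(p->c, &err) < 0) {
        error_report_err(err);
        return -1;
    }
}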

> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
>
> Given a lot of locked memory may be needed in order to use multifd migration
> with zerocopy enabled, make it optional by creating a new parameter
> multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> migrations.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I think it is better that you split this patch into two:

- Add the new parameter (it looks good to me, and can be reviewed-by)
- Implement the feature, here probably you need more changes/review


>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> +	    '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

Something weird here.

>              '*multifd-compression': 'MultiFDCompression',
>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> +	    '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>  

Same here.


> @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
>   */
>  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> +    int flags = 0;
> +
> +    if (migrate_multifd_zerocopy()) {
> +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }

You have added an if on each write, just add it during initialization.

There is already a uint32_t flags field in MultiFDRecvParams, but you
can add a

int write_flags;

one and add it during initialization.  That way you don't need any check
here, just pass it.

> +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
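
Something along these lines (a rough sketch; the exact init site is
illustrative):

/* In MultiFDSendParams, next to the existing fields: */
int write_flags;

/* Set once at setup time, e.g. in multifd_save_setup(): */
p->write_flags = migrate_multifd_zerocopy() ?
                 QIO_CHANNEL_WRITE_FLAG_ZEROCOPY : 0;

/* nocomp_send_write() then just passes it through, with no branch: */
return qio_channel_writev_all_flags(p->c, p->pages->iov, used,
                                    p->write_flags, errp);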


> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f, bool last_sync)

As you need to check every round, you now have to check for errors on
every multifd_send_sync_main() call.  It really looks weird that you
only need to check it sometimes.
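
I.e., every call site would need the same treatment, roughly (a
hypothetical adaptation of the ram_save_iterate() hunk):

ret = multifd_send_sync_main(rs->f);
if (ret < 0) {
    return ret;
}
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
qemu_fflush(f);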

> @@ -3006,13 +3006,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>      }
>  
> -    if (ret >= 0) {
> -        multifd_send_sync_main(rs->f);
> -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -        qemu_fflush(f);
> +    if (ret < 0) {
> +        return ret;
>      }
>  
> -    return ret;
> +    ret = multifd_send_sync_main(rs->f, true);
> +    if (ret < 0) {
> +        return -1;

Why are you returning -1 instead of ret?

Look at the callers of ram_save_complete(): we set the QEMUFile error
with that value, so it is not a good idea to reset it.


Later, Juan.
Leonardo Bras Nov. 3, 2021, 9:29 p.m. UTC | #13
Hello Juan,

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.

Current multifd's sendmsg() will block until all data is sent, is that correct?

If that's the case, and supposing the network driver supports
multiqueue, maybe there is a small chance for this to happen.
I will add the flush at the end of each iteration, just to be sure.

>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.
>
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multifd migration
> > with zerocopy enabled, make it optional by creating a new parameter
> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> > migrations.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> I think it is better that you split this patch into two:
>
> - Add the new parameter (it looks good to me, and can be reviewed-by)
> - Implement the feature, here probably you need more changes/review

Sure, I will try to divide the patch like this.

>
>
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > +         '*multifd-zerocopy': 'bool',
> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> Something weird here.
>
> >              '*multifd-compression': 'MultiFDCompression',
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > +         '*multifd-zerocopy': 'bool',
> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >
>
> Same here.

Could you please elaborate?

>
>
> > @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
> >   */
> >  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> >  {
> > -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> > +    int flags = 0;
> > +
> > +    if (migrate_multifd_zerocopy()) {
> > +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> You have added an if on each write, just add it during initialization.



>
> There is already a uint32_t flags field in MultiFDRecvParams, but you
> can add a
>
> int write_flags;
>
> one and add it during initialization.  That way you don't need any check
> here, just pass it.
>
> > +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);

Ok, I will try to make it work with this suggestion.

>
>
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f, bool last_sync)
>
> As you need to check every round, you now have to check for errors on
> every multifd_send_sync_main() call.  It really looked weird that you
> only need to check it sometimes.

Ok, I will work on that.

>
> > @@ -3006,13 +3006,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
> >      }
> >
> > -    if (ret >= 0) {
> > -        multifd_send_sync_main(rs->f);
> > -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > -        qemu_fflush(f);
> > +    if (ret < 0) {
> > +        return ret;
> >      }
> >
> > -    return ret;
> > +    ret = multifd_send_sync_main(rs->f, true);
> > +    if (ret < 0) {
> > +        return -1;
>
> Why are you returning -1 instead of ret?
>
> Callers of ram_save_complete(). We set qemu_error_file() with that
> error, so it is not a good idea to reset it.
>
>

Ok, I will take a look on that.

> Later, Juan.
>

Thanks,
Leo
Juan Quintela Nov. 3, 2021, 11:24 p.m. UTC | #14
Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> Hello Juan,

hi

> Current multifd's sendmsg() will block until all data is sent, is that correct?
>
> If that's the case, and supposing the network driver supports
> multiqueue, maybe there is a small chance for this to happen.
> I will add the flush at the end of each iteration, just to be sure.
>
>>
>> After each iteration, one needs to be sure that no ram is inflight.
>>
>> This means that I think you don't need the last_sync parameter at all,
>> as you have to do the flush() in every iteration.

It means guest memory corruption, so it _needs_ to be there.
That is the whole reason why multifd code has to wait after each
iteration for all channels to finish.  The probability of failure is
low, but it exists, so it needs to be handled correctly.

>> >              '*multifd-zlib-level': 'uint8',
>> >              '*multifd-zstd-level': 'uint8',
>> > +         '*multifd-zerocopy': 'bool',
>> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>>
>> Something weird here.
>>
>> >              '*multifd-compression': 'MultiFDCompression',
>> >              '*multifd-zlib-level': 'uint8',
>> >              '*multifd-zstd-level': 'uint8',
>> > +         '*multifd-zerocopy': 'bool',
>> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>> >
>>
>> Same here.
>
> Could you please elaborate?

Indentation:

+         '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

This is how it is seen here.  space/tab?

Later, Juan.
Leonardo Bras Nov. 4, 2021, 3:43 a.m. UTC | #15
On Wed, Nov 3, 2021 at 8:24 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> > Hello Juan,
>
> hi
>
> > Current multifd's sendmsg() will block until all data is sent, is that correct?
> >
> > If that's the case, and supposing the network driver supports
> > multiqueue, maybe there is a small chance for this to happen.
> > I will add the flush at the end of each iteration, just to be sure.
> >
> >>
> >> After each iteration, one needs to be sure that no ram is inflight.
> >>
> >> This means that I think you don't need the last_sync parameter at all,
> >> as you have to do the flush() in every iteration.
>
> It means guest memory corruption, it _needs_ to be there.
> That is the whole reason why multifd code has to wait after each
> iteration for all channels to finish.  Probability of failure is low,
> but it exist, so it needs to be handled correctly.

Sure, I will add a flush after each iteration.

> >> >              '*multifd-zlib-level': 'uint8',
> >> >              '*multifd-zstd-level': 'uint8',
> >> > +         '*multifd-zerocopy': 'bool',
> >> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >>
> >> Something weird here.
> >>
> >> >              '*multifd-compression': 'MultiFDCompression',
> >> >              '*multifd-zlib-level': 'uint8',
> >> >              '*multifd-zstd-level': 'uint8',
> >> > +         '*multifd-zerocopy': 'bool',
> >> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >> >
> >>
> >> Same here.
> >
> > Could you please elaborate?
>
> Indentation,
>
> +         '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> This is how it is seen here.  space/tab?

It was supposed to be using only spaces.
I will make sure to fix that.

>
> Later, Juan.
>
Thanks!

Best regards,
Leo
Leonardo Bras April 14, 2022, 4 a.m. UTC | #16
Hello Juan,

Sorry to go back this early in the discussion, but I was reviewing for
v9 and I am not sure if I am simply unable to recall the reason, or if
I missed an argument here.
Could you please help me with this?

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.
>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.

The flush command is used to guarantee every packet queued before
flush is actually sent before flush returns.
I mean, flushing every iteration will not help with the situation
above, where the pages are sent in order, but arrive at the target in a
different order.

There is a chance that in the above text you meant 'send page' as
"queue page for sending", and 'page arrives' as "actually send the
queued page".
If that is correct, then syncing every iteration should not be necessary:
- On page queue, Linux saves the page address and size for sending
- On actual send, Linux will send the data currently in the page.

So, in this example, if page 11 from iteration 'i' happens to be
'actually sent' after page 11 from iteration 'i+1', it would not be an
issue:
###
channel 1:               channel 2:
Iteration i

queue page 11 (i)

iteration i+1
                          queue page 11 (i+1)
                          actually send page 11 (i+1)

actually send page 11 (i)
###

That's because page 11 (i) will contain a newer version compared to
page 11 (i+1)

tl;dr:
- The page content always depends on the send time, instead of queue time.
- The iteration count describes the queue time.
(on non-zerocopy it's the opposite: it will depend on queue time,
because it copies the memory content during enqueue)
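
For reference, the kernel semantics described above boil down to this
stripped-down sketch (based on the kernel's MSG_ZEROCOPY documentation,
not QEMU code):

#include <sys/socket.h>

static ssize_t queue_page(int fd, void *page, size_t len)
{
    int one = 1;

    /* SO_ZEROCOPY must be enabled once before MSG_ZEROCOPY is honored. */
    setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

    /* "Queue": the kernel records (and pins) address + length only;
     * the page contents are read later, at actual transmit time. */
    return send(fd, page, len, MSG_ZEROCOPY);
}

/* Completions (page safe to reuse) are reaped separately from the
 * socket error queue, via recvmsg(fd, &msg, MSG_ERRQUEUE). */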

>
[...]

Juan, could you please help me understand if I am missing a part of
your argument up there?
Also, is syncing every iteration still necessary / recommended?

Best regards,
Leo

Patch

diff --git a/qapi/migration.json b/qapi/migration.json
index 88f07baedd..c4890cbb54 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -724,6 +724,11 @@ 
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -758,6 +763,7 @@ 
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+	   'multifd-zerocopy',
            'block-bitmap-mapping' ] }
 
 ##
@@ -884,6 +890,11 @@ 
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -934,6 +945,7 @@ 
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+	    '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1080,6 +1092,11 @@ 
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1128,6 +1145,7 @@ 
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+	    '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
diff --git a/migration/migration.h b/migration/migration.h
index 7a5aa8c2fd..860d83cc41 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -338,6 +338,7 @@  int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
+int migrate_multifd_zerocopy(void);
 
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
diff --git a/migration/multifd.h b/migration/multifd.h
index 8d6751f5ed..8f5c5a6953 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -20,7 +20,7 @@  int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f, bool last_sync);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
diff --git a/migration/migration.c b/migration/migration.c
index 6ac807ef3d..326f7c515f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -879,6 +879,8 @@  MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+    params->has_multifd_zerocopy = true;
+    params->multifd_zerocopy = s->parameters.multifd_zerocopy;
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1523,6 +1525,9 @@  static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+    if (params->has_multifd_zerocopy) {
+        dest->multifd_zerocopy = params->multifd_zerocopy;
+    }
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1635,6 +1640,9 @@  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+    if (params->has_multifd_zerocopy) {
+        s->parameters.multifd_zerocopy = params->multifd_zerocopy;
+    }
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2516,6 +2524,15 @@  int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+int migrate_multifd_zerocopy(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.multifd_zerocopy;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4164,6 +4181,8 @@  static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+    DEFINE_PROP_BOOL("multifd-zerocopy", MigrationState,
+                      parameters.multifd_zerocopy, false),
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4261,6 +4280,7 @@  static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+    params->has_multifd_zerocopy = true;
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index 377da78f5b..17a7d90de3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,7 +105,13 @@  static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    int flags = 0;
+
+    if (migrate_multifd_zerocopy()) {
+        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
+    }
+
+    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
 }
 
 /**
@@ -575,19 +581,23 @@  void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f, bool last_sync)
 {
     int i;
+    bool flush_zerocopy;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->used) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return 0;
         }
     }
+
+    flush_zerocopy = last_sync && migrate_multifd_zerocopy();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -598,7 +608,7 @@  void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return 0;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -609,6 +619,17 @@  void multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zerocopy) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush_zerocopy(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -617,6 +638,8 @@  void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7af..ada57846a5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2839,7 +2839,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    multifd_send_sync_main(f, false);
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -2948,7 +2948,7 @@  static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        multifd_send_sync_main(rs->f, false);
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3006,13 +3006,19 @@  static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f, true);
+    if (ret < 0) {
+        return -1;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..b04f14ec1e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1364,6 +1364,10 @@  void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+    case MIGRATION_PARAMETER_MULTIFD_ZEROCOPY:
+        p->has_multifd_zerocopy = true;
+        visit_type_bool(v, param, &p->multifd_zerocopy, &err);
+        break;
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {