diff mbox series

[v5,6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)

Message ID 20211112051040.923746-7-leobras@redhat.com (mailing list archive)
State New, archived
Headers show
Series MSG_ZEROCOPY + multifd | expand

Commit Message

Leonardo Bras Nov. 12, 2021, 5:10 a.m. UTC
Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
zerocopy interface.

Change multifd_send_sync_main() so it can distinguish each iteration sync from
the setup and the completion, so a flush_zerocopy() can be called
at the after each iteration in order to make sure all dirty pages are sent
before a new iteration is started.

Also make it return -1 if flush_zerocopy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually freed,
and there is no problem on changing the pages content between async_send() and
the actual sending of the buffer, because this change will dirty the page and
cause it to be re-sent on a next iteration anyway.

Given a lot of locked memory may be needed in order to use multid migration
with zerocopy enabled, make it optional by creating a new migration parameter
"zerocopy" on qapi, so low-privileged users can still perform multifd
migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.h |  4 +++-
 migration/multifd.c | 37 ++++++++++++++++++++++++++++++++-----
 migration/ram.c     | 29 ++++++++++++++++++++++-------
 migration/socket.c  |  9 +++++++--
 4 files changed, 64 insertions(+), 15 deletions(-)

Comments

Juan Quintela Nov. 16, 2021, 4:08 p.m. UTC | #1
Leonardo Bras <leobras@redhat.com> wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
>
> Change multifd_send_sync_main() so it can distinguish each iteration sync from
> the setup and the completion, so a flush_zerocopy() can be called
> at the after each iteration in order to make sure all dirty pages are sent
> before a new iteration is started.
>
> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
>
> Given a lot of locked memory may be needed in order to use multid migration
> with zerocopy enabled, make it optional by creating a new migration parameter
> "zerocopy" on qapi, so low-privileged users can still perform multifd
> migrations.

How much memory can a non-root program use by default?


>  static void *multifd_send_thread(void *opaque)
> @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>          goto cleanup;
>      }
>  
> +    if (migrate_use_zerocopy()) {
> +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }

This belongs


>      p->c = QIO_CHANNEL(sioc);
>      qio_channel_set_delay(p->c, false);
>      p->running = true;
> @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
>          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
>          p->name = g_strdup_printf("multifdsend_%d", i);
>          p->tls_hostname = g_strdup(s->hostname);
> +        p->write_flags = 0;

here?

>          socket_send_channel_create(multifd_new_send_channel_async, p);
>      }
> diff --git a/migration/socket.c b/migration/socket.c
> index e26e94aa0c..8e40e0a3fd 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
>          trace_migration_socket_outgoing_connected(data->hostname);
>      }
>  
> -    if (migrate_use_zerocopy()) {
> -        error_setg(&err, "Zerocopy not available in migration");
> +    if (migrate_use_zerocopy() &&
> +        (!migrate_use_multifd() ||
> +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> +          migrate_use_tls())) {
> +        error_setg(&err,
> +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
>      }
>  
>      migration_channel_connect(data->s, sioc, data->hostname, err);

Do we really want to do this check here?  I think this is really too
late.

You are not patching migrate_params_check().

I think that the proper way of doing this is something like:

    if (params->zerocopy &&
        (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
         migrate_use_tls())) {
           error_setg(&err,
                     "Zerocopy only available for non-compressed non-TLS multifd migration");
        return false;
    }

You have to do the equivalent of multifd_compression and tls enablement,
to see that zerocopy is not enabled, of course.

I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
I can't see a way of doing that without a qio.

Later, Juan.
Daniel P. Berrangé Nov. 16, 2021, 4:17 p.m. UTC | #2
On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
> 
> How much memory can a non-root program use by default?
> 
> 
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >  
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
> 
> This belongs
> 
> 
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
> 
> here?
> 
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >  
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >  
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
> 
> Do we really want to do this check here?  I think this is really too
> late.
> 
> You are not patching migrate_params_check().
> 
> I think that the proper way of doing this is something like:
> 
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }
> 
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.
> 
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.

I don't think you need to check that feature flag.

The combination of zerocopy and compression is simply illogical
and can be rejected unconditionally.

The combination of zerocopy and TLS is somewhat questionable.
You're always going to have a copy between the plain text and
cipher text. Avoiding an extra copy for the cipher text will
require kerenl work which has no ETA. If it does ever arise,
QEMU will need more work again to actually support it. So
today we can just reject it unconditonally again.

Regards,
Daniel
Juan Quintela Nov. 16, 2021, 4:34 p.m. UTC | #3
Daniel P. Berrangé <berrange@redhat.com> wrote:

>> 
>>     if (params->zerocopy &&
>>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>>          migrate_use_tls())) {
>>            error_setg(&err,
>>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>>         return false;
>>     }
>> 
>> You have to do the equivalent of multifd_compression and tls enablement,
>> to see that zerocopy is not enabled, of course.
>> 
>> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
>> I can't see a way of doing that without a qio.
>
> I don't think you need to check that feature flag.

Oh, I mean other thing.

When you set "zerocopy" capability, you don't know if the kernel support
it.  My understanding is that the only way to check if it supported is
here.

I agree with the rest of the arguments.

Later, Juan.
Daniel P. Berrangé Nov. 16, 2021, 4:34 p.m. UTC | #4
On Tue, Nov 16, 2021 at 04:17:47PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> > Leonardo Bras <leobras@redhat.com> wrote:
> > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > zerocopy interface.
> > >
> > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > the setup and the completion, so a flush_zerocopy() can be called
> > > at the after each iteration in order to make sure all dirty pages are sent
> > > before a new iteration is started.
> > >
> > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > and there is no problem on changing the pages content between async_send() and
> > > the actual sending of the buffer, because this change will dirty the page and
> > > cause it to be re-sent on a next iteration anyway.
> > >
> > > Given a lot of locked memory may be needed in order to use multid migration
> > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > migrations.
> > 
> > How much memory can a non-root program use by default?
> > 
> > 
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >          goto cleanup;
> > >      }
> > >  
> > > +    if (migrate_use_zerocopy()) {
> > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > +    }
> > 
> > This belongs
> > 
> > 
> > >      p->c = QIO_CHANNEL(sioc);
> > >      qio_channel_set_delay(p->c, false);
> > >      p->running = true;
> > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > >          p->tls_hostname = g_strdup(s->hostname);
> > > +        p->write_flags = 0;
> > 
> > here?
> > 
> > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > >      }
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index e26e94aa0c..8e40e0a3fd 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > >          trace_migration_socket_outgoing_connected(data->hostname);
> > >      }
> > >  
> > > -    if (migrate_use_zerocopy()) {
> > > -        error_setg(&err, "Zerocopy not available in migration");
> > > +    if (migrate_use_zerocopy() &&
> > > +        (!migrate_use_multifd() ||
> > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > +          migrate_use_tls())) {
> > > +        error_setg(&err,
> > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >      }
> > >  
> > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> > 
> > Do we really want to do this check here?  I think this is really too
> > late.
> > 
> > You are not patching migrate_params_check().
> > 
> > I think that the proper way of doing this is something like:
> > 
> >     if (params->zerocopy &&
> >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >          migrate_use_tls())) {
> >            error_setg(&err,
> >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >         return false;
> >     }
> > 
> > You have to do the equivalent of multifd_compression and tls enablement,
> > to see that zerocopy is not enabled, of course.
> > 
> > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > I can't see a way of doing that without a qio.
> 
> I don't think you need to check that feature flag.
> 
> The combination of zerocopy and compression is simply illogical
> and can be rejected unconditionally.

Or we could think of "zerocopy"  in a more targetted way.
It is only "zerocopy" in terms the final I/O operation.
Earlier parts of the process may involve copies. IOW, we
can copy as part of the compress operation, but skip the
when then sending the compressed page.

In practice though this is still unlikely to be something
we can practically do, as we would need to keep compressed
pages around for an entire migration iteration until we can
call flush to ensure the data has been sent. This would be
significant memory burden.

> The combination of zerocopy and TLS is somewhat questionable.
> You're always going to have a copy between the plain text and
> cipher text. Avoiding an extra copy for the cipher text will
> require kerenl work which has no ETA. If it does ever arise,
> QEMU will need more work again to actually support it. So
> today we can just reject it unconditonally again.


Regards,
Daniel
Daniel P. Berrangé Nov. 16, 2021, 4:39 p.m. UTC | #5
On Tue, Nov 16, 2021 at 05:34:50PM +0100, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> >> 
> >>     if (params->zerocopy &&
> >>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >>          migrate_use_tls())) {
> >>            error_setg(&err,
> >>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >>         return false;
> >>     }
> >> 
> >> You have to do the equivalent of multifd_compression and tls enablement,
> >> to see that zerocopy is not enabled, of course.
> >> 
> >> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> >> I can't see a way of doing that without a qio.
> >
> > I don't think you need to check that feature flag.
> 
> Oh, I mean other thing.
> 
> When you set "zerocopy" capability, you don't know if the kernel support
> it.  My understanding is that the only way to check if it supported is
> here.

If you reqest it and it isn't supported you'll get an error back from
qio_channel_writev_zerocopy(). That's a bit too late though.

Ideally we should report an error straight after the migration code
creates the I/O channel, by querying for the feature.


Regards,
Daniel
Leonardo Bras Dec. 2, 2021, 6:47 a.m. UTC | #6
On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
>
> How much memory can a non-root program use by default?

On RHEL 8, a standard user is created allowing 64kB max locked memory.
(memory seems 'unlimited', though)


>
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> This belongs
>
>
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
>
> here?

yeah, makes sense.
Moving on v6.

>
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
>
> Do we really want to do this check here?  I think this is really too
> late.
>
> You are not patching migrate_params_check().
>
> I think that the proper way of doing this is something like:
>
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }
>
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.

IIUC, by following your suggestion and changing this in
migrate_params_check() instead would allow the misconfiguration to be
known before migration is attempted.
That seems the best thing to do here.


>
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.

Yeah, I think I should leave the feature testing in here, and move the
parameter testing to migrate_params_check() as commented before.

What do you think?

>
> Later, Juan.
>

Best regards,
Leo
Leonardo Bras Dec. 2, 2021, 6:54 a.m. UTC | #7
On Tue, Nov 16, 2021 at 1:35 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Nov 16, 2021 at 04:17:47PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> > > Leonardo Bras <leobras@redhat.com> wrote:
> > > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > > zerocopy interface.
> > > >
> > > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > > the setup and the completion, so a flush_zerocopy() can be called
> > > > at the after each iteration in order to make sure all dirty pages are sent
> > > > before a new iteration is started.
> > > >
> > > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > > the migration process, and avoid resuming the guest in the target host
> > > > without receiving all current RAM.
> > > >
> > > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > > and there is no problem on changing the pages content between async_send() and
> > > > the actual sending of the buffer, because this change will dirty the page and
> > > > cause it to be re-sent on a next iteration anyway.
> > > >
> > > > Given a lot of locked memory may be needed in order to use multid migration
> > > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > > migrations.
> > >
> > > How much memory can a non-root program use by default?
> > >
> > >
> > > >  static void *multifd_send_thread(void *opaque)
> > > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > > >          goto cleanup;
> > > >      }
> > > >
> > > > +    if (migrate_use_zerocopy()) {
> > > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > > +    }
> > >
> > > This belongs
> > >
> > >
> > > >      p->c = QIO_CHANNEL(sioc);
> > > >      qio_channel_set_delay(p->c, false);
> > > >      p->running = true;
> > > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > > >          p->tls_hostname = g_strdup(s->hostname);
> > > > +        p->write_flags = 0;
> > >
> > > here?
> > >
> > > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > > >      }
> > > > diff --git a/migration/socket.c b/migration/socket.c
> > > > index e26e94aa0c..8e40e0a3fd 100644
> > > > --- a/migration/socket.c
> > > > +++ b/migration/socket.c
> > > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > > >          trace_migration_socket_outgoing_connected(data->hostname);
> > > >      }
> > > >
> > > > -    if (migrate_use_zerocopy()) {
> > > > -        error_setg(&err, "Zerocopy not available in migration");
> > > > +    if (migrate_use_zerocopy() &&
> > > > +        (!migrate_use_multifd() ||
> > > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > > +          migrate_use_tls())) {
> > > > +        error_setg(&err,
> > > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > > >      }
> > > >
> > > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> > >
> > > Do we really want to do this check here?  I think this is really too
> > > late.
> > >
> > > You are not patching migrate_params_check().
> > >
> > > I think that the proper way of doing this is something like:
> > >
> > >     if (params->zerocopy &&
> > >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > >          migrate_use_tls())) {
> > >            error_setg(&err,
> > >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >         return false;
> > >     }
> > >
> > > You have to do the equivalent of multifd_compression and tls enablement,
> > > to see that zerocopy is not enabled, of course.
> > >
> > > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > > I can't see a way of doing that without a qio.
> >
> > I don't think you need to check that feature flag.
> >
> > The combination of zerocopy and compression is simply illogical
> > and can be rejected unconditionally.
>
> Or we could think of "zerocopy"  in a more targetted way.
> It is only "zerocopy" in terms the final I/O operation.
> Earlier parts of the process may involve copies. IOW, we
> can copy as part of the compress operation, but skip the
> when then sending the compressed page.
>
> In practice though this is still unlikely to be something
> we can practically do, as we would need to keep compressed
> pages around for an entire migration iteration until we can
> call flush to ensure the data has been sent. This would be
> significant memory burden.
>
> > The combination of zerocopy and TLS is somewhat questionable.
> > You're always going to have a copy between the plain text and
> > cipher text. Avoiding an extra copy for the cipher text will
> > require kerenl work which has no ETA. If it does ever arise,
> > QEMU will need more work again to actually support it. So
> > today we can just reject it unconditonally again.
>

My idea on failing here in the combination of zerocopy & (!multifd ||
TLS || compaction) is just about how it is today.
Meaning if someone wants to use zerocopy + TLS or zerocopy +
compaction in the future, and can provide a good implementation, it
should be ok to change it here (or at migrate_params_check() where
this should be).

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo
Leonardo Bras Dec. 2, 2021, 6:56 a.m. UTC | #8
On Tue, Nov 16, 2021 at 1:40 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Nov 16, 2021 at 05:34:50PM +0100, Juan Quintela wrote:
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > >>
> > >>     if (params->zerocopy &&
> > >>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > >>          migrate_use_tls())) {
> > >>            error_setg(&err,
> > >>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >>         return false;
> > >>     }
> > >>
> > >> You have to do the equivalent of multifd_compression and tls enablement,
> > >> to see that zerocopy is not enabled, of course.
> > >>
> > >> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > >> I can't see a way of doing that without a qio.
> > >
> > > I don't think you need to check that feature flag.
> >
> > Oh, I mean other thing.
> >
> > When you set "zerocopy" capability, you don't know if the kernel support
> > it.  My understanding is that the only way to check if it supported is
> > here.
>
> If you reqest it and it isn't supported you'll get an error back from
> qio_channel_writev_zerocopy(). That's a bit too late though.
>
> Ideally we should report an error straight after the migration code
> creates the I/O channel, by querying for the feature.
>
>

Agree.
I suggested checking the feature presence where the test is happening
in v5, and the other combinations of migration parameters at
migrate_params_check() as Juan suggested.

What do you think?

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo
Juan Quintela Dec. 2, 2021, 12:10 p.m. UTC | #9
Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>>
>> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
>> I can't see a way of doing that without a qio.
>
> Yeah, I think I should leave the feature testing in here, and move the
> parameter testing to migrate_params_check() as commented before.
>
> What do you think?

The best that we can do.

Thanks, Juan.
Leonardo Bras Dec. 9, 2021, 8:51 a.m. UTC | #10
Hello Juan,

On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
>
> How much memory can a non-root program use by default?
>
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> This belongs
>
>
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
>
> here?
>
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
>
> Do we really want to do this check here?  I think this is really too
> late.
>
> You are not patching migrate_params_check().
>
> I think that the proper way of doing this is something like:
>
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }

Don't we also need a check for multifd enabled here?
We could have zerocopy, multifd_compression=none, tls=disabled but it
will not fail if multifd=disabled.

Is this correct?


>
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.
>
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.
>
> Later, Juan.
>
Leonardo Bras Dec. 9, 2021, 9:42 a.m. UTC | #11
On Thu, Dec 9, 2021 at 5:51 AM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> Hello Juan,
>
> On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
> >
> > Leonardo Bras <leobras@redhat.com> wrote:
> > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > zerocopy interface.
> > >
> > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > the setup and the completion, so a flush_zerocopy() can be called
> > > at the after each iteration in order to make sure all dirty pages are sent
> > > before a new iteration is started.
> > >
> > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > and there is no problem on changing the pages content between async_send() and
> > > the actual sending of the buffer, because this change will dirty the page and
> > > cause it to be re-sent on a next iteration anyway.
> > >
> > > Given a lot of locked memory may be needed in order to use multid migration
> > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > migrations.
> >
> > How much memory can a non-root program use by default?
> >
> >
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >          goto cleanup;
> > >      }
> > >
> > > +    if (migrate_use_zerocopy()) {
> > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > +    }
> >
> > This belongs
> >
> >
> > >      p->c = QIO_CHANNEL(sioc);
> > >      qio_channel_set_delay(p->c, false);
> > >      p->running = true;
> > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > >          p->tls_hostname = g_strdup(s->hostname);
> > > +        p->write_flags = 0;
> >
> > here?
> >
> > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > >      }
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index e26e94aa0c..8e40e0a3fd 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > >          trace_migration_socket_outgoing_connected(data->hostname);
> > >      }
> > >
> > > -    if (migrate_use_zerocopy()) {
> > > -        error_setg(&err, "Zerocopy not available in migration");
> > > +    if (migrate_use_zerocopy() &&
> > > +        (!migrate_use_multifd() ||
> > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > +          migrate_use_tls())) {
> > > +        error_setg(&err,
> > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >      }
> > >
> > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> >
> > Do we really want to do this check here?  I think this is really too
> > late.
> >
> > You are not patching migrate_params_check().
> >
> > I think that the proper way of doing this is something like:
> >
> >     if (params->zerocopy &&
> >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >          migrate_use_tls())) {
> >            error_setg(&err,
> >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >         return false;
> >     }
>
> Don't we also need a check for multifd enabled here?
> We could have zerocopy, multifd_compression=none, tls=disabled but it
> will not fail if multifd=disabled.
>
> Is this correct?
>

I did some tests and this case actually seems to not fail, even though
it should.
So IIUC we really need to check for multifd here.

Sending v6.

>
> >
> > You have to do the equivalent of multifd_compression and tls enablement,
> > to see that zerocopy is not enabled, of course.
> >
> > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > I can't see a way of doing that without a qio.
> >
> > Later, Juan.
> >
diff mbox series

Patch

diff --git a/migration/multifd.h b/migration/multifd.h
index 15c50ca0b2..37941c1872 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@  int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f, bool sync);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
@@ -97,6 +97,8 @@  typedef struct {
     uint32_t packet_len;
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for sending ram */
+    int write_flags;
     /* multifd flags for each packet */
     uint32_t flags;
     /* size of the next packet that contains pages */
diff --git a/migration/multifd.c b/migration/multifd.c
index 3d9dc8cb58..816078df60 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,7 +105,8 @@  static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    return qio_channel_writev_all_flags(p->c, p->pages->iov, used,
+                                        p->write_flags, errp);
 }
 
 /**
@@ -578,19 +579,27 @@  void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f, bool sync)
 {
     int i;
+    bool flush_zerocopy;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->used) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return 0;
         }
     }
+
+    /*
+     * When using zerocopy, it's necessary to flush after each iteration to make
+     * sure pages from earlier iterations don't end up replacing newer pages.
+     */
+    flush_zerocopy = sync && migrate_use_zerocopy();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -601,7 +610,7 @@  void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return 0;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -612,6 +621,17 @@  void multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zerocopy) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush_zerocopy(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -620,6 +640,8 @@  void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
@@ -853,6 +875,10 @@  static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
         goto cleanup;
     }
 
+    if (migrate_use_zerocopy()) {
+        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
+    }
+
     p->c = QIO_CHANNEL(sioc);
     qio_channel_set_delay(p->c, false);
     p->running = true;
@@ -918,6 +944,7 @@  int multifd_save_setup(Error **errp)
         p->packet->version = cpu_to_be32(MULTIFD_VERSION);
         p->name = g_strdup_printf("multifdsend_%d", i);
         p->tls_hostname = g_strdup(s->hostname);
+        p->write_flags = 0;
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index 863035d235..0b3ddbffc1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2992,6 +2992,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
+    int ret;
 
     if (compress_threads_save_setup()) {
         return -1;
@@ -3026,7 +3027,11 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    ret =  multifd_send_sync_main(f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -3135,7 +3140,11 @@  static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->f, true);
+        if (ret < 0) {
+            return ret;
+        }
+
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3193,13 +3202,19 @@  static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
diff --git a/migration/socket.c b/migration/socket.c
index e26e94aa0c..8e40e0a3fd 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -78,8 +78,13 @@  static void socket_outgoing_migration(QIOTask *task,
         trace_migration_socket_outgoing_connected(data->hostname);
     }
 
-    if (migrate_use_zerocopy()) {
-        error_setg(&err, "Zerocopy not available in migration");
+    if (migrate_use_zerocopy() &&
+        (!migrate_use_multifd() ||
+         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
+          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
+          migrate_use_tls())) {
+        error_setg(&err,
+                   "Zerocopy only available for non-compressed non-TLS multifd migration");
     }
 
     migration_channel_connect(data->s, sioc, data->hostname, err);