[v9,6/7] multifd: Send header packet without flags if zero-copy-send is enabled

Message ID 20220425215055.611825-7-leobras@redhat.com (mailing list archive)
State New, archived
Series MSG_ZEROCOPY + multifd

Commit Message

Leonardo Bras April 25, 2022, 9:50 p.m. UTC
Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.

Using channel-socket as an example, this works well with the default
copying mechanism of sendmsg(), but with zero-copy-send=true it will
often cause the migration to break.

This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism
gets to send the buffer, it has already changed, sending the wrong data
and causing the migration to abort.

This means that, as it stands, the buffer for the header packet is not
suitable for sending with MSG_ZEROCOPY.

In order to enable zero copy for multifd, send the header packet in an
individual write(), without any flags, and the remaining pages with a
writev(), as was done before. This only changes how a migration with
zero-copy-send=true works, and does not change any behavior for
migrations with zero-copy-send=false.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)
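
To illustrate the hazard described in the commit message, here is a
minimal, hypothetical sketch using plain sockets (not QEMU code): a
buffer handed to the kernel with MSG_ZEROCOPY must not be touched until
the completion notification arrives on the socket's error queue,
otherwise the bytes that reach the wire may differ from the ones that
were queued.

/* Minimal, hypothetical sketch of the reuse hazard (not QEMU code). */
#include <string.h>
#include <sys/socket.h>

static char header[64];     /* stands in for the multifd header packet */

static void send_header_zerocopy(int sockfd)
{
    int one = 1;

    /* MSG_ZEROCOPY requires SO_ZEROCOPY to be enabled on the socket. */
    setsockopt(sockfd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

    memset(header, 0xAA, sizeof(header));

    /* The kernel pins these pages and may read them *after* send()
     * returns; the data is not copied at call time.                  */
    send(sockfd, header, sizeof(header), MSG_ZEROCOPY);

    /* Reusing the buffer before the completion notification arrives
     * on the socket's error queue can change the bytes that actually
     * go out on the wire -- the corruption described above.          */
    memset(header, 0xBB, sizeof(header));
}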

Comments

Daniel P. Berrangé April 26, 2022, 8:10 a.m. UTC | #1
On Mon, Apr 25, 2022 at 06:50:55PM -0300, Leonardo Bras wrote:
> Since d48c3a0445 ("multifd: Use a single writev on the send side"),
> sending the header packet and the memory pages happens in the same
> writev, which can potentially make the migration faster.
> 
> Using channel-socket as an example, this works well with the default
> copying mechanism of sendmsg(), but with zero-copy-send=true it will
> often cause the migration to break.
> 
> This happens because the header packet buffer gets reused quite often,
> and there is a high chance that by the time the MSG_ZEROCOPY mechanism
> gets to send the buffer, it has already changed, sending the wrong data
> and causing the migration to abort.
> 
> This means that, as it stands, the buffer for the header packet is not
> suitable for sending with MSG_ZEROCOPY.
> 
> In order to enable zero copy for multifd, send the header packet in an
> individual write(), without any flags, and the remaining pages with a
> writev(), as was done before. This only changes how a migration with
> zero-copy-send=true works, and does not change any behavior for
> migrations with zero-copy-send=false.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/multifd.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 15fb668e64..6c940aaa98 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -639,6 +639,8 @@ static void *multifd_send_thread(void *opaque)
>          if (p->pending_job) {
>              uint64_t packet_num = p->packet_num;
>              uint32_t flags = p->flags;
> +            int iov_offset = 0;
> +

No need for this if you change:

>              p->iovs_num = 1;

   if (!migrate_use_zero_copy_send()) {
      p->iovs_num = 1;
   }


>              p->normal_num = 0;
>  
> @@ -665,15 +667,36 @@ static void *multifd_send_thread(void *opaque)
>              trace_multifd_send(p->id, packet_num, p->normal_num, flags,
>                                 p->next_packet_size);
>  
> -            p->iov[0].iov_len = p->packet_len;
> -            p->iov[0].iov_base = p->packet;
> +            if (migrate_use_zero_copy_send()) {
> +                /* Send header without zerocopy */
> +                ret = qio_channel_write_all(p->c, (void *)p->packet,
> +                                            p->packet_len, &local_err);
> +                if (ret != 0) {
> +                    break;
> +                }
> +
> +                if (!p->normal_num) {
> +                    /* No pages will be sent */
> +                    goto skip_send;
> +                }

Don't need this AFAIK, because the qio_channel_writev_all
call will be a no-op if  iovs_num is zero

>  
> -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> +                /* Skip first iov : header */
> +                iov_offset = 1;

Don't need to set this

> +            } else {
> +                /* Send header using the same writev call */
> +                p->iov[0].iov_len = p->packet_len;
> +                p->iov[0].iov_base = p->packet;
> +            }
> +
> +            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
> +                                         p->iovs_num - iov_offset,
>                                           &local_err);

This wouldn't need changing if we don't reserve iovs[0] when
not required.

> +
>              if (ret != 0) {
>                  break;
>              }
>  
> +skip_send:
>              qemu_mutex_lock(&p->mutex);
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);
> -- 
> 2.36.0
> 

With regards,
Daniel

Leonardo Bras April 26, 2022, 9:59 p.m. UTC | #2
Hello Daniel, thank you for the feedback!

On Tue, Apr 26, 2022 at 5:11 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Apr 25, 2022 at 06:50:55PM -0300, Leonardo Bras wrote:
> > Since d48c3a0445 ("multifd: Use a single writev on the send side"),
> > sending the header packet and the memory pages happens in the same
> > writev, which can potentially make the migration faster.
> >
> > Using channel-socket as an example, this works well with the default
> > copying mechanism of sendmsg(), but with zero-copy-send=true it will
> > often cause the migration to break.
> >
> > This happens because the header packet buffer gets reused quite often,
> > and there is a high chance that by the time the MSG_ZEROCOPY mechanism
> > gets to send the buffer, it has already changed, sending the wrong data
> > and causing the migration to abort.
> >
> > This means that, as it stands, the buffer for the header packet is not
> > suitable for sending with MSG_ZEROCOPY.
> >
> > In order to enable zero copy for multifd, send the header packet in an
> > individual write(), without any flags, and the remaining pages with a
> > writev(), as was done before. This only changes how a migration with
> > zero-copy-send=true works, and does not change any behavior for
> > migrations with zero-copy-send=false.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  migration/multifd.c | 29 ++++++++++++++++++++++++++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 15fb668e64..6c940aaa98 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -639,6 +639,8 @@ static void *multifd_send_thread(void *opaque)
> >          if (p->pending_job) {
> >              uint64_t packet_num = p->packet_num;
> >              uint32_t flags = p->flags;
> > +            int iov_offset = 0;
> > +
>
> No need for this if you change:
>
> >              p->iovs_num = 1;
>
>    if (!migrate_use_zero_copy_send()) {
>       p->iovs_num = 1;
>    }
>

I understand the point now: setting p->iovs_num = 0 before
multifd_send_state->ops->send_prepare() causes p->iov[0] to be used for
pages instead of the header. I was not aware, so thanks for pointing that out!

But it's also necessary to have an else clause with p->iovs_num = 0, right?
The variable is not reset anywhere else, so from the second loop iteration
onwards it would keep growing, making prepare() access p->iov[] out of
bounds.

Am I missing something here?
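
In other words, the reworked loop body in multifd_send_thread() would
presumably end up looking something like the fragment below (an
illustrative sketch of the suggestion, not the actual v10 patch), with
iovs_num reset on every iteration and the header sent outside the
writev when zero copy is on:

/* Illustrative sketch of the loop body after applying the suggestion
 * (not the actual v10 patch).  iov[0] is reserved for the header only
 * when zero copy is off, and iovs_num is reset on every iteration so
 * it cannot keep growing.                                              */
if (migrate_use_zero_copy_send()) {
    p->iovs_num = 0;                 /* pages start at iov[0]           */
} else {
    p->iovs_num = 1;                 /* iov[0] reserved for the header  */
}
p->normal_num = 0;

/* ... multifd_send_state->ops->send_prepare() fills p->iov[] ... */

if (migrate_use_zero_copy_send()) {
    /* Send the header with a plain copying write, so reusing
     * p->packet later cannot corrupt what MSG_ZEROCOPY sends.          */
    ret = qio_channel_write_all(p->c, (void *)p->packet,
                                p->packet_len, &local_err);
    if (ret != 0) {
        break;
    }
} else {
    p->iov[0].iov_len = p->packet_len;
    p->iov[0].iov_base = p->packet;
}

/* Safe even when iovs_num == 0: the writev below is then a no-op,
 * so no skip_send label is required.                                   */
ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num, &local_err);
if (ret != 0) {
    break;
}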

>
> >              p->normal_num = 0;
> >
> > @@ -665,15 +667,36 @@ static void *multifd_send_thread(void *opaque)
> >              trace_multifd_send(p->id, packet_num, p->normal_num, flags,
> >                                 p->next_packet_size);
> >
> > -            p->iov[0].iov_len = p->packet_len;
> > -            p->iov[0].iov_base = p->packet;
> > +            if (migrate_use_zero_copy_send()) {
> > +                /* Send header without zerocopy */
> > +                ret = qio_channel_write_all(p->c, (void *)p->packet,
> > +                                            p->packet_len, &local_err);
> > +                if (ret != 0) {
> > +                    break;
> > +                }
> > +
> > +                if (!p->normal_num) {
> > +                    /* No pages will be sent */
> > +                    goto skip_send;
> > +                }
>
> Don't need this AFAIK, because the qio_channel_writev_all
> call will be a no-op if  iovs_num is zero
>

Oh, I see:
qio_channel_writev_all() will call qio_channel_writev_full_all() with
niov == 0, and thus nlocal_iov == 0, which skips the loop that calls
qio_channel_writev_full().

I will remove that in v10.
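
For reference, a minimal hypothetical sketch (using plain writev(), not
the actual QEMU helper) of why a zero-iovec call is harmless: the send
loop only runs while iovecs remain, so with niov == 0 it returns
success without writing anything.

#include <stddef.h>
#include <sys/uio.h>

static int writev_all_sketch(int fd, struct iovec *iov, size_t niov)
{
    size_t nlocal_iov = niov;

    while (nlocal_iov > 0) {
        ssize_t done = writev(fd, iov, nlocal_iov);
        if (done < 0) {
            return -1;
        }
        /* The real helper advances iov/nlocal_iov by `done` bytes and
         * loops until everything is written; the sketch stops after
         * one call for brevity.                                        */
        break;
    }

    return 0;   /* niov == 0 falls straight through to success here */
}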


> >
> > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > +                /* Skip first iov : header */
> > +                iov_offset = 1;
>
> Don't need to set this

Agree, that makes sense since the iov offset is no longer needed.

>
> > +            } else {
> > +                /* Send header using the same writev call */
> > +                p->iov[0].iov_len = p->packet_len;
> > +                p->iov[0].iov_base = p->packet;
> > +            }
> > +
> > +            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
> > +                                         p->iovs_num - iov_offset,
> >                                           &local_err);
>
> This wouldn't need changing if we don't reserve iovs[0] when
> not required.

Agree.

>
> > +
> >              if (ret != 0) {
> >                  break;
> >              }
> >
> > +skip_send:
> >              qemu_mutex_lock(&p->mutex);
> >              p->pending_job--;
> >              qemu_mutex_unlock(&p->mutex);
> > --
> > 2.36.0
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

I will probably send a v10 shortly.

Thanks for reviewing!

Best regards,
Leo

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 15fb668e64..6c940aaa98 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -639,6 +639,8 @@  static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
+            int iov_offset = 0;
+
             p->iovs_num = 1;
             p->normal_num = 0;
 
@@ -665,15 +667,36 @@  static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
-            p->iov[0].iov_len = p->packet_len;
-            p->iov[0].iov_base = p->packet;
+            if (migrate_use_zero_copy_send()) {
+                /* Send header without zerocopy */
+                ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                            p->packet_len, &local_err);
+                if (ret != 0) {
+                    break;
+                }
+
+                if (!p->normal_num) {
+                    /* No pages will be sent */
+                    goto skip_send;
+                }
 
-            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+                /* Skip first iov : header */
+                iov_offset = 1;
+            } else {
+                /* Send header using the same writev call */
+                p->iov[0].iov_len = p->packet_len;
+                p->iov[0].iov_base = p->packet;
+            }
+
+            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
+                                         p->iovs_num - iov_offset,
                                          &local_err);
+
             if (ret != 0) {
                 break;
             }
 
+skip_send:
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);