| Message ID | 20220425215055.611825-7-leobras@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | MSG_ZEROCOPY + multifd |
On Mon, Apr 25, 2022 at 06:50:55PM -0300, Leonardo Bras wrote:
> Since d48c3a0445 ("multifd: Use a single writev on the send side"),
> sending the header packet and the memory pages happens in the same
> writev, which can potentially make the migration faster.
>
> Using channel-socket as an example, this works well with the default
> copying mechanism of sendmsg(), but with zero-copy-send=true, it will
> often cause the migration to break.
>
> This happens because the header packet buffer gets reused quite often,
> and there is a high chance that by the time the MSG_ZEROCOPY mechanism
> gets to send the buffer, it has already changed, sending the wrong
> data and causing the migration to abort.
>
> It means that, as it is, the buffer for the header packet is not
> suitable for sending with MSG_ZEROCOPY.
>
> In order to enable zero copy for multifd, send the header packet in an
> individual write(), without any flags, and the remaining pages with a
> writev(), as was happening before. This only changes how a migration
> with zero-copy-send=true works, without changing any current behavior
> for migrations with zero-copy-send=false.
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/multifd.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 15fb668e64..6c940aaa98 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -639,6 +639,8 @@ static void *multifd_send_thread(void *opaque)
>          if (p->pending_job) {
>              uint64_t packet_num = p->packet_num;
>              uint32_t flags = p->flags;
> +            int iov_offset = 0;
> +

No need for this if you change:

>              p->iovs_num = 1;

    if (!migrate_use_zero_copy_send()) {
        p->iovs_num = 1;
    }

>              p->normal_num = 0;
>
> @@ -665,15 +667,36 @@ static void *multifd_send_thread(void *opaque)
>              trace_multifd_send(p->id, packet_num, p->normal_num, flags,
>                                 p->next_packet_size);
>
> -            p->iov[0].iov_len = p->packet_len;
> -            p->iov[0].iov_base = p->packet;
> +            if (migrate_use_zero_copy_send()) {
> +                /* Send header without zerocopy */
> +                ret = qio_channel_write_all(p->c, (void *)p->packet,
> +                                            p->packet_len, &local_err);
> +                if (ret != 0) {
> +                    break;
> +                }
> +
> +                if (!p->normal_num) {
> +                    /* No pages will be sent */
> +                    goto skip_send;
> +                }

Don't need this AFAIK, because the qio_channel_writev_all call will be a
no-op if iovs_num is zero.

>
> -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> +                /* Skip first iov : header */
> +                iov_offset = 1;

Don't need to set this.

> +            } else {
> +                /* Send header using the same writev call */
> +                p->iov[0].iov_len = p->packet_len;
> +                p->iov[0].iov_base = p->packet;
> +            }
> +
> +            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
> +                                         p->iovs_num - iov_offset,
>                                           &local_err);

This wouldn't need changing if we don't reserve iov[0] when not
required.

> +
>              if (ret != 0) {
>                  break;
>              }
>
> +skip_send:
>              qemu_mutex_lock(&p->mutex);
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);
> --
> 2.36.0

With regards,
Daniel
Hello Daniel, thank you for the feedback!

On Tue, Apr 26, 2022 at 5:11 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Apr 25, 2022 at 06:50:55PM -0300, Leonardo Bras wrote:
> > Since d48c3a0445 ("multifd: Use a single writev on the send side"),
> > sending the header packet and the memory pages happens in the same
> > writev, which can potentially make the migration faster.
> >
> > Using channel-socket as an example, this works well with the default
> > copying mechanism of sendmsg(), but with zero-copy-send=true, it will
> > often cause the migration to break.
> >
> > This happens because the header packet buffer gets reused quite often,
> > and there is a high chance that by the time the MSG_ZEROCOPY mechanism
> > gets to send the buffer, it has already changed, sending the wrong
> > data and causing the migration to abort.
> >
> > It means that, as it is, the buffer for the header packet is not
> > suitable for sending with MSG_ZEROCOPY.
> >
> > In order to enable zero copy for multifd, send the header packet in an
> > individual write(), without any flags, and the remaining pages with a
> > writev(), as was happening before. This only changes how a migration
> > with zero-copy-send=true works, without changing any current behavior
> > for migrations with zero-copy-send=false.
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  migration/multifd.c | 29 ++++++++++++++++++++++++++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 15fb668e64..6c940aaa98 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -639,6 +639,8 @@ static void *multifd_send_thread(void *opaque)
> >          if (p->pending_job) {
> >              uint64_t packet_num = p->packet_num;
> >              uint32_t flags = p->flags;
> > +            int iov_offset = 0;
> > +
>
> No need for this if you change:
>
> >              p->iovs_num = 1;
>
>     if (!migrate_use_zero_copy_send()) {
>         p->iovs_num = 1;
>     }
>

I understand the point now: setting p->iovs_num = 0 before
multifd_send_state->ops->send_prepare() causes p->iov[0] to be used for
pages instead of the header. I was not aware, so thanks for pointing
that out!

But it's also necessary to have an else clause with p->iovs_num = 0,
right? It seems like the variable is not set anywhere else, and it would
keep growing after the second loop iteration, causing prepare() to
access p->iov[] out of bounds. Am I missing something here?
> >              p->normal_num = 0;
> >
> > @@ -665,15 +667,36 @@ static void *multifd_send_thread(void *opaque)
> >              trace_multifd_send(p->id, packet_num, p->normal_num, flags,
> >                                 p->next_packet_size);
> >
> > -            p->iov[0].iov_len = p->packet_len;
> > -            p->iov[0].iov_base = p->packet;
> > +            if (migrate_use_zero_copy_send()) {
> > +                /* Send header without zerocopy */
> > +                ret = qio_channel_write_all(p->c, (void *)p->packet,
> > +                                            p->packet_len, &local_err);
> > +                if (ret != 0) {
> > +                    break;
> > +                }
> > +
> > +                if (!p->normal_num) {
> > +                    /* No pages will be sent */
> > +                    goto skip_send;
> > +                }
>
> Don't need this AFAIK, because the qio_channel_writev_all call will be
> a no-op if iovs_num is zero.

Oh, I see: qio_channel_writev_all() will call
qio_channel_writev_full_all(), where niov == 0 and thus nlocal_iov == 0,
avoiding the loop that calls qio_channel_writev_full(). I will remove
that in v10.

> >
> > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > +                /* Skip first iov : header */
> > +                iov_offset = 1;
>
> Don't need to set this.

Agree, that makes sense since the offset part is discontinued.

> > +            } else {
> > +                /* Send header using the same writev call */
> > +                p->iov[0].iov_len = p->packet_len;
> > +                p->iov[0].iov_base = p->packet;
> > +            }
> > +
> > +            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
> > +                                         p->iovs_num - iov_offset,
> >                                           &local_err);
>
> This wouldn't need changing if we don't reserve iov[0] when not
> required.

Agree.

> > +
> >              if (ret != 0) {
> >                  break;
> >              }
> >
> > +skip_send:
> >              qemu_mutex_lock(&p->mutex);
> >              p->pending_job--;
> >              qemu_mutex_unlock(&p->mutex);
> > --
> > 2.36.0
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

I will probably send a v10 shortly. Thanks for reviewing!

Best regards,
Leo
diff --git a/migration/multifd.c b/migration/multifd.c
index 15fb668e64..6c940aaa98 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -639,6 +639,8 @@ static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
+            int iov_offset = 0;
+
             p->iovs_num = 1;
             p->normal_num = 0;
 
@@ -665,15 +667,36 @@ static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
-            p->iov[0].iov_len = p->packet_len;
-            p->iov[0].iov_base = p->packet;
+            if (migrate_use_zero_copy_send()) {
+                /* Send header without zerocopy */
+                ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                            p->packet_len, &local_err);
+                if (ret != 0) {
+                    break;
+                }
+
+                if (!p->normal_num) {
+                    /* No pages will be sent */
+                    goto skip_send;
+                }
 
-            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+                /* Skip first iov : header */
+                iov_offset = 1;
+            } else {
+                /* Send header using the same writev call */
+                p->iov[0].iov_len = p->packet_len;
+                p->iov[0].iov_base = p->packet;
+            }
+
+            ret = qio_channel_writev_all(p->c, p->iov + iov_offset,
+                                         p->iovs_num - iov_offset,
                                          &local_err);
+
             if (ret != 0) {
                 break;
             }
 
+skip_send:
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.

Using channel-socket as an example, this works well with the default
copying mechanism of sendmsg(), but with zero-copy-send=true, it will
often cause the migration to break.

This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism
gets to send the buffer, it has already changed, sending the wrong data
and causing the migration to abort.

It means that, as it is, the buffer for the header packet is not
suitable for sending with MSG_ZEROCOPY.

In order to enable zero copy for multifd, send the header packet in an
individual write(), without any flags, and the remaining pages with a
writev(), as was happening before. This only changes how a migration
with zero-copy-send=true works, without changing any current behavior
for migrations with zero-copy-send=false.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)