[v1,0/3] QIOChannel flags + multifd zerocopy

Message ID 20210831110238.299458-1-leobras@redhat.com (mailing list archive)

Message

Leonardo Bras Aug. 31, 2021, 11:02 a.m. UTC
This patch series enables MSG_ZEROCOPY in QIOChannel and uses it to
improve multifd migration performance.

Patch #1 enables the use of flags on qio_channel_write*(), allowing
more flexibility in using the channel.
It was designed for MSG_ZEROCOPY usage, where it is a good idea to
have an easy way to choose which packets are sent with the flag, but
it also makes the channel more flexible for future usage.

Patch #2 just adds the MSG_ZEROCOPY feature and defines how it is
enabled, without enabling it anywhere in the code yet.
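
For readers unfamiliar with the kernel side, here is a minimal sketch of the
Linux mechanism that patches #1 and #2 build on: opt in with SO_ZEROCOPY, pass
MSG_ZEROCOPY on the sends that should use it, and read completions from the
socket error queue. It is not the QEMU code. The function names are
hypothetical, and the numeric fallbacks for the constants are assumed from the
asm-generic headers.

```python
import socket
import struct

# Linux constants. The numeric fallbacks are assumptions taken from the
# asm-generic headers; some architectures use different values.
SO_ZEROCOPY = getattr(socket, "SO_ZEROCOPY", 60)
MSG_ZEROCOPY = getattr(socket, "MSG_ZEROCOPY", 0x4000000)
SO_EE_ORIGIN_ZEROCOPY = 5


def enable_zerocopy(sock):
    """Opt the socket in; return False if the kernel or socket type refuses."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_ZEROCOPY, 1)
        return True
    except OSError:
        return False


def send_with_flags(sock, payload, zerocopy):
    """Send once, choosing per call whether to pass MSG_ZEROCOPY."""
    flags = MSG_ZEROCOPY if zerocopy else 0
    return sock.send(payload, flags)


def drain_completions(sock):
    """Return (first, last) send-counter ranges the kernel has finished with."""
    done = []
    while True:
        try:
            # Error-queue reads do not block; an empty queue raises EAGAIN.
            _, ancdata, _, _ = sock.recvmsg(1, 512, socket.MSG_ERRQUEUE)
        except BlockingIOError:
            return done
        for _level, _type, data in ancdata:
            if len(data) < 16:
                continue
            # struct sock_extended_err:
            # ee_errno, ee_origin, ee_type, ee_code, ee_pad, ee_info, ee_data
            _, origin, _, _, _, lo, hi = struct.unpack("=IBBBBII", data[:16])
            if origin == SO_EE_ORIGIN_ZEROCOPY:
                done.append((lo, hi))
```

A buffer sent with MSG_ZEROCOPY must stay untouched until its send counter
shows up in a completion, which is why the error queue has to be read at all.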

Patch #3 enables MSG_ZEROCOPY for migration / multifd.


Results:
So far, the CPU time spent in __sys_sendmsg() dropped to about 1/15 of
its previous value, and the overall migration took 13-18% less time,
based on a synthetic workload.

The objective is to reduce migration time on hosts with heavy CPU usage.

Leonardo Bras (3):
  io: Enable write flags for QIOChannel
  io: Add zerocopy and errqueue
  migration: multifd: Enable zerocopy

 chardev/char-io.c                   |  2 +-
 hw/remote/mpqemu-link.c             |  2 +-
 include/io/channel-socket.h         |  2 +
 include/io/channel.h                | 85 +++++++++++++++++++++++------
 io/channel-buffer.c                 |  1 +
 io/channel-command.c                |  1 +
 io/channel-file.c                   |  1 +
 io/channel-socket.c                 | 80 ++++++++++++++++++++++++++-
 io/channel-tls.c                    | 12 ++++
 io/channel-websock.c                | 10 ++++
 io/channel.c                        | 64 +++++++++++++---------
 migration/multifd-zlib.c            |  7 ++-
 migration/multifd-zstd.c            |  7 ++-
 migration/multifd.c                 |  9 ++-
 migration/multifd.h                 |  3 +-
 migration/rdma.c                    |  1 +
 scsi/pr-manager-helper.c            |  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 18 files changed, 235 insertions(+), 55 deletions(-)

Comments

Peter Xu Aug. 31, 2021, 9:24 p.m. UTC | #1
On Tue, Aug 31, 2021 at 08:02:36AM -0300, Leonardo Bras wrote:
> Results:
> So far, the resource usage of __sys_sendmsg() reduced 15 times, and the
> overall migration took 13-18% less time, based in synthetic workload.

Leo,

Could you share some of the details of your tests?  E.g., what's the
configuration of your VM for testing?  What's the migration time before/after
the patchset applied?  What is the network you're using?

Thanks,

Leonardo Bras Sept. 1, 2021, 7:21 p.m. UTC | #2
Hello Peter,

On Tue, Aug 31, 2021 at 6:24 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Aug 31, 2021 at 08:02:36AM -0300, Leonardo Bras wrote:
> > Results:
> > So far, the resource usage of __sys_sendmsg() reduced 15 times, and the
> > overall migration took 13-18% less time, based in synthetic workload.
>
> Leo,
>
> Could you share some of the details of your tests?  E.g., what's the
> configuration of your VM for testing?  What's the migration time before/after
> the patchset applied?  What is the network you're using?
>
> Thanks,
>
> --
> Peter Xu
>

Sure,
- Both the sending and receiving hosts have 128GB of RAM and a 10Gbps
  network interface.
  - The two network interfaces are directly connected.
- The guest has 100GB of RAM and runs with mem-lock=on and enable-kvm.
- Before sending, I use a simple application to fill every guest page
  with unique values, to avoid duplicated pages and zeroed pages.

On a single test:

Without zerocopy (qemu/master):
- Migration took 123355ms, with an average of 6912.58 Mbps
With zerocopy:
- Migration took 108514ms, with an average of 7858.39 Mbps

This represents a throughput improvement of about 13.7%, or about 12%
less migration time.

Comparing perf recordings of the default and zerocopy migrations:
Without zerocopy:
- copy_user_generic_string() uses 5.4% of CPU time
- __sys_sendmsg() uses 5.19% of CPU time
With zerocopy:
- copy_user_generic_string() uses 0.02% of CPU time (~1/270 of the original)
- __sys_sendmsg() uses 0.34% of CPU time (~1/15 of the original)
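
For anyone who wants to cross-check the figures in this message, here is a
small sketch that recomputes them from the raw numbers quoted above. The
variable names are illustrative only.

```python
# Raw figures copied from this message.
time_ms = {"master": 123355, "zerocopy": 108514}
mbps = {"master": 6912.58, "zerocopy": 7858.39}
cpu_pct = {  # (without zerocopy, with zerocopy)
    "copy_user_generic_string": (5.4, 0.02),
    "__sys_sendmsg": (5.19, 0.34),
}

# Relative change in average throughput and in wall-clock migration time.
throughput_gain = (mbps["zerocopy"] / mbps["master"] - 1) * 100
time_saved = (1 - time_ms["zerocopy"] / time_ms["master"]) * 100

# Seconds times Mbps gives megabits moved; both runs should move about the
# same amount of data if the two measurements are consistent.
megabits = {run: time_ms[run] / 1000 * mbps[run] for run in time_ms}

# How many times less CPU each symbol uses with zerocopy.
ratios = {name: before / after for name, (before, after) in cpu_pct.items()}

print(f"throughput gain: {throughput_gain:.1f}%")
print(f"time saved:      {time_saved:.1f}%")
for name, ratio in ratios.items():
    print(f"{name}: ~1/{ratio:.0f} of the original")
```

The two runs move nearly the same amount of data, so the throughput gain of
roughly 13.7% and the time reduction of roughly 12% describe the same change
from two angles.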