[RFC,0/5] mptcp support

Message ID 20210408191159.133644-1-dgilbert@redhat.com (mailing list archive)

Message

Dr. David Alan Gilbert April 8, 2021, 7:11 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This RFC set adds support for multipath TCP (mptcp),
in particular on the migration path - but should be extensible
to other users.

  Multipath-tcp is a bit like bonding, but at L3; you can use
it to handle failure, but can also use it to split traffic across
multiple interfaces.

  Using a pair of 10Gb interfaces, I've managed to get 19Gbps
(with the only tuning being using huge pages and turning the MTU up).

  It needs a bleeding-edge Linux kernel (in some older ones you get
false accept messages for the subflows), and a C lib that has the
constants defined (as current glibc does).
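A quick way to check both requirements from userspace is to try creating an MPTCP socket. The following is a minimal sketch, not code from this series: `mptcp_available` is a hypothetical helper name, and the fallback value 262 is the Linux protocol number for MPTCP, used here for Python builds older than 3.10 that lack `socket.IPPROTO_MPTCP`.

```python
import socket

# Linux protocol number for MPTCP. Python >= 3.10 exposes it as
# socket.IPPROTO_MPTCP; older builds need the literal value (assumption:
# we are running on Linux, where the value is 262).
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def mptcp_available() -> bool:
    """Return True if the running kernel lets us create an MPTCP socket."""
    try:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # Kernel built without MPTCP, MPTCP disabled, or a non-Linux platform.
        return False
    probe.close()
    return True


if __name__ == "__main__":
    print("MPTCP sockets available:", mptcp_available())
```

This only shows that the socket can be created; it says nothing about whether the kernel is new enough to avoid the subflow accept problem mentioned above.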

  To use it you just need to append ,mptcp to an address;

  -incoming tcp:0:4444,mptcp
  migrate -d tcp:192.168.11.20:4444,mptcp
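
To illustrate the address form above, here is a minimal sketch of splitting such a string into host, port and options. It is not QEMU's parser; `parse_tcp_address` is a hypothetical helper, and treating a bare flag such as `mptcp` as equivalent to `mptcp=on` is an assumption made for the example.

```python
def parse_tcp_address(addr: str):
    """Split 'tcp:HOST:PORT[,opt[=value]]...' into (host, port, options)."""
    scheme, sep, rest = addr.partition(":")
    if scheme != "tcp" or not sep:
        raise ValueError(f"not a tcp: address: {addr!r}")

    hostport, *raw_opts = rest.split(",")
    # Split on the last colon so that the port is always the final field.
    host, sep, port = hostport.rpartition(":")
    if not sep:
        raise ValueError(f"missing port in {addr!r}")

    options = {}
    for opt in raw_opts:
        key, eq, value = opt.partition("=")
        # Assumption: a bare flag means the same as "flag=on".
        options[key] = value if eq else "on"
    return host, int(port), options


if __name__ == "__main__":
    print(parse_tcp_address("tcp:0:4444,mptcp"))
    print(parse_tcp_address("tcp:192.168.11.20:4444,mptcp"))
```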

  I had a quick go at trying NBD as well, but I think it needs
some work with the parsing of NBD addresses.

  All comments welcome.

Dave

Dr. David Alan Gilbert (5):
  channel-socket: Only set CLOEXEC if we have space for fds
  io/net-listener: Call the notifier during finalize
  migration: Add cleanup hook for inwards migration
  migration/socket: Close the listener at the end
  sockets: Support multipath TCP

 io/channel-socket.c   |  8 ++++----
 io/dns-resolver.c     |  2 ++
 io/net-listener.c     |  3 +++
 migration/migration.c |  3 +++
 migration/migration.h |  4 ++++
 migration/multifd.c   |  5 +++++
 migration/socket.c    | 24 ++++++++++++++++++------
 qapi/sockets.json     |  5 ++++-
 util/qemu-sockets.c   | 34 ++++++++++++++++++++++++++++++++++
 9 files changed, 77 insertions(+), 11 deletions(-)

Comments

Daniel P. Berrangé April 9, 2021, 9:34 a.m. UTC | #1
On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This RFC set adds support for multipath TCP (mptcp),
> in particular on the migration path - but should be extensible
> to other users.
> 
>   Multipath-tcp is a bit like bonding, but at L3; you can use
> it to handle failure, but can also use it to split traffic across
> multiple interfaces.
> 
>   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> (with the only tuning being using huge pages and turning the MTU up).
> 
>   It needs a bleeding-edge Linux kernel (in some older ones you get
> false accept messages for the subflows), and a C lib that has the
> constants defined (as current glibc does).
> 
>   To use it you just need to append ,mptcp to an address;
> 
>   -incoming tcp:0:4444,mptcp
>   migrate -d tcp:192.168.11.20:4444,mptcp

What happens if you only enable mptcp flag on one side of the
stream (whether client or server), does it degrade to boring
old single path TCP, or does it result in an error ?

>   I had a quick go at trying NBD as well, but I think it needs
> some work with the parsing of NBD addresses.

In theory this is applicable to anywhere that we use sockets.
Anywhere that is configured with the QAPI  SocketAddress /
SocketAddressLegacy type will get it for free AFAICT.

Anywhere that is configured via QemuOpts will need an enhancement.

IOW, I would think NBD already works if you configure NBD via
QMP with nbd-server-start, or block-export-add.  qemu-nbd will
need cli options added.

The block layer clients for NBD, Gluster, Sheepdog and SSH also
all get it for free when configured via QMP, or -blockdev AFAICT

Legacy blocklayer filename syntax would need extra parsing, or
we can just not bother and say if you want new features, use
blockdev.


Overall this is impressively simple.

It feels like it obsoletes the multifd migration code, at least
if you assume Linux platform and new enough kernel ?

Except TLS... We already bottleneck on TLS encryption with
a single FD, since userspace encryption is limited to a
single thread.

There is the KTLS feature which offloads TLS encryption/decryption
to the kernel. This benefits even regular single FD performance,
because the encryption work can be done by the kernel in a separate
thread from the userspace IO syscalls.

Any idea if KTLS is fully compatible with MPTCP ?  If so, then that
would look like it makes it a full replacement for multifd on Linux.

Regards,
Daniel
Daniel P. Berrangé April 9, 2021, 9:42 a.m. UTC | #2
On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.

The caveat is any servers which share the problem of prematurely
closing the listener socket that you fixed here for migration.


Regards,
Daniel
Paolo Abeni April 9, 2021, 9:47 a.m. UTC | #3
On Fri, 2021-04-09 at 10:34 +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This RFC set adds support for multipath TCP (mptcp),
> > in particular on the migration path - but should be extensible
> > to other users.
> > 
> >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > it to handle failure, but can also use it to split traffic across
> > multiple interfaces.
> > 
> >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > (with the only tuning being using huge pages and turning the MTU up).
> > 
> >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > false accept messages for the subflows), and a C lib that has the
> > constants defined (as current glibc does).
> > 
> >   To use it you just need to append ,mptcp to an address;
> > 
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
> 
> What happens if you only enable mptcp flag on one side of the
> stream (whether client or server), does it degrade to boring
> old single path TCP, or does it result in an error ?

If the mptcp handshake fails by any means - e.g. one side does not ask
for MPTCP - the connection falls back to plain TCP in a transparent way.
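
The fallback can be tried on loopback with a listener that asks for MPTCP and a client that does not. This is a minimal sketch under assumptions: the helper names are hypothetical, 262 is the Linux value of `IPPROTO_MPTCP`, and the listener itself drops to plain TCP when the kernel cannot create an MPTCP socket, so the script still runs on older systems.

```python
import socket
import threading

# Linux protocol number for MPTCP (socket.IPPROTO_MPTCP on Python >= 3.10).
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def make_listener() -> socket.socket:
    """Listen on an ephemeral loopback port, asking for MPTCP if possible."""
    try:
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # No MPTCP support in this kernel: use an ordinary TCP listener.
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def echo_once(srv: socket.socket) -> None:
    """Accept one connection and echo back the first chunk received."""
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def plain_tcp_roundtrip(payload: bytes) -> bytes:
    """Connect a plain TCP client to the listener and return the echo."""
    with make_listener() as srv:
        worker = threading.Thread(target=echo_once, args=(srv,), daemon=True)
        worker.start()
        # create_connection() opens a normal TCP socket, so the client
        # never asks for MPTCP.
        with socket.create_connection(srv.getsockname(), timeout=5) as cli:
            cli.sendall(payload)
            reply = cli.recv(1024)
        worker.join(timeout=5)
    return reply


if __name__ == "__main__":
    print(plain_tcp_roundtrip(b"ping"))
```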

> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.
> 
> Anywhere that is configured via QemuOpts will need an enhancement.
> 
> IOW, I would think NBD already works if you configure NBD via
> QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> need cli options added.
> 
> The block layer clients for NBD, Gluster, Sheepdog and SSH also
> all get it for free when configured via QMP, or -blockdev AFAICT
> 
> Legacy blocklayer filename syntax would need extra parsing, or
> we can just not bother and say if you want new features, use
> blockdev.
> 
> 
> Overall this is impressively simple.
> 
> It feels like it obsoletes the multifd migration code, at least
> if you assume Linux platform and new enough kernel ?
> 
> Except TLS... We already bottleneck on TLS encryption with
> a single FD, since userspace encryption is limited to a
> single thread.
> 
> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
> 
> Any idea if KTLS is fully compatible with MPTCP ?  

Ouch!

So far it is not supported. Both KTLS and MPTCP use/need ULP (Upper Layer
Protocol, a kernel way of hijacking core TCP features) and we can have only
a single ULP per socket, so there is possibly some technical showstopper
there.

At the very least it is not on our short-term roadmap, but I guess we can
update that based on user needs.

Thanks!

Paolo
Paolo Abeni April 9, 2021, 9:55 a.m. UTC | #4
On Fri, 2021-04-09 at 10:42 +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

For the record, there is an alternative to that, based on a more
advanced and complex MPTCP configuration available only on even more
recent kernels. MPTCP can be configured to accept additional subflows
on a different listener, which will be managed (created and disposed)
by the kernel with no additional user-space changes (beyond the MPTCP
configuration).

That will also require a suitable firewalld (if enabled) configuration
(keeping the additional port open/accessible from the client).

Finally such configuration can be even more complex e.g. the additional
listener could be alternatively configured on the client side (!!!) and
the server could be configured to create additional subflows connecting
to such port (again no user-space changes needed, "only" more complex
MPTCP configuration).

Cheers,

Paolo
Dr. David Alan Gilbert April 12, 2021, 2:46 p.m. UTC | #5
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

Right, this varies depending on the server semantics; migration only
expects a single connection, so it shuts the listener immediately; nbd is
already wired to expect multiple connections.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Dr. David Alan Gilbert April 12, 2021, 2:51 p.m. UTC | #6
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This RFC set adds support for multipath TCP (mptcp),
> > in particular on the migration path - but should be extensible
> > to other users.
> > 
> >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > it to handle failure, but can also use it to split traffic across
> > multiple interfaces.
> > 
> >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > (with the only tuning being using huge pages and turning the MTU up).
> > 
> >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > false accept messages for the subflows), and a C lib that has the
> > constants defined (as current glibc does).
> > 
> >   To use it you just need to append ,mptcp to an address;
> > 
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
> 
> What happens if you only enable mptcp flag on one side of the
> stream (whether client or server), does it degrade to boring
> old single path TCP, or does it result in an error ?

I've just tested this and it matches what pabeni said; it seems to just
fall back.

> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.

That was my hope.

> Anywhere that is configured via QemuOpts will need an enhancement.
> 
> IOW, I would think NBD already works if you configure NBD via
> QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> need cli options added.
> 
> The block layer clients for NBD, Gluster, Sheepdog and SSH also
> all get it for free when configured via QMP, or -blockdev AFAICT

Have you got some examples via QMP?
I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

> Legacy blocklayer filename syntax would need extra parsing, or
> we can just not bother and say if you want new features, use
> blockdev.
> 
> 
> Overall this is impressively simple.

Yeh; lots of small unexpected tidyups that took a while to fix.

> It feels like it obsoletes the multifd migration code, at least
> if you assume Linux platform and new enough kernel ?
>
> Except TLS... We already bottleneck on TLS encryption with
> a single FD, since userspace encryption is limited to a
> single thread.

Even without TLS we already run out of CPU, probably on the receiving
thread at around 20Gbps; which is a bit meh, compared to multifd which
I have seen hit 80Gbps on a particularly well greased 100Gbps
connection.
Curiously my attempts with multifd+mptcp so far have it being slower
than with just mptcp on its own, not hitting the 20Gbps - not sure why
yet.

> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
> 
> Any idea if KTLS is fully compatible with MPTCP ?  If so, then that
> would look like it makes it a full replacement for multifd on Linux.

I've not tried kTLS at all yet; as pabeni says, not currently
compatible.
The other ones I'd like to try are zero-copy offload receive/transmit
(again I'm not sure those are compatible).

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Daniel P. Berrangé April 12, 2021, 2:56 p.m. UTC | #7
On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hi,
> > >   This RFC set adds support for multipath TCP (mptcp),
> > > in particular on the migration path - but should be extensible
> > > to other users.
> > > 
> > >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > > it to handle failure, but can also use it to split traffic across
> > > multiple interfaces.
> > > 
> > >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > > (with the only tuning being using huge pages and turning the MTU up).
> > > 
> > >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > > false accept messages for the subflows), and a C lib that has the
> > > constants defined (as current glibc does).
> > > 
> > >   To use it you just need to append ,mptcp to an address;
> > > 
> > >   -incoming tcp:0:4444,mptcp
> > >   migrate -d tcp:192.168.11.20:4444,mptcp
> > 
> > What happens if you only enable mptcp flag on one side of the
> > stream (whether client or server), does it degrade to boring
> > old single path TCP, or does it result in an error ?
> 
> I've just tested this and it matches what pabeni said; it seems to just
> fall back.
> 
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> That was my hope.
> 
> > Anywhere that is configured via QemuOpts will need an enhancement.
> > 
> > IOW, I would think NBD already works if you configure NBD via
> > QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> > need cli options added.
> > 
> > The block layer clients for NBD, Gluster, Sheepdog and SSH also
> > all get it for free when configured via QMP, or -blockdev AFAICT
> 
> Have you got some examples via QMP?
> I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

I never remember the mapping to blockdev QAPI schema, especially
when using legacy filename syntax with the URI.

Try instead

 -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
 -device virtio-blk,drive=disk0backend,id=disk0



Regards,
Daniel
Dr. David Alan Gilbert April 14, 2021, 6:49 p.m. UTC | #8
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Hi,
> > > >   This RFC set adds support for multipath TCP (mptcp),
> > > > in particular on the migration path - but should be extensible
> > > > to other users.
> > > > 
> > > >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > > > it to handle failure, but can also use it to split traffic across
> > > > multiple interfaces.
> > > > 
> > > >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > > > (with the only tuning being using huge pages and turning the MTU up).
> > > > 
> > > >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > > > false accept messages for the subflows), and a C lib that has the
> > > > constants defined (as current glibc does).
> > > > 
> > > >   To use it you just need to append ,mptcp to an address;
> > > > 
> > > >   -incoming tcp:0:4444,mptcp
> > > >   migrate -d tcp:192.168.11.20:4444,mptcp
> > > 
> > > What happens if you only enable mptcp flag on one side of the
> > > stream (whether client or server), does it degrade to boring
> > > old single path TCP, or does it result in an error ?
> > 
> > I've just tested this and it matches what pabeni said; it seems to just
> > fall back.
> > 
> > > >   I had a quick go at trying NBD as well, but I think it needs
> > > > some work with the parsing of NBD addresses.
> > > 
> > > In theory this is applicable to anywhere that we use sockets.
> > > Anywhere that is configured with the QAPI  SocketAddress /
> > > SocketAddressLegacy type will get it for free AFAICT.
> > 
> > That was my hope.
> > 
> > > Anywhere that is configured via QemuOpts will need an enhancement.
> > > 
> > > IOW, I would think NBD already works if you configure NBD via
> > > QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> > > need cli options added.
> > > 
> > > The block layer clients for NBD, Gluster, Sheepdog and SSH also
> > > all get it for free when configured via QMP, or -blockdev AFAICT
> > 
> > Have you got some examples via QMP?
> > I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero
> 
> I never remember the mapping to blockdev QAPI schema, especially
> when using legacy filename syntax with the URI.
> 
> Try instead
> 
>  -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
>  -device virtio-blk,drive=disk0backend,id=disk0

That doesn't look like the right syntax, but it got me closer; and it's
working with no more code changes:

On the source:

qemu... -nographic -M none -drive if=none,file=my.qcow2,id=mydisk
(qemu) nbd_server_start 0.0.0.0:3333,mptcp=on
(qemu) nbd_server_add -w mydisk

On the destination:
-blockdev driver=nbd,server.type=inet,server.host=192.168.11.20,server.port=3333,server.mptcp=on,node-name=nbddisk,export=mydisk -device virtio-blk,drive=nbddisk,id=disk0

and it successfully booted off it, and it looks like it has two flows.
(It didn't get that great a bandwidth, but I'm not sure what that's due
to).
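
For anyone wanting the QMP form of the destination side, the -blockdev line above should map onto a blockdev-add command along these lines. This is an untested sketch: `nbd_blockdev_add` is a hypothetical helper, and it assumes the `mptcp` field is a boolean on the inet socket address, as this series adds in qapi/sockets.json.

```python
import json


def nbd_blockdev_add(host: str, port: int, export: str, node_name: str,
                     mptcp: bool = True) -> dict:
    """Build a QMP blockdev-add command for an NBD client node."""
    return {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "nbd",
            "node-name": node_name,
            "export": export,
            "server": {
                "type": "inet",
                "host": host,
                # The QAPI inet address carries the port as a string.
                "port": str(port),
                # Assumption: boolean field added by this series.
                "mptcp": mptcp,
            },
        },
    }


if __name__ == "__main__":
    cmd = nbd_blockdev_add("192.168.11.20", 3333, "mydisk", "nbddisk")
    print(json.dumps(cmd, indent=2))
```

The resulting JSON would be sent over the QMP monitor socket after the usual capabilities negotiation.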

Dave
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|