Message ID: 20210408191159.133644-1-dgilbert@redhat.com (mailing list archive)
Series: mptcp support
On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hi,
>   This RFC set adds support for multipath TCP (mptcp),
> in particular on the migration path - but should be extensible
> to other users.
>
> Multipath-tcp is a bit like bonding, but at L3; you can use
> it to handle failure, but can also use it to split traffic across
> multiple interfaces.
>
> Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> (with the only tuning being using huge pages and turning the MTU up).
>
> It needs a bleeding-edge Linux kernel (in some older ones you get
> false accept messages for the subflows), and a C lib that has the
> constants defined (as current glibc does).
>
> To use it you just need to append ,mptcp to an address;
>
>   -incoming tcp:0:4444,mptcp
>   migrate -d tcp:192.168.11.20:4444,mptcp

What happens if you only enable the mptcp flag on one side of the
stream (whether client or server)? Does it degrade to boring old
single-path TCP, or does it result in an error?

> I had a quick go at trying NBD as well, but I think it needs
> some work with the parsing of NBD addresses.

In theory this is applicable anywhere that we use sockets. Anywhere
that is configured with the QAPI SocketAddress / SocketAddressLegacy
type will get it for free AFAICT.

Anywhere that is configured via QemuOpts will need an enhancement.

IOW, I would think NBD already works if you configure NBD via QMP
with nbd-server-start or block-export-add. qemu-nbd will need CLI
options added.

The block layer clients for NBD, Gluster, Sheepdog and SSH also all
get it for free when configured via QMP or -blockdev AFAICT.

Legacy block layer filename syntax would need extra parsing, or we
can just not bother and say: if you want new features, use blockdev.

Overall this is impressively simple.

It feels like it obsoletes the multifd migration code, at least if
you assume a Linux platform and a new enough kernel?

Except TLS... We already bottleneck on TLS encryption with a single
FD, since userspace encryption is limited to a single thread.

There is the KTLS feature which offloads TLS encryption/decryption
to the kernel. This benefits even regular single-FD performance,
because the encryption work can be done by the kernel in a separate
thread from the userspace IO syscalls.

Any idea if KTLS is fully compatible with MPTCP? If so, then that
would look like it makes it a full replacement for multifd on Linux.

Regards,
Daniel
On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
>
> In theory this is applicable anywhere that we use sockets. Anywhere
> that is configured with the QAPI SocketAddress / SocketAddressLegacy
> type will get it for free AFAICT.

The caveat is any servers which share the problem of prematurely
closing the listener socket that you fixed here for migration.

Regards,
Daniel
On Fri, 2021-04-09 at 10:34 +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > To use it you just need to append ,mptcp to an address;
> >
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
>
> What happens if you only enable the mptcp flag on one side of the
> stream (whether client or server)? Does it degrade to boring old
> single-path TCP, or does it result in an error?

If the MPTCP handshake fails for any reason - e.g. one side does not
ask for MPTCP - the connection falls back to plain TCP in a
transparent way.

> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single-FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
>
> Any idea if KTLS is fully compatible with MPTCP?

Ouch! So far it is not supported. Both KTLS and MPTCP use/need ULP
(Upper Layer Protocol, a kernel way of hijacking core TCP features),
and we can have only a single ULP per socket, so possibly there is
some technical show-stopper there.

At the very least it is not in our short-term roadmap, but I guess we
can update that based on user needs.

Thanks!

Paolo
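As a side note on the ULP clash described above (a hedged illustration drawn from general kernel behaviour, not from this thread): the TCP stack can attach at most one Upper Layer Protocol to a socket, and on kernels where the respective features are built in the registered ULPs can be listed via sysctl.

  # upper layer protocols the TCP stack can attach to a socket - only one
  # per socket; kTLS registers as "tls", and on MPTCP-enabled kernels the
  # list may also include "mptcp"
  sysctl net.ipv4.tcp_available_ulp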
On Fri, 2021-04-09 at 10:42 +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> >
> > In theory this is applicable anywhere that we use sockets.
> > Anywhere that is configured with the QAPI SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
>
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

For the record, there is an alternative to that, based on a more
advanced and complex MPTCP configuration available only on even more
recent kernels.

MPTCP can be configured to accept additional subflows on a different
listener, which will be managed (created and disposed of) by the
kernel with no additional user-space changes (beyond the MPTCP
configuration). That will also require a suitable firewalld
configuration (if enabled), keeping the additional port
open/accessible from the client.

Finally, such a configuration can be even more complex: e.g. the
additional listener could alternatively be configured on the client
side (!!!) and the server could be configured to create additional
subflows connecting to such a port (again, no user-space changes
needed, "only" a more complex MPTCP configuration).

Cheers,

Paolo
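A rough sketch of the kind of configuration described above, using the iproute2 "ip mptcp" interface to drive the in-kernel path manager; the addresses, device name, port and limits below are made-up examples, and the exact syntax depends on the iproute2 and kernel versions.

  # destination (server) host: raise the path-manager limits and advertise a
  # second address + port; the kernel itself listens there and accepts the
  # extra subflows, with no changes needed in QEMU
  ip mptcp limits set subflow 2 add_addr_accepted 2
  ip mptcp endpoint add 192.168.12.20 dev eth1 port 5555 signal

  # if firewalld is enabled, the advertised port must be reachable too
  firewall-cmd --add-port=5555/tcp

  # source (client) host: allow it to accept the advertisement and open the
  # additional subflow
  ip mptcp limits set subflow 2 add_addr_accepted 2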
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> >
> > In theory this is applicable anywhere that we use sockets.
> > Anywhere that is configured with the QAPI SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
>
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

Right, this varies depending on the server semantics; migration is
only expecting a single connection, so it shuts the listener
immediately; NBD is already wired to expect multiple connections.

Dave

> Regards,
> Daniel
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > To use it you just need to append ,mptcp to an address;
> >
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
>
> What happens if you only enable the mptcp flag on one side of the
> stream (whether client or server)? Does it degrade to boring old
> single-path TCP, or does it result in an error?

I've just tested this and it matches what pabeni said; it seems to
just fall back.

> > I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
>
> In theory this is applicable anywhere that we use sockets. Anywhere
> that is configured with the QAPI SocketAddress / SocketAddressLegacy
> type will get it for free AFAICT.

That was my hope.

> Anywhere that is configured via QemuOpts will need an enhancement.
>
> IOW, I would think NBD already works if you configure NBD via QMP
> with nbd-server-start or block-export-add. qemu-nbd will need CLI
> options added.
>
> The block layer clients for NBD, Gluster, Sheepdog and SSH also all
> get it for free when configured via QMP or -blockdev AFAICT.

Have you got some examples via QMP?
I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

> Legacy block layer filename syntax would need extra parsing, or we
> can just not bother and say: if you want new features, use blockdev.
>
> Overall this is impressively simple.

Yeh; lots of small unexpected tidyups that took a while to fix.

> It feels like it obsoletes the multifd migration code, at least if
> you assume a Linux platform and a new enough kernel?
>
> Except TLS... We already bottleneck on TLS encryption with a single
> FD, since userspace encryption is limited to a single thread.

Even without TLS we already run out of CPU, probably on the receiving
thread, at around 20Gbps; which is a bit meh compared to multifd,
which I have seen hit 80Gbps on a particularly well-greased 100Gbps
connection.

Curiously, my attempts with multifd+mptcp so far have it being slower
than with just mptcp on its own, not hitting the 20Gbps - not sure
why yet.

> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single-FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
>
> Any idea if KTLS is fully compatible with MPTCP? If so, then that
> would look like it makes it a full replacement for multifd on Linux.

I've not tried kTLS at all yet; as pabeni says, it's not currently
compatible. The other ones I'd like to try are zero-copy offload
receive/transmit (again, I'm not sure those are compatible).
Dave

> Regards,
> Daniel
On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > Anywhere that is configured via QemuOpts will need an enhancement.
> >
> > IOW, I would think NBD already works if you configure NBD via QMP
> > with nbd-server-start or block-export-add. qemu-nbd will need CLI
> > options added.
> >
> > The block layer clients for NBD, Gluster, Sheepdog and SSH also all
> > get it for free when configured via QMP or -blockdev AFAICT.
>
> Have you got some examples via QMP?
> I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

I never remember the mapping to the blockdev QAPI schema, especially
when using legacy filename syntax with the URI.

Try instead:

  -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
  -device virtio-blk,drive=disk0backend,id=disk0

Regards,
Daniel
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> > Have you got some examples via QMP?
> > I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero
>
> I never remember the mapping to the blockdev QAPI schema, especially
> when using legacy filename syntax with the URI.
>
> Try instead:
>
>   -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
>   -device virtio-blk,drive=disk0backend,id=disk0

That doesn't look like the right syntax, but it got me closer; and
it's working with no more code changes.

On the source:

  qemu... -nographic -M none -drive if=none,file=my.qcow2,id=mydisk
  (qemu) nbd_server_start 0.0.0.0:3333,mptcp=on
  (qemu) nbd_server_add -w mydisk

On the destination:

  -blockdev driver=nbd,server.type=inet,server.host=192.168.11.20,server.port=3333,server.mptcp=on,node-name=nbddisk,export=mydisk
  -device virtio-blk,drive=nbddisk,id=disk0

and it successfully booted off it, and it looks like it has two
flows. (It didn't get that great a bandwidth, but I'm not sure what
that's due to.)

Dave

> Regards,
> Daniel
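A hedged way to double-check from the shell that a connection like the NBD one above really negotiated MPTCP and has opened extra subflows (assumes a reasonably recent iproute2; output format and counter names vary by kernel version):

  # MPTCP connections show up as their own socket type
  ss -M

  # the individual subflows are ordinary TCP sockets underneath
  ss -tn dst 192.168.11.20

  # kernel-wide MPTCP counters, e.g. MP_JOIN handshakes for extra subflows
  nstat -az | grep -i mptcp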
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Hi, This RFC set adds support for multipath TCP (mptcp), in particular on the migration path - but should be extensible to other users. Multipath-tcp is a bit like bonding, but at L3; you can use it to handle failure, but can also use it to split traffic across multiple interfaces. Using a pair of 10Gb interfaces, I've managed to get 19Gbps (with the only tuning being using huge pages and turning the MTU up). It needs a bleeding-edge Linux kernel (in some older ones you get false accept messages for the subflows), and a C lib that has the constants defined (as current glibc does). To use it you just need to append ,mptcp to an address; -incoming tcp:0:4444,mptcp migrate -d tcp:192.168.11.20:4444,mptcp I had a quick go at trying NBD as well, but I think it needs some work with the parsing of NBD addresses. All comments welcome. Dave Dr. David Alan Gilbert (5): channel-socket: Only set CLOEXEC if we have space for fds io/net-listener: Call the notifier during finalize migration: Add cleanup hook for inwards migration migration/socket: Close the listener at the end sockets: Support multipath TCP io/channel-socket.c | 8 ++++---- io/dns-resolver.c | 2 ++ io/net-listener.c | 3 +++ migration/migration.c | 3 +++ migration/migration.h | 4 ++++ migration/multifd.c | 5 +++++ migration/socket.c | 24 ++++++++++++++++++------ qapi/sockets.json | 5 ++++- util/qemu-sockets.c | 34 ++++++++++++++++++++++++++++++++++ 9 files changed, 77 insertions(+), 11 deletions(-)