[GIT,PULL] io_uring support for zerocopy send

Message ID: d5568318-39ea-0c39-c765-852411409b68@kernel.dk
State: New
Headers:
  Return-Path: <io-uring-owner@kernel.org>
  Message-ID: <d5568318-39ea-0c39-c765-852411409b68@kernel.dk>
  Date: Sun, 31 Jul 2022 09:03:36 -0600
  MIME-Version: 1.0
  User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0
  From: Jens Axboe <axboe@kernel.dk>
  Subject: [GIT PULL] io_uring support for zerocopy send
  To: Linus Torvalds <torvalds@linux-foundation.org>
  Cc: io-uring <io-uring@vger.kernel.org>, netdev <netdev@vger.kernel.org>
  Content-Language: en-US
  Content-Type: text/plain; charset=UTF-8
  Content-Transfer-Encoding: 7bit
  Precedence: bulk
Series: [GIT,PULL] io_uring support for zerocopy send

Jens Axboe July 31, 2022, 3:03 p.m. UTC

Hi Linus,

On top of the core io_uring changes, this pull request adds efficient
support for zerocopy sends through io_uring. Both ipv4 and ipv6 are
supported, as well as both TCP and UDP.

The core network changes that support this are in a stable branch from
Jakub that both io_uring and net-next have pulled in, and the io_uring
changes are layered on top of that.

All of the work has been done by Pavel.

Please pull!


The following changes since commit f6b543fd03d347e8bf245cee4f2d54eb6ffd8fcb:

  io_uring: ensure REQ_F_ISREG is set async offload (2022-07-24 18:39:18 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux-block.git tags/for-5.20/io_uring-zerocopy-send-2022-07-29

for you to fetch changes up to 14b146b688ad9593f5eee93d51a34d09a47e50b5:

  io_uring: notification completion optimisation (2022-07-27 08:50:50 -0600)

----------------------------------------------------------------
for-5.20/io_uring-zerocopy-send-2022-07-29

----------------------------------------------------------------
David Ahern (1):
      net: Allow custom iter handler in msghdr

Jens Axboe (2):
      Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into for-5.20/io_uring-zerocopy-send
      Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send

Pavel Begunkov (33):
      ipv4: avoid partial copy for zc
      ipv6: avoid partial copy for zc
      skbuff: don't mix ubuf_info from different sources
      skbuff: add SKBFL_DONT_ORPHAN flag
      skbuff: carry external ubuf_info in msghdr
      net: introduce managed frags infrastructure
      net: introduce __skb_fill_page_desc_noacc
      ipv4/udp: support externally provided ubufs
      ipv6/udp: support externally provided ubufs
      tcp: support externally provided ubufs
      net: fix uninitialised msghdr->sg_from_iter
      io_uring: initialise msghdr::msg_ubuf
      io_uring: export io_put_task()
      io_uring: add zc notification infrastructure
      io_uring: cache struct io_notif
      io_uring: complete notifiers in tw
      io_uring: add rsrc referencing for notifiers
      io_uring: add notification slot registration
      io_uring: wire send zc request type
      io_uring: account locked pages for non-fixed zc
      io_uring: allow to pass addr into sendzc
      io_uring: sendzc with fixed buffers
      io_uring: flush notifiers after sendzc
      io_uring: rename IORING_OP_FILES_UPDATE
      io_uring: add zc notification flush requests
      io_uring: enable managed frags with register buffers
      selftests/io_uring: test zerocopy send
      io_uring/net: improve io_get_notif_slot types
      io_uring/net: checks errors of zc mem accounting
      io_uring/net: make page accounting more consistent
      io_uring/net: use unsigned for flags
      io_uring: export req alloc from core
      io_uring: notification completion optimisation

 include/linux/io_uring_types.h                     |  30 +
 include/linux/skbuff.h                             |  66 ++-
 include/linux/socket.h                             |   5 +
 include/uapi/linux/io_uring.h                      |  45 +-
 io_uring/Makefile                                  |   2 +-
 io_uring/io_uring.c                                |  61 +--
 io_uring/io_uring.h                                |  43 ++
 io_uring/net.c                                     | 193 ++++++-
 io_uring/net.h                                     |   3 +
 io_uring/notif.c                                   | 159 ++++++
 io_uring/notif.h                                   |  90 +++
 io_uring/opdef.c                                   |  24 +-
 io_uring/rsrc.c                                    |  67 ++-
 io_uring/rsrc.h                                    |  25 +-
 io_uring/tctx.h                                    |  26 -
 net/compat.c                                       |   1 +
 net/core/datagram.c                                |  14 +-
 net/core/skbuff.c                                  |  37 +-
 net/ipv4/ip_output.c                               |  50 +-
 net/ipv4/tcp.c                                     |  33 +-
 net/ipv6/ip6_output.c                              |  49 +-
 net/socket.c                                       |   2 +
 tools/testing/selftests/net/Makefile               |   1 +
 tools/testing/selftests/net/io_uring_zerocopy_tx.c | 605 +++++++++++++++++++++
 .../testing/selftests/net/io_uring_zerocopy_tx.sh  | 131 +++++
 25 files changed, 1604 insertions(+), 158 deletions(-)
 create mode 100644 io_uring/notif.c
 create mode 100644 io_uring/notif.h
 create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
 create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh
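Zero-copy send changes the buffer lifetime rules, which is what the notification commits above ("add zc notification infrastructure", "add notification slot registration") deal with: the kernel may still reference the user pages after the send itself has completed, so the application may only rewrite a buffer once a separate notification says the kernel is done with it. The sketch below models that rule in plain C under that assumption. All names in it (struct zc_pool, pool_acquire(), pool_on_send_result(), pool_on_notify()) are hypothetical; it is a user-space model, not the io_uring uAPI, and it makes no kernel calls.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 4

/* Hypothetical per-buffer state for a zero-copy sender. */
enum buf_state {
    BUF_FREE,      /* safe for the application to (re)write */
    BUF_SUBMITTED, /* handed to a zero-copy send, no result yet */
    BUF_SENT,      /* send result seen, but the kernel may still hold the pages */
};

struct zc_pool {
    enum buf_state state[POOL_SIZE];
};

static void pool_init(struct zc_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->state[i] = BUF_FREE;
}

/* Claim a free buffer for the next send; -1 means every buffer is still
 * waiting for its release notification. */
static int pool_acquire(struct zc_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->state[i] == BUF_FREE) {
            p->state[i] = BUF_SUBMITTED;
            return i;
        }
    }
    return -1;
}

static bool valid_idx(int idx)
{
    return idx >= 0 && idx < POOL_SIZE;
}

/* The send itself finished. This deliberately does NOT free the buffer. */
static bool pool_on_send_result(struct zc_pool *p, int idx)
{
    if (!valid_idx(idx) || p->state[idx] != BUF_SUBMITTED)
        return false;
    p->state[idx] = BUF_SENT;
    return true;
}

/* Release notification: only now may the buffer be rewritten. Accepted in
 * either busy state, so the model does not depend on event ordering. */
static bool pool_on_notify(struct zc_pool *p, int idx)
{
    if (!valid_idx(idx) || p->state[idx] == BUF_FREE)
        return false;
    p->state[idx] = BUF_FREE;
    return true;
}
```

The design choice being illustrated is that a send result and a buffer release are two separate events: a sender that recycled buffers on the send result alone could overwrite pages the network stack is still transmitting from.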

Linus Torvalds Aug. 2, 2022, 8:45 p.m. UTC | #1

On Sun, Jul 31, 2022 at 8:03 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On top of the core io_uring changes, this pull request adds efficient
> support for zerocopy sends through io_uring. Both ipv4 and ipv6 are
> supported, as well as both TCP and UDP.

I've pulled this, but I would *really* have wanted to see real
performance numbers from real loads.

Zero-copy networking has decades of history (and very much not just in
Linux) of absolutely _wonderful_ benchmark numbers, but less-than-impressive
take-up on real loads.

A lot of the wonderful benchmark numbers are based on loads that
carefully don't touch the data on either the sender or receiver side,
and that get perfect behavior from a performance standpoint as a
result, but don't actually do anything remotely realistic in the
process.

Having data that never resides in the CPU caches, or having mappings
that are never written to and thus never take page faults are classic
examples of "look, benchmark numbers!".

Please?

           Linus
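The pattern described above can be made concrete with a loop that does touch the data. The following is a minimal, hypothetical sketch (it is not netbench or any harness referred to in this thread): the sender writes every byte of each payload before sending it, and the receiver reads every byte back by checksumming it. It assumes a local AF_UNIX socketpair() and ordinary copying send()/recv() rather than zero-copy, and it leaves out timing; the helper names fill_payload(), checksum() and run_touching_benchmark() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sender side: write every byte, so the payload really passes through the
 * sender's caches (and fresh pages take a write fault on first touch). */
static void fill_payload(uint8_t *buf, size_t len, uint32_t seq)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(seq * 31u + i);
}

/* Receiver side: read every byte, so the data is actually consumed. */
static uint32_t checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 33u + buf[i];
    return sum;
}

static int send_all(int fd, const uint8_t *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

static int recv_all(int fd, uint8_t *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Run `iters` round trips of `len` bytes over a local socketpair; returns 0
 * if every received payload matched what was sent, -1 otherwise. */
static int run_touching_benchmark(size_t len, uint32_t iters)
{
    uint8_t tx[4096], rx[4096];
    int sv[2];

    if (len > sizeof(tx) || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int rc = 0;
    for (uint32_t seq = 0; seq < iters && rc == 0; seq++) {
        fill_payload(tx, len, seq);          /* touch on the sender */
        uint32_t expect = checksum(tx, len);
        if (send_all(sv[0], tx, len) < 0 || recv_all(sv[1], rx, len) < 0 ||
            checksum(rx, len) != expect)     /* touch on the receiver */
            rc = -1;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

A zero-copy variant of such a loop would additionally have to wait for the buffer-release notification before calling fill_payload() on the same buffer again, which is one cost an untouched-buffer benchmark never exercises.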

pr-tracker-bot@kernel.org Aug. 2, 2022, 9:30 p.m. UTC | #2

The pull request you sent on Sun, 31 Jul 2022 09:03:36 -0600:

> git://git.kernel.dk/linux-block.git tags/for-5.20/io_uring-zerocopy-send-2022-07-29

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/42df1cbf6a4726934cc5dac12bf263aa73c49fa3

Thank you!

Jens Axboe Aug. 3, 2022, 4:39 p.m. UTC | #3

On 8/2/22 2:45 PM, Linus Torvalds wrote:
> On Sun, Jul 31, 2022 at 8:03 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On top of the core io_uring changes, this pull request adds efficient
>> support for zerocopy sends through io_uring. Both ipv4 and ipv6 are
>> supported, as well as both TCP and UDP.
> 
> I've pulled this, but I would *really* have wanted to see real
> performance numbers from real loads.
> 
> Zero-copy networking has decades of history (and very much not just in
> Linux) of absolutely _wonderful_ benchmark numbers, but less-than-impressive
> take-up on real loads.
> 
> A lot of the wonderful benchmark numbers are based on loads that
> carefully don't touch the data on either the sender or receiver side,
> and that get perfect behavior from a performance standpoint as a
> result, but don't actually do anything remotely realistic in the
> process.
> 
> Having data that never resides in the CPU caches, or having mappings
> that are never written to and thus never take page faults are classic
> examples of "look, benchmark numbers!".
> 
> Please?

That's a valid concern! One of the key points behind Pavel's work is
that we wanted to make zerocopy _actually_ work with smaller payloads. A
lot of the past work has been focused on (or only useful with) bigger
payloads, which lands it almost squarely in the realm of "looks good
on streamed benchmarks". If you look at the numbers Pavel posted, they're
definitely firmly in benchmark land, but I do think the goal of
breaking even with non-zero-copy sends at realistic payload sizes is the
real differentiator here.

For the io_uring network developments, Dylan wrote a benchmark that we
use to mimic things like Thrift. Yes, it's a benchmark, but it's meant to
model real world things, not just measure ping-pongs or streamed
bandwidth. It has actually helped drive several of the more recent
features, as well as things coming in the next release, and has been very
useful as a research vehicle for adding real io_uring support to Thrift.
The latter is why it was created in the first place, not to have Yet
Another Benchmark that can just spew meaningless numbers. Zero-copy is
being added there too, and we just talked about adding some more tweaks
to netbench that allow it to model data/cache usage on both ends as well.

The Thrift work is what is really driving this, but it isn't quite done
yet. It's looking very promising vs epoll now, though, and we'll make some
more noise about this once it lands. Moving to a completion-based model
takes a bit of time; it's not a quick hack conversion where you just
switch to a different notification base.

Linus Torvalds Aug. 3, 2022, 4:44 p.m. UTC | #4

On Wed, Aug 3, 2022 at 9:39 AM Jens Axboe <axboe@kernel.dk> wrote:
>
>      If you look at the numbers Pavel posted, they're
> definitely firmly in benchmark land, but I do think the goal of
> breaking even with non-zero-copy sends at realistic payload sizes is the
> real differentiator here.

Well, a big part of why I wrote the query email was exactly because I
haven't seen any numbers, and the pull request didn't have any links
to any.

So you say "the numbers Pavel posted" and I say "where?"

It would have been good to have had a link in the pull request (and
thus in the merge message).

               Linus

Jens Axboe Aug. 3, 2022, 4:47 p.m. UTC | #5

On 8/3/22 10:44 AM, Linus Torvalds wrote:
> On Wed, Aug 3, 2022 at 9:39 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>>      If you look at the numbers Pavel posted, they're
>> definitely firmly in benchmark land, but I do think the goal of
>> breaking even with non-zero-copy sends at realistic payload sizes is the
>> real differentiator here.
> 
> Well, a big part of why I wrote the query email was exactly because I
> haven't seen any numbers, and the pull request didn't have any links
> to any.
> 
> So you say "the numbers Pavel posted" and I say "where?"

I didn't think of that since it's in the git commit link, but I now
realize that there are about three series in there.

> It would have been good to have had a link in the pull request (and
> thus in the merge message).

Agreed, it should have been in there. Here's the one from the series that
got merged:

https://lore.kernel.org/all/cover.1657643355.git.asml.silence@gmail.com/
