[RFC,v1,0/9] zero-copy RX for io_uring

Message ID 20221007211713.170714-1-jonathan.lemon@gmail.com (mailing list archive)

Message

Jonathan Lemon Oct. 7, 2022, 9:17 p.m. UTC
This series is an RFC for io_uring/zctap.  This is an evolution of
the earlier zctap work, re-targeted to use io_uring as the userspace
API.  The current code is intended to provide a zero-copy RX path for
upper-level networking protocols (i.e. TCP and UDP).  The current draft
focuses on host-provided memory (not GPU memory).

This RFC contains the upper-level core code required for operation,
with the intent of soliciting feedback on the general API.  This does
not contain the network driver side changes required for complete
operation.  Also please note that as an RFC, there are some things
which are incomplete or in need of rework.

The intent is to use a network driver which provides header/data
splitting, so the frame header (which is processed by the networking
stack) does not reside in user memory.

The code is roughly working (it has successfully received a TCP
stream from a remote sender), but as an RFC, the intent is to
solicit feedback on the API and overall design.  The current code
also works with system pages, copying the data out to the
application; this is intended as a fallback/testing path.

High level description:

The application allocates a frame backing store, and provides this
to the kernel for use.  An interface queue is requested from the
networking device, and incoming frames are deposited into the provided
memory region.
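
As an illustration of the first step only, here is a minimal userspace
sketch of allocating a page-aligned frame backing store.  It is not
taken from the series: struct frame_region and the frame_region_*()
helpers are hypothetical names, and the assumption that each frame
slot covers whole pages is mine.  Handing the region to the kernel is
deliberately left out, since that uAPI is defined by the patches
themselves.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for the backing store; not part of the series. */
struct frame_region {
    void  *base;        /* page-aligned start of the region */
    size_t len;         /* total length in bytes */
    size_t frame_size;  /* bytes per frame slot */
    size_t nframes;     /* number of frame slots */
};

/* Allocate nframes slots of frame_size bytes.  Returns 0 or -1. */
static int frame_region_alloc(struct frame_region *r,
                              size_t nframes, size_t frame_size)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || nframes == 0 || frame_size == 0)
        return -1;
    /* Assumption: a frame slot always spans whole pages. */
    if (frame_size % (size_t)page != 0)
        return -1;
    if (nframes > SIZE_MAX / frame_size)
        return -1;  /* nframes * frame_size would overflow */

    size_t len = nframes * frame_size;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    r->base = p;
    r->len = len;
    r->frame_size = frame_size;
    r->nframes = nframes;
    return 0;
}

/* Address of frame slot idx inside the region. */
static void *frame_region_addr(const struct frame_region *r, size_t idx)
{
    assert(idx < r->nframes);
    return (char *)r->base + idx * r->frame_size;
}

/* Unmap the region and clear the descriptor. */
static void frame_region_free(struct frame_region *r)
{
    if (r->base)
        munmap(r->base, r->len);
    r->base = NULL;
    r->len = 0;
    r->nframes = 0;
}
```

An anonymous mmap() is used here because it always returns page-aligned
memory; a real application might prefer huge pages or another
allocator.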

Responsibility for correctly steering incoming frames to the queue
is outside the scope of this work - it is assumed that the user
has set steering rules up separately.

Incoming frames are sent up the stack as skbs and eventually
land in the application's socket receive queue.  This differs
from AF_XDP, which delivers raw frames directly to userspace,
without protocol processing.

The RECV_ZC opcode then returns an iov[]-style vector which points
to the data in userspace memory.  When the application has finished
processing the data, the buffer is returned to the kernel through a
fill ring for reuse.
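
To make the consume-then-recycle flow above concrete, here is a small
single-threaded model in C.  It is only a sketch under my own
assumptions: struct zc_seg, struct fill_ring, FILL_ENTRIES and the
helpers are hypothetical and do not reflect the layout used by the
patches.  A ring actually shared with the kernel would also need
atomic head/tail updates with the appropriate memory ordering, which
this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one received segment, as offset/length into the region. */
struct zc_seg {
    uint32_t off;
    uint32_t len;
};

#define FILL_ENTRIES 8u  /* must be a power of two */

/* Hypothetical single-threaded model of a fill ring. */
struct fill_ring {
    uint32_t head;                 /* next slot the consumer reads */
    uint32_t tail;                 /* next slot the producer writes */
    uint32_t slots[FILL_ENTRIES];  /* buffer offsets being returned */
};

/* Producer side: queue a buffer offset for reuse.  False if full. */
static bool fill_ring_push(struct fill_ring *fr, uint32_t off)
{
    if (fr->tail - fr->head == FILL_ENTRIES)
        return false;
    fr->slots[fr->tail & (FILL_ENTRIES - 1)] = off;
    fr->tail++;
    return true;
}

/* Consumer side (the kernel, in the real design).  False if empty. */
static bool fill_ring_pop(struct fill_ring *fr, uint32_t *off)
{
    if (fr->head == fr->tail)
        return false;
    *off = fr->slots[fr->head & (FILL_ENTRIES - 1)];
    fr->head++;
    return true;
}

/*
 * Process each segment in place (here: add its bytes to *sum), then
 * return its buffer through the fill ring.  Stops early on an
 * out-of-range segment or a full ring.  Returns payload bytes consumed.
 */
static size_t consume_and_recycle(const uint8_t *region, size_t region_len,
                                  const struct zc_seg *segs, size_t nsegs,
                                  struct fill_ring *fr, uint64_t *sum)
{
    size_t done = 0;

    for (size_t i = 0; i < nsegs; i++) {
        size_t end = (size_t)segs[i].off + segs[i].len;

        if (end > region_len)
            break;  /* segment does not fit inside the region */
        if (fr->tail - fr->head == FILL_ENTRIES)
            break;  /* no room to hand the buffer back yet */

        for (size_t j = segs[i].off; j < end; j++)
            *sum += region[j];

        bool ok = fill_ring_push(fr, segs[i].off);
        assert(ok);
        (void)ok;
        done += segs[i].len;
    }
    return done;
}
```

The point of the model is the ordering: the data is read directly from
the shared region without a copy, and the buffer only becomes
available to the producer of new frames after the application has
pushed it onto the fill ring.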

Jonathan Lemon (9):
  io_uring: add zctap ifq definition
  netdevice: add SETUP_ZCTAP to the netdev_bpf structure
  io_uring: add register ifq opcode
  io_uring: add provide_ifq_region opcode
  io_uring: Add io_uring zctap iov structure and helpers
  io_uring: introduce reference tracking for user pages.
  page_pool: add page allocation and free hooks.
  io_uring: provide functions for the page_pool.
  io_uring: add OP_RECV_ZC command.

 include/linux/io_uring.h       |  24 ++
 include/linux/io_uring_types.h |  10 +
 include/linux/netdevice.h      |   6 +
 include/net/page_pool.h        |   6 +
 include/uapi/linux/io_uring.h  |  26 ++
 io_uring/Makefile              |   3 +-
 io_uring/io_uring.c            |  10 +
 io_uring/kbuf.c                |  13 +
 io_uring/kbuf.h                |   2 +
 io_uring/net.c                 | 123 ++++++
 io_uring/opdef.c               |  23 +
 io_uring/zctap.c               | 749 +++++++++++++++++++++++++++++++++
 io_uring/zctap.h               |  20 +
 net/core/page_pool.c           |  41 +-
 14 files changed, 1048 insertions(+), 8 deletions(-)
 create mode 100644 io_uring/zctap.c
 create mode 100644 io_uring/zctap.h

Comments

Dust Li Oct. 10, 2022, 7:37 a.m. UTC | #1
On Fri, Oct 07, 2022 at 02:17:04PM -0700, Jonathan Lemon wrote:
>The RECV_ZC opcode then returns an iov[] style vector which points
>to the data in userspace memory.  When the application has completed
>processing of the data, the buffer is returned back to the kernel
>through a fill ring for reuse.

Interesting work! Is there a userspace demo or any performance data?

Jonathan Lemon Oct. 10, 2022, 7:34 p.m. UTC | #2
On 10/10/22 12:37 AM, dust.li wrote:
> On Fri, Oct 07, 2022 at 02:17:04PM -0700, Jonathan Lemon wrote:
>> The RECV_ZC opcode then returns an iov[] style vector which points
>> to the data in userspace memory.  When the application has completed
>> processing of the data, the buffer is returned back to the kernel
>> through a fill ring for reuse.
> 
> Interesting work ! Any userspace demo and performance data ?

Coming soon!  I'm hoping to get feedback on the overall API, though.
Did you have any thoughts here?