
[RFC,0/6] implement io_uring notification (ubuf_info) stacking

Message ID: cover.1712923998.git.asml.silence@gmail.com

Pavel Begunkov April 12, 2024, 12:55 p.m. UTC
io_uring allocates a ubuf_info per zerocopy send request. That is convenient
for userspace, but as things stand it means the TCP stack has to allocate a
new skb for every send instead of appending into a previous one. Unless sends
are large enough, this produces lots of small skbs, straining the stack and
hurting performance.

The patchset implements stacking of notifications, i.e. of io_uring's
ubuf_info extension. It links ubuf_info's into a list, and the entire
chain is put down together once all references are gone.
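The mechanism can be sketched in plain userspace C. This is an illustrative model of the stacking described above, not the kernel code: `notif_chain`, `chain_get`/`chain_put` and the completion counter are made-up names standing in for ubuf_info refcounting and completion.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of notification stacking: notifications are linked
 * into a chain, the chain holds references for in-flight data, and the
 * whole chain is completed together once the last reference is gone.
 * All names here are illustrative, not the kernel API. */
static int completions;          /* how many notifications have fired */

struct notif {
    struct notif *next;          /* link to the previously stacked one */
};

struct notif_chain {
    int refs;                    /* references held by in-flight skbs */
    struct notif *head;
};

static void chain_add(struct notif_chain *c, struct notif *n)
{
    n->next = c->head;
    c->head = n;
}

static void chain_get(struct notif_chain *c)
{
    c->refs++;
}

static void chain_put(struct notif_chain *c)
{
    if (--c->refs)
        return;
    /* last reference dropped: put down the entire chain at once */
    while (c->head) {
        struct notif *next = c->head->next;
        completions++;           /* the kernel would post a CQE here */
        free(c->head);
        c->head = next;
    }
}
```

With this shape, many small sends can stack their notifications onto one chain instead of each carrying an independent completion, which is what lets the TCP stack keep amending one skb.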

Testing with liburing/examples/send-zerocopy and another custom-made
tool: with 4K bytes per send it improves performance ~6x and brings it
to parity with MSG_ZEROCOPY. Without the patchset, much larger sends are
required to utilise the full potential.

bytes  | before (Kqps) | after (Kqps)
100    | 283           | 936
1200   | 195           | 1023
4000   | 193           | 1386
8000   | 154           | 1058

Pavel Begunkov (6):
  net: extend ubuf_info callback to ops structure
  net: add callback for setting a ubuf_info to skb
  io_uring/notif: refactor io_tx_ubuf_complete()
  io_uring/notif: remove ctx var from io_notif_tw_complete
  io_uring/notif: simplify io_notif_flush()
  io_uring/notif: implement notification stacking
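The shape of the first two net patches can be illustrated with a minimal userspace sketch: a single callback function pointer is replaced by a pointer to an ops table, which leaves room for an extra hook that runs when a ubuf_info is set on an skb. The struct layout and field names below are assumptions for illustration, not the exact kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct ubuf_info;

/* Instead of embedding one callback function pointer in ubuf_info,
 * point to an ops table so a provider can supply more hooks. */
struct ubuf_info_ops {
    void (*complete)(struct ubuf_info *ubuf, int success);
    /* optional hook invoked when the ubuf_info is attached to an skb;
     * NULL means "use the default attach path" */
    int (*link_skb)(void *skb, struct ubuf_info *ubuf);
};

struct ubuf_info {
    const struct ubuf_info_ops *ops;
    int refcnt;
};

static int last_status = -1;

/* stand-in for a provider's completion callback */
static void demo_complete(struct ubuf_info *ubuf, int success)
{
    (void)ubuf;
    last_status = success;
}

static const struct ubuf_info_ops demo_ops = {
    .complete = demo_complete,
    .link_skb = NULL,
};
```

An ops table rather than a bare function pointer is the usual kernel idiom once a second operation is needed; here it is what gives io_uring a place to intercept skb attachment and do the stacking.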

 drivers/net/tap.c      |  2 +-
 drivers/net/tun.c      |  2 +-
 drivers/vhost/net.c    |  8 +++-
 include/linux/skbuff.h | 21 ++++++----
 io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
 io_uring/notif.h       | 13 +++---
 net/core/skbuff.c      | 37 +++++++++++------
 7 files changed, 129 insertions(+), 45 deletions(-)

Comments

Jens Axboe April 12, 2024, 1:44 p.m. UTC | #1
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
> for the userspace but with how things are it means that every time the 
> TCP stack has to allocate a new skb instead of amending into a previous
> one. Unless sends are big enough, there will be lots of small skbs
> straining the stack and dipping performance.
> 
> The patchset implements notification, i.e. an io_uring's ubuf_info
> extension, stacking. It tries to link ubuf_info's into a list, and
> the entire link will be put down together once all references are
> gone.

Excellent! I'll take a closer look, but I ran a quick test with my test
tool just to see the difference. This is on a 100G link.

Packet size	Before (Mbit)    After (Mbit)   Diff
====================================================
100		290		  1250		4.3x
200		560		  2460		4.4x
400		1190		  4900		4.1x
800		2300		  9700		4.2x
1600		4500		 19100		4.2x
3200		8900		 35000		3.9x

These are just rough numbers and the tool isn't that great, but they're
definitely encouraging. And it does have parity with sync MSG_ZEROCOPY,
which is what really bugged me before.
Jens Axboe April 12, 2024, 2:52 p.m. UTC | #2
Reviewed the patch set, and I think this is nice and clean and the right
fix. For the series:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

If the net people agree, we'll have to coordinate staging of the first
two patches.
David Ahern April 13, 2024, 5:17 p.m. UTC | #3
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
> for the userspace but with how things are it means that every time the 
> TCP stack has to allocate a new skb instead of amending into a previous
> one. Unless sends are big enough, there will be lots of small skbs
> straining the stack and dipping performance.

The ubuf_info forces TCP segmentation at less than MTU boundaries which
kills performance with small message sizes as TCP is forced to send
small packets. This is an interesting solution to allow the byte stream
to flow yet maintain the segmentation boundaries for callbacks.
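The coalescing constraint can be shown with a toy check: the stack may only append zerocopy data to an skb whose attached notification matches, so a fresh ubuf_info per request defeats appending. The names below are illustrative stand-ins, not the actual skb_zcopy helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the append decision: an skb carries at most one
 * zerocopy notification, so new data can only be coalesced if the
 * skb has no notification yet or already carries the same one. */
struct toy_uarg { int id; };

struct toy_skb {
    struct toy_uarg *uarg;   /* attached zerocopy notification */
    int len;
};

static int can_coalesce(const struct toy_skb *skb,
                        const struct toy_uarg *uarg)
{
    return skb->uarg == NULL || skb->uarg == uarg;
}
```

With one ubuf_info per request, consecutive small sends never pass this check against each other's notifications, forcing sub-MTU segments; stacking restores coalescing while keeping per-request completion boundaries.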

Pavel Begunkov April 15, 2024, 12:08 a.m. UTC | #4
On 4/13/24 18:17, David Ahern wrote:
> On 4/12/24 6:55 AM, Pavel Begunkov wrote:
>> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
>> for the userspace but with how things are it means that every time the
>> TCP stack has to allocate a new skb instead of amending into a previous
>> one. Unless sends are big enough, there will be lots of small skbs
>> straining the stack and dipping performance.
> 
> The ubuf_info forces TCP segmentation at less than MTU boundaries which
> kills performance with small message sizes as TCP is forced to send
> small packets. This is an interesting solution to allow the byte stream
> to flow yet maintain the segmentation boundaries for callbacks.

Thanks, I'll add your reviews if the patches survive in the
current form!

