Message ID | cover.1712923998.git.asml.silence@gmail.com |
---|---|
Series | implement io_uring notification (ubuf_info) stacking |
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
> for the userspace but with how things are it means that every time the
> TCP stack has to allocate a new skb instead of amending into a previous
> one. Unless sends are big enough, there will be lots of small skbs
> straining the stack and dipping performance.
>
> The patchset implements notification, i.e. an io_uring's ubuf_info
> extension, stacking. It tries to link ubuf_info's into a list, and
> the entire link will be put down together once all references are
> gone.

Excellent! I'll take a closer look, but I ran a quick test with my test
tool just to see the difference. This is on a 100G link.

Packet size    Before (Mbit)    After (Mbit)    Diff
====================================================
100            290              1250            4.3x
200            560              2460            4.4x
400            1190             4900            4.1x
800            2300             9700            4.2x
1600           4500             19100           4.2x
3200           8900             35000           3.9x

These are just rough numbers and the tool isn't that great, but
definitely encouraging. And it does have parity with sync MSG_ZEROCOPY,
which is what I was really bugged about before.
Reviewed the patch set, and I think this is nice and clean and the right
fix. For the series:
Reviewed-by: Jens Axboe <axboe@kernel.dk>
If the net people agree, we'll have to coordinate staging of the first
two patches.
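To picture what "stacking" means here: each zerocopy send still gets its
own notification, but consecutive notifications are linked so the stack
can keep appending data to the same skb and complete the whole chain
when the last network reference drops. A rough sketch of the idea
(io_notif and io_notif_chain_complete() are illustrative names, not the
actual patch):

    #include <linux/skbuff.h>

    /* Illustrative sketch of notification stacking; the struct and
     * function names here are made up for this example.
     */
    struct io_notif {
    	struct ubuf_info	uarg;	/* refcounted by the net stack */
    	struct io_notif		*next;	/* previously stacked notif */
    };

    /* Runs once the net stack drops the last reference on the head
     * uarg: the whole chain is completed together.
     */
    static void io_notif_chain_complete(struct ubuf_info *uarg)
    {
    	struct io_notif *notif = container_of(uarg, struct io_notif, uarg);

    	while (notif) {
    		struct io_notif *next = notif->next;

    		/* post this notification's CQE and free it */
    		notif = next;
    	}
    }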
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
> for the userspace but with how things are it means that every time the
> TCP stack has to allocate a new skb instead of amending into a previous
> one. Unless sends are big enough, there will be lots of small skbs
> straining the stack and dipping performance.

The ubuf_info forces TCP segmentation at less than MTU boundaries which
kills performance with small message sizes as TCP is forced to send
small packets. This is an interesting solution to allow the byte stream
to flow yet maintain the segmentation boundaries for callbacks.

> The patchset implements notification, i.e. an io_uring's ubuf_info
> extension, stacking. It tries to link ubuf_info's into a list, and
> the entire link will be put down together once all references are
> gone.
>
> Testing with liburing/examples/send-zerocopy and another custom made
> tool, with 4K bytes per send it improves performance ~6 times and
> levels it with MSG_ZEROCOPY. Without the patchset it requires much
> larger sends to utilise all potential.
>
> bytes | before | after (Kqps)
> 100   | 283    | 936
> 1200  | 195    | 1023
> 4000  | 193    | 1386
> 8000  | 154    | 1058
>
> Pavel Begunkov (6):
>   net: extend ubuf_info callback to ops structure
>   net: add callback for setting a ubuf_info to skb
>   io_uring/notif: refactor io_tx_ubuf_complete()
>   io_uring/notif: remove ctx var from io_notif_tw_complete
>   io_uring/notif: simplify io_notif_flush()
>   io_uring/notif: implement notification stacking
>
>  drivers/net/tap.c      |  2 +-
>  drivers/net/tun.c      |  2 +-
>  drivers/vhost/net.c    |  8 +++-
>  include/linux/skbuff.h | 21 ++++++----
>  io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
>  io_uring/notif.h       | 13 +++---
>  net/core/skbuff.c      | 37 +++++++++++------
>  7 files changed, 129 insertions(+), 45 deletions(-)
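The limitation David describes corresponds to a check in the net core:
an skb can point to only one ubuf_info, so data tied to a different
notification cannot be appended and TCP must start a new skb. Roughly
(paraphrased from skb_zerocopy_iter_stream() in net/core/skbuff.c, not
verbatim; can_append_zc() is a made-up helper name):

    #include <linux/skbuff.h>

    /* Paraphrased from skb_zerocopy_iter_stream(): if the skb already
     * carries a different ubuf_info, the append fails and the caller
     * allocates a fresh skb -- the small-packet behaviour measured
     * above.
     */
    static int can_append_zc(struct sk_buff *skb, struct ubuf_info *uarg)
    {
    	struct ubuf_info *orig_uarg = skb_zcopy(skb);

    	if (orig_uarg && uarg != orig_uarg)
    		return -EEXIST;	/* two notifications, must start a new skb */
    	return 0;
    }

The callback added in the second patch gives the provider a chance to
link the new notification onto the skb's existing one instead of
bailing out here.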
On 4/13/24 18:17, David Ahern wrote:
> On 4/12/24 6:55 AM, Pavel Begunkov wrote:
>> io_uring allocates a ubuf_info per zerocopy send request, it's convenient
>> for the userspace but with how things are it means that every time the
>> TCP stack has to allocate a new skb instead of amending into a previous
>> one. Unless sends are big enough, there will be lots of small skbs
>> straining the stack and dipping performance.
>
> The ubuf_info forces TCP segmentation at less than MTU boundaries which
> kills performance with small message sizes as TCP is forced to send
> small packets. This is an interesting solution to allow the byte stream
> to flow yet maintain the segmentation boundaries for callbacks.

Thanks, I'll add your reviews if the patches survive in the current form!
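For context, the userspace side being measured is the plain io_uring
zerocopy send loop, roughly as below (a minimal liburing sketch, error
handling omitted; send_zc() is a hypothetical wrapper, not from the
liburing examples):

    #include <liburing.h>

    /* Each zerocopy send completes in two steps: a send CQE flagged
     * IORING_CQE_F_MORE, then a notification CQE flagged
     * IORING_CQE_F_NOTIF once the kernel no longer references buf.
     */
    static void send_zc(struct io_uring *ring, int sockfd,
    		    const void *buf, size_t len)
    {
    	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    	struct io_uring_cqe *cqe;

    	io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
    	io_uring_submit(ring);

    	/* first CQE: number of bytes sent, IORING_CQE_F_MORE set */
    	io_uring_wait_cqe(ring, &cqe);
    	io_uring_cqe_seen(ring, cqe);

    	/* second CQE: IORING_CQE_F_NOTIF, buf may now be reused */
    	io_uring_wait_cqe(ring, &cqe);
    	io_uring_cqe_seen(ring, cqe);
    }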