[RFC,net-next,00/27] net and/or udp optimisations

Message ID cover.1648981570.git.asml.silence@gmail.com (mailing list archive)

Message

Pavel Begunkov April 3, 2022, 1:06 p.m. UTC
A mix of various net optimisations, which were mostly discovered during UDP
testing. Benchmarked with an io_uring test using 16B UDP/IPv6 over dummy netdev:
2090K vs 2229K tx/s, +6.6%, or in a 4-8% range if not averaging across reboots.
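
As a quick sanity check of the headline figure, the relative gain can be recomputed from the two throughput numbers. This is only an illustrative sketch; gain_percent() is a made-up helper name and is not part of the series.

```c
#include <stdio.h>

/* Relative gain of `after` over `before`, expressed in percent. */
static double gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print a one-line summary for a pair of tx/s figures (in K tx/s). */
static void print_gain(double before, double after)
{
	printf("%.0fK -> %.0fK tx/s: %+.2f%%\n",
	       before, after, gain_percent(before, after));
}
```

Calling print_gain(2090.0, 2229.0) gives roughly +6.65%, in line with the +6.6% quoted above.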

1-3 remove extra atomics and barriers from sock_wfree(), mainly benefitting UDP.
4-7 clean up some zerocopy helpers.
8-16 inline ipv6 and generic net paths.
17 is a small but nice performance improvement for TCP zerocopy.
18-27 refactor UDP to shed some more overhead.

Pavel Begunkov (27):
  sock: deduplicate ->sk_wmem_alloc check
  sock: optimise sock_def_write_space send refcounting
  sock: optimise sock_def_write_space barriers
  skbuff: drop zero check from skb_zcopy_set
  skbuff: drop null check from skb_zcopy
  net: xen: set zc flags only when there is ubuf
  skbuff: introduce skb_is_zcopy()
  skbuff: optimise alloc_skb_with_frags()
  net: inline sock_alloc_send_skb
  net: inline part of skb_csum_hwoffload_help
  net: inline skb_zerocopy_iter_dgram
  ipv6: inline ip6_local_out()
  ipv6: help __ip6_finish_output() inlining
  ipv6: refactor ip6_finish_output2()
  net: inline dev_queue_xmit()
  ipv6: partially inline fl6_update_dst()
  tcp: optimise skb_zerocopy_iter_stream()
  net: optimise ipcm6 cookie init
  udp/ipv6: refactor udpv6_sendmsg udplite checks
  udp/ipv6: move pending section of udpv6_sendmsg
  udp/ipv6: prioritise the ip6 path over ip4 checks
  udp/ipv6: optimise udpv6_sendmsg() daddr checks
  udp/ipv6: optimise out daddr reassignment
  udp/ipv6: clean up udpv6_sendmsg's saddr init
  ipv6: refactor opts push in __ip6_make_skb()
  ipv6: improve opt-less __ip6_make_skb()
  ipv6: clean up ip6_setup_cork

 drivers/net/xen-netback/interface.c |   3 +-
 include/linux/netdevice.h           |  27 ++++-
 include/linux/skbuff.h              | 102 +++++++++++++-----
 include/net/ipv6.h                  |  37 ++++---
 include/net/sock.h                  |  10 +-
 net/core/datagram.c                 |   2 -
 net/core/datagram.h                 |  15 ---
 net/core/dev.c                      |  28 ++---
 net/core/skbuff.c                   |  59 ++++-------
 net/core/sock.c                     |  50 +++++++--
 net/ipv4/ip_output.c                |  10 +-
 net/ipv4/tcp.c                      |   5 +-
 net/ipv6/datagram.c                 |   4 +-
 net/ipv6/exthdrs.c                  |  15 ++-
 net/ipv6/ip6_output.c               |  88 ++++++++--------
 net/ipv6/output_core.c              |  12 ---
 net/ipv6/raw.c                      |   8 +-
 net/ipv6/udp.c                      | 158 +++++++++++++---------------
 net/l2tp/l2tp_ip6.c                 |   8 +-
 19 files changed, 339 insertions(+), 302 deletions(-)
 delete mode 100644 net/core/datagram.h

Comments

Eric Dumazet April 6, 2022, 9:44 a.m. UTC | #1
On Sun, Apr 3, 2022 at 6:08 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> A mix of various net optimisations, which were mostly discovered during UDP
> testing. Benchmarked with an io_uring test using 16B UDP/IPv6 over dummy netdev:
> 2090K vs 2229K tx/s, +6.6%, or in a 4-8% range if not averaging across reboots.
>
> 1-3 remove extra atomics and barriers from sock_wfree(), mainly benefitting UDP.
> 4-7 clean up some zerocopy helpers.
> 8-16 inline ipv6 and generic net paths.
> 17 is a small but nice performance improvement for TCP zerocopy.
> 18-27 refactor UDP to shed some more overhead.
>

Please send a smaller series first.

About inlining everything around: make sure to include performance
numbers for the inlining parts alone.
We can inline everything and make the kernel 4x bigger.
Synthetic benchmarks will still show improvements, but overall we
add icache cost that is going to hurt latencies.
I vote that you focus on the other parts first.

Thank you.

Pavel Begunkov April 11, 2022, 12:04 p.m. UTC | #2
On 4/6/22 10:44, Eric Dumazet wrote:
> On Sun, Apr 3, 2022 at 6:08 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> A mix of various net optimisations, which were mostly discovered during UDP
>> testing. Benchmarked with an io_uring test using 16B UDP/IPv6 over dummy netdev:
>> 2090K vs 2229K tx/s, +6.6%, or in a 4-8% range if not averaging across reboots.
>>
>> 1-3 remove extra atomics and barriers from sock_wfree(), mainly benefitting UDP.
>> 4-7 clean up some zerocopy helpers.
>> 8-16 inline ipv6 and generic net paths.
>> 17 is a small but nice performance improvement for TCP zerocopy.
>> 18-27 refactor UDP to shed some more overhead.
>>

> Please send a smaller series first.

Apologies for the delay. OK, I'll split it.

> About inlining everything around: make sure to include performance
> numbers for the inlining parts alone.
> We can inline everything and make the kernel 4x bigger.
> Synthetic benchmarks will still show improvements, but overall we
> add icache cost that is going to hurt latencies.

I do care about kernel bloat, but I think we can agree that inlining is
safe for most of these patches. There are 6 such patches (9-12, 15, 16).
Three of them (9, 11, 15) only redirect to another function, and
skb_csum_hwoffload_help() in 10 has only two callers, so those four
should be safe to inline.

That leaves ip6_local_out(), which has ~8 callers and is used quite
heavily, and fl6_update_dst(), which has ~12 users. I don't have exact
data, but it appears that not everybody uses ip6 options, in which case
the function does nothing. At least that's true for the UDP cases I care
about, so I think inlining is justified. Would you prefer these two to
be removed?
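
To make the "partially inline" idea concrete, here is a minimal userspace sketch of the pattern: a tiny inline wrapper handles the common no-options case and defers everything else to an out-of-line helper. It is only loosely modelled on the behaviour described above. All names (demo_opts, demo_flow, demo_update_dst and so on) are hypothetical and are not the code from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's real structures. */
struct demo_opts {
	int has_srcrt;   /* nonzero if a routing header is present */
	int first_hop;   /* address to send to instead of the final one */
};

struct demo_flow {
	int daddr;       /* destination address, modelled as an int */
};

/* Counts slow-path entries so the effect of the fast path is observable. */
static int demo_slow_calls;

/* Slow path: in a real tree this would live out of line in a .c file. */
static int *demo_update_dst_slow(struct demo_flow *fl,
				 const struct demo_opts *opt, int *orig)
{
	demo_slow_calls++;
	if (!opt->has_srcrt)
		return NULL;
	*orig = fl->daddr;          /* remember the final destination */
	fl->daddr = opt->first_hop; /* route via the first hop */
	return orig;
}

/* Fast path: callers without options pay only for a NULL test. */
static inline int *demo_update_dst(struct demo_flow *fl,
				   const struct demo_opts *opt, int *orig)
{
	if (!opt)
		return NULL;
	return demo_update_dst_slow(fl, opt, orig);
}
```

With this split, each call site grows by a single compare-and-branch rather than by the whole function body, which is the trade-off against icache cost being discussed.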


> I vote that you focus on the other parts first.
> 
> Thank you.