Message ID | cover.1648981570.git.asml.silence@gmail.com |
---|---|
Series | net and/or udp optimisations |
On Sun, Apr 3, 2022 at 6:08 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> A mix of various net optimisations, which were mostly discovered during UDP
> testing. Benchmarked with an io_uring test using 16B UDP/IPv6 over dummy netdev:
> 2090K vs 2229K tx/s, +6.6%, or in a 4-8% range if not averaging across reboots.
>
> 1-3 removes extra atomics and barriers from sock_wfree() mainly benefitting UDP.
> 4-7 cleans up some zerocopy helpers
> 8-16 do inlining of ipv6 and generic net paths
> 17 is a small nice performance improvement for TCP zerocopy
> 18-27 refactors UDP to shed some more overhead
>

Please send a smaller series first.

About inlining everything around, make sure to include performance
numbers only for the inline parts.
We can inline everything and make the kernel 4x bigger.
Synthetic benchmarks will still show improvements, but overall we
add icache cost that is going to hurt latencies.

I vote that you focus on the other parts first.

Thank you.

> Pavel Begunkov (27):
>   sock: deduplicate ->sk_wmem_alloc check
>   sock: optimise sock_def_write_space send refcounting
>   sock: optimise sock_def_write_space barriers
>   skbuff: drop zero check from skb_zcopy_set
>   skbuff: drop null check from skb_zcopy
>   net: xen: set zc flags only when there is ubuf
>   skbuff: introduce skb_is_zcopy()
>   skbuff: optimise alloc_skb_with_frags()
>   net: inline sock_alloc_send_skb
>   net: inline part of skb_csum_hwoffload_help
>   net: inline skb_zerocopy_iter_dgram
>   ipv6: inline ip6_local_out()
>   ipv6: help __ip6_finish_output() inlining
>   ipv6: refactor ip6_finish_output2()
>   net: inline dev_queue_xmit()
>   ipv6: partially inline fl6_update_dst()
>   tcp: optimise skb_zerocopy_iter_stream()
>   net: optimise ipcm6 cookie init
>   udp/ipv6: refactor udpv6_sendmsg udplite checks
>   udp/ipv6: move pending section of udpv6_sendmsg
>   udp/ipv6: prioritise the ip6 path over ip4 checks
>   udp/ipv6: optimise udpv6_sendmsg() daddr checks
>   udp/ipv6: optimise out daddr reassignment
>   udp/ipv6: clean up udpv6_sendmsg's saddr init
>   ipv6: refactor opts push in __ip6_make_skb()
>   ipv6: improve opt-less __ip6_make_skb()
>   ipv6: clean up ip6_setup_cork
>
>  drivers/net/xen-netback/interface.c |   3 +-
>  include/linux/netdevice.h           |  27 ++++-
>  include/linux/skbuff.h              | 102 +++++++++++++-----
>  include/net/ipv6.h                  |  37 ++++---
>  include/net/sock.h                  |  10 +-
>  net/core/datagram.c                 |   2 -
>  net/core/datagram.h                 |  15 ---
>  net/core/dev.c                      |  28 ++---
>  net/core/skbuff.c                   |  59 ++++-------
>  net/core/sock.c                     |  50 +++++++--
>  net/ipv4/ip_output.c                |  10 +-
>  net/ipv4/tcp.c                      |   5 +-
>  net/ipv6/datagram.c                 |   4 +-
>  net/ipv6/exthdrs.c                  |  15 ++-
>  net/ipv6/ip6_output.c               |  88 ++++++++--------
>  net/ipv6/output_core.c              |  12 ---
>  net/ipv6/raw.c                      |   8 +-
>  net/ipv6/udp.c                      | 158 +++++++++++++---------------
>  net/l2tp/l2tp_ip6.c                 |   8 +-
>  19 files changed, 339 insertions(+), 302 deletions(-)
>  delete mode 100644 net/core/datagram.h
>
> --
> 2.35.1
>
On 4/6/22 10:44, Eric Dumazet wrote:
> On Sun, Apr 3, 2022 at 6:08 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> A mix of various net optimisations, which were mostly discovered during UDP
>> testing. Benchmarked with an io_uring test using 16B UDP/IPv6 over dummy netdev:
>> 2090K vs 2229K tx/s, +6.6%, or in a 4-8% range if not averaging across reboots.
>>
>> 1-3 removes extra atomics and barriers from sock_wfree() mainly benefitting UDP.
>> 4-7 cleans up some zerocopy helpers
>> 8-16 do inlining of ipv6 and generic net paths
>> 17 is a small nice performance improvement for TCP zerocopy
>> 18-27 refactors UDP to shed some more overhead
>>
> Please send a smaller series first.

Apologies for the delay. OK, I'll split it.

> About inlining everything around, make sure to include performance
> numbers only for the inline parts.
> We can inline everything and make the kernel 4x bigger.
> Synthetic benchmarks will still show improvements, but overall we
> add icache cost that is going to hurt latencies.

I do care about kernel bloat, but I think we can agree that inlining is
safe for most of these patches. There are 6 such patches (9-12, 15, 16).
Three of them (9, 11, 15) only redirect to another function, and
skb_csum_hwoffload_help() in 10 has only two callers. I think we can
agree that they're safe to inline.

That leaves ip6_local_out() (patch 12), which has ~8 callers and is used
quite heavily, and fl6_update_dst() (patch 16), which has ~12 users. I
don't have exact data, but it appears that not everybody uses ip6
options, in which case the function does nothing. At least that's true
for the UDP cases I care about. I think it's justified to inline them.
Would you prefer these two to be removed?

> I vote that you focus on the other parts first.
>
> Thank you.