Message ID | 20241231055207.9208-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: tcp: Define tcp_listendrop() as noinline | expand |
On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons, > such as: > - A full SYN backlog queue > - SYN flood attacks > - Memory exhaustion > - Other resource constraints > > Recently, one of our services experienced frequent ListenDrops. While > attempting to trace the root cause, we discovered that tracing the function > tcp_listendrop() was not straightforward because it is currently inlined. > To overcome this, we had to create a livepatch that defined a non-inlined > version of the function, which we then traced using BPF programs. > > $ grep tcp_listendrop /proc/kallsyms > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp] > > Through this process, we eventually determined that the ListenDrops were > caused by SYN attacks. > > Since tcp_listendrop() is not part of the critical path, defining it as > noinline would make it significantly easier to trace and diagnose issues > without requiring workarounds such as livepatching. This function can be > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL(). > > After that change, the result is as follows, > > $ grep tcp_listendrop /proc/kallsyms > ffffffffb718eaa0 T __pfx_tcp_listendrop > ffffffffb718eab0 T tcp_listendrop > ffffffffb7e636b8 r __ksymtab_tcp_listendrop > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > -- Are we going to accept one patch at a time to un-inline all TCP related functions in the kernel ? My understanding is that current tools work fine. You may need to upgrade yours. # perf probe -a tcp_listendrop Added new events: probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) probe:tcp_listendrop (on tcp_listendrop) You can now use it in all perf tools, such as: perf record -e probe:tcp_listendrop -aR sleep 1
On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons, > > such as: > > - A full SYN backlog queue > > - SYN flood attacks > > - Memory exhaustion > > - Other resource constraints > > > > Recently, one of our services experienced frequent ListenDrops. While > > attempting to trace the root cause, we discovered that tracing the function > > tcp_listendrop() was not straightforward because it is currently inlined. > > To overcome this, we had to create a livepatch that defined a non-inlined > > version of the function, which we then traced using BPF programs. > > > > $ grep tcp_listendrop /proc/kallsyms > > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp] > > > > Through this process, we eventually determined that the ListenDrops were > > caused by SYN attacks. > > > > Since tcp_listendrop() is not part of the critical path, defining it as > > noinline would make it significantly easier to trace and diagnose issues > > without requiring workarounds such as livepatching. This function can be > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL(). > > > > After that change, the result is as follows, > > > > $ grep tcp_listendrop /proc/kallsyms > > ffffffffb718eaa0 T __pfx_tcp_listendrop > > ffffffffb718eab0 T tcp_listendrop > > ffffffffb7e636b8 r __ksymtab_tcp_listendrop > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > -- > > Are we going to accept one patch at a time to un-inline all TCP > related functions in the kernel ? I don't have an in-depth understanding of the TCP/IP stack, so I can't consolidate all related TCP functions in the error paths into a single patch. I would greatly appreciate it if you could help ensure these functions are marked as non-inlined based on your expertise. If you don't have time to do it directly, could you provide some guidance? The inlining of TCP functions in error paths often complicates tracing efforts. For instance, we recently encountered issues with the inlined tcp_write_err(), although we were fortunate to have an alternative tracepoint available: inet_sk_error_report. Unfortunately, no such alternative exists for tcp_listendrop(), which is why I submitted this patch. > > My understanding is that current tools work fine. You may need to upgrade yours. > > # perf probe -a tcp_listendrop > Added new events: > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > probe:tcp_listendrop (on tcp_listendrop) > > You can now use it in all perf tools, such as: > > perf record -e probe:tcp_listendrop -aR sleep 1 The issue is that we can't extract the struct `sock *sk` from tcp_listendrop() using perf. Please correct me if I’m mistaken. In a containerized environment, it's common to have many pods running on a single host, each assigned a unique IP. Our goal is to filter for a specific local_ip:local_port combination while also capturing the corresponding remote IP. This task is straightforward with bpftrace or other BPF-based tools, but I’m unsure how to accomplish it using perf.
On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons, > > > such as: > > > - A full SYN backlog queue > > > - SYN flood attacks > > > - Memory exhaustion > > > - Other resource constraints > > > > > > Recently, one of our services experienced frequent ListenDrops. While > > > attempting to trace the root cause, we discovered that tracing the function > > > tcp_listendrop() was not straightforward because it is currently inlined. > > > To overcome this, we had to create a livepatch that defined a non-inlined > > > version of the function, which we then traced using BPF programs. > > > > > > $ grep tcp_listendrop /proc/kallsyms > > > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp] > > > > > > Through this process, we eventually determined that the ListenDrops were > > > caused by SYN attacks. > > > > > > Since tcp_listendrop() is not part of the critical path, defining it as > > > noinline would make it significantly easier to trace and diagnose issues > > > without requiring workarounds such as livepatching. This function can be > > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL(). > > > > > > After that change, the result is as follows, > > > > > > $ grep tcp_listendrop /proc/kallsyms > > > ffffffffb718eaa0 T __pfx_tcp_listendrop > > > ffffffffb718eab0 T tcp_listendrop > > > ffffffffb7e636b8 r __ksymtab_tcp_listendrop > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > -- > > > > Are we going to accept one patch at a time to un-inline all TCP > > related functions in the kernel ? > > I don't have an in-depth understanding of the TCP/IP stack, so I can't > consolidate all related TCP functions in the error paths into a single > patch. I would greatly appreciate it if you could help ensure these > functions are marked as non-inlined based on your expertise. If you > don't have time to do it directly, could you provide some guidance? > > The inlining of TCP functions in error paths often complicates tracing > efforts. For instance, we recently encountered issues with the inlined > tcp_write_err(), although we were fortunate to have an alternative > tracepoint available: inet_sk_error_report. > > Unfortunately, no such alternative exists for tcp_listendrop(), which > is why I submitted this patch. > > > > > My understanding is that current tools work fine. You may need to upgrade yours. > > > > # perf probe -a tcp_listendrop > > Added new events: > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > probe:tcp_listendrop (on tcp_listendrop) > > > > You can now use it in all perf tools, such as: > > > > perf record -e probe:tcp_listendrop -aR sleep 1 > > The issue is that we can't extract the struct `sock *sk` from > tcp_listendrop() using perf. Please correct me if I’m mistaken. > > In a containerized environment, it's common to have many pods running > on a single host, each assigned a unique IP. Our goal is to filter for > a specific local_ip:local_port combination while also capturing the > corresponding remote IP. This task is straightforward with bpftrace or > other BPF-based tools, but I’m unsure how to accomplish it using perf. I suggest you ask tracing experts' opinions, why is perf able to add a probe, but not bpftrace ? Again, accepting thousands of patches just because of tracing tools seems not good to me.
On Tue, Dec 31, 2024 at 8:34 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons, > > > > such as: > > > > - A full SYN backlog queue > > > > - SYN flood attacks > > > > - Memory exhaustion > > > > - Other resource constraints > > > > > > > > Recently, one of our services experienced frequent ListenDrops. While > > > > attempting to trace the root cause, we discovered that tracing the function > > > > tcp_listendrop() was not straightforward because it is currently inlined. > > > > To overcome this, we had to create a livepatch that defined a non-inlined > > > > version of the function, which we then traced using BPF programs. > > > > > > > > $ grep tcp_listendrop /proc/kallsyms > > > > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp] > > > > > > > > Through this process, we eventually determined that the ListenDrops were > > > > caused by SYN attacks. > > > > > > > > Since tcp_listendrop() is not part of the critical path, defining it as > > > > noinline would make it significantly easier to trace and diagnose issues > > > > without requiring workarounds such as livepatching. This function can be > > > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL(). > > > > > > > > After that change, the result is as follows, > > > > > > > > $ grep tcp_listendrop /proc/kallsyms > > > > ffffffffb718eaa0 T __pfx_tcp_listendrop > > > > ffffffffb718eab0 T tcp_listendrop > > > > ffffffffb7e636b8 r __ksymtab_tcp_listendrop > > > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > > -- > > > > > > Are we going to accept one patch at a time to un-inline all TCP > > > related functions in the kernel ? > > > > I don't have an in-depth understanding of the TCP/IP stack, so I can't > > consolidate all related TCP functions in the error paths into a single > > patch. I would greatly appreciate it if you could help ensure these > > functions are marked as non-inlined based on your expertise. If you > > don't have time to do it directly, could you provide some guidance? > > > > The inlining of TCP functions in error paths often complicates tracing > > efforts. For instance, we recently encountered issues with the inlined > > tcp_write_err(), although we were fortunate to have an alternative > > tracepoint available: inet_sk_error_report. > > > > Unfortunately, no such alternative exists for tcp_listendrop(), which > > is why I submitted this patch. > > > > > > > > My understanding is that current tools work fine. You may need to upgrade yours. > > > > > > # perf probe -a tcp_listendrop > > > Added new events: > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > probe:tcp_listendrop (on tcp_listendrop) > > > > > > You can now use it in all perf tools, such as: > > > > > > perf record -e probe:tcp_listendrop -aR sleep 1 > > > > The issue is that we can't extract the struct `sock *sk` from > > tcp_listendrop() using perf. Please correct me if I’m mistaken. > > > > In a containerized environment, it's common to have many pods running > > on a single host, each assigned a unique IP. Our goal is to filter for > > a specific local_ip:local_port combination while also capturing the > > corresponding remote IP. This task is straightforward with bpftrace or > > other BPF-based tools, but I’m unsure how to accomplish it using perf. > > I suggest you ask tracing experts' opinions, why is perf able to add a > probe, but not bpftrace ? > > Again, accepting thousands of patches just because of tracing tools > seems not good to me. Thank you for your guidance on using perf. I now understand how to achieve similar tracing with bpftrace. I agree with your point that it’s not ideal to send thousands of patches solely for the sake of tracing tools. It would indeed be much better if this could be addressed in a single patch.
diff --git a/include/net/tcp.h b/include/net/tcp.h index e9b37b76e894..65e6357e893b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2537,18 +2537,7 @@ static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff *skb) WRITE_ONCE(tp->data_segs_in, tp->data_segs_in + segs_in); } -/* - * TCP listen path runs lockless. - * We forced "struct sock" to be const qualified to make sure - * we don't modify one of its field by mistake. - * Here, we increment sk_drops which is an atomic_t, so we can safely - * make sock writable again. - */ -static inline void tcp_listendrop(const struct sock *sk) -{ - atomic_inc(&((struct sock *)sk)->sk_drops); - __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS); -} +void tcp_listendrop(const struct sock *sk); enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5bdf13ac26ef..0fcea815860b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -7174,6 +7174,20 @@ u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops, } EXPORT_SYMBOL_GPL(tcp_get_syncookie_mss); +/* + * TCP listen path runs lockless. + * We forced "struct sock" to be const qualified to make sure + * we don't modify one of its field by mistake. + * Here, we increment sk_drops which is an atomic_t, so we can safely + * make sock writable again. + */ +void tcp_listendrop(const struct sock *sk) +{ + atomic_inc(&((struct sock *)sk)->sk_drops); + __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS); +} +EXPORT_SYMBOL_GPL(tcp_listendrop); + int tcp_conn_request(struct request_sock_ops *rsk_ops, const struct tcp_request_sock_ops *af_ops, struct sock *sk, struct sk_buff *skb)
The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons, such as: - A full SYN backlog queue - SYN flood attacks - Memory exhaustion - Other resource constraints Recently, one of our services experienced frequent ListenDrops. While attempting to trace the root cause, we discovered that tracing the function tcp_listendrop() was not straightforward because it is currently inlined. To overcome this, we had to create a livepatch that defined a non-inlined version of the function, which we then traced using BPF programs. $ grep tcp_listendrop /proc/kallsyms ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp] Through this process, we eventually determined that the ListenDrops were caused by SYN attacks. Since tcp_listendrop() is not part of the critical path, defining it as noinline would make it significantly easier to trace and diagnose issues without requiring workarounds such as livepatching. This function can be used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL(). After that change, the result is as follows, $ grep tcp_listendrop /proc/kallsyms ffffffffb718eaa0 T __pfx_tcp_listendrop ffffffffb718eab0 T tcp_listendrop ffffffffb7e636b8 r __ksymtab_tcp_listendrop Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- include/net/tcp.h | 13 +------------ net/ipv4/tcp_input.c | 14 ++++++++++++++ 2 files changed, 15 insertions(+), 12 deletions(-)