Message ID | 20250224-tcpsendmsg-v1-1-bac043c59cc8@debian.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] trace: tcp: Add tracepoint for tcp_sendmsg() |
On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>
> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> the tracing of TCP messages being sent.
>
> Meta has been using BPF programs to monitor this function for years,
> indicating significant interest in observing this important
> functionality. Adding a proper tracepoint provides a stable API for all
> users who need visibility into TCP message transmission.
>
> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> creating unnecessary trace event infrastructure and tracefs exports,
> keeping the implementation minimal while stabilizing the API.
>
> Given that this patch creates a rawtracepoint, you could hook into it
> using regular tooling, like bpftrace, using regular rawtracepoint
> infrastructure, such as:
>
> rawtracepoint:tcp_sendmsg_tp {
>         ....
> }

I would expect tcp_sendmsg() being stable enough ?

kprobe:tcp_sendmsg {
}
> ________________________________________
>
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>>         ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }

In LTO mode, tcp_sendmsg could be inlined cross files. For example,

  net/ipv4/tcp.c:
        int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
  net/ipv4/tcp_bpf.c:
        ...
        return tcp_sendmsg(sk, msg, size);
  net/ipv6/af_inet6.c:
        ...
        return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)

And this does happen in our production environment.
On 2/24/25 12:03 PM, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>>         ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }

Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
more use cases (see kernel references to it).

We have a patch for a couple years now with a tracepoint inside the

    while (msg_data_left(msg)) {
    }

loop which is more useful than just entry to sendmsg.
On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
>
> > ________________________________________
> >
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> >> the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years,
> >> indicating significant interest in observing this important
> >> functionality. Adding a proper tracepoint provides a stable API for all
> >> users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> >> creating unnecessary trace event infrastructure and tracefs exports,
> >> keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it
> >> using regular tooling, like bpftrace, using regular rawtracepoint
> >> infrastructure, such as:
> >>
> >> rawtracepoint:tcp_sendmsg_tp {
> >>         ....
> >> }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> In LTO mode, tcp_sendmsg could be inlined cross files. For example,
>
>   net/ipv4/tcp.c:
>         int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>   net/ipv4/tcp_bpf.c:
>         ...
>         return tcp_sendmsg(sk, msg, size);
>   net/ipv6/af_inet6.c:
>         ...
>         return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
>
> And this does happen in our production environment.

And we do not have a way to make the kprobe work even if LTO decided
to inline a function ?

This seems like a tracing or LTO issue, this could be addressed there
in a generic way and avoid many other patches to work around this.
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..7c0171d2dacdc 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@ TRACE_EVENT(tcp_retransmit_synack,
 		  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+DECLARE_TRACE(tcp_sendmsg_tp,
+	TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size),
+	TP_ARGS(sk, msg, size)
+);
+
 DECLARE_TRACE(tcp_cwnd_reduction_tp,
 	TP_PROTO(const struct sock *sk, int newly_acked_sacked, int newly_lost,
 		 int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5ef86fbd8aa85 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1362,6 +1362,8 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	int ret;
 
+	trace_tcp_sendmsg_tp(sk, msg, size);
+
 	lock_sock(sk);
 	ret = tcp_sendmsg_locked(sk, msg, size);
 	release_sock(sk);
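For readers less familiar with what a DECLARE_TRACE() hook buys over probing the function symbol, the idea can be pictured with a small userspace model. This is only a sketch, not kernel code: every name in it (sendmsg_model, trace_sendmsg_model, register_sendmsg_probe, count_bytes, and the cut-down struct sock and struct msghdr) is hypothetical, and the real kernel machinery (static keys, RCU-protected probe arrays) is deliberately left out. What it does mirror is that a probe receives a private data pointer followed by the TP_PROTO arguments, and that the hook is an explicit call in the function body, so it stays in place wherever the compiler decides to emit that body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the sketch needs. */
struct sock { int id; };
struct msghdr { size_t len; };

/* A probe gets its private data first, then the TP_PROTO-style arguments. */
typedef void (*sendmsg_probe_fn)(void *data, const struct sock *sk,
				 const struct msghdr *msg, size_t size);

struct probe_slot {
	sendmsg_probe_fn fn;
	void *data;
};

#define MAX_PROBES 4
static struct probe_slot sendmsg_probes[MAX_PROBES];
static int nr_sendmsg_probes;

/* Model of attaching a probe: returns 0 on success, -1 if the table is full. */
int register_sendmsg_probe(sendmsg_probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (nr_sendmsg_probes == MAX_PROBES)
		return -1;
	sendmsg_probes[nr_sendmsg_probes].fn = fn;
	sendmsg_probes[nr_sendmsg_probes].data = data;
	nr_sendmsg_probes++;
	return 0;
}

/* Model of the trace hook: call every attached probe, do nothing if none. */
void trace_sendmsg_model(const struct sock *sk, const struct msghdr *msg,
			 size_t size)
{
	for (int i = 0; i < nr_sendmsg_probes; i++)
		sendmsg_probes[i].fn(sendmsg_probes[i].data, sk, msg, size);
}

/* Model of the patched function: the hook fires at entry, before any work. */
int sendmsg_model(struct sock *sk, struct msghdr *msg, size_t size)
{
	trace_sendmsg_model(sk, msg, size);
	return (int)size; /* pretend every byte was queued */
}

/* Example probe: add the requested size to a caller-owned counter. */
void count_bytes(void *data, const struct sock *sk, const struct msghdr *msg,
		 size_t size)
{
	(void)sk;
	(void)msg;
	*(size_t *)data += size;
}
```

With no probe registered, sendmsg_model() only pays for an empty loop; after register_sendmsg_probe(count_bytes, &total), each call adds its size argument to total.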
Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
the tracing of TCP messages being sent.

Meta has been using BPF programs to monitor this function for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.

The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
creating unnecessary trace event infrastructure and tracefs exports,
keeping the implementation minimal while stabilizing the API.

Given that this patch creates a rawtracepoint, you could hook into it
using regular tooling, like bpftrace, using regular rawtracepoint
infrastructure, such as:

	rawtracepoint:tcp_sendmsg_tp {
		....
	}

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/trace/events/tcp.h | 5 +++++
 net/ipv4/tcp.c             | 2 ++
 2 files changed, 7 insertions(+)

---
base-commit: e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
change-id: 20250224-tcpsendmsg-4f0a236751d7

Best regards,