[net-next] trace: tcp: Add tracepoint for tcp_sendmsg()

Message ID: 20250224-tcpsendmsg-v1-1-bac043c59cc8@debian.org
State: Changes Requested
Delegated to: Netdev Maintainers

Commit Message

Breno Leitao Feb. 24, 2025, 6:24 p.m. UTC
Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
the tracing of TCP messages being sent.

Meta has been using BPF programs to monitor this function for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.

The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
creating unnecessary trace event infrastructure and tracefs exports,
keeping the implementation minimal while stabilizing the API.

Because this patch creates a raw tracepoint, you can hook into it with
regular tooling such as bpftrace, through the standard rawtracepoint
infrastructure:

	rawtracepoint:tcp_sendmsg_tp {
		....
	}

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/trace/events/tcp.h | 5 +++++
 net/ipv4/tcp.c             | 2 ++
 2 files changed, 7 insertions(+)


---
base-commit: e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
change-id: 20250224-tcpsendmsg-4f0a236751d7

Best regards,

Comments

Eric Dumazet Feb. 24, 2025, 7:03 p.m. UTC | #1
On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>
> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> the tracing of TCP messages being sent.
>
> Meta has been using BPF programs to monitor this function for years,
> indicating significant interest in observing this important
> functionality. Adding a proper tracepoint provides a stable API for all
> users who need visibility into TCP message transmission.
>
> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> creating unnecessary trace event infrastructure and tracefs exports,
> keeping the implementation minimal while stabilizing the API.
>
> Given that this patch creates a rawtracepoint, you could hook into it
> using regular tooling, like bpftrace, using regular rawtracepoint
> infrastructure, such as:
>
>         rawtracepoint:tcp_sendmsg_tp {
>                 ....
>         }

I would expect tcp_sendmsg() to be stable enough?

kprobe:tcp_sendmsg {
}

Yonghong Song Feb. 24, 2025, 7:12 p.m. UTC | #2
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>>         rawtracepoint:tcp_sendmsg_tp {
>>                 ....
>>         }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }

In LTO mode, tcp_sendmsg() can be inlined across files. For example:

  net/ipv4/tcp.c:
       int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
  net/ipv4/tcp_bpf.c:
       ...
       return tcp_sendmsg(sk, msg, size);
  net/ipv6/af_inet6.c:
       ...
       return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)

And this does happen in our production environment.

David Ahern Feb. 24, 2025, 7:16 p.m. UTC | #3
On 2/24/25 12:03 PM, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>>         rawtracepoint:tcp_sendmsg_tp {
>>                 ....
>>         }
> 
> I would expect tcp_sendmsg() being stable enough ?
> 
> kprobe:tcp_sendmsg {
> }

Also, if a tracepoint is added, placing it inside tcp_sendmsg_locked()
would cover more use cases (see the kernel references to it).

We have had a patch for a couple of years now with a tracepoint inside the

while (msg_data_left(msg)) {
}

loop, which is more useful than just tracing entry to sendmsg.
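
Editorial illustration (not part of the thread): the difference between a hook at function entry and a hook inside the data loop can be sketched with a small userspace model. Everything here is hypothetical and only stands in for the kernel code: `struct fake_msg`, `fake_sendmsg()`, the counters, and the `chunk_max` parameter are invented for the example, and the real loop in tcp_sendmsg_locked() does far more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct msghdr: only tracks unsent bytes. */
struct fake_msg {
	size_t data_left;
};

/* Counters standing in for attached probes (illustration only). */
static unsigned int entry_hits;	/* hook at function entry */
static unsigned int loop_hits;	/* hook inside the data loop */
static size_t loop_bytes;	/* bytes reported by the in-loop hook */

static size_t fake_msg_data_left(const struct fake_msg *msg)
{
	return msg->data_left;
}

/*
 * Model of a send path: one entry hook, then a loop that consumes the
 * message in chunks of at most chunk_max bytes (a made-up parameter).
 * Returns the number of bytes consumed.
 */
size_t fake_sendmsg(struct fake_msg *msg, size_t chunk_max)
{
	size_t copied = 0;

	assert(chunk_max > 0);
	entry_hits++;			/* entry-only hook: once per call */

	while (fake_msg_data_left(msg)) {
		size_t copy = msg->data_left < chunk_max ?
			      msg->data_left : chunk_max;

		loop_hits++;		/* in-loop hook: once per chunk */
		loop_bytes += copy;

		msg->data_left -= copy;
		copied += copy;
	}
	return copied;
}
```

In this model, a single call with 3000 bytes and a 1448-byte chunk limit triggers the entry hook once and the in-loop hook three times, so the in-loop hook sees per-chunk progress that the entry hook cannot.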

Eric Dumazet Feb. 24, 2025, 7:23 p.m. UTC | #4
On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
>
> > ________________________________________
> >
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> >> the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years,
> >> indicating significant interest in observing this important
> >> functionality. Adding a proper tracepoint provides a stable API for all
> >> users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> >> creating unnecessary trace event infrastructure and tracefs exports,
> >> keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it
> >> using regular tooling, like bpftrace, using regular rawtracepoint
> >> infrastructure, such as:
> >>
> >>         rawtracepoint:tcp_sendmsg_tp {
> >>                 ....
> >>         }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> In LTO mode, tcp_sendmsg could be inlined cross files. For example,
>
>   net/ipv4/tcp.c:
>        int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>   net/ipv4/tcp_bpf.c:
>        ...
>       return tcp_sendmsg(sk, msg, size);
>   net/ipv6/af_inet6.c:
>        ...
>        return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
>
> And this does happen in our production environment.

And we do not have a way to make the kprobe work even if LTO decides
to inline a function?

This seems like a tracing or LTO issue. It could be addressed there
in a generic way, avoiding many other patches that work around it.

Patch

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..7c0171d2dacdc 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@  TRACE_EVENT(tcp_retransmit_synack,
 		  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+DECLARE_TRACE(tcp_sendmsg_tp,
+	TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size),
+	TP_ARGS(sk, msg, size)
+);
+
 DECLARE_TRACE(tcp_cwnd_reduction_tp,
 	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
 		 int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5ef86fbd8aa85 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1362,6 +1362,8 @@  int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	int ret;
 
+	trace_tcp_sendmsg_tp(sk, msg, size);
+
 	lock_sock(sk);
 	ret = tcp_sendmsg_locked(sk, msg, size);
 	release_sock(sk);
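
Editorial illustration (not part of the patch): the following userspace sketch models what the added call site does from a probe's point of view. It is a simplified model under stated assumptions, not kernel code: the `model_*` functions and `sum_sizes()` are hypothetical names, only a single probe is supported, and the real tracepoint machinery (static keys, multiple probes, RCU) is omitted. The probe signature mirrors the TP_PROTO arguments from the patch, preceded by a private data pointer.

```c
#include <assert.h>
#include <stddef.h>

struct sock;	/* opaque in this model */
struct msghdr;	/* opaque in this model */

/* Probe signature: private data first, then the TP_PROTO arguments. */
typedef void (*sendmsg_probe_t)(void *data, const struct sock *sk,
				const struct msghdr *msg, size_t size);

static sendmsg_probe_t probe_fn;	/* NULL means "tracepoint disabled" */
static void *probe_data;

/* Attach a probe; this model allows only one at a time. */
int model_register_probe(sendmsg_probe_t fn, void *data)
{
	assert(fn != NULL);
	if (probe_fn)
		return -1;
	probe_fn = fn;
	probe_data = data;
	return 0;
}

void model_unregister_probe(void)
{
	probe_fn = NULL;
	probe_data = NULL;
}

/* Stand-in for trace_tcp_sendmsg_tp(): does nothing unless a probe is attached. */
static void model_trace_tcp_sendmsg(const struct sock *sk,
				    const struct msghdr *msg, size_t size)
{
	if (probe_fn)
		probe_fn(probe_data, sk, msg, size);
}

/* Stand-in for tcp_sendmsg(): the hook fires before the socket is locked. */
int model_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
	model_trace_tcp_sendmsg(sk, msg, size);
	/* lock_sock(), tcp_sendmsg_locked(), release_sock() are omitted here */
	return (int)size;
}

/* Example probe: accumulate the requested sizes into *data. */
void sum_sizes(void *data, const struct sock *sk,
	       const struct msghdr *msg, size_t size)
{
	(void)sk;
	(void)msg;
	*(size_t *)data += size;
}
```

With no probe attached the hook is a no-op. After `model_register_probe(sum_sizes, &total)`, each `model_tcp_sendmsg()` call adds its `size` argument to `total`, which is roughly what a rawtracepoint consumer would observe for the requested send size.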