
[net-next] trace: tcp: Add tracepoint for tcp_cwnd_reduction()

Message ID: 20250207-cwnd_tracepoint-v1-1-13650f3ca96d@debian.org (mailing list archive)
State: Superseded

Commit Message

Breno Leitao Feb. 7, 2025, 6:03 p.m. UTC
Add a lightweight tracepoint to monitor TCP congestion window
adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking
of:
- TCP window size fluctuations
- Active socket behavior
- Congestion window reduction events

Meta has been using BPF programs to monitor this function for years.
Adding a proper tracepoint provides a stable API for all users who need
to monitor TCP congestion window behavior.

Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event
infrastructure and exporting to tracefs, keeping the implementation
minimal. (Thanks Steven Rostedt)
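
For illustration only (not part of the patch), in-kernel code could attach a
probe to such a bare tracepoint roughly as in the sketch below. The probe name
and the values it prints are made up for the example, and an out-of-tree module
would additionally need the tracepoint exported with
EXPORT_TRACEPOINT_SYMBOL_GPL(), which this patch does not add:

// SPDX-License-Identifier: GPL-2.0
/* Sketch only: attach a probe to the bare tcp_cwnd_reduction_tp tracepoint.
 * The probe name and printed values are illustrative.
 */
#include <linux/module.h>
#include <net/sock.h>
#include <trace/events/tcp.h>

/* Probe callbacks take the registration cookie first, then the TP_PROTO args. */
static void cwnd_reduction_probe(void *data, const struct sock *sk,
				 int newly_acked_sacked, int newly_lost,
				 int flag)
{
	pr_debug("cwnd reduction: sk=%p acked=%d lost=%d flag=0x%x\n",
		 sk, newly_acked_sacked, newly_lost, flag);
}

static int __init cwnd_probe_init(void)
{
	/* register_trace_<name>() is generated by DECLARE_TRACE(). */
	return register_trace_tcp_cwnd_reduction_tp(cwnd_reduction_probe, NULL);
}

static void __exit cwnd_probe_exit(void)
{
	unregister_trace_tcp_cwnd_reduction_tp(cwnd_reduction_probe, NULL);
	tracepoint_synchronize_unregister();
}

module_init(cwnd_probe_init);
module_exit(cwnd_probe_exit);
MODULE_LICENSE("GPL");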

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes since RFC:
- Change from a full trace event (TRACE_EVENT) to a bare DECLARE_TRACE()
  as suggested by Steven
- Link to RFC: https://lore.kernel.org/r/20250120-cwnd_tracepoint-v1-1-36b0e0d643fa@debian.org
---
 include/trace/events/tcp.h | 5 +++++
 net/ipv4/tcp_input.c       | 2 ++
 2 files changed, 7 insertions(+)


---
base-commit: 09717c28b76c30b1dc8c261c855ffb2406abab2e
change-id: 20250120-cwnd_tracepoint-2e11c996a9cb

Best regards,

Comments

Eric Dumazet Feb. 11, 2025, 3:19 p.m. UTC | #1
On Fri, Feb 7, 2025 at 7:04 PM Breno Leitao <leitao@debian.org> wrote:
>
> Add a lightweight tracepoint to monitor TCP congestion window
> adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking
> of:
> - TCP window size fluctuations
> - Active socket behavior
> - Congestion window reduction events
>
> Meta has been using BPF programs to monitor this function for years.
> Adding a proper tracepoint provides a stable API for all users who need
> to monitor TCP congestion window behavior.
>
> Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event
> infrastructure and exporting to tracefs, keeping the implementation
> minimal. (Thanks Steven Rostedt)
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---

I can give my +2 on this patch, although I have no way of testing it.

I will trust Steven on this :)

Reviewed-by: Eric Dumazet <edumazet@google.com>
Steven Rostedt Feb. 11, 2025, 5:05 p.m. UTC | #2
On Tue, 11 Feb 2025 16:19:54 +0100
Eric Dumazet <edumazet@google.com> wrote:

> I can give my +2 on this patch, although I have no way of testing it.

If you want to test this, apply the below patch, enable
CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS, and after you boot up, do the following:

 # modprobe trace_custom_sched
 # cd /sys/kernel/tracing
 # echo 1 > events/custom/tcp_cwnd_reduction_tp/enable

[ do something to trigger it ]

 # cat trace

-- Steve

diff --git a/samples/trace_events/trace_custom_sched.c b/samples/trace_events/trace_custom_sched.c
index dd409b704b35..35b3cfa6e91d 100644
--- a/samples/trace_events/trace_custom_sched.c
+++ b/samples/trace_events/trace_custom_sched.c
@@ -16,6 +16,7 @@
  * from the C file, and not in the custom header file.
  */
 #include <trace/events/sched.h>
+#include <trace/events/tcp.h>
 
 /* Declare CREATE_CUSTOM_TRACE_EVENTS before including custom header */
 #define CREATE_CUSTOM_TRACE_EVENTS
@@ -37,6 +38,7 @@
  */
 static void fct(struct tracepoint *tp, void *priv)
 {
+	trace_custom_event_tcp_cwnd_reduction_tp_update(tp);
 	trace_custom_event_sched_switch_update(tp);
 	trace_custom_event_sched_waking_update(tp);
 }
diff --git a/samples/trace_events/trace_custom_sched.h b/samples/trace_events/trace_custom_sched.h
index 951388334a3f..339957d692c0 100644
--- a/samples/trace_events/trace_custom_sched.h
+++ b/samples/trace_events/trace_custom_sched.h
@@ -74,6 +74,33 @@ TRACE_CUSTOM_EVENT(sched_waking,
 
 	TP_printk("pid=%d prio=%d", __entry->pid, __entry->prio)
 )
+
+struct sock;
+
+TRACE_CUSTOM_EVENT(tcp_cwnd_reduction_tp,
+
+	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
+                int newly_lost, int flag),
+
+	TP_ARGS(sk, newly_acked_sacked, newly_lost, flag),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,		sk	)
+		__field(	int,			ack	)
+		__field(	int,			lost	)
+		__field(	int,			flag	)
+	),
+
+	TP_fast_assign(
+		__entry->sk	= (unsigned long)sk;
+		__entry->ack	= newly_acked_sacked;
+		__entry->lost	= newly_lost;
+		__entry->flag	= flag;
+	),
+
+	TP_printk("sk=%lx ack=%d lost=%d flag=%d", __entry->sk,
+		__entry->ack, __entry->lost, __entry->flag)
+)
 #endif
 /*
  * Just like the headers that create TRACE_EVENTs, the below must
Jakub Kicinski Feb. 11, 2025, 10:21 p.m. UTC | #3
On Fri, 07 Feb 2025 10:03:53 -0800 Breno Leitao wrote:
> +DECLARE_TRACE(tcp_cwnd_reduction_tp,
> +	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
> +		 int newly_lost, int flag),
> +	TP_ARGS(sk, newly_acked_sacked, newly_lost, flag));

nit: I think that the ");" traditionally goes on a separate line?

Regarding testing: if the goal is use from BPF, perhaps you could add a
small sample/result to the commit message of using bpftrace against it?
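
For reference, a BPF consumer of this tracepoint would look roughly like the
sketch below (libbpf style, attaching to the tracepoint's raw/BTF form; the
program name and output are illustrative, not something the patch adds). A
bpftrace one-liner against the same rawtracepoint should work similarly.

// SPDX-License-Identifier: GPL-2.0
/* Sketch of a BPF program attached to the bare tracepoint via its BTF
 * (raw tracepoint) form; names and output are illustrative only.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tp_btf/tcp_cwnd_reduction_tp")
int BPF_PROG(cwnd_reduction, const struct sock *sk, int newly_acked_sacked,
	     int newly_lost, int flag)
{
	/* Output lands in /sys/kernel/tracing/trace_pipe. */
	bpf_printk("cwnd reduction: acked=%d lost=%d flag=0x%x",
		   newly_acked_sacked, newly_lost, flag);
	return 0;
}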

Patch

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index a27c4b619dffd7dcc72fffa71bf0fd5e34fe6681..d574e6151dc4f7430206f9ccefe0bf0d463aaa52 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@  TRACE_EVENT(tcp_retransmit_synack,
 		  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+DECLARE_TRACE(tcp_cwnd_reduction_tp,
+	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
+		 int newly_lost, int flag),
+	TP_ARGS(sk, newly_acked_sacked, newly_lost, flag));
+
 #include <trace/events/net_probe_common.h>
 
 TRACE_EVENT(tcp_probe,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eb82e01da911048b41ca380f913ef55566be79a7..1a667e67df6beacde9871a50d44e180c2943ded0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2710,6 +2710,8 @@  void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int newly_lost,
 	if (newly_acked_sacked <= 0 || WARN_ON_ONCE(!tp->prior_cwnd))
 		return;
 
+	trace_tcp_cwnd_reduction_tp(sk, newly_acked_sacked, newly_lost, flag);
+
 	tp->prr_delivered += newly_acked_sacked;
 	if (delta < 0) {
 		u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +