
[net-next,v2,3/3] tcp: add location into reset trace process

Message ID: 20240325062831.48675-4-kerneljasonxing@gmail.com
State: Superseded
Series: tcp: make trace of reset logic complete

Commit Message

Jason Xing March 25, 2024, 6:28 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

In addition to knowing the 4-tuple of the flow that generates an RST,
knowing why it was sent is very important: there are several cases in
which an RST may be sent, and today we have no way to tell which one
actually triggered it.

Recording the location of the reset, as trace_kfree_skb does, helps
us narrow this down.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/trace/events/tcp.h | 14 ++++++++++----
 net/ipv4/tcp_ipv4.c        |  2 +-
 net/ipv4/tcp_output.c      |  2 +-
 net/ipv6/tcp_ipv6.c        |  2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)
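The mechanism borrowed from trace_kfree_skb is to record the return address of the function that emits the RST, so each event says which call site asked for the reset; %pS then renders that address as symbol+offset. Below is a minimal user-space sketch of the same idea, assuming GCC or Clang for __builtin_return_address. The names reset_event, send_reset and demo are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the tcp_send_reset trace record;
 * the field names mirror the TP_STRUCT__entry in the patch. */
struct reset_event {
	const void *skbaddr;
	const void *skaddr;
	void *location;	/* return address: where the reset helper was called from */
};

static struct reset_event last_event;

/* Hypothetical analogue of tcp_v4_send_reset(). noinline keeps a real
 * call frame, so __builtin_return_address(0) points into the caller. */
__attribute__((noinline))
void send_reset(const void *sk, const void *skb)
{
	last_event.skbaddr = skb;
	last_event.skaddr = sk;
	last_event.location = __builtin_return_address(0);
}

/* Two distinct call sites record two distinct locations, which is what
 * lets the trace tell reset origins apart. */
void demo(void)
{
	int sock;
	void *first, *second;

	send_reset(&sock, NULL);
	first = last_event.location;
	send_reset(&sock, NULL);
	second = last_event.location;

	assert(first != NULL && second != NULL);
	assert(first != second);
}
```

Calling demo() from main() exercises both call sites. The noinline attribute matters here: if the helper were inlined, __builtin_return_address(0) would report the return address of the function it was inlined into instead.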

Comments

Paolo Abeni March 26, 2024, 11:08 a.m. UTC | #1
On Mon, 2024-03-25 at 14:28 +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> In addition to knowing the 4-tuple of the flow which generates RST,
> the reason why it does so is very important because we have some
> cases where the RST should be sent and have no clue which one
> exactly.
> 
> Adding location of reset process can help us more, like what
> trace_kfree_skb does.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/trace/events/tcp.h | 14 ++++++++++----
>  net/ipv4/tcp_ipv4.c        |  2 +-
>  net/ipv4/tcp_output.c      |  2 +-
>  net/ipv6/tcp_ipv6.c        |  2 +-
>  4 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index a13eb2147a02..8f6c1a07503c 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -109,13 +109,17 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>   */
>  TRACE_EVENT(tcp_send_reset,
>  
> -	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> +	TP_PROTO(
> +		const struct sock *sk,
> +		const struct sk_buff *skb,
> +		void *location),

Very minor nit: the above lines should be aligned with the open
bracket.

No need to repost just for this, but let's wait for Eric's feedback.

Cheers,

Paolo
Jakub Kicinski March 29, 2024, 1:15 a.m. UTC | #2
On Tue, 26 Mar 2024 12:08:01 +0100 Paolo Abeni wrote:
> > -	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> > +	TP_PROTO(
> > +		const struct sock *sk,
> > +		const struct sk_buff *skb,
> > +		void *location),  
> 
> Very minor nit: the above lines should be aligned with the open
> bracket.

Yes, and a very odd way of breaking it up. Empty line after ( but
) not on a separate line.

> No need to repost just for this, but let's wait for Eric's feedback.

Erring on the side of caution I'd read this:
https://lore.kernel.org/all/CANn89iKK-qPhQ91Sq8rR_=KDWajnY2=Et2bUjDsgoQK4wxFOHw@mail.gmail.com/
as lukewarm towards tp changes. Please repost if you think otherwise
(with the formatting fixed)
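For reference, a sketch of the prototype with the continuation lines aligned to the open bracket, as the reviewers describe (the layout actually used in a repost may differ):

```c
	TP_PROTO(const struct sock *sk,
		 const struct sk_buff *skb,
		 void *location),

	TP_ARGS(sk, skb, location),
```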
Jason Xing March 29, 2024, 2:53 a.m. UTC | #3
On Fri, Mar 29, 2024 at 9:15 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 26 Mar 2024 12:08:01 +0100 Paolo Abeni wrote:
> > > -   TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> > > +   TP_PROTO(
> > > +           const struct sock *sk,
> > > +           const struct sk_buff *skb,
> > > +           void *location),
> >
> > Very minor nit: the above lines should be aligned with the open
> > bracket.
>
> Yes, and a very odd way of breaking it up. Empty line after ( but
> ) not on a separate line.

After checking the git blame history, maybe I should follow the format
used by TRACE_EVENT(netfs_read)?

>
> > No need to repost just for this, but let's wait for Eric's feedback.
>
> Erring on the side of caution I'd read this:
> https://lore.kernel.org/all/CANn89iKK-qPhQ91Sq8rR_=KDWajnY2=Et2bUjDsgoQK4wxFOHw@mail.gmail.com/
> as lukewarm towards tp changes. Please repost if you think otherwise
> (with the formatting fixed)

Yes, I will repost it. I'm not introducing a controversial new tracepoint.

This patch is not only about whether we should keep using 'old-way'
tracing; it is about the tcp_send_reset tracepoint itself being
incomplete. Some admins use BPF to capture RST behaviour by hooking
this tracepoint, and what it reports today is apparently not complete.

Besides, I ran a simple test on loopback comparing tracing and BPF for
monitoring the fast path (e.g. __tcp_transmit_skb()) and saw at least a
12% degradation with BPF. So the advantage of tracing is obvious, even
though it is nowadays considered an old-school method.
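On the tracing side, the patched TP_printk appends location=%pS as the last field, so a consumer reading formatted events (for example from trace_pipe under tracefs) can recover the caller as symbol+offset/size. A minimal sketch of extracting that field from one formatted line, assuming the patched format string; extract_location is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the value of the "location=" field of one
 * formatted tcp_send_reset line into out.
 * Assumes the patched TP_printk layout, where location=%pS is printed
 * last as "symbol+offset/size".
 * Returns 0 on success, -1 if the field is absent or out is too small. */
int extract_location(const char *line, char *out, size_t outlen)
{
	static const char key[] = " location=";
	const char *p;
	size_t len;

	assert(line != NULL && out != NULL);

	p = strstr(line, key);
	if (p == NULL || outlen == 0)
		return -1;
	p += sizeof(key) - 1;		/* skip past the key itself */
	len = strcspn(p, " \n");	/* value ends at a space or newline */
	if (len >= outlen)
		return -1;		/* leave room for the terminating NUL */
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```

Returning -1 when the field is missing (for instance on a kernel without this patch) lets a caller fall back to reporting the 4-tuple alone.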

Thanks,
Jason

> --
> pw-bot: cr

Patch

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index a13eb2147a02..8f6c1a07503c 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -109,13 +109,17 @@  DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
  */
 TRACE_EVENT(tcp_send_reset,
 
-	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
+	TP_PROTO(
+		const struct sock *sk,
+		const struct sk_buff *skb,
+		void *location),
 
-	TP_ARGS(sk, skb),
+	TP_ARGS(sk, skb, location),
 
 	TP_STRUCT__entry(
 		__field(const void *, skbaddr)
 		__field(const void *, skaddr)
+		__field(void *, location)
 		__field(int, state)
 		__array(__u8, saddr, sizeof(struct sockaddr_in6))
 		__array(__u8, daddr, sizeof(struct sockaddr_in6))
@@ -141,12 +145,14 @@  TRACE_EVENT(tcp_send_reset,
 			 */
 			TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, entry->saddr);
 		}
+		__entry->location = location;
 	),
 
-	TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
+	TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s location=%pS",
 		  __entry->skbaddr, __entry->skaddr,
 		  __entry->saddr, __entry->daddr,
-		  __entry->state ? show_tcp_state_name(__entry->state) : "UNKNOWN")
+		  __entry->state ? show_tcp_state_name(__entry->state) : "UNKNOWN",
+		  __entry->location)
 );
 
 /*
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d5c4a969c066..fec54cfc4fb3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -870,7 +870,7 @@  static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 		arg.bound_dev_if = sk->sk_bound_dev_if;
 	}
 
-	trace_tcp_send_reset(sk, skb);
+	trace_tcp_send_reset(sk, skb,  __builtin_return_address(0));
 
 	BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
 		     offsetof(struct inet_timewait_sock, tw_bound_dev_if));
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e3167ad96567..fb613582817e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3608,7 +3608,7 @@  void tcp_send_active_reset(struct sock *sk, gfp_t priority)
 	/* skb of trace_tcp_send_reset() keeps the skb that caused RST,
 	 * skb here is different to the troublesome skb, so use NULL
 	 */
-	trace_tcp_send_reset(sk, NULL);
+	trace_tcp_send_reset(sk, NULL,  __builtin_return_address(0));
 }
 
 /* Send a crossed SYN-ACK during socket establishment.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8e9c59b6c00c..7eba9c3d69f1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1128,7 +1128,7 @@  static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
 			label = ip6_flowlabel(ipv6h);
 	}
 
-	trace_tcp_send_reset(sk, skb);
+	trace_tcp_send_reset(sk, skb,  __builtin_return_address(0));
 
 	tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, 1,
 			     ipv6_get_dsfield(ipv6h), label, priority, txhash,