[v4,net-next,1/7] net: add rx_sk to trace_kfree_skb

Message ID	dcfa5db9be2d29b68fe7c87b3f017e98e5ec83b4.1718136376.git.yan@cloudflare.com (mailing list archive)
State	Superseded
Headers	show Received: from mail-qk1-f176.google.com (mail-qk1-f176.google.com [209.85.222.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CDA07152533 for <linux-trace-kernel@vger.kernel.org>; Tue, 11 Jun 2024 20:11:28 +0000 (UTC) Date: Tue, 11 Jun 2024 13:11:24 -0700 From: Yan Zhai <yan@cloudflare.com> To: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>, David Ahern <dsahern@kernel.org>, Abhishek Chauhan <quic_abchauha@quicinc.com>, Mina Almasry <almasrymina@google.com>, Florian Westphal <fw@strlen.de>, Alexander Lobakin <aleksander.lobakin@intel.com>, David Howells <dhowells@redhat.com>, Jiri Pirko <jiri@resnulli.us>, Daniel Borkmann <daniel@iogearbox.net>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Lorenzo Bianconi <lorenzo@kernel.org>, Pavel Begunkov <asml.silence@gmail.com>, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Jesper Dangaard Brouer <hawk@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Neil Horman <nhorman@tuxdriver.com>, linux-trace-kernel@vger.kernel.org, Dan Carpenter <dan.carpenter@linaro.org> Subject: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb Message-ID: <dcfa5db9be2d29b68fe7c87b3f017e98e5ec83b4.1718136376.git.yan@cloudflare.com> References: <cover.1718136376.git.yan@cloudflare.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <cover.1718136376.git.yan@cloudflare.com>
Series	net: pass receive socket to drop tracepoint \| expand [v4,net-next,0/7] net: pass receive socket to drop tracepoint [v4,net-next,1/7] net: add rx_sk to trace_kfree_skb [v4,net-next,2/7] net: introduce sk_skb_reason_drop function [v4,net-next,3/7] ping: use sk_skb_reason_drop to free rx packets [v4,net-next,4/7] net: raw: use sk_skb_reason_drop to free rx packets [v4,net-next,5/7] tcp: use sk_skb_reason_drop to free rx packets [v4,net-next,6/7] udp: use sk_skb_reason_drop to free rx packets [v4,net-next,7/7] af_packet: use sk_skb_reason_drop to free rx packets

Message ID

dcfa5db9be2d29b68fe7c87b3f017e98e5ec83b4.1718136376.git.yan@cloudflare.com (mailing list archive)

State

Superseded

Headers

Date: Tue, 11 Jun 2024 13:11:24 -0700
From: Yan Zhai <yan@cloudflare.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, David Ahern <dsahern@kernel.org>,
	Abhishek Chauhan <quic_abchauha@quicinc.com>,
	Mina Almasry <almasrymina@google.com>,
	Florian Westphal <fw@strlen.de>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	David Howells <dhowells@redhat.com>, Jiri Pirko <jiri@resnulli.us>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Neil Horman <nhorman@tuxdriver.com>,
	linux-trace-kernel@vger.kernel.org,
	Dan Carpenter <dan.carpenter@linaro.org>
Subject: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb
Message-ID: 
 <dcfa5db9be2d29b68fe7c87b3f017e98e5ec83b4.1718136376.git.yan@cloudflare.com>
References: <cover.1718136376.git.yan@cloudflare.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <cover.1718136376.git.yan@cloudflare.com>

Series

net: pass receive socket to drop tracepoint | expand

Commit Message

Yan Zhai June 11, 2024, 8:11 p.m. UTC

skb does not include enough information to find out receiving
sockets/services and netns/containers on packet drops. In theory
skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
stack for OOO packet lookup. Similarly, skb->sk often identifies a local
sender, and tells nothing about a receiver.

Allow passing an extra receiving socket to the tracepoint to improve
the visibility on receiving drops.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
---
v3->v4: adjusted the TP_STRUCT field order to be consistent
v2->v3: fixed drop_monitor function prototype
---
 include/trace/events/skb.h | 11 +++++++----
 net/core/dev.c             |  2 +-
 net/core/drop_monitor.c    |  9 ++++++---
 net/core/skbuff.c          |  2 +-
 4 files changed, 15 insertions(+), 9 deletions(-)

Comments

Jesper Dangaard Brouer June 12, 2024, 7:59 a.m. UTC | #1

On 11/06/2024 22.11, Yan Zhai wrote:
> skb does not include enough information to find out receiving
> sockets/services and netns/containers on packet drops. In theory
> skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> sender, and tells nothing about a receiver.
> 
> Allow passing an extra receiving socket to the tracepoint to improve
> the visibility on receiving drops.
> 
> Signed-off-by: Yan Zhai<yan@cloudflare.com>
> ---
> v3->v4: adjusted the TP_STRUCT field order to be consistent
> v2->v3: fixed drop_monitor function prototype
> ---
>   include/trace/events/skb.h | 11 +++++++----
>   net/core/dev.c             |  2 +-
>   net/core/drop_monitor.c    |  9 ++++++---
>   net/core/skbuff.c          |  2 +-
>   4 files changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 07e0715628ec..3e9ea1cca6f2 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
>   TRACE_EVENT(kfree_skb,
>   
>   	TP_PROTO(struct sk_buff *skb, void *location,
> -		 enum skb_drop_reason reason),
> +		 enum skb_drop_reason reason, struct sock *rx_sk),
>   
> -	TP_ARGS(skb, location, reason),
> +	TP_ARGS(skb, location, reason, rx_sk),
>   
>   	TP_STRUCT__entry(
>   		__field(void *,		skbaddr)
>   		__field(void *,		location)
> +		__field(void *,		rx_skaddr)

Is there any reason for appending the "addr" part to "rx_sk" ?
It makes it harder to read this is the sk (socket).

AFAICR the skbaddr naming is a legacy thing.

>   		__field(unsigned short,	protocol)
>   		__field(enum skb_drop_reason,	reason)
>   	),
> @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
>   	TP_fast_assign(
>   		__entry->skbaddr = skb;
>   		__entry->location = location;
> +		__entry->rx_skaddr = rx_sk;
>   		__entry->protocol = ntohs(skb->protocol);
>   		__entry->reason = reason;
>   	),
>   
> -	TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> -		  __entry->skbaddr, __entry->protocol, __entry->location,
> +	TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
                               ^^^^^^^^^
I find it hard to visually tell skbaddr and rx_skaddr apart.
And especially noticing the "skb" vs "sk" part of the string.


> +		  __entry->skbaddr, __entry->rx_skaddr, __entry->protocol,
> +		  __entry->location,
>   		  __print_symbolic(__entry->reason,
>   				   DEFINE_DROP_REASON(FN, FNe)))
>   );


--Jesper

Paolo Abeni June 14, 2024, 10:15 a.m. UTC | #2

On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote:
> 
> On 11/06/2024 22.11, Yan Zhai wrote:
> > skb does not include enough information to find out receiving
> > sockets/services and netns/containers on packet drops. In theory
> > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> > stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> > sender, and tells nothing about a receiver.
> > 
> > Allow passing an extra receiving socket to the tracepoint to improve
> > the visibility on receiving drops.
> > 
> > Signed-off-by: Yan Zhai<yan@cloudflare.com>
> > ---
> > v3->v4: adjusted the TP_STRUCT field order to be consistent
> > v2->v3: fixed drop_monitor function prototype
> > ---
> >   include/trace/events/skb.h | 11 +++++++----
> >   net/core/dev.c             |  2 +-
> >   net/core/drop_monitor.c    |  9 ++++++---
> >   net/core/skbuff.c          |  2 +-
> >   4 files changed, 15 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > index 07e0715628ec..3e9ea1cca6f2 100644
> > --- a/include/trace/events/skb.h
> > +++ b/include/trace/events/skb.h
> > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
> >   TRACE_EVENT(kfree_skb,
> >   
> >   	TP_PROTO(struct sk_buff *skb, void *location,
> > -		 enum skb_drop_reason reason),
> > +		 enum skb_drop_reason reason, struct sock *rx_sk),
> >   
> > -	TP_ARGS(skb, location, reason),
> > +	TP_ARGS(skb, location, reason, rx_sk),
> >   
> >   	TP_STRUCT__entry(
> >   		__field(void *,		skbaddr)
> >   		__field(void *,		location)
> > +		__field(void *,		rx_skaddr)
> 
> Is there any reason for appending the "addr" part to "rx_sk" ?
> It makes it harder to read this is the sk (socket).
> 
> AFAICR the skbaddr naming is a legacy thing.

I'm double-minded about the above: I can see your point, but on the
flip side the 'addr' suffix is consistently used in net-related
tracepoints.
> 
> >   		__field(unsigned short,	protocol)
> >   		__field(enum skb_drop_reason,	reason)
> >   	),
> > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
> >   	TP_fast_assign(
> >   		__entry->skbaddr = skb;
> >   		__entry->location = location;
> > +		__entry->rx_skaddr = rx_sk;
> >   		__entry->protocol = ntohs(skb->protocol);
> >   		__entry->reason = reason;
> >   	),
> >   
> > -	TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> > -		  __entry->skbaddr, __entry->protocol, __entry->location,
> > +	TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
>                                ^^^^^^^^^
> I find it hard to visually tell skbaddr and rx_skaddr apart.
> And especially noticing the "skb" vs "sk" part of the string.

I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the
other net tracepoints and use 'skaddr' (which will very likely will
increase Jesper concerns, but I personally have no problem with such
format) or prefer readability with something alike 'rx_sk' or (even
better) 'sk'.

Thanks,

Paolo

Yan Zhai June 14, 2024, 7:40 p.m. UTC | #3

On Fri, Jun 14, 2024 at 5:15 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote:
> >
> > On 11/06/2024 22.11, Yan Zhai wrote:
> > > skb does not include enough information to find out receiving
> > > sockets/services and netns/containers on packet drops. In theory
> > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> > > sender, and tells nothing about a receiver.
> > >
> > > Allow passing an extra receiving socket to the tracepoint to improve
> > > the visibility on receiving drops.
> > >
> > > Signed-off-by: Yan Zhai<yan@cloudflare.com>
> > > ---
> > > v3->v4: adjusted the TP_STRUCT field order to be consistent
> > > v2->v3: fixed drop_monitor function prototype
> > > ---
> > >   include/trace/events/skb.h | 11 +++++++----
> > >   net/core/dev.c             |  2 +-
> > >   net/core/drop_monitor.c    |  9 ++++++---
> > >   net/core/skbuff.c          |  2 +-
> > >   4 files changed, 15 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > > index 07e0715628ec..3e9ea1cca6f2 100644
> > > --- a/include/trace/events/skb.h
> > > +++ b/include/trace/events/skb.h
> > > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
> > >   TRACE_EVENT(kfree_skb,
> > >
> > >     TP_PROTO(struct sk_buff *skb, void *location,
> > > -            enum skb_drop_reason reason),
> > > +            enum skb_drop_reason reason, struct sock *rx_sk),
> > >
> > > -   TP_ARGS(skb, location, reason),
> > > +   TP_ARGS(skb, location, reason, rx_sk),
> > >
> > >     TP_STRUCT__entry(
> > >             __field(void *,         skbaddr)
> > >             __field(void *,         location)
> > > +           __field(void *,         rx_skaddr)
> >
> > Is there any reason for appending the "addr" part to "rx_sk" ?
> > It makes it harder to read this is the sk (socket).
> >
> > AFAICR the skbaddr naming is a legacy thing.
>
> I'm double-minded about the above: I can see your point, but on the
> flip side the 'addr' suffix is consistently used in net-related
> tracepoints.
> >
> > >             __field(unsigned short, protocol)
> > >             __field(enum skb_drop_reason,   reason)
> > >     ),
> > > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb,
> > >     TP_fast_assign(
> > >             __entry->skbaddr = skb;
> > >             __entry->location = location;
> > > +           __entry->rx_skaddr = rx_sk;
> > >             __entry->protocol = ntohs(skb->protocol);
> > >             __entry->reason = reason;
> > >     ),
> > >
> > > -   TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> > > -             __entry->skbaddr, __entry->protocol, __entry->location,
> > > +   TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
> >                                ^^^^^^^^^
> > I find it hard to visually tell skbaddr and rx_skaddr apart.
> > And especially noticing the "skb" vs "sk" part of the string.
>
> I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the
> other net tracepoints and use 'skaddr' (which will very likely will
> increase Jesper concerns, but I personally have no problem with such
> format) or prefer readability with something alike 'rx_sk' or (even
> better) 'sk'.
>

Jesper explained to me in a private message that "addr" makes more
sense when there was no BPF, since likely nothing would dereference
the pointer anymore at that time, so it's purely an address. But it is
no longer the case now. Also, in later patches of this change, I am
already breaking the "convention" by replacing kfree_skb with
sk_skb_reason_drop, so how about breaking it once more, and just
calling it "rx_sk". I want to keep the "rx_" to emphasize this is a
receiving socket. Let me send an amended version early next week and
see if more thoughts come.

thanks
Yan


> Thanks,
>
> Paolo
>

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..3e9ea1cca6f2 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -24,13 +24,14 @@  DEFINE_DROP_REASON(FN, FN)
 TRACE_EVENT(kfree_skb,
 
 	TP_PROTO(struct sk_buff *skb, void *location,
-		 enum skb_drop_reason reason),
+		 enum skb_drop_reason reason, struct sock *rx_sk),
 
-	TP_ARGS(skb, location, reason),
+	TP_ARGS(skb, location, reason, rx_sk),
 
 	TP_STRUCT__entry(
 		__field(void *,		skbaddr)
 		__field(void *,		location)
+		__field(void *,		rx_skaddr)
 		__field(unsigned short,	protocol)
 		__field(enum skb_drop_reason,	reason)
 	),
@@ -38,12 +39,14 @@  TRACE_EVENT(kfree_skb,
 	TP_fast_assign(
 		__entry->skbaddr = skb;
 		__entry->location = location;
+		__entry->rx_skaddr = rx_sk;
 		__entry->protocol = ntohs(skb->protocol);
 		__entry->reason = reason;
 	),
 
-	TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
-		  __entry->skbaddr, __entry->protocol, __entry->location,
+	TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s",
+		  __entry->skbaddr, __entry->rx_skaddr, __entry->protocol,
+		  __entry->location,
 		  __print_symbolic(__entry->reason,
 				   DEFINE_DROP_REASON(FN, FNe)))
 );
diff --git a/net/core/dev.c b/net/core/dev.c
index 85fe8138f3e4..7844227ecbfd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5233,7 +5233,7 @@  static __latent_entropy void net_tx_action(struct softirq_action *h)
 				trace_consume_skb(skb, net_tx_action);
 			else
 				trace_kfree_skb(skb, net_tx_action,
-						get_kfree_skb_cb(skb)->reason);
+						get_kfree_skb_cb(skb)->reason, NULL);
 
 			if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
 				__kfree_skb(skb);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 430ed18f8584..2e0ae3328232 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -109,7 +109,8 @@  static u32 net_dm_queue_len = 1000;
 struct net_dm_alert_ops {
 	void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
 				void *location,
-				enum skb_drop_reason reason);
+				enum skb_drop_reason reason,
+				struct sock *rx_sk);
 	void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
 				int work, int budget);
 	void (*work_item_func)(struct work_struct *work);
@@ -264,7 +265,8 @@  static void trace_drop_common(struct sk_buff *skb, void *location)
 
 static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb,
 				void *location,
-				enum skb_drop_reason reason)
+				enum skb_drop_reason reason,
+				struct sock *rx_sk)
 {
 	trace_drop_common(skb, location);
 }
@@ -491,7 +493,8 @@  static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
 static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
 					      struct sk_buff *skb,
 					      void *location,
-					      enum skb_drop_reason reason)
+					      enum skb_drop_reason reason,
+					      struct sock *rx_sk)
 {
 	ktime_t tstamp = ktime_get_real();
 	struct per_cpu_dm_data *data;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466999a7515e..2854afdd713f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1203,7 +1203,7 @@  bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
 	if (reason == SKB_CONSUMED)
 		trace_consume_skb(skb, __builtin_return_address(0));
 	else
-		trace_kfree_skb(skb, __builtin_return_address(0), reason);
+		trace_kfree_skb(skb, __builtin_return_address(0), reason, NULL);
 	return true;
 }

[v4,net-next,1/7] net: add rx_sk to trace_kfree_skb

Commit Message

Comments

Patch