diff mbox series

[net-next,v4,2/4] net: tap: track dropped skb via kfree_skb_reason()

Message ID 20220226084929.6417-3-dongli.zhang@oracle.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series tun/tap: use kfree_skb_reason() to trace dropped skb | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5978 this patch: 5978
netdev/cc_maintainers warning 9 maintainers not CCed: flyingpeng@tencent.com kpsingh@kernel.org john.fastabend@gmail.com kafai@fb.com songliubraving@fb.com mengensun@tencent.com dsahern@kernel.org keescook@chromium.org yhs@fb.com
netdev/build_clang success Errors and warnings before: 878 this patch: 878
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 6129 this patch: 6129
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 123 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Dongli Zhang Feb. 26, 2022, 8:49 a.m. UTC
The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
the interface to forward the skb from TAP to vhost-net/virtio-net.

However, there are many "goto drop" in the TAP driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.

The below reasons are introduced:

- SKB_DROP_REASON_SKB_CSUM
- SKB_DROP_REASON_SKB_COPY_DATA
- SKB_DROP_REASON_SKB_GSO_SEG
- SKB_DROP_REASON_DEV_HDR
- SKB_DROP_REASON_FULL_RING

Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
---
Changed since v1:
  - revise the reason name
Changed since v2:
  - declare drop_reason as type "enum skb_drop_reason"
  - handle the drop in skb_list_walk_safe() case

 drivers/net/tap.c          | 35 +++++++++++++++++++++++++----------
 include/linux/skbuff.h     |  9 +++++++++
 include/trace/events/skb.h |  5 +++++
 3 files changed, 39 insertions(+), 10 deletions(-)

Comments

Jakub Kicinski March 2, 2022, 2:42 a.m. UTC | #1
On Sat, 26 Feb 2022 00:49:27 -0800 Dongli Zhang wrote:
> +	SKB_DROP_REASON_SKB_CSUM,	/* sk_buff checksum error */

Can we spell it out a little more? It sounds like the checksum was
incorrect. Will it be clear that computing the checksum failed, rather
than checksum validation failed?

> +	SKB_DROP_REASON_SKB_COPY_DATA,	/* failed to copy data from or to
> +					 * sk_buff
> +					 */

Here should we specify that it's copying from user space?

> +	SKB_DROP_REASON_SKB_GSO_SEG,	/* gso segmentation error */
> +	SKB_DROP_REASON_DEV_HDR,	/* there is something wrong with
> +					 * device driver specific header
> +					 */

How about:
device driver specific header / metadata was invalid

to broaden the scope also to devices which don't transfer the metadata
in form of a header?

> +	SKB_DROP_REASON_FULL_RING,	/* ring buffer is full */
Dongli Zhang March 2, 2022, 5:43 p.m. UTC | #2
Hi Jakub,

On 3/1/22 6:42 PM, Jakub Kicinski wrote:
> On Sat, 26 Feb 2022 00:49:27 -0800 Dongli Zhang wrote:
>> +	SKB_DROP_REASON_SKB_CSUM,	/* sk_buff checksum error */
> 
> Can we spell it out a little more? It sounds like the checksum was
> incorrect. Will it be clear that computing the checksum failed, rather
> than checksum validation failed?

I am just trying to make the reasons as generic as possible so that:

1. We may minimize the number of reasons.

2. People may re-use the same reason for all CSUM related issue.

> 
>> +	SKB_DROP_REASON_SKB_COPY_DATA,	/* failed to copy data from or to
>> +					 * sk_buff
>> +					 */
> 
> Here should we specify that it's copying from user space?

Same as above. I am minimizing the number of reasons so that any memory copy for
sk_buff may re-use this reason.

Please let me know if you think it is very significant to specialize the usage
of reason. I will then add "sk_buff csum computation failed" and "userspace".

> 
>> +	SKB_DROP_REASON_SKB_GSO_SEG,	/* gso segmentation error */
>> +	SKB_DROP_REASON_DEV_HDR,	/* there is something wrong with
>> +					 * device driver specific header
>> +					 */
> 
> How about:
> device driver specific header / metadata was invalid
> 
> to broaden the scope also to devices which don't transfer the metadata
> in form of a header?

I will add 'metadata'.

Thank you very much!

Dongli Zhang

> 
>> +	SKB_DROP_REASON_FULL_RING,	/* ring buffer is full */
Jakub Kicinski March 2, 2022, 7:03 p.m. UTC | #3
On Wed, 2 Mar 2022 09:43:29 -0800 Dongli Zhang wrote:
> On 3/1/22 6:42 PM, Jakub Kicinski wrote:
> > On Sat, 26 Feb 2022 00:49:27 -0800 Dongli Zhang wrote:  
> >> +	SKB_DROP_REASON_SKB_CSUM,	/* sk_buff checksum error */  
> > 
> > Can we spell it out a little more? It sounds like the checksum was
> > incorrect. Will it be clear that computing the checksum failed, rather
> > than checksum validation failed?  
> 
> I am just trying to make the reasons as generic as possible so that:
> 
> 1. We may minimize the number of reasons.
> 
> 2. People may re-use the same reason for all CSUM related issue.

The generic nature is fine, my concern is to clearly differentiate
errors in _validating_ the checksum from errors in _generating_ them.
"sk_buff checksum error" does not explain which one had taken place.

> >> +	SKB_DROP_REASON_SKB_COPY_DATA,	/* failed to copy data from or to
> >> +					 * sk_buff
> >> +					 */  
> > 
> > Here should we specify that it's copying from user space?  
> 
> Same as above. I am minimizing the number of reasons so that any memory copy for
> sk_buff may re-use this reason.

IIUC this failure is equivalent to user passing an invalid buffer. 
I mean something like:

	send(fd, (void *)random(), 1000, 0);

I'd be tempted to call the reason something link SKB_UCOPY_FAULT.
To indicate it's a problem copying from user space. EFAULT is the
typical errno for that. WDYT?
Dongli Zhang March 2, 2022, 9:44 p.m. UTC | #4
Hi Jakub,

On 3/2/22 11:03 AM, Jakub Kicinski wrote:
> On Wed, 2 Mar 2022 09:43:29 -0800 Dongli Zhang wrote:
>> On 3/1/22 6:42 PM, Jakub Kicinski wrote:
>>> On Sat, 26 Feb 2022 00:49:27 -0800 Dongli Zhang wrote:  
>>>> +	SKB_DROP_REASON_SKB_CSUM,	/* sk_buff checksum error */  
>>>
>>> Can we spell it out a little more? It sounds like the checksum was
>>> incorrect. Will it be clear that computing the checksum failed, rather
>>> than checksum validation failed?  
>>
>> I am just trying to make the reasons as generic as possible so that:
>>
>> 1. We may minimize the number of reasons.
>>
>> 2. People may re-use the same reason for all CSUM related issue.
> 
> The generic nature is fine, my concern is to clearly differentiate
> errors in _validating_ the checksum from errors in _generating_ them.
> "sk_buff checksum error" does not explain which one had taken place.

This is for skb_checksum_help() and it is for csum computation.

Therefore, I will keep SKB_DROP_REASON_SKB_CSUM and add 'computation' or
'generating' to the comments.

> 
>>>> +	SKB_DROP_REASON_SKB_COPY_DATA,	/* failed to copy data from or to
>>>> +					 * sk_buff
>>>> +					 */  
>>>
>>> Here should we specify that it's copying from user space?  
>>
>> Same as above. I am minimizing the number of reasons so that any memory copy for
>> sk_buff may re-use this reason.
> 
> IIUC this failure is equivalent to user passing an invalid buffer. 
> I mean something like:
> 
> 	send(fd, (void *)random(), 1000, 0);
> 
> I'd be tempted to call the reason something link SKB_UCOPY_FAULT.
> To indicate it's a problem copying from user space. EFAULT is the
> typical errno for that. WDYT?
> 

So far the reason is used for below functions' return value:

- tap_get_user() -> zerocopy_sg_from_iter()
- tap_get_user() -> skb_copy_datagram_from_iter()
- tun_net_xmit() -> skb_orphan_frags_rx() -> skb_copy_ubufs()

I will switch to SKB_UCOPY_FAULT.

Thank you very much!

Dongli Zhang
diff mbox series

Patch

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8e3a28ba6b28..b48f519fe63f 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -322,6 +322,7 @@  rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 	struct tap_dev *tap;
 	struct tap_queue *q;
 	netdev_features_t features = TAP_FEATURES;
+	enum skb_drop_reason drop_reason;
 
 	tap = tap_dev_get_rcu(dev);
 	if (!tap)
@@ -343,12 +344,16 @@  rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 		struct sk_buff *segs = __skb_gso_segment(skb, features, false);
 		struct sk_buff *next;
 
-		if (IS_ERR(segs))
+		if (IS_ERR(segs)) {
+			drop_reason = SKB_DROP_REASON_SKB_GSO_SEG;
 			goto drop;
+		}
 
 		if (!segs) {
-			if (ptr_ring_produce(&q->ring, skb))
+			if (ptr_ring_produce(&q->ring, skb)) {
+				drop_reason = SKB_DROP_REASON_FULL_RING;
 				goto drop;
+			}
 			goto wake_up;
 		}
 
@@ -356,8 +361,9 @@  rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 		skb_list_walk_safe(segs, skb, next) {
 			skb_mark_not_on_list(skb);
 			if (ptr_ring_produce(&q->ring, skb)) {
-				kfree_skb(skb);
-				kfree_skb_list(next);
+				drop_reason = SKB_DROP_REASON_FULL_RING;
+				kfree_skb_reason(skb, drop_reason);
+				kfree_skb_list_reason(next, drop_reason);
 				break;
 			}
 		}
@@ -369,10 +375,14 @@  rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 		 */
 		if (skb->ip_summed == CHECKSUM_PARTIAL &&
 		    !(features & NETIF_F_CSUM_MASK) &&
-		    skb_checksum_help(skb))
+		    skb_checksum_help(skb)) {
+			drop_reason = SKB_DROP_REASON_SKB_CSUM;
 			goto drop;
-		if (ptr_ring_produce(&q->ring, skb))
+		}
+		if (ptr_ring_produce(&q->ring, skb)) {
+			drop_reason = SKB_DROP_REASON_FULL_RING;
 			goto drop;
+		}
 	}
 
 wake_up:
@@ -383,7 +393,7 @@  rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 	/* Count errors/drops only here, thus don't care about args. */
 	if (tap->count_rx_dropped)
 		tap->count_rx_dropped(tap);
-	kfree_skb(skb);
+	kfree_skb_reason(skb, drop_reason);
 	return RX_HANDLER_CONSUMED;
 }
 EXPORT_SYMBOL_GPL(tap_handle_frame);
@@ -632,6 +642,7 @@  static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	int depth;
 	bool zerocopy = false;
 	size_t linear;
+	enum skb_drop_reason drop_reason;
 
 	if (q->flags & IFF_VNET_HDR) {
 		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
@@ -696,8 +707,10 @@  static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	else
 		err = skb_copy_datagram_from_iter(skb, 0, from, len);
 
-	if (err)
+	if (err) {
+		drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
 		goto err_kfree;
+	}
 
 	skb_set_network_header(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
@@ -706,8 +719,10 @@  static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	if (vnet_hdr_len) {
 		err = virtio_net_hdr_to_skb(skb, &vnet_hdr,
 					    tap_is_little_endian(q));
-		if (err)
+		if (err) {
+			drop_reason = SKB_DROP_REASON_DEV_HDR;
 			goto err_kfree;
+		}
 	}
 
 	skb_probe_transport_header(skb);
@@ -738,7 +753,7 @@  static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	return total_len;
 
 err_kfree:
-	kfree_skb(skb);
+	kfree_skb_reason(skb, drop_reason);
 
 err:
 	rcu_read_lock();
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bc3f7822110c..9f523da4d3f2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -380,6 +380,15 @@  enum skb_drop_reason {
 					 * the ofo queue, corresponding to
 					 * LINUX_MIB_TCPOFOMERGE
 					 */
+	SKB_DROP_REASON_SKB_CSUM,	/* sk_buff checksum error */
+	SKB_DROP_REASON_SKB_COPY_DATA,	/* failed to copy data from or to
+					 * sk_buff
+					 */
+	SKB_DROP_REASON_SKB_GSO_SEG,	/* gso segmentation error */
+	SKB_DROP_REASON_DEV_HDR,	/* there is something wrong with
+					 * device driver specific header
+					 */
+	SKB_DROP_REASON_FULL_RING,	/* ring buffer is full */
 	SKB_DROP_REASON_MAX,
 };
 
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 2ab7193313aa..5b5f1351dcde 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -37,6 +37,11 @@ 
 	EM(SKB_DROP_REASON_TCP_OLD_DATA, TCP_OLD_DATA)		\
 	EM(SKB_DROP_REASON_TCP_OVERWINDOW, TCP_OVERWINDOW)	\
 	EM(SKB_DROP_REASON_TCP_OFOMERGE, TCP_OFOMERGE)		\
+	EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM)			\
+	EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA)	\
+	EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG)		\
+	EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR)			\
+	EM(SKB_DROP_REASON_FULL_RING, FULL_RING)		\
 	EMe(SKB_DROP_REASON_MAX, MAX)
 
 #undef EM