Message ID | 20220226084929.6417-5-dongli.zhang@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tun/tap: use kfree_skb_reason() to trace dropped skb | expand |
On 2/26/22 1:49 AM, Dongli Zhang wrote: > The TUN can be used as vhost-net backend. E.g, the tun_net_xmit() is the > interface to forward the skb from TUN to vhost-net/virtio-net. > > However, there are many "goto drop" in the TUN driver. Therefore, the > kfree_skb_reason() is involved at each "goto drop" to help userspace > ftrace/ebpf to track the reason for the loss of packets. > > The below reasons are introduced: > > - SKB_DROP_REASON_SKB_PULL > - SKB_DROP_REASON_SKB_TRIM > - SKB_DROP_REASON_DEV_READY > - SKB_DROP_REASON_TAP_FILTER > - SKB_DROP_REASON_TAP_TXFILTER > > Cc: Joao Martins <joao.m.martins@oracle.com> > Cc: Joe Jin <joe.jin@oracle.com> > Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> > --- > Changed since v1: > - revise the reason name > Changed since v2: > - declare drop_reason as type "enum skb_drop_reason" > Changed since v3: > - rename to TAP_FILTER and TAP_TXFILTER > - honor reverse xmas tree style declaration for 'drop_reason' in > tun_net_xmit() > > drivers/net/tun.c | 37 ++++++++++++++++++++++++++++--------- > include/linux/skbuff.h | 10 ++++++++++ > include/trace/events/skb.h | 5 +++++ > 3 files changed, 43 insertions(+), 9 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: > + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ > + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ IDK if these are not too low level and therefore lacking meaning. What are your thoughts David? Would it be better to up level the names a little bit and call SKB_PULL something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? For SKB_TRIM the error comes from allocation failures, there may be a whole bunch of skb helpers which will fail only under mem pressure, would it be better to identify them and return some ENOMEM related reason, since, most likely, those will be noise to whoever is tracking real errors? > SKB_DROP_REASON_DEV_HDR, /* there is something wrong with > * device driver specific header > */ > + SKB_DROP_REASON_DEV_READY, /* device is not ready */ What is ready? link is not up? peer not connected? can we expand?
On 3/1/22 7:50 PM, Jakub Kicinski wrote: > On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ > > IDK if these are not too low level and therefore lacking meaning. > > What are your thoughts David? I agree. Not every kfree_skb is worthy of a reason. "Internal housekeeping" errors are random and nothing a user / admin can do about drops. IMHO, the value of the reason code is when it aligns with SNMP counters (original motivation for this direction) and relevant details like TCP or UDP checksum mismatch, packets for a socket that is not open, socket is full, ring buffer is full, packets for "other host", etc. > > Would it be better to up level the names a little bit and call SKB_PULL > something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe > "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? > > For SKB_TRIM the error comes from allocation failures, there may be > a whole bunch of skb helpers which will fail only under mem pressure, > would it be better to identify them and return some ENOMEM related > reason, since, most likely, those will be noise to whoever is tracking > real errors? > >> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >> * device driver specific header >> */ >> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ > > What is ready? link is not up? peer not connected? can we expand? As I recall in this case it is the tfile for a tun device disappeared - ie., a race condition.
On 2022/3/2 AM 11:29,“David Ahern”<dsahern@gmail.com> write: >On 3/1/22 7:50 PM, Jakub Kicinski wrote: >> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ >> [...] >>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >>> * device driver specific header >>> */ >>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ >> >> What is ready? link is not up? peer not connected? can we expand? > >As I recall in this case it is the tfile for a tun device disappeared - >ie., a race condition. This seems is that tun is not attached to a file (the tun device file is not opened?) Maybe TAP_UNATTACHED is more suitable :)
Hi Jakub and David, On 3/1/22 6:50 PM, Jakub Kicinski wrote: > On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ > > IDK if these are not too low level and therefore lacking meaning. > > What are your thoughts David? > > Would it be better to up level the names a little bit and call SKB_PULL > something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe > "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? This is for device driver and I think for most of cases the people understanding source code will be involved. I think SKB_PULL is more meaningful to people understanding source code. The functions to pull data to skb is commonly used with the same pattern, and not only for ETH_HLEN. E.g., I randomly found below in kernel source code. 1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) 1072 { ... ... 1102 pulled_sci = pskb_may_pull(skb, macsec_extra_len(true)); 1103 if (!pulled_sci) { 1104 if (!pskb_may_pull(skb, macsec_extra_len(false))) 1105 goto drop_direct; 1106 } ... ... 1254 drop_direct: 1255 kfree_skb(skb); 1256 *pskb = NULL; 1257 return RX_HANDLER_CONSUMED; About 'L2_HDR_ERR', I am curious what the user/administrator may do as next step, while the 'SKB_PULL' will be very clear to the developers which kernel operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue. I may use 'L2_HDR_ERR' if you prefer. > > For SKB_TRIM the error comes from allocation failures, there may be > a whole bunch of skb helpers which will fail only under mem pressure, > would it be better to identify them and return some ENOMEM related > reason, since, most likely, those will be noise to whoever is tracking > real errors? The reasons I want to use SKB_TRIM: 1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same set). 2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new return values by pskb_trim(), the reason is not going to be valid any longer. I may use SKB_DROP_REASON_NOMEM if you prefer. Another concern is that many functions may return -ENOMEM. It is more likely that if there are two "goto drop" to return -ENOMEM, we will not be able to tell from which function the sk_buff is dropped, e.g., if (function_A()) { reason = -ENOMEM; goto drop; } if (function_B()) { reason = -ENOMEM; goto drop; } > >> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >> * device driver specific header >> */ >> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ > > What is ready? link is not up? peer not connected? can we expand? > In this patchset, it is for either: - tun->tfiles[txq] is not set, or - !(tun->dev->flags & IFF_UP) I want to make it very generic so that the sk_buff dropped due to any device level data structure that is not up/ready/initialized/allocated will use this reason in the future. Thank you very much for suggestions! Dongli Zhang
Hi Menglong, On 3/1/22 8:16 PM, imagedong(董梦龙) wrote: > > > On 2022/3/2 AM 11:29,“David Ahern”<dsahern@gmail.com> write: > >> On 3/1/22 7:50 PM, Jakub Kicinski wrote: >>> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >>>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >>>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ >>> > [...] >>>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >>>> * device driver specific header >>>> */ >>>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ >>> >>> What is ready? link is not up? peer not connected? can we expand? >> >> As I recall in this case it is the tfile for a tun device disappeared - >> ie., a race condition. > > This seems is that tun is not attached to a file (the tun device file > is not opened?) Maybe TAP_UNATTACHED is more suitable :) > > Thank you very much for the suggestions! TAP_UNATTACHED is more suitable. The tap/tun are only two of drivers in linux kernel. We have already introduced another two tap/tun specific reasons. My concern is we may finally have too many reasons. That's why I am always trying to define the reason as generic as possible so that they will be re-used by most networking drivers. We may expand the reason if it is fine to have too many reasons for skb_drop_reason. Dongli Zhang
On Wed, 2 Mar 2022 10:19:30 -0800 Dongli Zhang wrote: > On 3/1/22 6:50 PM, Jakub Kicinski wrote: > > On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: > >> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ > >> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ > > > > IDK if these are not too low level and therefore lacking meaning. > > > > What are your thoughts David? > > > > Would it be better to up level the names a little bit and call SKB_PULL > > something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe > > "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? > > This is for device driver and I think for most of cases the people understanding > source code will be involved. I think SKB_PULL is more meaningful to people > understanding source code. > > The functions to pull data to skb is commonly used with the same pattern, and > not only for ETH_HLEN. E.g., I randomly found below in kernel source code. > > 1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) > 1072 { > ... ... > 1102 pulled_sci = pskb_may_pull(skb, macsec_extra_len(true)); > 1103 if (!pulled_sci) { > 1104 if (!pskb_may_pull(skb, macsec_extra_len(false))) > 1105 goto drop_direct; > 1106 } > ... ... > 1254 drop_direct: > 1255 kfree_skb(skb); > 1256 *pskb = NULL; > 1257 return RX_HANDLER_CONSUMED; > > > About 'L2_HDR_ERR', I am curious what the user/administrator may do as next > step, while the 'SKB_PULL' will be very clear to the developers which kernel > operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue. > > I may use 'L2_HDR_ERR' if you prefer. We don't have to break it out per layer if you prefer. Let's call it HDR_TRUNC. I don't like SKB_PULL, people using these trace points are as likely to be BPF developers as kernel developers and skb_pull will not be meaningful to them. Besides the code can check if header is not truncated in other ways than pskb_may_pull(). And calling things by the name of the helper that failed is bad precedent. > > For SKB_TRIM the error comes from allocation failures, there may be > > a whole bunch of skb helpers which will fail only under mem pressure, > > would it be better to identify them and return some ENOMEM related > > reason, since, most likely, those will be noise to whoever is tracking > > real errors? > > The reasons I want to use SKB_TRIM: > > 1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same > set). > > 2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new > return values by pskb_trim(), the reason is not going to be valid any longer. > > > I may use SKB_DROP_REASON_NOMEM if you prefer. > > Another concern is that many functions may return -ENOMEM. It is more likely > that if there are two "goto drop" to return -ENOMEM, we will not be able to tell > from which function the sk_buff is dropped, e.g., > > if (function_A()) { > reason = -ENOMEM; > goto drop; > } > > if (function_B()) { > reason = -ENOMEM; > goto drop; > } Are you saying that you're intending to break out skb drop reasons by what entity failed to allocate memory? I'd think "skb was dropped because of OOM" is what should be reported. What we were trying to allocate is not very relevant (and can be gotten from the stack trace if needed). > >> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with > >> * device driver specific header > >> */ > >> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ > > > > What is ready? link is not up? peer not connected? can we expand? > > In this patchset, it is for either: > > - tun->tfiles[txq] is not set, or > > - !(tun->dev->flags & IFF_UP) > > I want to make it very generic so that the sk_buff dropped due to any device > level data structure that is not up/ready/initialized/allocated will use this > reason in the future. Let's expand the documentation so someone reading thru the enum can feel confident if they are using this reason correctly. Side note - you may want to switch to inline comments to make writing more verbose documentation, I mean: /* This is the explanation of reason one which explains what * reason ones means, and how it should be used. We can make * use of full line width this way. */ SKB_DROP_REASON_ONE, /* And this is an explanation for reason two. */ SKB_DROP_REASON_TWO,
On Tue, 1 Mar 2022 20:29:37 -0700 David Ahern wrote: > On 3/1/22 7:50 PM, Jakub Kicinski wrote: > > On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: > >> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ > >> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ > > > > IDK if these are not too low level and therefore lacking meaning. > > > > What are your thoughts David? > > I agree. Not every kfree_skb is worthy of a reason. "Internal > housekeeping" errors are random and nothing a user / admin can do about > drops. > > IMHO, the value of the reason code is when it aligns with SNMP counters > (original motivation for this direction) and relevant details like TCP > or UDP checksum mismatch, packets for a socket that is not open, socket > is full, ring buffer is full, packets for "other host", etc. Agreed :(
Hi Jakub, On 3/2/22 11:17 AM, Jakub Kicinski wrote: > On Wed, 2 Mar 2022 10:19:30 -0800 Dongli Zhang wrote: >> On 3/1/22 6:50 PM, Jakub Kicinski wrote: >>> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >>>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >>>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ >>> >>> IDK if these are not too low level and therefore lacking meaning. >>> >>> What are your thoughts David? >>> >>> Would it be better to up level the names a little bit and call SKB_PULL >>> something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe >>> "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? >> >> This is for device driver and I think for most of cases the people understanding >> source code will be involved. I think SKB_PULL is more meaningful to people >> understanding source code. >> >> The functions to pull data to skb is commonly used with the same pattern, and >> not only for ETH_HLEN. E.g., I randomly found below in kernel source code. >> >> 1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) >> 1072 { >> ... ... >> 1102 pulled_sci = pskb_may_pull(skb, macsec_extra_len(true)); >> 1103 if (!pulled_sci) { >> 1104 if (!pskb_may_pull(skb, macsec_extra_len(false))) >> 1105 goto drop_direct; >> 1106 } >> ... ... >> 1254 drop_direct: >> 1255 kfree_skb(skb); >> 1256 *pskb = NULL; >> 1257 return RX_HANDLER_CONSUMED; >> >> >> About 'L2_HDR_ERR', I am curious what the user/administrator may do as next >> step, while the 'SKB_PULL' will be very clear to the developers which kernel >> operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue. >> >> I may use 'L2_HDR_ERR' if you prefer. > > We don't have to break it out per layer if you prefer. Let's call it > HDR_TRUNC. > > I don't like SKB_PULL, people using these trace points are as likely > to be BPF developers as kernel developers and skb_pull will not be > meaningful to them. Besides the code can check if header is not > truncated in other ways than pskb_may_pull(). And calling things > by the name of the helper that failed is bad precedent. I will switch to SKB_DROP_REASON_HDR_TRUNC. > >>> For SKB_TRIM the error comes from allocation failures, there may be >>> a whole bunch of skb helpers which will fail only under mem pressure, >>> would it be better to identify them and return some ENOMEM related >>> reason, since, most likely, those will be noise to whoever is tracking >>> real errors? >> >> The reasons I want to use SKB_TRIM: >> >> 1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same >> set). >> >> 2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new >> return values by pskb_trim(), the reason is not going to be valid any longer. >> >> >> I may use SKB_DROP_REASON_NOMEM if you prefer. >> >> Another concern is that many functions may return -ENOMEM. It is more likely >> that if there are two "goto drop" to return -ENOMEM, we will not be able to tell >> from which function the sk_buff is dropped, e.g., >> >> if (function_A()) { >> reason = -ENOMEM; >> goto drop; >> } >> >> if (function_B()) { >> reason = -ENOMEM; >> goto drop; >> } > > Are you saying that you're intending to break out skb drop reasons > by what entity failed to allocate memory? I'd think "skb was dropped Yes. > because of OOM" is what should be reported. What we were trying to > allocate is not very relevant (and can be gotten from the stack trace > if needed). I think OOM is not enough. Although it may not be the case in this patchset, sometimes the allocation is failed because we are allocating a large chunk of physically continuous pages (kmalloc vs. vmalloc) while there is still plenty of memory pages available. As a kernel developer, it is very significant for me to identify the specific line/function and specific data structure that cause the error. E.g, the bug filer may be chasing which line is making trouble. It is less likely to SKB_TRIM more than once in a driver function, compared to ENOMEM. I am the user of this patchset and I prefer to make my work easier in the future :) > >>>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >>>> * device driver specific header >>>> */ >>>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ >>> >>> What is ready? link is not up? peer not connected? can we expand? >> >> In this patchset, it is for either: >> >> - tun->tfiles[txq] is not set, or >> >> - !(tun->dev->flags & IFF_UP) >> >> I want to make it very generic so that the sk_buff dropped due to any device >> level data structure that is not up/ready/initialized/allocated will use this >> reason in the future. > > Let's expand the documentation so someone reading thru the enum can > feel confident if they are using this reason correctly. > > Side note - you may want to switch to inline comments to make writing > more verbose documentation, I mean: > > /* This is the explanation of reason one which explains what > * reason ones means, and how it should be used. We can make > * use of full line width this way. > */ > SKB_DROP_REASON_ONE, > /* And this is an explanation for reason two. */ > SKB_DROP_REASON_TWO, > I will expand the comments. Thank you very much! Dongli Zhang
On Wed, 2 Mar 2022 14:21:31 -0800 Dongli Zhang wrote: > > because of OOM" is what should be reported. What we were trying to > > allocate is not very relevant (and can be gotten from the stack trace > > if needed). > > I think OOM is not enough. Although it may not be the case in this patchset, > sometimes the allocation is failed because we are allocating a large chunk of > physically continuous pages (kmalloc vs. vmalloc) while there is still plenty of > memory pages available. > > As a kernel developer, it is very significant for me to identify the specific > line/function and specific data structure that cause the error. E.g, the bug > filer may be chasing which line is making trouble. > > It is less likely to SKB_TRIM more than once in a driver function, compared to > ENOMEM. Nack, trim is meaningless.
Hi Jakub, On 3/2/22 9:21 PM, Jakub Kicinski wrote: > On Wed, 2 Mar 2022 14:21:31 -0800 Dongli Zhang wrote: >>> because of OOM" is what should be reported. What we were trying to >>> allocate is not very relevant (and can be gotten from the stack trace >>> if needed). >> >> I think OOM is not enough. Although it may not be the case in this patchset, >> sometimes the allocation is failed because we are allocating a large chunk of >> physically continuous pages (kmalloc vs. vmalloc) while there is still plenty of >> memory pages available. >> >> As a kernel developer, it is very significant for me to identify the specific >> line/function and specific data structure that cause the error. E.g, the bug >> filer may be chasing which line is making trouble. >> >> It is less likely to SKB_TRIM more than once in a driver function, compared to >> ENOMEM. > > Nack, trim is meaningless. > I will use SKB_DROP_REASON_NOMEM. Thank you very much! Dongli Zhang
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index aa27268edc5f..73ad2bb5e8ae 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1058,6 +1058,7 @@ static unsigned int run_ebpf_filter(struct tun_struct *tun, static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) { struct tun_struct *tun = netdev_priv(dev); + enum skb_drop_reason drop_reason; int txq = skb->queue_mapping; struct netdev_queue *queue; struct tun_file *tfile; @@ -1067,8 +1068,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) tfile = rcu_dereference(tun->tfiles[txq]); /* Drop packet if interface is not attached */ - if (!tfile) + if (!tfile) { + drop_reason = SKB_DROP_REASON_DEV_READY; goto drop; + } if (!rcu_dereference(tun->steering_prog)) tun_automq_xmit(tun, skb); @@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) /* Drop if the filter does not like it. * This is a noop if the filter is disabled. * Filter can be enabled only for the TAP devices. */ - if (!check_filter(&tun->txflt, skb)) + if (!check_filter(&tun->txflt, skb)) { + drop_reason = SKB_DROP_REASON_TAP_TXFILTER; goto drop; + } if (tfile->socket.sk->sk_filter && - sk_filter(tfile->socket.sk, skb)) + sk_filter(tfile->socket.sk, skb)) { + drop_reason = SKB_DROP_REASON_SOCKET_FILTER; goto drop; + } len = run_ebpf_filter(tun, skb, len); - if (len == 0) + if (len == 0) { + drop_reason = SKB_DROP_REASON_TAP_FILTER; goto drop; + } - if (pskb_trim(skb, len)) + if (pskb_trim(skb, len)) { + drop_reason = SKB_DROP_REASON_SKB_TRIM; goto drop; + } - if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC))) + if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC))) { + drop_reason = SKB_DROP_REASON_SKB_COPY_DATA; goto drop; + } skb_tx_timestamp(skb); @@ -1104,8 +1117,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) nf_reset_ct(skb); - if (ptr_ring_produce(&tfile->tx_ring, skb)) + if (ptr_ring_produce(&tfile->tx_ring, skb)) { + drop_reason = SKB_DROP_REASON_FULL_RING; goto drop; + } /* NETIF_F_LLTX requires to do our own update of trans_start */ queue = netdev_get_tx_queue(dev, txq); @@ -1122,7 +1137,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) drop: atomic_long_inc(&dev->tx_dropped); skb_tx_error(skb); - kfree_skb(skb); + kfree_skb_reason(skb, drop_reason); rcu_read_unlock(); return NET_XMIT_DROP; } @@ -1720,6 +1735,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, u32 rxhash = 0; int skb_xdp = 1; bool frags = tun_napi_frags_enabled(tfile); + enum skb_drop_reason drop_reason; if (!(tun->flags & IFF_NO_PI)) { if (len < sizeof(pi)) @@ -1823,9 +1839,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, if (err) { err = -EFAULT; + drop_reason = SKB_DROP_REASON_SKB_COPY_DATA; drop: atomic_long_inc(&tun->dev->rx_dropped); - kfree_skb(skb); + kfree_skb_reason(skb, drop_reason); if (frags) { tfile->napi.skb = NULL; mutex_unlock(&tfile->napi_mutex); @@ -1872,6 +1889,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, case IFF_TAP: if (frags && !pskb_may_pull(skb, ETH_HLEN)) { err = -ENOMEM; + drop_reason = SKB_DROP_REASON_SKB_PULL; goto drop; } skb->protocol = eth_type_trans(skb, tun->dev); @@ -1925,6 +1943,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, if (unlikely(!(tun->dev->flags & IFF_UP))) { err = -EIO; rcu_read_unlock(); + drop_reason = SKB_DROP_REASON_DEV_READY; goto drop; } diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9f523da4d3f2..9a0a15a31591 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -385,10 +385,20 @@ enum skb_drop_reason { * sk_buff */ SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */ + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ SKB_DROP_REASON_DEV_HDR, /* there is something wrong with * device driver specific header */ + SKB_DROP_REASON_DEV_READY, /* device is not ready */ SKB_DROP_REASON_FULL_RING, /* ring buffer is full */ + SKB_DROP_REASON_TAP_FILTER, /* dropped by (ebpf) filter directly + * attached to tun/tap, e.g., via + * TUNSETFILTEREBPF + */ + SKB_DROP_REASON_TAP_TXFILTER, /* dropped by tx filter implemented + * at tun/tap, e.g., check_filter() + */ SKB_DROP_REASON_MAX, }; diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 5b5f1351dcde..e8dcf784ac17 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -40,8 +40,13 @@ EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM) \ EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA) \ EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG) \ + EM(SKB_DROP_REASON_SKB_PULL, SKB_PULL) \ + EM(SKB_DROP_REASON_SKB_TRIM, SKB_TRIM) \ EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR) \ + EM(SKB_DROP_REASON_DEV_READY, DEV_READY) \ EM(SKB_DROP_REASON_FULL_RING, FULL_RING) \ + EM(SKB_DROP_REASON_TAP_FILTER, TAP_FILTER) \ + EM(SKB_DROP_REASON_TAP_TXFILTER, TAP_TXFILTER) \ EMe(SKB_DROP_REASON_MAX, MAX) #undef EM
The TUN can be used as vhost-net backend. E.g, the tun_net_xmit() is the interface to forward the skb from TUN to vhost-net/virtio-net. However, there are many "goto drop" in the TUN driver. Therefore, the kfree_skb_reason() is involved at each "goto drop" to help userspace ftrace/ebpf to track the reason for the loss of packets. The below reasons are introduced: - SKB_DROP_REASON_SKB_PULL - SKB_DROP_REASON_SKB_TRIM - SKB_DROP_REASON_DEV_READY - SKB_DROP_REASON_TAP_FILTER - SKB_DROP_REASON_TAP_TXFILTER Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> --- Changed since v1: - revise the reason name Changed since v2: - declare drop_reason as type "enum skb_drop_reason" Changed since v3: - rename to TAP_FILTER and TAP_TXFILTER - honor reverse xmas tree style declaration for 'drop_reason' in tun_net_xmit() drivers/net/tun.c | 37 ++++++++++++++++++++++++++++--------- include/linux/skbuff.h | 10 ++++++++++ include/trace/events/skb.h | 5 +++++ 3 files changed, 43 insertions(+), 9 deletions(-)