mbox series

[v2,net-next,0/2] net: snmp: tracepoint support for snmp

Message ID 20211118124812.106538-1-imagedong@tencent.com (mailing list archive)
Headers show
Series net: snmp: tracepoint support for snmp | expand

Message

Menglong Dong Nov. 18, 2021, 12:48 p.m. UTC
From: Menglong Dong <imagedong@tencent.com>

snmp is the network package statistics module in kernel, and it is
useful in network issue diagnosis, such as packet drop.

However, it is hard to get the detail information about the packet.
For example, we can know that there is something wrong with the
checksum of udp packet though 'InCsumErrors' of UDP protocol in
/proc/net/snmp, but we can't figure out the ip and port of the packet
that this error is happening on.

Add tracepoint for snmp. Therefor, users can use some tools (such as
eBPF) to get the information of the exceptional packet.

In the first patch, the frame of snmp-tracepoint is created. And in
the second patch, tracepoint for udp-snmp is introduced.

Changes since v1:
- use a single trace event for all statistics type, and special
  statistics can be filter by type (procotol) and field.

Now, it will looks like this for udp statistics:
$ cat trace
$ tracer: nop
$
$ entries-in-buffer/entries-written: 4/4   #P:1
$
$                                _-----=> irqs-off
$                               / _----=> need-resched
$                              | / _---=> hardirq/softirq
$                              || / _--=> preempt-depth
$                              ||| / _-=> migrate-disable
$                              |||| /     delay
$           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
$              | |         |   |||||     |         |
              nc-171     [000] ..s1.    35.952997: snmp: skbaddr=(____ptrval____), type=9, field=2, val=1
              nc-171     [000] .N...    35.957006: snmp: skbaddr=(____ptrval____), type=9, field=4, val=1
              nc-171     [000] ..s1.    35.957822: snmp: skbaddr=(____ptrval____), type=9, field=2, val=1
              nc-171     [000] .....    35.957893: snmp: skbaddr=(____ptrval____), type=9, field=4, val=1

'type=9' means that the event is triggered by udp statistics and 'field=2'
means that this is triggered by 'NoPorts'. 'val=1' means that increases
of statistics (decrease can happen on tcp).


Menglong Dong (2):
  net: snmp: add tracepoint support for snmp
  net: snmp: add snmp tracepoint support for udp

 include/net/udp.h           | 25 ++++++++++++++++-----
 include/trace/events/snmp.h | 44 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/snmp.h   | 21 ++++++++++++++++++
 net/core/net-traces.c       |  3 +++
 net/ipv4/udp.c              | 28 +++++++++++++----------
 5 files changed, 104 insertions(+), 17 deletions(-)
 create mode 100644 include/trace/events/snmp.h

Comments

Steven Rostedt Nov. 18, 2021, 2:46 p.m. UTC | #1
On Thu, 18 Nov 2021 20:48:10 +0800
menglong8.dong@gmail.com wrote:

>               nc-171     [000] ..s1.    35.952997: snmp: skbaddr=(____ptrval____), type=9, field=2, val=1
>               nc-171     [000] .N...    35.957006: snmp: skbaddr=(____ptrval____), type=9, field=4, val=1
>               nc-171     [000] ..s1.    35.957822: snmp: skbaddr=(____ptrval____), type=9, field=2, val=1
>               nc-171     [000] .....    35.957893: snmp: skbaddr=(____ptrval____), type=9, field=4, val=1
> 
> 'type=9' means that the event is triggered by udp statistics and 'field=2'
> means that this is triggered by 'NoPorts'. 'val=1' means that increases
> of statistics (decrease can happen on tcp).

Instead of printing numbers, why not print the meanings?

I'll reply to the other patches to explain that.

-- Steve
David Ahern Nov. 18, 2021, 3:36 p.m. UTC | #2
On 11/18/21 5:48 AM, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> snmp is the network package statistics module in kernel, and it is
> useful in network issue diagnosis, such as packet drop.
> 
> However, it is hard to get the detail information about the packet.
> For example, we can know that there is something wrong with the
> checksum of udp packet though 'InCsumErrors' of UDP protocol in
> /proc/net/snmp, but we can't figure out the ip and port of the packet
> that this error is happening on.
> 
> Add tracepoint for snmp. Therefor, users can use some tools (such as
> eBPF) to get the information of the exceptional packet.
> 
> In the first patch, the frame of snmp-tracepoint is created. And in
> the second patch, tracepoint for udp-snmp is introduced.
> 

there is already good infrastructure around kfree_skb - e.g., drop watch
monitor. Why not extend that in a way that other drop points can benefit
over time?

e.g., something like this (uncompiled and not tested; and to which
Steven is going to suggest strings for the reason):

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0bd6520329f6..e66e634acad0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1075,8 +1075,13 @@ static inline bool skb_unref(struct sk_buff *skb)
        return true;
 }

+enum skb_drop_reason {
+       SKB_DROP_REASON_NOT_SPECIFIED,
+       SKB_DROP_REASON_CSUM,
+}
 void skb_release_head_state(struct sk_buff *skb);
 void kfree_skb(struct sk_buff *skb);
+void kfree_skb_with_reason(struct sk_buff *skb, enum skb_drop_reason);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
 void skb_tx_error(struct sk_buff *skb);
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 9e92f22eb086..2a2d263f9d46 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -14,7 +14,7 @@
  */
 TRACE_EVENT(kfree_skb,

-       TP_PROTO(struct sk_buff *skb, void *location),
+       TP_PROTO(struct sk_buff *skb, void *location, enum
skb_drop_reason reason),

        TP_ARGS(skb, location),

@@ -22,16 +22,18 @@ TRACE_EVENT(kfree_skb,
                __field(        void *,         skbaddr         )
                __field(        void *,         location        )
                __field(        unsigned short, protocol        )
+               __field(        unsigned int,   reason          )
        ),

        TP_fast_assign(
                __entry->skbaddr = skb;
                __entry->location = location;
                __entry->protocol = ntohs(skb->protocol);
+               __entry->reason = reason;
        ),

-       TP_printk("skbaddr=%p protocol=%u location=%p",
-               __entry->skbaddr, __entry->protocol, __entry->location)
+       TP_printk("skbaddr=%p protocol=%u location=%p reason %u",
+               __entry->skbaddr, __entry->protocol, __entry->location,
__entry->reason)
 );

 TRACE_EVENT(consume_skb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 67a9188d8a49..388059bda3d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -770,11 +770,29 @@ void kfree_skb(struct sk_buff *skb)
        if (!skb_unref(skb))
                return;

-       trace_kfree_skb(skb, __builtin_return_address(0));
+       trace_kfree_skb(skb, __builtin_return_address(0),
SKB_DROP_REASON_NOT_SPECIFIED);
        __kfree_skb(skb);
 }
 EXPORT_SYMBOL(kfree_skb);

+/**
+ *     kfree_skb_with_reason - free an sk_buff
+ *     @skb: buffer to free
+ *     @reason: enum describing why the skb is dropped
+ *
+ *     Drop a reference to the buffer and free it if the usage count has
+ *     hit zero.
+ */
+void kfree_skb_with_reason(struct sk_buff *skb, enum skb_drop_reason
reason);
+{
+       if (!skb_unref(skb))
+               return;
+
+       trace_kfree_skb(skb, __builtin_return_address(0), reason);
+       __kfree_skb(skb);
+}
+EXPORT_SYMBOL(kfree_skb_with_reason);
+
 void kfree_skb_list(struct sk_buff *segs)
 {
        while (segs) {
Menglong Dong Nov. 19, 2021, 3:45 a.m. UTC | #3
Hello~

On Thu, Nov 18, 2021 at 11:36 PM David Ahern <dsahern@gmail.com> wrote:
>
[...]
>
> there is already good infrastructure around kfree_skb - e.g., drop watch
> monitor. Why not extend that in a way that other drop points can benefit
> over time?
>

Thanks for your advice.

In fact, I don't think that this is a perfect idea. This way may have benefit
of reuse the existing kfree_skb event, but this will do plentiful modification
to the current code. For example, in tcp_v4_rcv(), you need to introduce the
new variate 'int free_reason' and record the drop reason in it, and pass
it to 'kfree_skb_with_reason()' in 'discard_it:'. Many places need this kind
modification. What's more, some statistics don't use 'kfree_skb()'.

However, with the tracepoint for snmp, we just need to pass 'skb' to
'UDP_INC_STATS()/TCP_INC_STATS()', the reason is already included.
This way, the modification is more simple and easier to maintain.

Thanks!
Menglong Dong

> e.g., something like this (uncompiled and not tested; and to which
> Steven is going to suggest strings for the reason):
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0bd6520329f6..e66e634acad0 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1075,8 +1075,13 @@ static inline bool skb_unref(struct sk_buff *skb)
>         return true;
>  }
>
> +enum skb_drop_reason {
> +       SKB_DROP_REASON_NOT_SPECIFIED,
> +       SKB_DROP_REASON_CSUM,
> +}
>  void skb_release_head_state(struct sk_buff *skb);
>  void kfree_skb(struct sk_buff *skb);
> +void kfree_skb_with_reason(struct sk_buff *skb, enum skb_drop_reason);
>  void kfree_skb_list(struct sk_buff *segs);
>  void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
>  void skb_tx_error(struct sk_buff *skb);
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 9e92f22eb086..2a2d263f9d46 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -14,7 +14,7 @@
>   */
>  TRACE_EVENT(kfree_skb,
>
> -       TP_PROTO(struct sk_buff *skb, void *location),
> +       TP_PROTO(struct sk_buff *skb, void *location, enum
> skb_drop_reason reason),
>
>         TP_ARGS(skb, location),
>
> @@ -22,16 +22,18 @@ TRACE_EVENT(kfree_skb,
>                 __field(        void *,         skbaddr         )
>                 __field(        void *,         location        )
>                 __field(        unsigned short, protocol        )
> +               __field(        unsigned int,   reason          )
>         ),
>
>         TP_fast_assign(
>                 __entry->skbaddr = skb;
>                 __entry->location = location;
>                 __entry->protocol = ntohs(skb->protocol);
> +               __entry->reason = reason;
>         ),
>
> -       TP_printk("skbaddr=%p protocol=%u location=%p",
> -               __entry->skbaddr, __entry->protocol, __entry->location)
> +       TP_printk("skbaddr=%p protocol=%u location=%p reason %u",
> +               __entry->skbaddr, __entry->protocol, __entry->location,
> __entry->reason)
>  );
>
>  TRACE_EVENT(consume_skb,
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 67a9188d8a49..388059bda3d1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -770,11 +770,29 @@ void kfree_skb(struct sk_buff *skb)
>         if (!skb_unref(skb))
>                 return;
>
> -       trace_kfree_skb(skb, __builtin_return_address(0));
> +       trace_kfree_skb(skb, __builtin_return_address(0),
> SKB_DROP_REASON_NOT_SPECIFIED);
>         __kfree_skb(skb);
>  }
>  EXPORT_SYMBOL(kfree_skb);
>
> +/**
> + *     kfree_skb_with_reason - free an sk_buff
> + *     @skb: buffer to free
> + *     @reason: enum describing why the skb is dropped
> + *
> + *     Drop a reference to the buffer and free it if the usage count has
> + *     hit zero.
> + */
> +void kfree_skb_with_reason(struct sk_buff *skb, enum skb_drop_reason
> reason);
> +{
> +       if (!skb_unref(skb))
> +               return;
> +
> +       trace_kfree_skb(skb, __builtin_return_address(0), reason);
> +       __kfree_skb(skb);
> +}
> +EXPORT_SYMBOL(kfree_skb_with_reason);
> +
>  void kfree_skb_list(struct sk_buff *segs)
>  {
>         while (segs) {
David Ahern Nov. 19, 2021, 3:54 a.m. UTC | #4
On 11/18/21 8:45 PM, Menglong Dong wrote:
> Hello~
> 
> On Thu, Nov 18, 2021 at 11:36 PM David Ahern <dsahern@gmail.com> wrote:
>>
> [...]
>>
>> there is already good infrastructure around kfree_skb - e.g., drop watch
>> monitor. Why not extend that in a way that other drop points can benefit
>> over time?
>>
> 
> Thanks for your advice.
> 
> In fact, I don't think that this is a perfect idea. This way may have benefit
> of reuse the existing kfree_skb event, but this will do plentiful modification
> to the current code. For example, in tcp_v4_rcv(), you need to introduce the
> new variate 'int free_reason' and record the drop reason in it, and pass
> it to 'kfree_skb_with_reason()' in 'discard_it:'. Many places need this kind
> modification. What's more, some statistics don't use 'kfree_skb()'.
> 
> However, with the tracepoint for snmp, we just need to pass 'skb' to
> 'UDP_INC_STATS()/TCP_INC_STATS()', the reason is already included.
> This way, the modification is more simple and easier to maintain.
> 

But it integrates into existing tooling which is a big win.

Ido gave the references for his work:
https://github.com/nhorman/dropwatch/pull/11
https://github.com/nhorman/dropwatch/commit/199440959a288dd97e3b7ae701d4e78968cddab7

And the Wireshark dissector is also upstream:
https://github.com/wireshark/wireshark/commit/a94a860c0644ec3b8a129fd243674a2e376ce1c8

i.e., the skb is already pushed to userspace for packet analysis. You
would just be augmenting more metadata along with it and not reinventing
all of this for just snmp counter based drops.
Menglong Dong Nov. 21, 2021, 10:47 a.m. UTC | #5
On Fri, Nov 19, 2021 at 11:54 AM David Ahern <dsahern@gmail.com> wrote:
>
[...]
>
> But it integrates into existing tooling which is a big win.
>
> Ido gave the references for his work:
> https://github.com/nhorman/dropwatch/pull/11
> https://github.com/nhorman/dropwatch/commit/199440959a288dd97e3b7ae701d4e78968cddab7
>

I have been thinking about this all day, and I think your words make sense.
Indeed, this can make use of the frame of the 'drop monitor' module of kernel
and the userspace tools of wireshark, dropwatch, etc. And this idea is more
suitable for the aim of 'get the reason for packet drop'. However, the
shortcoming
of this idea is that it can't reuse the drop reason for the 'snmp'
frame.

With creating a tracepoint for 'snmp', it can make use of the 'snmp' frame and
the modifications can be easier. However, it's not friendly to the
users, such as
dropwatch, wireshark, etc. And it seems it is a little redundant with what
the tracepoint for 'kfree_sbk()' do. However, I think it's not
difficult to develop
a userspace tool. In fact, I have already write a tool based on BCC, which is
able to make use of 'snmp' tracepoint, such as:

$ sudo ./nettrace.py --tracer snmp -p udp --addr 192.168.122.8
begin tracing......
785487.366412: [snmp][udplite_noports]: UDP: 192.168.122.8:35310 ->
192.168.122.1:7979

And it can monitor packet drop of udp with ip 192.168.122.8 (filter by port,
statistics type are supported too).

And maybe we can integrate tracepoint of  'snmp' into 'drop monitor' with
NET_DM_ATTR_SNMP, just link NET_DM_ATTR_SW_DROPS and
NET_DM_ATTR_HW_DROPS?

@Steven What do you think? I think I'm ok with both ideas, as my main target
is to get the reason for the packet drop. As for the idea of
'kfree_skb_with_reason', I'm just a little worry about if we can accept the
modification it brings in.

Thanks!
Menglong Dong

> And the Wireshark dissector is also upstream:
> https://github.com/wireshark/wireshark/commit/a94a860c0644ec3b8a129fd243674a2e376ce1c8
>
> i.e., the skb is already pushed to userspace for packet analysis. You
> would just be augmenting more metadata along with it and not reinventing
> all of this for just snmp counter based drops.
Steven Rostedt Nov. 21, 2021, 2:02 p.m. UTC | #6
On Sun, 21 Nov 2021 18:47:21 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:

> @Steven What do you think? I think I'm ok with both ideas, as my main target
> is to get the reason for the packet drop. As for the idea of
> 'kfree_skb_with_reason', I'm just a little worry about if we can accept the
> modification it brings in.

The use cases of trace events is really up to the subsystem
maintainers. I only make sure that the trace events are done properly.

So I'm not sure exactly what you are asking me.

-- Steve
David Ahern Nov. 22, 2021, 4:42 p.m. UTC | #7
On 11/21/21 3:47 AM, Menglong Dong wrote:
> On Fri, Nov 19, 2021 at 11:54 AM David Ahern <dsahern@gmail.com> wrote:
>>
> [...]
>>
>> But it integrates into existing tooling which is a big win.
>>
>> Ido gave the references for his work:
>> https://github.com/nhorman/dropwatch/pull/11
>> https://github.com/nhorman/dropwatch/commit/199440959a288dd97e3b7ae701d4e78968cddab7
>>
> 
> I have been thinking about this all day, and I think your words make sense.
> Indeed, this can make use of the frame of the 'drop monitor' module of kernel
> and the userspace tools of wireshark, dropwatch, etc. And this idea is more
> suitable for the aim of 'get the reason for packet drop'. However, the
> shortcoming
> of this idea is that it can't reuse the drop reason for the 'snmp'
> frame.
> 
> With creating a tracepoint for 'snmp', it can make use of the 'snmp' frame and
> the modifications can be easier. However, it's not friendly to the
> users, such as
> dropwatch, wireshark, etc. And it seems it is a little redundant with what
> the tracepoint for 'kfree_sbk()' do. However, I think it's not
> difficult to develop
> a userspace tool. In fact, I have already write a tool based on BCC, which is
> able to make use of 'snmp' tracepoint, such as:
> 
> $ sudo ./nettrace.py --tracer snmp -p udp --addr 192.168.122.8
> begin tracing......
> 785487.366412: [snmp][udplite_noports]: UDP: 192.168.122.8:35310 ->
> 192.168.122.1:7979
> 
> And it can monitor packet drop of udp with ip 192.168.122.8 (filter by port,
> statistics type are supported too).
> 
> And maybe we can integrate tracepoint of  'snmp' into 'drop monitor' with
> NET_DM_ATTR_SNMP, just link NET_DM_ATTR_SW_DROPS and
> NET_DM_ATTR_HW_DROPS?
> 

you don't need to add 'snmp' to drop monitor; you only need to add
NET_DM_ATTR_ to the existing one.

This is the end of __udp4_lib_rcv:

        __UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
drop:
        __UDP_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
        kfree_skb(skb);

you want to add a tracepoint at both UDP_INC_STATS making 3 consecutive
lines that give access to the dropped skb with only slight variations in
metadata.

The last one, kfree_skb, gives you the address of the drop + the skb for
analysis. Just add the metadata to the existing drop monitor.
Menglong Dong Nov. 24, 2021, 2:56 a.m. UTC | #8
On Tue, Nov 23, 2021 at 12:42 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 11/21/21 3:47 AM, Menglong Dong wrote:
> > On Fri, Nov 19, 2021 at 11:54 AM David Ahern <dsahern@gmail.com> wrote:
> >>
> > [...]
> >>
> >> But it integrates into existing tooling which is a big win.
> >>
> >> Ido gave the references for his work:
> >> https://github.com/nhorman/dropwatch/pull/11
> >> https://github.com/nhorman/dropwatch/commit/199440959a288dd97e3b7ae701d4e78968cddab7
> >>
> >
> > I have been thinking about this all day, and I think your words make sense.
> > Indeed, this can make use of the frame of the 'drop monitor' module of kernel
> > and the userspace tools of wireshark, dropwatch, etc. And this idea is more
> > suitable for the aim of 'get the reason for packet drop'. However, the
> > shortcoming
> > of this idea is that it can't reuse the drop reason for the 'snmp'
> > frame.
> >
> > With creating a tracepoint for 'snmp', it can make use of the 'snmp' frame and
> > the modifications can be easier. However, it's not friendly to the
> > users, such as
> > dropwatch, wireshark, etc. And it seems it is a little redundant with what
> > the tracepoint for 'kfree_sbk()' do. However, I think it's not
> > difficult to develop
> > a userspace tool. In fact, I have already write a tool based on BCC, which is
> > able to make use of 'snmp' tracepoint, such as:
> >
> > $ sudo ./nettrace.py --tracer snmp -p udp --addr 192.168.122.8
> > begin tracing......
> > 785487.366412: [snmp][udplite_noports]: UDP: 192.168.122.8:35310 ->
> > 192.168.122.1:7979
> >
> > And it can monitor packet drop of udp with ip 192.168.122.8 (filter by port,
> > statistics type are supported too).
> >
> > And maybe we can integrate tracepoint of  'snmp' into 'drop monitor' with
> > NET_DM_ATTR_SNMP, just link NET_DM_ATTR_SW_DROPS and
> > NET_DM_ATTR_HW_DROPS?
> >
>
> you don't need to add 'snmp' to drop monitor; you only need to add
> NET_DM_ATTR_ to the existing one.
>
> This is the end of __udp4_lib_rcv:
>
>         __UDP_INC_STATS(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
> drop:
>         __UDP_INC_STATS(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
>         kfree_skb(skb);
>
> you want to add a tracepoint at both UDP_INC_STATS making 3 consecutive
> lines that give access to the dropped skb with only slight variations in
> metadata.
>
> The last one, kfree_skb, gives you the address of the drop + the skb for
> analysis. Just add the metadata to the existing drop monitor.

Ok, get it! Thanks~

Menglong Dong

>