Message ID | 20230110091356.1524-1-cuiyunhui@bytedance.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v5] sock: add tracepoint for send recv length | expand |
On Tue, Jan 10, 2023 at 10:15 AM Yunhui Cui <cuiyunhui@bytedance.com> wrote: > > Add 2 tracepoints to monitor the tcp/udp traffic > of per process and per cgroup. > > Regarding monitoring the tcp/udp traffic of each process, there are two > existing solutions, the first one is https://www.atoptool.nl/netatop.php. > The second is via kprobe/kretprobe. > > Netatop solution is implemented by registering the hook function at the > hook point provided by the netfilter framework. > > These hook functions may be in the soft interrupt context and cannot > directly obtain the pid. Some data structures are added to bind packets > and processes. For example, struct taskinfobucket, struct taskinfo ... > > Every time the process sends and receives packets it needs multiple > hashmaps,resulting in low performance and it has the problem fo inaccurate > tcp/udp traffic statistics(for example: multiple threads share sockets). > > We can obtain the information with kretprobe, but as we know, kprobe gets > the result by trappig in an exception, which loses performance compared > to tracepoint. > > We compared the performance of tracepoints with the above two methods, and > the results are as follows: > > ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html > without trace: > Time per request: 39.660 [ms] (mean) > Time per request: 0.040 [ms] (mean, across all concurrent requests) > > netatop: > Time per request: 50.717 [ms] (mean) > Time per request: 0.051 [ms] (mean, across all concurrent requests) > > kr: > Time per request: 43.168 [ms] (mean) > Time per request: 0.043 [ms] (mean, across all concurrent requests) > > tracepoint: > Time per request: 41.004 [ms] (mean) > Time per request: 0.041 [ms] (mean, across all concurrent requests > > It can be seen that tracepoint has better performance. > > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> > Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com> > --- > include/trace/events/sock.h | 44 +++++++++++++++++++++++++++++++++++++ > net/socket.c | 36 ++++++++++++++++++++++++++---- > 2 files changed, 76 insertions(+), 4 deletions(-) > ... > +static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags) > +{ > + trace_sock_recv_length(sk, !(flags & MSG_PEEK) ? ret : > + (ret < 0 ? ret : 0), flags); Maybe we should only 'fast assign' the two fields (ret and flags), and let this logic happen later at 'print' time ? This would reduce storage by one integer, and make fast path really fast. This also could potentially remove the need for the peculiar construct with these noinline helpers. > +} > + > static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, > int flags) > { > - return INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, > - inet_recvmsg, sock, msg, msg_data_left(msg), > - flags); > + int ret = INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, > + inet_recvmsg, sock, msg, > + msg_data_left(msg), flags); > + > + if (trace_sock_recv_length_enabled()) > + call_trace_sock_recv_length(sock->sk, !(flags & MSG_PEEK) ? > + ret : (ret < 0 ? ret : 0), flags); > + return ret; > } Maybe you meant : if (trace_sock_recv_length_enabled()) call_trace_sock_recv_length(sock->sk, ret, flags); ? Please make sure to test your patches.
On Tue, 10 Jan 2023 12:49:30 +0100 Eric Dumazet <edumazet@google.com> wrote: > > +static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags) > > +{ > > + trace_sock_recv_length(sk, !(flags & MSG_PEEK) ? ret : > > + (ret < 0 ? ret : 0), flags); > > Maybe we should only 'fast assign' the two fields (ret and flags), > and let this logic happen later at 'print' time ? > > This would reduce storage by one integer, and make fast path really fast. > > This also could potentially remove the need for the peculiar construct with > these noinline helpers. I noticed that the trace_sock_send_length() doesn't have this logic, and they are both DEFINE_EVENT() of the same DECLARE_EVENT_CLASS(). But we could change this too, by the following: /* * sock send/recv msg length */ DECLARE_EVENT_CLASS(sock_msg_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags), TP_STRUCT__entry( __field(void *, sk) __field(__u16, family) __field(__u16, protocol) __field(int, ret) __field(int, flags) ), TP_fast_assign( __entry->sk = sk; __entry->family = sk->sk_family; __entry->protocol = sk->sk_protocol; __entry->length = ret > 0 ? ret : 0; __entry->error = ret < 0 ? ret : 0; __entry->flags = flags; ), TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", __entry->sk, show_family_name(__entry->family), show_inet_protocol_name(__entry->protocol), __entry->ret > 0 ? ret : 0, __entry->ret < 0 ? ret : 0, __entry->flags) ); DEFINE_EVENT(sock_msg_length, sock_send_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags) ); DEFINE_EVENT_PRINT(sock_msg_length, sock_recv_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags) TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", __entry->sk, show_family_name(__entry->family), show_inet_protocol_name(__entry->protocol), !(__entry->flags & MSG_PEEK) ? __entry->ret : __entry->ret > 0 ? ret : 0, __entry->ret < 0 ? ret : 0, __entry->flags) ); #endif /* _TRACE_SOCK_H */ As DEFINE_EVENT_PRINT() uses the class template, but overrides the TP_printk() portion (still saving memory). And then both calls can just do: trace_sock_send_length(sk, ret, 0); trace_sock_recv_length(sock->sk, ret, flags); And I bet that will also solve all the gcc being smart waste. -- Steve
On Tue, Jan 10, 2023 at 11:44 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", > __entry->sk, show_family_name(__entry->family), > show_inet_protocol_name(__entry->protocol), > __entry->ret > 0 ? ret : 0, __entry->ret < 0 ? ret : 0, > __entry->flags) > ); > > DEFINE_EVENT(sock_msg_length, sock_send_length, > TP_PROTO(struct sock *sk, int ret, int flags), > > TP_ARGS(sk, ret, flags) > ); > > DEFINE_EVENT_PRINT(sock_msg_length, sock_recv_length, > TP_PROTO(struct sock *sk, int ret, int flags), > > TP_ARGS(sk, ret, flags) > > TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", > __entry->sk, show_family_name(__entry->family), > show_inet_protocol_name(__entry->protocol), > !(__entry->flags & MSG_PEEK) ? __entry->ret : __entry->ret > 0 ? ret : 0, > __entry->ret < 0 ? ret : 0, > __entry->flags) > ); > #endif /* _TRACE_SOCK_H */ > > As DEFINE_EVENT_PRINT() uses the class template, but overrides the > TP_printk() portion (still saving memory). > Hi Steve, Based on your suggestion, can we use the following code instead of using DEFINE_EVENT_PRINT ? DECLARE_EVENT_CLASS(sock_msg_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags), TP_STRUCT__entry( __field(void *, sk) __field(__u16, family) __field(__u16, protocol) __field(int, ret) __field(int, flags) ), TP_fast_assign( __entry->sk = sk; __entry->family = sk->sk_family; __entry->protocol = sk->sk_protocol; __entry->ret = ret; __entry->flags = flags; ), TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", __entry->sk, show_family_name(__entry->family), show_inet_protocol_name(__entry->protocol), !(__entry->flags & MSG_PEEK) ? (__entry->ret > 0 ? __entry->ret : 0) : 0, __entry->ret < 0 ? __entry->ret : 0, __entry->flags) ); DEFINE_EVENT(sock_msg_length, sock_send_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags) ); DEFINE_EVENT(sock_msg_length, sock_recv_length, TP_PROTO(struct sock *sk, int ret, int flags), TP_ARGS(sk, ret, flags) ); > And then both calls can just do: > > trace_sock_send_length(sk, ret, 0); > > trace_sock_recv_length(sock->sk, ret, flags); > > And I bet that will also solve all the gcc being smart waste. > > -- Steve > Btw, we still need noinline helpers, right? Otherwise the following code would be inlined into sock_recvmsg: mov 0x18(%r13),%rsi mov %gs:0x7e832d87(%rip),%eax # 0x2e68c <pcpu_hot+12> mov %eax,%eax bt %rax,0xdca591(%rip) # 0xffffffff825c5ea0 <__cpu_online_mask> Thanks, Yunhui
On Wed, 11 Jan 2023 11:53:24 +0800 运辉崔 <cuiyunhui@bytedance.com> wrote: > Hi Steve, Based on your suggestion, can we use the following code > instead of using DEFINE_EVENT_PRINT ? I only suggested it because you didn't have that logic for the sock_send_length trace event. But if you don't care about that one, then sure, use it for both. -- Steve > > DECLARE_EVENT_CLASS(sock_msg_length, > > TP_PROTO(struct sock *sk, int ret, int flags), > > TP_ARGS(sk, ret, flags), > > TP_STRUCT__entry( > __field(void *, sk) > __field(__u16, family) > __field(__u16, protocol) > __field(int, ret) > __field(int, flags) > ), > > TP_fast_assign( > __entry->sk = sk; > __entry->family = sk->sk_family; > __entry->protocol = sk->sk_protocol; > __entry->ret = ret; > __entry->flags = flags; > ), > > TP_printk("sk address = %p, family = %s protocol = %s, length > = %d, error = %d, flags = 0x%x", > __entry->sk, show_family_name(__entry->family), > show_inet_protocol_name(__entry->protocol), > !(__entry->flags & MSG_PEEK) ? > (__entry->ret > 0 ? __entry->ret : 0) : 0, > __entry->ret < 0 ? __entry->ret : 0, > __entry->flags) > );
On Wed, Jan 11, 2023 at 12:12 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Wed, 11 Jan 2023 11:53:24 +0800 > 运辉崔 <cuiyunhui@bytedance.com> wrote: > > > Hi Steve, Based on your suggestion, can we use the following code > > instead of using DEFINE_EVENT_PRINT ? > > I only suggested it because you didn't have that logic for the > sock_send_length trace event. But if you don't care about that one, then > sure, use it for both. > > -- Steve > Ok, Thank you, I'll post v6 later.
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h index 777ee6cbe933..2c380cb110a3 100644 --- a/include/trace/events/sock.h +++ b/include/trace/events/sock.h @@ -263,6 +263,50 @@ TRACE_EVENT(inet_sk_error_report, __entry->error) ); +/* + * sock send/recv msg length + */ +DECLARE_EVENT_CLASS(sock_msg_length, + + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags), + + TP_STRUCT__entry( + __field(void *, sk) + __field(__u16, family) + __field(__u16, protocol) + __field(int, length) + __field(int, error) + __field(int, flags) + ), + + TP_fast_assign( + __entry->sk = sk; + __entry->family = sk->sk_family; + __entry->protocol = sk->sk_protocol; + __entry->length = ret > 0 ? ret : 0; + __entry->error = ret < 0 ? ret : 0; + __entry->flags = flags; + ), + + TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", + __entry->sk, show_family_name(__entry->family), + show_inet_protocol_name(__entry->protocol), + __entry->length, __entry->error, __entry->flags) +); + +DEFINE_EVENT(sock_msg_length, sock_send_length, + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags) +); + +DEFINE_EVENT(sock_msg_length, sock_recv_length, + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags) +); #endif /* _TRACE_SOCK_H */ /* This part must be outside protection */ diff --git a/net/socket.c b/net/socket.c index 888cd618a968..6180d0ad47f9 100644 --- a/net/socket.c +++ b/net/socket.c @@ -106,6 +106,7 @@ #include <net/busy_poll.h> #include <linux/errqueue.h> #include <linux/ptp_clock_kernel.h> +#include <trace/events/sock.h> #ifdef CONFIG_NET_RX_BUSY_POLL unsigned int sysctl_net_busy_read __read_mostly; @@ -709,12 +710,22 @@ INDIRECT_CALLABLE_DECLARE(int inet_sendmsg(struct socket *, struct msghdr *, size_t)); INDIRECT_CALLABLE_DECLARE(int inet6_sendmsg(struct socket *, struct msghdr *, size_t)); + +static noinline void call_trace_sock_send_length(struct sock *sk, int ret, + int flags) +{ + trace_sock_send_length(sk, ret, 0); +} + static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg) { int ret = INDIRECT_CALL_INET(sock->ops->sendmsg, inet6_sendmsg, inet_sendmsg, sock, msg, msg_data_left(msg)); BUG_ON(ret == -EIOCBQUEUED); + + if (trace_sock_send_length_enabled()) + call_trace_sock_send_length(sock->sk, ret, 0); return ret; } @@ -989,12 +1000,24 @@ INDIRECT_CALLABLE_DECLARE(int inet_recvmsg(struct socket *, struct msghdr *, size_t, int)); INDIRECT_CALLABLE_DECLARE(int inet6_recvmsg(struct socket *, struct msghdr *, size_t, int)); + +static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags) +{ + trace_sock_recv_length(sk, !(flags & MSG_PEEK) ? ret : + (ret < 0 ? ret : 0), flags); +} + static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, int flags) { - return INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, - inet_recvmsg, sock, msg, msg_data_left(msg), - flags); + int ret = INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, + inet_recvmsg, sock, msg, + msg_data_left(msg), flags); + + if (trace_sock_recv_length_enabled()) + call_trace_sock_recv_length(sock->sk, !(flags & MSG_PEEK) ? + ret : (ret < 0 ? ret : 0), flags); + return ret; } /** @@ -1044,6 +1067,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, { struct socket *sock; int flags; + int ret; sock = file->private_data; @@ -1051,7 +1075,11 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */ flags |= more; - return kernel_sendpage(sock, page, offset, size, flags); + ret = kernel_sendpage(sock, page, offset, size, flags); + + if (trace_sock_send_length_enabled()) + call_trace_sock_send_length(sock->sk, ret, 0); + return ret; } static ssize_t sock_splice_read(struct file *file, loff_t *ppos,