Message ID | 20250416-udp_sendmsg-v1-1-1a886b8733c2@debian.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] udp: Add tracepoint for udp_sendmsg() | expand |
Breno Leitao wrote: > Add a lightweight tracepoint to monitor UDP send message operations, > similar to the recently introduced tcp_sendmsg_locked() trace event in > commit 0f08335ade712 ("trace: tcp: Add tracepoint for > tcp_sendmsg_locked()") > > This implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid > creating extensive trace event infrastructure and exporting to tracefs, > keeping it minimal and efficient. > > Since this patch creates a rawtracepoint, it can be accessed using > standard tracing tools like bpftrace: > > rawtracepoint:udp_sendmsg_tp { > ... > } What does this enable beyond kfunc:udp_sendmsg? The arguments are the same, unlike the TCP tracepoint.
On 4/16/25 1:23 PM, Breno Leitao wrote: > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index f9f5b92cf4b61..8c2902504a399 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -1345,6 +1345,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) > connected = 1; > } > > + trace_udp_sendmsg_tp(sk, msg); why `_tp` suffix? the usual naming is trace_${func}
On 4/16/25 9:23 PM, Breno Leitao wrote: > Add a lightweight tracepoint to monitor UDP send message operations, > similar to the recently introduced tcp_sendmsg_locked() trace event in > commit 0f08335ade712 ("trace: tcp: Add tracepoint for > tcp_sendmsg_locked()") Why is it needed? what would add on top of a plain perf probe, which will be always available for such function with such argument, as the function can't be inlined? Thanks, Paolo
Hello Willem, On Wed, Apr 16, 2025 at 03:34:38PM -0400, Willem de Bruijn wrote: > Breno Leitao wrote: > > Add a lightweight tracepoint to monitor UDP send message operations, > > similar to the recently introduced tcp_sendmsg_locked() trace event in > > commit 0f08335ade712 ("trace: tcp: Add tracepoint for > > tcp_sendmsg_locked()") > > > > This implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid > > creating extensive trace event infrastructure and exporting to tracefs, > > keeping it minimal and efficient. > > > > Since this patch creates a rawtracepoint, it can be accessed using > > standard tracing tools like bpftrace: > > > > rawtracepoint:udp_sendmsg_tp { > > ... > > } > > What does this enable beyond kfunc:udp_sendmsg? A few things come to mind when evaluating the use of tracepoints. One significant advantage is that tracepoints provide a stable API where programs can hook into, making it easier for users to interact with key functions. However, kfunc/kprobes has some notable disadvantages. For instance, partial or total inlining can cause hooks to fail, which is not ideal and can lead to issues (mainly when we have partial inlines, and the hook works _sometimes_). In contrast, tracepoints create a more stable API for users to hook into, eliminating the need to patch the kernel with noinline function attributes. This solution may be ugly, but it is a common practice. (and this is my main goal with it, remove `noinline` from our internal kernel) While tracepoints are not officially considered stable APIs, they tend to be "more stable" in practice due to their deliberate and strategic placement. As a result, they are less likely to get renamed or changed frequently. Additionally, tracepoints offer performance benefits, being faster than both kfunc and kprobes. For further discussion on this topic, please refer to same discussion in VFS: https://lore.kernel.org/bpf/20250118033723.GV1977892@ZenIV/T/#m4c2fb2d904e839b34800daf8578dff0b9abd69a0 Thanks --breno
Hello Paolo, On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: > On 4/16/25 9:23 PM, Breno Leitao wrote: > > Add a lightweight tracepoint to monitor UDP send message operations, > > similar to the recently introduced tcp_sendmsg_locked() trace event in > > commit 0f08335ade712 ("trace: tcp: Add tracepoint for > > tcp_sendmsg_locked()") > > Why is it needed? what would add on top of a plain perf probe, which > will be always available for such function with such argument, as the > function can't be inlined? Why this function can't be inlined? I got the impression that this funciton could be, at least, partially inlined. Mainly when generating ultra optimized kernels (i.e, kernels compiled with PGO and LTO features enabled). Thanks, --breno
Hello David, On Wed, Apr 16, 2025 at 04:16:26PM -0700, David Ahern wrote: > On 4/16/25 1:23 PM, Breno Leitao wrote: > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > index f9f5b92cf4b61..8c2902504a399 100644 > > --- a/net/ipv4/udp.c > > +++ b/net/ipv4/udp.c > > @@ -1345,6 +1345,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) > > connected = 1; > > } > > > > + trace_udp_sendmsg_tp(sk, msg); > > why `_tp` suffix? the usual naming is trace_${func} I got the impression that DECLARE_TRACE() raw tracepoints names end up in _tp(): include/trace/events/tcp.h DECLARE_TRACE(tcp_cwnd_reduction_tp, include/trace/events/sched.h DECLARE_TRACE(pelt_cfs_tp, DECLARE_TRACE(pelt_rt_tp, DECLARE_TRACE(pelt_dl_tp, DECLARE_TRACE(pelt_hw_tp, DECLARE_TRACE(pelt_irq_tp, DECLARE_TRACE(pelt_se_tp, DECLARE_TRACE(sched_cpu_capacity_tp, DECLARE_TRACE(sched_overutilized_tp, DECLARE_TRACE(sched_util_est_cfs_tp, DECLARE_TRACE(sched_util_est_se_tp, DECLARE_TRACE(sched_update_nr_running_tp, DECLARE_TRACE(sched_compute_energy_tp, DECLARE_TRACE(sched_entry_tp, DECLARE_TRACE(sched_exit_tp, But, I am happy to remove _tp if that is not correct. Thanks, --breno
On 4/17/25 1:34 PM, Breno Leitao wrote: > On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: >> On 4/16/25 9:23 PM, Breno Leitao wrote: >>> Add a lightweight tracepoint to monitor UDP send message operations, >>> similar to the recently introduced tcp_sendmsg_locked() trace event in >>> commit 0f08335ade712 ("trace: tcp: Add tracepoint for >>> tcp_sendmsg_locked()") >> >> Why is it needed? what would add on top of a plain perf probe, which >> will be always available for such function with such argument, as the >> function can't be inlined? > > Why this function can't be inlined? Because the kernel need to be able find a pointer to it: .sendmsg = udp_sendmsg, I'll be really curious to learn how the compiler could inline that. /P
Breno Leitao wrote: > Hello Willem, > > On Wed, Apr 16, 2025 at 03:34:38PM -0400, Willem de Bruijn wrote: > > Breno Leitao wrote: > > > Add a lightweight tracepoint to monitor UDP send message operations, > > > similar to the recently introduced tcp_sendmsg_locked() trace event in > > > commit 0f08335ade712 ("trace: tcp: Add tracepoint for > > > tcp_sendmsg_locked()") > > > > > > This implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid > > > creating extensive trace event infrastructure and exporting to tracefs, > > > keeping it minimal and efficient. > > > > > > Since this patch creates a rawtracepoint, it can be accessed using > > > standard tracing tools like bpftrace: > > > > > > rawtracepoint:udp_sendmsg_tp { > > > ... > > > } > > > > What does this enable beyond kfunc:udp_sendmsg? > > A few things come to mind when evaluating the use of tracepoints. > > One significant advantage is that tracepoints provide a stable API where > programs can hook into, making it easier for users to interact with key > functions. > > However, kfunc/kprobes has some notable disadvantages. For instance, > partial or total inlining can cause hooks to fail, which is not ideal > and can lead to issues (mainly when we have partial inlines, and the > hook works _sometimes_). As Paolo explained, that is unlikely to happen in this case, as this is a protocol specific callback function. > In contrast, tracepoints create a more stable API for users to hook > into, eliminating the need to patch the kernel with noinline function > attributes. This solution may be ugly, but it is a common practice. > (and this is my main goal with it, remove `noinline` from our internal > kernel) > > While tracepoints are not officially considered stable APIs, they tend > to be "more stable" in practice due to their deliberate and strategic > placement. As a result, they are less likely to get renamed or changed > frequently. > > Additionally, tracepoints offer performance benefits, being faster than > both kfunc and kprobes. The performance argument is fair. Perhaps we want to think this through more broadly for networking tracepoints vs more flexible kprobes/kfuncs, rather than on a case by case basis: Where do we think the performance or functionality (if exposing additional info, as for tcp_sendmsg) warrants the tracepoint? I suspect that the use is predominantly for on-demand debugging, where the performance cost (and latency impact) of measurement is minor. > For further discussion on this topic, please refer to same discussion in > VFS: > > https://lore.kernel.org/bpf/20250118033723.GV1977892@ZenIV/T/#m4c2fb2d904e839b34800daf8578dff0b9abd69a0 > > Thanks > --breno
Hi Paolo, > On Apr 17, 2025, at 6:17 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > On 4/17/25 1:34 PM, Breno Leitao wrote: >> On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: >>> On 4/16/25 9:23 PM, Breno Leitao wrote: >>>> Add a lightweight tracepoint to monitor UDP send message operations, >>>> similar to the recently introduced tcp_sendmsg_locked() trace event in >>>> commit 0f08335ade712 ("trace: tcp: Add tracepoint for >>>> tcp_sendmsg_locked()") >>> >>> Why is it needed? what would add on top of a plain perf probe, which >>> will be always available for such function with such argument, as the >>> function can't be inlined? >> >> Why this function can't be inlined? > > Because the kernel need to be able find a pointer to it: > > .sendmsg = udp_sendmsg, > > I'll be really curious to learn how the compiler could inline that. It is true that functions that are only used via function pointers will not be inlined by compilers (at least for those we have tested). For this reason, we do not worry about functions in various tcp_congestion_ops. However, udp_sendmsg is also called directly by udpv6_sendmsg, so it can still get inlined by LTO. Thanks, Song
Song Liu wrote: > Hi Paolo, > > > On Apr 17, 2025, at 6:17 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > > > On 4/17/25 1:34 PM, Breno Leitao wrote: > >> On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: > >>> On 4/16/25 9:23 PM, Breno Leitao wrote: > >>>> Add a lightweight tracepoint to monitor UDP send message operations, > >>>> similar to the recently introduced tcp_sendmsg_locked() trace event in > >>>> commit 0f08335ade712 ("trace: tcp: Add tracepoint for > >>>> tcp_sendmsg_locked()") > >>> > >>> Why is it needed? what would add on top of a plain perf probe, which > >>> will be always available for such function with such argument, as the > >>> function can't be inlined? > >> > >> Why this function can't be inlined? > > > > Because the kernel need to be able find a pointer to it: > > > > .sendmsg = udp_sendmsg, > > > > I'll be really curious to learn how the compiler could inline that. > > It is true that functions that are only used via function pointers > will not be inlined by compilers (at least for those we have tested). > For this reason, we do not worry about functions in various > tcp_congestion_ops. However, udp_sendmsg is also called directly > by udpv6_sendmsg, so it can still get inlined by LTO. > > Thanks, > Song > I would think that hitting this tracepoint for ipv6_addr_v4mapped addresses is unintentional and surprising, as those would already hit udpv6_sendmsg. On which note, any IPv4 change to UDP needs an equivalent IPv6 one.
On 4/17/25 5:42 AM, Breno Leitao wrote: > Hello David, > > On Wed, Apr 16, 2025 at 04:16:26PM -0700, David Ahern wrote: >> On 4/16/25 1:23 PM, Breno Leitao wrote: >>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c >>> index f9f5b92cf4b61..8c2902504a399 100644 >>> --- a/net/ipv4/udp.c >>> +++ b/net/ipv4/udp.c >>> @@ -1345,6 +1345,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) >>> connected = 1; >>> } >>> >>> + trace_udp_sendmsg_tp(sk, msg); >> >> why `_tp` suffix? the usual naming is trace_${func} > > I got the impression that DECLARE_TRACE() raw tracepoints names end up > in _tp(): > > include/trace/events/tcp.h > DECLARE_TRACE(tcp_cwnd_reduction_tp, that is the only networking one: $ git grep trace_ net drivers/net | grep _tp net/bpf/test_run.c: trace_bpf_trigger_tp(nonce); net/ipv4/tcp_input.c: trace_tcp_cwnd_reduction_tp(sk, newly_acked_sacked, newly_lost, flag); The rest follow do not have the _tp suffix: $ git grep trace_ net drivers/net | wc -l 2727 2725 without; 2 with
On Thu, Apr 17, 2025 at 08:55:27AM -0700, David Ahern wrote: > On 4/17/25 5:42 AM, Breno Leitao wrote: > > Hello David, > > > > On Wed, Apr 16, 2025 at 04:16:26PM -0700, David Ahern wrote: > >> On 4/16/25 1:23 PM, Breno Leitao wrote: > >>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > >>> index f9f5b92cf4b61..8c2902504a399 100644 > >>> --- a/net/ipv4/udp.c > >>> +++ b/net/ipv4/udp.c > >>> @@ -1345,6 +1345,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) > >>> connected = 1; > >>> } > >>> > >>> + trace_udp_sendmsg_tp(sk, msg); > >> > >> why `_tp` suffix? the usual naming is trace_${func} > > > > I got the impression that DECLARE_TRACE() raw tracepoints names end up > > in _tp(): > > > > include/trace/events/tcp.h > > DECLARE_TRACE(tcp_cwnd_reduction_tp, > > that is the only networking one: > > $ git grep trace_ net drivers/net | grep _tp > net/bpf/test_run.c: trace_bpf_trigger_tp(nonce); > net/ipv4/tcp_input.c: trace_tcp_cwnd_reduction_tp(sk, > newly_acked_sacked, newly_lost, flag); Do we want to rename them and remove the _tp? I suppose it is OK given that tracepoints are not expected to be stable? Also, if we have consensus about this patch, I will remove the _tp from it. Thanks! --breno
> On Apr 17, 2025, at 8:48 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Song Liu wrote: >> Hi Paolo, >> >>> On Apr 17, 2025, at 6:17 AM, Paolo Abeni <pabeni@redhat.com> wrote: >>> >>> On 4/17/25 1:34 PM, Breno Leitao wrote: >>>> On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: >>>>> On 4/16/25 9:23 PM, Breno Leitao wrote: >>>>>> Add a lightweight tracepoint to monitor UDP send message operations, >>>>>> similar to the recently introduced tcp_sendmsg_locked() trace event in >>>>>> commit 0f08335ade712 ("trace: tcp: Add tracepoint for >>>>>> tcp_sendmsg_locked()") >>>>> >>>>> Why is it needed? what would add on top of a plain perf probe, which >>>>> will be always available for such function with such argument, as the >>>>> function can't be inlined? >>>> >>>> Why this function can't be inlined? >>> >>> Because the kernel need to be able find a pointer to it: >>> >>> .sendmsg = udp_sendmsg, >>> >>> I'll be really curious to learn how the compiler could inline that. >> >> It is true that functions that are only used via function pointers >> will not be inlined by compilers (at least for those we have tested). >> For this reason, we do not worry about functions in various >> tcp_congestion_ops. However, udp_sendmsg is also called directly >> by udpv6_sendmsg, so it can still get inlined by LTO. >> >> Thanks, >> Song >> > > I would think that hitting this tracepoint for ipv6_addr_v4mapped > addresses is unintentional and surprising, as those would already > hit udpv6_sendmsg. It is up to the user to decide how these tracepoints should be used. For example, the user may only be interested in udpv6_sendmsg => udp_sendmsg case. Without a tracepoint, the user has to understand whether the compiler inlined this function. > > On which note, any IPv4 change to UDP needs an equivalent IPv6 one. Do you mean we need to also add tracepoints for udpv6_sendmsg? Thanks, Song
On 4/17/25 10:00 AM, Breno Leitao wrote: >> $ git grep trace_ net drivers/net | grep _tp >> net/bpf/test_run.c: trace_bpf_trigger_tp(nonce); >> net/ipv4/tcp_input.c: trace_tcp_cwnd_reduction_tp(sk, >> newly_acked_sacked, newly_lost, flag); > > Do we want to rename them and remove the _tp? I suppose it is OK given > that tracepoints are not expected to be stable? > > Also, if we have consensus about this patch, I will remove the _tp from > it. > I am only asking for consistency. Based on existing networking instances, consistency is no _tp suffix.
On Thu, 17 Apr 2025 21:57:56 -0700 David Ahern <dsahern@kernel.org> wrote: > On 4/17/25 10:00 AM, Breno Leitao wrote: > >> $ git grep trace_ net drivers/net | grep _tp > >> net/bpf/test_run.c: trace_bpf_trigger_tp(nonce); > >> net/ipv4/tcp_input.c: trace_tcp_cwnd_reduction_tp(sk, > >> newly_acked_sacked, newly_lost, flag); > > > > Do we want to rename them and remove the _tp? I suppose it is OK given > > that tracepoints are not expected to be stable? > > > > Also, if we have consensus about this patch, I will remove the _tp from > > it. > > > > I am only asking for consistency. Based on existing networking > instances, consistency is no _tp suffix. I was looking at what other tracepoints have "_tp" and found a few. What it appears to be is that the "_tp" tracepoints are defined by: DECLARE_TRACEPOINT() and have no corresponding trace event in tracefs (/sys/kernel/tracing/events). I like that distinction because it lets the developer know that this tracepoint is in kernel only, and not exposed to user space. Perhaps it should stay as "_tp()" if it's not exposed via tracefs. In fact, if there is a clean up, it should be adding "_tp" to all tracepoints that do not have a corresponding trace event attached to them. As they are in kernel only, that change should not cause any ABI breakage. -- Steve
On Fri, 18 Apr 2025 08:33:51 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > In fact, if there is a clean up, it should be adding "_tp" to all > tracepoints that do not have a corresponding trace event attached to them. > As they are in kernel only, that change should not cause any ABI breakage. Actually, I think I'll make it where tracepoints created via DECLARE_TRACE() will automatically get the "_tp" ending. That would help enforce this API. -- Steve
Song Liu wrote: > > > > On Apr 17, 2025, at 8:48 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > > > Song Liu wrote: > >> Hi Paolo, > >> > >>> On Apr 17, 2025, at 6:17 AM, Paolo Abeni <pabeni@redhat.com> wrote: > >>> > >>> On 4/17/25 1:34 PM, Breno Leitao wrote: > >>>> On Thu, Apr 17, 2025 at 08:57:24AM +0200, Paolo Abeni wrote: > >>>>> On 4/16/25 9:23 PM, Breno Leitao wrote: > >>>>>> Add a lightweight tracepoint to monitor UDP send message operations, > >>>>>> similar to the recently introduced tcp_sendmsg_locked() trace event in > >>>>>> commit 0f08335ade712 ("trace: tcp: Add tracepoint for > >>>>>> tcp_sendmsg_locked()") > >>>>> > >>>>> Why is it needed? what would add on top of a plain perf probe, which > >>>>> will be always available for such function with such argument, as the > >>>>> function can't be inlined? > >>>> > >>>> Why this function can't be inlined? > >>> > >>> Because the kernel need to be able find a pointer to it: > >>> > >>> .sendmsg = udp_sendmsg, > >>> > >>> I'll be really curious to learn how the compiler could inline that. > >> > >> It is true that functions that are only used via function pointers > >> will not be inlined by compilers (at least for those we have tested). > >> For this reason, we do not worry about functions in various > >> tcp_congestion_ops. However, udp_sendmsg is also called directly > >> by udpv6_sendmsg, so it can still get inlined by LTO. > >> > >> Thanks, > >> Song > >> > > > > I would think that hitting this tracepoint for ipv6_addr_v4mapped > > addresses is unintentional and surprising, as those would already > > hit udpv6_sendmsg. > > It is up to the user to decide how these tracepoints should be > used. For example, the user may only be interested in > udpv6_sendmsg => udp_sendmsg case. Without a tracepoint, the user > has to understand whether the compiler inlined this function. > > > > > On which note, any IPv4 change to UDP needs an equivalent IPv6 one. > > Do you mean we need to also add tracepoints for udpv6_sendmsg? If there is consensus that a tracepoint at this point is valuable, then it should be supported equally for IPv4 and IPv6. That holds true for all such hooks. No IPv4 only.
diff --git a/include/trace/events/udp.h b/include/trace/events/udp.h index 6142be4068e29..38ab24053b6ff 100644 --- a/include/trace/events/udp.h +++ b/include/trace/events/udp.h @@ -46,6 +46,11 @@ TRACE_EVENT(udp_fail_queue_rcv_skb, __entry->saddr, __entry->daddr) ); +DECLARE_TRACE(udp_sendmsg_tp, + TP_PROTO(const struct sock *sk, const struct msghdr *msg), + TP_ARGS(sk, msg) +); + #endif /* _TRACE_UDP_H */ /* This part must be outside protection */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index f9f5b92cf4b61..8c2902504a399 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1345,6 +1345,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) connected = 1; } + trace_udp_sendmsg_tp(sk, msg); + ipcm_init_sk(&ipc, inet); ipc.gso_size = READ_ONCE(up->gso_size);
Add a lightweight tracepoint to monitor UDP send message operations, similar to the recently introduced tcp_sendmsg_locked() trace event in commit 0f08335ade712 ("trace: tcp: Add tracepoint for tcp_sendmsg_locked()") This implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating extensive trace event infrastructure and exporting to tracefs, keeping it minimal and efficient. Since this patch creates a rawtracepoint, it can be accessed using standard tracing tools like bpftrace: rawtracepoint:udp_sendmsg_tp { ... } Signed-off-by: Breno Leitao <leitao@debian.org> --- include/trace/events/udp.h | 5 +++++ net/ipv4/udp.c | 2 ++ 2 files changed, 7 insertions(+) --- base-commit: 1d6f4861027b451e064896f34dd0beada8871bfe change-id: 20250416-udp_sendmsg-084a32657a56 Best regards,