diff mbox series

[net-next,4/8] net: psample: add tracepoint

Message ID 20240424135109.3524355-5-amorenoz@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: openvswitch: Add sample multicasting. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 47 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 932 this patch: 932
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 944 this patch: 944
netdev/checkpatch warning CHECK: Alignment should match open parenthesis CHECK: Lines should not end with a '(' CHECK: spaces preferred around that '/' (ctx:VxV) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-25--09-00 (tests: 995)

Commit Message

Adrián Moreno April 24, 2024, 1:50 p.m. UTC
Currently there are no widely-available tools to dump the metadata and
group information when a packet is sampled, making it difficult to
troubleshoot related issues.

This makes psample use the event tracing framework to log the sampling
of a packet so that it's easier to quickly identify the source
(i.e: group) and context (i.e: metadata) of a packet being sampled.

This patch creates some checkpatch splats, but the style of the
tracepoint definition mimics that of other modules so it seems
acceptable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 net/psample/psample.c |  9 +++++++
 net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 net/psample/trace.h

Comments

Ido Schimmel April 25, 2024, 7:18 a.m. UTC | #1
On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> Currently there are no widely-available tools to dump the metadata and
> group information when a packet is sampled, making it difficult to
> troubleshoot related issues.
> 
> This makes psample use the event tracing framework to log the sampling
> of a packet so that it's easier to quickly identify the source
> (i.e: group) and context (i.e: metadata) of a packet being sampled.
> 
> This patch creates some checkpatch splats, but the style of the
> tracepoint definition mimics that of other modules so it seems
> acceptable.

I don't see a good reason to add this tracepoint (which we won't be able
to remove) when you can easily do that with bpftrace which by now should
be widely available:

#!/usr/bin/bpftrace

kfunc:psample_sample_packet
{
        $ts_us = nsecs() / 1000;
        $secs = $ts_us / 1000000;
        $us = $ts_us % 1000000;
        $group = args.group;
        $skb = args.skb;
        $md = args.md;

        printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
               comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
               $skb, $skb->len, $skb->data_len, args.sample_rate,
               $md->in_ifindex, $md->out_ifindex,
               buf($md->user_cookie, $md->user_cookie_len));
}

Example output:

mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
\xde\xad\xbe\xef
mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
\xde\xad\xbe\xef

Note that it prints the cookie itself unlike the tracepoint which only
prints the hashed pointer.

> 
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
>  net/psample/psample.c |  9 +++++++
>  net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
>  create mode 100644 net/psample/trace.h
> 
> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index 476aaad7a885..92db8ffa2ba2 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -18,6 +18,12 @@
>  #include <net/ip_tunnels.h>
>  #include <net/dst_metadata.h>
>  
> +#ifndef __CHECKER__
> +#define CREATE_TRACE_POINTS
> +#endif
> +
> +#include "trace.h"
> +
>  #define PSAMPLE_MAX_PACKET_SIZE 0xffff
>  
>  static LIST_HEAD(psample_groups_list);
> @@ -470,6 +476,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>  	void *data;
>  	int ret;
>  
> +	if (trace_psample_sample_packet_enabled())
> +		trace_psample_sample_packet(group, skb, sample_rate, md);

My understanding is that trace_x_enabled() should only be used if you
need to do some work to prepare parameters for the tracepoint.

> +
>  	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
Adrián Moreno April 25, 2024, 8:06 a.m. UTC | #2
On 4/25/24 09:18, Ido Schimmel wrote:
> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>> Currently there are no widely-available tools to dump the metadata and
>> group information when a packet is sampled, making it difficult to
>> troubleshoot related issues.
>>
>> This makes psample use the event tracing framework to log the sampling
>> of a packet so that it's easier to quickly identify the source
>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>
>> This patch creates some checkpatch splats, but the style of the
>> tracepoint definition mimics that of other modules so it seems
>> acceptable.
> 
> I don't see a good reason to add this tracepoint (which we won't be able
> to remove) when you can easily do that with bpftrace which by now should
> be widely available:
> 
> #!/usr/bin/bpftrace
> 
> kfunc:psample_sample_packet
> {
>          $ts_us = nsecs() / 1000;
>          $secs = $ts_us / 1000000;
>          $us = $ts_us % 1000000;
>          $group = args.group;
>          $skb = args.skb;
>          $md = args.md;
> 
>          printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>                 comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>                 $skb, $skb->len, $skb->data_len, args.sample_rate,
>                 $md->in_ifindex, $md->out_ifindex,
>                 buf($md->user_cookie, $md->user_cookie_len));
> }
> 
> Example output:
> 
> mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> \xde\xad\xbe\xef
> mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> \xde\xad\xbe\xef
> 
> Note that it prints the cookie itself unlike the tracepoint which only
> prints the hashed pointer.
> 

I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess 
that also true for many other tracepoints out there, right?

For development and labs bpftrace is perfectly fine, but using kfuncs and 
requiring recompilation is harder in production systems compared with using 
smaller CO-RE tools.

If OVS starts using psample heavily and users need to troubleshoot or merely 
observe packets as they are sampled in a more efficient way, they are likely to 
use ebpf for that. I think making it a bit easier (as in, providing a sligthly 
more stable tracepoint) is worth considering.

Can you please expand on your concerns about the tracepoint? It's on the main 
internal function of the module so, even though the function name or its 
arguments might change, it doesn't seem probable that it'll disappear 
altogether. Why else would we want to remove the tracepoint?

>>
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>> ---
>>   net/psample/psample.c |  9 +++++++
>>   net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 71 insertions(+)
>>   create mode 100644 net/psample/trace.h
>>
>> diff --git a/net/psample/psample.c b/net/psample/psample.c
>> index 476aaad7a885..92db8ffa2ba2 100644
>> --- a/net/psample/psample.c
>> +++ b/net/psample/psample.c
>> @@ -18,6 +18,12 @@
>>   #include <net/ip_tunnels.h>
>>   #include <net/dst_metadata.h>
>>   
>> +#ifndef __CHECKER__
>> +#define CREATE_TRACE_POINTS
>> +#endif
>> +
>> +#include "trace.h"
>> +
>>   #define PSAMPLE_MAX_PACKET_SIZE 0xffff
>>   
>>   static LIST_HEAD(psample_groups_list);
>> @@ -470,6 +476,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>>   	void *data;
>>   	int ret;
>>   
>> +	if (trace_psample_sample_packet_enabled())
>> +		trace_psample_sample_packet(group, skb, sample_rate, md);
> 
> My understanding is that trace_x_enabled() should only be used if you
> need to do some work to prepare parameters for the tracepoint.
> 

Oh, thanks for that! I was not aware.

>> +
>>   	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>>   		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>>   		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
>
Ido Schimmel April 25, 2024, 3:25 p.m. UTC | #3
On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
> 
> 
> On 4/25/24 09:18, Ido Schimmel wrote:
> > On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> > > Currently there are no widely-available tools to dump the metadata and
> > > group information when a packet is sampled, making it difficult to
> > > troubleshoot related issues.
> > > 
> > > This makes psample use the event tracing framework to log the sampling
> > > of a packet so that it's easier to quickly identify the source
> > > (i.e: group) and context (i.e: metadata) of a packet being sampled.
> > > 
> > > This patch creates some checkpatch splats, but the style of the
> > > tracepoint definition mimics that of other modules so it seems
> > > acceptable.
> > 
> > I don't see a good reason to add this tracepoint (which we won't be able
> > to remove) when you can easily do that with bpftrace which by now should
> > be widely available:
> > 
> > #!/usr/bin/bpftrace
> > 
> > kfunc:psample_sample_packet
> > {
> >          $ts_us = nsecs() / 1000;
> >          $secs = $ts_us / 1000000;
> >          $us = $ts_us % 1000000;
> >          $group = args.group;
> >          $skb = args.skb;
> >          $md = args.md;
> > 
> >          printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
> >                 comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
> >                 $skb, $skb->len, $skb->data_len, args.sample_rate,
> >                 $md->in_ifindex, $md->out_ifindex,
> >                 buf($md->user_cookie, $md->user_cookie_len));
> > }
> > 
> > Example output:
> > 
> > mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > \xde\xad\xbe\xef
> > mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > \xde\xad\xbe\xef
> > 
> > Note that it prints the cookie itself unlike the tracepoint which only
> > prints the hashed pointer.
> > 
> 
> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
> that also true for many other tracepoints out there, right?

Maybe, but this particular tracepoint is not buried deep inside some
complex function with manipulated data being passed as arguments.
Instead, this tracepoint is placed at the very beginning of the function
and takes the function arguments as its own arguments. The tracepoint
can be easily replaced with fentry/kprobes like I've shown with the
example above.

> For development and labs bpftrace is perfectly fine, but using kfuncs and
> requiring recompilation is harder in production systems compared with using
> smaller CO-RE tools.

I used bpftrace because it is very easy to write, but I could have done
the same with libbpf. I have a bunch of such tools that I wrote over the
years that I compiled once on my laptop and which I copy to various
machines where I need them.

> If OVS starts using psample heavily and users need to troubleshoot or merely
> observe packets as they are sampled in a more efficient way, they are likely
> to use ebpf for that. I think making it a bit easier (as in, providing a
> sligthly more stable tracepoint) is worth considering.

I'm not saying that it's not worth considering, I'm simply saying that
it should be done after gathering operational experience with existing
mechanisms. It's possible you will conclude that this tracepoint is not
actually needed.

Also, there are some disadvantages in using tracepoints compared to
fentry:

https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t

Not saying that's the case here, but worth considering / being aware.

> Can you please expand on your concerns about the tracepoint? It's on the
> main internal function of the module so, even though the function name or
> its arguments might change, it doesn't seem probable that it'll disappear
> altogether. Why else would we want to remove the tracepoint?

It's not really concerns, but dissatisfaction. It's my impression (might
be wrong) that this series commits to adding new interfaces without
first seriously evaluating existing ones. This is true for this patch
and patch #2 that adds a new netlink command instead of using
SO_ATTACH_FILTER like existing applications are doing to achieve the
same goal.

I guess some will disagree, but wanted to voice my opinion nonetheless.
Adrián Moreno April 29, 2024, 5:33 a.m. UTC | #4
On 4/25/24 17:25, Ido Schimmel wrote:
> On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
>>
>>
>> On 4/25/24 09:18, Ido Schimmel wrote:
>>> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>>>> Currently there are no widely-available tools to dump the metadata and
>>>> group information when a packet is sampled, making it difficult to
>>>> troubleshoot related issues.
>>>>
>>>> This makes psample use the event tracing framework to log the sampling
>>>> of a packet so that it's easier to quickly identify the source
>>>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>>>
>>>> This patch creates some checkpatch splats, but the style of the
>>>> tracepoint definition mimics that of other modules so it seems
>>>> acceptable.
>>>
>>> I don't see a good reason to add this tracepoint (which we won't be able
>>> to remove) when you can easily do that with bpftrace which by now should
>>> be widely available:
>>>
>>> #!/usr/bin/bpftrace
>>>
>>> kfunc:psample_sample_packet
>>> {
>>>           $ts_us = nsecs() / 1000;
>>>           $secs = $ts_us / 1000000;
>>>           $us = $ts_us % 1000000;
>>>           $group = args.group;
>>>           $skb = args.skb;
>>>           $md = args.md;
>>>
>>>           printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>>>                  comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>>>                  $skb, $skb->len, $skb->data_len, args.sample_rate,
>>>                  $md->in_ifindex, $md->out_ifindex,
>>>                  buf($md->user_cookie, $md->user_cookie_len));
>>> }
>>>
>>> Example output:
>>>
>>> mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>> mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>>
>>> Note that it prints the cookie itself unlike the tracepoint which only
>>> prints the hashed pointer.
>>>
>>
>> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
>> that also true for many other tracepoints out there, right?
> 
> Maybe, but this particular tracepoint is not buried deep inside some
> complex function with manipulated data being passed as arguments.
> Instead, this tracepoint is placed at the very beginning of the function
> and takes the function arguments as its own arguments. The tracepoint
> can be easily replaced with fentry/kprobes like I've shown with the
> example above.
> 
>> For development and labs bpftrace is perfectly fine, but using kfuncs and
>> requiring recompilation is harder in production systems compared with using
>> smaller CO-RE tools.
> 
> I used bpftrace because it is very easy to write, but I could have done
> the same with libbpf. I have a bunch of such tools that I wrote over the
> years that I compiled once on my laptop and which I copy to various
> machines where I need them.
>

My worry is that if tools are built around a particular kprobe/kfunc they will 
break if the function name or its arguments change, where as a tracepoint give 
them a bit more stability across kernel versions. This breakage might not be a 
huge problem for bpftrace since the user can change the script at runtime, but 
libbpf programs will need recompilation or some kind of version-detection mechanism.

Given the observability-oriented nature of psample I can very much see tools 
like this being built (I myself plan to write one for OVS repo) and my concern 
is having their stability depend on a function name or arguments not changing 
across versions.


>> If OVS starts using psample heavily and users need to troubleshoot or merely
>> observe packets as they are sampled in a more efficient way, they are likely
>> to use ebpf for that. I think making it a bit easier (as in, providing a
>> sligthly more stable tracepoint) is worth considering.
> 
> I'm not saying that it's not worth considering, I'm simply saying that
> it should be done after gathering operational experience with existing
> mechanisms. It's possible you will conclude that this tracepoint is not
> actually needed.
> 
> Also, there are some disadvantages in using tracepoints compared to
> fentry:
> 
> https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
> 
> Not saying that's the case here, but worth considering / being aware.
> 
>> Can you please expand on your concerns about the tracepoint? It's on the
>> main internal function of the module so, even though the function name or
>> its arguments might change, it doesn't seem probable that it'll disappear
>> altogether. Why else would we want to remove the tracepoint?
> 
> It's not really concerns, but dissatisfaction. It's my impression (might
> be wrong) that this series commits to adding new interfaces without
> first seriously evaluating existing ones. This is true for this patch
> and patch #2 that adds a new netlink command instead of using
> SO_ATTACH_FILTER like existing applications are doing to achieve the
> same goal.
> 
> I guess some will disagree, but wanted to voice my opinion nonetheless.
> 

That's a fair point and I appreciate the feedback.

For patch #2, I can concede that it's just making applications slightly simpler 
without providing any further stability guarantees. I'm OK removing it.

And, I fail to convince you of the usefulness of the tracepoint, I can remove it 
as well.

Thanks.
Ido Schimmel April 30, 2024, 12:53 p.m. UTC | #5
On Mon, Apr 29, 2024 at 07:33:59AM +0200, Adrian Moreno wrote:
> 
> 
> On 4/25/24 17:25, Ido Schimmel wrote:
> > On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
> > > 
> > > 
> > > On 4/25/24 09:18, Ido Schimmel wrote:
> > > > On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> > > > > Currently there are no widely-available tools to dump the metadata and
> > > > > group information when a packet is sampled, making it difficult to
> > > > > troubleshoot related issues.
> > > > > 
> > > > > This makes psample use the event tracing framework to log the sampling
> > > > > of a packet so that it's easier to quickly identify the source
> > > > > (i.e: group) and context (i.e: metadata) of a packet being sampled.
> > > > > 
> > > > > This patch creates some checkpatch splats, but the style of the
> > > > > tracepoint definition mimics that of other modules so it seems
> > > > > acceptable.
> > > > 
> > > > I don't see a good reason to add this tracepoint (which we won't be able
> > > > to remove) when you can easily do that with bpftrace which by now should
> > > > be widely available:
> > > > 
> > > > #!/usr/bin/bpftrace
> > > > 
> > > > kfunc:psample_sample_packet
> > > > {
> > > >           $ts_us = nsecs() / 1000;
> > > >           $secs = $ts_us / 1000000;
> > > >           $us = $ts_us % 1000000;
> > > >           $group = args.group;
> > > >           $skb = args.skb;
> > > >           $md = args.md;
> > > > 
> > > >           printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
> > > >                  comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
> > > >                  $skb, $skb->len, $skb->data_len, args.sample_rate,
> > > >                  $md->in_ifindex, $md->out_ifindex,
> > > >                  buf($md->user_cookie, $md->user_cookie_len));
> > > > }
> > > > 
> > > > Example output:
> > > > 
> > > > mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > > > \xde\xad\xbe\xef
> > > > mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > > > \xde\xad\xbe\xef
> > > > 
> > > > Note that it prints the cookie itself unlike the tracepoint which only
> > > > prints the hashed pointer.
> > > > 
> > > 
> > > I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
> > > that also true for many other tracepoints out there, right?
> > 
> > Maybe, but this particular tracepoint is not buried deep inside some
> > complex function with manipulated data being passed as arguments.
> > Instead, this tracepoint is placed at the very beginning of the function
> > and takes the function arguments as its own arguments. The tracepoint
> > can be easily replaced with fentry/kprobes like I've shown with the
> > example above.
> > 
> > > For development and labs bpftrace is perfectly fine, but using kfuncs and
> > > requiring recompilation is harder in production systems compared with using
> > > smaller CO-RE tools.
> > 
> > I used bpftrace because it is very easy to write, but I could have done
> > the same with libbpf. I have a bunch of such tools that I wrote over the
> > years that I compiled once on my laptop and which I copy to various
> > machines where I need them.
> > 
> 
> My worry is that if tools are built around a particular kprobe/kfunc they
> will break if the function name or its arguments change, where as a
> tracepoint give them a bit more stability across kernel versions. This
> breakage might not be a huge problem for bpftrace since the user can change
> the script at runtime, but libbpf programs will need recompilation or some
> kind of version-detection mechanism.
> 
> Given the observability-oriented nature of psample I can very much see tools
> like this being built (I myself plan to write one for OVS repo) and my
> concern is having their stability depend on a function name or arguments not
> changing across versions.

There are a lot of tools in BCC that are using kprobes/fentry so
experience shows that it is possible to build observability tools on top
of these interfaces. My preference would be to avoid preemptively adding
a new tracepoint.

> 
> 
> > > If OVS starts using psample heavily and users need to troubleshoot or merely
> > > observe packets as they are sampled in a more efficient way, they are likely
> > > to use ebpf for that. I think making it a bit easier (as in, providing a
> > > sligthly more stable tracepoint) is worth considering.
> > 
> > I'm not saying that it's not worth considering, I'm simply saying that
> > it should be done after gathering operational experience with existing
> > mechanisms. It's possible you will conclude that this tracepoint is not
> > actually needed.
> > 
> > Also, there are some disadvantages in using tracepoints compared to
> > fentry:
> > 
> > https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> > https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
> > 
> > Not saying that's the case here, but worth considering / being aware.
> > 
> > > Can you please expand on your concerns about the tracepoint? It's on the
> > > main internal function of the module so, even though the function name or
> > > its arguments might change, it doesn't seem probable that it'll disappear
> > > altogether. Why else would we want to remove the tracepoint?
> > 
> > It's not really concerns, but dissatisfaction. It's my impression (might
> > be wrong) that this series commits to adding new interfaces without
> > first seriously evaluating existing ones. This is true for this patch
> > and patch #2 that adds a new netlink command instead of using
> > SO_ATTACH_FILTER like existing applications are doing to achieve the
> > same goal.
> > 
> > I guess some will disagree, but wanted to voice my opinion nonetheless.
> > 
> 
> That's a fair point and I appreciate the feedback.
> 
> For patch #2, I can concede that it's just making applications slightly
> simpler without providing any further stability guarantees. I'm OK removing
> it.
> 
> And, I fail to convince you of the usefulness of the tracepoint, I can
> remove it as well.

Great, thank you. To be clear, my goal is not to make your life more
difficult, but simply to avoid merging changes that cannot be undone
when their goal can be achieved using existing interfaces.
diff mbox series

Patch

diff --git a/net/psample/psample.c b/net/psample/psample.c
index 476aaad7a885..92db8ffa2ba2 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -18,6 +18,12 @@ 
 #include <net/ip_tunnels.h>
 #include <net/dst_metadata.h>
 
+#ifndef __CHECKER__
+#define CREATE_TRACE_POINTS
+#endif
+
+#include "trace.h"
+
 #define PSAMPLE_MAX_PACKET_SIZE 0xffff
 
 static LIST_HEAD(psample_groups_list);
@@ -470,6 +476,9 @@  void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 	void *data;
 	int ret;
 
+	if (trace_psample_sample_packet_enabled())
+		trace_psample_sample_packet(group, skb, sample_rate, md);
+
 	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
diff --git a/net/psample/trace.h b/net/psample/trace.h
new file mode 100644
index 000000000000..2d32a846989b
--- /dev/null
+++ b/net/psample/trace.h
@@ -0,0 +1,62 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM psample
+
+#if !defined(_TRACE_PSAMPLE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PSAMPLE__H
+
+#include <linux/skbuff.h>
+#include <linux/tracepoint.h>
+#include <net/psample.h>
+
+TRACE_EVENT(psample_sample_packet,
+
+	TP_PROTO(struct psample_group *group, struct sk_buff *skb,
+		 u32 sample_rate, const struct psample_metadata *md),
+
+	TP_ARGS(group, skb, sample_rate, md),
+
+	TP_STRUCT__entry(
+		__field(u32, group_num)
+		__field(u32, refcount)
+		__field(u32, seq)
+		__field(void *, skbaddr)
+		__field(unsigned int, len)
+		__field(unsigned int, data_len)
+		__field(u32, sample_rate)
+		__field(int, in_ifindex)
+		__field(int, out_ifindex)
+		__field(const void *, user_cookie)
+		__field(u32, user_cookie_len)
+	),
+
+	TP_fast_assign(
+		__entry->group_num = group->group_num;
+		__entry->refcount = group->refcount;
+		__entry->seq = group->seq;
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->data_len = skb->data_len;
+		__entry->sample_rate = sample_rate;
+		__entry->in_ifindex = md->in_ifindex;
+		__entry->out_ifindex = md->out_ifindex;
+		__entry->user_cookie = &md->user_cookie[0];
+		__entry->user_cookie_len = md->user_cookie_len;
+	),
+
+	TP_printk("group_num=%u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%p user_cookie_len=%u",
+		  __entry->group_num, __entry->refcount, __entry->seq,
+		  __entry->skbaddr, __entry->len, __entry->data_len,
+		  __entry->sample_rate, __entry->in_ifindex,
+		  __entry->out_ifindex, __entry->user_cookie,
+		  __entry->user_cookie_len)
+);
+
+#endif /* _TRACE_PSAMPLE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../net/psample
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>