mbox series

[RFC,bpf-next,0/3] libbpf: name-based u[ret]probe attach

Message ID 1642678950-19584-1-git-send-email-alan.maguire@oracle.com (mailing list archive)
Headers show
Series libbpf: name-based u[ret]probe attach | expand

Message

Alan Maguire Jan. 20, 2022, 11:42 a.m. UTC
This patch series is a refinement of the RFC patchset [1], focusing
on support for attach by name for uprobes and uretprobes.  Still
marked RFC as there are unresolved questions.

Currently attach for such probes is done by determining the offset
manually, so the aim is to try and mimic the simplicity of kprobe
attach, making use of uprobe opts to specify a name string.

uprobe attach is done by specifying a binary path, a pid (where
0 means "this process" and -1 means "all processes") and an
offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
and that name is searched for in symbol tables.  If the binary
is a program, relative offset calcuation must be done to the
symbol address as described in [2].

Having a name allows us to support auto-attach via SEC()
specification, for example

SEC("uprobe/usr/lib64/libc.so.6/malloc")

Unresolved questions:

 - the current scheme uses

u[ret]probe[/]/path/2/binary/function[+offset]

   ...as SEC() format for auto-attach, for example

SEC("uprobe/usr/lib64/libc.so.6/malloc")

   It would be cleaner to delimit binary and function with ':'
   as is done by bcc.  One simple way to achieve that would be
   to support section string pre-processing, where instances of
   ':' are replaced by a '/'; this would get us to supporting
   a similar probe specification as bcc without the backward
   compatibility headaches.  I can't think of any valid
   cases where SEC() definitions have a ':' that we would
   replace with '/' in error, but I might be missing something.

 - the current scheme doesn't support a raw offset address, since
   it felt un-portable to encourage that, but can add this support
   if needed.

 - The auto-attach behaviour is to attach to all processes.
   It would be good to have a way to specify the attach process
   target. A few possibilities that would be compatible with
   BPF skeleton support are to use the open opts (feels kind of
   wrong conceptually since it's an attach-time attribute) or
   to support opts with attach pid field in "struct bpf_prog_skeleton".
   Latter would even allow a skeleton to attach to multiple
   different processes with prog-level granularity (perhaps a union
   of the various attach opts or similar?). There may be other
   ways to achieve this.

Changes since RFC [1]:
 - focused on uprobe entry/return, omitting USDT attach (Andrii)
 - use ELF program headers in calculating relative offsets, as this
   works for the case where we do not specify a process.  The
   previous approach relied on /proc/pid/maps so would not work
   for the "all processes" case (where pid is -1).
 - add support for auto-attach (patch 2)
 - fix selftests to use a real library function.  I didn't notice
   selftests override the usleep(3) definition, so as a result of
   this, the libc function wasn't being called, so usleep() should
   not be used to test shared library attach.  Also switch to
   using libc path as the binary argument for these cases, as
   specifying a shared library function name for a program is
   not supported.  Tests now instrument malloc/free.
 - added selftest that verifies auto-attach.

[1] https://lore.kernel.org/bpf/1642004329-23514-1-git-send-email-alan.maguire@oracle.com/
[2] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html

Alan Maguire (3):
  libbpf: support function name-based attach for uprobes
  libbpf: add auto-attach for uprobes based on section name
  selftests/bpf: add tests for u[ret]probe attach by name

 tools/lib/bpf/libbpf.c                             | 259 ++++++++++++++++++++-
 tools/lib/bpf/libbpf.h                             |  10 +-
 .../selftests/bpf/prog_tests/attach_probe.c        | 114 +++++++--
 .../selftests/bpf/progs/test_attach_probe.c        |  33 +++
 4 files changed, 396 insertions(+), 20 deletions(-)

Comments

Alexei Starovoitov Jan. 21, 2022, 1:44 a.m. UTC | #1
On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> This patch series is a refinement of the RFC patchset [1], focusing
> on support for attach by name for uprobes and uretprobes.  Still
> marked RFC as there are unresolved questions.
>
> Currently attach for such probes is done by determining the offset
> manually, so the aim is to try and mimic the simplicity of kprobe
> attach, making use of uprobe opts to specify a name string.
>
> uprobe attach is done by specifying a binary path, a pid (where
> 0 means "this process" and -1 means "all processes") and an
> offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> and that name is searched for in symbol tables.  If the binary
> is a program, relative offset calcuation must be done to the
> symbol address as described in [2].

I think the pid discussion here and in the patches only causes
confusion. I think it's best to remove pid from the api.
uprobes are attached to an inode. They're not attached to a pid
or a process. Any existing process or future process started
from that inode (executable file) will have that uprobe triggering.
The kernel can do pid filtering through predicate mechanism,
but bpf uprobe doesn't do any filtering. iirc.
Similarly "attach to all processes" doesn't sound right either.
It's attached to all current and future processes.
Andrii Nakryiko Jan. 21, 2022, 6:15 p.m. UTC | #2
On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > This patch series is a refinement of the RFC patchset [1], focusing
> > on support for attach by name for uprobes and uretprobes.  Still
> > marked RFC as there are unresolved questions.
> >
> > Currently attach for such probes is done by determining the offset
> > manually, so the aim is to try and mimic the simplicity of kprobe
> > attach, making use of uprobe opts to specify a name string.
> >
> > uprobe attach is done by specifying a binary path, a pid (where
> > 0 means "this process" and -1 means "all processes") and an
> > offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> > and that name is searched for in symbol tables.  If the binary
> > is a program, relative offset calcuation must be done to the
> > symbol address as described in [2].
>
> I think the pid discussion here and in the patches only causes
> confusion. I think it's best to remove pid from the api.

It's already part of the uprobe API in libbpf
(bpf_program__attach_uprobe), but nothing really changes there.
API-wise Alan just added an optional func_name option. I think it
makes sense overall.

For auto-attach it has to be all PIDs, of course.

> uprobes are attached to an inode. They're not attached to a pid
> or a process. Any existing process or future process started
> from that inode (executable file) will have that uprobe triggering.
> The kernel can do pid filtering through predicate mechanism,
> but bpf uprobe doesn't do any filtering. iirc.
> Similarly "attach to all processes" doesn't sound right either.
> It's attached to all current and future processes.
Alexei Starovoitov Jan. 21, 2022, 6:20 p.m. UTC | #3
On Fri, Jan 21, 2022 at 10:15 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > This patch series is a refinement of the RFC patchset [1], focusing
> > > on support for attach by name for uprobes and uretprobes.  Still
> > > marked RFC as there are unresolved questions.
> > >
> > > Currently attach for such probes is done by determining the offset
> > > manually, so the aim is to try and mimic the simplicity of kprobe
> > > attach, making use of uprobe opts to specify a name string.
> > >
> > > uprobe attach is done by specifying a binary path, a pid (where
> > > 0 means "this process" and -1 means "all processes") and an
> > > offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> > > and that name is searched for in symbol tables.  If the binary
> > > is a program, relative offset calcuation must be done to the
> > > symbol address as described in [2].
> >
> > I think the pid discussion here and in the patches only causes
> > confusion. I think it's best to remove pid from the api.
>
> It's already part of the uprobe API in libbpf
> (bpf_program__attach_uprobe), but nothing really changes there.
> API-wise Alan just added an optional func_name option. I think it
> makes sense overall.

Technically it can be deprecated.
So "it's already part of api" isn't really an excuse to keep
confusing the users.
Andrii Nakryiko Jan. 21, 2022, 6:27 p.m. UTC | #4
On Fri, Jan 21, 2022 at 10:20 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jan 21, 2022 at 10:15 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > >
> > > > This patch series is a refinement of the RFC patchset [1], focusing
> > > > on support for attach by name for uprobes and uretprobes.  Still
> > > > marked RFC as there are unresolved questions.
> > > >
> > > > Currently attach for such probes is done by determining the offset
> > > > manually, so the aim is to try and mimic the simplicity of kprobe
> > > > attach, making use of uprobe opts to specify a name string.
> > > >
> > > > uprobe attach is done by specifying a binary path, a pid (where
> > > > 0 means "this process" and -1 means "all processes") and an
> > > > offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> > > > and that name is searched for in symbol tables.  If the binary
> > > > is a program, relative offset calcuation must be done to the
> > > > symbol address as described in [2].
> > >
> > > I think the pid discussion here and in the patches only causes
> > > confusion. I think it's best to remove pid from the api.
> >
> > It's already part of the uprobe API in libbpf
> > (bpf_program__attach_uprobe), but nothing really changes there.
> > API-wise Alan just added an optional func_name option. I think it
> > makes sense overall.
>
> Technically it can be deprecated.
> So "it's already part of api" isn't really an excuse to keep
> confusing the users.

... but I don't find it confusing and no one really ever complained?..
So it doesn't seem like we need to remove this.
Andrii Nakryiko Jan. 21, 2022, 6:31 p.m. UTC | #5
On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> This patch series is a refinement of the RFC patchset [1], focusing
> on support for attach by name for uprobes and uretprobes.  Still
> marked RFC as there are unresolved questions.
>
> Currently attach for such probes is done by determining the offset
> manually, so the aim is to try and mimic the simplicity of kprobe
> attach, making use of uprobe opts to specify a name string.
>
> uprobe attach is done by specifying a binary path, a pid (where
> 0 means "this process" and -1 means "all processes") and an
> offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> and that name is searched for in symbol tables.  If the binary
> is a program, relative offset calcuation must be done to the
> symbol address as described in [2].
>
> Having a name allows us to support auto-attach via SEC()
> specification, for example
>
> SEC("uprobe/usr/lib64/libc.so.6/malloc")
>
> Unresolved questions:
>
>  - the current scheme uses
>
> u[ret]probe[/]/path/2/binary/function[+offset]

that / after uprobe is not optional. This should be parsed as
"uprobe/<path-to-binary>/<func_name>[+<offset>]", in general. If
<path-to-binary> doesn't have leading '/' it will be just treated as a
relative path. Otherwise it's going to be ambiguous. So with your
example SEC("uprobe/usr/lib64/libc.so.6/malloc") you are specifying
"usr/lib64/libc.so.6", relative path, which is wrong. It has to be
SEC("uprobe//usr/lib64/libc.so.6/malloc"), however ugly that might
look.

>
>    ...as SEC() format for auto-attach, for example
>
> SEC("uprobe/usr/lib64/libc.so.6/malloc")
>
>    It would be cleaner to delimit binary and function with ':'
>    as is done by bcc.  One simple way to achieve that would be
>    to support section string pre-processing, where instances of
>    ':' are replaced by a '/'; this would get us to supporting
>    a similar probe specification as bcc without the backward
>    compatibility headaches.  I can't think of any valid
>    cases where SEC() definitions have a ':' that we would
>    replace with '/' in error, but I might be missing something.

I think at least for separating path and function name using ':' is
much better. I'd go with

SEC("uprobe//usr/lib64/libc.so.6:malloc")

for your example

>
>  - the current scheme doesn't support a raw offset address, since
>    it felt un-portable to encourage that, but can add this support
>    if needed.

I think for consistency with kprobe it's good to support it. And there
are local experimentation situations where this could be useful. So
let's add (sscanf() is pretty great at parsing this anyways)

>
>  - The auto-attach behaviour is to attach to all processes.
>    It would be good to have a way to specify the attach process
>    target. A few possibilities that would be compatible with
>    BPF skeleton support are to use the open opts (feels kind of
>    wrong conceptually since it's an attach-time attribute) or
>    to support opts with attach pid field in "struct bpf_prog_skeleton".
>    Latter would even allow a skeleton to attach to multiple
>    different processes with prog-level granularity (perhaps a union
>    of the various attach opts or similar?). There may be other
>    ways to achieve this.

Let's keep it simple and for auto-attach it's always -1 (all PIDs). If
that's not satisfactory, user shouldn't use auto-attach. Skeleton's
auto-attach (or bpf_program__attach()) is a convenience feature, not a
mandatory step.

>
> Changes since RFC [1]:
>  - focused on uprobe entry/return, omitting USDT attach (Andrii)
>  - use ELF program headers in calculating relative offsets, as this
>    works for the case where we do not specify a process.  The
>    previous approach relied on /proc/pid/maps so would not work
>    for the "all processes" case (where pid is -1).
>  - add support for auto-attach (patch 2)
>  - fix selftests to use a real library function.  I didn't notice
>    selftests override the usleep(3) definition, so as a result of
>    this, the libc function wasn't being called, so usleep() should
>    not be used to test shared library attach.  Also switch to
>    using libc path as the binary argument for these cases, as
>    specifying a shared library function name for a program is
>    not supported.  Tests now instrument malloc/free.
>  - added selftest that verifies auto-attach.
>
> [1] https://lore.kernel.org/bpf/1642004329-23514-1-git-send-email-alan.maguire@oracle.com/
> [2] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html
>
> Alan Maguire (3):
>   libbpf: support function name-based attach for uprobes
>   libbpf: add auto-attach for uprobes based on section name
>   selftests/bpf: add tests for u[ret]probe attach by name
>
>  tools/lib/bpf/libbpf.c                             | 259 ++++++++++++++++++++-
>  tools/lib/bpf/libbpf.h                             |  10 +-
>  .../selftests/bpf/prog_tests/attach_probe.c        | 114 +++++++--
>  .../selftests/bpf/progs/test_attach_probe.c        |  33 +++
>  4 files changed, 396 insertions(+), 20 deletions(-)
>
> --
> 1.8.3.1
>
Alan Maguire Jan. 24, 2022, 2:13 p.m. UTC | #6
On Fri, 21 Jan 2022, Andrii Nakryiko wrote:

> On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > This patch series is a refinement of the RFC patchset [1], focusing
> > > on support for attach by name for uprobes and uretprobes.  Still
> > > marked RFC as there are unresolved questions.
> > >
> > > Currently attach for such probes is done by determining the offset
> > > manually, so the aim is to try and mimic the simplicity of kprobe
> > > attach, making use of uprobe opts to specify a name string.
> > >
> > > uprobe attach is done by specifying a binary path, a pid (where
> > > 0 means "this process" and -1 means "all processes") and an
> > > offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> > > and that name is searched for in symbol tables.  If the binary
> > > is a program, relative offset calcuation must be done to the
> > > symbol address as described in [2].
> >
> > I think the pid discussion here and in the patches only causes
> > confusion. I think it's best to remove pid from the api.
> 
> It's already part of the uprobe API in libbpf
> (bpf_program__attach_uprobe), but nothing really changes there.
> API-wise Alan just added an optional func_name option. I think it
> makes sense overall.
> 
> For auto-attach it has to be all PIDs, of course.
> 

Makes sense.

> > uprobes are attached to an inode. They're not attached to a pid
> > or a process. Any existing process or future process started
> > from that inode (executable file) will have that uprobe triggering.
> > The kernel can do pid filtering through predicate mechanism,
> > but bpf uprobe doesn't do any filtering. iirc.

I _think_ there is filtering in uprobe_perf_func() - it calls
uprobe_perf_filter() prior to calling __uprobe_perf_func(), and
the latter runs the BPF program. Maybe I'm missing something here
though? However I think we need to give the user ways to minimize
the cost of breakpoint placement where possible. See below...

> > Similarly "attach to all processes" doesn't sound right either.
> > It's attached to all current and future processes.
> 

True, will fix for the next version.

I think for users it'd be good to clarify what the overheads are. If I 
want to see malloc()s in a particular process, say I specify the libc 
path along with the process ID I'm interested in.  This adds the
breakpoint to libc and will - as far as I understand it - trigger 
breakpoints system-wide which are then filtered out by uprobe perf handling
for the specific process specified.  That's pretty expensive 
performance-wise, so we should probably try and give users options to 
limit the cost in cases where they don't want to incur system-wide 
overheads. I've been investigating adding support for instrumenting shared 
library calls _within_ programs by placing the breakpoint on the procedure  
linking table call associated with the function.  As this is local to the  
program, it will only have an effect on malloc()s in that specific 
program.  So the next version will have a solution which allows us to 
trace malloc() in /usr/bin/foo as well as in libc. Thanks!

Alan
Andrii Nakryiko Jan. 24, 2022, 10:47 p.m. UTC | #7
On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 21 Jan 2022, Andrii Nakryiko wrote:
>
> > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > >
> > > > This patch series is a refinement of the RFC patchset [1], focusing
> > > > on support for attach by name for uprobes and uretprobes.  Still
> > > > marked RFC as there are unresolved questions.
> > > >
> > > > Currently attach for such probes is done by determining the offset
> > > > manually, so the aim is to try and mimic the simplicity of kprobe
> > > > attach, making use of uprobe opts to specify a name string.
> > > >
> > > > uprobe attach is done by specifying a binary path, a pid (where
> > > > 0 means "this process" and -1 means "all processes") and an
> > > > offset.  Here a 'func_name' option is added to 'struct uprobe_opts'
> > > > and that name is searched for in symbol tables.  If the binary
> > > > is a program, relative offset calcuation must be done to the
> > > > symbol address as described in [2].
> > >
> > > I think the pid discussion here and in the patches only causes
> > > confusion. I think it's best to remove pid from the api.
> >
> > It's already part of the uprobe API in libbpf
> > (bpf_program__attach_uprobe), but nothing really changes there.
> > API-wise Alan just added an optional func_name option. I think it
> > makes sense overall.
> >
> > For auto-attach it has to be all PIDs, of course.
> >
>
> Makes sense.
>
> > > uprobes are attached to an inode. They're not attached to a pid
> > > or a process. Any existing process or future process started
> > > from that inode (executable file) will have that uprobe triggering.
> > > The kernel can do pid filtering through predicate mechanism,
> > > but bpf uprobe doesn't do any filtering. iirc.
>
> I _think_ there is filtering in uprobe_perf_func() - it calls
> uprobe_perf_filter() prior to calling __uprobe_perf_func(), and
> the latter runs the BPF program. Maybe I'm missing something here
> though? However I think we need to give the user ways to minimize
> the cost of breakpoint placement where possible. See below...
>
> > > Similarly "attach to all processes" doesn't sound right either.
> > > It's attached to all current and future processes.
> >
>
> True, will fix for the next version.
>
> I think for users it'd be good to clarify what the overheads are. If I
> want to see malloc()s in a particular process, say I specify the libc
> path along with the process ID I'm interested in.  This adds the
> breakpoint to libc and will - as far as I understand it - trigger
> breakpoints system-wide which are then filtered out by uprobe perf handling
> for the specific process specified.  That's pretty expensive
> performance-wise, so we should probably try and give users options to
> limit the cost in cases where they don't want to incur system-wide
> overheads. I've been investigating adding support for instrumenting shared
> library calls _within_ programs by placing the breakpoint on the procedure
> linking table call associated with the function.  As this is local to the

You mean to patch PLT stubs ([0])? One concern with that is (besides
making sure that pt_regs still have exactly the same semantics and
stuff) that uprobes are much faster when patching nop instructions (if
the library was compiled with nop "preambles". Do you know if @plt
entries can be compiled with nops as well? The difference in
performance is more than 2x from my non-scientific testing recently.
So it can be a pretty big difference.

  [0] https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

> program, it will only have an effect on malloc()s in that specific
> program.  So the next version will have a solution which allows us to
> trace malloc() in /usr/bin/foo as well as in libc. Thanks!
>
> Alan
Alan Maguire Jan. 27, 2022, 10:54 p.m. UTC | #8
On Mon, 24 Jan 2022, Andrii Nakryiko wrote:

> On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > I think for users it'd be good to clarify what the overheads are. If I
> > want to see malloc()s in a particular process, say I specify the libc
> > path along with the process ID I'm interested in.  This adds the
> > breakpoint to libc and will - as far as I understand it - trigger
> > breakpoints system-wide which are then filtered out by uprobe perf handling
> > for the specific process specified.  That's pretty expensive
> > performance-wise, so we should probably try and give users options to
> > limit the cost in cases where they don't want to incur system-wide
> > overheads. I've been investigating adding support for instrumenting shared
> > library calls _within_ programs by placing the breakpoint on the procedure
> > linking table call associated with the function.  As this is local to the
> 
> You mean to patch PLT stubs ([0])?

Yep, the .plt table, as shown by "objdump -D -j .plt <program>"

Disassembly of section .plt:

000000000040d020 <.plt>:
  40d020:	ff 35 e2 5f 4b 00    	pushq  0x4b5fe2(%rip)        # 
8c3008 <
_GLOBAL_OFFSET_TABLE_+0x8>
  40d026:	ff 25 e4 5f 4b 00    	jmpq   *0x4b5fe4(%rip)        # 
8c3010 
<_GLOBAL_OFFSET_TABLE_+0x10>
  40d02c:	0f 1f 40 00          	nopl   0x0(%rax)

000000000040d030 <inet_ntop@plt>:
  40d030:	ff 25 e2 5f 4b 00    	jmpq   *0x4b5fe2(%rip)        # 
8c3018 
<inet_ntop@GLIBC_2.2.5>
  40d036:	68 00 00 00 00       	pushq  $0x0
  40d03b:	e9 e0 ff ff ff       	jmpq   40d020 <.plt>

In the case of inet_ntop() the address would be 40d030 - which we
then do the relative address calcuation on, giving the address to
be uprobe'd as 0xd030.

> One concern with that is (besides
> making sure that pt_regs still have exactly the same semantics and
> stuff) that uprobes are much faster when patching nop instructions (if
> the library was compiled with nop "preambles". Do you know if @plt
> entries can be compiled with nops as well?

I haven't found any way to do that unfortunately.

> The difference in
> performance is more than 2x from my non-scientific testing recently.
> So it can be a pretty big difference.
> 

Interesting! There may be a cleaner way to achieve the goal of
tracing shared library calls in the local binary, but I'm not seeing
an alternate approach yet unfortunately.  To me the key thing is
to ensure we have an alternative to globally tracing in libc. I'll send 
out the v2 addressing the things you found in the RFC shortly (and that 
uses the .plt instrumentation approach). Thanks!

Alan