Message ID | 1642678950-19584-1-git-send-email-alan.maguire@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | libbpf: name-based u[ret]probe attach | expand |
On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > This patch series is a refinement of the RFC patchset [1], focusing > on support for attach by name for uprobes and uretprobes. Still > marked RFC as there are unresolved questions. > > Currently attach for such probes is done by determining the offset > manually, so the aim is to try and mimic the simplicity of kprobe > attach, making use of uprobe opts to specify a name string. > > uprobe attach is done by specifying a binary path, a pid (where > 0 means "this process" and -1 means "all processes") and an > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > and that name is searched for in symbol tables. If the binary > is a program, relative offset calcuation must be done to the > symbol address as described in [2]. I think the pid discussion here and in the patches only causes confusion. I think it's best to remove pid from the api. uprobes are attached to an inode. They're not attached to a pid or a process. Any existing process or future process started from that inode (executable file) will have that uprobe triggering. The kernel can do pid filtering through predicate mechanism, but bpf uprobe doesn't do any filtering. iirc. Similarly "attach to all processes" doesn't sound right either. It's attached to all current and future processes.
On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > This patch series is a refinement of the RFC patchset [1], focusing > > on support for attach by name for uprobes and uretprobes. Still > > marked RFC as there are unresolved questions. > > > > Currently attach for such probes is done by determining the offset > > manually, so the aim is to try and mimic the simplicity of kprobe > > attach, making use of uprobe opts to specify a name string. > > > > uprobe attach is done by specifying a binary path, a pid (where > > 0 means "this process" and -1 means "all processes") and an > > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > > and that name is searched for in symbol tables. If the binary > > is a program, relative offset calcuation must be done to the > > symbol address as described in [2]. > > I think the pid discussion here and in the patches only causes > confusion. I think it's best to remove pid from the api. It's already part of the uprobe API in libbpf (bpf_program__attach_uprobe), but nothing really changes there. API-wise Alan just added an optional func_name option. I think it makes sense overall. For auto-attach it has to be all PIDs, of course. > uprobes are attached to an inode. They're not attached to a pid > or a process. Any existing process or future process started > from that inode (executable file) will have that uprobe triggering. > The kernel can do pid filtering through predicate mechanism, > but bpf uprobe doesn't do any filtering. iirc. > Similarly "attach to all processes" doesn't sound right either. > It's attached to all current and future processes.
On Fri, Jan 21, 2022 at 10:15 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > This patch series is a refinement of the RFC patchset [1], focusing > > > on support for attach by name for uprobes and uretprobes. Still > > > marked RFC as there are unresolved questions. > > > > > > Currently attach for such probes is done by determining the offset > > > manually, so the aim is to try and mimic the simplicity of kprobe > > > attach, making use of uprobe opts to specify a name string. > > > > > > uprobe attach is done by specifying a binary path, a pid (where > > > 0 means "this process" and -1 means "all processes") and an > > > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > > > and that name is searched for in symbol tables. If the binary > > > is a program, relative offset calcuation must be done to the > > > symbol address as described in [2]. > > > > I think the pid discussion here and in the patches only causes > > confusion. I think it's best to remove pid from the api. > > It's already part of the uprobe API in libbpf > (bpf_program__attach_uprobe), but nothing really changes there. > API-wise Alan just added an optional func_name option. I think it > makes sense overall. Technically it can be deprecated. So "it's already part of api" isn't really an excuse to keep confusing the users.
On Fri, Jan 21, 2022 at 10:20 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Jan 21, 2022 at 10:15 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > > > This patch series is a refinement of the RFC patchset [1], focusing > > > > on support for attach by name for uprobes and uretprobes. Still > > > > marked RFC as there are unresolved questions. > > > > > > > > Currently attach for such probes is done by determining the offset > > > > manually, so the aim is to try and mimic the simplicity of kprobe > > > > attach, making use of uprobe opts to specify a name string. > > > > > > > > uprobe attach is done by specifying a binary path, a pid (where > > > > 0 means "this process" and -1 means "all processes") and an > > > > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > > > > and that name is searched for in symbol tables. If the binary > > > > is a program, relative offset calcuation must be done to the > > > > symbol address as described in [2]. > > > > > > I think the pid discussion here and in the patches only causes > > > confusion. I think it's best to remove pid from the api. > > > > It's already part of the uprobe API in libbpf > > (bpf_program__attach_uprobe), but nothing really changes there. > > API-wise Alan just added an optional func_name option. I think it > > makes sense overall. > > Technically it can be deprecated. > So "it's already part of api" isn't really an excuse to keep > confusing the users. ... but I don't find it confusing and no one really ever complained?.. So it doesn't seem like we need to remove this.
On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > This patch series is a refinement of the RFC patchset [1], focusing > on support for attach by name for uprobes and uretprobes. Still > marked RFC as there are unresolved questions. > > Currently attach for such probes is done by determining the offset > manually, so the aim is to try and mimic the simplicity of kprobe > attach, making use of uprobe opts to specify a name string. > > uprobe attach is done by specifying a binary path, a pid (where > 0 means "this process" and -1 means "all processes") and an > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > and that name is searched for in symbol tables. If the binary > is a program, relative offset calcuation must be done to the > symbol address as described in [2]. > > Having a name allows us to support auto-attach via SEC() > specification, for example > > SEC("uprobe/usr/lib64/libc.so.6/malloc") > > Unresolved questions: > > - the current scheme uses > > u[ret]probe[/]/path/2/binary/function[+offset] that / after uprobe is not optional. This should be parsed as "uprobe/<path-to-binary>/<func_name>[+<offset>]", in general. If <path-to-binary> doesn't have leading '/' it will be just treated as a relative path. Otherwise it's going to be ambiguous. So with your example SEC("uprobe/usr/lib64/libc.so.6/malloc") you are specifying "usr/lib64/libc.so.6", relative path, which is wrong. It has to be SEC("uprobe//usr/lib64/libc.so.6/malloc"), however ugly that might look. > > ...as SEC() format for auto-attach, for example > > SEC("uprobe/usr/lib64/libc.so.6/malloc") > > It would be cleaner to delimit binary and function with ':' > as is done by bcc. One simple way to achieve that would be > to support section string pre-processing, where instances of > ':' are replaced by a '/'; this would get us to supporting > a similar probe specification as bcc without the backward > compatibility headaches. I can't think of any valid > cases where SEC() definitions have a ':' that we would > replace with '/' in error, but I might be missing something. I think at least for separating path and function name using ':' is much better. I'd go with SEC("uprobe//usr/lib64/libc.so.6:malloc") for your example > > - the current scheme doesn't support a raw offset address, since > it felt un-portable to encourage that, but can add this support > if needed. I think for consistency with kprobe it's good to support it. And there are local experimentation situations where this could be useful. So let's add (sscanf() is pretty great at parsing this anyways) > > - The auto-attach behaviour is to attach to all processes. > It would be good to have a way to specify the attach process > target. A few possibilities that would be compatible with > BPF skeleton support are to use the open opts (feels kind of > wrong conceptually since it's an attach-time attribute) or > to support opts with attach pid field in "struct bpf_prog_skeleton". > Latter would even allow a skeleton to attach to multiple > different processes with prog-level granularity (perhaps a union > of the various attach opts or similar?). There may be other > ways to achieve this. Let's keep it simple and for auto-attach it's always -1 (all PIDs). If that's not satisfactory, user shouldn't use auto-attach. Skeleton's auto-attach (or bpf_program__attach()) is a convenience feature, not a mandatory step. > > Changes since RFC [1]: > - focused on uprobe entry/return, omitting USDT attach (Andrii) > - use ELF program headers in calculating relative offsets, as this > works for the case where we do not specify a process. The > previous approach relied on /proc/pid/maps so would not work > for the "all processes" case (where pid is -1). > - add support for auto-attach (patch 2) > - fix selftests to use a real library function. I didn't notice > selftests override the usleep(3) definition, so as a result of > this, the libc function wasn't being called, so usleep() should > not be used to test shared library attach. Also switch to > using libc path as the binary argument for these cases, as > specifying a shared library function name for a program is > not supported. Tests now instrument malloc/free. > - added selftest that verifies auto-attach. > > [1] https://lore.kernel.org/bpf/1642004329-23514-1-git-send-email-alan.maguire@oracle.com/ > [2] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html > > Alan Maguire (3): > libbpf: support function name-based attach for uprobes > libbpf: add auto-attach for uprobes based on section name > selftests/bpf: add tests for u[ret]probe attach by name > > tools/lib/bpf/libbpf.c | 259 ++++++++++++++++++++- > tools/lib/bpf/libbpf.h | 10 +- > .../selftests/bpf/prog_tests/attach_probe.c | 114 +++++++-- > .../selftests/bpf/progs/test_attach_probe.c | 33 +++ > 4 files changed, 396 insertions(+), 20 deletions(-) > > -- > 1.8.3.1 >
On Fri, 21 Jan 2022, Andrii Nakryiko wrote: > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > This patch series is a refinement of the RFC patchset [1], focusing > > > on support for attach by name for uprobes and uretprobes. Still > > > marked RFC as there are unresolved questions. > > > > > > Currently attach for such probes is done by determining the offset > > > manually, so the aim is to try and mimic the simplicity of kprobe > > > attach, making use of uprobe opts to specify a name string. > > > > > > uprobe attach is done by specifying a binary path, a pid (where > > > 0 means "this process" and -1 means "all processes") and an > > > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > > > and that name is searched for in symbol tables. If the binary > > > is a program, relative offset calcuation must be done to the > > > symbol address as described in [2]. > > > > I think the pid discussion here and in the patches only causes > > confusion. I think it's best to remove pid from the api. > > It's already part of the uprobe API in libbpf > (bpf_program__attach_uprobe), but nothing really changes there. > API-wise Alan just added an optional func_name option. I think it > makes sense overall. > > For auto-attach it has to be all PIDs, of course. > Makes sense. > > uprobes are attached to an inode. They're not attached to a pid > > or a process. Any existing process or future process started > > from that inode (executable file) will have that uprobe triggering. > > The kernel can do pid filtering through predicate mechanism, > > but bpf uprobe doesn't do any filtering. iirc. I _think_ there is filtering in uprobe_perf_func() - it calls uprobe_perf_filter() prior to calling __uprobe_perf_func(), and the latter runs the BPF program. Maybe I'm missing something here though? However I think we need to give the user ways to minimize the cost of breakpoint placement where possible. See below... > > Similarly "attach to all processes" doesn't sound right either. > > It's attached to all current and future processes. > True, will fix for the next version. I think for users it'd be good to clarify what the overheads are. If I want to see malloc()s in a particular process, say I specify the libc path along with the process ID I'm interested in. This adds the breakpoint to libc and will - as far as I understand it - trigger breakpoints system-wide which are then filtered out by uprobe perf handling for the specific process specified. That's pretty expensive performance-wise, so we should probably try and give users options to limit the cost in cases where they don't want to incur system-wide overheads. I've been investigating adding support for instrumenting shared library calls _within_ programs by placing the breakpoint on the procedure linking table call associated with the function. As this is local to the program, it will only have an effect on malloc()s in that specific program. So the next version will have a solution which allows us to trace malloc() in /usr/bin/foo as well as in libc. Thanks! Alan
On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > On Fri, 21 Jan 2022, Andrii Nakryiko wrote: > > > On Thu, Jan 20, 2022 at 5:44 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Thu, Jan 20, 2022 at 3:43 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > > > This patch series is a refinement of the RFC patchset [1], focusing > > > > on support for attach by name for uprobes and uretprobes. Still > > > > marked RFC as there are unresolved questions. > > > > > > > > Currently attach for such probes is done by determining the offset > > > > manually, so the aim is to try and mimic the simplicity of kprobe > > > > attach, making use of uprobe opts to specify a name string. > > > > > > > > uprobe attach is done by specifying a binary path, a pid (where > > > > 0 means "this process" and -1 means "all processes") and an > > > > offset. Here a 'func_name' option is added to 'struct uprobe_opts' > > > > and that name is searched for in symbol tables. If the binary > > > > is a program, relative offset calcuation must be done to the > > > > symbol address as described in [2]. > > > > > > I think the pid discussion here and in the patches only causes > > > confusion. I think it's best to remove pid from the api. > > > > It's already part of the uprobe API in libbpf > > (bpf_program__attach_uprobe), but nothing really changes there. > > API-wise Alan just added an optional func_name option. I think it > > makes sense overall. > > > > For auto-attach it has to be all PIDs, of course. > > > > Makes sense. > > > > uprobes are attached to an inode. They're not attached to a pid > > > or a process. Any existing process or future process started > > > from that inode (executable file) will have that uprobe triggering. > > > The kernel can do pid filtering through predicate mechanism, > > > but bpf uprobe doesn't do any filtering. iirc. > > I _think_ there is filtering in uprobe_perf_func() - it calls > uprobe_perf_filter() prior to calling __uprobe_perf_func(), and > the latter runs the BPF program. Maybe I'm missing something here > though? However I think we need to give the user ways to minimize > the cost of breakpoint placement where possible. See below... > > > > Similarly "attach to all processes" doesn't sound right either. > > > It's attached to all current and future processes. > > > > True, will fix for the next version. > > I think for users it'd be good to clarify what the overheads are. If I > want to see malloc()s in a particular process, say I specify the libc > path along with the process ID I'm interested in. This adds the > breakpoint to libc and will - as far as I understand it - trigger > breakpoints system-wide which are then filtered out by uprobe perf handling > for the specific process specified. That's pretty expensive > performance-wise, so we should probably try and give users options to > limit the cost in cases where they don't want to incur system-wide > overheads. I've been investigating adding support for instrumenting shared > library calls _within_ programs by placing the breakpoint on the procedure > linking table call associated with the function. As this is local to the You mean to patch PLT stubs ([0])? One concern with that is (besides making sure that pt_regs still have exactly the same semantics and stuff) that uprobes are much faster when patching nop instructions (if the library was compiled with nop "preambles". Do you know if @plt entries can be compiled with nops as well? The difference in performance is more than 2x from my non-scientific testing recently. So it can be a pretty big difference. [0] https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html > program, it will only have an effect on malloc()s in that specific > program. So the next version will have a solution which allows us to > trace malloc() in /usr/bin/foo as well as in libc. Thanks! > > Alan
On Mon, 24 Jan 2022, Andrii Nakryiko wrote: > On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > I think for users it'd be good to clarify what the overheads are. If I > > want to see malloc()s in a particular process, say I specify the libc > > path along with the process ID I'm interested in. This adds the > > breakpoint to libc and will - as far as I understand it - trigger > > breakpoints system-wide which are then filtered out by uprobe perf handling > > for the specific process specified. That's pretty expensive > > performance-wise, so we should probably try and give users options to > > limit the cost in cases where they don't want to incur system-wide > > overheads. I've been investigating adding support for instrumenting shared > > library calls _within_ programs by placing the breakpoint on the procedure > > linking table call associated with the function. As this is local to the > > You mean to patch PLT stubs ([0])? Yep, the .plt table, as shown by "objdump -D -j .plt <program>" Disassembly of section .plt: 000000000040d020 <.plt>: 40d020: ff 35 e2 5f 4b 00 pushq 0x4b5fe2(%rip) # 8c3008 < _GLOBAL_OFFSET_TABLE_+0x8> 40d026: ff 25 e4 5f 4b 00 jmpq *0x4b5fe4(%rip) # 8c3010 <_GLOBAL_OFFSET_TABLE_+0x10> 40d02c: 0f 1f 40 00 nopl 0x0(%rax) 000000000040d030 <inet_ntop@plt>: 40d030: ff 25 e2 5f 4b 00 jmpq *0x4b5fe2(%rip) # 8c3018 <inet_ntop@GLIBC_2.2.5> 40d036: 68 00 00 00 00 pushq $0x0 40d03b: e9 e0 ff ff ff jmpq 40d020 <.plt> In the case of inet_ntop() the address would be 40d030 - which we then do the relative address calcuation on, giving the address to be uprobe'd as 0xd030. > One concern with that is (besides > making sure that pt_regs still have exactly the same semantics and > stuff) that uprobes are much faster when patching nop instructions (if > the library was compiled with nop "preambles". Do you know if @plt > entries can be compiled with nops as well? I haven't found any way to do that unfortunately. > The difference in > performance is more than 2x from my non-scientific testing recently. > So it can be a pretty big difference. > Interesting! There may be a cleaner way to achieve the goal of tracing shared library calls in the local binary, but I'm not seeing an alternate approach yet unfortunately. To me the key thing is to ensure we have an alternative to globally tracing in libc. I'll send out the v2 addressing the things you found in the RFC shortly (and that uses the .plt instrumentation approach). Thanks! Alan