[RFC/PATCH,bpf-next,00/20] bpf: Add multi uprobe link

Message ID 20230424160447.2005755-1-jolsa@kernel.org (mailing list archive)

Jiri Olsa April 24, 2023, 4:04 p.m. UTC
hi,
this patchset adds support for attaching multiple uprobes and usdt probes
through the new uprobe_multi link.

Currently each uprobe is attached through its own perf event, which makes
attaching many uprobes very slow.

The main reason is that a perf event must be installed for each probed function,
and profiles show the perf event installation (perf_install_in_context) as the culprit.

The new uprobe_multi link creates raw uprobes and attaches the bpf
program to them directly, with no perf event involved.

In addition to being faster, we also save file descriptors. The current
uprobe attach uses an extra perf event fd for each probed function, while the
new link needs just one fd covering all the functions we attach to.

By dropping perf we lose the ability to attach a uprobe to a specific pid.
We can work around that with a pid check directly in the bpf program,
but we might need to look for another solution if that turns out to be
a problem.


Attaching current bpftrace to 1000 uprobes:

  # BPFTRACE_MAX_PROBES=100000 perf stat -e cycles,instructions \
    ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }' 
    ...

     126,666,390,509      cycles                                                                
      29,973,207,307      instructions                     #    0.24  insn per cycle            

        85.284833554 seconds time elapsed


Same bpftrace setup with uprobe_multi support:

  # perf stat -e cycles,instructions \
    ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }' 
    ...

       6,818,470,649      cycles                                                                
      13,275,510,122      instructions                     #    1.95  insn per cycle            

         1.943269451 seconds time elapsed


I'm sending this as RFC because:
  - I added/exported some new elf_* helper functions in libbpf,
    and I'm not sure that's the best/right way to do this
  - I'm not completely sure about the usdt integration in bpf_program__attach_usdt;
    I tried to detect uprobe_multi kernel support first, but ended up with
    just a new field in struct bpf_usdt_opts
  - I plan to add more tests for usdt probes defined with refctr


Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  uprobe_multi

There are PRs for tetragon [1] and bpftrace [2] support.

thanks,
jirka


[1] https://github.com/cilium/tetragon/pull/936
[2] https://github.com/olsajiri/bpftrace/tree/uprobe_multi

Cc: Viktor Malik <viktor.malik@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
---
Jiri Olsa (20):
      bpf: Add multi uprobe link
      bpf: Add cookies support for uprobe_multi link
      bpf: Add bpf_get_func_ip helper support for uprobe link
      libbpf: Update uapi bpf.h tools header
      libbpf: Add uprobe_multi attach type and link names
      libbpf: Factor elf_for_each_symbol function
      libbpf: Add elf_find_multi_func_offset function
      libbpf: Add elf_find_patern_func_offset function
      libbpf: Add bpf_link_create support for multi uprobes
      libbpf: Add bpf_program__attach_uprobe_multi_opts function
      libbpf: Add support for uprobe.multi/uprobe.multi program sections
      libbpf: Add uprobe multi link support to bpf_program__attach_usdt
      selftests/bpf: Add uprobe_multi skel test
      selftests/bpf: Add uprobe_multi api test
      selftests/bpf: Add uprobe_multi link test
      selftests/bpf: Add uprobe_multi test program
      selftests/bpf: Add uprobe_multi bench test
      selftests/bpf: Add usdt_multi test program
      selftests/bpf: Add usdt_multi bench test
      selftests/bpf: Add uprobe_multi cookie test

 include/linux/trace_events.h                               |   6 +
 include/uapi/linux/bpf.h                                   |  15 +++
 kernel/bpf/syscall.c                                       |  18 ++-
 kernel/trace/bpf_trace.c                                   | 310 +++++++++++++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h                             |  15 +++
 tools/lib/bpf/bpf.c                                        |  10 ++
 tools/lib/bpf/bpf.h                                        |  10 +-
 tools/lib/bpf/libbpf.c                                     | 653 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 tools/lib/bpf/libbpf.h                                     |  43 ++++++
 tools/lib/bpf/libbpf.map                                   |   3 +
 tools/lib/bpf/libbpf_internal.h                            |   2 +-
 tools/lib/bpf/usdt.c                                       | 127 +++++++++++++-----
 tools/testing/selftests/bpf/Makefile                       |  10 ++
 tools/testing/selftests/bpf/prog_tests/bpf_cookie.c        |  78 +++++++++++
 tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c | 286 ++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/uprobe_multi.c           |  72 +++++++++++
 tools/testing/selftests/bpf/progs/uprobe_multi_usdt.c      |  16 +++
 tools/testing/selftests/bpf/uprobe_multi.c                 |  53 ++++++++
 tools/testing/selftests/bpf/usdt_multi.c                   |  23 ++++
 19 files changed, 1634 insertions(+), 116 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_multi_usdt.c
 create mode 100644 tools/testing/selftests/bpf/uprobe_multi.c
 create mode 100644 tools/testing/selftests/bpf/usdt_multi.c

Comments

Andrii Nakryiko April 26, 2023, 7:09 p.m. UTC | #1
On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> hi,
> this patchset is adding support to attach multiple uprobes and usdt probes
> through new uprobe_multi link.
>
> The current uprobe is attached through the perf event and attaching many
> uprobes takes a lot of time because of that.
>
> The main reason is that we need to install perf event for each probed function
> and profile shows perf event installation (perf_install_in_context) as culprit.
>
> The new uprobe_multi link just creates raw uprobes and attaches the bpf
> program to them without perf event being involved.
>
> In addition to being faster we also save file descriptors. For the current
> uprobe attach we use extra perf event fd for each probed function. The new
> link just need one fd that covers all the functions we are attaching to.

All of the above are good reasons and thanks for tackling multi-uprobe!

>
> By dropping perf we lose the ability to attach uprobe to specific pid.
> We can workaround that by having pid check directly in the bpf program,
> but we might need to check for another solution if that will turn out
> to be a problem.
>

I think this is a big deal, because it makes multi-uprobe not a
drop-in replacement for normal uprobes even for typical scenarios. It
might be why you couldn't do transparent use of uprobe.multi in USDT?

But I'm not sure why this is a problem? How does perf handle this?
Does it do runtime filtering, or something more efficient that prevents
the uprobe from being triggered for other PIDs in the first place? If
it's the former, then why can't we do the same simple check ourselves
when a pid filter is specified?

I also see that uprobe_consumer has filter callback, not sure if it's
a better solution just for pid filtering, but might be another way to
do this?


Another aspect I wanted to discuss (and I don't know the right answer)
was whether we need to support separate binary path for each offset?
It would simplify (and trim down memory usage significantly) a bunch
of internals if we knew we are dealing with single inode for each
multi-uprobe link. I'm trying to think if it would be limiting in
practice to have to create link per each binary, and so far it seems
like usually user-space code will do symbol resolution per ELF file
anyways, so doesn't seem limiting to have single path + multiple
offsets/cookies within that file. For USDTs use case even ref_ctr is
probably the same, but I'd keep it 1:1 with offset and cookie anyways.
For uniformity and generality.

WDYT?

> I'm sending this as RFC because of:
>   - I added/exported some new elf_* helper functions in libbpf,
>     and I'm not sure that's the best/right way of doing this

didn't get to that yet, sounds suspicious :)

>   - I'm not completely sure about the usdt integration in bpf_program__attach_usdt,
>     I was trying to detect uprobe_multi kernel support first, but ended up with
>     just new field for struct bpf_usdt_opts

haven't gotten to this yet as well, but it has to be auto-detectable,
not an option (at least I don't see why it wouldn't be, but let me get
to the patch)

Jiri Olsa April 27, 2023, 12:44 p.m. UTC | #2
On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > By dropping perf we lose the ability to attach uprobe to specific pid.
> > We can workaround that by having pid check directly in the bpf program,
> > but we might need to check for another solution if that will turn out
> > to be a problem.
> >
> 
> I think this is a big deal, because it makes multi-uprobe not a
> drop-in replacement for normal uprobes even for typical scenarios. It
> might be why you couldn't do transparent use of uprobe.multi in USDT?

yes

> 
> But I'm not sure why this is a problem? How does perf handle this?
> Does it do runtime filtering or something more efficient that prevents
> uprobe to be triggered for other PIDs in the first place? If it's the
> former, then why can't we do the same simple check ourselves if pid
> filter is specified?

so the standard uprobe is basically a perf event and as such it can be
created with 'pid' as a target.. and such perf event will get installed
only when the process with that pid is scheduled in and uninstalled
when it's scheduled out

> 
> I also see that uprobe_consumer has filter callback, not sure if it's
> a better solution just for pid filtering, but might be another way to
> do this?

yes, that's probably how we will have to do that, will check

> 
> Another aspect I wanted to discuss (and I don't know the right answer)
> was whether we need to support separate binary path for each offset?
> It would simplify (and trim down memory usage significantly) a bunch
> of internals if we knew we are dealing with single inode for each
> multi-uprobe link. I'm trying to think if it would be limiting in
> practice to have to create link per each binary, and so far it seems
> like usually user-space code will do symbol resolution per ELF file
> anyways, so doesn't seem limiting to have single path + multiple
> offsets/cookies within that file. For USDTs use case even ref_ctr is
> probably the same, but I'd keep it 1:1 with offset and cookie anyways.
> For uniformity and generality.
> 
> WDYT?

right, it's a waste for a single binary, but I guess it's not a big waste,
because when you have a single binary you just repeat the same pointer,
not the path

it's fast enough to be called multiple times, once for each binary you
want to trace, but it'd also be nice to be able to attach everything at once ;-)

maybe we could have a bit in flags saying paths[0] is valid for all


thanks,
jirka
Andrii Nakryiko April 27, 2023, 10:24 p.m. UTC | #3
On Thu, Apr 27, 2023 at 5:44 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> > On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > By dropping perf we lose the ability to attach uprobe to specific pid.
> > > We can workaround that by having pid check directly in the bpf program,
> > > but we might need to check for another solution if that will turn out
> > > to be a problem.
> > >
> >
> > I think this is a big deal, because it makes multi-uprobe not a
> > drop-in replacement for normal uprobes even for typical scenarios. It
> > might be why you couldn't do transparent use of uprobe.multi in USDT?
>
> yes
>
> >
> > But I'm not sure why this is a problem? How does perf handle this?
> > Does it do runtime filtering or something more efficient that prevents
> > uprobe to be triggered for other PIDs in the first place? If it's the
> > former, then why can't we do the same simple check ourselves if pid
> > filter is specified?
>
> so the standard uprobe is basically a perf event and as such it can be
> created with 'pid' as a target.. and such perf event will get installed
> only when the process with that pid is scheduled in and uninstalled
> when it's scheduled out
>
> >
> > I also see that uprobe_consumer has filter callback, not sure if it's
> > a better solution just for pid filtering, but might be another way to
> > do this?
>
> yes, that's probably how we will have to do that, will check

callback seems like overkill as we'll be paying indirect call price.
So a simple if statement in either uprobe_prog_run or in
uprobe_multi_link_ret_handler/uprobe_multi_link_handler seems like
better solution, IMO.


>
> >
> > Another aspect I wanted to discuss (and I don't know the right answer)
> > was whether we need to support separate binary path for each offset?
> > It would simplify (and trim down memory usage significantly) a bunch
> > of internals if we knew we are dealing with single inode for each
> > multi-uprobe link. I'm trying to think if it would be limiting in
> > practice to have to create link per each binary, and so far it seems
> > like usually user-space code will do symbol resolution per ELF file
> > anyways, so doesn't seem limiting to have single path + multiple
> > offsets/cookies within that file. For USDTs use case even ref_ctr is
> > probably the same, but I'd keep it 1:1 with offset and cookie anyways.
> > For uniformity and generality.
> >
> > WDYT?
>
> right, it's waste for single binary, but I guess it's not a big waste,
> because when you have single binary you just repeat the same pointer,
> not the path
>
> it's fast enough to be called multiple times for each binary you want
> to trace, but it'd be also nice to be able to attach all in once ;-)
>
> maybe we could have a bit in flags saying paths[0] is valid for all

No need for extra flags. I was just thinking about having a simpler
and more straightforward API, where you don't need to create another
array with tons of duplicated string pointers. No big deal, I'm fine
either way.

Jiri Olsa April 28, 2023, 10:55 a.m. UTC | #4
On Thu, Apr 27, 2023 at 03:24:25PM -0700, Andrii Nakryiko wrote:
> On Thu, Apr 27, 2023 at 5:44 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> > > On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > By dropping perf we lose the ability to attach uprobe to specific pid.
> > > > We can workaround that by having pid check directly in the bpf program,
> > > > but we might need to check for another solution if that will turn out
> > > > to be a problem.
> > > >
> > >
> > > I think this is a big deal, because it makes multi-uprobe not a
> > > drop-in replacement for normal uprobes even for typical scenarios. It
> > > might be why you couldn't do transparent use of uprobe.multi in USDT?
> >
> > yes
> >
> > >
> > > But I'm not sure why this is a problem? How does perf handle this?
> > > Does it do runtime filtering or something more efficient that prevents
> > > uprobe to be triggered for other PIDs in the first place? If it's the
> > > former, then why can't we do the same simple check ourselves if pid
> > > filter is specified?
> >
> > so the standard uprobe is basically a perf event and as such it can be
> > created with 'pid' as a target.. and such perf event will get installed
> > only when the process with that pid is scheduled in and uninstalled
> > when it's scheduled out
> >
> > >
> > > I also see that uprobe_consumer has filter callback, not sure if it's
> > > a better solution just for pid filtering, but might be another way to
> > > do this?
> >
> > yes, that's probably how we will have to do that, will check
> 
> callback seems like overkill as we'll be paying indirect call price.
> So a simple if statement in either uprobe_prog_run or in
> uprobe_multi_link_ret_handler/uprobe_multi_link_handler seems like
> better solution, IMO.

it looks like the consumer->filter is checked/executed before installing
the breakpoint for the uprobe, so it could actually be faster than the
current uprobe pid filter.. I'll check and have it there in the next version

> 
> 
> >
> > >
> > > Another aspect I wanted to discuss (and I don't know the right answer)
> > > was whether we need to support separate binary path for each offset?
> > > It would simplify (and trim down memory usage significantly) a bunch
> > > of internals if we knew we are dealing with single inode for each
> > > multi-uprobe link. I'm trying to think if it would be limiting in
> > > practice to have to create link per each binary, and so far it seems
> > > like usually user-space code will do symbol resolution per ELF file
> > > anyways, so doesn't seem limiting to have single path + multiple
> > > offsets/cookies within that file. For USDTs use case even ref_ctr is
> > > probably the same, but I'd keep it 1:1 with offset and cookie anyways.
> > > For uniformity and generality.
> > >
> > > WDYT?
> >
> > right, it's waste for single binary, but I guess it's not a big waste,
> > because when you have single binary you just repeat the same pointer,
> > not the path
> >
> > it's fast enough to be called multiple times for each binary you want
> > to trace, but it'd be also nice to be able to attach all in once ;-)
> >
> > maybe we could have a bit in flags saying paths[0] is valid for all
> 
> No need for extra flags. I was just thinking about having a simpler
> and more straightforward API, where you don't need to create another
> array with tons of duplicated string pointers. No big deal, I'm fine
> either way.

ok

thanks,
jirka
Andrii Nakryiko April 28, 2023, 9:19 p.m. UTC | #5
On Fri, Apr 28, 2023 at 3:55 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 03:24:25PM -0700, Andrii Nakryiko wrote:
> > On Thu, Apr 27, 2023 at 5:44 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> > > >
> > > > I also see that uprobe_consumer has filter callback, not sure if it's
> > > > a better solution just for pid filtering, but might be another way to
> > > > do this?
> > >
> > > yes, that's probably how we will have to do that, will check
> >
> > callback seems like overkill as we'll be paying indirect call price.
> > So a simple if statement in either uprobe_prog_run or in
> > uprobe_multi_link_ret_handler/uprobe_multi_link_handler seems like
> > better solution, IMO.
>
> it looks like the consumer->filter is checked/executed before installing
> the breakpoint for uprobe, so it could be actually faster than current
> uprobe pid filter.. I'll check and have it there in next version

ah, so if it's not executed on each uprobe run, then yeah, that would be best
