mbox series

[0/2] Objpool performance improvements

Message ID 20240424215214.3956041-1-andrii@kernel.org (mailing list archive)
Headers show
Series Objpool performance improvements | expand

Message

Andrii Nakryiko April 24, 2024, 9:52 p.m. UTC
Improve objpool (used heavily in kretprobe hot path) performance with two
improvements:
  - inlining performance critical objpool_push()/objpool_pop() operations;
  - avoiding re-calculating relatively expensive nr_possible_cpus().

These opportunities were found when benchmarking and profiling kprobes and
kretprobes with BPF-based benchmarks. See individual patches for details and
results.

Andrii Nakryiko (2):
  objpool: enable inlining objpool_push() and objpool_pop() operations
  objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids

 include/linux/objpool.h | 105 +++++++++++++++++++++++++++++++++++--
 lib/objpool.c           | 112 +++-------------------------------------
 2 files changed, 107 insertions(+), 110 deletions(-)

Comments

Masami Hiramatsu (Google) April 26, 2024, 2:25 p.m. UTC | #1
Hi Andrii,

On Wed, 24 Apr 2024 14:52:12 -0700
Andrii Nakryiko <andrii@kernel.org> wrote:

> Improve objpool (used heavily in kretprobe hot path) performance with two
> improvements:
>   - inlining performance critical objpool_push()/objpool_pop() operations;
>   - avoiding re-calculating relatively expensive nr_possible_cpus().

Thanks for optimizing objpool. Both looks good to me.

BTW, I don't intend to stop this short-term optimization attempts,
but I would like to ask you check the new fgraph based fprobe 
(kretprobe-multi)[1] instead of objpool/rethook.

[1] https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/

I'm considering to obsolete the kretprobe (and rethook) with fprobe
and eventually remove it. Those have similar feature and we should
choose safer one.

Thank you,

> 
> These opportunities were found when benchmarking and profiling kprobes and
> kretprobes with BPF-based benchmarks. See individual patches for details and
> results.
> 
> Andrii Nakryiko (2):
>   objpool: enable inlining objpool_push() and objpool_pop() operations
>   objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
> 
>  include/linux/objpool.h | 105 +++++++++++++++++++++++++++++++++++--
>  lib/objpool.c           | 112 +++-------------------------------------
>  2 files changed, 107 insertions(+), 110 deletions(-)
> 
> -- 
> 2.43.0
>
Andrii Nakryiko April 26, 2024, 4:05 p.m. UTC | #2
On Fri, Apr 26, 2024 at 7:25 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Andrii,
>
> On Wed, 24 Apr 2024 14:52:12 -0700
> Andrii Nakryiko <andrii@kernel.org> wrote:
>
> > Improve objpool (used heavily in kretprobe hot path) performance with two
> > improvements:
> >   - inlining performance critical objpool_push()/objpool_pop() operations;
> >   - avoiding re-calculating relatively expensive nr_possible_cpus().
>
> Thanks for optimizing objpool. Both looks good to me.

Great, thanks for applying.

>
> BTW, I don't intend to stop this short-term optimization attempts,
> but I would like to ask you check the new fgraph based fprobe
> (kretprobe-multi)[1] instead of objpool/rethook.

You can see that I did :) There is tons of code and I'm not familiar
with internals of function_graph infra, but you can see I did run
benchmarks, so I'm paying attention.

But as for the objpool itself, I think it's a performant and useful
internal building block, and we might use it outside of rethook as
well, so I think making it as fast as possible is good regardless.

>
> [1] https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/
>
> I'm considering to obsolete the kretprobe (and rethook) with fprobe
> and eventually remove it. Those have similar feature and we should
> choose safer one.
>

Yep, I had a few more semi-ready patches, but I'll hold off for now
given this move to function graph, plus some of the changes that Jiri
is making in multi-kprobe code. I'll wait for things to settle down a
bit before looking again.

But just to give you some context, I'm an author of retsnoop tool, and
one of the killer features of it is LBR capture in kretprobes, which
is a tremendous help in investigating kernel failures, especially in
unfamiliar code (LBR allows to "look back" and figure out "how did we
get to this condition" after the fact). And so it's important to
minimize the amount of wasted LBR records between some kernel function
returns error (and thus is "an interesting event" and BPF program that
captures LBR is triggered). Big part of that is ftrace/fprobe/rethook
infra, so I was looking into making that part as "minimal" as
possible, in the sense of eliminating as many function calls and jump
as possible. This has an added benefit of making this hot path faster,
but my main motivation is LBR.

Anyways, just a bit of context for some of the other patches (like
inlining arch_rethook_trampoline_callback).

> Thank you,
>
> >
> > These opportunities were found when benchmarking and profiling kprobes and
> > kretprobes with BPF-based benchmarks. See individual patches for details and
> > results.
> >
> > Andrii Nakryiko (2):
> >   objpool: enable inlining objpool_push() and objpool_pop() operations
> >   objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
> >
> >  include/linux/objpool.h | 105 +++++++++++++++++++++++++++++++++++--
> >  lib/objpool.c           | 112 +++-------------------------------------
> >  2 files changed, 107 insertions(+), 110 deletions(-)
> >
> > --
> > 2.43.0
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>