
[v2,00/17] Support dynamic opening of capstone/llvm remove BUILD_NONDISTRO

Message ID 20250122062332.577009-1-irogers@google.com (mailing list archive)

Message

Ian Rogers Jan. 22, 2025, 6:23 a.m. UTC
Linking against libcapstone and libLLVM can be a significant increase
in dependencies and size of memory footprint. For something like `perf
record` the disassembler and addr2line functionality won't be
used. Support dynamically loading these libraries using dlopen and
then calling the appropriate functions found using dlsym.
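
A minimal sketch of the dlopen/dlsym pattern described above. This is not code from the series: `dyn_cos` is a hypothetical helper, and `libm.so.6` with its `cos` symbol is only a stand-in for libcapstone/libLLVM, assuming a glibc-style system (older glibc versions may also need `-ldl` at link time).

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: resolve cos() from libm at run time instead of
 * linking against it. Returns 0 and stores the result in *out on success,
 * or -1 if the library or the symbol cannot be found.
 */
int dyn_cos(double x, double *out)
{
	static void *handle;		/* cached across calls */
	static double (*fn)(double);	/* cached symbol address */

	if (!fn) {
		if (!handle)
			handle = dlopen("libm.so.6", RTLD_LAZY);
		if (!handle)
			return -1;	/* library not installed */
		/* POSIX-recommended idiom for storing a dlsym() result. */
		*(void **)&fn = dlsym(handle, "cos");
		if (!fn)
			return -1;	/* symbol missing */
	}
	*out = fn(x);
	return 0;
}
```

The cost of the library is only paid on the first call, and a missing library becomes a run-time error the caller can report rather than a build-time decision.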

BUILD_NONDISTRO is used to build perf against the license incompatible
libbfd and libiberty libraries. As this has been opt-in for nearly 2
years, commit dd317df07207 ("perf build: Make binutil libraries opt
in"), remove the code to simplify the code base.

The patch series:
1) does some initial clean up;
2) moves the capstone and LLVM code to their own C files;
3) simplifies the capstone code a little;
4) adds perf_ variants of the functions that will either directly call
   the function or use dlsym to discover it;
5) adds BPF JIT disassembly support to LLVM and capstone disassembly;
6) removes the BUILD_NONDISTRO code, reduces scope and removes what's possible.

The addr2line LLVM functionality is written in C++. To avoid linking
against libLLVM for this, a new LIBLLVM_DYNAMIC option is added where
the C++ code with the libLLVM dependency will be built into a
libperf-llvm.so and that dlsym-ed and called against. Ideally LLVM
would extend their C API to avoid this.

The libbfd BPF disassembly supported source lines; this wasn't ported
to the capstone and LLVM disassembly.

v2: Add mangling of the function names in libperf-llvm.so to avoid
    potential infinite recursion. Add BPF JIT disassembly support to
    LLVM and capstone. Add/rebase the BUILD_NONDISTRO cleanup onto the
    series from:
    https://lore.kernel.org/lkml/20250111202851.1075338-1-irogers@google.com/
    Some other minor additional clean up.

Ian Rogers (17):
  perf build: Remove libtracefs configuration
  perf map: Constify objdump offset/address conversion APIs
  perf capstone: Move capstone functionality into its own file
  perf llvm: Move llvm functionality into its own file
  perf capstone: Remove open_capstone_handle
  perf capstone: Support for dlopen-ing libcapstone.so
  perf llvm: Support for dlopen-ing libLLVM.so
  perf llvm: Mangle libperf-llvm.so function names
  perf dso: Move read_symbol from llvm/capstone to dso
  perf dso: Support BPF programs in dso__read_symbol
  perf llvm: Disassemble cleanup
  perf dso: Clean up read_symbol error handling
  perf build: Remove libbfd support
  perf build: Remove libiberty support
  perf build: Remove unused defines
  perf disasm: Remove disasm_bpf
  perf disasm: Make ins__scnprintf and ins__is_nop static

 tools/perf/Documentation/perf-check.txt |   1 -
 tools/perf/Makefile.config              |  90 +---
 tools/perf/Makefile.perf                |  35 +-
 tools/perf/builtin-check.c              |   1 -
 tools/perf/builtin-script.c             |   2 -
 tools/perf/tests/Build                  |   1 -
 tools/perf/tests/builtin-test.c         |   1 -
 tools/perf/tests/make                   |   4 +-
 tools/perf/tests/pe-file-parsing.c      | 101 ----
 tools/perf/tests/tests.h                |   1 -
 tools/perf/util/Build                   |   5 +-
 tools/perf/util/annotate.h              |   1 -
 tools/perf/util/capstone.c              | 682 ++++++++++++++++++++++++
 tools/perf/util/capstone.h              |  24 +
 tools/perf/util/demangle-cxx.cpp        |  22 +-
 tools/perf/util/disasm.c                | 632 +---------------------
 tools/perf/util/disasm.h                |   5 +-
 tools/perf/util/disasm_bpf.c            | 195 -------
 tools/perf/util/disasm_bpf.h            |  12 -
 tools/perf/util/dso.c                   |  98 ++++
 tools/perf/util/dso.h                   |   4 +
 tools/perf/util/llvm-c-helpers.cpp      | 120 ++++-
 tools/perf/util/llvm-c-helpers.h        |  24 +-
 tools/perf/util/llvm.c                  | 489 +++++++++++++++++
 tools/perf/util/llvm.h                  |  24 +
 tools/perf/util/map.c                   |  19 +-
 tools/perf/util/map.h                   |   6 +-
 tools/perf/util/print_insn.c            | 117 +---
 tools/perf/util/srcline.c               | 306 +----------
 tools/perf/util/srcline.h               |   6 +
 tools/perf/util/symbol-elf.c            |  95 ----
 tools/perf/util/symbol.c                | 135 -----
 tools/perf/util/symbol.h                |   4 -
 33 files changed, 1552 insertions(+), 1710 deletions(-)
 delete mode 100644 tools/perf/tests/pe-file-parsing.c
 create mode 100644 tools/perf/util/capstone.c
 create mode 100644 tools/perf/util/capstone.h
 delete mode 100644 tools/perf/util/disasm_bpf.c
 delete mode 100644 tools/perf/util/disasm_bpf.h
 create mode 100644 tools/perf/util/llvm.c
 create mode 100644 tools/perf/util/llvm.h

Comments

Andi Kleen Jan. 22, 2025, 3:20 p.m. UTC | #1
On Tue, Jan 21, 2025 at 10:23:15PM -0800, Ian Rogers wrote:
> Linking against libcapstone and libLLVM can be a significant increase
> in dependencies and size of memory footprint. For something like `perf
> record` the disassembler and addr2line functionality won't be
> used. Support dynamically loading these libraries using dlopen and
> then calling the appropriate functions found using dlsym.

It's unclear to me what this actually fixes. If the code is not used
it should not be faulted in and the dynamic linker is lazy too, so 
if it's not used, it won't even be linked. 

I don't see any numbers, but it would surprise me if it improved
actual run time or memory usage significantly.

-Andi
Ian Rogers Jan. 22, 2025, 4:11 p.m. UTC | #2
On Wed, Jan 22, 2025 at 7:21 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Tue, Jan 21, 2025 at 10:23:15PM -0800, Ian Rogers wrote:
> > Linking against libcapstone and libLLVM can be a significant increase
> > in dependencies and size of memory footprint. For something like `perf
> > record` the disassembler and addr2line functionality won't be
> > used. Support dynamically loading these libraries using dlopen and
> > then calling the appropriate functions found using dlsym.
>
> It's unclear to me what this actually fixes. If the code is not used
> it should not be faulted in and the dynamic linker is lazy too, so
> if it's not used, it won't even be linked.
>
> I don't see any numbers, but it would surprise me if it improved
> actual run time or memory usage significantly.

In certain scenarios, like data centers, it can be useful to
statically link all your dependencies to avoid dll hell. The X86
disassembler alone in libllvm is of a size comparable to the perf tool
- I think this speaks to us doing a reasonably good job of size
optimization of the events/metrics in the perf tool. We want these
dependencies because they are faster than forking objdump and addr2line,
but we don't want them baked in unless the person doing the build wants
that; linking them in is still the default if the libraries are detected
by Makefile.config. Using dlopen also means distributions can ship a
perf tool that doesn't drag in libLLVM.so and a universe of
dependencies, but that gets the performance advantages when the library
is installed. In data centers, fast disassembly/addr2line is less of a
priority than the binary size cost replicated over 10,000s of machines,
because those machines don't tend to be running the annotate/report
commands.

Fwiw, Namhyung's uftrace is doing something similar for python:
https://github.com/namhyung/uftrace/blob/master/utils/script-python.c#L139
and I wish the perf tool were also doing this. I think it is much
nicer to have the tool fail at runtime because of a missing dependency
which you can then install should you want it, rather than doing an
equivalent within the code base with #ifdefs and needing users to
recompile. This patch series significantly reduces the #ifdefs in
places like the core disasm code.

Thanks,
Ian
Andi Kleen Jan. 23, 2025, 6:19 p.m. UTC | #3
> In certain scenarios, like data centers, it can be useful to
> statically link all your dependencies to avoid dll hell.

Yes but it won't be loaded into memory if not used. Executable
loading is all lazy. Maybe look at a page fault trace for loading
perf if you don't believe me.

So you're trying to optimize disk space here?

I didn't see that in the cover letter.

It doesn't seem like a very good reason for such an intrusive patch kit.

If it's a serious concern maybe investigate an executable compressor?

> The X86
> disassembler alone in libllvm is of a size comparable to the perf tool

I agree that LLVM is a serious bloat and DLL hell concern, but I don't think 
dlopen is the answer here.

-Andi
Ian Rogers Jan. 23, 2025, 9:24 p.m. UTC | #4
On Thu, Jan 23, 2025 at 10:19 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > In certain scenarios, like data centers, it can be useful to
> > statically link all your dependencies to avoid dll hell.
>
> Yes but it won't be loaded into memory if not used. Executable
> loading is all lazy. Maybe look at a page fault trace for loading
> perf if you don't believe me.
>
> So you're trying to optimize disk space here?
>
> I didn't see that in the cover letter.

For me yes, for distributions it is dependencies. This is already in
the v3 message:
https://lore.kernel.org/lkml/20250122174308.350350-1-irogers@google.com/

> It doesn't seem like a very good reason for such an intrusive patch kit.

The capstone and LLVM code is preexisting. Moving the capstone/llvm
code to their own files isn't dependent on dlopen, but it does make it
nicer to have a single place where we're doing dlopen. The change to
shim the capstone/LLVM calls looks like this:
https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/util/llvm.c#L160-L182
That is, a shim is introduced that either calls through to the function
if we're linking against libcapstone/llvm or looks it up with dlsym.
There are 7 such functions in the LLVM code. I don't think shimming 7
functions is at the scale of hugely intrusive.
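
For readers who don't follow the link, a rough sketch of what such a shim can look like. This is an illustration only, not the code at that URL: `shim_atoi` and the `HAVE_DIRECT_LINK` macro are hypothetical names, and libc's `atoi` stands in for a capstone/LLVM entry point so the example builds without either library installed.

```c
#include <dlfcn.h>
#include <stdlib.h>

/*
 * Hypothetical shim. With HAVE_DIRECT_LINK defined (an assumed build-time
 * macro) it calls the library function directly; otherwise it resolves
 * the same function once via dlopen/dlsym and caches the pointer.
 * Returns 0 on success, -1 if the library or symbol is unavailable.
 */
int shim_atoi(const char *s, int *out)
{
#ifdef HAVE_DIRECT_LINK
	*out = atoi(s);
	return 0;
#else
	static void *handle;
	static int (*fn)(const char *);

	if (!fn) {
		if (!handle)
			handle = dlopen("libc.so.6", RTLD_LAZY);
		if (!handle)
			return -1;
		*(void **)&fn = dlsym(handle, "atoi");
		if (!fn)
			return -1;
	}
	*out = fn(s);
	return 0;
#endif
}
```

Callers only ever see the shim, so the #ifdef lives in one place instead of at every call site.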

> If it's a serious concern maybe investigate an executable compressor?

Perhaps just have a squashfs partition.

Fwiw, excluding dependencies I think compression on the events is a
good solution. Convert json events/metrics to a sysfs file with the
cpuid in the path, add the compressed file to the binary as data, find
"json" events by iterating the directories in the compressed file,
etc. A single filesystem approach to event lookup can mean we do some
kind of unionfs style lookup of events, which could support users
adding their own events/metrics in a directory. Zip doesn't support
compressing across files, which is something of a requirement here,
other formats do but it's a case of optimizing for some kind of
libarchive sweet spot. The opportunity here is that about 70% of the
binary is event encodings, a compressed file is about 30% of the
current binary size, so we could reduce the binary size by about 40%.

> > The X86
> > disassembler alone in libllvm is of a size comparable to the perf tool
>
> I agree that LLVM is a serious bloat and DLL hell concern, but I don't think
> dlopen is the answer here.

Agreed, but that's where the code is at: either fork the addr2line
command or use LLVM for some performance. I think having an inbuilt
solution would be best
longer term - we spend energy trying to parse and understand text
output from tools/libraries when the information is just sitting there
in the instruction encoding. Such a solution would be brittle for
things like new dwarf information, so we may want to have fallbacks
like LLVM but having a loosely coupled dependency using dlopen feels
preferable there, to aid package maintainers.

Thanks,
Ian
Namhyung Kim Jan. 23, 2025, 9:59 p.m. UTC | #5
On Tue, Jan 21, 2025 at 10:23:15PM -0800, Ian Rogers wrote:
> Linking against libcapstone and libLLVM can be a significant increase
> in dependencies and size of memory footprint. For something like `perf
> record` the disassembler and addr2line functionality won't be
> used. Support dynamically loading these libraries using dlopen and
> then calling the appropriate functions found using dlsym.

It's not clear from the description how you would use dlopen/dlsym.
Based on an offline discussion, you want to leave the current linking
model as is, and to support dlopen/dlsym when it's NOT detected at
build-time, right?

For that, you need to carry some definitions of the functions and types
for the used APIs.  But I'm not sure if it's right to carry them in the
perf code base.

> 
> BUILD_NONDISTRO is used to build perf against the license incompatible
> libbfd and libiberty libraries. As this has been opt-in for nearly 2
> years, commit dd317df07207 ("perf build: Make binutil libraries opt
> in"), remove the code to simplify the code base.

This part can be a separate series.

> 
> The patch series:
> 1) does some initial clean up;
> 2) moves the capstone and LLVM code to their own C files;
> 3) simplifies the capstone code a little;

I like changes up to this in general.  Let me take a look at the
patches.

Thanks,
Namhyung


Ian Rogers Jan. 23, 2025, 11:36 p.m. UTC | #6
On Thu, Jan 23, 2025 at 1:59 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jan 21, 2025 at 10:23:15PM -0800, Ian Rogers wrote:
> > Linking against libcapstone and libLLVM can be a significant increase
> > in dependencies and size of memory footprint. For something like `perf
> > record` the disassembler and addr2line functionality won't be
> > used. Support dynamically loading these libraries using dlopen and
> > then calling the appropriate functions found using dlsym.
>
> It's not clear from the description how you would use dlopen/dlsym.
> Based on an offline discussion, you want to leave the current linking
> model as is, and to support dlopen/dlsym when it's NOT detected at
> build-time, right?

Yep. The current behavior is that with no header file these options
fail; the new behavior is that we try to use dlopen/dlsym and fail only
if the dlopen fails.

> For that, you need to carry some definitions of the functions and types
> for the used APIs.  But I'm not sure if it's right to carry them in the
> perf code base.

Right. I mention that here:
https://lore.kernel.org/lkml/CAP-5=fUhNuybCU-2_5EgcCwgwXnxvyFMvyhzKe=ZP1bssQwXHw@mail.gmail.com/
For LLVM we need 3 typedefs and 5 #defines, for capstone we need 2
structs and 5 enums (if we #ifdef some x86 only formatting code).
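
To illustrate what "carrying a definition" means here, a small sketch that uses zlib as a stand-in (these are not the LLVM/capstone definitions from the series). The typedef mirrors the `zlibVersion` prototype from <zlib.h>, so the code builds without that header; `dyn_zlib_version` and the `libz.so.1` soname are assumptions for the example.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Carried declaration: mirrors `const char *zlibVersion(void)` in <zlib.h>. */
typedef const char *(*zlib_version_fn)(void);

/*
 * Hypothetical helper: returns the zlib version string, or NULL when
 * libz.so.1 or the symbol is not available at run time. The handle is
 * deliberately kept open on success so the returned string stays valid.
 */
const char *dyn_zlib_version(void)
{
	void *handle = dlopen("libz.so.1", RTLD_LAZY);
	zlib_version_fn fn;

	if (!handle)
		return NULL;
	*(void **)&fn = dlsym(handle, "zlibVersion");
	if (!fn) {
		dlclose(handle);
		return NULL;
	}
	return fn();
}
```

The trade-off is that the carried declaration has to stay in sync with the library's real ABI, which is the concern raised above.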

The problems with not carrying those definitions are:
1) if the header file isn't present a build won't support
LLVM/capstone even by dlopen - everything falls through to
objdump/addr2line that have known performance issues;
2) package maintainers either need to spot a warning message to
realize they've done this by having a missing header file (hard to
spot and brittle) or we require the build to fail and people without
capstone.h opt out of the build error with NO_CAPSTONE=1 - something
perf developers will probably not like;
3) the LLVM/capstone code needs #ifdefs and __maybe_unused to suppress
compiler warnings, or perhaps we have a minimal version of those
files, leading to extra code complexity.

I believe the approach here is no worse than what we do with vmlinux.h
for BPF code and is as robust as depending on dlsym being able to look
up the function names. It is not perfect, but I think it is better and
less complex than the alternative.

> >
> > BUILD_NONDISTRO is used to build perf against the license incompatible
> > libbfd and libiberty libraries. As this has been opt-in for nearly 2
> > years, commit dd317df07207 ("perf build: Make binutil libraries opt
> > in"), remove the code to simplify the code base.
>
> This part can be a separate series.

Right, I posted it as a series here:
https://lore.kernel.org/lkml/20250111202851.1075338-1-irogers@google.com/
as mentioned in the v2 notes below. The issue was that Arnaldo pointed
out removing BUILD_NONDISTRO removed disassemble_bpf that had only
been implemented for libbfd. This series adds the LLVM/capstone
definitions built into the symbol__disassemble refactor. Merging that
series would conflict with this series, so I posted everything
together to avoid having series of patches depending upon one another.
I also wanted to check that what is in disasm.c in the end is
reasonable, which I believe it is with significantly reduced
ifdef-ery.

> >
> > The patch series:
> > 1) does some initial clean up;
> > 2) moves the capstone and LLVM code to their own C files;
> > 3) simplifies the capstone code a little;
>
> I like changes up to this in general.  Let me take a look at the
> patches.

Thanks,
Ian

> Thanks,
> Namhyung