diff mbox series

[v4,bpf-next,3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it

Message ID 20201110011932.3201430-4-andrii@kernel.org (mailing list archive)
State Accepted
Delegated to: BPF
Headers show
Series Integrate kernel module BTF support | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 4 this patch: 4
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch fail Link
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Andrii Nakryiko Nov. 10, 2020, 1:19 a.m. UTC
Detect if pahole supports split BTF generation, and generate BTF for each
selected kernel module, if it does. This is exposed to Makefiles and C code as
CONFIG_DEBUG_INFO_BTF_MODULES flag.

Kernel module BTF has to be re-generated if either vmlinux's BTF changes or
module's .ko changes. To achieve that, I needed a helper similar to
if_changed, but that would allow to filter out vmlinux from the list of
updated dependencies for .ko building. I've put it next to the only place that
uses and needs it, but it might be a better idea to just add it along the
other if_changed variants into scripts/Kbuild.include.

Each kernel module's BTF deduplication is pretty fast, as it does only
incremental BTF deduplication on top of already deduplicated vmlinux BTF. To
show the added build time, I've first ran make only just built kernel (to
establish the baseline) and then forced only BTF re-generation, without
regenerating .ko files. The build was performed with -j60 parallelization on
56-core machine. The final time also includes bzImage building, so it's not
a pure BTF overhead.

$ time make -j60
...
make -j60  27.65s user 10.96s system 782% cpu 4.933 total
$ touch ~/linux-build/default/vmlinux && time make -j60
...
make -j60  123.69s user 27.85s system 1566% cpu 9.675 total

So 4.6 seconds real time, with noticeable part spent in compressed vmlinux and
bzImage building.

To show size savings, I've built my kernel configuration with about 700 kernel
modules with full BTF per each kernel module (without deduplicating against
vmlinux) and with split BTF against deduplicated vmlinux (approach in this
patch). Below are top 10 modules with biggest BTF sizes. And total size of BTF
data across all kernel modules.

It shows that split BTF "compresses" 115MB down to 5MB total. And the biggest
kernel modules get a downsize from 500-570KB down to 200-300KB.

FULL BTF
========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
115710691

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 570570
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 520240
./drivers/gpu/drm/radeon/radeon.ko 503849
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 491777
./fs/xfs/xfs.ko 411544
./drivers/net/ethernet/intel/i40e/i40e.ko 403904
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 398754
./drivers/infiniband/core/ib_core.ko 397224
./fs/cifs/cifs.ko 386249
./fs/nfsd/nfsd.ko 379738

SPLIT BTF
=========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
5194047

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 293206
./drivers/gpu/drm/radeon/radeon.ko 282103
./fs/xfs/xfs.ko 222150
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
./fs/cifs/cifs.ko 109379
./arch/x86/kvm/kvm.ko 100225
./drivers/gpu/drm/drm.ko 94827
./drivers/infiniband/core/ib_core.ko 91188

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 lib/Kconfig.debug         |  9 +++++++++
 scripts/Makefile.modfinal | 20 ++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

Comments

Song Liu Nov. 11, 2020, 1:04 a.m. UTC | #1
> On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

[...]

> SPLIT BTF
> =========
> 
> $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
> 5194047
> 
> $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> ./drivers/gpu/drm/i915/i915.ko 293206
> ./drivers/gpu/drm/radeon/radeon.ko 282103
> ./fs/xfs/xfs.ko 222150
> ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> ./fs/cifs/cifs.ko 109379
> ./arch/x86/kvm/kvm.ko 100225
> ./drivers/gpu/drm/drm.ko 94827
> ./drivers/infiniband/core/ib_core.ko 91188
> 
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>
Allan, Bruce W Nov. 16, 2020, 7:55 p.m. UTC | #2
> -----Original Message-----
> From: Song Liu <songliubraving@fb.com>
> Sent: Tuesday, November 10, 2020 5:05 PM
> To: Andrii Nakryiko <andrii@kernel.org>
> Cc: bpf <bpf@vger.kernel.org>; Networking <netdev@vger.kernel.org>;
> Starovoitov, Alexei <ast@fb.com>; Daniel Borkmann <daniel@iogearbox.net>;
> Kernel Team <Kernel-team@fb.com>; open list <linux-
> kernel@vger.kernel.org>; rafael@kernel.org; jeyu@kernel.org; Arnaldo
> Carvalho de Melo <acme@redhat.com>; Greg Kroah-Hartman
> <gregkh@linuxfoundation.org>; Masahiro Yamada
> <yamada.masahiro@socionext.com>
> Subject: Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is
> enabled and pahole supports it
> 
> 
> 
> > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> [...]
> 
> > SPLIT BTF
> > =========
> >
> > $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}';
> done | awk '{ s += $1 } END { print s }'
> > 5194047
> >
> > $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep
> BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> > ./drivers/gpu/drm/i915/i915.ko 293206
> > ./drivers/gpu/drm/radeon/radeon.ko 282103
> > ./fs/xfs/xfs.ko 222150
> > ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> > ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> > ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> > ./fs/cifs/cifs.ko 109379
> > ./arch/x86/kvm/kvm.ko 100225
> > ./drivers/gpu/drm/drm.ko 94827
> > ./drivers/infiniband/core/ib_core.ko 91188
> >
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> 
> Acked-by: Song Liu <songliubraving@fb.com>

This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
"Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."

Is that intentional?

Thanks,
Bruce.
Andrii Nakryiko Nov. 16, 2020, 8:34 p.m. UTC | #3
On Mon, Nov 16, 2020 at 11:55 AM Allan, Bruce W <bruce.w.allan@intel.com> wrote:
>
> > -----Original Message-----
> > From: Song Liu <songliubraving@fb.com>
> > Sent: Tuesday, November 10, 2020 5:05 PM
> > To: Andrii Nakryiko <andrii@kernel.org>
> > Cc: bpf <bpf@vger.kernel.org>; Networking <netdev@vger.kernel.org>;
> > Starovoitov, Alexei <ast@fb.com>; Daniel Borkmann <daniel@iogearbox.net>;
> > Kernel Team <Kernel-team@fb.com>; open list <linux-
> > kernel@vger.kernel.org>; rafael@kernel.org; jeyu@kernel.org; Arnaldo
> > Carvalho de Melo <acme@redhat.com>; Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org>; Masahiro Yamada
> > <yamada.masahiro@socionext.com>
> > Subject: Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is
> > enabled and pahole supports it
> >
> >
> >
> > > On Nov 9, 2020, at 5:19 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > [...]
> >
> > > SPLIT BTF
> > > =========
> > >
> > > $ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}';
> > done | awk '{ s += $1 } END { print s }'
> > > 5194047
> > >
> > > $ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep
> > BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
> > > ./drivers/gpu/drm/i915/i915.ko 293206
> > > ./drivers/gpu/drm/radeon/radeon.ko 282103
> > > ./fs/xfs/xfs.ko 222150
> > > ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
> > > ./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
> > > ./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
> > > ./fs/cifs/cifs.ko 109379
> > > ./arch/x86/kvm/kvm.ko 100225
> > > ./drivers/gpu/drm/drm.ko 94827
> > > ./drivers/infiniband/core/ib_core.ko 91188
> > >
> > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Acked-by: Song Liu <songliubraving@fb.com>
>
> This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
>
> Is that intentional?

I wasn't aware of such a use pattern, so definitely not intentional.
But vmlinux is absolutely necessary to generate the module BTF. So I'm
wondering what's the proper fix here? Leave it as is (that error
message is actually surprisingly descriptive, btw)? Force vmlinux
build? Or skip BTF generation for that module?

>
> Thanks,
> Bruce.
Jakub Kicinski Nov. 16, 2020, 9:24 p.m. UTC | #4
On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> >
> > Is that intentional?  
> 
> I wasn't aware of such a use pattern, so definitely not intentional.
> But vmlinux is absolutely necessary to generate the module BTF. So I'm
> wondering what's the proper fix here? Leave it as is (that error
> message is actually surprisingly descriptive, btw)? Force vmlinux
> build? Or skip BTF generation for that module?

For an external out-of-tree module there is no guarantee that vmlinux
will even be on the system, no? So only the last option can work in
that case.
Andrii Nakryiko Nov. 16, 2020, 10:35 p.m. UTC | #5
On Mon, Nov 16, 2020 at 1:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> > > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> > > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> > > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> > > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> > > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> > > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> > >
> > > Is that intentional?
> >
> > I wasn't aware of such a use pattern, so definitely not intentional.
> > But vmlinux is absolutely necessary to generate the module BTF. So I'm
> > wondering what's the proper fix here? Leave it as is (that error
> > message is actually surprisingly descriptive, btw)? Force vmlinux
> > build? Or skip BTF generation for that module?
>
> For an external out-of-tree module there is no guarantee that vmlinux
> will even be on the system, no? So only the last option can work in
> that case.


Ok, how about something like the patch below. With that I seem to be
getting the desired behavior:

$ make clean
$ touch drivers/acpi/button.c
$ make M=drivers/acpi
make[1]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
  CC [M]  drivers/acpi/button.o
  MODPOST drivers/acpi/Module.symvers
  LD [M]  drivers/acpi/button.ko
  BTF [M] drivers/acpi/button.ko
Skipping BTF generation for drivers/acpi/button.ko due to
unavailability of vmlinux
make[1]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
... empty ...

Now with normal build:

$ make all
...
LD [M]  drivers/acpi/button.ko
BTF [M] drivers/acpi/button.ko
...
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
  [60] .BTF              PROGBITS         0000000000000000  00029310
       000000000000ab3f  0000000000000000           0     0     1



diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 02b892421f7a..d49ec001825d 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -38,7 +38,12 @@ quiet_cmd_ld_ko_o = LD [M]  $@
     $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)

 quiet_cmd_btf_ko = BTF [M] $@
-      cmd_btf_ko = LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@
+      cmd_btf_ko =                             \
+    if [ -f vmlinux ]; then                        \
+        LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@; \
+    else                                \
+        printf "Skipping BTF generation for %s due to unavailability
of vmlinux\n" $@ 1>&2; \
+    fi;

 # Same as newer-prereqs, but allows to exclude specified extra dependencies
 newer_prereqs_except = $(filter-out $(PHONY) $(1),$?)
@@ -49,7 +54,7 @@ if_changed_except = $(if $(call
newer_prereqs_except,$(2))$(cmd-check),      \
     printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)

 # Re-generate module BTFs if either module's .ko or vmlinux changed
-$(modules): %.ko: %.o %.mod.o scripts/module.lds vmlinux FORCE
+$(modules): %.ko: %.o %.mod.o scripts/module.lds $(if
$(KBUILD_BUILTIN),vmlinux) FORCE
     +$(call if_changed_except,ld_ko_o,vmlinux)
 ifdef CONFIG_DEBUG_INFO_BTF_MODULES
     +$(if $(newer-prereqs),$(call cmd,btf_ko))
David Ahern Nov. 17, 2020, 3:44 a.m. UTC | #6
On 11/9/20 6:19 PM, Andrii Nakryiko wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d7a7bc3b6098..1e78faaf20a5 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -274,6 +274,15 @@ config DEBUG_INFO_BTF
>  	  Turning this on expects presence of pahole tool, which will convert
>  	  DWARF type info into equivalent deduplicated BTF type info.
>  
> +config PAHOLE_HAS_SPLIT_BTF
> +	def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
> +
> +config DEBUG_INFO_BTF_MODULES
> +	def_bool y
> +	depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
> +	help
> +	  Generate compact split BTF type information for kernel modules.
> +
>  config GDB_SCRIPTS
>  	bool "Provide GDB scripts for kernel debugging"
>  	help

Thank you for adding a config option for this feature vs bumping the
required pahole version in scripts/link-vmlinux.sh. This is a much more
friendly way of handling kernel features that require support from build
tools.
Fabian Grünbichler Nov. 26, 2021, 3:16 p.m. UTC | #7
On November 16, 2020 10:24 pm, Jakub Kicinski wrote:
> On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
>> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
>> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
>> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
>> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
>> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
>> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
>> >
>> > Is that intentional?  
>> 
>> I wasn't aware of such a use pattern, so definitely not intentional.
>> But vmlinux is absolutely necessary to generate the module BTF. So I'm
>> wondering what's the proper fix here? Leave it as is (that error
>> message is actually surprisingly descriptive, btw)? Force vmlinux
>> build? Or skip BTF generation for that module?
> 
> For an external out-of-tree module there is no guarantee that vmlinux
> will even be on the system, no? So only the last option can work in
> that case.

a year late to the party, but it seems to me that this patch 
series/features also missed another, not yet fixed scenario. I have to 
admit I am not very well-versed in BTF/BPF matters though, so please 
take the analysis below with a grain of salt or two ;)

(am subscribed to LKML/netdev, but not the bpf list, so please keep me 
CCed if discussion moves there! apologies if too many people are CCed 
here, feel free to trim down to relevant people/lists)

many distros do their own tracking of kernel <-> module ABI (usually 
these checks use Module.symvers and some combination of lists/symbols/.. 
to skip/ignore[0]).

depending on detected changes, a kernel update can either
- bump ABI, resulting in a new kernel/modules package that is installed 
  next to the current one
- keep ABI, resulting in an updated kernel/modules package that is 
  installed over/instead of the current one

the former case is obviously not an issue, since the modules and vmlinux 
image shipped in that (set of) package(s) match. but in the later case 
of updated, compatible kernel image + modules with unchanged ABI
(which is important, as it allows shipping fixed modules that are 
loadable for a compatible, older, booted kernel image), the following is 
possible:
- install kernel+modules with ABI 1
- boot kernel with ABI 1
- install ABI-compatible upgrade (e.g. a security fix)
- load module
- BTF validation fails, because the base_btf (loaded at boot time) and 
  the offsets in the module's .BTF section (loaded at module load time) 
  aren't matching

of course the validation might also not fail but the parsed BTF info 
might be bogus, or the base_btf might be similar enough that validation 
passes and the parsed BTF data is correct.

in our case the symptoms look like this (exact details vary with kernel 
builds/modules, but likely not relevant):

Nov 24 11:39:11 host kernel: BPF:         type_id=7 bits_offset=0
Nov 24 11:39:11 host kernel: BPF:
Nov 24 11:39:11 host kernel: BPF:Invalid name
Nov 24 11:39:11 host kernel: BPF:
Nov 24 11:39:11 host kernel: failed to validate module [overlay] BTF: -22

where the booted kernel and the (attempted to get) loaded module are not 
from the same build, but the Module.symvers is matching and loading 
should thus work. adding some more debug logging reveals that the root 
cause is the module's BTF start_str_off being, well, off, since it's derived 
from vmlinux' BTF/base_btf. if it is too big, the name/type lookups will 
wrongly look in the base_btf, if it's too small, the name/type lookups 
will be offset within the module or wrongly look inside the module when 
they should look inside base_btf/vmlinux. in any case, random garbage is 
the result, usually tripping up some validation check (e.g. the first 
byte not being 0 when checking a name). but even if it's correct (old 
and new vmlinux image have the same nr_types/hdr.str_len), there is no 
guarantuee that the offsets into base_btf are pointing at the right 
stuff.

example with debug logging patched in, note the garbled names, and 
offset slightly below the (wrong) start_str_off:

----8<----

BPF:magic: 0xeb9f

BPF:version: 1

BPF:flags: 0x0

BPF:hdr_len: 24

BPF:type_off: 0

BPF:type_len: 9264

BPF:str_off: 9264

BPF:str_len: 5511

BPF:btf_total_size: 14799

BPF:[106314] STRUCT rimary_device
BPF:size=56 vlen=14
BPF:

BPF:offset at call: 1915394
BPF:offset too small, choosing base_btf: 1915397

BPF:offset after base_btf: 1915394

BPF:     ce type_id=49 bits_offset=0
BPF:

BPF:offset at call: 1915403
BPF:offset after base_btf: 6

BPF:     nfig type_id=49 bits_offset=64
BPF:

BPF:offset at call: 1915412
BPF:offset after base_btf: 15

BPF:     rdir type_id=49 bits_offset=128
BPF:

BPF:offset at call: 768428
BPF:offset too small, choosing base_btf: 1915397

BPF:offset after base_btf: 768428

BPF:     _dio type_id=56 bits_offset=192
BPF:

BPF:offset at call: 1915420
BPF:offset after base_btf: 23

BPF:     erdir type_id=56 bits_offset=200
BPF:

BPF:offset at call: 1915433
BPF:offset after base_btf: 36

BPF:first char wrong - 0

BPF:      type_id=56 bits_offset=208
BPF:
BPF:Invalid name STRUCT MEMBER (name offset 1915433)
BPF:

failed to validate module [overlay] BTF: -22

---->8----

also note how it's only after a few botched entries that a check 
actually trips up - not sure what the impliciations for crafted BTF info 
are, but might be worthy a closer look by someone more knowledgable as 
well..

it seems to me this can be solved on the distro/user side by tracking 
vmlinux BTF infos as part of the ABI tracking (how stable are those 
compared to the existing interfaces making up the kernel <-> module 
ABI/Module.symvers? does this effectively mean bumping ABI for any 
change anyway?) or by disabling CONFIG_DEBUG_INFO_BTF_MODULES.

on the kernel/libbpf side it could maybe be solved by storing a hash of 
the base_btf data used to generate the split BTF-sections inside the 
modules, and skip BTF loading/validating if another base_btf is 
currently loaded (so BTF is best-effort, if the booted kernel and the 
module are matching it works, if not module loading works but no BTF 
support). this might be a good safe-guard for split-BTF in general?

I'd appreciate input on how to proceed (we were recently hit by this in 
a downstream Debian derivative, and will disable BTF info for modules as 
an interim measure).

thanks!

0: e.g., Debian's: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/abiupdate.py
Andrii Nakryiko Dec. 8, 2021, 12:08 a.m. UTC | #8
On Fri, Nov 26, 2021 at 7:16 AM Fabian Grünbichler
<f.gruenbichler@proxmox.com> wrote:
>
> On November 16, 2020 10:24 pm, Jakub Kicinski wrote:
> > On Mon, 16 Nov 2020 12:34:17 -0800 Andrii Nakryiko wrote:
> >> > This change, commit 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole
> >> > supports it") currently in net-next, linux-next, etc. breaks the use-case of compiling only a specific
> >> > kernel module (both in-tree and out-of-tree, e.g. 'make M=drivers/net/ethernet/intel/ice') after
> >> > first doing a 'make modules_prepare'.  Previously, that use-case would result in a warning noting
> >> > "Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped" but now it
> >> > errors out after noting "No rule to make target 'vmlinux', needed by '<...>.ko'.  Stop."
> >> >
> >> > Is that intentional?
> >>
> >> I wasn't aware of such a use pattern, so definitely not intentional.
> >> But vmlinux is absolutely necessary to generate the module BTF. So I'm
> >> wondering what's the proper fix here? Leave it as is (that error
> >> message is actually surprisingly descriptive, btw)? Force vmlinux
> >> build? Or skip BTF generation for that module?
> >
> > For an external out-of-tree module there is no guarantee that vmlinux
> > will even be on the system, no? So only the last option can work in
> > that case.
>
> a year late to the party, but it seems to me that this patch

hi, sorry, you email slipped through the cracks initially, just
getting back to it...

> series/features also missed another, not yet fixed scenario. I have to
> admit I am not very well-versed in BTF/BPF matters though, so please
> take the analysis below with a grain of salt or two ;)
>
> (am subscribed to LKML/netdev, but not the bpf list, so please keep me
> CCed if discussion moves there! apologies if too many people are CCed
> here, feel free to trim down to relevant people/lists)
>
> many distros do their own tracking of kernel <-> module ABI (usually
> these checks use Module.symvers and some combination of lists/symbols/..
> to skip/ignore[0]).
>
> depending on detected changes, a kernel update can either
> - bump ABI, resulting in a new kernel/modules package that is installed
>   next to the current one
> - keep ABI, resulting in an updated kernel/modules package that is
>   installed over/instead of the current one
>
> the former case is obviously not an issue, since the modules and vmlinux
> image shipped in that (set of) package(s) match. but in the later case
> of updated, compatible kernel image + modules with unchanged ABI
> (which is important, as it allows shipping fixed modules that are
> loadable for a compatible, older, booted kernel image), the following is
> possible:
> - install kernel+modules with ABI 1
> - boot kernel with ABI 1
> - install ABI-compatible upgrade (e.g. a security fix)
> - load module
> - BTF validation fails, because the base_btf (loaded at boot time) and
>   the offsets in the module's .BTF section (loaded at module load time)
>   aren't matching
>
> of course the validation might also not fail but the parsed BTF info
> might be bogus, or the base_btf might be similar enough that validation
> passes and the parsed BTF data is correct.
>
> in our case the symptoms look like this (exact details vary with kernel
> builds/modules, but likely not relevant):
>
> Nov 24 11:39:11 host kernel: BPF:         type_id=7 bits_offset=0
> Nov 24 11:39:11 host kernel: BPF:
> Nov 24 11:39:11 host kernel: BPF:Invalid name
> Nov 24 11:39:11 host kernel: BPF:
> Nov 24 11:39:11 host kernel: failed to validate module [overlay] BTF: -22
>
> where the booted kernel and the (attempted to get) loaded module are not
> from the same build, but the Module.symvers is matching and loading
> should thus work. adding some more debug logging reveals that the root
> cause is the module's BTF start_str_off being, well, off, since it's derived
> from vmlinux' BTF/base_btf. if it is too big, the name/type lookups will
> wrongly look in the base_btf, if it's too small, the name/type lookups
> will be offset within the module or wrongly look inside the module when
> they should look inside base_btf/vmlinux. in any case, random garbage is
> the result, usually tripping up some validation check (e.g. the first
> byte not being 0 when checking a name). but even if it's correct (old
> and new vmlinux image have the same nr_types/hdr.str_len), there is no
> guarantuee that the offsets into base_btf are pointing at the right
> stuff.
>
> example with debug logging patched in, note the garbled names, and
> offset slightly below the (wrong) start_str_off:
>
> ----8<----
>
> BPF:magic: 0xeb9f
>
> BPF:version: 1
>
> BPF:flags: 0x0
>
> BPF:hdr_len: 24
>
> BPF:type_off: 0
>
> BPF:type_len: 9264
>
> BPF:str_off: 9264
>
> BPF:str_len: 5511
>
> BPF:btf_total_size: 14799
>
> BPF:[106314] STRUCT rimary_device
> BPF:size=56 vlen=14
> BPF:
>
> BPF:offset at call: 1915394
> BPF:offset too small, choosing base_btf: 1915397
>
> BPF:offset after base_btf: 1915394
>
> BPF:     ce type_id=49 bits_offset=0
> BPF:
>
> BPF:offset at call: 1915403
> BPF:offset after base_btf: 6
>
> BPF:     nfig type_id=49 bits_offset=64
> BPF:
>
> BPF:offset at call: 1915412
> BPF:offset after base_btf: 15
>
> BPF:     rdir type_id=49 bits_offset=128
> BPF:
>
> BPF:offset at call: 768428
> BPF:offset too small, choosing base_btf: 1915397
>
> BPF:offset after base_btf: 768428
>
> BPF:     _dio type_id=56 bits_offset=192
> BPF:
>
> BPF:offset at call: 1915420
> BPF:offset after base_btf: 23
>
> BPF:     erdir type_id=56 bits_offset=200
> BPF:
>
> BPF:offset at call: 1915433
> BPF:offset after base_btf: 36
>
> BPF:first char wrong - 0
>
> BPF:      type_id=56 bits_offset=208
> BPF:
> BPF:Invalid name STRUCT MEMBER (name offset 1915433)
> BPF:
>
> failed to validate module [overlay] BTF: -22
>
> ---->8----
>
> also note how it's only after a few botched entries that a check
> actually trips up - not sure what the impliciations for crafted BTF info
> are, but might be worthy a closer look by someone more knowledgable as
> well..
>
> it seems to me this can be solved on the distro/user side by tracking
> vmlinux BTF infos as part of the ABI tracking (how stable are those
> compared to the existing interfaces making up the kernel <-> module
> ABI/Module.symvers? does this effectively mean bumping ABI for any

Yes, I think so, unfortunately. Any change in order of types emitted
by DWARF or any extra string might cause a big change in string
offsets or type IDs.


> change anyway?) or by disabling CONFIG_DEBUG_INFO_BTF_MODULES.
>
> on the kernel/libbpf side it could maybe be solved by storing a hash of
> the base_btf data used to generate the split BTF-sections inside the
> modules, and skip BTF loading/validating if another base_btf is
> currently loaded (so BTF is best-effort, if the booted kernel and the
> module are matching it works, if not module loading works but no BTF
> support). this might be a good safe-guard for split-BTF in general?
>
> I'd appreciate input on how to proceed (we were recently hit by this in
> a downstream Debian derivative, and will disable BTF info for modules as
> an interim measure).
>

Yeah, I think "BTF is best-effort" is a necessity for the complicated
setups you described above. It probably is good to keep strict
behavior (BTF fails to load -> fail loading the module), but allow you
to tune it with the Kconfig option. Most modules probably would be
just fine without its BTF loaded successfully.

As for the check sums, this would add another layer of protection, so
I think storing base vmlinux BTF checksum in some special ELF section
in module' ELF (e.g. ".BTF.base_checksum" or something along those
lines) is also a good idea and would be easy to implement. Insmod and
related tooling might even provide a validation check that vmlinux BTF
matches module's expectation, if necessary.

> thanks!
>
> 0: e.g., Debian's: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/abiupdate.py
>
diff mbox series

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d7a7bc3b6098..1e78faaf20a5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -274,6 +274,15 @@  config DEBUG_INFO_BTF
 	  Turning this on expects presence of pahole tool, which will convert
 	  DWARF type info into equivalent deduplicated BTF type info.
 
+config PAHOLE_HAS_SPLIT_BTF
+	def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
+
+config DEBUG_INFO_BTF_MODULES
+	def_bool y
+	depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
+	help
+	  Generate compact split BTF type information for kernel modules.
+
 config GDB_SCRIPTS
 	bool "Provide GDB scripts for kernel debugging"
 	help
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index ae01baf96f4e..02b892421f7a 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -6,6 +6,7 @@ 
 PHONY := __modfinal
 __modfinal:
 
+include include/config/auto.conf
 include $(srctree)/scripts/Kbuild.include
 
 # for c_flags
@@ -36,8 +37,23 @@  quiet_cmd_ld_ko_o = LD [M]  $@
 		-T scripts/module.lds -o $@ $(filter %.o, $^);		\
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
-	+$(call if_changed,ld_ko_o)
+quiet_cmd_btf_ko = BTF [M] $@
+      cmd_btf_ko = LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@
+
+# Same as newer-prereqs, but allows to exclude specified extra dependencies
+newer_prereqs_except = $(filter-out $(PHONY) $(1),$?)
+
+# Same as if_changed, but allows to exclude specified extra dependencies
+if_changed_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check),      \
+	$(cmd);                                                              \
+	printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)
+
+# Re-generate module BTFs if either module's .ko or vmlinux changed
+$(modules): %.ko: %.o %.mod.o scripts/module.lds vmlinux FORCE
+	+$(call if_changed_except,ld_ko_o,vmlinux)
+ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+	+$(if $(newer-prereqs),$(call cmd,btf_ko))
+endif
 
 targets += $(modules) $(modules:.ko=.mod.o)