Message ID | 20250207012045.2129841-3-stephen.s.brennan@oracle.com |
---|---|
State | New |
Series | Add option for generating BTF types of global variables |
On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> When the feature was implemented in pahole, my measurements indicated
> that vmlinux BTF size increased by about 25.8%, and module BTF size
> increased by 53.2%. Due to these increases, the feature is implemented
> behind a new config option, allowing users sensitive to increased memory
> usage to disable it.
>

...
> +config DEBUG_INFO_BTF_GLOBAL_VARS
> +	bool "Generate BTF type information for all global variables"
> +	default y
> +	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> +	help
> +	  Include type information for all global variables in the BTF. This
> +	  increases the size of the BTF information, which increases memory
> +	  usage at runtime. With global variable types available, runtime
> +	  debugging and tracers may be able to provide more detail.

This is not a solution.
Even if it's changed to 'default n' distros will enable it
like they enable everything and will suffer a regression.

We need to add a new module like vmlinux_btf.ko that will contain
this additional BTF data. For global vars and everything else we might need.

pw-bot: cr
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
>
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> +	bool "Generate BTF type information for all global variables"
>> +	default y
>> +	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> +	help
>> +	  Include type information for all global variables in the BTF. This
>> +	  increases the size of the BTF information, which increases memory
>> +	  usage at runtime. With global variable types available, runtime
>> +	  debugging and tracers may be able to provide more detail.
>
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
>
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.

Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
that idea a while back for an older version of this feature:

https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

We can dust that off and include it for a new version of this series.
I'd be curious what you'd like to see for kernel modules? A
three-level tree would be too complex, in my opinion.

As a separate note for this patch series, we discovered that variables
declared twice, where one is declared "__weak", will result in two DWARF
variable declarations, and thus two BTF variables. This trips up the BTF
validation code. So this series as it is cannot move forward. I'm
submitting a fix to dwarves today.

Thanks,
Stephen
On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> +	bool "Generate BTF type information for all global variables"
> >> +	default y
> >> +	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> +	help
> >> +	  Include type information for all global variables in the BTF. This
> >> +	  increases the size of the BTF information, which increases memory
> >> +	  usage at runtime. With global variable types available, runtime
> >> +	  debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
>
> Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
> that idea a while back for an older version of this feature:
>
> https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

Right, vmlinux_extra was discussed in various contexts,
so let's make it happen.

> We can dust that off and include it for a new version of this series.
> I'd be curious what you'd like to see for kernel modules? A
> three-level tree would be too complex, in my opinion.

What is the use case for vars in kernel modules?

> module BTF size increased by 53.2%.
This is the sum of all mods with vars divided by
the sum of all mods without?
Any outliers there?
I would expect modules to have few global variables.

So before we decide on what to do with vars in mods let's figure out
the need.
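Alexei's question turns on how a single "module BTF size increased by X%" figure is aggregated. An aggregate ratio (total bytes added over total base bytes) and an unweighted mean of per-module percentages can differ sharply when tiny modules grow a lot in relative terms. The following is an illustrative sketch only: the struct and function names are hypothetical, and the two sample rows (configs.ko and btrfs.ko) are copied from the size table posted in reply to this message.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a BTF size comparison (hypothetical helper type). */
struct btf_size_row {
	const char *name;
	long base;	/* BTF bytes without global variable types */
	long comp;	/* BTF bytes with global variable types */
};

/* Aggregate growth: sum of increases divided by sum of base sizes. */
double aggregate_pct(const struct btf_size_row *rows, size_t n)
{
	long base = 0, comp = 0;

	for (size_t i = 0; i < n; i++) {
		base += rows[i].base;
		comp += rows[i].comp;
	}
	assert(base > 0);
	return 100.0 * (double)(comp - base) / (double)base;
}

/* Unweighted mean of per-module growth percentages. */
double mean_pct(const struct btf_size_row *rows, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(rows[i].base > 0);
		sum += 100.0 * (double)(rows[i].comp - rows[i].base) /
		       (double)rows[i].base;
	}
	return sum / (double)n;
}

/* Sample rows taken from the size table in this thread. */
static const struct btf_size_row example_rows[] = {
	{ "configs.ko", 72, 256 },
	{ "btrfs.ko", 275139, 343686 },
};
```

For these two rows the aggregate growth is about 25%, while the mean of the per-module percentages is about 140%, because the 184-byte increase on the 72-byte configs.ko dominates an unweighted average.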
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
[...]
>> We can dust that off and include it for a new version of this series.
>> I'd be curious what you'd like to see for kernel modules? A
>> three-level tree would be too complex, in my opinion.
>
> What is the use case for vars in kernel modules?

The use case would be the same as for the core kernel. My primary
motivation is to allow drgn to understand the types of global variables,
and that extends to kernel modules too.

>> module BTF size increased by 53.2%.
>
> This is the sum of all mods with vars divided by
> the sum of all mods without?

That was a poorly done comparison, so let me provide this one that I did
using 6.13 and these patches. It was essentially a localmodconfig for a
VM instance, so I could still do better by picking a popular
distribution config. But I think this is far more representative.

MODULE                    BASE      COMP      CHG      PCT
drm.ko                    115833    123410    7577     6.54%
iscsi_boot_sysfs.ko       2627      5380      2753     104.80%
joydev.ko                 1816      2289      473      26.05%
libcxgbi.ko               24556     25266     710      2.89%
drm_vram_helper.ko        22325     22751     426      1.91%
nvme-tcp.ko               25044     25973     929      3.71%
vfat.ko                   3448      3953      505      14.65%
btrfs.ko                  275139    343686    68547    24.91%
libiscsi.ko               21177     21977     800      3.78%
xt_owner.ko               449       803       354      78.84%
nft_ct.ko                 4912      6157      1245     25.35%
iscsi_ibft.ko             3967      4463      496      12.50%
pcspkr.ko                 283       682       399      140.99%
crc32-pclmul.ko           390       771       381      97.69%
nf_conntrack.ko           23686     28191     4505     19.02%
iscsi_tcp.ko              16827     17750     923      5.49%
nft_fib.ko                835       1117      282      33.77%
nf_reject_ipv6.ko         699       981       282      40.34%
rfkill.ko                 4233      6410      2177     51.43%
dm-region-hash.ko         6214      6496      282      4.54%
cxgb3i.ko                 35469     37078     1609     4.54%
dm-mirror.ko              7576      8191      615      8.12%
pvpanic-pci.ko            174       574       400      229.89%
crct10dif-pclmul.ko       146       525       379      259.59%
nvme-fabrics.ko           17341     18124     783      4.52%
kvm-amd.ko                47302     51914     4612     9.75%
crc8.ko                   221       405       184      83.26%
ib_iser.ko                27769     29116     1347     4.85%
sg.ko                     4234      5656      1422     33.59%
intel_rapl_common.ko      5678      8446      2768     48.75%
bochs.ko                  35643     36997     1354     3.80%
sha1-ssse3.ko             790       1305      515      65.19%
kvm-intel.ko              53802     59220     5418     10.07%
nft_chain_nat.ko          279       714       435      155.91%
vmlinux                   5484970   7330096   1845126  33.64%
sha256-ssse3.ko           851       1378      527      61.93%
nf_nat.ko                 6341      7240      899      14.18%
configs.ko                72        256       184      255.56%
xt_comment.ko             151       507       356      235.76%
ccp.ko                    30433     34782     4349     14.29%
cxgb3.ko                  44981     47504     2523     5.61%
crypto_simd.ko            1331      1613      282      21.19%
iptable_filter.ko         855       1456      601      70.29%
qedi.ko                   70653     72786     2133     3.02%
drm_kms_helper.ko         63238     65000     1762     2.79%
cnic.ko                   117074    117790    716      0.61%
failover.ko               780       1216      436      55.90%
nft_redir.ko              874       1529      655      74.94%
serio_raw.ko              708       1234      526      74.29%
nf_defrag_ipv6.ko         1520      2253      733      48.22%
nf_defrag_ipv4.ko         306       770       464      151.63%
nft_reject_ipv4.ko        517       939       422      81.62%
nft_nat.ko                1192      1732      540      45.30%
nft_reject_inet.ko        554       976       422      76.17%
fuse.ko                   32181     41859     9678     30.07%
nft_compat.ko             3705      4404      699      18.87%
zstd_compress.ko          42597     43622     1025     2.41%
tls.ko                    15140     20683     5543     36.61%
virtio_pci.ko             8456      9193      737      8.72%
blake2b_generic.ko        1364      1699      335      24.56%
cryptd.ko                 3697      4297      600      16.23%
xor.ko                    1358      1879      521      38.37%
intel_rapl_msr.ko         2851      3440      589      20.66%
kvm.ko                    177060    256377    79317    44.80%
cxgb4.ko                  215865    220844    4979     2.31%
bnx2i.ko                  39524     41477     1953     4.94%
dm-round-robin.ko         1795      2123      328      18.27%
virtio_pci_legacy_dev.ko  909       1191      282      31.02%
qla4xxx.ko                79040     82694     3654     4.62%
nfs.ko                    108350    169642    61292    56.57%
libata.ko                 47301     66188     18887    39.93%
ghash-clmulni-intel.ko    578       997       419      72.49%
nf_reject_ipv4.ko         706       988       282      39.94%
nft_reject.ko             820       1196      376      45.85%
sunrpc.ko                 127496    197841    70345    55.17%
nft_fib_ipv4.ko           803       1257      454      56.54%
scsi_transport_iscsi.ko   40419     57633     17214    42.59%
lockd.ko                  36144     42137     5993     16.58%
drm_shmem_helper.ko       32555     33043     488      1.50%
nvme-core.ko              50275     58298     8023     15.96%
iw_cm.ko                  13405     14796     1391     10.38%
mdio.ko                   857       1041      184      21.47%
bnx2.ko                   20354     21611     1257     6.18%
net_failover.ko           1742      2187      445      25.55%
ip_set.ko                 11812     13093     1281     10.84%
libcxgb.ko                8698      8980      282      3.24%
dm-multipath.ko           8124      8898      774      9.53%
grace.ko                  462       890       428      92.64%
virtio_net.ko             12322     14896     2574     20.89%
qed.ko                    228735    232231    3496     1.53%
cdc-acm.ko                2923      3679      756      25.86%
i2c-piix4.ko              1124      2341      1217     108.27%
pvpanic-mmio.ko           177       625       448      253.11%
virtio_scsi.ko            3154      3898      744      23.59%
uio.ko                    2602      4295      1693     65.07%
nft_fib_ipv6.ko           956       1410      454      47.49%
cec.ko                    28370     29266     896      3.16%
qemu_fw_cfg.ko            1601      3476      1875     117.11%
ttm.ko                    23672     25727     2055     8.68%
sd_mod.ko                 9976      13030     3054     30.61%
xfs.ko                    574594    926637    352043   61.27%
libiscsi_tcp.ko           17444     17911     467      2.68%
ib_cm.ko                  32324     62373     30049    92.96%
aesni-intel.ko            3370      4922      1552     46.05%
drm_client_lib.ko         27449     27794     345      1.26%
virtio_pci_modern_dev.ko  2537      2819      282      11.12%
rdma_cm.ko                32504     51823     19319    59.44%
fat.ko                    11958     13297     1339     11.20%
dm-log.ko                 6529      6986      457      7.00%
pata_acpi.ko              9231      9700      469      5.08%
ata_piix.ko               10998     12598     1600     14.55%
ipt_REJECT.ko             956       1311      355      37.13%
drm_ttm_helper.ko         33160     33544     384      1.16%
be2iscsi.ko               55078     56993     1915     3.48%
i2c-smbus.ko              582       973       391      67.18%
cuse.ko                   8435      9241      806      9.56%
nft_fib_inet.ko           579       995       416      71.85%
ib_core.ko                103656    123701    20045    19.34%
pulse8-cec.ko             9153      9890      737      8.05%
pvpanic.ko                494       1087      593      120.04%
dm-mod.ko                 31377     35265     3888     12.39%
raid6_pq.ko               2774      4207      1433     51.66%
nft_reject_ipv6.ko        517       939       422      81.62%
cxgb4i.ko                 47490     49021     1531     3.22%
ata_generic.ko            9008      9666      658      7.30%
vboxvideo.ko              47622     48844     1222     2.57%
ip_tables.ko              3109      3564      455      14.63%
ALL MODS                  9153268   11895301  2742033  29.96%
vmlinux                   5484970   7330096   1845126  33.64%
TOTAL                     14638238  19225397  4587159  31.34%

So this shows a 1.8 MiB increase in vmlinux BTF size, or 33.6%. And for
these modules in aggregate, an increase of 2.7 MiB or 30.0%.

> Any outliers there?
> I would expect modules to have few global variables.

In terms of outliers, there are groups that stand out to me:

1. Large percentage increases are almost always for modules that had
   very tiny BTF before.
   The module system inherently creates a few global variables for each
   module, so there's always a slight constant increase of the BTF size
   (184 bytes, as far as I can tell), and in those cases it can be quite
   a large percentage. Here's an example, "configs.ko", which comes from
   the CONFIG_IKCONFIG enablement:

   BEFORE:
   $ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
   [127877] CONST '(anon)' type_id=11124
   [127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
   [127879] CONST '(anon)' type_id=127878

   AFTER:
   $ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
   [162827] CONST '(anon)' type_id=11124
   [162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
   [162829] CONST '(anon)' type_id=162828
   [162830] VAR '____versions' type_id=162829, linkage=static
   [162831] DATASEC '__versions' size=64 vlen=1
           type_id=162830 offset=0 size=64 (VAR '____versions')
   [162832] VAR 'orc_header' type_id=8667, linkage=static
   [162833] DATASEC '.orc_header' size=20 vlen=1
           type_id=162832 offset=0 size=20 (VAR 'orc_header')
   [162834] VAR '__this_module' type_id=312, linkage=global
   [162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
           type_id=162834 offset=0 size=1344 (VAR '__this_module')

   What I think is interesting is that the types in that module were
   totally useless to begin with, because they were used by a variable
   which didn't even get emitted. So while this is a substantial
   percentage-wise increase, I think it's a net improvement for this and
   other modules.

2. The largest absolute increases come from large, complex modules like
   xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
   declarations. What is disappointing is how much of this is due to
   automatically-generated "variables" from macros (e.g. tracepoints).
   Here is a list of variable prefixes like that:

     print_fmt_*
     trace_event_fields_*
     trace_event_type_funcs_*
     event_*
     __SCK__tp_func_*
     __bpf_trace_tp_map_*
     __event_*
     event_class_*
     TRACE_SYSTEM_*
     __TRACE_SYSTEM_*
     __tracepoint_*

   These are, unfortunately, all valid declarations produced by macros,
   and they correspond to valid symbols as well. If you look at the
   kallsyms for the modules (and core kernel), these variables are
   present there as well. It may indeed make sense to have kallsyms
   entries for them: I don't know.

   These are all, as far as I'm concerned, totally uninteresting types.
   If you want to access any of this data, you probably already know its
   type and wouldn't need a BTF declaration.

   Unfortunately, the flip side is that I don't think we have a good way
   to automatically detect these, outside of prefix matching, which
   quickly goes out of date as the kernel changes, and can have false
   positives as well. For kernel modules, many of these may appear in
   separate ELF sections, but for vmlinux, they don't.

I'd be happy to eliminate types for these auto-generated kinds of
variables, if we could somehow annotate them so that pahole knows to
ignore them. For instance, maybe we could use
__attribute__((btf_decl_tag("btf_omit"))) as an instruction to pahole to
omit declarations for these things?

Thanks,
Stephen

> So before we decide on what to do with vars in mods let's figure out
> the need.
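The prefix-matching fallback mentioned in point 2 can be sketched as a simple name filter. This is a hypothetical illustration, not something pahole implements: the prefix list is copied from the message, and `is_autogen_var` is a made-up name. It also shows the false-positive concern, since any hand-written global whose name happens to begin with `event_` is filtered as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prefixes of macro-generated variables listed in the thread.
 * Hypothetical filter: pahole has no such option. */
static const char *const autogen_prefixes[] = {
	"print_fmt_",
	"trace_event_fields_",
	"trace_event_type_funcs_",
	"event_",
	"__SCK__tp_func_",
	"__bpf_trace_tp_map_",
	"__event_",
	"event_class_",
	"TRACE_SYSTEM_",
	"__TRACE_SYSTEM_",
	"__tracepoint_",
};

/* Returns true if the variable name starts with any known prefix. */
bool is_autogen_var(const char *name)
{
	size_t n = sizeof(autogen_prefixes) / sizeof(autogen_prefixes[0]);

	assert(name != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *p = autogen_prefixes[i];

		if (strncmp(name, p, strlen(p)) == 0)
			return true;
	}
	return false;
}
```

The list has to be edited by hand whenever a tracing macro starts emitting a new kind of symbol, which is the maintenance problem described above; an explicit annotation such as the proposed btf_decl_tag would avoid relying on naming conventions.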
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06f..3fbdc5ba2d017 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -409,6 +409,16 @@ config PAHOLE_HAS_LANG_EXCLUDE
 	  otherwise it would emit malformed kernel and module binaries when
 	  using DEBUG_INFO_BTF_MODULES.
 
+config DEBUG_INFO_BTF_GLOBAL_VARS
+	bool "Generate BTF type information for all global variables"
+	default y
+	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
+	help
+	  Include type information for all global variables in the BTF. This
+	  increases the size of the BTF information, which increases memory
+	  usage at runtime. With global variable types available, runtime
+	  debugging and tracers may be able to provide more detail.
+
 config DEBUG_INFO_BTF_MODULES
 	bool "Generate BTF type information for kernel modules"
 	default y
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index c3cbeb13de503..ad3c05a96a010 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -31,5 +31,8 @@ endif
 
 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
 
+# Requires v1.28 or later, enforced by KConfig
+pahole-flags-$(CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS) += --btf_features=global_var
+
 export PAHOLE_FLAGS := $(pahole-flags-y)
 export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
Since pahole 1.28, BTF can include types for all global variables.
Previously, BTF only included types for functions and percpu variables.

There are a few applications for this type information. For one, runtime
debuggers like drgn[1] can consume it in the absence of DWARF debuginfo.
Support in drgn is implemented and currently moving through the review
process, see [2]. For distributions which don't distribute DWARF
debuginfo, or for situations where it can't be made available, the
compact BTF, combined with ORC for stack unwinding and the kallsyms
symbol table, can be used for simple runtime debugging and
introspection.

Another application is verifying the types of ksyms in BPF programs.
libbpf already supports resolving global variables with "__ksym", but
they must be declared as void. For example, in
tools/bpf/bpftool/skeleton/pid_iter.bpf.c we have:

    extern const void bpf_map_fops __ksym;

With global variable information, declarations like these would be able
to use the actual variable types, for example:

    extern const struct file_operations bpf_map_fops __ksym;

When the feature was implemented in pahole, my measurements indicated
that vmlinux BTF size increased by about 25.8%, and module BTF size
increased by 53.2%. Due to these increases, the feature is implemented
behind a new config option, allowing users sensitive to increased memory
usage to disable it.

[1]: https://github.com/osandov/drgn
[2]: https://github.com/osandov/drgn/issues/176

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 lib/Kconfig.debug    | 10 ++++++++++
 scripts/Makefile.btf |  3 +++
 2 files changed, 13 insertions(+)
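For a rough sense of where the BTF size growth comes from: each global variable adds a BTF_KIND_VAR record plus one entry in its section's BTF_KIND_DATASEC record. The sketch below assumes the record layouts of struct btf_type (12 bytes), struct btf_var (4 bytes) and struct btf_var_secinfo (12 bytes) from include/uapi/linux/btf.h, mirrored here under hypothetical local names. It counts only the type section and ignores the string section and any newly referenced types, so it yields a lower bound rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the BTF record layouts (hypothetical names). */
struct demo_btf_type {
	uint32_t name_off;
	uint32_t info;
	uint32_t size_or_type;	/* union { size; type; } in the real header */
};

struct demo_btf_var {
	uint32_t linkage;
};

struct demo_btf_var_secinfo {
	uint32_t type;
	uint32_t offset;
	uint32_t size;
};

static_assert(sizeof(struct demo_btf_type) == 12, "btf_type is 12 bytes");
static_assert(sizeof(struct demo_btf_var) == 4, "btf_var is 4 bytes");
static_assert(sizeof(struct demo_btf_var_secinfo) == 12,
	      "btf_var_secinfo is 12 bytes");

/* Type-section bytes for n_vars VAR records spread over n_secs DATASEC
 * records; string-section bytes and referenced types are not counted. */
size_t btf_var_type_bytes(size_t n_vars, size_t n_secs)
{
	size_t var_rec = sizeof(struct demo_btf_type) + sizeof(struct demo_btf_var);
	size_t sec_hdr = sizeof(struct demo_btf_type);
	size_t sec_ent = sizeof(struct demo_btf_var_secinfo);

	return n_vars * (var_rec + sec_ent) + n_secs * sec_hdr;
}
```

Under these assumptions each variable costs at least 28 bytes of type records. For configs.ko, whose dump in the thread shows three VARs each in its own DATASEC, btf_var_type_bytes(3, 3) gives 120 bytes, below the reported 184-byte increase; the remainder presumably comes from new entries in the string section, which this sketch does not model. For the 5696 VARs reported in xfs.ko, the lower bound is 159488 bytes of type records.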