
[2/2] btf: Add the option to include global variable types

Message ID 20250207012045.2129841-3-stephen.s.brennan@oracle.com (mailing list archive)
State New
Series Add option for generating BTF types of global variables

Commit Message

Stephen Brennan Feb. 7, 2025, 1:20 a.m. UTC
Since pahole 1.28, BTF can include types for all global variables.
Previously, BTF included types only for functions and percpu
variables.

There are a few applications for this type information. For one, runtime
debuggers like drgn[1] can consume it in the absence of DWARF debuginfo.
The support in drgn is currently implemented and moving through the
review process, see [2]. For distributions which don't distribute DWARF
debuginfo, or for situations where it can't be made available, the
compact BTF, combined with ORC for stack unwinding, and the kallsyms
symbol table, can be used for simple runtime debugging and
introspection.

Another application is verifying types of ksyms in BPF programs. libbpf
already supports resolving global variables with "__ksym", but they must
be declared as void. For example, in
tools/bpf/bpftool/skeleton/pid_iter.bpf.c we have:

    extern const void bpf_map_fops __ksym;

With global variable information, declarations like these would be able
to use the actual variable types, for example:

    extern const struct file_operations bpf_map_fops __ksym;

When the feature was implemented in pahole, my measurements indicated
that vmlinux BTF size increased by about 25.8%, and module BTF size
increased by 53.2%. Due to these increases, the feature is implemented
behind a new config option, allowing users sensitive to increased memory
usage to disable it.

[1]: https://github.com/osandov/drgn
[2]: https://github.com/osandov/drgn/issues/176

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 lib/Kconfig.debug    | 10 ++++++++++
 scripts/Makefile.btf |  3 +++
 2 files changed, 13 insertions(+)

Comments

Alexei Starovoitov Feb. 7, 2025, 11:50 p.m. UTC | #1
On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
> When the feature was implemented in pahole, my measurements indicated
> that vmlinux BTF size increased by about 25.8%, and module BTF size
> increased by 53.2%. Due to these increases, the feature is implemented
> behind a new config option, allowing users sensitive to increased memory
> usage to disable it.
>

...
> +config DEBUG_INFO_BTF_GLOBAL_VARS
> +       bool "Generate BTF type information for all global variables"
> +       default y
> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> +       help
> +         Include type information for all global variables in the BTF. This
> +         increases the size of the BTF information, which increases memory
> +         usage at runtime. With global variable types available, runtime
> +         debugging and tracers may be able to provide more detail.

This is not a solution.
Even if it's changed to 'default n' distros will enable it
like they enable everything and will suffer a regression.

We need to add a new module like vmlinux_btf.ko that will contain
this additional BTF data. For global vars and everything else we might need.

pw-bot: cr
Stephen Brennan Feb. 11, 2025, 11:58 p.m. UTC | #2
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
>
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> +       bool "Generate BTF type information for all global variables"
>> +       default y
>> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> +       help
>> +         Include type information for all global variables in the BTF. This
>> +         increases the size of the BTF information, which increases memory
>> +         usage at runtime. With global variable types available, runtime
>> +         debugging and tracers may be able to provide more detail.
>
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
>
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.

Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
that idea a while back for an older version of this feature:

https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

We can dust that off and include it for a new version of this series.
I'd be curious of what you'd like to see for kernel modules? A
three-level tree would be too complex, in my opinion.

As a separate note for this patch series, we discovered that variables
declared twice, where one is declared "__weak", will result in two DWARF
variable declarations, and thus two BTF variables. This trips up the BTF
validation code. So this series as it is cannot move forward. I'm
submitting a fix to dwarves today.

Thanks,
Stephen
Alexei Starovoitov Feb. 14, 2025, 1:18 a.m. UTC | #3
On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> +       bool "Generate BTF type information for all global variables"
> >> +       default y
> >> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> +       help
> >> +         Include type information for all global variables in the BTF. This
> >> +         increases the size of the BTF information, which increases memory
> >> +         usage at runtime. With global variable types available, runtime
> >> +         debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
>
> Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
> that idea a while back for an older version of this feature:
>
> https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

Right, vmlinux_extra was discussed in various contexts, so let's make it happen.

> We can dust that off and include it for a new version of this series.
> I'd be curious of what you'd like to see for kernel modules? A
> three-level tree would be too complex, in my opinion.

What is the use case for vars in kernel modules?

> module BTF size increased by 53.2%.

This is the sum of all mods with vars divided by
the sum of all mods without?
Any outliers there?
I would expect modules to have few global variables.

So before we decide on what to do with vars in mods, let's figure out
the need.
Stephen Brennan Feb. 18, 2025, 11:09 p.m. UTC | #4
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
[...]
>> We can dust that off and include it for a new version of this series.
>> I'd be curious of what you'd like to see for kernel modules? A
>> three-level tree would be too complex, in my opinion.
>
> What is the use case for vars in kernel modules?

The use case would be the same as for the core kernel. My primary
motivation is to allow drgn to understand the types of global variables,
and that extends to kernel modules too.

>> module BTF size increased by 53.2%.
>
> This is the sum of all mods with vars divided by
> the sum of all mods without?

That was a poorly done comparison, so let me provide this one that I did
using 6.13 and these patches. It was essentially a localmodconfig for a
VM instance, so I could still do better by picking a popular
distribution config. But I think this is far more representative.

MODULE                   BASE   COMP    CHG     PCT
drm.ko                   115833 123410  7577    6.54%
iscsi_boot_sysfs.ko      2627   5380    2753    104.80%
joydev.ko                1816   2289    473     26.05%
libcxgbi.ko              24556  25266   710     2.89%
drm_vram_helper.ko       22325  22751   426     1.91%
nvme-tcp.ko              25044  25973   929     3.71%
vfat.ko                  3448   3953    505     14.65%
btrfs.ko                 275139 343686  68547   24.91%
libiscsi.ko              21177  21977   800     3.78%
xt_owner.ko              449    803     354     78.84%
nft_ct.ko                4912   6157    1245    25.35%
iscsi_ibft.ko            3967   4463    496     12.50%
pcspkr.ko                283    682     399     140.99%
crc32-pclmul.ko          390    771     381     97.69%
nf_conntrack.ko          23686  28191   4505    19.02%
iscsi_tcp.ko             16827  17750   923     5.49%
nft_fib.ko               835    1117    282     33.77%
nf_reject_ipv6.ko        699    981     282     40.34%
rfkill.ko                4233   6410    2177    51.43%
dm-region-hash.ko        6214   6496    282     4.54%
cxgb3i.ko                35469  37078   1609    4.54%
dm-mirror.ko             7576   8191    615     8.12%
pvpanic-pci.ko           174    574     400     229.89%
crct10dif-pclmul.ko      146    525     379     259.59%
nvme-fabrics.ko          17341  18124   783     4.52%
kvm-amd.ko               47302  51914   4612    9.75%
crc8.ko                  221    405     184     83.26%
ib_iser.ko               27769  29116   1347    4.85%
sg.ko                    4234   5656    1422    33.59%
intel_rapl_common.ko     5678   8446    2768    48.75%
bochs.ko                 35643  36997   1354    3.80%
sha1-ssse3.ko            790    1305    515     65.19%
kvm-intel.ko             53802  59220   5418    10.07%
nft_chain_nat.ko         279    714     435     155.91%
vmlinux                  5484970  7330096  1845126  33.64%
sha256-ssse3.ko          851    1378    527     61.93%
nf_nat.ko                6341   7240    899     14.18%
configs.ko               72     256     184     255.56%
xt_comment.ko            151    507     356     235.76%
ccp.ko                   30433  34782   4349    14.29%
cxgb3.ko                 44981  47504   2523    5.61%
crypto_simd.ko           1331   1613    282     21.19%
iptable_filter.ko        855    1456    601     70.29%
qedi.ko                  70653  72786   2133    3.02%
drm_kms_helper.ko        63238  65000   1762    2.79%
cnic.ko                  117074 117790  716     0.61%
failover.ko              780    1216    436     55.90%
nft_redir.ko             874    1529    655     74.94%
serio_raw.ko             708    1234    526     74.29%
nf_defrag_ipv6.ko        1520   2253    733     48.22%
nf_defrag_ipv4.ko        306    770     464     151.63%
nft_reject_ipv4.ko       517    939     422     81.62%
nft_nat.ko               1192   1732    540     45.30%
nft_reject_inet.ko       554    976     422     76.17%
fuse.ko                  32181  41859   9678    30.07%
nft_compat.ko            3705   4404    699     18.87%
zstd_compress.ko         42597  43622   1025    2.41%
tls.ko                   15140  20683   5543    36.61%
virtio_pci.ko            8456   9193    737     8.72%
blake2b_generic.ko       1364   1699    335     24.56%
cryptd.ko                3697   4297    600     16.23%
xor.ko                   1358   1879    521     38.37%
intel_rapl_msr.ko        2851   3440    589     20.66%
kvm.ko                   177060 256377  79317   44.80%
cxgb4.ko                 215865 220844  4979    2.31%
bnx2i.ko                 39524  41477   1953    4.94%
dm-round-robin.ko        1795   2123    328     18.27%
virtio_pci_legacy_dev.ko 909    1191    282     31.02%
qla4xxx.ko               79040  82694   3654    4.62%
nfs.ko                   108350 169642  61292   56.57%
libata.ko                47301  66188   18887   39.93%
ghash-clmulni-intel.ko   578    997     419     72.49%
nf_reject_ipv4.ko        706    988     282     39.94%
nft_reject.ko            820    1196    376     45.85%
sunrpc.ko                127496 197841  70345   55.17%
nft_fib_ipv4.ko          803    1257    454     56.54%
scsi_transport_iscsi.ko  40419  57633   17214   42.59%
lockd.ko                 36144  42137   5993    16.58%
drm_shmem_helper.ko      32555  33043   488     1.50%
nvme-core.ko             50275  58298   8023    15.96%
iw_cm.ko                 13405  14796   1391    10.38%
mdio.ko                  857    1041    184     21.47%
bnx2.ko                  20354  21611   1257    6.18%
net_failover.ko          1742   2187    445     25.55%
ip_set.ko                11812  13093   1281    10.84%
libcxgb.ko               8698   8980    282     3.24%
dm-multipath.ko          8124   8898    774     9.53%
grace.ko                 462    890     428     92.64%
virtio_net.ko            12322  14896   2574    20.89%
qed.ko                   228735 232231  3496    1.53%
cdc-acm.ko               2923   3679    756     25.86%
i2c-piix4.ko             1124   2341    1217    108.27%
pvpanic-mmio.ko          177    625     448     253.11%
virtio_scsi.ko           3154   3898    744     23.59%
uio.ko                   2602   4295    1693    65.07%
nft_fib_ipv6.ko          956    1410    454     47.49%
cec.ko                   28370  29266   896     3.16%
qemu_fw_cfg.ko           1601   3476    1875    117.11%
ttm.ko                   23672  25727   2055    8.68%
sd_mod.ko                9976   13030   3054    30.61%
xfs.ko                   574594 926637  352043  61.27%
libiscsi_tcp.ko          17444  17911   467     2.68%
ib_cm.ko                 32324  62373   30049   92.96%
aesni-intel.ko           3370   4922    1552    46.05%
drm_client_lib.ko        27449  27794   345     1.26%
virtio_pci_modern_dev.ko 2537   2819    282     11.12%
rdma_cm.ko               32504  51823   19319   59.44%
fat.ko                   11958  13297   1339    11.20%
dm-log.ko                6529   6986    457     7.00%
pata_acpi.ko             9231   9700    469     5.08%
ata_piix.ko              10998  12598   1600    14.55%
ipt_REJECT.ko            956    1311    355     37.13%
drm_ttm_helper.ko        33160  33544   384     1.16%
be2iscsi.ko              55078  56993   1915    3.48%
i2c-smbus.ko             582    973     391     67.18%
cuse.ko                  8435   9241    806     9.56%
nft_fib_inet.ko          579    995     416     71.85%
ib_core.ko               103656 123701  20045   19.34%
pulse8-cec.ko            9153   9890    737     8.05%
pvpanic.ko               494    1087    593     120.04%
dm-mod.ko                31377  35265   3888    12.39%
raid6_pq.ko              2774   4207    1433    51.66%
nft_reject_ipv6.ko       517    939     422     81.62%
cxgb4i.ko                47490  49021   1531    3.22%
ata_generic.ko           9008   9666    658     7.30%
vboxvideo.ko             47622  48844   1222    2.57%
ip_tables.ko             3109   3564    455     14.63%

ALL MODS                 9153268   11895301  2742033  29.96%
vmlinux                  5484970   7330096   1845126  33.64%
TOTAL                    14638238  19225397  4587159  31.34%

So this shows an increase of about 1.8 MB (33.6%) in vmlinux BTF size.
And for these modules in aggregate, an increase of about 2.7 MB (30.0%).
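
For anyone who wants to check the arithmetic, here is a minimal sketch of how
the CHG and PCT columns follow from BASE and COMP. It is not the script used
to produce the table; the helper name `btf_growth` is made up for
illustration, and the sizes are simply copied from the rows above.

```python
# Sketch: recompute the CHG and PCT columns from BASE and COMP sizes (bytes).
# `btf_growth` is a hypothetical helper name, not part of any existing tool.

def btf_growth(base: int, comp: int) -> tuple:
    """Return (change in bytes, percentage change rounded to 2 places)."""
    change = comp - base
    return change, round(100.0 * change / base, 2)

# (BASE, COMP) pairs copied from the summary rows of the table.
rows = {
    "ALL MODS": (9153268, 11895301),
    "vmlinux": (5484970, 7330096),
}

for name, (base, comp) in rows.items():
    change, pct = btf_growth(base, comp)
    print(f"{name:<10} {base:>9} {comp:>9} {change:>8} {pct:.2f}%")

# TOTAL is the column-wise sum of the rows, not an average of percentages.
total_base = sum(base for base, _ in rows.values())
total_comp = sum(comp for _, comp in rows.values())
total_change, total_pct = btf_growth(total_base, total_comp)
print(f"{'TOTAL':<10} {total_base:>9} {total_comp:>9} {total_change:>8} {total_pct:.2f}%")
```

The same helper applied to a tiny module such as configs.ko (72 to 256 bytes)
gives a change of only 184 bytes but a percentage above 250%, which is why the
PCT column alone is a poor guide for small modules.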

> Any outliers there?
> I would expect modules to have few global variables.

In terms of outliers, there are two groups that stand out to me:

1. Large percentage increases are almost always for modules that had
very tiny BTF before. The module system inherently creates a few
global variables for each module, so there's always a small constant
increase in BTF size (184 bytes, as far as I can tell), and for those
modules it can be quite a large percentage. Here's an example,
"configs.ko", which comes from enabling CONFIG_IKCONFIG:

BEFORE:
    $ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
    [127877] CONST '(anon)' type_id=11124
    [127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
    [127879] CONST '(anon)' type_id=127878

AFTER:
    $ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
    [162827] CONST '(anon)' type_id=11124
    [162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
    [162829] CONST '(anon)' type_id=162828
    [162830] VAR '____versions' type_id=162829, linkage=static
    [162831] DATASEC '__versions' size=64 vlen=1
            type_id=162830 offset=0 size=64 (VAR '____versions')
    [162832] VAR 'orc_header' type_id=8667, linkage=static
    [162833] DATASEC '.orc_header' size=20 vlen=1
            type_id=162832 offset=0 size=20 (VAR 'orc_header')
    [162834] VAR '__this_module' type_id=312, linkage=global
    [162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
            type_id=162834 offset=0 size=1344 (VAR '__this_module')

What is interesting, I think, is that the types in that module were
totally useless to begin with, because they were used by a variable
which didn't even get emitted. So while this is a substantial
percentage increase, I think it's a net improvement for this and
other modules.
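
To make the per-module overhead easier to see, the dump can be tallied by BTF
kind. The following is only a sketch: it assumes the plain-text layout shown
in the AFTER dump above (one "[id] KIND ..." line per type, with DATASEC
members on indented lines), and the function name `count_kinds` is invented
here.

```python
import re
from collections import Counter

# Text copied from the AFTER dump of configs.ko shown above.
DUMP = """\
[162827] CONST '(anon)' type_id=11124
[162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
[162829] CONST '(anon)' type_id=162828
[162830] VAR '____versions' type_id=162829, linkage=static
[162831] DATASEC '__versions' size=64 vlen=1
        type_id=162830 offset=0 size=64 (VAR '____versions')
[162832] VAR 'orc_header' type_id=8667, linkage=static
[162833] DATASEC '.orc_header' size=20 vlen=1
        type_id=162832 offset=0 size=20 (VAR 'orc_header')
[162834] VAR '__this_module' type_id=312, linkage=global
[162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
        type_id=162834 offset=0 size=1344 (VAR '__this_module')
"""

# A top-level entry starts with "[id] KIND"; indented member lines do not match.
ENTRY_RE = re.compile(r"^\[(\d+)\]\s+(\w+)")

def count_kinds(dump_text: str) -> Counter:
    """Count BTF entries per kind (VAR, DATASEC, ...) in a text dump."""
    kinds = Counter()
    for line in dump_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            kinds[match.group(2)] += 1
    return kinds

kinds = count_kinds(DUMP)
for kind, count in sorted(kinds.items()):
    print(f"{kind:<8} {count}")
```

For this module the new entries are three VARs and the three DATASECs that
contain them, on top of the three types that were already present.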

2. The largest absolute increases come from large, complex modules like
xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
declarations. What is disappointing is how much of this is due to
automatically-generated "variables" from macros (e.g. tracepoints).
Here is a list of variable prefixes like that:

  print_fmt_*
  trace_event_fields_*
  trace_event_type_funcs_*
  event_*
  __SCK__tp_func_*
  __bpf_trace_tp_map_*
  __event_*
  event_class_*
  TRACE_SYSTEM_*
  __TRACE_SYSTEM_*
  __tracepoint_*

These are, unfortunately, all valid declarations produced by macros and
they correspond to valid symbols as well. If you look at the kallsyms
for the modules (and core kernel), these variables are present there as
well. It may indeed make sense to have kallsyms entries for them: I
don't know.

These are all, as far as I'm concerned, totally uninteresting types. If
you want to access any of this data, you probably already know its type
and wouldn't need a BTF declaration. Unfortunately, the flip side is
that I don't think we have a good way to automatically detect these,
outside of prefix matching, which quickly goes out of date as the kernel
changes, and can have false positives as well. For kernel modules, many
of these may appear in separate ELF sections, but for vmlinux, they
don't. I'd be happy to eliminate types for these auto-generated kinds of
variables, if we could somehow annotate them so that pahole knows to
ignore them. For instance, maybe we could use

__attribute__((btf_decl_tag("btf_omit")))

as an instruction to pahole to omit declarations for these things?
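
To illustrate why prefix matching is fragile, here is a small sketch of the
kind of filter it would take. This is not what pahole does; the prefix tuple
is the list from above, while the function name and the sample variable names
are made up for the example.

```python
# Sketch of a prefix-based filter for macro-generated variables.
# This is an illustration of the approach, not pahole's behavior.
AUTOGEN_PREFIXES = (
    "print_fmt_",
    "trace_event_fields_",
    "trace_event_type_funcs_",
    "event_",
    "__SCK__tp_func_",
    "__bpf_trace_tp_map_",
    "__event_",
    "event_class_",
    "TRACE_SYSTEM_",
    "__TRACE_SYSTEM_",
    "__tracepoint_",
)

def looks_autogenerated(name: str) -> bool:
    """Return True if the variable name starts with a known generated prefix."""
    return name.startswith(AUTOGEN_PREFIXES)

# Sample names are hypothetical and only chosen to show each outcome.
samples = [
    "__tracepoint_example_tp",  # generated by a tracepoint macro: filtered
    "jiffies",                  # ordinary variable: kept
    "event_list",               # hand-written name that collides with "event_"
]

for name in samples:
    verdict = "omit" if looks_autogenerated(name) else "keep"
    print(f"{name:<26} {verdict}")
```

The last sample shows the false-positive problem: a short prefix such as
"event_" also matches any hand-written variable that happens to start the
same way, and the list has to be updated by hand whenever the macros change.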

Thanks,
Stephen

> So before we decide on what to do with vars in mods, let's figure out
> the need.

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06f..3fbdc5ba2d017 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -409,6 +409,16 @@  config PAHOLE_HAS_LANG_EXCLUDE
 	  otherwise it would emit malformed kernel and module binaries when
 	  using DEBUG_INFO_BTF_MODULES.
 
+config DEBUG_INFO_BTF_GLOBAL_VARS
+	bool "Generate BTF type information for all global variables"
+	default y
+	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
+	help
+	  Include type information for all global variables in the BTF. This
+	  increases the size of the BTF information, which increases memory
+	  usage at runtime. With global variable types available, runtime
+	  debugging and tracers may be able to provide more detail.
+
 config DEBUG_INFO_BTF_MODULES
 	bool "Generate BTF type information for kernel modules"
 	default y
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index c3cbeb13de503..ad3c05a96a010 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -31,5 +31,8 @@  endif
 
 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
 
+# Requires v1.28 or later, enforced by Kconfig
+pahole-flags-$(CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS)	+= --btf_features=global_var
+
 export PAHOLE_FLAGS := $(pahole-flags-y)
 export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)