diff mbox series

[RFC,bpf-next,1/2] bpf: support standalone BTF in modules

Message ID 1667577487-9162-2-git-send-email-alan.maguire@oracle.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series bpf: standalone BTF support for modules | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR pending PR summary
bpf/vmtest-bpf-next-VM_Test-1 pending Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on x86_64 with llvm-16
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Alan Maguire Nov. 4, 2022, 3:58 p.m. UTC
Not all kernel modules can be built in-tree when the core
kernel is built. This presents a problem for split BTF, because
split module BTF refers to type ids in the base kernel BTF, and
if that base kernel BTF changes (even in minor ways) those
references become invalid.  Such modules then cannot take
advantage of BTF (or at least they only can until the kernel
changes enough to invalidate their vmlinux type id references).
This problem has been discussed before, and the initial approach
was to allow BTF mismatch but fail to load BTF.  See [1]
for more discussion.

Generating standalone BTF for modules helps solve this problem
because the BTF generated is self-referential only.  However,
tooling is geared towards split BTF - for example bpftool assumes
a module's BTF is defined relative to vmlinux BTF.  To handle
this, dynamic remapping of standalone BTF is done on module
load to make it appear like split BTF - type ids and string
offsets are remapped such that they appear as they would in
split BTF.  It just so happens that the BTF is self-referential.
With this approach, existing tooling works with standalone
module BTF from /sys/kernel/btf in the same way as before;
no knowledge of split versus standalone BTF is required.

Currently, the approach taken is to assume that the BTF
associated with a module is split BTF.  If however the
checking of types fails, we fall back to interpreting it as
standalone BTF and carrying out remapping.  As discussed in [1]
there are some heuristics we could use to identify standalone
versus split module BTF, but for now the simplistic fallback
method is used.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>

[1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
---
 kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

Comments

Alexei Starovoitov Nov. 5, 2022, 10:54 p.m. UTC | #1
On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Not all kernel modules can be built in-tree when the core
> kernel is built. This presents a problem for split BTF, because
> split module BTF refers to type ids in the base kernel BTF, and
> if that base kernel BTF changes (even in minor ways) those
> references become invalid.  Such modules then cannot take
> advantage of BTF (or at least they only can until the kernel
> changes enough to invalidate their vmlinux type id references).
> This problem has been discussed before, and the initial approach
> was to allow BTF mismatch but fail to load BTF.  See [1]
> for more discussion.
>
> Generating standalone BTF for modules helps solve this problem
> because the BTF generated is self-referential only.  However,
> tooling is geared towards split BTF - for example bpftool assumes
> a module's BTF is defined relative to vmlinux BTF.  To handle
> this, dynamic remapping of standalone BTF is done on module
> load to make it appear like split BTF - type ids and string
> offsets are remapped such that they appear as they would in
> split BTF.  It just so happens that the BTF is self-referential.
> With this approach, existing tooling works with standalone
> module BTF from /sys/kernel/btf in the same way as before;
> no knowledge of split versus standalone BTF is required.
>
> Currently, the approach taken is to assume that the BTF
> associated with a module is split BTF.  If however the
> checking of types fails, we fall back to interpreting it as
> standalone BTF and carrying out remapping.  As discussed in [1]
> there are some heuristics we could use to identify standalone
> versus split module BTF, but for now the simplistic fallback
> method is used.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>
> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
> ---
>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 5579ff3..5efdcaf 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
>
>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>
> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
> +{
> +       return name_off + btf->start_str_off;
> +}
> +
> +static u32 btf_id_renumber(struct btf *btf, u32 id)
> +{
> +       /* no need to renumber void */
> +       if (id == 0)
> +               return id;
> +       return id + btf->start_id - 1;
> +}
> +
> +/* Renumber standalone BTF to appear as split BTF; name offsets must
> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
> + * When user sees BTF it will appear as normal module split BTF, the only
> + * difference being it is fully self-referential and does not refer back
> + * to vmlinux BTF (aside from 0 "void" references).
> + */
> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
> +{
> +       struct btf_var_secinfo *secinfo;
> +       struct btf *btf = env->btf;
> +       struct btf_member *member;
> +       struct btf_param *param;
> +       struct btf_array *array;
> +       struct btf_enum64 *e64;
> +       struct btf_enum *e;
> +       int i;
> +
> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
> +
> +       switch (BTF_INFO_KIND(t->info)) {
> +       case BTF_KIND_INT:
> +       case BTF_KIND_FLOAT:
> +       case BTF_KIND_TYPE_TAG:
> +               /* nothing to renumber here, no type references */
> +               break;
> +       case BTF_KIND_PTR:
> +       case BTF_KIND_FWD:
> +       case BTF_KIND_TYPEDEF:
> +       case BTF_KIND_VOLATILE:
> +       case BTF_KIND_CONST:
> +       case BTF_KIND_RESTRICT:
> +       case BTF_KIND_FUNC:
> +       case BTF_KIND_VAR:
> +       case BTF_KIND_DECL_TAG:
> +               /* renumber the referenced type */
> +               t->type = btf_id_renumber(btf, t->type);
> +               break;
> +       case BTF_KIND_ARRAY:
> +               array = btf_array(t);
> +               array->type = btf_id_renumber(btf, array->type);
> +               array->index_type = btf_id_renumber(btf, array->index_type);
> +               break;
> +       case BTF_KIND_STRUCT:
> +       case BTF_KIND_UNION:
> +               member = (struct btf_member *)(t + 1);
> +               for (i = 0; i < btf_type_vlen(t); i++) {
> +                       member->type = btf_id_renumber(btf, member->type);
> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
> +                       member++;
> +               }
> +               break;
> +       case BTF_KIND_FUNC_PROTO:
> +               param = (struct btf_param *)(t + 1);
> +               for (i = 0; i < btf_type_vlen(t); i++) {
> +                       param->type = btf_id_renumber(btf, param->type);
> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
> +                       param++;
> +               }
> +               break;
> +       case BTF_KIND_DATASEC:
> +               secinfo = (struct btf_var_secinfo *)(t + 1);
> +               for (i = 0; i < btf_type_vlen(t); i++) {
> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
> +                       secinfo++;
> +               }
> +               break;
> +       case BTF_KIND_ENUM:
> +               e = (struct btf_enum *)(t + 1);
> +               for (i = 0; i < btf_type_vlen(t); i++) {
> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
> +                       e++;
> +               }
> +               break;
> +       case BTF_KIND_ENUM64:
> +               e64 = (struct btf_enum64 *)(t + 1);
> +               for (i = 0; i < btf_type_vlen(t); i++) {
> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
> +                       e64++;
> +               }
> +               break;
> +       }
> +}
> +
> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
> +{
> +       struct btf *btf = env->btf;
> +       int i;
> +
> +       btf->start_id = base_btf->nr_types;
> +       btf->start_str_off = base_btf->hdr.str_len;
> +
> +       for (i = 0; i < btf->nr_types; i++)
> +               btf_type_renumber(env, btf->types[i]);
> +}
> +
>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
>  {
>         struct btf_verifier_env *env = NULL;
>         struct bpf_verifier_log *log;
>         struct btf *btf = NULL, *base_btf;
> +       bool standalone = false;
>         int err;
>
>         base_btf = bpf_get_btf_vmlinux();
> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
>                 goto errout;
>
>         err = btf_check_all_metas(env);
> +       if (err) {
> +               /* BTF may be standalone; in that case meta checks will
> +                * fail and we fall back to standalone BTF processing.
> +                * Later on, once we have checked all metas, we will
> +                * retain start id from  base BTF so it will look like
> +                * split BTF (but is self-contained); renumbering is done
> +                * also to give the split BTF-like appearance and not
> +                * confuse pahole which assumes split BTF for modules.
> +                */
> +               btf->base_btf = NULL;
> +               if (btf->types)
> +                       kvfree(btf->types);
> +               btf->types = NULL;
> +               btf->types_size = 0;
> +               btf->start_id = 0;
> +               btf->nr_types = 0;
> +               btf->start_str_off = 0;
> +               standalone = true;
> +               err = btf_check_all_metas(env);
> +       }

Interesting idea!
Instead of failing the first time, how about we make
such standalone module BTFs explicit?
Some flag or special type?
Then the kernel just checks that and renumbers right away.

And combining this with what we discussed in the other thread
to load vmlinux's BTF through the module...
We'd need two flags in BTF to address both cases.
One to load base vmlinux BTF from a module and
another to load standalone module BTF.
Maybe add another BTF_KIND and it will be first in all types ?
Or use DATASEC with a special name ?
Alan Maguire Nov. 7, 2022, 4:37 p.m. UTC | #2
On 05/11/2022 22:54, Alexei Starovoitov wrote:
> On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Not all kernel modules can be built in-tree when the core
>> kernel is built. This presents a problem for split BTF, because
>> split module BTF refers to type ids in the base kernel BTF, and
>> if that base kernel BTF changes (even in minor ways) those
>> references become invalid.  Such modules then cannot take
>> advantage of BTF (or at least they only can until the kernel
>> changes enough to invalidate their vmlinux type id references).
>> This problem has been discussed before, and the initial approach
>> was to allow BTF mismatch but fail to load BTF.  See [1]
>> for more discussion.
>>
>> Generating standalone BTF for modules helps solve this problem
>> because the BTF generated is self-referential only.  However,
>> tooling is geared towards split BTF - for example bpftool assumes
>> a module's BTF is defined relative to vmlinux BTF.  To handle
>> this, dynamic remapping of standalone BTF is done on module
>> load to make it appear like split BTF - type ids and string
>> offsets are remapped such that they appear as they would in
>> split BTF.  It just so happens that the BTF is self-referential.
>> With this approach, existing tooling works with standalone
>> module BTF from /sys/kernel/btf in the same way as before;
>> no knowledge of split versus standalone BTF is required.
>>
>> Currently, the approach taken is to assume that the BTF
>> associated with a module is split BTF.  If however the
>> checking of types fails, we fall back to interpreting it as
>> standalone BTF and carrying out remapping.  As discussed in [1]
>> there are some heuristics we could use to identify standalone
>> versus split module BTF, but for now the simplistic fallback
>> method is used.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>
>> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
>> ---
>>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 132 insertions(+)
>>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 5579ff3..5efdcaf 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
>>
>>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>>
>> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
>> +{
>> +       return name_off + btf->start_str_off;
>> +}
>> +
>> +static u32 btf_id_renumber(struct btf *btf, u32 id)
>> +{
>> +       /* no need to renumber void */
>> +       if (id == 0)
>> +               return id;
>> +       return id + btf->start_id - 1;
>> +}
>> +
>> +/* Renumber standalone BTF to appear as split BTF; name offsets must
>> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
>> + * When user sees BTF it will appear as normal module split BTF, the only
>> + * difference being it is fully self-referential and does not refer back
>> + * to vmlinux BTF (aside from 0 "void" references).
>> + */
>> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
>> +{
>> +       struct btf_var_secinfo *secinfo;
>> +       struct btf *btf = env->btf;
>> +       struct btf_member *member;
>> +       struct btf_param *param;
>> +       struct btf_array *array;
>> +       struct btf_enum64 *e64;
>> +       struct btf_enum *e;
>> +       int i;
>> +
>> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
>> +
>> +       switch (BTF_INFO_KIND(t->info)) {
>> +       case BTF_KIND_INT:
>> +       case BTF_KIND_FLOAT:
>> +       case BTF_KIND_TYPE_TAG:
>> +               /* nothing to renumber here, no type references */
>> +               break;
>> +       case BTF_KIND_PTR:
>> +       case BTF_KIND_FWD:
>> +       case BTF_KIND_TYPEDEF:
>> +       case BTF_KIND_VOLATILE:
>> +       case BTF_KIND_CONST:
>> +       case BTF_KIND_RESTRICT:
>> +       case BTF_KIND_FUNC:
>> +       case BTF_KIND_VAR:
>> +       case BTF_KIND_DECL_TAG:
>> +               /* renumber the referenced type */
>> +               t->type = btf_id_renumber(btf, t->type);
>> +               break;
>> +       case BTF_KIND_ARRAY:
>> +               array = btf_array(t);
>> +               array->type = btf_id_renumber(btf, array->type);
>> +               array->index_type = btf_id_renumber(btf, array->index_type);
>> +               break;
>> +       case BTF_KIND_STRUCT:
>> +       case BTF_KIND_UNION:
>> +               member = (struct btf_member *)(t + 1);
>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>> +                       member->type = btf_id_renumber(btf, member->type);
>> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
>> +                       member++;
>> +               }
>> +               break;
>> +       case BTF_KIND_FUNC_PROTO:
>> +               param = (struct btf_param *)(t + 1);
>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>> +                       param->type = btf_id_renumber(btf, param->type);
>> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
>> +                       param++;
>> +               }
>> +               break;
>> +       case BTF_KIND_DATASEC:
>> +               secinfo = (struct btf_var_secinfo *)(t + 1);
>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
>> +                       secinfo++;
>> +               }
>> +               break;
>> +       case BTF_KIND_ENUM:
>> +               e = (struct btf_enum *)(t + 1);
>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
>> +                       e++;
>> +               }
>> +               break;
>> +       case BTF_KIND_ENUM64:
>> +               e64 = (struct btf_enum64 *)(t + 1);
>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
>> +                       e64++;
>> +               }
>> +               break;
>> +       }
>> +}
>> +
>> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
>> +{
>> +       struct btf *btf = env->btf;
>> +       int i;
>> +
>> +       btf->start_id = base_btf->nr_types;
>> +       btf->start_str_off = base_btf->hdr.str_len;
>> +
>> +       for (i = 0; i < btf->nr_types; i++)
>> +               btf_type_renumber(env, btf->types[i]);
>> +}
>> +
>>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
>>  {
>>         struct btf_verifier_env *env = NULL;
>>         struct bpf_verifier_log *log;
>>         struct btf *btf = NULL, *base_btf;
>> +       bool standalone = false;
>>         int err;
>>
>>         base_btf = bpf_get_btf_vmlinux();
>> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
>>                 goto errout;
>>
>>         err = btf_check_all_metas(env);
>> +       if (err) {
>> +               /* BTF may be standalone; in that case meta checks will
>> +                * fail and we fall back to standalone BTF processing.
>> +                * Later on, once we have checked all metas, we will
>> +                * retain start id from  base BTF so it will look like
>> +                * split BTF (but is self-contained); renumbering is done
>> +                * also to give the split BTF-like appearance and not
>> +                * confuse pahole which assumes split BTF for modules.
>> +                */
>> +               btf->base_btf = NULL;
>> +               if (btf->types)
>> +                       kvfree(btf->types);
>> +               btf->types = NULL;
>> +               btf->types_size = 0;
>> +               btf->start_id = 0;
>> +               btf->nr_types = 0;
>> +               btf->start_str_off = 0;
>> +               standalone = true;
>> +               err = btf_check_all_metas(env);
>> +       }
> 
> Interesting idea!
> Instead of failing the first time, how about we make
> such standalone module BTFs explicit?
> Some flag or special type?
> Then the kernel just checks that and renumbers right away.
> 

I was thinking that might be one way to do it, perhaps even
a .BTF_standalone section name or somesuch as a signal we
are dealing with standalone BTF. However I _think_
we can actually determine the module BTF is standalone
without needing to change anything in the toolchain (I 
think adding flags would require that).

If the BTF consists of string offsets all within base BTF
string range, and it contains a definition for a BTF_KIND_INT
called "int", we can infer safely it is standalone BTF I think.
The nice thing is we are guaranteed such an int definition will
be there in every module, thanks to the module init function 
signature. These tests could be put into a btf_is_standalone() 
function, and it would avoid the need to fall back from interpreting
as split BTF, which was messy and polluted logs. If that makes sense,
I'll respin with that change.

> And combining this with what we discussed in the other thread
> to load vmlinux's BTF through the module...
> We'd need two flags in BTF to address both cases.
> One to load base vmlinux BTF from a module and
> another to load standalone module BTF.
> Maybe add another BTF_KIND and it will be first in all types ?
> Or use DATASEC with a special name ?
> 

I'm wondering if we can minimize impact on existing tools for
vmlinux BTF when loaded as a module; even if the module is
called vmlinux_btf, we could special-case how it appears
in /sys/kernel/btf such that it is still represented as 
/sys/kernel/btf/vmlinux I suspect, rather than as its module
name "vmlinux_btf". If only part of the core kernel BTF was
in vmlinux_btf (CONFIG_DEBUG_INFO_BTF=y and say 
CONFIG_DEBUG_INFO_BTF_VARS=m as per [1]) the vmlinux_btf
name would be used to for the extras.

One pain point is that if vmlinux_btf was loaded after other 
modules we'd need to iterate over them to (re)-load 
their BTF, so I think CONFIG_DEBUG_INFO_BTF=m would possibly require
CONFIG_DEBUG_INFO_BTF_MISMATCH=y, otherwise late loading of vmlinux_btf
might block other modules loading, as their split BTF would be invalid
without the base BTF. Even standalone modules would need that to
get their BTF renumbering right. But that all seems doable.

Alan

[1] https://lore.kernel.org/bpf/20221104231103.752040-10-stephen.s.brennan@oracle.com/T/#u
Andrii Nakryiko Nov. 9, 2022, 10:43 p.m. UTC | #3
On Mon, Nov 7, 2022 at 8:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 05/11/2022 22:54, Alexei Starovoitov wrote:
> > On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Not all kernel modules can be built in-tree when the core
> >> kernel is built. This presents a problem for split BTF, because
> >> split module BTF refers to type ids in the base kernel BTF, and
> >> if that base kernel BTF changes (even in minor ways) those
> >> references become invalid.  Such modules then cannot take
> >> advantage of BTF (or at least they only can until the kernel
> >> changes enough to invalidate their vmlinux type id references).
> >> This problem has been discussed before, and the initial approach
> >> was to allow BTF mismatch but fail to load BTF.  See [1]
> >> for more discussion.
> >>
> >> Generating standalone BTF for modules helps solve this problem
> >> because the BTF generated is self-referential only.  However,
> >> tooling is geared towards split BTF - for example bpftool assumes
> >> a module's BTF is defined relative to vmlinux BTF.  To handle
> >> this, dynamic remapping of standalone BTF is done on module
> >> load to make it appear like split BTF - type ids and string
> >> offsets are remapped such that they appear as they would in
> >> split BTF.  It just so happens that the BTF is self-referential.
> >> With this approach, existing tooling works with standalone
> >> module BTF from /sys/kernel/btf in the same way as before;
> >> no knowledge of split versus standalone BTF is required.
> >>
> >> Currently, the approach taken is to assume that the BTF
> >> associated with a module is split BTF.  If however the
> >> checking of types fails, we fall back to interpreting it as
> >> standalone BTF and carrying out remapping.  As discussed in [1]
> >> there are some heuristics we could use to identify standalone
> >> versus split module BTF, but for now the simplistic fallback
> >> method is used.
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >>
> >> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
> >> ---
> >>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 132 insertions(+)
> >>
> >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >> index 5579ff3..5efdcaf 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
> >>
> >>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> >>
> >> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
> >> +{
> >> +       return name_off + btf->start_str_off;
> >> +}
> >> +
> >> +static u32 btf_id_renumber(struct btf *btf, u32 id)
> >> +{
> >> +       /* no need to renumber void */
> >> +       if (id == 0)
> >> +               return id;
> >> +       return id + btf->start_id - 1;
> >> +}
> >> +
> >> +/* Renumber standalone BTF to appear as split BTF; name offsets must
> >> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
> >> + * When user sees BTF it will appear as normal module split BTF, the only
> >> + * difference being it is fully self-referential and does not refer back
> >> + * to vmlinux BTF (aside from 0 "void" references).
> >> + */
> >> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
> >> +{
> >> +       struct btf_var_secinfo *secinfo;
> >> +       struct btf *btf = env->btf;
> >> +       struct btf_member *member;
> >> +       struct btf_param *param;
> >> +       struct btf_array *array;
> >> +       struct btf_enum64 *e64;
> >> +       struct btf_enum *e;
> >> +       int i;
> >> +
> >> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
> >> +
> >> +       switch (BTF_INFO_KIND(t->info)) {
> >> +       case BTF_KIND_INT:
> >> +       case BTF_KIND_FLOAT:
> >> +       case BTF_KIND_TYPE_TAG:
> >> +               /* nothing to renumber here, no type references */
> >> +               break;
> >> +       case BTF_KIND_PTR:
> >> +       case BTF_KIND_FWD:
> >> +       case BTF_KIND_TYPEDEF:
> >> +       case BTF_KIND_VOLATILE:
> >> +       case BTF_KIND_CONST:
> >> +       case BTF_KIND_RESTRICT:
> >> +       case BTF_KIND_FUNC:
> >> +       case BTF_KIND_VAR:
> >> +       case BTF_KIND_DECL_TAG:
> >> +               /* renumber the referenced type */
> >> +               t->type = btf_id_renumber(btf, t->type);
> >> +               break;
> >> +       case BTF_KIND_ARRAY:
> >> +               array = btf_array(t);
> >> +               array->type = btf_id_renumber(btf, array->type);
> >> +               array->index_type = btf_id_renumber(btf, array->index_type);
> >> +               break;
> >> +       case BTF_KIND_STRUCT:
> >> +       case BTF_KIND_UNION:
> >> +               member = (struct btf_member *)(t + 1);
> >> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >> +                       member->type = btf_id_renumber(btf, member->type);
> >> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
> >> +                       member++;
> >> +               }
> >> +               break;
> >> +       case BTF_KIND_FUNC_PROTO:
> >> +               param = (struct btf_param *)(t + 1);
> >> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >> +                       param->type = btf_id_renumber(btf, param->type);
> >> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
> >> +                       param++;
> >> +               }
> >> +               break;
> >> +       case BTF_KIND_DATASEC:
> >> +               secinfo = (struct btf_var_secinfo *)(t + 1);
> >> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
> >> +                       secinfo++;
> >> +               }
> >> +               break;
> >> +       case BTF_KIND_ENUM:
> >> +               e = (struct btf_enum *)(t + 1);
> >> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
> >> +                       e++;
> >> +               }
> >> +               break;
> >> +       case BTF_KIND_ENUM64:
> >> +               e64 = (struct btf_enum64 *)(t + 1);
> >> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
> >> +                       e64++;
> >> +               }
> >> +               break;
> >> +       }
> >> +}
> >> +
> >> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
> >> +{
> >> +       struct btf *btf = env->btf;
> >> +       int i;
> >> +
> >> +       btf->start_id = base_btf->nr_types;
> >> +       btf->start_str_off = base_btf->hdr.str_len;
> >> +
> >> +       for (i = 0; i < btf->nr_types; i++)
> >> +               btf_type_renumber(env, btf->types[i]);
> >> +}
> >> +
> >>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
> >>  {
> >>         struct btf_verifier_env *env = NULL;
> >>         struct bpf_verifier_log *log;
> >>         struct btf *btf = NULL, *base_btf;
> >> +       bool standalone = false;
> >>         int err;
> >>
> >>         base_btf = bpf_get_btf_vmlinux();
> >> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
> >>                 goto errout;
> >>
> >>         err = btf_check_all_metas(env);
> >> +       if (err) {
> >> +               /* BTF may be standalone; in that case meta checks will
> >> +                * fail and we fall back to standalone BTF processing.
> >> +                * Later on, once we have checked all metas, we will
> >> +                * retain start id from  base BTF so it will look like
> >> +                * split BTF (but is self-contained); renumbering is done
> >> +                * also to give the split BTF-like appearance and not
> >> +                * confuse pahole which assumes split BTF for modules.
> >> +                */
> >> +               btf->base_btf = NULL;
> >> +               if (btf->types)
> >> +                       kvfree(btf->types);
> >> +               btf->types = NULL;
> >> +               btf->types_size = 0;
> >> +               btf->start_id = 0;
> >> +               btf->nr_types = 0;
> >> +               btf->start_str_off = 0;
> >> +               standalone = true;
> >> +               err = btf_check_all_metas(env);
> >> +       }
> >
> > Interesting idea!
> > Instead of failing the first time, how about we make
> > such standalone module BTFs explicit?
> > Some flag or special type?
> > Then the kernel just checks that and renumbers right away.
> >
>
> I was thinking that might be one way to do it, perhaps even
> a .BTF_standalone section name or somesuch as a signal we
> are dealing with standalone BTF. However I _think_
> we can actually determine the module BTF is standalone
> without needing to change anything in the toolchain (I
> think adding flags would require that).

Why not just extend btf_header to contains extra information where we
can record whether it is stand-alone or split, what's the checksum of
base BTF, etc, etc. Yes, we'll need to teach libbpf and some tools
about this v2 of btf_header, but it's also an opportunity to make BTF
a bit more self-describing. E.g., right now there is a pretty big
problem that when we add new BTF_KIND_XXX, no existing tooling will be
able to do anything with BTF that contains that new kind, even if that
kind is completely optional and uninteresting for most tools (e.g., if
some particular tool didn't care about DECL_TAG). So with v2 we can
record a small table that records each kind's size: extra info size
and per-element size (for types that have vlen>0).

More upfront work, but solves few existing problems and we can reserve
space for future fields as well.

WDYT?

>
> If the BTF consists of string offsets all within base BTF
> string range, and it contains a definition for a BTF_KIND_INT
> called "int", we can infer safely it is standalone BTF I think.
> The nice thing is we are guaranteed such an int definition will
> be there in every module, thanks to the module init function
> signature. These tests could be put into a btf_is_standalone()
> function, and it would avoid the need to fall back from interpreting
> as split BTF, which was messy and polluted logs. If that makes sense,
> I'll respin with that change.
>
> > And combining this with what we discussed in the other thread
> > to load vmlinux's BTF through the module...
> > We'd need two flags in BTF to address both cases.
> > One to load base vmlinux BTF from a module and
> > another to load standalone module BTF.
> > Maybe add another BTF_KIND and it will be first in all types ?
> > Or use DATASEC with a special name ?
> >
>
> I'm wondering if we can minimize impact on existing tools for
> vmlinux BTF when loaded as a module; even if the module is
> called vmlinux_btf, we could special-case how it appears
> in /sys/kernel/btf such that it is still represented as
> /sys/kernel/btf/vmlinux I suspect, rather than as its module
> name "vmlinux_btf". If only part of the core kernel BTF was
> in vmlinux_btf (CONFIG_DEBUG_INFO_BTF=y and say
> CONFIG_DEBUG_INFO_BTF_VARS=m as per [1]) the vmlinux_btf
> name would be used to for the extras.
>
> One pain point is that if vmlinux_btf was loaded after other
> modules we'd need to iterate over them to (re)-load
> their BTF, so I think CONFIG_DEBUG_INFO_BTF=m would possibly require
> CONFIG_DEBUG_INFO_BTF_MISMATCH=y, otherwise late loading of vmlinux_btf
> might block other modules loading, as their split BTF would be invalid
> without the base BTF. Even standalone modules would need that to
> get their BTF renumbering right. But that all seems doable.
>
> Alan
>
> [1] https://lore.kernel.org/bpf/20221104231103.752040-10-stephen.s.brennan@oracle.com/T/#u
Alan Maguire Nov. 22, 2022, 5:36 p.m. UTC | #4
On 09/11/2022 22:43, Andrii Nakryiko wrote:
> On Mon, Nov 7, 2022 at 8:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 05/11/2022 22:54, Alexei Starovoitov wrote:
>>> On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> Not all kernel modules can be built in-tree when the core
>>>> kernel is built. This presents a problem for split BTF, because
>>>> split module BTF refers to type ids in the base kernel BTF, and
>>>> if that base kernel BTF changes (even in minor ways) those
>>>> references become invalid.  Such modules then cannot take
>>>> advantage of BTF (or at least they only can until the kernel
>>>> changes enough to invalidate their vmlinux type id references).
>>>> This problem has been discussed before, and the initial approach
>>>> was to allow BTF mismatch but fail to load BTF.  See [1]
>>>> for more discussion.
>>>>
>>>> Generating standalone BTF for modules helps solve this problem
>>>> because the BTF generated is self-referential only.  However,
>>>> tooling is geared towards split BTF - for example bpftool assumes
>>>> a module's BTF is defined relative to vmlinux BTF.  To handle
>>>> this, dynamic remapping of standalone BTF is done on module
>>>> load to make it appear like split BTF - type ids and string
>>>> offsets are remapped such that they appear as they would in
>>>> split BTF.  It just so happens that the BTF is self-referential.
>>>> With this approach, existing tooling works with standalone
>>>> module BTF from /sys/kernel/btf in the same way as before;
>>>> no knowledge of split versus standalone BTF is required.
>>>>
>>>> Currently, the approach taken is to assume that the BTF
>>>> associated with a module is split BTF.  If however the
>>>> checking of types fails, we fall back to interpreting it as
>>>> standalone BTF and carrying out remapping.  As discussed in [1]
>>>> there are some heuristics we could use to identify standalone
>>>> versus split module BTF, but for now the simplistic fallback
>>>> method is used.
>>>>
>>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>>>
>>>> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
>>>> ---
>>>>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 132 insertions(+)
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 5579ff3..5efdcaf 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
>>>>
>>>>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>>>>
>>>> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
>>>> +{
>>>> +       return name_off + btf->start_str_off;
>>>> +}
>>>> +
>>>> +static u32 btf_id_renumber(struct btf *btf, u32 id)
>>>> +{
>>>> +       /* no need to renumber void */
>>>> +       if (id == 0)
>>>> +               return id;
>>>> +       return id + btf->start_id - 1;
>>>> +}
>>>> +
>>>> +/* Renumber standalone BTF to appear as split BTF; name offsets must
>>>> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
>>>> + * When user sees BTF it will appear as normal module split BTF, the only
>>>> + * difference being it is fully self-referential and does not refer back
>>>> + * to vmlinux BTF (aside from 0 "void" references).
>>>> + */
>>>> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
>>>> +{
>>>> +       struct btf_var_secinfo *secinfo;
>>>> +       struct btf *btf = env->btf;
>>>> +       struct btf_member *member;
>>>> +       struct btf_param *param;
>>>> +       struct btf_array *array;
>>>> +       struct btf_enum64 *e64;
>>>> +       struct btf_enum *e;
>>>> +       int i;
>>>> +
>>>> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
>>>> +
>>>> +       switch (BTF_INFO_KIND(t->info)) {
>>>> +       case BTF_KIND_INT:
>>>> +       case BTF_KIND_FLOAT:
>>>> +       case BTF_KIND_TYPE_TAG:
>>>> +               /* nothing to renumber here, no type references */
>>>> +               break;
>>>> +       case BTF_KIND_PTR:
>>>> +       case BTF_KIND_FWD:
>>>> +       case BTF_KIND_TYPEDEF:
>>>> +       case BTF_KIND_VOLATILE:
>>>> +       case BTF_KIND_CONST:
>>>> +       case BTF_KIND_RESTRICT:
>>>> +       case BTF_KIND_FUNC:
>>>> +       case BTF_KIND_VAR:
>>>> +       case BTF_KIND_DECL_TAG:
>>>> +               /* renumber the referenced type */
>>>> +               t->type = btf_id_renumber(btf, t->type);
>>>> +               break;
>>>> +       case BTF_KIND_ARRAY:
>>>> +               array = btf_array(t);
>>>> +               array->type = btf_id_renumber(btf, array->type);
>>>> +               array->index_type = btf_id_renumber(btf, array->index_type);
>>>> +               break;
>>>> +       case BTF_KIND_STRUCT:
>>>> +       case BTF_KIND_UNION:
>>>> +               member = (struct btf_member *)(t + 1);
>>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>>>> +                       member->type = btf_id_renumber(btf, member->type);
>>>> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
>>>> +                       member++;
>>>> +               }
>>>> +               break;
>>>> +       case BTF_KIND_FUNC_PROTO:
>>>> +               param = (struct btf_param *)(t + 1);
>>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>>>> +                       param->type = btf_id_renumber(btf, param->type);
>>>> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
>>>> +                       param++;
>>>> +               }
>>>> +               break;
>>>> +       case BTF_KIND_DATASEC:
>>>> +               secinfo = (struct btf_var_secinfo *)(t + 1);
>>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>>>> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
>>>> +                       secinfo++;
>>>> +               }
>>>> +               break;
>>>> +       case BTF_KIND_ENUM:
>>>> +               e = (struct btf_enum *)(t + 1);
>>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>>>> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
>>>> +                       e++;
>>>> +               }
>>>> +               break;
>>>> +       case BTF_KIND_ENUM64:
>>>> +               e64 = (struct btf_enum64 *)(t + 1);
>>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
>>>> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
>>>> +                       e64++;
>>>> +               }
>>>> +               break;
>>>> +       }
>>>> +}
>>>> +
>>>> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
>>>> +{
>>>> +       struct btf *btf = env->btf;
>>>> +       int i;
>>>> +
>>>> +       btf->start_id = base_btf->nr_types;
>>>> +       btf->start_str_off = base_btf->hdr.str_len;
>>>> +
>>>> +       for (i = 0; i < btf->nr_types; i++)
>>>> +               btf_type_renumber(env, btf->types[i]);
>>>> +}
>>>> +
>>>>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
>>>>  {
>>>>         struct btf_verifier_env *env = NULL;
>>>>         struct bpf_verifier_log *log;
>>>>         struct btf *btf = NULL, *base_btf;
>>>> +       bool standalone = false;
>>>>         int err;
>>>>
>>>>         base_btf = bpf_get_btf_vmlinux();
>>>> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
>>>>                 goto errout;
>>>>
>>>>         err = btf_check_all_metas(env);
>>>> +       if (err) {
>>>> +               /* BTF may be standalone; in that case meta checks will
>>>> +                * fail and we fall back to standalone BTF processing.
>>>> +                * Later on, once we have checked all metas, we will
>>>> +                * retain start id from  base BTF so it will look like
>>>> +                * split BTF (but is self-contained); renumbering is done
>>>> +                * also to give the split BTF-like appearance and not
>>>> +                * confuse pahole which assumes split BTF for modules.
>>>> +                */
>>>> +               btf->base_btf = NULL;
>>>> +               if (btf->types)
>>>> +                       kvfree(btf->types);
>>>> +               btf->types = NULL;
>>>> +               btf->types_size = 0;
>>>> +               btf->start_id = 0;
>>>> +               btf->nr_types = 0;
>>>> +               btf->start_str_off = 0;
>>>> +               standalone = true;
>>>> +               err = btf_check_all_metas(env);
>>>> +       }
>>>
>>> Interesting idea!
>>> Instead of failing the first time, how about we make
>>> such standalone module BTFs explicit?
>>> Some flag or special type?
>>> Then the kernel just checks that and renumbers right away.
>>>
>>
>> I was thinking that might be one way to do it, perhaps even
>> a .BTF_standalone section name or somesuch as a signal we
>> are dealing with standalone BTF. However I _think_
>> we can actually determine the module BTF is standalone
>> without needing to change anything in the toolchain (I
>> think adding flags would require that).
> 
> Why not just extend btf_header to contains extra information where we
> can record whether it is stand-alone or split, what's the checksum of
> base BTF, etc, etc. Yes, we'll need to teach libbpf and some tools
> about this v2 of btf_header, but it's also an opportunity to make BTF
> a bit more self-describing. E.g., right now there is a pretty big
> problem that when we add new BTF_KIND_XXX, no existing tooling will be
> able to do anything with BTF that contains that new kind, even if that
> kind is completely optional and uninteresting for most tools (e.g., if
> some particular tool didn't care about DECL_TAG). So with v2 we can
> record a small table that records each kind's size: extra info size
> and per-element size (for types that have vlen>0).
> 
> More upfront work, but solves few existing problems and we can reserve
> space for future fields as well.
> 
> WDYT?
>

Sorry Andrii, missed this reply. With respect to self-describing BTF,
I've got a patch series that I was planning on sending out soon which 
approaches this by describing BTF kind encodings in BTF. The handy thing
about that is as you say it allows us to parse BTF even if we don't
actually use features, so when new kinds are added we can skip past
them, but if a new libbpf comes along we can potentially unlock these
features. This is particularly valuable for kernel/module BTF since
the kernel might be around for a while and we would rather encode
BTF optimistically. The other useful thing is it won't itself require
any BTF format changes; it's simply a matter of adding a libbpf call
to pahole which says "encode BTF kinds", and that part is done using
basic BTF kinds like typedefs+structs. The benefit of doing this in 
BTF is that we don't need to worry about header incompatibility etc. 
Just adds a few hundred bytes to overall kernel BTF too, since kind 
encodings are only needed for vmlinux (or standalone) BTF. We also
end up with BTF kind structures in vmlinux.h, which could potentially
simplify BTF introspection in BPF programs in the future also.

The handy thing about this approach is also that the kernel code
that parses the BTF kind descriptions is easy to backport, so we
could even look at backporting that patch to stable kernels such
that they would be an a position to parse newer BTF kinds even
if they could not use them. Because that parsing is based around 
interpreting existing BTF kinds, it is minimally invasive.

I'll send it out as an RFC shortly to provide additional context.

Alan
Andrii Nakryiko Nov. 22, 2022, 6:12 p.m. UTC | #5
On Tue, Nov 22, 2022 at 9:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 09/11/2022 22:43, Andrii Nakryiko wrote:
> > On Mon, Nov 7, 2022 at 8:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 05/11/2022 22:54, Alexei Starovoitov wrote:
> >>> On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>>
> >>>> Not all kernel modules can be built in-tree when the core
> >>>> kernel is built. This presents a problem for split BTF, because
> >>>> split module BTF refers to type ids in the base kernel BTF, and
> >>>> if that base kernel BTF changes (even in minor ways) those
> >>>> references become invalid.  Such modules then cannot take
> >>>> advantage of BTF (or at least they only can until the kernel
> >>>> changes enough to invalidate their vmlinux type id references).
> >>>> This problem has been discussed before, and the initial approach
> >>>> was to allow BTF mismatch but fail to load BTF.  See [1]
> >>>> for more discussion.
> >>>>
> >>>> Generating standalone BTF for modules helps solve this problem
> >>>> because the BTF generated is self-referential only.  However,
> >>>> tooling is geared towards split BTF - for example bpftool assumes
> >>>> a module's BTF is defined relative to vmlinux BTF.  To handle
> >>>> this, dynamic remapping of standalone BTF is done on module
> >>>> load to make it appear like split BTF - type ids and string
> >>>> offsets are remapped such that they appear as they would in
> >>>> split BTF.  It just so happens that the BTF is self-referential.
> >>>> With this approach, existing tooling works with standalone
> >>>> module BTF from /sys/kernel/btf in the same way as before;
> >>>> no knowledge of split versus standalone BTF is required.
> >>>>
> >>>> Currently, the approach taken is to assume that the BTF
> >>>> associated with a module is split BTF.  If however the
> >>>> checking of types fails, we fall back to interpreting it as
> >>>> standalone BTF and carrying out remapping.  As discussed in [1]
> >>>> there are some heuristics we could use to identify standalone
> >>>> versus split module BTF, but for now the simplistic fallback
> >>>> method is used.
> >>>>
> >>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >>>>
> >>>> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
> >>>> ---
> >>>>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 132 insertions(+)
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 5579ff3..5efdcaf 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
> >>>>
> >>>>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> >>>>
> >>>> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
> >>>> +{
> >>>> +       return name_off + btf->start_str_off;
> >>>> +}
> >>>> +
> >>>> +static u32 btf_id_renumber(struct btf *btf, u32 id)
> >>>> +{
> >>>> +       /* no need to renumber void */
> >>>> +       if (id == 0)
> >>>> +               return id;
> >>>> +       return id + btf->start_id - 1;
> >>>> +}
> >>>> +
> >>>> +/* Renumber standalone BTF to appear as split BTF; name offsets must
> >>>> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
> >>>> + * When user sees BTF it will appear as normal module split BTF, the only
> >>>> + * difference being it is fully self-referential and does not refer back
> >>>> + * to vmlinux BTF (aside from 0 "void" references).
> >>>> + */
> >>>> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
> >>>> +{
> >>>> +       struct btf_var_secinfo *secinfo;
> >>>> +       struct btf *btf = env->btf;
> >>>> +       struct btf_member *member;
> >>>> +       struct btf_param *param;
> >>>> +       struct btf_array *array;
> >>>> +       struct btf_enum64 *e64;
> >>>> +       struct btf_enum *e;
> >>>> +       int i;
> >>>> +
> >>>> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
> >>>> +
> >>>> +       switch (BTF_INFO_KIND(t->info)) {
> >>>> +       case BTF_KIND_INT:
> >>>> +       case BTF_KIND_FLOAT:
> >>>> +       case BTF_KIND_TYPE_TAG:
> >>>> +               /* nothing to renumber here, no type references */
> >>>> +               break;
> >>>> +       case BTF_KIND_PTR:
> >>>> +       case BTF_KIND_FWD:
> >>>> +       case BTF_KIND_TYPEDEF:
> >>>> +       case BTF_KIND_VOLATILE:
> >>>> +       case BTF_KIND_CONST:
> >>>> +       case BTF_KIND_RESTRICT:
> >>>> +       case BTF_KIND_FUNC:
> >>>> +       case BTF_KIND_VAR:
> >>>> +       case BTF_KIND_DECL_TAG:
> >>>> +               /* renumber the referenced type */
> >>>> +               t->type = btf_id_renumber(btf, t->type);
> >>>> +               break;
> >>>> +       case BTF_KIND_ARRAY:
> >>>> +               array = btf_array(t);
> >>>> +               array->type = btf_id_renumber(btf, array->type);
> >>>> +               array->index_type = btf_id_renumber(btf, array->index_type);
> >>>> +               break;
> >>>> +       case BTF_KIND_STRUCT:
> >>>> +       case BTF_KIND_UNION:
> >>>> +               member = (struct btf_member *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       member->type = btf_id_renumber(btf, member->type);
> >>>> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
> >>>> +                       member++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_FUNC_PROTO:
> >>>> +               param = (struct btf_param *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       param->type = btf_id_renumber(btf, param->type);
> >>>> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
> >>>> +                       param++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_DATASEC:
> >>>> +               secinfo = (struct btf_var_secinfo *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
> >>>> +                       secinfo++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_ENUM:
> >>>> +               e = (struct btf_enum *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
> >>>> +                       e++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_ENUM64:
> >>>> +               e64 = (struct btf_enum64 *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
> >>>> +                       e64++;
> >>>> +               }
> >>>> +               break;
> >>>> +       }
> >>>> +}
> >>>> +
> >>>> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
> >>>> +{
> >>>> +       struct btf *btf = env->btf;
> >>>> +       int i;
> >>>> +
> >>>> +       btf->start_id = base_btf->nr_types;
> >>>> +       btf->start_str_off = base_btf->hdr.str_len;
> >>>> +
> >>>> +       for (i = 0; i < btf->nr_types; i++)
> >>>> +               btf_type_renumber(env, btf->types[i]);
> >>>> +}
> >>>> +
> >>>>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
> >>>>  {
> >>>>         struct btf_verifier_env *env = NULL;
> >>>>         struct bpf_verifier_log *log;
> >>>>         struct btf *btf = NULL, *base_btf;
> >>>> +       bool standalone = false;
> >>>>         int err;
> >>>>
> >>>>         base_btf = bpf_get_btf_vmlinux();
> >>>> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
> >>>>                 goto errout;
> >>>>
> >>>>         err = btf_check_all_metas(env);
> >>>> +       if (err) {
> >>>> +               /* BTF may be standalone; in that case meta checks will
> >>>> +                * fail and we fall back to standalone BTF processing.
> >>>> +                * Later on, once we have checked all metas, we will
> >>>> +                * retain start id from  base BTF so it will look like
> >>>> +                * split BTF (but is self-contained); renumbering is done
> >>>> +                * also to give the split BTF-like appearance and not
> >>>> +                * confuse pahole which assumes split BTF for modules.
> >>>> +                */
> >>>> +               btf->base_btf = NULL;
> >>>> +               if (btf->types)
> >>>> +                       kvfree(btf->types);
> >>>> +               btf->types = NULL;
> >>>> +               btf->types_size = 0;
> >>>> +               btf->start_id = 0;
> >>>> +               btf->nr_types = 0;
> >>>> +               btf->start_str_off = 0;
> >>>> +               standalone = true;
> >>>> +               err = btf_check_all_metas(env);
> >>>> +       }
> >>>
> >>> Interesting idea!
> >>> Instead of failing the first time, how about we make
> >>> such standalone module BTFs explicit?
> >>> Some flag or special type?
> >>> Then the kernel just checks that and renumbers right away.
> >>>
> >>
> >> I was thinking that might be one way to do it, perhaps even
> >> a .BTF_standalone section name or somesuch as a signal we
> >> are dealing with standalone BTF. However I _think_
> >> we can actually determine the module BTF is standalone
> >> without needing to change anything in the toolchain (I
> >> think adding flags would require that).
> >
> > Why not just extend btf_header to contains extra information where we
> > can record whether it is stand-alone or split, what's the checksum of
> > base BTF, etc, etc. Yes, we'll need to teach libbpf and some tools
> > about this v2 of btf_header, but it's also an opportunity to make BTF
> > a bit more self-describing. E.g., right now there is a pretty big
> > problem that when we add new BTF_KIND_XXX, no existing tooling will be
> > able to do anything with BTF that contains that new kind, even if that
> > kind is completely optional and uninteresting for most tools (e.g., if
> > some particular tool didn't care about DECL_TAG). So with v2 we can
> > record a small table that records each kind's size: extra info size
> > and per-element size (for types that have vlen>0).
> >
> > More upfront work, but solves few existing problems and we can reserve
> > space for future fields as well.
> >
> > WDYT?
> >
>
> Sorry Andrii, missed this reply. With respect to self-describing BTF,
> I've got a patch series that I was planning on sending out soon which
> approaches this by describing BTF kind encodings in BTF. The handy thing
> about that is as you say it allows us to parse BTF even if we don't
> actually use features, so when new kinds are added we can skip past
> them, but if a new libbpf comes along we can potentially unlock these
> features. This is particularly valuable for kernel/module BTF since
> the kernel might be around for a while and we would rather encode
> BTF optimistically. The other useful thing is it won't itself require
> any BTF format changes; it's simply a matter of adding a libbpf call
> to pahole which says "encode BTF kinds", and that part is done using
> basic BTF kinds like typedefs+structs. The benefit of doing this in
> BTF is that we don't need to worry about header incompatibility etc.
> Just adds a few hundred bytes to overall kernel BTF too, since kind
> encodings are only needed for vmlinux (or standalone) BTF. We also
> end up with BTF kind structures in vmlinux.h, which could potentially
> simplify BTF introspection in BPF programs in the future also.
>
> The handy thing about this approach is also that the kernel code
> that parses the BTF kind descriptions is easy to backport, so we
> could even look at backporting that patch to stable kernels such
> that they would be an a position to parse newer BTF kinds even
> if they could not use them. Because that parsing is based around
> interpreting existing BTF kinds, it is minimally invasive.
>
> I'll send it out as an RFC shortly to provide additional context.

I'm not clear how that handles kinds with multiple entities (like
fields for STRUCTs, args for FUNC_PROTO, etc), but I'll look at RFC.
I think a small table of kind ID -> prefix size + per-element size
mappings is useful for any tool that works with BTF, because it allows
you to skip unknown stuff. And it works not just for vmlinux BTF.

>
> Alan
diff mbox series

Patch

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5579ff3..5efdcaf 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5315,11 +5315,120 @@  struct btf *btf_parse_vmlinux(void)
 
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 
+static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
+{
+	return name_off + btf->start_str_off;
+}
+
+static u32 btf_id_renumber(struct btf *btf, u32 id)
+{
+	/* no need to renumber void */
+	if (id == 0)
+		return id;
+	return id + btf->start_id - 1;
+}
+
+/* Renumber standalone BTF to appear as split BTF; name offsets must
+ * be relative to btf->start_str_offset and ids relative to btf->start_id.
+ * When user sees BTF it will appear as normal module split BTF, the only
+ * difference being it is fully self-referential and does not refer back
+ * to vmlinux BTF (aside from 0 "void" references).
+ */
+static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
+{
+	struct btf_var_secinfo *secinfo;
+	struct btf *btf = env->btf;
+	struct btf_member *member;
+	struct btf_param *param;
+	struct btf_array *array;
+	struct btf_enum64 *e64;
+	struct btf_enum *e;
+	int i;
+
+	t->name_off = btf_name_off_renumber(btf, t->name_off);
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_FLOAT:
+	case BTF_KIND_TYPE_TAG:
+		/* nothing to renumber here, no type references */
+		break;
+	case BTF_KIND_PTR:
+	case BTF_KIND_FWD:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_CONST:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_FUNC:
+	case BTF_KIND_VAR:
+	case BTF_KIND_DECL_TAG:
+		/* renumber the referenced type */
+		t->type = btf_id_renumber(btf, t->type);
+		break;
+	case BTF_KIND_ARRAY:
+		array = btf_array(t);
+		array->type = btf_id_renumber(btf, array->type);
+		array->index_type = btf_id_renumber(btf, array->index_type);
+		break;
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+		member = (struct btf_member *)(t + 1);
+		for (i = 0; i < btf_type_vlen(t); i++) {
+			member->type = btf_id_renumber(btf, member->type);
+			member->name_off = btf_name_off_renumber(btf, member->name_off);
+			member++;
+		}
+		break;
+	case BTF_KIND_FUNC_PROTO:
+		param = (struct btf_param *)(t + 1);
+		for (i = 0; i < btf_type_vlen(t); i++) {
+			param->type = btf_id_renumber(btf, param->type);
+			param->name_off = btf_name_off_renumber(btf, param->name_off);
+			param++;
+		}
+		break;
+	case BTF_KIND_DATASEC:
+		secinfo = (struct btf_var_secinfo *)(t + 1);
+		for (i = 0; i < btf_type_vlen(t); i++) {
+			secinfo->type = btf_id_renumber(btf, secinfo->type);
+			secinfo++;
+		}
+		break;
+	case BTF_KIND_ENUM:
+		e = (struct btf_enum *)(t + 1);
+		for (i = 0; i < btf_type_vlen(t); i++) {
+			e->name_off = btf_name_off_renumber(btf, e->name_off);
+			e++;
+		}
+		break;
+	case BTF_KIND_ENUM64:
+		e64 = (struct btf_enum64 *)(t + 1);
+		for (i = 0; i < btf_type_vlen(t); i++) {
+			e64->name_off = btf_name_off_renumber(btf, e64->name_off);
+			e64++;
+		}
+		break;
+	}
+}
+
+static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
+{
+	struct btf *btf = env->btf;
+	int i;
+
+	btf->start_id = base_btf->nr_types;
+	btf->start_str_off = base_btf->hdr.str_len;
+
+	for (i = 0; i < btf->nr_types; i++)
+		btf_type_renumber(env, btf->types[i]);
+}
+
 static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
 {
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
 	struct btf *btf = NULL, *base_btf;
+	bool standalone = false;
 	int err;
 
 	base_btf = bpf_get_btf_vmlinux();
@@ -5367,9 +5476,32 @@  static struct btf *btf_parse_module(const char *module_name, const void *data, u
 		goto errout;
 
 	err = btf_check_all_metas(env);
+	if (err) {
+		/* BTF may be standalone; in that case meta checks will
+		 * fail and we fall back to standalone BTF processing.
+		 * Later on, once we have checked all metas, we will
+		 * retain start id from  base BTF so it will look like
+		 * split BTF (but is self-contained); renumbering is done
+		 * also to give the split BTF-like appearance and not
+		 * confuse pahole which assumes split BTF for modules.
+		 */
+		btf->base_btf = NULL;
+		if (btf->types)
+			kvfree(btf->types);
+		btf->types = NULL;
+		btf->types_size = 0;
+		btf->start_id = 0;
+		btf->nr_types = 0;
+		btf->start_str_off = 0;
+		standalone = true;
+		err = btf_check_all_metas(env);
+	}
 	if (err)
 		goto errout;
 
+	if (standalone)
+		btf_renumber(env, base_btf);
+
 	err = btf_check_type_tags(env, btf, btf_nr_types(base_btf));
 	if (err)
 		goto errout;