[v4,bpf-next,00/17] Add kind layout, CRCs to BTF

Message ID	20231112124834.388735-1-alan.maguire@oracle.com (mailing list archive)
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A080648 for <bpf@vger.kernel.org>; Sun, 12 Nov 2023 12:49:10 +0000 (UTC) From: Alan Maguire <alan.maguire@oracle.com> To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, jolsa@kernel.org Cc: quentin@isovalent.com, eddyz87@gmail.com, martin.lau@linux.dev, song@kernel.org, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, masahiroy@kernel.org, bpf@vger.kernel.org, Alan Maguire <alan.maguire@oracle.com> Subject: [PATCH v4 bpf-next 00/17] Add kind layout, CRCs to BTF Date: Sun, 12 Nov 2023 12:48:17 +0000 Message-Id: <20231112124834.388735-1-alan.maguire@oracle.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Add kind layout, CRCs to BTF \| expand [v4,bpf-next,00/17] Add kind layout, CRCs to BTF [v4,bpf-next,01/17] btf: add kind layout encoding, crcs to UAPI [v4,bpf-next,02/17] libbpf: support kind layout section handling in BTF [v4,bpf-next,03/17] libbpf: use kind layout to compute an unknown kind size [v4,bpf-next,04/17] libbpf: add kind layout encoding, crc support [v4,bpf-next,05/17] libbpf: BTF validation can use kind layout for unknown kinds [v4,bpf-next,06/17] btf: support kernel parsing of BTF with kind layout [v4,bpf-next,07/17] bpf: add BTF CRC verification where present [v4,bpf-next,08/17] bpf: verify base BTF CRC to ensure it matches module BTF [v4,bpf-next,09/17] kbuild, bpf: switch to --btf_features, add crc,kind_layout features [v4,bpf-next,10/17] bpftool: add BTF dump "format meta" to dump header/metadata [v4,bpf-next,11/17] bpftool: update doc to describe bpftool btf dump .. format meta [v4,bpf-next,12/17] selftests/bpf: test kind encoding/decoding [v4,bpf-next,13/17] bpf: support standalone BTF in modules [v4,bpf-next,14/17] kbuild, bpf: allow opt-out from using split BTF for modules [v4,bpf-next,15/17] selftests/bpf: generalize module load to support specifying a module name [v4,bpf-next,16/17] selftests/bpf: build separate bpf_testmod module with standalone BTF [v4,bpf-next,17/17] selftests/bpf: update btf_module test to ensure standalone BTF works

Message ID

20231112124834.388735-1-alan.maguire@oracle.com (mailing list archive)

Headers

From: Alan Maguire <alan.maguire@oracle.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, jolsa@kernel.org
Cc: quentin@isovalent.com, eddyz87@gmail.com, martin.lau@linux.dev,
        song@kernel.org, yonghong.song@linux.dev, john.fastabend@gmail.com,
        kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
        masahiroy@kernel.org, bpf@vger.kernel.org,
        Alan Maguire <alan.maguire@oracle.com>
Subject: [PATCH v4 bpf-next 00/17] Add kind layout, CRCs to BTF
Date: Sun, 12 Nov 2023 12:48:17 +0000
Message-Id: <20231112124834.388735-1-alan.maguire@oracle.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

Add kind layout, CRCs to BTF | expand

Message

Alan Maguire Nov. 12, 2023, 12:48 p.m. UTC

Update struct btf_header to add a new "kind_layout" section containing
a description of how to parse the BTF kinds known about at BTF
encoding time.  This provides the opportunity for tools that might
not know all of these kinds - as is the case when older tools run
on more newly-generated BTF - to still parse the BTF provided,
even if it cannot all be used.

Also add CRCs for the BTF and base BTF (if needed) from which it was
created.  CRCs provide a few useful features:

- the base CRC allows us to explicitly identify when the split and
  base BTF are not matched
- absence of a base BTF CRC can indicate that BTF is standalone;
  i.e. not defined relative to base BTF

The former case can be used to explicitly reject mismatched
module/kernel BTF rather than assuming it is matched until an
unexpected type is encountered.

The latter case is useful for modules that are not built as
frequently as the kernel; in such cases, the module can be built
standalone by specifying an empty BTF base:

 make BTF_BASE= M=path/2/module

If CRCs are not present (as will be the case for pahole versions
prior to the proposed v1.26 which will support CRC generation),
standalone BTF can still be identified by a slower fallback
method of examining BTF type ids to ensure that BTF is
self-referential only.

To ensure existing tooling can handle standalone BTF for kernel
modules,  we remap the type ids to start after the vmlinux
BTF ids, to make it appear to be split BTF.  This allows tools
(and the kernel) that assume split BTF for modules to operate normally.

Also add support to bpftool to dump metadata about BTF; its size,
header information and kind layout section.

The ideas here were discussed at [1], with further discussion
at [2].

A  patch for pahole [3] will enable the CRC/kind layout
addition to BTF generation, but the kernel can still be built
and tests added will still pass; it will just be the case that
we fall back to the slowpath of standalone module identification
in the absence of CRCs in the standalone module.

Note that for additional context I will be discussing this work
along with some other issues around evolving BTF at Linux
Plumbers next week; see [4] where slides will be added shortly.

Changes since v3 [5]:

- fixed mismerge issues with kbuild changes for BTF generation
  (patches 9, 14)
- fixed a few small issues in libbpf with kind layout representation
  (patches 2, 4)

Changes since v2 [6]:

- drop "optional" kind flag (Andrii, patch 1)
- allocate "struct btf_header" for struct btf to ensure
  we can always access new fields (Andrii, patch 2)
- use an internal BTF kind array in btf.c to simplify
  kind encoding (Andrii, patch 2)
- drop use of kind layout information for in-kernel parsing,
  since the kernel needs to be strict in what it accepts
  (Andrii, patch 6)
- added CRC verification for BTF objects and for matching
  with base object (Alexei, patches 7,8)
- fixed bpftool json output (Quentin, patch 10)
- added standalone module BTF support, tests (patches 13-17)

Changes since RFC

- Terminology change from meta -> kind_layout
  (Alexei and Andrii)
- Simplify representation, removing meta header
  and just having kind layout section (Alexei)
- Fixed bpftool to have JSON support, support
  prefix match, documented changes (Quentin)
- Separated metadata opts into add_kind_layout
  and add_crc
- Added additional positive/negative tests
  to cover basic unknown kind, one with an
  info_sz object following it and one with
  N elem_sz elements following it.
- Updated pahole-flags to use help output
  rather than version to see if features
  are present


[1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/20231110111533.64608-1-alan.maguire@oracle.com/
[4] https://lpc.events/event/17/contributions/1576/
[5] https://lore.kernel.org/bpf/20231110110304.63910-1-alan.maguire@oracle.com/
[6] https://lore.kernel.org/bpf/20230616171728.530116-1-alan.maguire@oracle.com/


Alan Maguire (17):
  btf: add kind layout encoding, crcs to UAPI
  libbpf: support kind layout section handling in BTF
  libbpf: use kind layout to compute an unknown kind size
  libbpf: add kind layout encoding, crc support
  libbpf: BTF validation can use kind layout for unknown kinds
  btf: support kernel parsing of BTF with kind layout
  bpf: add BTF CRC verification where present
  bpf: verify base BTF CRC to ensure it matches module BTF
  kbuild, bpf: switch to --btf_features, add crc,kind_layout features
  bpftool: add BTF dump "format meta" to dump header/metadata
  bpftool: update doc to describe bpftool btf dump .. format meta
  selftests/bpf: test kind encoding/decoding
  bpf: support standalone BTF in modules
  kbuild, bpf: allow opt-out from using split BTF for modules
  selftests/bpf: generalize module load to support specifying a module
    name
  selftests/bpf: build separate bpf_testmod module with standalone BTF
  selftests/bpf: update btf_module test to ensure standalone BTF works

 include/uapi/linux/btf.h                      |  18 +
 kernel/bpf/btf.c                              | 435 +++++++++++++-
 scripts/Makefile.btf                          |   5 +
 scripts/Makefile.modfinal                     |   4 +-
 .../bpf/bpftool/Documentation/bpftool-btf.rst |  30 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   2 +-
 tools/bpf/bpftool/btf.c                       |  91 ++-
 tools/include/uapi/linux/btf.h                |  18 +
 tools/lib/bpf/btf.c                           | 341 ++++++++---
 tools/lib/bpf/btf.h                           |  11 +
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 .../selftests/bpf/bpf_testmod/Makefile        |  10 +-
 .../bpf_testmod_standalone-events.h           |  57 ++
 .../bpf/bpf_testmod/bpf_testmod_standalone.c  | 551 ++++++++++++++++++
 .../bpf/bpf_testmod/bpf_testmod_standalone.h  |  31 +
 .../bpf_testmod_standalone_kfunc.h            | 109 ++++
 .../selftests/bpf/prog_tests/bpf_mod_race.c   |   8 +-
 .../selftests/bpf/prog_tests/btf_kind.c       | 176 ++++++
 .../selftests/bpf/prog_tests/btf_module.c     |  19 +-
 .../selftests/bpf/prog_tests/module_attach.c  |   6 +-
 tools/testing/selftests/bpf/test_progs.c      |   6 +-
 tools/testing/selftests/bpf/test_verifier.c   |   6 +-
 tools/testing/selftests/bpf/testing_helpers.c |  24 +-
 tools/testing/selftests/bpf/testing_helpers.h |   4 +-
 25 files changed, 1832 insertions(+), 139 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_standalone-events.h
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_standalone.c
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_standalone.h
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_standalone_kfunc.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_kind.c

Comments

Alan Maguire Nov. 14, 2023, 8:16 p.m. UTC | #1

On 12/11/2023 12:48, Alan Maguire wrote:
> Update struct btf_header to add a new "kind_layout" section containing
> a description of how to parse the BTF kinds known about at BTF
> encoding time.  This provides the opportunity for tools that might
> not know all of these kinds - as is the case when older tools run
> on more newly-generated BTF - to still parse the BTF provided,
> even if it cannot all be used.
> 
> Also add CRCs for the BTF and base BTF (if needed) from which it was
> created.  CRCs provide a few useful features:
> 
> - the base CRC allows us to explicitly identify when the split and
>   base BTF are not matched
> - absence of a base BTF CRC can indicate that BTF is standalone;
>   i.e. not defined relative to base BTF
> 
> The former case can be used to explicitly reject mismatched
> module/kernel BTF rather than assuming it is matched until an
> unexpected type is encountered.
> 
> The latter case is useful for modules that are not built as
> frequently as the kernel; in such cases, the module can be built
> standalone by specifying an empty BTF base:
> 
>  make BTF_BASE= M=path/2/module
> 
> If CRCs are not present (as will be the case for pahole versions
> prior to the proposed v1.26 which will support CRC generation),
> standalone BTF can still be identified by a slower fallback
> method of examining BTF type ids to ensure that BTF is
> self-referential only.
> 
> To ensure existing tooling can handle standalone BTF for kernel
> modules,  we remap the type ids to start after the vmlinux
> BTF ids, to make it appear to be split BTF.  This allows tools
> (and the kernel) that assume split BTF for modules to operate normally.
> 

hi folks

I wanted to capture feedback received on the approach described here for
BTF module generation at my talk at LPC [1].

Stepping back, the aim is to provide a way to generate BTF for a module
such that it is somewhat resilient to minor changes in underlying BTF,
so it does not have to be rebuilt every time vmlinux is built.  The
module references to vmlinux BTF ids are currently very brittle, and
even for the same kernel we get different vmlinux BTF ids if the BTF is
rebuilt.  So the aim is to support a more robust method of module BTF
generation.  Note that the approach described here is not needed for
modules that are built at the same time as the kernel, so it's unlikely
any in-tree modules will need this, but it will be useful for cases such
as where modules are delivered via a package and want to make use
of BTF such that it will not be invalidated.

Turning to the talk, the general consensus - I think - was that the
standalone BTF approach described in this series was problematic.
Consider kfuncs, if we have, for example, our own definition of a
structure in  standalone module BTF, the BTF id of the local structure
will not match that of the core kernel, which has the potential to
confuse the verifier.

A similar problem exists for tracing; we would trace an sk_buff in
the module via the module's view of struct sk_buff, but we have no
guarantees that the module's view is still consistent with the vmlinux
representation (which actually allocated it).

Hopefully I've characterized this correctly; let me know if I missed
something here.

So we need some means to both remap BTF ids in the module BTF that refer
to the vmlinux BTF so they point at the right types, _and_ to check the
consistency of the representation of a vmlinux type between module BTF
build time and when it is loaded into the kernel.

With this in mind, I think a good way forward might be something like
the following:

For cases where we want more change-independent module BTF - which
is resilient to things like reshuffling of vmlinux BTF ids, and small
changes that don't invalidate structure use completely - we add
a "relocatable" option to the --btf_features list of features for pahole
encoding of module BTF.

This option would not be needed for modules built at the same time as
the kernel, since the BTF ids and the types they refer to are consistent.

When used however, it would tell BTF dedup in pahole to add reocation
information as well as generating usual split BTF at the time of module
BTF generation. This relocation information would consist of
descriptions of the BTF types that the module refers to in base BTF and
their dependents. By providing such descriptions, we can then reconcile
the views of types between module and kernel, or if such reconciliation
is impossible, we can refuse to use the BTF. The amount of information
needed for a module will need to be determined, but I'm hopeful in most
cases it would be a small subset of the type information
required for vmlinux as a whole.

The process of reconciling module and vmlinux BTF at module load time
would then be

1. Remap all the split BTF ids representing module-specific types
   and functions to start at last_vmlinux_id + 1. Since the current
   vmlinux may have a different number of types than the vmlinux
   at time of encoding, this remapping is necessary.

2. For each vmlinux type in our list of relocations, check its
   compatibility with the associated vmlinux type.  This is
   somewhat akin to the CO-RE compatibility checks.  Exact rules
   would need to be ironed out, but a somewhat loose approach
   would be ideal such that a few minor changes in a struct
   somewhere do not totally invalidate module BTF. Unlike CO-RE
   though, field offset changes are _not_ good since they imply the
   module has an incorrect view of the structure and might
   start using fields incorrectly.

   Note that this is a bit easier than BTF deduplication, because
   the deduplication process that happened at module encoding time
   has already done the dependency checking for us; we just need
   to do a type-by-type, 1-to-1 comparison between our relocation
   types and current vmlinux types.

3. If all types are consistent, BTF is loaded and we remap the
   module's vmlinux BTF id references to the corresponding
   vmlinux BTF ids of the current vmlinux.

I _think_ this gets us what we want; more resilient module BTF,
but with safety checks to ensure compatible representations.
There were some suggestions of using a hashing method, but I think
such a method presupposes we want exact type matches, which I suspect
would be unlikely to be useful in practice as with most stable-based
distros, small changes in types can be made due to fixes etc.

There were also a suggestion of doing a full dedup, but I think the
consensus in the room (which I agree with) is that would be hard
to do in-kernel.  So the above approach is a compropmise I think;
it gets actual dedup at BTF creation time to create the list of
references and dependents, and we later check them one-by-one on module
load for compatibility.

Anyway I just wanted to try and capture the feedback received, and
lay out a possible direction. Any further thoughts or suggestions
would be much appreciated. Thanks!

Alan

[1] https://lpc.events/event/17/contributions/1576/

Andrii Nakryiko Nov. 21, 2023, 7:44 p.m. UTC | #2

On Tue, Nov 14, 2023 at 12:20 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 12/11/2023 12:48, Alan Maguire wrote:
> > Update struct btf_header to add a new "kind_layout" section containing
> > a description of how to parse the BTF kinds known about at BTF
> > encoding time.  This provides the opportunity for tools that might
> > not know all of these kinds - as is the case when older tools run
> > on more newly-generated BTF - to still parse the BTF provided,
> > even if it cannot all be used.
> >
> > Also add CRCs for the BTF and base BTF (if needed) from which it was
> > created.  CRCs provide a few useful features:
> >
> > - the base CRC allows us to explicitly identify when the split and
> >   base BTF are not matched
> > - absence of a base BTF CRC can indicate that BTF is standalone;
> >   i.e. not defined relative to base BTF
> >
> > The former case can be used to explicitly reject mismatched
> > module/kernel BTF rather than assuming it is matched until an
> > unexpected type is encountered.
> >
> > The latter case is useful for modules that are not built as
> > frequently as the kernel; in such cases, the module can be built
> > standalone by specifying an empty BTF base:
> >
> >  make BTF_BASE= M=path/2/module
> >
> > If CRCs are not present (as will be the case for pahole versions
> > prior to the proposed v1.26 which will support CRC generation),
> > standalone BTF can still be identified by a slower fallback
> > method of examining BTF type ids to ensure that BTF is
> > self-referential only.
> >
> > To ensure existing tooling can handle standalone BTF for kernel
> > modules,  we remap the type ids to start after the vmlinux
> > BTF ids, to make it appear to be split BTF.  This allows tools
> > (and the kernel) that assume split BTF for modules to operate normally.
> >
>
> hi folks
>
> I wanted to capture feedback received on the approach described here for
> BTF module generation at my talk at LPC [1].
>
> Stepping back, the aim is to provide a way to generate BTF for a module
> such that it is somewhat resilient to minor changes in underlying BTF,
> so it does not have to be rebuilt every time vmlinux is built.  The
> module references to vmlinux BTF ids are currently very brittle, and
> even for the same kernel we get different vmlinux BTF ids if the BTF is
> rebuilt.  So the aim is to support a more robust method of module BTF
> generation.  Note that the approach described here is not needed for
> modules that are built at the same time as the kernel, so it's unlikely
> any in-tree modules will need this, but it will be useful for cases such
> as where modules are delivered via a package and want to make use
> of BTF such that it will not be invalidated.
>
> Turning to the talk, the general consensus - I think - was that the
> standalone BTF approach described in this series was problematic.
> Consider kfuncs, if we have, for example, our own definition of a
> structure in  standalone module BTF, the BTF id of the local structure
> will not match that of the core kernel, which has the potential to
> confuse the verifier.
>
> A similar problem exists for tracing; we would trace an sk_buff in
> the module via the module's view of struct sk_buff, but we have no
> guarantees that the module's view is still consistent with the vmlinux
> representation (which actually allocated it).
>
> Hopefully I've characterized this correctly; let me know if I missed
> something here.

Correct.

>
> So we need some means to both remap BTF ids in the module BTF that refer
> to the vmlinux BTF so they point at the right types, _and_ to check the
> consistency of the representation of a vmlinux type between module BTF
> build time and when it is loaded into the kernel.
>
> With this in mind, I think a good way forward might be something like
> the following:
>
> For cases where we want more change-independent module BTF - which
> is resilient to things like reshuffling of vmlinux BTF ids, and small
> changes that don't invalidate structure use completely - we add
> a "relocatable" option to the --btf_features list of features for pahole
> encoding of module BTF.
>
> This option would not be needed for modules built at the same time as
> the kernel, since the BTF ids and the types they refer to are consistent.
>
> When used however, it would tell BTF dedup in pahole to add reocation
> information as well as generating usual split BTF at the time of module
> BTF generation. This relocation information would consist of
> descriptions of the BTF types that the module refers to in base BTF and
> their dependents. By providing such descriptions, we can then reconcile
> the views of types between module and kernel, or if such reconciliation
> is impossible, we can refuse to use the BTF. The amount of information
> needed for a module will need to be determined, but I'm hopeful in most
> cases it would be a small subset of the type information
> required for vmlinux as a whole.
>
> The process of reconciling module and vmlinux BTF at module load time
> would then be
>
> 1. Remap all the split BTF ids representing module-specific types
>    and functions to start at last_vmlinux_id + 1. Since the current
>    vmlinux may have a different number of types than the vmlinux
>    at time of encoding, this remapping is necessary.

Correct.

>
> 2. For each vmlinux type in our list of relocations, check its
>    compatibility with the associated vmlinux type.  This is
>    somewhat akin to the CO-RE compatibility checks.  Exact rules

Not really. CO-RE compatiblity is explicitly very permissive, while
here we want to make sure that types are actually memory
layout-compatible.

>    would need to be ironed out, but a somewhat loose approach
>    would be ideal such that a few minor changes in a struct
>    somewhere do not totally invalidate module BTF. Unlike CO-RE
>    though, field offset changes are _not_ good since they imply the
>    module has an incorrect view of the structure and might
>    start using fields incorrectly.

I think vmlinux type should have at least all the members that module
expects, at the same offset, with the same size. Maybe we should allow
vmlinux type to get some types at the end, not sure. How hard a
requirement it is to accommodate non-exact type matches between
vmlinux and kernel module's types?

>
>    Note that this is a bit easier than BTF deduplication, because
>    the deduplication process that happened at module encoding time
>    has already done the dependency checking for us; we just need
>    to do a type-by-type, 1-to-1 comparison between our relocation
>    types and current vmlinux types.
>
> 3. If all types are consistent, BTF is loaded and we remap the
>    module's vmlinux BTF id references to the corresponding
>    vmlinux BTF ids of the current vmlinux.

Note that we might need to do something special for anonymous types
(modifiers, anon enums and structs/unions). Otherwise it's not clear
how to even map them between vmlinux BTF and module BTF.

>
> I _think_ this gets us what we want; more resilient module BTF,
> but with safety checks to ensure compatible representations.
> There were some suggestions of using a hashing method, but I think
> such a method presupposes we want exact type matches, which I suspect
> would be unlikely to be useful in practice as with most stable-based
> distros, small changes in types can be made due to fixes etc.

What are "small changes" and how are they automatically determined and
validated?

>
> There were also a suggestion of doing a full dedup, but I think the
> consensus in the room (which I agree with) is that would be hard
> to do in-kernel.  So the above approach is a compropmise I think;
> it gets actual dedup at BTF creation time to create the list of
> references and dependents, and we later check them one-by-one on module
> load for compatibility.
>
> Anyway I just wanted to try and capture the feedback received, and
> lay out a possible direction. Any further thoughts or suggestions
> would be much appreciated. Thanks!
>
> Alan
>
> [1] https://lpc.events/event/17/contributions/1576/

Alan Maguire Nov. 22, 2023, 5 p.m. UTC | #3

On 21/11/2023 19:44, Andrii Nakryiko wrote:
> On Tue, Nov 14, 2023 at 12:20 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> hi folks
>>
>> I wanted to capture feedback received on the approach described here for
>> BTF module generation at my talk at LPC [1].
>>
>> Stepping back, the aim is to provide a way to generate BTF for a module
>> such that it is somewhat resilient to minor changes in underlying BTF,
>> so it does not have to be rebuilt every time vmlinux is built.  The
>> module references to vmlinux BTF ids are currently very brittle, and
>> even for the same kernel we get different vmlinux BTF ids if the BTF is
>> rebuilt.  So the aim is to support a more robust method of module BTF
>> generation.  Note that the approach described here is not needed for
>> modules that are built at the same time as the kernel, so it's unlikely
>> any in-tree modules will need this, but it will be useful for cases such
>> as where modules are delivered via a package and want to make use
>> of BTF such that it will not be invalidated.
>>
>> Turning to the talk, the general consensus - I think - was that the
>> standalone BTF approach described in this series was problematic.
>> Consider kfuncs, if we have, for example, our own definition of a
>> structure in  standalone module BTF, the BTF id of the local structure
>> will not match that of the core kernel, which has the potential to
>> confuse the verifier.
>>
>> A similar problem exists for tracing; we would trace an sk_buff in
>> the module via the module's view of struct sk_buff, but we have no
>> guarantees that the module's view is still consistent with the vmlinux
>> representation (which actually allocated it).
>>
>> Hopefully I've characterized this correctly; let me know if I missed
>> something here.
> 
> Correct.
> 
>>
>> So we need some means to both remap BTF ids in the module BTF that refer
>> to the vmlinux BTF so they point at the right types, _and_ to check the
>> consistency of the representation of a vmlinux type between module BTF
>> build time and when it is loaded into the kernel.
>>
>> With this in mind, I think a good way forward might be something like
>> the following:
>>
>> For cases where we want more change-independent module BTF - which
>> is resilient to things like reshuffling of vmlinux BTF ids, and small
>> changes that don't invalidate structure use completely - we add
>> a "relocatable" option to the --btf_features list of features for pahole
>> encoding of module BTF.
>>
>> This option would not be needed for modules built at the same time as
>> the kernel, since the BTF ids and the types they refer to are consistent.
>>
>> When used however, it would tell BTF dedup in pahole to add reocation
>> information as well as generating usual split BTF at the time of module
>> BTF generation. This relocation information would consist of
>> descriptions of the BTF types that the module refers to in base BTF and
>> their dependents. By providing such descriptions, we can then reconcile
>> the views of types between module and kernel, or if such reconciliation
>> is impossible, we can refuse to use the BTF. The amount of information
>> needed for a module will need to be determined, but I'm hopeful in most
>> cases it would be a small subset of the type information
>> required for vmlinux as a whole.
>>
>> The process of reconciling module and vmlinux BTF at module load time
>> would then be
>>
>> 1. Remap all the split BTF ids representing module-specific types
>>    and functions to start at last_vmlinux_id + 1. Since the current
>>    vmlinux may have a different number of types than the vmlinux
>>    at time of encoding, this remapping is necessary.
> 
> Correct.
> 
>>
>> 2. For each vmlinux type in our list of relocations, check its
>>    compatibility with the associated vmlinux type.  This is
>>    somewhat akin to the CO-RE compatibility checks.  Exact rules
> 
> Not really. CO-RE compatiblity is explicitly very permissive, while
> here we want to make sure that types are actually memory
> layout-compatible.
> 
>>    would need to be ironed out, but a somewhat loose approach
>>    would be ideal such that a few minor changes in a struct
>>    somewhere do not totally invalidate module BTF. Unlike CO-RE
>>    though, field offset changes are _not_ good since they imply the
>>    module has an incorrect view of the structure and might
>>    start using fields incorrectly.
> 
> I think vmlinux type should have at least all the members that module
> expects, at the same offset, with the same size. Maybe we should allow
> vmlinux type to get some types at the end, not sure. How hard a
> requirement it is to accommodate non-exact type matches between
> vmlinux and kernel module's types?
> 

The main need is to support resilience in the face of small structure
changes such that the compiled module will still work. When backporting
fixes to a stable-based kernel - where a version of say 5.15 stable is
supported for a while and so accumulates stable fixes - often the
approach used is to use holes in structures for new fields, or if the
structure is not embedded in any module-specific structures, add fields
at the end. All existing field offsets should match. In taking that
approach, the aim is to make sure data accesses in the module are still
valid - memory layout compatibility is the goal.

>>
>>    Note that this is a bit easier than BTF deduplication, because
>>    the deduplication process that happened at module encoding time
>>    has already done the dependency checking for us; we just need
>>    to do a type-by-type, 1-to-1 comparison between our relocation
>>    types and current vmlinux types.
>>
>> 3. If all types are consistent, BTF is loaded and we remap the
>>    module's vmlinux BTF id references to the corresponding
>>    vmlinux BTF ids of the current vmlinux.
> 
> Note that we might need to do something special for anonymous types
> (modifiers, anon enums and structs/unions). Otherwise it's not clear
> how to even map them between vmlinux BTF and module BTF.
>

Good point, we'd probably need to represent some sort of parent-child
relationship to handle cases like this.

>>
>> I _think_ this gets us what we want; more resilient module BTF,
>> but with safety checks to ensure compatible representations.
>> There were some suggestions of using a hashing method, but I think
>> such a method presupposes we want exact type matches, which I suspect
>> would be unlikely to be useful in practice as with most stable-based
>> distros, small changes in types can be made due to fixes etc.
> 
> What are "small changes" and how are they automatically determined and
> validated?
> 

See above, field additions in data structure holes or appended to
structs for the most part. Once I have something rough working
I'll see how it performs in practice and report back. Thanks!

Alan


>>
>> There were also a suggestion of doing a full dedup, but I think the
>> consensus in the room (which I agree with) is that would be hard
>> to do in-kernel.  So the above approach is a compropmise I think;
>> it gets actual dedup at BTF creation time to create the list of
>> references and dependents, and we later check them one-by-one on module
>> load for compatibility.
>>
>> Anyway I just wanted to try and capture the feedback received, and
>> lay out a possible direction. Any further thoughts or suggestions
>> would be much appreciated. Thanks!
>>
>> Alan
>>
>> [1] https://lpc.events/event/17/contributions/1576/

Andrii Nakryiko Nov. 22, 2023, 5:42 p.m. UTC | #4

On Wed, Nov 22, 2023 at 9:00 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 21/11/2023 19:44, Andrii Nakryiko wrote:
> > On Tue, Nov 14, 2023 at 12:20 PM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> hi folks
> >>
> >> I wanted to capture feedback received on the approach described here for
> >> BTF module generation at my talk at LPC [1].
> >>
> >> Stepping back, the aim is to provide a way to generate BTF for a module
> >> such that it is somewhat resilient to minor changes in underlying BTF,
> >> so it does not have to be rebuilt every time vmlinux is built.  The
> >> module references to vmlinux BTF ids are currently very brittle, and
> >> even for the same kernel we get different vmlinux BTF ids if the BTF is
> >> rebuilt.  So the aim is to support a more robust method of module BTF
> >> generation.  Note that the approach described here is not needed for
> >> modules that are built at the same time as the kernel, so it's unlikely
> >> any in-tree modules will need this, but it will be useful for cases such
> >> as where modules are delivered via a package and want to make use
> >> of BTF such that it will not be invalidated.
> >>
> >> Turning to the talk, the general consensus - I think - was that the
> >> standalone BTF approach described in this series was problematic.
> >> Consider kfuncs, if we have, for example, our own definition of a
> >> structure in  standalone module BTF, the BTF id of the local structure
> >> will not match that of the core kernel, which has the potential to
> >> confuse the verifier.
> >>
> >> A similar problem exists for tracing; we would trace an sk_buff in
> >> the module via the module's view of struct sk_buff, but we have no
> >> guarantees that the module's view is still consistent with the vmlinux
> >> representation (which actually allocated it).
> >>
> >> Hopefully I've characterized this correctly; let me know if I missed
> >> something here.
> >
> > Correct.
> >
> >>
> >> So we need some means to both remap BTF ids in the module BTF that refer
> >> to the vmlinux BTF so they point at the right types, _and_ to check the
> >> consistency of the representation of a vmlinux type between module BTF
> >> build time and when it is loaded into the kernel.
> >>
> >> With this in mind, I think a good way forward might be something like
> >> the following:
> >>
> >> For cases where we want more change-independent module BTF - which
> >> is resilient to things like reshuffling of vmlinux BTF ids, and small
> >> changes that don't invalidate structure use completely - we add
> >> a "relocatable" option to the --btf_features list of features for pahole
> >> encoding of module BTF.
> >>
> >> This option would not be needed for modules built at the same time as
> >> the kernel, since the BTF ids and the types they refer to are consistent.
> >>
> >> When used however, it would tell BTF dedup in pahole to add reocation
> >> information as well as generating usual split BTF at the time of module
> >> BTF generation. This relocation information would consist of
> >> descriptions of the BTF types that the module refers to in base BTF and
> >> their dependents. By providing such descriptions, we can then reconcile
> >> the views of types between module and kernel, or if such reconciliation
> >> is impossible, we can refuse to use the BTF. The amount of information
> >> needed for a module will need to be determined, but I'm hopeful in most
> >> cases it would be a small subset of the type information
> >> required for vmlinux as a whole.
> >>
> >> The process of reconciling module and vmlinux BTF at module load time
> >> would then be
> >>
> >> 1. Remap all the split BTF ids representing module-specific types
> >>    and functions to start at last_vmlinux_id + 1. Since the current
> >>    vmlinux may have a different number of types than the vmlinux
> >>    at time of encoding, this remapping is necessary.
> >
> > Correct.
> >
> >>
> >> 2. For each vmlinux type in our list of relocations, check its
> >>    compatibility with the associated vmlinux type.  This is
> >>    somewhat akin to the CO-RE compatibility checks.  Exact rules
> >
> > Not really. CO-RE compatiblity is explicitly very permissive, while
> > here we want to make sure that types are actually memory
> > layout-compatible.
> >
> >>    would need to be ironed out, but a somewhat loose approach
> >>    would be ideal such that a few minor changes in a struct
> >>    somewhere do not totally invalidate module BTF. Unlike CO-RE
> >>    though, field offset changes are _not_ good since they imply the
> >>    module has an incorrect view of the structure and might
> >>    start using fields incorrectly.
> >
> > I think vmlinux type should have at least all the members that module
> > expects, at the same offset, with the same size. Maybe we should allow
> > vmlinux type to get some types at the end, not sure. How hard a
> > requirement it is to accommodate non-exact type matches between
> > vmlinux and kernel module's types?
> >
>
> The main need is to support resilience in the face of small structure
> changes such that the compiled module will still work. When backporting
> fixes to a stable-based kernel - where a version of say 5.15 stable is
> supported for a while and so accumulates stable fixes - often the
> approach used is to use holes in structures for new fields, or if the
> structure is not embedded in any module-specific structures, add fields
> at the end. All existing field offsets should match. In taking that
> approach, the aim is to make sure data accesses in the module are still
> valid - memory layout compatibility is the goal.

So we'll need to develop some checksum/hash that would accommodate
these allowed changes.

>
> >>
> >>    Note that this is a bit easier than BTF deduplication, because
> >>    the deduplication process that happened at module encoding time
> >>    has already done the dependency checking for us; we just need
> >>    to do a type-by-type, 1-to-1 comparison between our relocation
> >>    types and current vmlinux types.
> >>
> >> 3. If all types are consistent, BTF is loaded and we remap the
> >>    module's vmlinux BTF id references to the corresponding
> >>    vmlinux BTF ids of the current vmlinux.
> >
> > Note that we might need to do something special for anonymous types
> > (modifiers, anon enums and structs/unions). Otherwise it's not clear
> > how to even map them between vmlinux BTF and module BTF.
> >
>
> Good point, we'd probably need to represent some sort of parent-child
> relationship to handle cases like this.

Probably best to keep such anonymous types in module's BTF. It might
add a bit of duplication, but will simplify the rest a lot.

>
> >>
> >> I _think_ this gets us what we want; more resilient module BTF,
> >> but with safety checks to ensure compatible representations.
> >> There were some suggestions of using a hashing method, but I think
> >> such a method presupposes we want exact type matches, which I suspect
> >> would be unlikely to be useful in practice as with most stable-based
> >> distros, small changes in types can be made due to fixes etc.
> >
> > What are "small changes" and how are they automatically determined and
> > validated?
> >
>
> See above, field additions in data structure holes or appended to
> structs for the most part. Once I have something rough working
> I'll see how it performs in practice and report back. Thanks!
>

SGTM.


> Alan
>
>
> >>
> >> There were also a suggestion of doing a full dedup, but I think the
> >> consensus in the room (which I agree with) is that would be hard
> >> to do in-kernel.  So the above approach is a compropmise I think;
> >> it gets actual dedup at BTF creation time to create the list of
> >> references and dependents, and we later check them one-by-one on module
> >> load for compatibility.
> >>
> >> Anyway I just wanted to try and capture the feedback received, and
> >> lay out a possible direction. Any further thoughts or suggestions
> >> would be much appreciated. Thanks!
> >>
> >> Alan
> >>
> >> [1] https://lpc.events/event/17/contributions/1576/