Message ID | 20240424154806.3417662-1-alan.maguire@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | bpf: support resilient split BTF | expand |
On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > Split BPF Type Format (BTF) provides huge advantages in that kernel > modules only have to provide type information for types that they do not > share with the core kernel; for core kernel types, split BTF refers to > core kernel BTF type ids. So for a STRUCT sk_buff, a module that > uses that structure (or a pointer to it) simply needs to refer to the > core kernel type id, saving the need to define the structure and its many > dependents. This cuts down on duplication and makes BTF as compact > as possible. > > However, there is a downside. This scheme requires the references from > split BTF to base BTF to be valid not just at encoding time, but at use > time (when the module is loaded). Even a small change in kernel types > can perturb the type ids in core kernel BTF, and due to pahole's > parallel processing of compilation units, even an unchanged kernel can > have different type ids if BTF is re-generated. So we have a robustness > problem for split BTF for cases where a module is not always compiled at > the same time as the kernel. This problem is particularly acute for > distros which generally want module builders to be able to compile a > module for the lifetime of a Linux stable-based release, and have it > continue to be valid over the lifetime of that release, even as changes > in data structures (and hence BTF types) accrue. Today it's not > possible to generate BTF for modules that works beyond the initial > kernel it is compiled against - kernel bugfixes etc invalidate the split > BTF references to vmlinux BTF, and BTF is no longer usable for the > module. > > The goal of this series is to provide options to provide additional > context for cases like this. That context comes in the form of > distilled base BTF; it stands in for the base BTF, and contains > information about the types referenced from split BTF, but not their > full descriptions. The modified split BTF will refer to type ids in > this .BTF.base section, and when the kernel loads such modules it > will use that base BTF to map references from split BTF to the > current vmlinux BTF - a process of relocating split BTF with the > currently-running kernel's vmlinux base BTF. > > A module builder - using this series along with the pahole changes - > can then build a module with distilled base BTF via an out-of-tree > module build, i.e. > > make -C . M=path/2/module > > The module will have a .BTF section (the split BTF) and a > .BTF.base section. The latter is small in size - distilled base > BTF does not need full struct/union/enum information for named > types for example. For 2667 modules built with distilled base BTF, > the average size observed was 1556 bytes (stddev 1563). > > Note that for the in-tree modules, this approach is not needed as > split and base BTF in the case of in-tree modules are always built > and re-built together. > > The series first focuses on generating split BTF with distilled base > BTF, and provides btf__parse_opts() which allows specification > of the section name from which to read BTF data, since we now have > both .BTF and .BTF.base sections that can contain such data. > > Then we add support to resolve_btfids for generating the .BTF.ids > section with reference to the .BTF.base section - this ensures the > .BTF.ids match those used in the split/base BTF. > > Finally the series provides the mechanism for relocating split BTF with > a new base; the distilled base BTF is used to map the references to base > BTF in the split BTF to the new base. For the kernel, this relocation > process happens at module load time, and we relocate split BTF > references to point at types in the current vmlinux BTF. As part of > this, .BTF.ids references need to be mapped also. > > So concretely, what happens is > > - we generate split BTF in the .BTF section of a module that refers to > types in the .BTF.base section as base types; these are not full > type descriptions but provide information about the base type. So > a STRUCT sk_buff would be represented as a FWD struct sk_buff in > distilled base BTF for example. > - when the module is loaded, the split BTF is relocated with vmlinux > BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff > in vmlinux BTF and map all split BTF references to the distilled base > FWD sk_buff, replacing them with references to the vmlinux BTF > STRUCT sk_buff. > > Support is also added to bpftool to be able to display split BTF > relative to its .BTF.base section, and also to display the relocated > form via the "-R path_to_base_btf". > > A previous approach to this problem [1] utilized standalone BTF for such > cases - where the BTF is not defined relative to base BTF so there is no > relocation required. The problem with that approach is that from > the verifier perspective, some types are special, and having a custom > representation of a core kernel type that did not necessarily match the > current representation is not tenable. So the approach taken here was > to preserve the split BTF model while minimizing the representation of > the context needed to relocate split and current vmlinux BTF. > > To generate distilled .BTF.base sections the associated dwarves > patch (to be applied on the "next" branch there) is needed. > Without it, things will still work but bpf_testmod will not be built > with a .BTF.base section. > > Changes since RFC [2]: > > - updated terminology; we replace clunky "base reference" BTF with > distilling base BTF into a .BTF.base section. Similarly BTF > reconcilation becomes BTF relocation (Andrii, most patches) > - add distilled base BTF by default for out-of-tree modules > (Alexei, patch 8) > - distill algorithm updated to record size of embedded struct/union > by recording it as a 0-vlen STRUCT/UNION with size preserved > (Andrii, patch 2) > - verify size match on relocation for such STRUCT/UNIONs (Andrii, > patch 9) > - with embedded STRUCT/UNION recording size, we can have bpftool > dump a header representation using .BTF.base + .BTF sections > rather than special-casing and refusing to use "format c" for > that case (patch 5) > - match enum with enum64 and vice versa (Andrii, patch 9) > - ensure that resolve_btfids works with BTF without .BTF.base > section (patch 7) > - update tests to cover embedded types, arrays and function > prototypes (patches 3, 12) > > One change not made yet is adding anonymous struct/unions that the split > BTF references in base BTF to the module instead of adding them to the > .BTF.base section. That would involve having to maintain two pipes for > writing BTF, one for the .BTF.base and one for the split BTF. It would > be possible, but there are I think some edge cases that might make it > tricky. For example consider a split BTF reference to a base BTF > ARRAY which in turn referenced an anonymous STRUCT as type. In such a > case, it wouldn't make sense to have the array in the .BTF.base section > while having the STRUCT in the module. The general concern is that once Hm.. not really? ARRAY is a reference type (and anonymous at that), so it would have to stay in module's BTF, no? I'll go read the patch series again, but let me know if I'm missing something. > we move a type to the module we would need to also ensure any base types > that refer to it move there too. For now it is I think simpler to > retain the existing split/base type classifications. We would have to finalize this part before landing, as it has big implications on the relocation process. > > [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ > [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ > > > > Alan Maguire (13): > libbpf: add support to btf__add_fwd() for ENUM64 > libbpf: add btf__distill_base() creating split BTF with distilled base > BTF > selftests/bpf: test distilled base, split BTF generation > libbpf: add btf__parse_opts() API for flexible BTF parsing > bpftool: support displaying raw split BTF using base BTF section as > base > kbuild,bpf: switch to using --btf_features for pahole v1.26 and later > resolve_btfids: use .BTF.base ELF section as base BTF if -B option is > used > kbuild, bpf: add module-specific pahole/resolve_btfids flags for > distilled base BTF > libbpf: split BTF relocation > module, bpf: store BTF base pointer in struct module > libbpf,bpf: share BTF relocate-related code with kernel > selftests/bpf: extend distilled BTF tests to cover BTF relocation > bpftool: support displaying relocated-with-base split BTF > > include/linux/btf.h | 32 + > include/linux/module.h | 2 + > kernel/bpf/Makefile | 8 + > kernel/bpf/btf.c | 227 +++++-- > kernel/module/main.c | 5 +- > scripts/Makefile.btf | 12 +- > scripts/Makefile.modfinal | 4 +- > .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- > tools/bpf/bpftool/bash-completion/bpftool | 7 +- > tools/bpf/bpftool/btf.c | 20 +- > tools/bpf/bpftool/main.c | 14 +- > tools/bpf/bpftool/main.h | 2 + > tools/bpf/resolve_btfids/main.c | 22 +- > tools/lib/bpf/Build | 2 +- > tools/lib/bpf/btf.c | 561 +++++++++++----- > tools/lib/bpf/btf.h | 61 ++ > tools/lib/bpf/btf_common.c | 146 ++++ > tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ > tools/lib/bpf/libbpf.map | 3 + > tools/lib/bpf/libbpf_internal.h | 2 + > .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ > 21 files changed, 1864 insertions(+), 209 deletions(-) > create mode 100644 tools/lib/bpf/btf_common.c > create mode 100644 tools/lib/bpf/btf_relocate.c > create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c > > -- > 2.31.1 >
On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > Split BPF Type Format (BTF) provides huge advantages in that kernel > > modules only have to provide type information for types that they do not > > share with the core kernel; for core kernel types, split BTF refers to > > core kernel BTF type ids. So for a STRUCT sk_buff, a module that > > uses that structure (or a pointer to it) simply needs to refer to the > > core kernel type id, saving the need to define the structure and its many > > dependents. This cuts down on duplication and makes BTF as compact > > as possible. > > > > However, there is a downside. This scheme requires the references from > > split BTF to base BTF to be valid not just at encoding time, but at use > > time (when the module is loaded). Even a small change in kernel types > > can perturb the type ids in core kernel BTF, and due to pahole's > > parallel processing of compilation units, even an unchanged kernel can > > have different type ids if BTF is re-generated. So we have a robustness > > problem for split BTF for cases where a module is not always compiled at > > the same time as the kernel. This problem is particularly acute for > > distros which generally want module builders to be able to compile a > > module for the lifetime of a Linux stable-based release, and have it > > continue to be valid over the lifetime of that release, even as changes > > in data structures (and hence BTF types) accrue. Today it's not > > possible to generate BTF for modules that works beyond the initial > > kernel it is compiled against - kernel bugfixes etc invalidate the split > > BTF references to vmlinux BTF, and BTF is no longer usable for the > > module. > > > > The goal of this series is to provide options to provide additional > > context for cases like this. That context comes in the form of > > distilled base BTF; it stands in for the base BTF, and contains > > information about the types referenced from split BTF, but not their > > full descriptions. The modified split BTF will refer to type ids in > > this .BTF.base section, and when the kernel loads such modules it > > will use that base BTF to map references from split BTF to the > > current vmlinux BTF - a process of relocating split BTF with the > > currently-running kernel's vmlinux base BTF. > > > > A module builder - using this series along with the pahole changes - > > can then build a module with distilled base BTF via an out-of-tree > > module build, i.e. > > > > make -C . M=path/2/module > > > > The module will have a .BTF section (the split BTF) and a > > .BTF.base section. The latter is small in size - distilled base > > BTF does not need full struct/union/enum information for named > > types for example. For 2667 modules built with distilled base BTF, > > the average size observed was 1556 bytes (stddev 1563). > > > > Note that for the in-tree modules, this approach is not needed as > > split and base BTF in the case of in-tree modules are always built > > and re-built together. > > > > The series first focuses on generating split BTF with distilled base > > BTF, and provides btf__parse_opts() which allows specification > > of the section name from which to read BTF data, since we now have > > both .BTF and .BTF.base sections that can contain such data. > > > > Then we add support to resolve_btfids for generating the .BTF.ids > > section with reference to the .BTF.base section - this ensures the > > .BTF.ids match those used in the split/base BTF. > > > > Finally the series provides the mechanism for relocating split BTF with > > a new base; the distilled base BTF is used to map the references to base > > BTF in the split BTF to the new base. For the kernel, this relocation > > process happens at module load time, and we relocate split BTF > > references to point at types in the current vmlinux BTF. As part of > > this, .BTF.ids references need to be mapped also. > > > > So concretely, what happens is > > > > - we generate split BTF in the .BTF section of a module that refers to > > types in the .BTF.base section as base types; these are not full > > type descriptions but provide information about the base type. So > > a STRUCT sk_buff would be represented as a FWD struct sk_buff in > > distilled base BTF for example. > > - when the module is loaded, the split BTF is relocated with vmlinux > > BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff > > in vmlinux BTF and map all split BTF references to the distilled base > > FWD sk_buff, replacing them with references to the vmlinux BTF > > STRUCT sk_buff. > > > > Support is also added to bpftool to be able to display split BTF > > relative to its .BTF.base section, and also to display the relocated > > form via the "-R path_to_base_btf". > > > > A previous approach to this problem [1] utilized standalone BTF for such > > cases - where the BTF is not defined relative to base BTF so there is no > > relocation required. The problem with that approach is that from > > the verifier perspective, some types are special, and having a custom > > representation of a core kernel type that did not necessarily match the > > current representation is not tenable. So the approach taken here was > > to preserve the split BTF model while minimizing the representation of > > the context needed to relocate split and current vmlinux BTF. > > > > To generate distilled .BTF.base sections the associated dwarves > > patch (to be applied on the "next" branch there) is needed. > > Without it, things will still work but bpf_testmod will not be built > > with a .BTF.base section. > > > > Changes since RFC [2]: > > > > - updated terminology; we replace clunky "base reference" BTF with > > distilling base BTF into a .BTF.base section. Similarly BTF > > reconcilation becomes BTF relocation (Andrii, most patches) > > - add distilled base BTF by default for out-of-tree modules > > (Alexei, patch 8) > > - distill algorithm updated to record size of embedded struct/union > > by recording it as a 0-vlen STRUCT/UNION with size preserved > > (Andrii, patch 2) > > - verify size match on relocation for such STRUCT/UNIONs (Andrii, > > patch 9) > > - with embedded STRUCT/UNION recording size, we can have bpftool > > dump a header representation using .BTF.base + .BTF sections > > rather than special-casing and refusing to use "format c" for > > that case (patch 5) > > - match enum with enum64 and vice versa (Andrii, patch 9) > > - ensure that resolve_btfids works with BTF without .BTF.base > > section (patch 7) > > - update tests to cover embedded types, arrays and function > > prototypes (patches 3, 12) > > > > One change not made yet is adding anonymous struct/unions that the split > > BTF references in base BTF to the module instead of adding them to the > > .BTF.base section. That would involve having to maintain two pipes for > > writing BTF, one for the .BTF.base and one for the split BTF. It would > > be possible, but there are I think some edge cases that might make it > > tricky. For example consider a split BTF reference to a base BTF > > ARRAY which in turn referenced an anonymous STRUCT as type. In such a > > case, it wouldn't make sense to have the array in the .BTF.base section > > while having the STRUCT in the module. The general concern is that once > > Hm.. not really? ARRAY is a reference type (and anonymous at that), so > it would have to stay in module's BTF, no? I'll go read the patch > series again, but let me know if I'm missing something. > > > we move a type to the module we would need to also ensure any base types > > that refer to it move there too. For now it is I think simpler to > > retain the existing split/base type classifications. > > We would have to finalize this part before landing, as it has big > implications on the relocation process. Ran out of time, sorry, will continue on Monday. But please consider, meanwhile, what I mentioned about only having named structs/unions/enums in distilled base BTF. > > > > > > [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ > > [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ > > > > > > > > Alan Maguire (13): > > libbpf: add support to btf__add_fwd() for ENUM64 > > libbpf: add btf__distill_base() creating split BTF with distilled base > > BTF > > selftests/bpf: test distilled base, split BTF generation > > libbpf: add btf__parse_opts() API for flexible BTF parsing > > bpftool: support displaying raw split BTF using base BTF section as > > base > > kbuild,bpf: switch to using --btf_features for pahole v1.26 and later > > resolve_btfids: use .BTF.base ELF section as base BTF if -B option is > > used > > kbuild, bpf: add module-specific pahole/resolve_btfids flags for > > distilled base BTF > > libbpf: split BTF relocation > > module, bpf: store BTF base pointer in struct module > > libbpf,bpf: share BTF relocate-related code with kernel > > selftests/bpf: extend distilled BTF tests to cover BTF relocation > > bpftool: support displaying relocated-with-base split BTF > > > > include/linux/btf.h | 32 + > > include/linux/module.h | 2 + > > kernel/bpf/Makefile | 8 + > > kernel/bpf/btf.c | 227 +++++-- > > kernel/module/main.c | 5 +- > > scripts/Makefile.btf | 12 +- > > scripts/Makefile.modfinal | 4 +- > > .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- > > tools/bpf/bpftool/bash-completion/bpftool | 7 +- > > tools/bpf/bpftool/btf.c | 20 +- > > tools/bpf/bpftool/main.c | 14 +- > > tools/bpf/bpftool/main.h | 2 + > > tools/bpf/resolve_btfids/main.c | 22 +- > > tools/lib/bpf/Build | 2 +- > > tools/lib/bpf/btf.c | 561 +++++++++++----- > > tools/lib/bpf/btf.h | 61 ++ > > tools/lib/bpf/btf_common.c | 146 ++++ > > tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ > > tools/lib/bpf/libbpf.map | 3 + > > tools/lib/bpf/libbpf_internal.h | 2 + > > .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ > > 21 files changed, 1864 insertions(+), 209 deletions(-) > > create mode 100644 tools/lib/bpf/btf_common.c > > create mode 100644 tools/lib/bpf/btf_relocate.c > > create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c > > > > -- > > 2.31.1 > >
On 27/04/2024 01:24, Andrii Nakryiko wrote: > On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: >> >> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>> >>> Split BPF Type Format (BTF) provides huge advantages in that kernel >>> modules only have to provide type information for types that they do not >>> share with the core kernel; for core kernel types, split BTF refers to >>> core kernel BTF type ids. So for a STRUCT sk_buff, a module that >>> uses that structure (or a pointer to it) simply needs to refer to the >>> core kernel type id, saving the need to define the structure and its many >>> dependents. This cuts down on duplication and makes BTF as compact >>> as possible. >>> >>> However, there is a downside. This scheme requires the references from >>> split BTF to base BTF to be valid not just at encoding time, but at use >>> time (when the module is loaded). Even a small change in kernel types >>> can perturb the type ids in core kernel BTF, and due to pahole's >>> parallel processing of compilation units, even an unchanged kernel can >>> have different type ids if BTF is re-generated. So we have a robustness >>> problem for split BTF for cases where a module is not always compiled at >>> the same time as the kernel. This problem is particularly acute for >>> distros which generally want module builders to be able to compile a >>> module for the lifetime of a Linux stable-based release, and have it >>> continue to be valid over the lifetime of that release, even as changes >>> in data structures (and hence BTF types) accrue. Today it's not >>> possible to generate BTF for modules that works beyond the initial >>> kernel it is compiled against - kernel bugfixes etc invalidate the split >>> BTF references to vmlinux BTF, and BTF is no longer usable for the >>> module. >>> >>> The goal of this series is to provide options to provide additional >>> context for cases like this. That context comes in the form of >>> distilled base BTF; it stands in for the base BTF, and contains >>> information about the types referenced from split BTF, but not their >>> full descriptions. The modified split BTF will refer to type ids in >>> this .BTF.base section, and when the kernel loads such modules it >>> will use that base BTF to map references from split BTF to the >>> current vmlinux BTF - a process of relocating split BTF with the >>> currently-running kernel's vmlinux base BTF. >>> >>> A module builder - using this series along with the pahole changes - >>> can then build a module with distilled base BTF via an out-of-tree >>> module build, i.e. >>> >>> make -C . M=path/2/module >>> >>> The module will have a .BTF section (the split BTF) and a >>> .BTF.base section. The latter is small in size - distilled base >>> BTF does not need full struct/union/enum information for named >>> types for example. For 2667 modules built with distilled base BTF, >>> the average size observed was 1556 bytes (stddev 1563). >>> >>> Note that for the in-tree modules, this approach is not needed as >>> split and base BTF in the case of in-tree modules are always built >>> and re-built together. >>> >>> The series first focuses on generating split BTF with distilled base >>> BTF, and provides btf__parse_opts() which allows specification >>> of the section name from which to read BTF data, since we now have >>> both .BTF and .BTF.base sections that can contain such data. >>> >>> Then we add support to resolve_btfids for generating the .BTF.ids >>> section with reference to the .BTF.base section - this ensures the >>> .BTF.ids match those used in the split/base BTF. >>> >>> Finally the series provides the mechanism for relocating split BTF with >>> a new base; the distilled base BTF is used to map the references to base >>> BTF in the split BTF to the new base. For the kernel, this relocation >>> process happens at module load time, and we relocate split BTF >>> references to point at types in the current vmlinux BTF. As part of >>> this, .BTF.ids references need to be mapped also. >>> >>> So concretely, what happens is >>> >>> - we generate split BTF in the .BTF section of a module that refers to >>> types in the .BTF.base section as base types; these are not full >>> type descriptions but provide information about the base type. So >>> a STRUCT sk_buff would be represented as a FWD struct sk_buff in >>> distilled base BTF for example. >>> - when the module is loaded, the split BTF is relocated with vmlinux >>> BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff >>> in vmlinux BTF and map all split BTF references to the distilled base >>> FWD sk_buff, replacing them with references to the vmlinux BTF >>> STRUCT sk_buff. >>> >>> Support is also added to bpftool to be able to display split BTF >>> relative to its .BTF.base section, and also to display the relocated >>> form via the "-R path_to_base_btf". >>> >>> A previous approach to this problem [1] utilized standalone BTF for such >>> cases - where the BTF is not defined relative to base BTF so there is no >>> relocation required. The problem with that approach is that from >>> the verifier perspective, some types are special, and having a custom >>> representation of a core kernel type that did not necessarily match the >>> current representation is not tenable. So the approach taken here was >>> to preserve the split BTF model while minimizing the representation of >>> the context needed to relocate split and current vmlinux BTF. >>> >>> To generate distilled .BTF.base sections the associated dwarves >>> patch (to be applied on the "next" branch there) is needed. >>> Without it, things will still work but bpf_testmod will not be built >>> with a .BTF.base section. >>> >>> Changes since RFC [2]: >>> >>> - updated terminology; we replace clunky "base reference" BTF with >>> distilling base BTF into a .BTF.base section. Similarly BTF >>> reconcilation becomes BTF relocation (Andrii, most patches) >>> - add distilled base BTF by default for out-of-tree modules >>> (Alexei, patch 8) >>> - distill algorithm updated to record size of embedded struct/union >>> by recording it as a 0-vlen STRUCT/UNION with size preserved >>> (Andrii, patch 2) >>> - verify size match on relocation for such STRUCT/UNIONs (Andrii, >>> patch 9) >>> - with embedded STRUCT/UNION recording size, we can have bpftool >>> dump a header representation using .BTF.base + .BTF sections >>> rather than special-casing and refusing to use "format c" for >>> that case (patch 5) >>> - match enum with enum64 and vice versa (Andrii, patch 9) >>> - ensure that resolve_btfids works with BTF without .BTF.base >>> section (patch 7) >>> - update tests to cover embedded types, arrays and function >>> prototypes (patches 3, 12) >>> >>> One change not made yet is adding anonymous struct/unions that the split >>> BTF references in base BTF to the module instead of adding them to the >>> .BTF.base section. That would involve having to maintain two pipes for >>> writing BTF, one for the .BTF.base and one for the split BTF. It would >>> be possible, but there are I think some edge cases that might make it >>> tricky. For example consider a split BTF reference to a base BTF >>> ARRAY which in turn referenced an anonymous STRUCT as type. In such a >>> case, it wouldn't make sense to have the array in the .BTF.base section >>> while having the STRUCT in the module. The general concern is that once >> >> Hm.. not really? ARRAY is a reference type (and anonymous at that), so >> it would have to stay in module's BTF, no? I'll go read the patch >> series again, but let me know if I'm missing something. >> The way things currently work, we preserve all relationships prior to distilling base BTF. That is, if a type was in split BTF prior to calling btf__distill_base(), it will stay in split BTF afterwards. Ditto for base types. This is true for reference types as well as named types. So in the case of the above array for example, prior to distilling types it is in base BTF. If it in turn then referred to a base anonymous struct, both would be in the base and thus the distilled base BTF. In the above case, I was suggesting the array itself was referred to from split BTF, but not in split BTF, sorry if that wasn't clearer. So the problem comes if we moved the anon struct to the module; then we also need to move types that depend on it there. This means we'd need to make the move recursive. That seems doable; the only question is around the logistics and the effects of doing so. At one extreme we might end up with something that resembles standalone BTF (many/most types in the split BTF). That seems unlikely in most cases. I examined one module's BTF base for example, and the only anon structs arose from typedef references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in turn were only referenced once elsewhere in distilled base BTF; a sockptr was in a FUNC_PROTO, but aside from that the typedefs were not otherwise referenced in distilled base BTF, they were referenced in split BTF as embeeded struct field types. So moving all of this to the split BTF seems possible; what I think we probably need to think on a bit is how to handle relocation. Is there a need to relocate these module types too, or can we live with having duplicate atomic_t/sockptr_t typedefs in the module? Currently relocation is simplified by the fact that we only need to relocate the types prior to the module's start id. All we need to do is rewrite type references in split BTF to base ids. If we were relocating split types too we'd need to remove them from split BTF. >>> we move a type to the module we would need to also ensure any base types >>> that refer to it move there too. For now it is I think simpler to >>> retain the existing split/base type classifications. >> >> We would have to finalize this part before landing, as it has big >> implications on the relocation process. > > Ran out of time, sorry, will continue on Monday. But please consider, > meanwhile, what I mentioned about only having named > structs/unions/enums in distilled base BTF. > Sure, I'll dig into it further. FWIW I agree with the goal of moving anonymous structs/unions if it's doable. I can't see any blocking issues thus far. >> >> >>> >>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ >>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ >>> >>> >>> >>> Alan Maguire (13): >>> libbpf: add support to btf__add_fwd() for ENUM64 >>> libbpf: add btf__distill_base() creating split BTF with distilled base >>> BTF >>> selftests/bpf: test distilled base, split BTF generation >>> libbpf: add btf__parse_opts() API for flexible BTF parsing >>> bpftool: support displaying raw split BTF using base BTF section as >>> base >>> kbuild,bpf: switch to using --btf_features for pahole v1.26 and later >>> resolve_btfids: use .BTF.base ELF section as base BTF if -B option is >>> used >>> kbuild, bpf: add module-specific pahole/resolve_btfids flags for >>> distilled base BTF >>> libbpf: split BTF relocation >>> module, bpf: store BTF base pointer in struct module >>> libbpf,bpf: share BTF relocate-related code with kernel >>> selftests/bpf: extend distilled BTF tests to cover BTF relocation >>> bpftool: support displaying relocated-with-base split BTF >>> >>> include/linux/btf.h | 32 + >>> include/linux/module.h | 2 + >>> kernel/bpf/Makefile | 8 + >>> kernel/bpf/btf.c | 227 +++++-- >>> kernel/module/main.c | 5 +- >>> scripts/Makefile.btf | 12 +- >>> scripts/Makefile.modfinal | 4 +- >>> .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- >>> tools/bpf/bpftool/bash-completion/bpftool | 7 +- >>> tools/bpf/bpftool/btf.c | 20 +- >>> tools/bpf/bpftool/main.c | 14 +- >>> tools/bpf/bpftool/main.h | 2 + >>> tools/bpf/resolve_btfids/main.c | 22 +- >>> tools/lib/bpf/Build | 2 +- >>> tools/lib/bpf/btf.c | 561 +++++++++++----- >>> tools/lib/bpf/btf.h | 61 ++ >>> tools/lib/bpf/btf_common.c | 146 ++++ >>> tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ >>> tools/lib/bpf/libbpf.map | 3 + >>> tools/lib/bpf/libbpf_internal.h | 2 + >>> .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ >>> 21 files changed, 1864 insertions(+), 209 deletions(-) >>> create mode 100644 tools/lib/bpf/btf_common.c >>> create mode 100644 tools/lib/bpf/btf_relocate.c >>> create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c >>> >>> -- >>> 2.31.1 >>> >
On Mon, Apr 29, 2024 at 8:25 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 27/04/2024 01:24, Andrii Nakryiko wrote: > > On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > >> > >> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: > >>> > >>> Split BPF Type Format (BTF) provides huge advantages in that kernel > >>> modules only have to provide type information for types that they do not > >>> share with the core kernel; for core kernel types, split BTF refers to > >>> core kernel BTF type ids. So for a STRUCT sk_buff, a module that > >>> uses that structure (or a pointer to it) simply needs to refer to the > >>> core kernel type id, saving the need to define the structure and its many > >>> dependents. This cuts down on duplication and makes BTF as compact > >>> as possible. > >>> > >>> However, there is a downside. This scheme requires the references from > >>> split BTF to base BTF to be valid not just at encoding time, but at use > >>> time (when the module is loaded). Even a small change in kernel types > >>> can perturb the type ids in core kernel BTF, and due to pahole's > >>> parallel processing of compilation units, even an unchanged kernel can > >>> have different type ids if BTF is re-generated. So we have a robustness > >>> problem for split BTF for cases where a module is not always compiled at > >>> the same time as the kernel. This problem is particularly acute for > >>> distros which generally want module builders to be able to compile a > >>> module for the lifetime of a Linux stable-based release, and have it > >>> continue to be valid over the lifetime of that release, even as changes > >>> in data structures (and hence BTF types) accrue. Today it's not > >>> possible to generate BTF for modules that works beyond the initial > >>> kernel it is compiled against - kernel bugfixes etc invalidate the split > >>> BTF references to vmlinux BTF, and BTF is no longer usable for the > >>> module. > >>> > >>> The goal of this series is to provide options to provide additional > >>> context for cases like this. That context comes in the form of > >>> distilled base BTF; it stands in for the base BTF, and contains > >>> information about the types referenced from split BTF, but not their > >>> full descriptions. The modified split BTF will refer to type ids in > >>> this .BTF.base section, and when the kernel loads such modules it > >>> will use that base BTF to map references from split BTF to the > >>> current vmlinux BTF - a process of relocating split BTF with the > >>> currently-running kernel's vmlinux base BTF. > >>> > >>> A module builder - using this series along with the pahole changes - > >>> can then build a module with distilled base BTF via an out-of-tree > >>> module build, i.e. > >>> > >>> make -C . M=path/2/module > >>> > >>> The module will have a .BTF section (the split BTF) and a > >>> .BTF.base section. The latter is small in size - distilled base > >>> BTF does not need full struct/union/enum information for named > >>> types for example. For 2667 modules built with distilled base BTF, > >>> the average size observed was 1556 bytes (stddev 1563). > >>> > >>> Note that for the in-tree modules, this approach is not needed as > >>> split and base BTF in the case of in-tree modules are always built > >>> and re-built together. > >>> > >>> The series first focuses on generating split BTF with distilled base > >>> BTF, and provides btf__parse_opts() which allows specification > >>> of the section name from which to read BTF data, since we now have > >>> both .BTF and .BTF.base sections that can contain such data. > >>> > >>> Then we add support to resolve_btfids for generating the .BTF.ids > >>> section with reference to the .BTF.base section - this ensures the > >>> .BTF.ids match those used in the split/base BTF. > >>> > >>> Finally the series provides the mechanism for relocating split BTF with > >>> a new base; the distilled base BTF is used to map the references to base > >>> BTF in the split BTF to the new base. For the kernel, this relocation > >>> process happens at module load time, and we relocate split BTF > >>> references to point at types in the current vmlinux BTF. As part of > >>> this, .BTF.ids references need to be mapped also. > >>> > >>> So concretely, what happens is > >>> > >>> - we generate split BTF in the .BTF section of a module that refers to > >>> types in the .BTF.base section as base types; these are not full > >>> type descriptions but provide information about the base type. So > >>> a STRUCT sk_buff would be represented as a FWD struct sk_buff in > >>> distilled base BTF for example. > >>> - when the module is loaded, the split BTF is relocated with vmlinux > >>> BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff > >>> in vmlinux BTF and map all split BTF references to the distilled base > >>> FWD sk_buff, replacing them with references to the vmlinux BTF > >>> STRUCT sk_buff. > >>> > >>> Support is also added to bpftool to be able to display split BTF > >>> relative to its .BTF.base section, and also to display the relocated > >>> form via the "-R path_to_base_btf". > >>> > >>> A previous approach to this problem [1] utilized standalone BTF for such > >>> cases - where the BTF is not defined relative to base BTF so there is no > >>> relocation required. The problem with that approach is that from > >>> the verifier perspective, some types are special, and having a custom > >>> representation of a core kernel type that did not necessarily match the > >>> current representation is not tenable. So the approach taken here was > >>> to preserve the split BTF model while minimizing the representation of > >>> the context needed to relocate split and current vmlinux BTF. > >>> > >>> To generate distilled .BTF.base sections the associated dwarves > >>> patch (to be applied on the "next" branch there) is needed. > >>> Without it, things will still work but bpf_testmod will not be built > >>> with a .BTF.base section. > >>> > >>> Changes since RFC [2]: > >>> > >>> - updated terminology; we replace clunky "base reference" BTF with > >>> distilling base BTF into a .BTF.base section. Similarly BTF > >>> reconcilation becomes BTF relocation (Andrii, most patches) > >>> - add distilled base BTF by default for out-of-tree modules > >>> (Alexei, patch 8) > >>> - distill algorithm updated to record size of embedded struct/union > >>> by recording it as a 0-vlen STRUCT/UNION with size preserved > >>> (Andrii, patch 2) > >>> - verify size match on relocation for such STRUCT/UNIONs (Andrii, > >>> patch 9) > >>> - with embedded STRUCT/UNION recording size, we can have bpftool > >>> dump a header representation using .BTF.base + .BTF sections > >>> rather than special-casing and refusing to use "format c" for > >>> that case (patch 5) > >>> - match enum with enum64 and vice versa (Andrii, patch 9) > >>> - ensure that resolve_btfids works with BTF without .BTF.base > >>> section (patch 7) > >>> - update tests to cover embedded types, arrays and function > >>> prototypes (patches 3, 12) > >>> > >>> One change not made yet is adding anonymous struct/unions that the split > >>> BTF references in base BTF to the module instead of adding them to the > >>> .BTF.base section. That would involve having to maintain two pipes for > >>> writing BTF, one for the .BTF.base and one for the split BTF. It would > >>> be possible, but there are I think some edge cases that might make it > >>> tricky. For example consider a split BTF reference to a base BTF > >>> ARRAY which in turn referenced an anonymous STRUCT as type. In such a > >>> case, it wouldn't make sense to have the array in the .BTF.base section > >>> while having the STRUCT in the module. The general concern is that once > >> > >> Hm.. not really? ARRAY is a reference type (and anonymous at that), so > >> it would have to stay in module's BTF, no? I'll go read the patch > >> series again, but let me know if I'm missing something. > >> > > The way things currently work, we preserve all relationships prior to > distilling base BTF. That is, if a type was in split BTF prior to > calling btf__distill_base(), it will stay in split BTF afterwards. Ditto > for base types. This is true for reference types as well as named types. > So in the case of the above array for example, prior to distilling types > it is in base BTF. If it in turn then referred to a base anonymous > struct, both would be in the base and thus the distilled base BTF. In > the above case, I was suggesting the array itself was referred to from > split BTF, but not in split BTF, sorry if that wasn't clearer. > > So the problem comes if we moved the anon struct to the module; then we > also need to move types that depend on it there. This means we'd need to > make the move recursive. That seems doable; the only question is around Yep, it should be very doable. We just mark everything used from "to-be-moved-to-new-split-BTF" types recursively, unless it's "qualified named type", where we stop. You have a pass to mark embedded types, here it might be another pass to mark "used-by-split-BTF-types-but-not-distillable" types. > the logistics and the effects of doing so. At one extreme we might end > up with something that resembles standalone BTF (many/most types in the My hypothesis is that it is very unlikely that there will be a lot of types that have to be copied into split BTF. > split BTF). That seems unlikely in most cases. I examined one module's > BTF base for example, and the only anon structs arose from typedef > references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in > turn were only referenced once elsewhere in distilled base BTF; a > sockptr was in a FUNC_PROTO, but aside from that the typedefs were not > otherwise referenced in distilled base BTF, they were referenced in > split BTF as embeeded struct field types. > > So moving all of this to the split BTF seems possible; what I think we > probably need to think on a bit is how to handle relocation. Is there a > need to relocate these module types too, or can we live with having > duplicate atomic_t/sockptr_t typedefs in the module? Currently > relocation is simplified by the fact that we only need to relocate the > types prior to the module's start id. All we need to do is rewrite type > references in split BTF to base ids. If we were relocating split types > too we'd need to remove them from split BTF. I think anything that is not in distilled base should not be relocated, so current simplicity is remapping distilled BTF IDs will remain. It's ok to have clones/copies of some simple typedefs, probably. We have a few somewhat competing goals here and we need to make a tradeoff between them: a) minimizing split BTF size (or rather not making it too large) b) making sure PTR_TO_BTF_ID types work (so module kfuncs can accept task_struct and others) c) keeping relocation simple, fast, and reliable/unambiguous By copying anonymous types we potentially hurt a) (but presumably not a lot to worry about), and we significantly improve c) by making relocation simple/fast/reliably (to the extent possible with "by name" lookups). And we (presumably) don't change b), it still works for all existing and future cases. If we ever need to pass anonymous typedef'ed types to kfunc, we'll need to think how to represent them in distilled base BTF. But it most probably won't be TYPEDEF -> STRUCT chain, but rather empty STRUCT with the name of original TYPEDEF + some bit to specify that we are looking for a TYPEDEF in real base BTF; I think we have a pass forward here, and that's the main thing, but I don't think it's a problem worth solving now (or ever). WDYT? > > >>> we move a type to the module we would need to also ensure any base types > >>> that refer to it move there too. For now it is I think simpler to > >>> retain the existing split/base type classifications. > >> > >> We would have to finalize this part before landing, as it has big > >> implications on the relocation process. > > > > Ran out of time, sorry, will continue on Monday. But please consider, > > meanwhile, what I mentioned about only having named > > structs/unions/enums in distilled base BTF. > > > > Sure, I'll dig into it further. FWIW I agree with the goal of moving > anonymous structs/unions if it's doable. I can't see any blocking issues > thus far. Yep, please give it a go, and I'll try to finish the review today, thanks. > > >> > >> > >>> > >>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ > >>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ > >>> > >>> > >>> > >>> Alan Maguire (13): > >>> libbpf: add support to btf__add_fwd() for ENUM64 > >>> libbpf: add btf__distill_base() creating split BTF with distilled base > >>> BTF > >>> selftests/bpf: test distilled base, split BTF generation > >>> libbpf: add btf__parse_opts() API for flexible BTF parsing > >>> bpftool: support displaying raw split BTF using base BTF section as > >>> base > >>> kbuild,bpf: switch to using --btf_features for pahole v1.26 and later > >>> resolve_btfids: use .BTF.base ELF section as base BTF if -B option is > >>> used > >>> kbuild, bpf: add module-specific pahole/resolve_btfids flags for > >>> distilled base BTF > >>> libbpf: split BTF relocation > >>> module, bpf: store BTF base pointer in struct module > >>> libbpf,bpf: share BTF relocate-related code with kernel > >>> selftests/bpf: extend distilled BTF tests to cover BTF relocation > >>> bpftool: support displaying relocated-with-base split BTF > >>> > >>> include/linux/btf.h | 32 + > >>> include/linux/module.h | 2 + > >>> kernel/bpf/Makefile | 8 + > >>> kernel/bpf/btf.c | 227 +++++-- > >>> kernel/module/main.c | 5 +- > >>> scripts/Makefile.btf | 12 +- > >>> scripts/Makefile.modfinal | 4 +- > >>> .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- > >>> tools/bpf/bpftool/bash-completion/bpftool | 7 +- > >>> tools/bpf/bpftool/btf.c | 20 +- > >>> tools/bpf/bpftool/main.c | 14 +- > >>> tools/bpf/bpftool/main.h | 2 + > >>> tools/bpf/resolve_btfids/main.c | 22 +- > >>> tools/lib/bpf/Build | 2 +- > >>> tools/lib/bpf/btf.c | 561 +++++++++++----- > >>> tools/lib/bpf/btf.h | 61 ++ > >>> tools/lib/bpf/btf_common.c | 146 ++++ > >>> tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ > >>> tools/lib/bpf/libbpf.map | 3 + > >>> tools/lib/bpf/libbpf_internal.h | 2 + > >>> .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ > >>> 21 files changed, 1864 insertions(+), 209 deletions(-) > >>> create mode 100644 tools/lib/bpf/btf_common.c > >>> create mode 100644 tools/lib/bpf/btf_relocate.c > >>> create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c > >>> > >>> -- > >>> 2.31.1 > >>> > >
On 29/04/2024 18:05, Andrii Nakryiko wrote: > On Mon, Apr 29, 2024 at 8:25 AM Alan Maguire <alan.maguire@oracle.com> wrote: >> >> On 27/04/2024 01:24, Andrii Nakryiko wrote: >>> On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko >>> <andrii.nakryiko@gmail.com> wrote: >>>> >>>> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>>>> >>>>> Split BPF Type Format (BTF) provides huge advantages in that kernel >>>>> modules only have to provide type information for types that they do not >>>>> share with the core kernel; for core kernel types, split BTF refers to >>>>> core kernel BTF type ids. So for a STRUCT sk_buff, a module that >>>>> uses that structure (or a pointer to it) simply needs to refer to the >>>>> core kernel type id, saving the need to define the structure and its many >>>>> dependents. This cuts down on duplication and makes BTF as compact >>>>> as possible. >>>>> >>>>> However, there is a downside. This scheme requires the references from >>>>> split BTF to base BTF to be valid not just at encoding time, but at use >>>>> time (when the module is loaded). Even a small change in kernel types >>>>> can perturb the type ids in core kernel BTF, and due to pahole's >>>>> parallel processing of compilation units, even an unchanged kernel can >>>>> have different type ids if BTF is re-generated. So we have a robustness >>>>> problem for split BTF for cases where a module is not always compiled at >>>>> the same time as the kernel. This problem is particularly acute for >>>>> distros which generally want module builders to be able to compile a >>>>> module for the lifetime of a Linux stable-based release, and have it >>>>> continue to be valid over the lifetime of that release, even as changes >>>>> in data structures (and hence BTF types) accrue. Today it's not >>>>> possible to generate BTF for modules that works beyond the initial >>>>> kernel it is compiled against - kernel bugfixes etc invalidate the split >>>>> BTF references to vmlinux BTF, and BTF is no longer usable for the >>>>> module. >>>>> >>>>> The goal of this series is to provide options to provide additional >>>>> context for cases like this. That context comes in the form of >>>>> distilled base BTF; it stands in for the base BTF, and contains >>>>> information about the types referenced from split BTF, but not their >>>>> full descriptions. The modified split BTF will refer to type ids in >>>>> this .BTF.base section, and when the kernel loads such modules it >>>>> will use that base BTF to map references from split BTF to the >>>>> current vmlinux BTF - a process of relocating split BTF with the >>>>> currently-running kernel's vmlinux base BTF. >>>>> >>>>> A module builder - using this series along with the pahole changes - >>>>> can then build a module with distilled base BTF via an out-of-tree >>>>> module build, i.e. >>>>> >>>>> make -C . M=path/2/module >>>>> >>>>> The module will have a .BTF section (the split BTF) and a >>>>> .BTF.base section. The latter is small in size - distilled base >>>>> BTF does not need full struct/union/enum information for named >>>>> types for example. For 2667 modules built with distilled base BTF, >>>>> the average size observed was 1556 bytes (stddev 1563). >>>>> >>>>> Note that for the in-tree modules, this approach is not needed as >>>>> split and base BTF in the case of in-tree modules are always built >>>>> and re-built together. >>>>> >>>>> The series first focuses on generating split BTF with distilled base >>>>> BTF, and provides btf__parse_opts() which allows specification >>>>> of the section name from which to read BTF data, since we now have >>>>> both .BTF and .BTF.base sections that can contain such data. >>>>> >>>>> Then we add support to resolve_btfids for generating the .BTF.ids >>>>> section with reference to the .BTF.base section - this ensures the >>>>> .BTF.ids match those used in the split/base BTF. >>>>> >>>>> Finally the series provides the mechanism for relocating split BTF with >>>>> a new base; the distilled base BTF is used to map the references to base >>>>> BTF in the split BTF to the new base. For the kernel, this relocation >>>>> process happens at module load time, and we relocate split BTF >>>>> references to point at types in the current vmlinux BTF. As part of >>>>> this, .BTF.ids references need to be mapped also. >>>>> >>>>> So concretely, what happens is >>>>> >>>>> - we generate split BTF in the .BTF section of a module that refers to >>>>> types in the .BTF.base section as base types; these are not full >>>>> type descriptions but provide information about the base type. So >>>>> a STRUCT sk_buff would be represented as a FWD struct sk_buff in >>>>> distilled base BTF for example. >>>>> - when the module is loaded, the split BTF is relocated with vmlinux >>>>> BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff >>>>> in vmlinux BTF and map all split BTF references to the distilled base >>>>> FWD sk_buff, replacing them with references to the vmlinux BTF >>>>> STRUCT sk_buff. >>>>> >>>>> Support is also added to bpftool to be able to display split BTF >>>>> relative to its .BTF.base section, and also to display the relocated >>>>> form via the "-R path_to_base_btf". >>>>> >>>>> A previous approach to this problem [1] utilized standalone BTF for such >>>>> cases - where the BTF is not defined relative to base BTF so there is no >>>>> relocation required. The problem with that approach is that from >>>>> the verifier perspective, some types are special, and having a custom >>>>> representation of a core kernel type that did not necessarily match the >>>>> current representation is not tenable. So the approach taken here was >>>>> to preserve the split BTF model while minimizing the representation of >>>>> the context needed to relocate split and current vmlinux BTF. >>>>> >>>>> To generate distilled .BTF.base sections the associated dwarves >>>>> patch (to be applied on the "next" branch there) is needed. >>>>> Without it, things will still work but bpf_testmod will not be built >>>>> with a .BTF.base section. >>>>> >>>>> Changes since RFC [2]: >>>>> >>>>> - updated terminology; we replace clunky "base reference" BTF with >>>>> distilling base BTF into a .BTF.base section. Similarly BTF >>>>> reconcilation becomes BTF relocation (Andrii, most patches) >>>>> - add distilled base BTF by default for out-of-tree modules >>>>> (Alexei, patch 8) >>>>> - distill algorithm updated to record size of embedded struct/union >>>>> by recording it as a 0-vlen STRUCT/UNION with size preserved >>>>> (Andrii, patch 2) >>>>> - verify size match on relocation for such STRUCT/UNIONs (Andrii, >>>>> patch 9) >>>>> - with embedded STRUCT/UNION recording size, we can have bpftool >>>>> dump a header representation using .BTF.base + .BTF sections >>>>> rather than special-casing and refusing to use "format c" for >>>>> that case (patch 5) >>>>> - match enum with enum64 and vice versa (Andrii, patch 9) >>>>> - ensure that resolve_btfids works with BTF without .BTF.base >>>>> section (patch 7) >>>>> - update tests to cover embedded types, arrays and function >>>>> prototypes (patches 3, 12) >>>>> >>>>> One change not made yet is adding anonymous struct/unions that the split >>>>> BTF references in base BTF to the module instead of adding them to the >>>>> .BTF.base section. That would involve having to maintain two pipes for >>>>> writing BTF, one for the .BTF.base and one for the split BTF. It would >>>>> be possible, but there are I think some edge cases that might make it >>>>> tricky. For example consider a split BTF reference to a base BTF >>>>> ARRAY which in turn referenced an anonymous STRUCT as type. In such a >>>>> case, it wouldn't make sense to have the array in the .BTF.base section >>>>> while having the STRUCT in the module. The general concern is that once >>>> >>>> Hm.. not really? ARRAY is a reference type (and anonymous at that), so >>>> it would have to stay in module's BTF, no? I'll go read the patch >>>> series again, but let me know if I'm missing something. >>>> >> >> The way things currently work, we preserve all relationships prior to >> distilling base BTF. That is, if a type was in split BTF prior to >> calling btf__distill_base(), it will stay in split BTF afterwards. Ditto >> for base types. This is true for reference types as well as named types. >> So in the case of the above array for example, prior to distilling types >> it is in base BTF. If it in turn then referred to a base anonymous >> struct, both would be in the base and thus the distilled base BTF. In >> the above case, I was suggesting the array itself was referred to from >> split BTF, but not in split BTF, sorry if that wasn't clearer. >> >> So the problem comes if we moved the anon struct to the module; then we >> also need to move types that depend on it there. This means we'd need to >> make the move recursive. That seems doable; the only question is around > > Yep, it should be very doable. We just mark everything used from > "to-be-moved-to-new-split-BTF" types recursively, unless it's > "qualified named type", where we stop. You have a pass to mark > embedded types, here it might be another pass to mark > "used-by-split-BTF-types-but-not-distillable" types. > >> the logistics and the effects of doing so. At one extreme we might end >> up with something that resembles standalone BTF (many/most types in the > > My hypothesis is that it is very unlikely that there will be a lot of > types that have to be copied into split BTF. > >> split BTF). That seems unlikely in most cases. I examined one module's >> BTF base for example, and the only anon structs arose from typedef >> references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in >> turn were only referenced once elsewhere in distilled base BTF; a >> sockptr was in a FUNC_PROTO, but aside from that the typedefs were not >> otherwise referenced in distilled base BTF, they were referenced in >> split BTF as embeeded struct field types. >> >> So moving all of this to the split BTF seems possible; what I think we >> probably need to think on a bit is how to handle relocation. Is there a >> need to relocate these module types too, or can we live with having >> duplicate atomic_t/sockptr_t typedefs in the module? Currently >> relocation is simplified by the fact that we only need to relocate the >> types prior to the module's start id. All we need to do is rewrite type >> references in split BTF to base ids. If we were relocating split types >> too we'd need to remove them from split BTF. > > I think anything that is not in distilled base should not be > relocated, so current simplicity is remapping distilled BTF IDs will > remain. It's ok to have clones/copies of some simple typedefs, > probably. > > We have a few somewhat competing goals here and we need to make a > tradeoff between them: > > a) minimizing split BTF size (or rather not making it too large) > b) making sure PTR_TO_BTF_ID types work (so module kfuncs can accept > task_struct and others) > c) keeping relocation simple, fast, and reliable/unambiguous > > By copying anonymous types we potentially hurt a) (but presumably not > a lot to worry about), and we significantly improve c) by making > relocation simple/fast/reliably (to the extent possible with "by name" > lookups). And we (presumably) don't change b), it still works for all > existing and future cases. > Yeah, case b) is the only lingering concern I have, but in practice it seems unlikely to arise. One point of clarification - we've discussed so far mostly anonymous STRUCTs and UNIONs; do you think there are other anonymous types we should consider, ARRAYs for example? > If we ever need to pass anonymous typedef'ed types to kfunc, we'll > need to think how to represent them in distilled base BTF. But it most > probably won't be TYPEDEF -> STRUCT chain, but rather empty STRUCT > with the name of original TYPEDEF + some bit to specify that we are > looking for a TYPEDEF in real base BTF; I think we have a pass forward > here, and that's the main thing, but I don't think it's a problem > worth solving now (or ever). > > WDYT? Agreed. I think (hope) it's unlikely to arise. > >> >>>>> we move a type to the module we would need to also ensure any base types >>>>> that refer to it move there too. For now it is I think simpler to >>>>> retain the existing split/base type classifications. >>>> >>>> We would have to finalize this part before landing, as it has big >>>> implications on the relocation process. >>> >>> Ran out of time, sorry, will continue on Monday. But please consider, >>> meanwhile, what I mentioned about only having named >>> structs/unions/enums in distilled base BTF. >>> >> >> Sure, I'll dig into it further. FWIW I agree with the goal of moving >> anonymous structs/unions if it's doable. I can't see any blocking issues >> thus far. > > Yep, please give it a go, and I'll try to finish the review today, thanks. > >> >>>> >>>> >>>>> >>>>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ >>>>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ >>>>> >>>>> >>>>> >>>>> Alan Maguire (13): >>>>> libbpf: add support to btf__add_fwd() for ENUM64 >>>>> libbpf: add btf__distill_base() creating split BTF with distilled base >>>>> BTF >>>>> selftests/bpf: test distilled base, split BTF generation >>>>> libbpf: add btf__parse_opts() API for flexible BTF parsing >>>>> bpftool: support displaying raw split BTF using base BTF section as >>>>> base >>>>> kbuild,bpf: switch to using --btf_features for pahole v1.26 and later >>>>> resolve_btfids: use .BTF.base ELF section as base BTF if -B option is >>>>> used >>>>> kbuild, bpf: add module-specific pahole/resolve_btfids flags for >>>>> distilled base BTF >>>>> libbpf: split BTF relocation >>>>> module, bpf: store BTF base pointer in struct module >>>>> libbpf,bpf: share BTF relocate-related code with kernel >>>>> selftests/bpf: extend distilled BTF tests to cover BTF relocation >>>>> bpftool: support displaying relocated-with-base split BTF >>>>> >>>>> include/linux/btf.h | 32 + >>>>> include/linux/module.h | 2 + >>>>> kernel/bpf/Makefile | 8 + >>>>> kernel/bpf/btf.c | 227 +++++-- >>>>> kernel/module/main.c | 5 +- >>>>> scripts/Makefile.btf | 12 +- >>>>> scripts/Makefile.modfinal | 4 +- >>>>> .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- >>>>> tools/bpf/bpftool/bash-completion/bpftool | 7 +- >>>>> tools/bpf/bpftool/btf.c | 20 +- >>>>> tools/bpf/bpftool/main.c | 14 +- >>>>> tools/bpf/bpftool/main.h | 2 + >>>>> tools/bpf/resolve_btfids/main.c | 22 +- >>>>> tools/lib/bpf/Build | 2 +- >>>>> tools/lib/bpf/btf.c | 561 +++++++++++----- >>>>> tools/lib/bpf/btf.h | 61 ++ >>>>> tools/lib/bpf/btf_common.c | 146 ++++ >>>>> tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ >>>>> tools/lib/bpf/libbpf.map | 3 + >>>>> tools/lib/bpf/libbpf_internal.h | 2 + >>>>> .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ >>>>> 21 files changed, 1864 insertions(+), 209 deletions(-) >>>>> create mode 100644 tools/lib/bpf/btf_common.c >>>>> create mode 100644 tools/lib/bpf/btf_relocate.c >>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c >>>>> >>>>> -- >>>>> 2.31.1 >>>>> >>>
On Mon, Apr 29, 2024 at 10:31 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 29/04/2024 18:05, Andrii Nakryiko wrote: > > On Mon, Apr 29, 2024 at 8:25 AM Alan Maguire <alan.maguire@oracle.com> wrote: > >> > >> On 27/04/2024 01:24, Andrii Nakryiko wrote: > >>> On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko > >>> <andrii.nakryiko@gmail.com> wrote: > >>>> > >>>> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@oracle.com> wrote: > >>>>> > >>>>> Split BPF Type Format (BTF) provides huge advantages in that kernel > >>>>> modules only have to provide type information for types that they do not > >>>>> share with the core kernel; for core kernel types, split BTF refers to > >>>>> core kernel BTF type ids. So for a STRUCT sk_buff, a module that > >>>>> uses that structure (or a pointer to it) simply needs to refer to the > >>>>> core kernel type id, saving the need to define the structure and its many > >>>>> dependents. This cuts down on duplication and makes BTF as compact > >>>>> as possible. > >>>>> > >>>>> However, there is a downside. This scheme requires the references from > >>>>> split BTF to base BTF to be valid not just at encoding time, but at use > >>>>> time (when the module is loaded). Even a small change in kernel types > >>>>> can perturb the type ids in core kernel BTF, and due to pahole's > >>>>> parallel processing of compilation units, even an unchanged kernel can > >>>>> have different type ids if BTF is re-generated. So we have a robustness > >>>>> problem for split BTF for cases where a module is not always compiled at > >>>>> the same time as the kernel. This problem is particularly acute for > >>>>> distros which generally want module builders to be able to compile a > >>>>> module for the lifetime of a Linux stable-based release, and have it > >>>>> continue to be valid over the lifetime of that release, even as changes > >>>>> in data structures (and hence BTF types) accrue. Today it's not > >>>>> possible to generate BTF for modules that works beyond the initial > >>>>> kernel it is compiled against - kernel bugfixes etc invalidate the split > >>>>> BTF references to vmlinux BTF, and BTF is no longer usable for the > >>>>> module. > >>>>> > >>>>> The goal of this series is to provide options to provide additional > >>>>> context for cases like this. That context comes in the form of > >>>>> distilled base BTF; it stands in for the base BTF, and contains > >>>>> information about the types referenced from split BTF, but not their > >>>>> full descriptions. The modified split BTF will refer to type ids in > >>>>> this .BTF.base section, and when the kernel loads such modules it > >>>>> will use that base BTF to map references from split BTF to the > >>>>> current vmlinux BTF - a process of relocating split BTF with the > >>>>> currently-running kernel's vmlinux base BTF. > >>>>> > >>>>> A module builder - using this series along with the pahole changes - > >>>>> can then build a module with distilled base BTF via an out-of-tree > >>>>> module build, i.e. > >>>>> > >>>>> make -C . M=path/2/module > >>>>> > >>>>> The module will have a .BTF section (the split BTF) and a > >>>>> .BTF.base section. The latter is small in size - distilled base > >>>>> BTF does not need full struct/union/enum information for named > >>>>> types for example. For 2667 modules built with distilled base BTF, > >>>>> the average size observed was 1556 bytes (stddev 1563). > >>>>> > >>>>> Note that for the in-tree modules, this approach is not needed as > >>>>> split and base BTF in the case of in-tree modules are always built > >>>>> and re-built together. > >>>>> > >>>>> The series first focuses on generating split BTF with distilled base > >>>>> BTF, and provides btf__parse_opts() which allows specification > >>>>> of the section name from which to read BTF data, since we now have > >>>>> both .BTF and .BTF.base sections that can contain such data. > >>>>> > >>>>> Then we add support to resolve_btfids for generating the .BTF.ids > >>>>> section with reference to the .BTF.base section - this ensures the > >>>>> .BTF.ids match those used in the split/base BTF. > >>>>> > >>>>> Finally the series provides the mechanism for relocating split BTF with > >>>>> a new base; the distilled base BTF is used to map the references to base > >>>>> BTF in the split BTF to the new base. For the kernel, this relocation > >>>>> process happens at module load time, and we relocate split BTF > >>>>> references to point at types in the current vmlinux BTF. As part of > >>>>> this, .BTF.ids references need to be mapped also. > >>>>> > >>>>> So concretely, what happens is > >>>>> > >>>>> - we generate split BTF in the .BTF section of a module that refers to > >>>>> types in the .BTF.base section as base types; these are not full > >>>>> type descriptions but provide information about the base type. So > >>>>> a STRUCT sk_buff would be represented as a FWD struct sk_buff in > >>>>> distilled base BTF for example. > >>>>> - when the module is loaded, the split BTF is relocated with vmlinux > >>>>> BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff > >>>>> in vmlinux BTF and map all split BTF references to the distilled base > >>>>> FWD sk_buff, replacing them with references to the vmlinux BTF > >>>>> STRUCT sk_buff. > >>>>> > >>>>> Support is also added to bpftool to be able to display split BTF > >>>>> relative to its .BTF.base section, and also to display the relocated > >>>>> form via the "-R path_to_base_btf". > >>>>> > >>>>> A previous approach to this problem [1] utilized standalone BTF for such > >>>>> cases - where the BTF is not defined relative to base BTF so there is no > >>>>> relocation required. The problem with that approach is that from > >>>>> the verifier perspective, some types are special, and having a custom > >>>>> representation of a core kernel type that did not necessarily match the > >>>>> current representation is not tenable. So the approach taken here was > >>>>> to preserve the split BTF model while minimizing the representation of > >>>>> the context needed to relocate split and current vmlinux BTF. > >>>>> > >>>>> To generate distilled .BTF.base sections the associated dwarves > >>>>> patch (to be applied on the "next" branch there) is needed. > >>>>> Without it, things will still work but bpf_testmod will not be built > >>>>> with a .BTF.base section. > >>>>> > >>>>> Changes since RFC [2]: > >>>>> > >>>>> - updated terminology; we replace clunky "base reference" BTF with > >>>>> distilling base BTF into a .BTF.base section. Similarly BTF > >>>>> reconcilation becomes BTF relocation (Andrii, most patches) > >>>>> - add distilled base BTF by default for out-of-tree modules > >>>>> (Alexei, patch 8) > >>>>> - distill algorithm updated to record size of embedded struct/union > >>>>> by recording it as a 0-vlen STRUCT/UNION with size preserved > >>>>> (Andrii, patch 2) > >>>>> - verify size match on relocation for such STRUCT/UNIONs (Andrii, > >>>>> patch 9) > >>>>> - with embedded STRUCT/UNION recording size, we can have bpftool > >>>>> dump a header representation using .BTF.base + .BTF sections > >>>>> rather than special-casing and refusing to use "format c" for > >>>>> that case (patch 5) > >>>>> - match enum with enum64 and vice versa (Andrii, patch 9) > >>>>> - ensure that resolve_btfids works with BTF without .BTF.base > >>>>> section (patch 7) > >>>>> - update tests to cover embedded types, arrays and function > >>>>> prototypes (patches 3, 12) > >>>>> > >>>>> One change not made yet is adding anonymous struct/unions that the split > >>>>> BTF references in base BTF to the module instead of adding them to the > >>>>> .BTF.base section. That would involve having to maintain two pipes for > >>>>> writing BTF, one for the .BTF.base and one for the split BTF. It would > >>>>> be possible, but there are I think some edge cases that might make it > >>>>> tricky. For example consider a split BTF reference to a base BTF > >>>>> ARRAY which in turn referenced an anonymous STRUCT as type. In such a > >>>>> case, it wouldn't make sense to have the array in the .BTF.base section > >>>>> while having the STRUCT in the module. The general concern is that once > >>>> > >>>> Hm.. not really? ARRAY is a reference type (and anonymous at that), so > >>>> it would have to stay in module's BTF, no? I'll go read the patch > >>>> series again, but let me know if I'm missing something. > >>>> > >> > >> The way things currently work, we preserve all relationships prior to > >> distilling base BTF. That is, if a type was in split BTF prior to > >> calling btf__distill_base(), it will stay in split BTF afterwards. Ditto > >> for base types. This is true for reference types as well as named types. > >> So in the case of the above array for example, prior to distilling types > >> it is in base BTF. If it in turn then referred to a base anonymous > >> struct, both would be in the base and thus the distilled base BTF. In > >> the above case, I was suggesting the array itself was referred to from > >> split BTF, but not in split BTF, sorry if that wasn't clearer. > >> > >> So the problem comes if we moved the anon struct to the module; then we > >> also need to move types that depend on it there. This means we'd need to > >> make the move recursive. That seems doable; the only question is around > > > > Yep, it should be very doable. We just mark everything used from > > "to-be-moved-to-new-split-BTF" types recursively, unless it's > > "qualified named type", where we stop. You have a pass to mark > > embedded types, here it might be another pass to mark > > "used-by-split-BTF-types-but-not-distillable" types. > > > >> the logistics and the effects of doing so. At one extreme we might end > >> up with something that resembles standalone BTF (many/most types in the > > > > My hypothesis is that it is very unlikely that there will be a lot of > > types that have to be copied into split BTF. > > > >> split BTF). That seems unlikely in most cases. I examined one module's > >> BTF base for example, and the only anon structs arose from typedef > >> references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in > >> turn were only referenced once elsewhere in distilled base BTF; a > >> sockptr was in a FUNC_PROTO, but aside from that the typedefs were not > >> otherwise referenced in distilled base BTF, they were referenced in > >> split BTF as embeeded struct field types. > >> > >> So moving all of this to the split BTF seems possible; what I think we > >> probably need to think on a bit is how to handle relocation. Is there a > >> need to relocate these module types too, or can we live with having > >> duplicate atomic_t/sockptr_t typedefs in the module? Currently > >> relocation is simplified by the fact that we only need to relocate the > >> types prior to the module's start id. All we need to do is rewrite type > >> references in split BTF to base ids. If we were relocating split types > >> too we'd need to remove them from split BTF. > > > > I think anything that is not in distilled base should not be > > relocated, so current simplicity is remapping distilled BTF IDs will > > remain. It's ok to have clones/copies of some simple typedefs, > > probably. > > > > We have a few somewhat competing goals here and we need to make a > > tradeoff between them: > > > > a) minimizing split BTF size (or rather not making it too large) > > b) making sure PTR_TO_BTF_ID types work (so module kfuncs can accept > > task_struct and others) > > c) keeping relocation simple, fast, and reliable/unambiguous > > > > By copying anonymous types we potentially hurt a) (but presumably not > > a lot to worry about), and we significantly improve c) by making > > relocation simple/fast/reliably (to the extent possible with "by name" > > lookups). And we (presumably) don't change b), it still works for all > > existing and future cases. > > > > Yeah, case b) is the only lingering concern I have, but in practice it > seems unlikely to arise. One point of clarification - we've discussed so > far mostly anonymous STRUCTs and UNIONs; do you think there are other > anonymous types we should consider, ARRAYs for example? Everything is technically possible, but I'd be surprised if anything but STRUCT/UNION is referred to by PTR_TO_BTF_ID for kfunc. But let's get there first. > > If we ever need to pass anonymous typedef'ed types to kfunc, we'll > > need to think how to represent them in distilled base BTF. But it most > > probably won't be TYPEDEF -> STRUCT chain, but rather empty STRUCT > > with the name of original TYPEDEF + some bit to specify that we are > > looking for a TYPEDEF in real base BTF; I think we have a pass forward > > here, and that's the main thing, but I don't think it's a problem > > worth solving now (or ever). > > > > WDYT? > > Agreed. I think (hope) it's unlikely to arise. > > > > >> > >>>>> we move a type to the module we would need to also ensure any base types > >>>>> that refer to it move there too. For now it is I think simpler to > >>>>> retain the existing split/base type classifications. > >>>> > >>>> We would have to finalize this part before landing, as it has big > >>>> implications on the relocation process. > >>> > >>> Ran out of time, sorry, will continue on Monday. But please consider, > >>> meanwhile, what I mentioned about only having named > >>> structs/unions/enums in distilled base BTF. > >>> > >> > >> Sure, I'll dig into it further. FWIW I agree with the goal of moving > >> anonymous structs/unions if it's doable. I can't see any blocking issues > >> thus far. > > > > Yep, please give it a go, and I'll try to finish the review today, thanks. > > > >> > >>>> > >>>> > >>>>> > >>>>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ > >>>>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ > >>>>> > >>>>> > >>>>> > >>>>> Alan Maguire (13): > >>>>> libbpf: add support to btf__add_fwd() for ENUM64 > >>>>> libbpf: add btf__distill_base() creating split BTF with distilled base > >>>>> BTF > >>>>> selftests/bpf: test distilled base, split BTF generation > >>>>> libbpf: add btf__parse_opts() API for flexible BTF parsing > >>>>> bpftool: support displaying raw split BTF using base BTF section as > >>>>> base > >>>>> kbuild,bpf: switch to using --btf_features for pahole v1.26 and later > >>>>> resolve_btfids: use .BTF.base ELF section as base BTF if -B option is > >>>>> used > >>>>> kbuild, bpf: add module-specific pahole/resolve_btfids flags for > >>>>> distilled base BTF > >>>>> libbpf: split BTF relocation > >>>>> module, bpf: store BTF base pointer in struct module > >>>>> libbpf,bpf: share BTF relocate-related code with kernel > >>>>> selftests/bpf: extend distilled BTF tests to cover BTF relocation > >>>>> bpftool: support displaying relocated-with-base split BTF > >>>>> > >>>>> include/linux/btf.h | 32 + > >>>>> include/linux/module.h | 2 + > >>>>> kernel/bpf/Makefile | 8 + > >>>>> kernel/bpf/btf.c | 227 +++++-- > >>>>> kernel/module/main.c | 5 +- > >>>>> scripts/Makefile.btf | 12 +- > >>>>> scripts/Makefile.modfinal | 4 +- > >>>>> .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- > >>>>> tools/bpf/bpftool/bash-completion/bpftool | 7 +- > >>>>> tools/bpf/bpftool/btf.c | 20 +- > >>>>> tools/bpf/bpftool/main.c | 14 +- > >>>>> tools/bpf/bpftool/main.h | 2 + > >>>>> tools/bpf/resolve_btfids/main.c | 22 +- > >>>>> tools/lib/bpf/Build | 2 +- > >>>>> tools/lib/bpf/btf.c | 561 +++++++++++----- > >>>>> tools/lib/bpf/btf.h | 61 ++ > >>>>> tools/lib/bpf/btf_common.c | 146 ++++ > >>>>> tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ > >>>>> tools/lib/bpf/libbpf.map | 3 + > >>>>> tools/lib/bpf/libbpf_internal.h | 2 + > >>>>> .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ > >>>>> 21 files changed, 1864 insertions(+), 209 deletions(-) > >>>>> create mode 100644 tools/lib/bpf/btf_common.c > >>>>> create mode 100644 tools/lib/bpf/btf_relocate.c > >>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c > >>>>> > >>>>> -- > >>>>> 2.31.1 > >>>>> > >>>