
[v6,bpf-next,00/21] bpf: syscall program, FD array, loader program, light skeleton.

Message ID 20210514003623.28033-1-alexei.starovoitov@gmail.com (mailing list archive)

Message

Alexei Starovoitov May 14, 2021, 12:36 a.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

v5->v6:
- fixed an issue found by bpf CI. The light skeleton generation was
  doing a dry-run of loading the program where all actual sys_bpf syscalls
  were replaced by calls into gen_loader. It turned out that the search for a
  valid vmlinux_btf was not stubbed out, which was causing light skeleton gen
  to fail on older kernels.
- significantly reduced verbosity of gen_loader.c.
- an example trace_printk.lskel.h generated out of progs/trace_printk.c
  https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903

v4->v5:
- addressed a bunch of minor comments from Andrii.
- the main difference is that lskel is now more robust in case of errors
  and a bit cleaner looking.

v3->v4:
- cleaned up closing of temporary FDs in case an intermediate sys_bpf fails during
  execution of the loader program.
- added support for rodata in the skeleton.
- enforce bpf_prog_type_syscall to be sleepable, since it needs bpf_copy_from_user
  to populate the rodata map.
- converted the trace_printk test to use lskel to test rodata access.
- various small bug fixes.

v2->v3: Addressed comments from Andrii and John.
- added support for setting max_entries after signature verification
  and used it in ringbuf test, since ringbuf's max_entries has to be updated
  after skeleton open() and before load(). See patch 20.
- bpf_btf_find_by_name_kind doesn't take btf_fd anymore.
  Because of that, attach_prog_fd was removed from bpf_prog_desc in lskel.
  Both features will be added later.
- cleaned up closing of fd==0 during loader gen by resetting fds back to -1.
- converted loader gen to use memset(&attr, 0, cmd_specific_attr_size).
  Would love to see this optimization in the rest of libbpf.
- fixed memory leak during loader_gen in case of enomem.
- support for fd_array kernel feature is added in patch 9 to have
  exhaustive testing across all selftests and then partially reverted
  in patch 15 to keep old style map_fd patching tested as well.
- since fentry_test/fexit_tests were extended with re-attach, had to add
  support for a per-program attach method in lskel and use it in the tests.
- cleaned up closing of fds in lskel in case of partial failures.
- fixed numerous small nits.

v1->v2: Addressed comments from Al, Yonghong and Andrii.
- documented sys_close fdget/fdput requirement and non-recursion check.
- reduced internal api leaks between libbpf and bpftool.
  Now bpf_object__gen_loader() is the only new libbpf api with minimal fields.
- fixed light skeleton __destroy() method to munmap and close maps and progs.
- refactored bpf_btf_find_by_name_kind to return btf_id | (btf_obj_fd << 32).
- refactored use of bpf_btf_find_by_name_kind from loader prog.
- moved auto-gen-like code into skel_internal.h, which is used by *.lskel.h.
  It has a minimal static inline bpf_load_and_run() method used by lskel.
- added lskel.h example in patch 15.
- replaced union bpf_map_prog_desc with struct bpf_map_desc and struct bpf_prog_desc.
- removed mark_feat_supported and added a patch to pass 'obj' into kernel_supports.
- added proper tracking of temporary FDs in loader prog and their cleanup via bpf_sys_close.
- rename gen_trace.c into gen_loader.c to better align the naming throughout.
- expanded number of available helpers in new prog type.
- added support for raw_tp attaching in lskel.
  lskel supports tracing and raw_tp progs now.
  It correctly loads all networking prog types too, but __attach() method is tbd.
- converted progs/test_ksyms_module.c to lskel.
- minor feedback fixes all over.

The description of V1 set is still valid:
----
This is a first step towards signed bpf programs and the third approach of that kind.
The first approach was to bring libbpf into the kernel as a user-mode-driver.
The second approach was to invent a new file format and let the kernel execute
that format as a sequence of syscalls that create maps and load programs.
This third approach uses a new type of bpf program instead of inventing a file format.
The 1st and 2nd approaches had too many downsides compared to this 3rd one and were
discarded after months of work.

To make it work the following new concepts are introduced:
1. syscall bpf program type
A kind of bpf program that can do sys_bpf and sys_close syscalls.
It can only execute in user context.

2. FD array or FD index.
Traditionally BPF instructions are patched with FDs.
This means that maps have to be created first and then the instructions modified,
which breaks signature verification if the program is signed.
Instead of patching each instruction with an FD, patch it with an index into an
array of FDs. That makes the program signature stable if it uses maps.

3. loader program that is generated as an "strace of libbpf".
When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
load BTF, create maps, populate maps and finally load programs.
Instead of actually doing the syscalls, generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of a single map and a single bpf program that
does those syscalls.
Executing such a "loader program" via the bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done, resulting in
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and the majority of the libbpf dependency
from the program loading process.

4. light skeleton
Instead of embedding the whole elf file into the skeleton and using libbpf
to parse it later, generate a loader program and embed it into a "light skeleton".
Such a skeleton can load the same set of elf files, but it doesn't need
libbpf and libelf to do that. It only needs a few sys_bpf wrappers.

Future steps:
- support CO-RE in the kernel. This patch set is already too big,
so that critical feature is left for the next step.
- generate the light skeleton in golang to allow such users to use BTF and
all other features provided by libbpf.
- generate the light skeleton for the kernel, so that bpf programs can be embedded
in a kernel module. The UMD usage in bpf_preload will be replaced with
such a skeleton, so bpf_preload would become a standard kernel module
without user space dependency.
- finally do the signing of the loader program.

The patches are a work in progress with a few rough edges.

Alexei Starovoitov (21):
  bpf: Introduce bpf_sys_bpf() helper and program type.
  bpf: Introduce bpfptr_t user/kernel pointer.
  bpf: Prepare bpf syscall to be used from kernel and user space.
  libbpf: Support for syscall program type
  selftests/bpf: Test for syscall program type
  bpf: Make btf_load command to be bpfptr_t compatible.
  selftests/bpf: Test for btf_load command.
  bpf: Introduce fd_idx
  bpf: Add bpf_btf_find_by_name_kind() helper.
  bpf: Add bpf_sys_close() helper.
  libbpf: Change the order of data and text relocations.
  libbpf: Add bpf_object pointer to kernel_supports().
  libbpf: Preliminary support for fd_idx
  libbpf: Generate loader program out of BPF ELF file.
  libbpf: Cleanup temp FDs when intermediate sys_bpf fails.
  libbpf: Introduce bpf_map__initial_value().
  bpftool: Use syscall/loader program in "prog load" and "gen skeleton"
    command.
  selftests/bpf: Convert few tests to light skeleton.
  selftests/bpf: Convert atomics test to light skeleton.
  selftests/bpf: Convert test printk to use rodata.
  selftests/bpf: Convert test trace_printk to lskel.

 include/linux/bpf.h                           |  19 +-
 include/linux/bpf_types.h                     |   2 +
 include/linux/bpf_verifier.h                  |   1 +
 include/linux/bpfptr.h                        |  75 ++
 include/linux/btf.h                           |   2 +-
 include/uapi/linux/bpf.h                      |  38 +-
 kernel/bpf/bpf_iter.c                         |  13 +-
 kernel/bpf/btf.c                              |  70 +-
 kernel/bpf/syscall.c                          | 194 ++++-
 kernel/bpf/verifier.c                         |  89 ++-
 net/bpf/test_run.c                            |  45 +-
 tools/bpf/bpftool/Makefile                    |   2 +-
 tools/bpf/bpftool/gen.c                       | 386 +++++++++-
 tools/bpf/bpftool/main.c                      |   7 +-
 tools/bpf/bpftool/main.h                      |   1 +
 tools/bpf/bpftool/prog.c                      | 107 ++-
 tools/bpf/bpftool/xlated_dumper.c             |   3 +
 tools/include/uapi/linux/bpf.h                |  38 +-
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/bpf_gen_internal.h              |  41 +
 tools/lib/bpf/gen_loader.c                    | 729 ++++++++++++++++++
 tools/lib/bpf/libbpf.c                        | 387 ++++++++--
 tools/lib/bpf/libbpf.h                        |  13 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_internal.h               |   2 +
 tools/lib/bpf/skel_internal.h                 | 123 +++
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |  16 +-
 .../selftests/bpf/prog_tests/atomics.c        |  72 +-
 .../selftests/bpf/prog_tests/fentry_fexit.c   |   6 +-
 .../selftests/bpf/prog_tests/fentry_test.c    |  10 +-
 .../selftests/bpf/prog_tests/fexit_sleep.c    |   6 +-
 .../selftests/bpf/prog_tests/fexit_test.c     |  10 +-
 .../selftests/bpf/prog_tests/kfunc_call.c     |   6 +-
 .../selftests/bpf/prog_tests/ksyms_module.c   |   2 +-
 .../selftests/bpf/prog_tests/ringbuf.c        |   8 +-
 .../selftests/bpf/prog_tests/syscall.c        |  55 ++
 .../selftests/bpf/prog_tests/trace_printk.c   |   5 +-
 tools/testing/selftests/bpf/progs/syscall.c   | 121 +++
 .../selftests/bpf/progs/test_ringbuf.c        |   4 +-
 .../selftests/bpf/progs/test_subprogs.c       |  13 +
 .../selftests/bpf/progs/trace_printk.c        |   6 +-
 42 files changed, 2474 insertions(+), 258 deletions(-)
 create mode 100644 include/linux/bpfptr.h
 create mode 100644 tools/lib/bpf/bpf_gen_internal.h
 create mode 100644 tools/lib/bpf/gen_loader.c
 create mode 100644 tools/lib/bpf/skel_internal.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/syscall.c

Comments

Andrii Nakryiko May 14, 2021, 6:16 p.m. UTC | #1
On Thu, May 13, 2021 at 5:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> v5->v6:

I think I acked all the patches and I think it will be fair to let
Daniel apply this patch set :)

> [...]
Daniel Borkmann May 18, 2021, 7:54 p.m. UTC | #2
On 5/14/21 2:36 AM, Alexei Starovoitov wrote:
[...]
> This is a first step towards signed bpf programs and the third approach of that kind.
> The first approach was to bring libbpf into the kernel as a user-mode-driver.
> The second approach was to invent a new file format and let kernel execute
> that format as a sequence of syscalls that create maps and load programs.
> This third approach is using new type of bpf program instead of inventing file format.
> 1st and 2nd approaches had too many downsides comparing to this 3rd and were discarded
> after months of work.
> 
> To make it work the following new concepts are introduced:
> 1. syscall bpf program type
> A kind of bpf program that can do sys_bpf and sys_close syscalls.
> It can only execute in user context.
> 
> 2. FD array or FD index.
> Traditionally BPF instructions are patched with FDs.
> What it means that maps has to be created first and then instructions modified
> which breaks signature verification if the program is signed.
> Instead of patching each instruction with FD patch it with an index into array of FDs.
> That makes the program signature stable if it uses maps.
> 
> 3. loader program that is generated as "strace of libbpf".
> When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
> load BTF, create maps, populate maps and finally load programs.
> Instead of actually doing the syscalls generate a trace of what libbpf
> would have done and represent it as the "loader program".
> The "loader program" consists of single map and single bpf program that
> does those syscalls.
> Executing such "loader program" via bpf_prog_test_run() command will
> replay the sequence of syscalls that libbpf would have done which will result
> the same maps created and programs loaded as specified in the elf file.
> The "loader program" removes libelf and majority of libbpf dependency from
> program loading process.

More of a general question, since afaik it didn't come up in prior discussion.
I think conceptually it's rather weird to only be able to execute the loader
program - which is later also supposed to do signing - through BPF_PROG_TEST_RUN,
aka our _testing_ infrastructure. Given it's not mentioned in the future steps, is
there anything planned before it becomes uapi and a fixed part of the skeleton (in
particular the libbpf bpf_load_and_run() helper officially calling into
skel_sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr))) in this regard, or is
BPF_PROG_TEST_RUN really supposed to be the /main/ interface going forward;
what's the plan on this?

> 4. light skeleton
> Instead of embedding the whole elf file into skeleton and using libbpf
> to parse it later generate a loader program and embed it into "light skeleton".
> Such skeleton can load the same set of elf files, but it doesn't need
> libbpf and libelf to do that. It only needs few sys_bpf wrappers.
> 
> Future steps:
> - support CO-RE in the kernel. This patch set is already too big,
> so that critical feature is left for the next step.
> - generate light skeleton in golang to allow such users use BTF and
> all other features provided by libbpf
> - generate light skeleton for kernel, so that bpf programs can be embeded
> in the kernel module. The UMD usage in bpf_preload will be replaced with
> such skeleton, so bpf_preload would become a standard kernel module
> without user space dependency.
> - finally do the signing of the loader program.
> 
> The patches are work in progress with few rough edges.

Thanks a lot,
Daniel
Alexei Starovoitov May 18, 2021, 9:17 p.m. UTC | #3
On Tue, May 18, 2021 at 12:54 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 5/14/21 2:36 AM, Alexei Starovoitov wrote:
> [...]
> > [...]
>
> More of a general question since afaik from prior discussion it didn't came up.
> I think conceptually, it's rather weird to only being able to execute the loader
> program which is later also supposed to do signing through the BPF_PROG_TEST_RUN
> aka our _testing_ infrastructure. Given it's not mentioned in future steps, is
> there anything planned before it becomes uapi and fixed part of skeleton (in
> particular the libbpf bpf_load_and_run() helper officially calling into the
> skel_sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr))) on this regard or is the
> BPF_PROG_TEST_RUN really supposed to be the /main/ interface going forward;
> what's the plan on this?

Few things here:
1. using the TEST_RUN command beyond testing.
That ship has already sailed. perf uses this command to trigger
prog execution outside of a testing environment. See bperf_trigger_reading().
In the past we agreed not to rename commands whose purpose
no longer strictly fits the name. Like RAW_TP_OPEN, which does a lot more
than just attaching raw_tracepoints.
The TEST_RUN command is also no longer for testing only.
That's one of the reasons why the bpf_load_and_run() helper is
named that way instead of bpf_load_and_test_run().
It's running the program, not testing it.
The kernel cmd is unfortunately misnamed.

2. signing parts that are still to be designed.
We've discussed a few ways of doing it, including having
another prog type responsible for it.
In all cases it will be done outside of the test_run cmd context.
The actual signing will be done completely in user space, similar to kernel
modules. No kernel syscalls will be invoked.
The signature verification will happen at program load time.
The loader map and the loader prog will be signed, and the signature
has to be verified prior to execution. I think load time is the best
place to do it.
Currently Arnaldo's approach of extra sign_add+sign_sz fields for the prog_load
and map_create cmds looks like the best fit.
Together the map + prog will be checked as one entity, and once
created+loaded the loader prog is ready to be executed to
produce other progs/maps.
Such a 'run/execute' step (via the test_run cmd) can happen many times later,
but at that time there will be no signature creation or signature
checking steps.
A more flexible approach is to add a signature-checking program
that will be invoked and executed by the kernel during loader prog
loading and during map create. Both approaches can co-exist too.
And in both approaches the signature checking steps are not in
the test_run cmd user context.
All these future steps are up for discussion, of course.

3. fixed part of the skeleton
The skeleton is not cast in stone.
Quite the opposite.
It will change as the loader prog supports more features.
The bpf_load_and_run() helper may change as well.
That's why it's in skel_internal.h and not part of the libbpf api.
Essentially all C code in skel_internal.h is internal to lskel.
It is just as good as being auto-generated by bpftool
during light skeleton creation.
bpftool could have emitted skel_internal.h just as well,
but it's kinda ugly to let it emit a whole .h file that could be
shared by multiple light skels.
Since it's a .h file, it's not a static or shared library. It's not a .c file.
It's guaranteed to be compiled into whatever app is using the light skel.
So there are no backward compat concerns when skel_internal.h
inevitably changes in the next revs of lskel.
Same thing with struct bpf_map/prog_desc. They are part of skel_internal.h
and match what the loader prog and lskel gen are doing.
Not only will their layout change, but depending on bpftool
cmdline flags the generated lskel might use a different bpf_map_desc.
For example, when the lskel user needs more or less debuggability from the
loader prog, the generated bpf prog will be different and will use a
different contract between the loader prog and the auto-generated light skel .h.

Does that answer your questions?
Daniel Borkmann May 18, 2021, 11:04 p.m. UTC | #4
On 5/18/21 11:17 PM, Alexei Starovoitov wrote:
> On Tue, May 18, 2021 at 12:54 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 5/14/21 2:36 AM, Alexei Starovoitov wrote:
>> [...]
>>> This is a first step towards signed bpf programs and the third approach of that kind.
>>> The first approach was to bring libbpf into the kernel as a user-mode-driver.
>>> The second approach was to invent a new file format and let kernel execute
>>> that format as a sequence of syscalls that create maps and load programs.
>>> This third approach is using new type of bpf program instead of inventing file format.
>>> 1st and 2nd approaches had too many downsides comparing to this 3rd and were discarded
>>> after months of work.
>>>
>>> To make it work the following new concepts are introduced:
>>> 1. syscall bpf program type
>>> A kind of bpf program that can do sys_bpf and sys_close syscalls.
>>> It can only execute in user context.
>>>
>>> 2. FD array or FD index.
>>> Traditionally BPF instructions are patched with FDs.
>>> This means that maps have to be created first and the instructions modified
>>> afterwards, which breaks signature verification if the program is signed.
>>> Instead of patching each instruction with an FD, patch it with an index into
>>> an array of FDs.
>>> That makes the program signature stable if it uses maps.
>>>
>>> 3. loader program that is generated as "strace of libbpf".
>>> When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
>>> load BTF, create maps, populate maps and finally load programs.
>>> Instead of actually doing the syscalls, generate a trace of what libbpf
>>> would have done and represent it as the "loader program".
>>> The "loader program" consists of a single map and a single bpf program that
>>> does those syscalls.
>>> Executing such a "loader program" via the bpf_prog_test_run() command will
>>> replay the sequence of syscalls that libbpf would have done, which will
>>> result in the same maps being created and programs loaded as specified in
>>> the elf file.
>>> The "loader program" removes libelf and most of the libbpf dependency from
>>> the program loading process.
>>
>> More of a general question since afaik it didn't come up in prior discussion.
>> I think conceptually, it's rather weird to only be able to execute the loader
>> program, which is later also supposed to do signing, through BPF_PROG_TEST_RUN,
>> aka our _testing_ infrastructure. Given it's not mentioned in future steps, is
>> there anything planned in this regard before it becomes uapi and a fixed part
>> of the skeleton (in particular the libbpf bpf_load_and_run() helper officially
>> calling into skel_sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr))), or is
>> BPF_PROG_TEST_RUN really supposed to be the /main/ interface going forward;
>> what's the plan on this?
> 
> A few things here:
> 1. Using the TEST_RUN command beyond testing.
> That ship has already sailed. perf is using this command to trigger
> prog execution outside of a testing environment. See bperf_trigger_reading().
> In the past we agreed not to rename commands whose purpose no longer
> strictly fits the name. For example, RAW_TP_OPEN does a lot more
> than just attach raw_tracepoints.
> The TEST_RUN command is likewise no longer for testing only.
> That's one of the reasons why the bpf_load_and_run() helper is
> named that way instead of bpf_load_and_test_run().
> It runs the program rather than testing it.
> The kernel cmd is unfortunately misnamed.

It definitely is, and from a UAPI pov it just looks super odd to users, as in 'what
does the loader have to do with TEST_RUN?!'. I do think this begs a one-time refactor
inside the kernel, as well as an exception for alias-mapping the BPF_PROG_TEST_RUN
enum into something more sane (with an explanation that BPF_PROG_TEST_RUN was
used in the past but outgrew testing-only), maybe just BPF_PROG_RUN. I generally
agree that we typically shouldn't go for it, but this otherwise looks way too obscure
for something that fundamental.

> The skeleton is not cast in stone.
> Quite the opposite.
> It will change as loader prog will support more features.
> The bpf_load_and_run() helper may change as well.
> That's why it's in skel_internal.h and not part of libbpf api.
> Essentially all the C code in skel_internal.h is internal to lskel.
> It is just as good as being auto-generated by bpftool
> during light skeleton creation.
> The bpftool could have emitted skel_internal.h just as well.
> But it's kinda ugly to let it emit the whole .h file that could be
> shared by multiple light skels.
> Since it's a .h file, it's not a static or shared library, and it's
> not a .c file.
> It's guaranteed to be compiled into whatever app is using the light skel.
> So there are no backward compat concerns when skel_internal.h
> inevitably changes in the next revs of lskel.
> Same thing with struct bpf_map/prog_desc. They are part of
> skel_internal.h and match what the loader prog and lskel gen are doing.
> Not only will their layout change, but depending on the bpftool
> cmdline flags, the generated lskel might use a different bpf_map_desc.
> For example, when the lskel user needs more or less debuggability from
> the loader prog, the generated bpf prog will be different and will use
> a different contract between the loader prog and the auto-generated
> light skel .h.

Ah true, that makes sense. Good we're flexible here. I've pushed the current bits
out to bpf-next thus far & resolved conflicts on libbpf side.

Thanks,
Daniel