Message ID | 20221004231143.19190-3-daniel@iogearbox.net (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | BPF link support for tc BPF programs |
On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work adds BPF links for tc. As a recap, a BPF link represents the
> attachment of a BPF program to a BPF hook point. The BPF link holds a single
> reference to keep the BPF program alive. Moreover, hook points do not
> reference a BPF link, only the application's fd or pinning does. A BPF link
> holds meta-data specific to the attachment and implements operations for link
> creation, (atomic) BPF program update, detachment and introspection.
>
> The motivation for BPF links for tc BPF programs is multi-fold, for example:
>
> - "It's especially important for applications that are deployed fleet-wide
>   and that don't "control" hosts they are deployed to. If such application
>   crashes and no one notices and does anything about that, BPF program will
>   keep running draining resources or even just, say, dropping packets. We
>   at FB had outages due to such permanent BPF attachment semantics. With
>   fd-based BPF link we are getting a framework, which allows safe, auto-
>   detachable behavior by default, unless application explicitly opts in by
>   pinning the BPF link." [0]
>
> - From the Cilium side, the tc BPF programs we attach to host-facing veth
>   devices and physical devices build the core datapath for Kubernetes Pods,
>   and they implement forwarding, load-balancing, policy, EDT-management,
>   etc, within BPF. Currently there is no concept of 'safe' ownership, e.g.
>   we've recently experienced hard-to-debug issues in a user's staging
>   environment where another Kubernetes application using tc BPF attached to
>   the same prio/handle of cls_bpf, wiping all Cilium-based BPF programs from
>   underneath it. The goal is to establish a clear/safe ownership model via
>   links which cannot accidentally be overridden. [1]
>
> BPF links for tc can co-exist with non-link attachments, and the semantics
> are also in line with XDP links: BPF links cannot replace other BPF links,
> BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF
> links, and lastly only non-BPF links can replace non-BPF links. In the case
> of Cilium, this would solve the mentioned safe-ownership issue, as 3rd party
> applications would not be able to accidentally wipe Cilium programs, even if
> they are not BPF link aware.
>
> Earlier attempts [2] have tried to integrate BPF links into the core tc
> machinery to solve cls_bpf, which has been intrusive to the generic tc
> kernel API with extensions only specific to cls_bpf, and suboptimal/complex
> since cls_bpf could be wiped from the qdisc as well. Locking a tc BPF
> program in place this way gets into layering hacks given the two object
> models are vastly different. We chose to implement the fd-based tc BPF
> attach API as a prerequisite, so that the BPF link implementation fits in
> naturally, similar to other link types which are fd-based, and without the
> need for changing core tc internal APIs.
>
> BPF programs for tc can then be successively migrated from cls_bpf to the
> new tc BPF link without needing to change the program's source code, just
> the BPF loader mechanics for attaching.
>
> [0] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com/
> [1] https://lpc.events/event/16/contributions/1353/
> [2] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com/
>
> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---

have you considered supporting BPF cookie from the outset? It should be
trivial if you remove the union from bpf_prog_array_item. If not, then we
should reject LINK_CREATE if bpf_cookie is non-zero.

>  include/linux/bpf.h            |   5 +-
>  include/net/xtc.h              |  14 ++++
>  include/uapi/linux/bpf.h       |   5 ++
>  kernel/bpf/net.c               | 116 ++++++++++++++++++++++++++++++---
>  kernel/bpf/syscall.c           |   3 +
>  tools/include/uapi/linux/bpf.h |   5 ++
>  6 files changed, 139 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 71e5f43db378..226a74f65704 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
>          union {
>                  struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
>                  u64 bpf_cookie;
> -                u32 bpf_priority;
> +                struct {
> +                        u32 bpf_priority;
> +                        u32 bpf_id;

this is link_id, is that right? should we name it as such?

> +                };
>          };
>  };
>
> [...]
>
> diff --git a/kernel/bpf/net.c b/kernel/bpf/net.c
> index ab9a9dee615b..22b7a9b05483 100644
> --- a/kernel/bpf/net.c
> +++ b/kernel/bpf/net.c
> @@ -8,7 +8,7 @@
>  #include <net/xtc.h>
>
>  static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
> -                             struct bpf_prog *nprog, u32 prio, u32 flags)
> +                             u32 id, struct bpf_prog *nprog, u32 prio, u32 flags)

similarly here, id -> link_id or something like that, it's quite confusing
what kind of ID it is otherwise

>  {
>          struct bpf_prog_array_item *item, *tmp;
>          struct xtc_entry *entry, *peer;
> @@ -27,10 +27,13 @@ static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
>                  if (!oprog)
>                          break;
>                  if (item->bpf_priority == prio) {
> -                        if (flags & BPF_F_REPLACE) {
> +                        if (item->bpf_id == id &&
> +                            (flags & BPF_F_REPLACE)) {

[...]
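To make the proposed attach path concrete, a minimal userspace sketch of
creating a tc BPF link through the raw bpf(2) syscall follows. It assumes the
series is applied: BPF_NET_INGRESS and the link_create.tc.priority attribute
exist only with these patches, and tc_link_create() is an illustrative helper
name, not part of the series.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Sketch: attach prog_fd to ifindex's ingress hook as a BPF link.
 * Returns a link fd on success, -1 with errno set on failure. */
static int tc_link_create(int prog_fd, int ifindex, __u32 priority)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.link_create.prog_fd = prog_fd;
        attr.link_create.target_ifindex = ifindex;
        attr.link_create.attach_type = BPF_NET_INGRESS; /* added by this series */
        attr.link_create.tc.priority = priority;        /* added by this series */

        return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}

Closing the returned fd without pinning it detaches the program again, which
is exactly the auto-cleanup property the cover letter argues for.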
On 10/4/22 4:11 PM, Daniel Borkmann wrote:
> @@ -191,7 +202,8 @@ static void __xtc_prog_detach_all(struct net_device *dev, bool ingress, u32 limi
>                  if (!prog)
>                          break;
>                  dev_xtc_entry_prio_del(entry, item->bpf_priority);
> -                bpf_prog_put(prog);
> +                if (!item->bpf_id)
> +                        bpf_prog_put(prog);

Should the link->dev be set to NULL somewhere?

>                  if (ingress)
>                          net_dec_ingress_queue();
>                  else
> @@ -244,6 +256,7 @@ __xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
>                  if (!prog)
>                          break;
>                  info.prog_id = prog->aux->id;
> +                info.link_id = item->bpf_id;
>                  info.prio = item->bpf_priority;
>                  if (copy_to_user(uinfo + i, &info, sizeof(info)))
>                          return -EFAULT;
> @@ -272,3 +285,90 @@ int xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
>          rtnl_unlock();
>          return ret;
>  }
> +

[ ... ]

> +static void xtc_link_release(struct bpf_link *l)
> +{
> +        struct bpf_tc_link *link = container_of(l, struct bpf_tc_link, link);
> +
> +        rtnl_lock();
> +        if (link->dev) {
> +                WARN_ON(__xtc_prog_detach(link->dev,
> +                                          link->location == BPF_NET_INGRESS,
> +                                          XTC_MAX_ENTRIES, l->id, link->priority));
> +                link->dev = NULL;
> +        }
> +        rtnl_unlock();
> +}
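One possible shape of an answer, sketched only: the prog array items carry
just the link ID, not a pointer back to the link, so the device-teardown path
would have to resolve the ID to reach link->dev. bpf_link_by_id() does exist
in the kernel, but whether taking a temporary link reference in this path is
acceptable is left open here; this is an illustration, not a proposal.

/* Illustrative fragment for __xtc_prog_detach_all(): clear the backing
 * link's dev pointer so a later xtc_link_release() does not touch a
 * netdev that is going away. Assumes item->bpf_id is the BPF link ID. */
if (item->bpf_id) {
        struct bpf_link *l = bpf_link_by_id(item->bpf_id);

        if (!IS_ERR(l)) {
                container_of(l, struct bpf_tc_link, link)->dev = NULL;
                bpf_link_put(l); /* drop the lookup reference */
        }
}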
On 10/4/22 4:11 PM, Daniel Borkmann wrote:
>  static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 limit,
> -                             u32 prio)
> +                             u32 id, u32 prio)
>  {
>          struct bpf_prog_array_item *item, *tmp;
>          struct bpf_prog *oprog, *fprog = NULL;
> @@ -126,8 +133,11 @@ static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 limit,
>                  if (item->bpf_priority != prio) {
>                          tmp->prog = oprog;
>                          tmp->bpf_priority = item->bpf_priority;
> +                        tmp->bpf_id = item->bpf_id;
>                          j++;
>                  } else {
> +                        if (item->bpf_id != id)
> +                                return -EBUSY;

A nit. Should this be -ENOENT? I think the cgroup detach is also returning
-ENOENT for the not-found case.

btw, this case should only happen from BPF_PROG_DETACH but not
BPF_LINK_DETACH?
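For the second question, the patch as posted does suggest the -EBUSY path is
only reachable via BPF_PROG_DETACH: link teardown calls __xtc_prog_detach()
with the link's own ID, so the IDs match there. A hedged userspace view of
the rejected case, with attribute names (target_ifindex, attach_priority,
BPF_NET_INGRESS) taken from this series rather than mainline:

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Sketch: try to detach by priority while a BPF link owns that slot.
 * Under this patch, __xtc_prog_detach() runs with id == 0 here, so the
 * item->bpf_id check fails and errno becomes EBUSY (or ENOENT, if the
 * suggestion above is taken). */
static int tc_prog_detach(int ifindex, __u32 priority)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.target_ifindex = ifindex;      /* field names per this series */
        attr.attach_type = BPF_NET_INGRESS;
        attr.attach_priority = priority;

        return syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
}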
On 10/6/22 5:19 AM, Andrii Nakryiko wrote:
> On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> This work adds BPF links for tc. As a recap, a BPF link represents the
>> attachment of a BPF program to a BPF hook point. The BPF link holds a single
>> reference to keep the BPF program alive. Moreover, hook points do not
>> reference a BPF link, only the application's fd or pinning does. A BPF link
>> holds meta-data specific to the attachment and implements operations for link
>> creation, (atomic) BPF program update, detachment and introspection.
>>
>> [...]
>>
>> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>
> have you considered supporting BPF cookie from the outset? It should be
> trivial if you remove the union from bpf_prog_array_item. If not, then we
> should reject LINK_CREATE if bpf_cookie is non-zero.

Haven't considered it yet at this point, but we can add this in a subsequent
step, agreed, thus we should reject for now upon create.

>>  include/linux/bpf.h            |   5 +-
>>  include/net/xtc.h              |  14 ++++
>>  include/uapi/linux/bpf.h       |   5 ++
>>  kernel/bpf/net.c               | 116 ++++++++++++++++++++++++++++++---
>>  kernel/bpf/syscall.c           |   3 +
>>  tools/include/uapi/linux/bpf.h |   5 ++
>>  6 files changed, 139 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 71e5f43db378..226a74f65704 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
>>          union {
>>                  struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
>>                  u64 bpf_cookie;
>> -                u32 bpf_priority;
>> +                struct {
>> +                        u32 bpf_priority;
>> +                        u32 bpf_id;
>
> this is link_id, is that right? should we name it as such?

Ack, will rename; thanks also for all your other suggestions in the various
patches, all make sense to me & will address them!

>> +                };
>>          };
>>  };
>
> [...]
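On the agreed rejection: this version of the series carries no cookie
attribute for tc links at all, so any check is necessarily speculative. A
minimal sketch of where such a guard could live in xtc_link_attach(), with
link_create.tc.bpf_cookie as a purely hypothetical field name:

/* Hypothetical -- 'bpf_cookie' does not exist in link_create.tc in this
 * series. Shown next to the existing attr->link_create.flags check: */
if (attr->link_create.tc.bpf_cookie)
        return -EOPNOTSUPP; /* BPF cookies not yet supported for tc links */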
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 71e5f43db378..226a74f65704 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
         union {
                 struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
                 u64 bpf_cookie;
-                u32 bpf_priority;
+                struct {
+                        u32 bpf_priority;
+                        u32 bpf_id;
+                };
         };
 };
diff --git a/include/net/xtc.h b/include/net/xtc.h
index 627dc18aa433..e4a8cee09490 100644
--- a/include/net/xtc.h
+++ b/include/net/xtc.h
@@ -27,6 +27,13 @@ struct xtc_entry_pair {
         struct xtc_entry b;
 };
 
+struct bpf_tc_link {
+        struct bpf_link link;
+        struct net_device *dev;
+        u32 priority;
+        u32 location;
+};
+
 static inline void xtc_set_ingress(struct sk_buff *skb, bool ingress)
 {
 #ifdef CONFIG_NET_XGRESS
@@ -155,6 +162,7 @@ int xtc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int xtc_prog_detach(const union bpf_attr *attr);
 int xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+int xtc_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 void dev_xtc_uninstall(struct net_device *dev);
 #else
 static inline int xtc_prog_attach(const union bpf_attr *attr,
@@ -174,6 +182,12 @@ static inline int xtc_prog_query(const union bpf_attr *attr,
         return -EINVAL;
 }
 
+static inline int xtc_link_attach(const union bpf_attr *attr,
+                                  struct bpf_prog *prog)
+{
+        return -EINVAL;
+}
+
 static inline void dev_xtc_uninstall(struct net_device *dev)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de1f5546bcfe..c006f561648e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1043,6 +1043,7 @@ enum bpf_link_type {
         BPF_LINK_TYPE_PERF_EVENT = 7,
         BPF_LINK_TYPE_KPROBE_MULTI = 8,
         BPF_LINK_TYPE_STRUCT_OPS = 9,
+        BPF_LINK_TYPE_TC = 10,
 
         MAX_BPF_LINK_TYPE,
 };
@@ -1541,6 +1542,9 @@ union bpf_attr {
                                  */
                                 __u64           cookie;
                         } tracing;
+                        struct {
+                                __u32           priority;
+                        } tc;
                 };
         } link_create;
@@ -6830,6 +6834,7 @@ struct bpf_flow_keys {
 
 struct bpf_query_info {
         __u32 prog_id;
+        __u32 link_id;
         __u32 prio;
 };
diff --git a/kernel/bpf/net.c b/kernel/bpf/net.c
index ab9a9dee615b..22b7a9b05483 100644
--- a/kernel/bpf/net.c
+++ b/kernel/bpf/net.c
@@ -8,7 +8,7 @@
 #include <net/xtc.h>
 
 static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
-                             struct bpf_prog *nprog, u32 prio, u32 flags)
+                             u32 id, struct bpf_prog *nprog, u32 prio, u32 flags)
 {
         struct bpf_prog_array_item *item, *tmp;
         struct xtc_entry *entry, *peer;
@@ -27,10 +27,13 @@ static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
                 if (!oprog)
                         break;
                 if (item->bpf_priority == prio) {
-                        if (flags & BPF_F_REPLACE) {
+                        if (item->bpf_id == id &&
+                            (flags & BPF_F_REPLACE)) {
                                 /* Pairs with READ_ONCE() in xtc_run_progs(). */
                                 WRITE_ONCE(item->prog, nprog);
-                                bpf_prog_put(oprog);
+                                item->bpf_id = id;
+                                if (!id)
+                                        bpf_prog_put(oprog);
                                 dev_xtc_entry_prio_set(entry, prio, nprog);
                                 return prio;
                         }
@@ -55,19 +58,23 @@ static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
                         if (i == j) {
                                 tmp->prog = nprog;
                                 tmp->bpf_priority = prio;
+                                tmp->bpf_id = id;
                         }
                         break;
                 } else if (item->bpf_priority < prio) {
                         tmp->prog = oprog;
                         tmp->bpf_priority = item->bpf_priority;
+                        tmp->bpf_id = item->bpf_id;
                 } else if (item->bpf_priority > prio) {
                         if (i == j) {
                                 tmp->prog = nprog;
                                 tmp->bpf_priority = prio;
+                                tmp->bpf_id = id;
                                 tmp = &peer->items[++j];
                         }
                         tmp->prog = oprog;
                         tmp->bpf_priority = item->bpf_priority;
+                        tmp->bpf_id = item->bpf_id;
                 }
         }
         dev_xtc_entry_update(dev, peer, ingress);
@@ -94,14 +101,14 @@ int xtc_prog_attach(const union bpf_attr *attr, struct bpf_prog *nprog)
                 rtnl_unlock();
                 return -EINVAL;
         }
-        ret = __xtc_prog_attach(dev, ingress, XTC_MAX_ENTRIES, nprog,
+        ret = __xtc_prog_attach(dev, ingress, XTC_MAX_ENTRIES, 0, nprog,
                                 attr->attach_priority, attr->attach_flags);
         rtnl_unlock();
         return ret;
 }
 
 static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 limit,
-                             u32 prio)
+                             u32 id, u32 prio)
 {
         struct bpf_prog_array_item *item, *tmp;
         struct bpf_prog *oprog, *fprog = NULL;
@@ -126,8 +133,11 @@ static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 limit,
                 if (item->bpf_priority != prio) {
                         tmp->prog = oprog;
                         tmp->bpf_priority = item->bpf_priority;
+                        tmp->bpf_id = item->bpf_id;
                         j++;
                 } else {
+                        if (item->bpf_id != id)
+                                return -EBUSY;
                         fprog = oprog;
                 }
         }
@@ -136,7 +146,8 @@ static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 limit,
         if (dev_xtc_entry_total(peer) == 0 && !entry->parent->miniq)
                 peer = NULL;
         dev_xtc_entry_update(dev, peer, ingress);
-        bpf_prog_put(fprog);
+        if (!id)
+                bpf_prog_put(fprog);
         if (!peer)
                 dev_xtc_entry_free(entry);
         if (ingress)
@@ -164,7 +175,7 @@ int xtc_prog_detach(const union bpf_attr *attr)
                 rtnl_unlock();
                 return -EINVAL;
         }
-        ret = __xtc_prog_detach(dev, ingress, XTC_MAX_ENTRIES,
+        ret = __xtc_prog_detach(dev, ingress, XTC_MAX_ENTRIES, 0,
                                 attr->attach_priority);
         rtnl_unlock();
         return ret;
@@ -191,7 +202,8 @@ static void __xtc_prog_detach_all(struct net_device *dev, bool ingress, u32 limi
                 if (!prog)
                         break;
                 dev_xtc_entry_prio_del(entry, item->bpf_priority);
-                bpf_prog_put(prog);
+                if (!item->bpf_id)
+                        bpf_prog_put(prog);
                 if (ingress)
                         net_dec_ingress_queue();
                 else
@@ -244,6 +256,7 @@ __xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
                 if (!prog)
                         break;
                 info.prog_id = prog->aux->id;
+                info.link_id = item->bpf_id;
                 info.prio = item->bpf_priority;
                 if (copy_to_user(uinfo + i, &info, sizeof(info)))
                         return -EFAULT;
@@ -272,3 +285,90 @@ int xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
         rtnl_unlock();
         return ret;
 }
+
+static int __xtc_link_attach(struct bpf_link *l, u32 id)
+{
+        struct bpf_tc_link *link = container_of(l, struct bpf_tc_link, link);
+        int ret;
+
+        rtnl_lock();
+        ret = __xtc_prog_attach(link->dev, link->location == BPF_NET_INGRESS,
+                                XTC_MAX_ENTRIES, id, l->prog, link->priority,
+                                0);
+        if (ret > 0) {
+                link->priority = ret;
+                ret = 0;
+        }
+        rtnl_unlock();
+        return ret;
+}
+
+static void xtc_link_release(struct bpf_link *l)
+{
+        struct bpf_tc_link *link = container_of(l, struct bpf_tc_link, link);
+
+        rtnl_lock();
+        if (link->dev) {
+                WARN_ON(__xtc_prog_detach(link->dev,
+                                          link->location == BPF_NET_INGRESS,
+                                          XTC_MAX_ENTRIES, l->id, link->priority));
+                link->dev = NULL;
+        }
+        rtnl_unlock();
+}
+
+static void xtc_link_dealloc(struct bpf_link *l)
+{
+        struct bpf_tc_link *link = container_of(l, struct bpf_tc_link, link);
+
+        kfree(link);
+}
+
+static const struct bpf_link_ops bpf_tc_link_lops = {
+        .release = xtc_link_release,
+        .dealloc = xtc_link_dealloc,
+};
+
+int xtc_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+        struct net *net = current->nsproxy->net_ns;
+        struct bpf_link_primer link_primer;
+        struct bpf_tc_link *link;
+        struct net_device *dev;
+        int fd, err;
+
+        if (attr->link_create.flags)
+                return -EINVAL;
+        dev = dev_get_by_index(net, attr->link_create.target_ifindex);
+        if (!dev)
+                return -EINVAL;
+        link = kzalloc(sizeof(*link), GFP_USER);
+        if (!link) {
+                err = -ENOMEM;
+                goto out_put;
+        }
+
+        bpf_link_init(&link->link, BPF_LINK_TYPE_TC, &bpf_tc_link_lops, prog);
+        link->priority = attr->link_create.tc.priority;
+        link->location = attr->link_create.attach_type;
+        link->dev = dev;
+
+        err = bpf_link_prime(&link->link, &link_primer);
+        if (err) {
+                kfree(link);
+                goto out_put;
+        }
+        err = __xtc_link_attach(&link->link, link_primer.id);
+        if (err) {
+                link->dev = NULL;
+                bpf_link_cleanup(&link_primer);
+                goto out_put;
+        }
+
+        fd = bpf_link_settle(&link_primer);
+        dev_put(dev);
+        return fd;
+out_put:
+        dev_put(dev);
+        return err;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a0a670b964bb..4456df481381 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4613,6 +4613,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
         case BPF_PROG_TYPE_XDP:
                 ret = bpf_xdp_link_attach(attr, prog);
                 break;
+        case BPF_PROG_TYPE_SCHED_CLS:
+                ret = xtc_link_attach(attr, prog);
+                break;
 #endif
         case BPF_PROG_TYPE_PERF_EVENT:
         case BPF_PROG_TYPE_TRACEPOINT:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index de1f5546bcfe..c006f561648e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1043,6 +1043,7 @@ enum bpf_link_type {
         BPF_LINK_TYPE_PERF_EVENT = 7,
         BPF_LINK_TYPE_KPROBE_MULTI = 8,
         BPF_LINK_TYPE_STRUCT_OPS = 9,
+        BPF_LINK_TYPE_TC = 10,
 
         MAX_BPF_LINK_TYPE,
 };
@@ -1541,6 +1542,9 @@ union bpf_attr {
                                  */
                                 __u64           cookie;
                         } tracing;
+                        struct {
+                                __u32           priority;
+                        } tc;
                 };
         } link_create;
@@ -6830,6 +6834,7 @@ struct bpf_flow_keys {
 
 struct bpf_query_info {
         __u32 prog_id;
+        __u32 link_id;
         __u32 prio;
 };
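Finally, a hedged usage sketch tying the pieces together: a link created as
above auto-detaches once its last fd is closed, so a deployment that wants
the attachment to outlive the loader would pin the link in bpffs, per the
ownership model in the cover letter. tc_link_create() is the illustrative
helper from earlier; the pin path is arbitrary.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Sketch: pin a link fd so the attachment survives process exit. */
static int bpf_obj_pin_fd(int fd, const char *path)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.pathname = (__u64)(unsigned long)path;
        attr.bpf_fd = fd;

        return syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Usage (illustrative):
 *        int link_fd = tc_link_create(prog_fd, ifindex, 1);
 *        if (link_fd >= 0)
 *                bpf_obj_pin_fd(link_fd, "/sys/fs/bpf/tc_ingress_link");
 */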