mbox series

[RFC,bpf-next,v2,0/5] Extend cgroup interface with bpf

Message ID 20220201205534.1962784-1-haoluo@google.com (mailing list archive)
Headers show
Series Extend cgroup interface with bpf | expand

Message

Hao Luo Feb. 1, 2022, 8:55 p.m. UTC
This patchset introduces a new program type to extend the cgroup interfaces.
It extends the bpf filesystem (bpffs) to allow creating a directory
hierarchy that tracks any kernfs hierarchy, in particular cgroupfs. Each
subdirectory in this hierarchy will keep a reference to a corresponding
kernfs node when created. This is done by associating a new data
structure (called "tag" in this patchset) to the bpffs directory inodes.
File inode in bpffs holds a reference to bpf object and directory inode
may point to a tag, which holds a kernfs node.

A bpf object can be pinned in these directories and objects can choose to
enable inheritance in tagged directories. In this patchset, a new
program type "cgroup_view" is introduced, which supports inheritance.
More specifically, when a link to cgroup_view prog is pinned in a bpffs
directory, it tags the directory and connects the directory to the root
cgroup. Subdirectories created underneath has to match a subcgroup, and
when created, they will inherit the pinned cgroup_view link from the
parent directory.

The pinned cgroup_view objects can be read as files. When the object is
read, it tries to get the cgroup its parent directory is matched to.
Failure to get the cgroup's reference will not run the cgroup_view prog.
Users can implement cgroup_view program to define what to print out to
the file, given the cgroup object.

See patch 5/5 for an example of how this works.

Userspace has to manually create/remove directories in bpffs to mirror
the cgroup hierarchy. It was suggested using overlayfs to create a
hierarchy that contains both cgroupfs and bpffs. But after some
experiments, I found overlayfs is not intended to be used this way:
overlayfs assumes the underlying filesystem will not change [1], but our
underlaying fs (i.e. cgroupfs) will change and cause weird behavior. So
I didn't pursue in that direction.

This patchset v2 is only for demonstrating the high level design. There
are a lot of places in its implementation that can be improved. Cgroup_view
is a type of bpf_iter, because seqfile printing has been supported well
in bpf_iter, although cgroup_view is not iterating kernel objects.

Changes v1->v2:
 - Complete redesign. v1 implements pinning bpf objects in cgroupfs[2].
   v2 implements object inheritance in bpffs. Due to its simplicity,
   bpffs is better for implementing inheritance compared to cgroupfs.
 - Extend selftest to include a more realistic use case. The selftests
   in v2 developed a cgroup-level sched performance metric and exported
   through the new prog type.

[1] https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems
[2] https://lore.kernel.org/bpf/Ydd1IIUG7%2F3kQRcR@google.com/

Hao Luo (5):
  bpf: Bpffs directory tag
  bpf: Introduce inherit list for dir tag.
  bpf: cgroup_view iter
  bpf: Pin cgroup_view
  selftests/bpf: test for pinning for cgroup_view link

 include/linux/bpf.h                           |   2 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bpf_iter.c                         |  11 +
 kernel/bpf/cgroup_view_iter.c                 | 114 ++++++++
 kernel/bpf/inode.c                            | 272 +++++++++++++++++-
 kernel/bpf/inode.h                            |  55 ++++
 .../selftests/bpf/prog_tests/pinning_cgroup.c | 143 +++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../bpf/progs/bpf_iter_cgroup_view.c          | 232 +++++++++++++++
 9 files changed, 829 insertions(+), 9 deletions(-)
 create mode 100644 kernel/bpf/cgroup_view_iter.c
 create mode 100644 kernel/bpf/inode.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_cgroup_view.c