[RFC,bpf-next,0/9] bpf: cgroup hierarchical stats collection

Message ID 20220510001807.4132027-1-yosryahmed@google.com (mailing list archive)

Message

Yosry Ahmed May 10, 2022, 12:17 a.m. UTC
This patch series allows for using bpf to collect hierarchical cgroup
stats efficiently by integrating with the rstat framework. The rstat
framework provides an efficient way to collect cgroup stats and
propagate them through the cgroup hierarchy.

The last patch is a selftest that demonstrates the entire workflow.
The workflow consists of:
- bpf programs that collect per-cpu per-cgroup stats (tracing progs).
- bpf rstat flusher that contains the logic for aggregating stats
  across cpus and across the cgroup hierarchy.
- bpf cgroup_iter responsible for outputting the stats to userspace
  through reading a file in bpffs.
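
The three stages above can be pictured with a small userspace model in
plain C. This is only a sketch, not the BPF code from the series: the
fixed topology, the array layout, and the names collect(), flush_all()
and output() are all hypothetical, and the flush here naively recomputes
every total instead of visiting only updated cgroups.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS    4	/* hypothetical CPU count for the model */
#define NR_CGROUPS 3	/* hypothetical: 0 is the root, 1 and 2 descend from it */

/* parent[i] is the parent of cgroup i; -1 marks the root. */
static const int parent[NR_CGROUPS] = { -1, 0, 1 };

/* Stage 1 state: per-cpu per-cgroup counters (filled by the collectors). */
static unsigned long percpu_stat[NR_CGROUPS][NR_CPUS];
/* Stage 2 state: hierarchical totals (produced by the flusher). */
static unsigned long total_stat[NR_CGROUPS];

/* Stage 1: collection. Only the current cpu's slot is touched. */
static void collect(int cgrp, int cpu, unsigned long value)
{
	assert(cgrp >= 0 && cgrp < NR_CGROUPS && cpu >= 0 && cpu < NR_CPUS);
	percpu_stat[cgrp][cpu] += value;
}

/* Stage 2: flush. Sum across cpus, then charge the cgroup and its ancestors. */
static void flush_all(void)
{
	for (int c = 0; c < NR_CGROUPS; c++)
		total_stat[c] = 0;

	for (int c = 0; c < NR_CGROUPS; c++) {
		unsigned long local = 0;

		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			local += percpu_stat[c][cpu];
		for (int a = c; a != -1; a = parent[a])
			total_stat[a] += local;
	}
}

/* Stage 3: output. Formats one line per cgroup, as an iterator might. */
static int output(int cgrp, char *buf, size_t len)
{
	return snprintf(buf, len, "cgroup %d: %lu\n", cgrp, total_stat[cgrp]);
}
```

In this model a reader would call flush_all() right before output(), so
that the totals reflect every collect() made so far.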

The first 3 patches include the new bpf rstat flusher program type and
the needed support in rstat code and libbpf. The rstat flusher program
is a callback that the rstat framework makes to bpf when a stat flush is
ongoing, similar to the css_rstat_flush() callback that rstat makes to
cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
pair that has been updated. The program contains the logic for
aggregating the stats across cpus and across the cgroup hierarchy.
These programs can be attached to any cgroup subsystem, not only the
ones that implement the css_rstat_flush() callback in the kernel. This
gives bpf programs more flexibility, and more isolation from the kernel
implementation.
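
As a rough illustration of what one such callback might do for a single
(cgroup, cpu) pair, here is a userspace model in plain C. It is a sketch
under assumptions, not the program from this series: the structs, the
"prev"/"pending" bookkeeping, and the names flush_one() and
flush_all_model() are hypothetical, and the driver loop only stands in
for the order in which rstat would visit cgroups (children before
parents).

```c
#include <assert.h>

#define NR_CPUS    2	/* hypothetical CPU count for the model */
#define NR_CGROUPS 3	/* hypothetical: 0 is the root, 1 its child, 2 below 1 */

/* parent[i] is the parent of cgroup i; -1 marks the root. */
static const int parent[NR_CGROUPS] = { -1, 0, 1 };

struct percpu_stat {
	unsigned long count;	/* only ever grows, written by collectors */
	unsigned long prev;	/* value of count at the last flush of this cpu */
};

struct cgroup_stat {
	unsigned long total;	/* flushed hierarchical total */
	unsigned long pending;	/* deltas pushed up by children, not yet applied */
};

static struct percpu_stat pcpu[NR_CGROUPS][NR_CPUS];
static struct cgroup_stat cg_stat[NR_CGROUPS];

/* Model of one flusher callback for an updated (cgroup, cpu) pair. */
static void flush_one(int cgrp, int cpu)
{
	struct percpu_stat *p;
	unsigned long delta;

	assert(cgrp >= 0 && cgrp < NR_CGROUPS && cpu >= 0 && cpu < NR_CPUS);
	p = &pcpu[cgrp][cpu];

	/* Aggregate across cpus: take what this cpu added since last flush. */
	delta = p->count - p->prev;
	p->prev = p->count;

	/* Aggregate across the hierarchy: absorb what the children pushed up. */
	delta += cg_stat[cgrp].pending;
	cg_stat[cgrp].pending = 0;

	cg_stat[cgrp].total += delta;
	if (parent[cgrp] != -1)
		cg_stat[parent[cgrp]].pending += delta;
}

/* Stand-in for the framework: children are visited before their parents. */
static void flush_all_model(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int c = NR_CGROUPS - 1; c >= 0; c--)
			flush_one(c, cpu);
}
```

Because each callback only moves deltas, repeated flushes stay cheap and
the totals remain correct however the updates are spread across cpus.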

The following 2 patches add necessary helpers for the stats collection
workflow. Helpers that call into cgroup_rstat_updated() and
cgroup_rstat_flush() are added to allow bpf programs collecting stats to
tell the rstat framework that a cgroup has been updated, and to allow
bpf programs outputting stats to tell the rstat framework to flush the
stats before they are displayed to the user. An additional helper,
bpf_map_lookup_percpu_elem(), is introduced to allow rstat flusher
programs to access the percpu stats of the cpu being flushed.
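
To make the interplay between the "updated" and "flush" notifications
concrete, here is a userspace model in plain C. It is a sketch under
assumptions and does not use the helpers added by this series: the dirty
bitmap, the fixed topology, and the names model_rstat_updated() and
model_rstat_flush() are hypothetical. The point it illustrates is that a
flush only needs to visit (cgroup, cpu) pairs that were reported as
updated.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS    2	/* hypothetical CPU count for the model */
#define NR_CGROUPS 4	/* hypothetical: 0 root; 1, 2 its children; 3 under 1 */

/* parent[i] is the parent of cgroup i; -1 marks the root. */
static const int parent[NR_CGROUPS] = { -1, 0, 0, 1 };

/* dirty[c][cpu] is set when cgroup c has unflushed updates on that cpu. */
static bool dirty[NR_CGROUPS][NR_CPUS];

/*
 * Model of "updated": mark the cgroup and its ancestors on this cpu.
 * The walk stops at the first ancestor that is already marked, because
 * everything above it must be marked as well.
 */
static void model_rstat_updated(int cgrp, int cpu)
{
	assert(cgrp >= 0 && cgrp < NR_CGROUPS && cpu >= 0 && cpu < NR_CPUS);
	for (int c = cgrp; c != -1 && !dirty[c][cpu]; c = parent[c])
		dirty[c][cpu] = true;
}

/* Returns true if cgrp is root itself or lies below it. */
static bool in_subtree(int cgrp, int root)
{
	for (int c = cgrp; c != -1; c = parent[c])
		if (c == root)
			return true;
	return false;
}

/*
 * Model of "flush": visit only the dirty pairs in the subtree of root,
 * children before parents, and return how many callbacks would run.
 */
static int model_rstat_flush(int root)
{
	int calls = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		for (int c = NR_CGROUPS - 1; c >= 0; c--) {
			if (!dirty[c][cpu] || !in_subtree(c, root))
				continue;
			dirty[c][cpu] = false;
			calls++;	/* a flusher callback would run here */
		}
	}
	return calls;
}
```

In this model, a collector calls model_rstat_updated() after changing a
counter, and a reader calls model_rstat_flush() on the cgroup it is about
to display.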

The following 3 patches add the cgroup_iter program type (v2). This was
originally introduced by Hao as a part of a different series [1].
Their use case is better showcased as part of this patch series. We also
make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
to display stats for cgroup v1 as well. This small change makes the
entire workflow cgroup v1 friendly without any other dedicated changes.

The final patch is a selftest demonstrating the entire workflow with a
set of bpf programs that collect per-cgroup latency of memcg reclaim.

[1] https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@google.com/


Hao Luo (2):
  cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
  bpf: Introduce cgroup iter

Yosry Ahmed (7):
  bpf: introduce CGROUP_SUBSYS_RSTAT program type
  cgroup: bpf: flush bpf stats on rstat flush
  libbpf: Add support for rstat progs and links
  bpf: add bpf rstat helpers
  bpf: add bpf_map_lookup_percpu_elem() helper
  cgroup: add v1 support to cgroup_get_from_id()
  bpf: add a selftest for cgroup hierarchical stats collection

 include/linux/bpf-cgroup-subsys.h             |  35 ++
 include/linux/bpf.h                           |   4 +
 include/linux/bpf_types.h                     |   2 +
 include/linux/cgroup-defs.h                   |   4 +
 include/linux/cgroup.h                        |   5 +
 include/uapi/linux/bpf.h                      |  45 +++
 kernel/bpf/Makefile                           |   3 +-
 kernel/bpf/arraymap.c                         |  11 +-
 kernel/bpf/cgroup_iter.c                      | 148 ++++++++
 kernel/bpf/cgroup_subsys.c                    | 212 +++++++++++
 kernel/bpf/hashtab.c                          |  25 +-
 kernel/bpf/helpers.c                          |  56 +++
 kernel/bpf/syscall.c                          |   6 +
 kernel/bpf/verifier.c                         |   6 +
 kernel/cgroup/cgroup.c                        |  16 +-
 kernel/cgroup/rstat.c                         |  11 +
 scripts/bpf_doc.py                            |   2 +
 tools/include/uapi/linux/bpf.h                |  45 +++
 tools/lib/bpf/bpf.c                           |   3 +
 tools/lib/bpf/bpf.h                           |   3 +
 tools/lib/bpf/libbpf.c                        |  35 ++
 tools/lib/bpf/libbpf.h                        |   3 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../test_cgroup_hierarchical_stats.c          | 335 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../selftests/bpf/progs/cgroup_vmscan.c       | 211 +++++++++++
 26 files changed, 1212 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/bpf-cgroup-subsys.h
 create mode 100644 kernel/bpf/cgroup_iter.c
 create mode 100644 kernel/bpf/cgroup_subsys.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c

Comments

Yosry Ahmed May 13, 2022, 7:16 a.m. UTC | #1
I have made some significant changes on the BPF side of this. I will
send an RFC v2 soon with those changes, incorporating the feedback
on the cgroup side that I got from Tejun. Please hold off on reviewing
this version.

