From patchwork Sun May 15 02:34:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849963
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Sun, 15 May 2022 02:34:58 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-2-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 1/7] bpf: introduce RSTAT_FLUSH program type
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Introduce a new BPF program type, RSTAT_FLUSH, with corresponding link
and attach types.

These programs act as callbacks that the rstat framework invokes while a
stats flush is ongoing. This allows BPF programs to collect and maintain
hierarchical cgroup stats efficiently by integrating with the rstat
framework.

See the selftest in the final patch for a practical example.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
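
Reviewer note: the context access rules enforced by
bpf_rstat_flush_is_valid_access() can be modelled in plain userspace C.
The sketch below is only an illustration, not the kernel code: the
struct mirror, the IN_FIELD() macro and model_is_valid_access() are
hypothetical names, and the two pointer fields are assumed to occupy
8 bytes each, as __bpf_md_ptr() arranges in the uapi struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct bpf_rstat_flush_ctx (assumption: each
 * __bpf_md_ptr() field is modelled as a plain 8-byte integer). */
struct model_rstat_flush_ctx {
	uint64_t cgrp;
	uint64_t parent;
	int32_t cpu;
};

/* Hypothetical helper: true if off falls inside the given field. */
#define IN_FIELD(off, field, width)					\
	((size_t)(off) >= offsetof(struct model_rstat_flush_ctx, field) && \
	 (size_t)(off) < offsetof(struct model_rstat_flush_ctx, field) + (width))

static bool model_is_valid_access(int off, int size, bool is_write)
{
	/* The context is read-only. */
	if (is_write)
		return false;
	/* The verifier guarantees size > 0; the model has to check it. */
	if (size <= 0)
		return false;
	if (off < 0 ||
	    (size_t)off + (size_t)size > sizeof(struct model_rstat_flush_ctx))
		return false;
	/* Accesses must be naturally aligned. */
	if (off % size != 0)
		return false;

	/* Pointer fields may only be read as whole 8-byte values. */
	if (IN_FIELD(off, cgrp, sizeof(uint64_t)))
		return size == (int)sizeof(uint64_t);
	if (IN_FIELD(off, parent, sizeof(uint64_t)))
		return size == (int)sizeof(uint64_t);
	/* cpu may only be read as a whole 4-byte value. */
	if (IN_FIELD(off, cpu, sizeof(int32_t)))
		return size == (int)sizeof(int32_t);

	/* Anything else, including trailing padding, is rejected. */
	return false;
}
```

In this model a full-width read of cgrp, parent or cpu is accepted,
while writes, partial pointer reads, misaligned reads and reads of the
trailing padding are all rejected.
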
 include/linux/bpf-rstat.h      |  25 +++++
 include/linux/bpf_types.h      |   4 +
 include/uapi/linux/bpf.h       |   9 ++
 kernel/bpf/Makefile            |   3 +
 kernel/bpf/rstat.c             | 166 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |   6 ++
 tools/include/uapi/linux/bpf.h |   9 ++
 7 files changed, 222 insertions(+)
 create mode 100644 include/linux/bpf-rstat.h
 create mode 100644 kernel/bpf/rstat.c

diff --git a/include/linux/bpf-rstat.h b/include/linux/bpf-rstat.h
new file mode 100644
index 000000000000..23cad23b5fc2
--- /dev/null
+++ b/include/linux/bpf-rstat.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2022 Google LLC.
+ */
+#ifndef _BPF_RSTAT_H_
+#define _BPF_RSTAT_H_
+
+#include <linux/bpf.h>
+
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_CGROUPS)
+
+int bpf_rstat_link_attach(const union bpf_attr *attr,
+				 struct bpf_prog *prog);
+
+#else /* defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_CGROUPS) */
+
+static inline int bpf_rstat_link_attach(const union bpf_attr *attr,
+					struct bpf_prog *prog)
+{
+	return -ENOTSUPP;
+}
+
+#endif /* defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_CGROUPS) */
+
+#endif  /* _BPF_RSTAT */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..ff92299f76a9 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -77,6 +77,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 	       void *, void *)
 #endif /* CONFIG_BPF_LSM */
 #endif
+#ifdef CONFIG_CGROUPS
+BPF_PROG_TYPE(BPF_PROG_TYPE_RSTAT_FLUSH, rstat_flush,
+	      struct bpf_rstat_flush_ctx, struct bpf_rstat_flush_ctx)
+#endif /* CONFIG_CGROUPS */
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0210f85131b3..968e3cb02580 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_RSTAT_FLUSH,
 };
 
 enum bpf_attach_type {
@@ -998,6 +999,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_RSTAT_FLUSH,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1014,6 +1016,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_PERF_EVENT = 7,
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
+	BPF_LINK_TYPE_RSTAT = 10,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -6359,6 +6362,12 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_rstat_flush_ctx {
+	__bpf_md_ptr(struct cgroup *, cgrp);
+	__bpf_md_ptr(struct cgroup *, parent);
+	__s32 cpu;
+};
+
 struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 057ba8e01e70..0487133b799f 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -36,6 +36,9 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
 obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
 obj-$(CONFIG_BPF_PRELOAD) += preload/
+ifeq ($(CONFIG_CGROUPS),y)
+obj-$(CONFIG_BPF_SYSCALL) += rstat.o
+endif
 
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
diff --git a/kernel/bpf/rstat.c b/kernel/bpf/rstat.c
new file mode 100644
index 000000000000..5f529002d4b9
--- /dev/null
+++ b/kernel/bpf/rstat.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Functions to manage eBPF programs attached to the cgroup rstat flush path
+ *
+ * Copyright 2022 Google LLC.
+ */
+
+#include <linux/bpf-rstat.h>
+#include <linux/btf_ids.h>
+#include <linux/cgroup.h>
+#include <linux/filter.h>
+
+static LIST_HEAD(bpf_rstat_flushers);
+static DEFINE_SPINLOCK(bpf_rstat_flushers_lock);
+
+
+struct bpf_rstat_flusher {
+	struct bpf_prog *prog;
+	/* List of BPF rstat flushers, anchored at bpf_rstat_flushers */
+	struct list_head list;
+};
+
+struct bpf_rstat_link {
+	struct bpf_link link;
+	struct bpf_rstat_flusher *flusher;
+};
+
+static int bpf_rstat_flush_attach(struct bpf_prog *prog,
+				  struct bpf_rstat_link *rlink)
+{
+	struct bpf_rstat_flusher *flusher;
+
+	flusher = kmalloc(sizeof(*flusher), GFP_KERNEL);
+	if (!flusher)
+		return -ENOMEM;
+
+	flusher->prog = prog;
+	rlink->flusher = flusher;
+
+	spin_lock(&bpf_rstat_flushers_lock);
+	list_add(&flusher->list, &bpf_rstat_flushers);
+	spin_unlock(&bpf_rstat_flushers_lock);
+
+	return 0;
+}
+
+static void bpf_rstat_flush_detach(struct bpf_rstat_link *rstat_link)
+{
+	struct bpf_rstat_flusher *flusher = rstat_link->flusher;
+
+	if (!flusher)
+		return;
+
+	spin_lock(&bpf_rstat_flushers_lock);
+	list_del(&flusher->list);
+	bpf_prog_put(flusher->prog);
+	kfree(flusher);
+	spin_unlock(&bpf_rstat_flushers_lock);
+}
+
+static const struct bpf_func_proto *
+bpf_rstat_flush_func_proto(enum bpf_func_id func_id,
+			   const struct bpf_prog *prog)
+{
+	return bpf_base_func_proto(func_id);
+}
+
+BTF_ID_LIST_SINGLE(bpf_cgroup_btf_ids, struct, cgroup)
+
+static bool bpf_rstat_flush_is_valid_access(int off, int size,
+					    enum bpf_access_type type,
+					    const struct bpf_prog *prog,
+					    struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	if (off < 0 || off + size > sizeof(struct bpf_rstat_flush_ctx))
+		return false;
+	/* The verifier guarantees that size > 0 */
+	if (off % size != 0)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range_ptr(struct bpf_rstat_flush_ctx, cgrp):
+		info->reg_type = PTR_TO_BTF_ID;
+		info->btf_id = bpf_cgroup_btf_ids[0];
+		info->btf = bpf_get_btf_vmlinux();
+		return !IS_ERR(info->btf) && info->btf && size == sizeof(__u64);
+	case bpf_ctx_range_ptr(struct bpf_rstat_flush_ctx, parent):
+		info->reg_type = PTR_TO_BTF_ID_OR_NULL;
+		info->btf_id = bpf_cgroup_btf_ids[0];
+		info->btf = bpf_get_btf_vmlinux();
+		return !IS_ERR(info->btf) && info->btf && size == sizeof(__u64);
+	case bpf_ctx_range(struct bpf_rstat_flush_ctx, cpu):
+		return size == sizeof(__s32);
+	default:
+		return false;
+	}
+}
+
+const struct bpf_prog_ops rstat_flush_prog_ops = {
+};
+
+const struct bpf_verifier_ops rstat_flush_verifier_ops = {
+	.get_func_proto         = bpf_rstat_flush_func_proto,
+	.is_valid_access        = bpf_rstat_flush_is_valid_access,
+};
+
+static void bpf_rstat_link_release(struct bpf_link *link)
+{
+	struct bpf_rstat_link *rlink;
+
+	rlink = container_of(link,
+			     struct bpf_rstat_link,
+			     link);
+
+	/* rstat flushers are currently the only supported rstat programs */
+	bpf_rstat_flush_detach(rlink);
+}
+
+static void bpf_rstat_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_rstat_link *rlink = container_of(link,
+						    struct bpf_rstat_link,
+						    link);
+	kfree(rlink);
+}
+
+static const struct bpf_link_ops bpf_rstat_link_lops = {
+	.release = bpf_rstat_link_release,
+	.dealloc = bpf_rstat_link_dealloc,
+};
+
+int bpf_rstat_link_attach(const union bpf_attr *attr,
+			  struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct bpf_rstat_link *link;
+	int err;
+
+	if (attr->link_create.target_fd || attr->link_create.flags)
+		return -EINVAL;
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link)
+		return -ENOMEM;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_RSTAT,
+		      &bpf_rstat_link_lops, prog);
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		return err;
+	}
+
+	/* rstat flushers are currently the only supported rstat programs */
+	err = bpf_rstat_flush_attach(prog, link);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		return err;
+	}
+
+	return bpf_link_settle(&link_primer);
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 72e53489165d..ffeed8379b35 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3,6 +3,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf-rstat.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
@@ -3416,6 +3417,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_LOOKUP;
 	case BPF_XDP:
 		return BPF_PROG_TYPE_XDP;
+	case BPF_RSTAT_FLUSH:
+		return BPF_PROG_TYPE_RSTAT_FLUSH;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -4564,6 +4567,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 		else
 			ret = bpf_kprobe_multi_link_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_RSTAT_FLUSH:
+		ret = bpf_rstat_link_attach(attr, prog);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0210f85131b3..968e3cb02580 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_RSTAT_FLUSH,
 };
 
 enum bpf_attach_type {
@@ -998,6 +999,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_RSTAT_FLUSH,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1014,6 +1016,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_PERF_EVENT = 7,
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
+	BPF_LINK_TYPE_RSTAT = 10,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -6359,6 +6362,12 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_rstat_flush_ctx {
+	__bpf_md_ptr(struct cgroup *, cgrp);
+	__bpf_md_ptr(struct cgroup *, parent);
+	__s32 cpu;
+};
+
 struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };

From patchwork Sun May 15 02:34:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849962
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Sun, 15 May 2022 02:34:59 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-3-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 2/7] cgroup: bpf: flush bpf stats on rstat
 flush
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

When an rstat flush is ongoing for a cgroup, also flush bpf stats by
running any attached rstat flush programs.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
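
Reviewer note: to make the intended flow concrete, here is a small
userspace model of what a flush callback might do with the
(cgrp, parent, cpu) context. It is a sketch under assumptions, not the
kernel implementation: every model_* name is hypothetical, the flusher
list is a fixed array rather than a locked list, and the whole
leaf-to-root chain is flushed instead of only the updated cgroups.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS		2
#define MODEL_MAX_FLUSHERS	4

/* Hypothetical stand-in for a cgroup carrying one BPF-maintained stat. */
struct model_cgroup {
	struct model_cgroup *parent;
	long percpu_delta[MODEL_NR_CPUS];	/* updates recorded per CPU */
	long pending;				/* deltas pushed up by children */
	long total;				/* flushed, hierarchical value */
};

/* Mirrors the fields of struct bpf_rstat_flush_ctx. */
struct model_flush_ctx {
	struct model_cgroup *cgrp;
	struct model_cgroup *parent;
	int cpu;
};

typedef int (*model_flusher_fn)(struct model_flush_ctx *ctx);

static model_flusher_fn model_flushers[MODEL_MAX_FLUSHERS];
static size_t model_nr_flushers;

/* Register a flusher; returns -1 if the fixed-size table is full. */
static int model_flusher_attach(model_flusher_fn fn)
{
	if (model_nr_flushers == MODEL_MAX_FLUSHERS)
		return -1;
	model_flushers[model_nr_flushers++] = fn;
	return 0;
}

/* Model of bpf_rstat_flush(): run every flusher for (cgrp, cpu). */
static void model_rstat_flush(struct model_cgroup *cgrp, int cpu)
{
	struct model_flush_ctx ctx = {
		.cgrp = cgrp,
		.parent = cgrp->parent,
		.cpu = cpu,
	};

	for (size_t i = 0; i < model_nr_flushers; i++)
		(void)model_flushers[i](&ctx);
}

/* Example flusher: fold this CPU's delta plus anything pushed up by
 * children into the cgroup total, then propagate it to the parent. */
static int model_count_flusher(struct model_flush_ctx *ctx)
{
	long delta = ctx->cgrp->percpu_delta[ctx->cpu] + ctx->cgrp->pending;

	ctx->cgrp->percpu_delta[ctx->cpu] = 0;
	ctx->cgrp->pending = 0;
	ctx->cgrp->total += delta;
	if (ctx->parent)
		ctx->parent->pending += delta;
	return 0;
}

/* Flush children before parents for each CPU, as rstat does. */
static void model_flush_chain(struct model_cgroup *leaf)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		for (struct model_cgroup *pos = leaf; pos; pos = pos->parent)
			model_rstat_flush(pos, cpu);
}
```

Flushing children first is what lets a parent pick up its children's
deltas during the same pass, which is why the callback receives both
the cgroup and its parent.
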
 include/linux/bpf-rstat.h |  6 ++++++
 kernel/bpf/rstat.c        | 21 +++++++++++++++++++++
 kernel/cgroup/rstat.c     |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/bpf-rstat.h b/include/linux/bpf-rstat.h
index 23cad23b5fc2..55e000fe0f47 100644
--- a/include/linux/bpf-rstat.h
+++ b/include/linux/bpf-rstat.h
@@ -12,6 +12,8 @@
 int bpf_rstat_link_attach(const union bpf_attr *attr,
 				 struct bpf_prog *prog);
 
+void bpf_rstat_flush(struct cgroup *cgrp, int cpu);
+
 #else /* defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_CGROUPS) */
 
 static inline int bpf_rstat_link_attach(const union bpf_attr *attr,
@@ -20,6 +22,10 @@ static inline int bpf_rstat_link_attach(const union bpf_attr *attr,
 	return -ENOTSUPP;
 }
 
+static inline void bpf_rstat_flush(struct cgroup *cgrp, int cpu)
+{
+}
+
 #endif /* defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_CGROUPS) */
 
 #endif  /* _BPF_RSTAT */
diff --git a/kernel/bpf/rstat.c b/kernel/bpf/rstat.c
index 5f529002d4b9..e96bc080f4b9 100644
--- a/kernel/bpf/rstat.c
+++ b/kernel/bpf/rstat.c
@@ -164,3 +164,24 @@ int bpf_rstat_link_attach(const union bpf_attr *attr,
 
 	return bpf_link_settle(&link_primer);
 }
+
+void bpf_rstat_flush(struct cgroup *cgrp, int cpu)
+{
+	struct bpf_rstat_flusher *flusher;
+	struct bpf_rstat_flush_ctx ctx = {
+		.cgrp = cgrp,
+		.parent = cgroup_parent(cgrp),
+		.cpu = cpu,
+	};
+
+	rcu_read_lock();
+	migrate_disable();
+	spin_lock(&bpf_rstat_flushers_lock);
+
+	list_for_each_entry(flusher, &bpf_rstat_flushers, list)
+		(void) bpf_prog_run(flusher->prog, &ctx);
+
+	spin_unlock(&bpf_rstat_flushers_lock);
+	migrate_enable();
+	rcu_read_unlock();
+}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 24b5c2ab5598..0285d496e807 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -2,6 +2,7 @@
 #include "cgroup-internal.h"
 
 #include <linux/sched/cputime.h>
+#include <linux/bpf-rstat.h>
 
 static DEFINE_SPINLOCK(cgroup_rstat_lock);
 static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
@@ -168,6 +169,7 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
 			struct cgroup_subsys_state *css;
 
 			cgroup_base_stat_flush(pos, cpu);
+			bpf_rstat_flush(pos, cpu);
 
 			rcu_read_lock();
 			list_for_each_entry_rcu(css, &pos->rstat_css_list,

From patchwork Sun May 15 02:35:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849964
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Sun, 15 May 2022 02:35:00 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-4-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 3/7] libbpf: Add support for rstat flush progs
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Add support for attaching RSTAT_FLUSH programs: a "rstat/flush" section
definition and a new bpf_program__attach_rstat() API.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
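
Reviewer note: the new SEC_DEF("rstat/flush", ...) entry has no
trailing '/' or '+', so the section name is expected to match exactly.
The sketch below is a simplified userspace model of that matching,
assuming libbpf's rules are: a trailing '/' means prefix match, a
trailing '+' means either the bare name or the name followed by
"/extras", and anything else is an exact comparison. The function name
model_sec_matches() is hypothetical, and sloppy-prefix handling is
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of section-name matching (assumed rules, see note). */
static bool model_sec_matches(const char *pattern, const char *sec_name)
{
	size_t len = strlen(pattern);

	if (len == 0)
		return false;

	/* "type/" requires the SEC("type/extras") form. */
	if (pattern[len - 1] == '/')
		return strncmp(sec_name, pattern, len) == 0;

	/* "type+" accepts SEC("type") or SEC("type/extras"). */
	if (pattern[len - 1] == '+') {
		len--;
		if (strncmp(sec_name, pattern, len) != 0)
			return false;
		return sec_name[len] == '\0' || sec_name[len] == '/';
	}

	/* Everything else, including "rstat/flush", must match exactly. */
	return strcmp(sec_name, pattern) == 0;
}
```

Under these assumptions a program placed in SEC("rstat/flush") would be
picked up by the new definition, while SEC("rstat/flush/foo") would not.
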
 tools/lib/bpf/bpf.c      |  1 -
 tools/lib/bpf/libbpf.c   | 40 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h   |  3 +++
 tools/lib/bpf/libbpf.map |  1 +
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5660268e103f..9e3cb0d1eb99 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -870,7 +870,6 @@ int bpf_link_create(int prog_fd, int target_fd,
 		attr.link_create.tracing.cookie = OPTS_GET(opts, tracing.cookie, 0);
 		if (!OPTS_ZEROED(opts, tracing))
 			return libbpf_err(-EINVAL);
-		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4867a930628b..b7fc64ebf8dd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8998,6 +8998,7 @@ static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_
 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_rstat(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 
 static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("socket",		SOCKET_FILTER, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -9078,6 +9079,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("cgroup/setsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
 	SEC_DEF("struct_ops+",		STRUCT_OPS, 0, SEC_NONE),
 	SEC_DEF("sk_lookup",		SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
+	SEC_DEF("rstat/flush",		RSTAT_FLUSH, 0, SEC_NONE, attach_rstat),
 };
 
 static size_t custom_sec_def_cnt;
@@ -11784,6 +11786,44 @@ static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_l
 	return libbpf_get_error(*link);
 }
 
+struct bpf_link *bpf_program__attach_rstat(const struct bpf_program *prog)
+{
+	struct bpf_link *link = NULL;
+	char errmsg[STRERR_BUFSIZE];
+	int err, prog_fd, link_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return libbpf_err_ptr(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	/* rstat flushers are currently the only supported rstat programs */
+	link_fd = bpf_link_create(prog_fd, 0, BPF_RSTAT_FLUSH, NULL);
+	if (link_fd < 0) {
+		err = -errno;
+		pr_warn("prog '%s': failed to attach: %s\n",
+			prog->name, libbpf_strerror_r(err, errmsg,
+						      sizeof(errmsg)));
+		free(link);
+		return libbpf_err_ptr(err);
+	}
+
+	link->fd = link_fd;
+	return link;
+}
+
+static int attach_rstat(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+	*link = bpf_program__attach_rstat(prog);
+	return libbpf_get_error(*link);
+}
+
 struct bpf_link *bpf_program__attach(const struct bpf_program *prog)
 {
 	struct bpf_link *link = NULL;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 21984dcd6dbe..f8b6827d5550 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -662,6 +662,9 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_iter(const struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts);
 
+LIBBPF_API struct bpf_link *
+bpf_program__attach_rstat(const struct bpf_program *prog);
+
 /*
  * Libbpf allows callers to adjust BPF programs before being loaded
  * into kernel. One program in an object file can be transformed into
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 008da8db1d94..f945c6265cb5 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -449,6 +449,7 @@ LIBBPF_0.8.0 {
 		bpf_program__attach_kprobe_multi_opts;
 		bpf_program__attach_trace_opts;
 		bpf_program__attach_usdt;
+		bpf_program__attach_rstat;
 		bpf_program__set_insns;
 		libbpf_register_prog_handler;
 		libbpf_unregister_prog_handler;

From patchwork Sun May 15 02:35:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849967
X-Patchwork-Delegate: bpf@iogearbox.net
X-Google-Smtp-Source: 
 ABdhPJzki7EdQw0sGqwb0SP3TzxosU+OOtJIEU8WS5UAu6wm/vCL6t2hRjdQlKKt/jSEjM7YFHBDBuu6oPhFXUWQ
X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327])
 (user=yosryahmed job=sendgmr) by 2002:a17:902:da8b:b0:15e:aba7:43fe with SMTP
 id j11-20020a170902da8b00b0015eaba743femr11765556plx.9.1652582118569; Sat, 14
 May 2022 19:35:18 -0700 (PDT)
Date: Sun, 15 May 2022 02:35:01 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-5-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 4/7] bpf: add bpf rstat helpers
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Add bpf_cgroup_rstat_updated() and bpf_cgroup_rstat_flush() helpers
to enable bpf programs that collect and output cgroup stats
to communicate with the rstat framework: either to add a cgroup to the
rstat updated tree, or to trigger an rstat flush before reading stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf.h            |  2 ++
 include/uapi/linux/bpf.h       | 18 ++++++++++++++++++
 kernel/bpf/helpers.c           | 30 ++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c       |  4 ++++
 scripts/bpf_doc.py             |  2 ++
 tools/include/uapi/linux/bpf.h | 18 ++++++++++++++++++
 6 files changed, 74 insertions(+)
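
For readers less familiar with rstat, the update/flush pattern that the
two helpers plug into can be modelled in plain userspace C. This is only
a toy sketch under stated assumptions: toy_cgroup, toy_account() and
toy_flush() are hypothetical names, "cpus" are array slots, and the toy
flushes a single cgroup and walks its ancestors directly, whereas the
real framework keeps a per-cpu updated tree, flushes a whole subtree and
takes locks.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a cgroup carrying one bpf-collected counter. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	long percpu_delta[TOY_NR_CPUS];	/* written on the hot path */
	bool updated[TOY_NR_CPUS];	/* pending work for this cpu */
	long total;			/* only up to date after a flush */
};

/*
 * Hot path. The counter bump is what a bpf program would do in its own
 * per-cpu storage; setting the flag models the notification role that
 * bpf_cgroup_rstat_updated() has in this patch.
 */
static void toy_account(struct toy_cgroup *cgrp, int cpu, long delta)
{
	cgrp->percpu_delta[cpu] += delta;
	cgrp->updated[cpu] = true;
}

/*
 * Read path. Fold pending per-cpu deltas into the cgroup's total and
 * add them to every ancestor, modelling the role of
 * bpf_cgroup_rstat_flush().
 */
static void toy_flush(struct toy_cgroup *cgrp)
{
	struct toy_cgroup *pos;
	long delta;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!cgrp->updated[cpu])
			continue;	/* nothing pending on this cpu */
		delta = cgrp->percpu_delta[cpu];
		cgrp->percpu_delta[cpu] = 0;
		cgrp->updated[cpu] = false;
		for (pos = cgrp; pos; pos = pos->parent)
			pos->total += delta;
	}
}
```

The point of the split is that the hot path only touches per-cpu state,
while the cost of aggregation is paid by whoever reads the stats.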

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5061ccd8b2dc..ca908a731cb4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2205,6 +2205,8 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
 extern const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto;
+extern const struct bpf_func_proto bpf_cgroup_rstat_updated_proto;
+extern const struct bpf_func_proto bpf_cgroup_rstat_flush_proto;
 extern const struct bpf_func_proto bpf_msg_redirect_hash_proto;
 extern const struct bpf_func_proto bpf_msg_redirect_map_proto;
 extern const struct bpf_func_proto bpf_sk_redirect_hash_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 968e3cb02580..022522174286 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5175,6 +5175,22 @@ union bpf_attr {
  * 	Return
  * 		Map value associated to *key* on *cpu*, or **NULL** if no entry
  * 		was found or *cpu* is invalid.
+ *
+ * void bpf_cgroup_rstat_updated(struct cgroup *cgrp)
+ *	Description
+ *		Notify the rstat framework that bpf stats were updated for
+ *		*cgrp* on the current cpu. Directly calls cgroup_rstat_updated
+ *		with the given *cgrp* and the current cpu.
+ *	Return
+ *		0
+ *
+ * void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+ *	Description
+ *		Collect all per-cpu stats in *cgrp*'s subtree into global
+ *		counters and propagate them upwards. Directly calls
+ *		cgroup_rstat_flush_irqsafe with the given *cgrp*.
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5373,6 +5389,8 @@ union bpf_attr {
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
 	FN(map_lookup_percpu_elem),     \
+	FN(cgroup_rstat_updated),	\
+	FN(cgroup_rstat_flush),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d5f104a39092..88ed26cf45e2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -416,6 +416,36 @@ const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = {
 	.arg1_type	= ARG_ANYTHING,
 };
 
+BTF_ID_LIST_SINGLE(bpf_cgroup_btf_ids, struct, cgroup)
+
+BPF_CALL_1(bpf_cgroup_rstat_updated, struct cgroup *, cgrp)
+{
+	cgroup_rstat_updated(cgrp, smp_processor_id());
+	return 0;
+}
+
+const struct bpf_func_proto bpf_cgroup_rstat_updated_proto = {
+	.func		= bpf_cgroup_rstat_updated,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_cgroup_btf_ids[0],
+};
+
+BPF_CALL_1(bpf_cgroup_rstat_flush, struct cgroup *, cgrp)
+{
+	cgroup_rstat_flush_irqsafe(cgrp);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_cgroup_rstat_flush_proto = {
+	.func		= bpf_cgroup_rstat_flush,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_cgroup_btf_ids[0],
+};
+
 #ifdef CONFIG_CGROUP_BPF
 
 BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7141ca8a1c2d..e5a4f1b6e00d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1255,6 +1255,10 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_get_current_ancestor_cgroup_id:
 		return &bpf_get_current_ancestor_cgroup_id_proto;
+	case BPF_FUNC_cgroup_rstat_updated:
+		return &bpf_cgroup_rstat_updated_proto;
+	case BPF_FUNC_cgroup_rstat_flush:
+		return &bpf_cgroup_rstat_flush_proto;
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 096625242475..9e2b08557a6f 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct cgroup',
     ]
     known_types = {
             '...',
@@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct cgroup',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 968e3cb02580..022522174286 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5175,6 +5175,22 @@ union bpf_attr {
  * 	Return
  * 		Map value associated to *key* on *cpu*, or **NULL** if no entry
  * 		was found or *cpu* is invalid.
+ *
+ * void bpf_cgroup_rstat_updated(struct cgroup *cgrp)
+ *	Description
+ *		Notify the rstat framework that bpf stats were updated for
+ *		*cgrp* on the current cpu. Directly calls cgroup_rstat_updated
+ *		with the given *cgrp* and the current cpu.
+ *	Return
+ *		0
+ *
+ * void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+ *	Description
+ *		Collect all per-cpu stats in *cgrp*'s subtree into global
+ *		counters and propagate them upwards. Directly calls
+ *		cgroup_rstat_flush_irqsafe with the given *cgrp*.
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5373,6 +5389,8 @@ union bpf_attr {
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
 	FN(map_lookup_percpu_elem),     \
+	FN(cgroup_rstat_updated),	\
+	FN(cgroup_rstat_flush),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper

From patchwork Sun May 15 02:35:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849965
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E426DC4332F
	for <netdev@archiver.kernel.org>; Sun, 15 May 2022 02:35:44 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234497AbiEOCfl (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Sat, 14 May 2022 22:35:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56708 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S233918AbiEOCfi (ORCPT
        <rfc822;netdev@vger.kernel.org>); Sat, 14 May 2022 22:35:38 -0400
Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com
 [IPv6:2607:f8b0:4864:20::54a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7366C95B7
        for <netdev@vger.kernel.org>; Sat, 14 May 2022 19:35:20 -0700 (PDT)
Received: by mail-pg1-x54a.google.com with SMTP id
 e4-20020a635004000000b003f252bb65faso269401pgb.2
        for <netdev@vger.kernel.org>; Sat, 14 May 2022 19:35:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=mivptsETNqNDdcSME2iU77WtyhUxYujtfDC8wYBwYo8=;
        b=rMya0QjL40CIpu8Uv/ByKWJyDrwJG5QrMXKxZNnicY5ZklKuCvsXB5oQ2U+Z5FoVPS
         NNID7KJ++02OQhQGC/O7udgQAt+/sF5cjRYPZqcf8Mj7qgc9httTO8QX5l1pGxivwgop
         eXCz8XpLdo0k56s/4Xp2sMxYTBtxDAeEU1Lt4gvtI33dWiQaKbR2O99nlVMoQRoVY7s7
         BBXZvaQSBFuYR6uMsX3D3mTwWX1M9QSzWMCbRG5fP5THFSoKblyRNfBoPrTCL9dO8hF1
         PEq0BiUdinr0dtZQK4V8fQBCrsCqIWbguGR/Jzq4tqE9PEOy01Xl8+B35dPbm6xq8IrW
         +B9w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=mivptsETNqNDdcSME2iU77WtyhUxYujtfDC8wYBwYo8=;
        b=Vpqx6+osDA30sIi3Bn1elFBlvHrxhHaGnTPSQ28Vdkz5TEkr7NZCt07m5N1k3PrGLm
         iCPP0KSjgcRaEQ5YnvuBvLW4IIv2mgxYnGTIgD6fxs0wZI9M7VNuuLlvW2VdNa2CxjUa
         H8unUqyfclO3VB0dC4ubyG2U50FNW5pte+Xsxhtx3SlCW4yqR6puSj88KZIaCLa/IoUy
         VfwBiDQE02iDAbFjXY/ZyKLI6KwwvbtSdB9GeM0/wHgBoGCY0XSRJLU2p2OgRZXDBH1r
         CccfOZeGR3jzbDj/N7Xr2jGFUafxag6P6zyqrZQ3gxq05fbXwACfGWch48+P7a3Lkcel
         1dZQ==
X-Gm-Message-State: AOAM530AsBovcw03pe8ua+Bgb5VchhzZ1PwBYGwpI4Siqjy6FKCoLWgr
        oLw+2KZcD71M+XIb2upl3qEice2i/wEysRxl
X-Google-Smtp-Source: 
 ABdhPJwounggXga93PA2rBarvjkcnVYOVj3yjgHbf+4I8K2y2Goh8JfuUZYlGSM5QCQ9r6xCJ8cAfkvFDofNY+Iu
X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327])
 (user=yosryahmed job=sendgmr) by 2002:a17:903:24f:b0:15c:e3b8:a640 with SMTP
 id j15-20020a170903024f00b0015ce3b8a640mr11562712plh.5.1652582119956; Sat, 14
 May 2022 19:35:19 -0700 (PDT)
Date: Sun, 15 May 2022 02:35:02 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-6-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 5/7] bpf: Introduce cgroup iter
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

From: Hao Luo <haoluo@google.com>

Introduce a new type of iter prog: cgroup. Unlike other bpf_iters, this
iter doesn't iterate over a set of kernel objects. Instead, it is
parameterized by a cgroup id and prints only that cgroup, so one needs
to specify a target cgroup id when attaching this iter. The target
cgroup's state can be read out via a link of this iter.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf.h            |   2 +
 include/uapi/linux/bpf.h       |   6 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/cgroup_iter.c       | 148 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |   6 ++
 5 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/cgroup_iter.c
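
The one-shot behaviour of the seq_ops in cgroup_iter.c can be modelled
in userspace. The sketch below is a toy, not the kernel's seq_file:
toy_seq and the toy_*() functions are hypothetical names, and the
position is kept in the toy struct so that it persists across read
sessions the way a seq_file position does.

```c
#include <stddef.h>

/* Hypothetical miniature of a seq_file: private data plus a position. */
struct toy_seq {
	void *priv;	/* the single object to show (a cgroup in the patch) */
	long pos;	/* persists across read sessions */
	int shown;	/* how many times show() ran, for illustration */
};

static void *toy_start(struct toy_seq *seq, long *pos)
{
	if (*pos > 0)
		return NULL;	/* the object was already handed out */
	++*pos;
	return seq->priv;
}

static void *toy_next(struct toy_seq *seq, void *v, long *pos)
{
	(void)seq;
	(void)v;
	++*pos;
	return NULL;		/* there is never a second element */
}

static int toy_show(struct toy_seq *seq, void *v)
{
	(void)v;
	seq->shown++;
	return 0;
}

/*
 * One read session: start, then show/next until NULL.
 * Returns how many elements were shown in this session.
 */
static int toy_read_session(struct toy_seq *seq)
{
	int before = seq->shown;
	void *v;

	for (v = toy_start(seq, &seq->pos); v; v = toy_next(seq, v, &seq->pos))
		toy_show(seq, v);
	return seq->shown - before;
}
```

In this model the first session shows the object exactly once, and any
later session on the same toy_seq shows nothing because the position
has already moved past zero.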

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ca908a731cb4..45076c581f24 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -44,6 +44,7 @@ struct kobject;
 struct mem_cgroup;
 struct module;
 struct bpf_func_state;
+struct cgroup;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1579,6 +1580,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 
 struct bpf_iter_aux_info {
 	struct bpf_map *map;
+	struct cgroup *cgroup;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 022522174286..9a93b72bf39c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5986,6 +5989,9 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u64 cgroup_id;
+				} cgroup;
 			};
 		} iter;
 		struct  {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 0487133b799f..f2a6fd0633d6 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -37,7 +37,7 @@ obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
 obj-$(CONFIG_BPF_PRELOAD) += preload/
 ifeq ($(CONFIG_CGROUPS),y)
-obj-$(CONFIG_BPF_SYSCALL) += rstat.o
+obj-$(CONFIG_BPF_SYSCALL) += rstat.o cgroup_iter.o
 endif
 
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
new file mode 100644
index 000000000000..86bdfe135d24
--- /dev/null
+++ b/kernel/bpf/cgroup_iter.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/cgroup.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+struct bpf_iter__cgroup {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct cgroup *, cgroup);
+};
+
+static void *cgroup_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* Only one session is supported. */
+	if (*pos > 0)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+
+	return *(struct cgroup **)seq->private;
+}
+
+static void *cgroup_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static int cgroup_iter_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter__cgroup ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	ctx.meta = &meta;
+	ctx.cgroup = v;
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, false);
+	if (prog)
+		ret = bpf_iter_run_prog(prog, &ctx);
+
+	return ret;
+}
+
+static void cgroup_iter_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations cgroup_iter_seq_ops = {
+	.start  = cgroup_iter_seq_start,
+	.next   = cgroup_iter_seq_next,
+	.stop   = cgroup_iter_seq_stop,
+	.show   = cgroup_iter_seq_show,
+};
+
+BTF_ID_LIST_SINGLE(bpf_cgroup_btf_id, struct, cgroup)
+
+static int cgroup_iter_seq_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	*(struct cgroup **)priv_data = aux->cgroup;
+	return 0;
+}
+
+static const struct bpf_iter_seq_info cgroup_iter_seq_info = {
+	.seq_ops                = &cgroup_iter_seq_ops,
+	.init_seq_private       = cgroup_iter_seq_init,
+	.seq_priv_size          = sizeof(struct cgroup *),
+};
+
+static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
+				  union bpf_iter_link_info *linfo,
+				  struct bpf_iter_aux_info *aux)
+{
+	struct cgroup *cgroup;
+
+	cgroup = cgroup_get_from_id(linfo->cgroup.cgroup_id);
+	if (!cgroup)
+		return -EBUSY;
+
+	aux->cgroup = cgroup;
+	return 0;
+}
+
+static void bpf_iter_detach_cgroup(struct bpf_iter_aux_info *aux)
+{
+	if (aux->cgroup)
+		cgroup_put(aux->cgroup);
+}
+
+static void bpf_iter_cgroup_show_fdinfo(const struct bpf_iter_aux_info *aux,
+					struct seq_file *seq)
+{
+	char *buf;
+
+	seq_printf(seq, "cgroup_id:\t%llu\n", cgroup_id(aux->cgroup));
+
+	buf = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!buf) {
+		seq_puts(seq, "cgroup_path:\n");
+		return;
+	}
+
+	/* If cgroup_path_ns() fails, buf will be an empty string, cgroup_path
+	 * will print nothing.
+	 *
+	 * cgroup_path is the path in the calling process's cgroup namespace.
+	 */
+	cgroup_path_ns(aux->cgroup, buf, PATH_MAX,
+		       current->nsproxy->cgroup_ns);
+	seq_printf(seq, "cgroup_path:\t%s\n", buf);
+	kfree(buf);
+}
+
+static int bpf_iter_cgroup_fill_link_info(const struct bpf_iter_aux_info *aux,
+					  struct bpf_link_info *info)
+{
+	info->iter.cgroup.cgroup_id = cgroup_id(aux->cgroup);
+	return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(cgroup, struct bpf_iter_meta *meta,
+		     struct cgroup *cgroup)
+
+static struct bpf_iter_reg bpf_cgroup_reg_info = {
+	.target			= "cgroup",
+	.attach_target		= bpf_iter_attach_cgroup,
+	.detach_target		= bpf_iter_detach_cgroup,
+	.show_fdinfo		= bpf_iter_cgroup_show_fdinfo,
+	.fill_link_info		= bpf_iter_cgroup_fill_link_info,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__cgroup, cgroup),
+		  PTR_TO_BTF_ID },
+	},
+	.seq_info		= &cgroup_iter_seq_info,
+};
+
+static int __init bpf_cgroup_iter_init(void)
+{
+	bpf_cgroup_reg_info.ctx_arg_info[0].btf_id = bpf_cgroup_btf_id[0];
+	return bpf_iter_reg_target(&bpf_cgroup_reg_info);
+}
+
+late_initcall(bpf_cgroup_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 022522174286..9a93b72bf39c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5986,6 +5989,9 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u64 cgroup_id;
+				} cgroup;
 			};
 		} iter;
 		struct  {

From patchwork Sun May 15 02:35:03 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849968
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 332B3C433F5
	for <netdev@archiver.kernel.org>; Sun, 15 May 2022 02:36:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234237AbiEOCgO (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Sat, 14 May 2022 22:36:14 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57758 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S234083AbiEOCfj (ORCPT
        <rfc822;netdev@vger.kernel.org>); Sat, 14 May 2022 22:35:39 -0400
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7586ACE03
        for <netdev@vger.kernel.org>; Sat, 14 May 2022 19:35:22 -0700 (PDT)
Received: by mail-pg1-x549.google.com with SMTP id
 e4-20020a635004000000b003f252bb65faso269444pgb.2
        for <netdev@vger.kernel.org>; Sat, 14 May 2022 19:35:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=s9mC8a74/srzDfRXqOZJ2sLxbf44Cz+oTregtiTYZa4=;
        b=sio8SB9GLeVm2ABq3LCIwDcFrLFp0wymSz5h06ZJ9wWecTaz3Y41jkp0lbsGSrHfds
         6cd/vVLTz39FDO5PyFS+XMKzB3MdXpee7BgBbh876pPGTZXyXFyvovqLsX1H8zoAkCbr
         fTmZaftXfpfyu7Jj66usYonyOPxGvmO3pQvbgoS0PmzvIa5M4UuJsB4a6TlChO3sRD9H
         iy/OslzUKo3n3mIVLWzbm6hgdNPCMOgQPkcxjoF6df1JmpZQOs3Y2f5HYQXu9DaviHL7
         N8vSYg2UVvfOi4JGAyvfRSrNYDMCmtbg63jTJ6cEGNjprmoKxgnmWnX8bacMYkthLT+c
         Cz2Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=s9mC8a74/srzDfRXqOZJ2sLxbf44Cz+oTregtiTYZa4=;
        b=FaRYxE2HEWnEaFfNQox4nHqZ7ssmPVxxSv2aXV7a/ZciAZpj6/UHmr+kbmyK7+vSrA
         AWmXOx8NVM4QX21ZnStRzbUWuNiLtJqIzKLjLWOnrzEVcrQmi/eLMVlhbdcyiri44wwT
         YZkt4Gk/y6vAFm7kYHtuCNVXLvhV9/xivbY4GMANahCSTUM/10FaYqfrAfqfgCG6kKw4
         xBnp205zlgngbmxcC6etQKI3xatbLmB+Ot//5Xu89yf/pni8onunRFfr/DqI0qGO367j
         pXhInHfKMKbU/IA/klWB523mmiO8is4d2KDY65bcmmL2hG0HdL8XtTcStZG2eXBiWHa0
         r6ag==
X-Gm-Message-State: AOAM533P+eIxjcD/7Y3Op9220Ou8neoA+23dB4clfaaI0snHsPuxoPGk
        ES4GLqnV4AGKwe0GbQa0Yc4sWtEWKwqLztUM
X-Google-Smtp-Source: 
 ABdhPJx3mm1lS9ipjv+p2kumWtYDrRS1+5bYHsLTMf3wedJTnbNM63b6Qf8gFubzZGVLeCNkV1++qQnGzWdRObSl
X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327])
 (user=yosryahmed job=sendgmr) by 2002:a05:6a00:1145:b0:4f6:3ebc:a79b with
 SMTP id b5-20020a056a00114500b004f63ebca79bmr11679069pfm.41.1652582121541;
 Sat, 14 May 2022 19:35:21 -0700 (PDT)
Date: Sun, 15 May 2022 02:35:03 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-7-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 6/7] selftests/bpf: extend cgroup helpers
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

This patch extends bpf selftests cgroup helpers in various ways:
- Expose enable_controllers() that allows tests to enable all or a
  subset of controllers for a specific cgroup.
- Add write_cgroup_file().
- Add join_parent_cgroup(). The cgroup workdir is based on the pid,
  therefore a spawned child cannot join the same cgroup hierarchy as the
  test through join_cgroup(). join_parent_cgroup() is used in child
  processes to join a cgroup under the parent's workdir.
- Distinguish relative and absolute cgroup paths in function arguments.
  Now relative paths are called relative_path, and absolute paths are
  called cgroup_path.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/bpf/cgroup_helpers.c | 158 ++++++++++++++-----
 tools/testing/selftests/bpf/cgroup_helpers.h |  14 +-
 2 files changed, 126 insertions(+), 46 deletions(-)
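
To illustrate why a forked child needs join_parent_cgroup(), here is a
minimal userspace sketch of the path formatting. It mirrors the
"%s%s%d%s" layout of format_cgroup_path_pid(), but toy_format_cgroup_path()
is a hypothetical function, and TOY_MOUNT_PATH is an assumed placeholder
because CGROUP_MOUNT_PATH is not shown in this patch.

```c
#include <stdio.h>
#include <sys/types.h>

/* Assumed placeholder: the real mount path is defined outside this patch. */
#define TOY_MOUNT_PATH	"/mnt/cgroup-test-work-dir"
/* Same value as CGROUP_WORK_DIR in cgroup_helpers.c. */
#define TOY_WORK_DIR	"/cgroup-test-work-dir"

/*
 * Build "<mount><workdir><pid><path>". The pid is part of the workdir
 * name, so the result depends on which process's pid is passed in.
 */
static int toy_format_cgroup_path(char *buf, size_t len, const char *path,
				  pid_t pid)
{
	return snprintf(buf, len, "%s%s%d%s", TOY_MOUNT_PATH, TOY_WORK_DIR,
			(int)pid, path);
}
```

A child that formats "/test" with its own pid ends up under a workdir
that was never created, while formatting with the parent's pid (what
getppid() provides in format_parent_cgroup_path()) resolves to the
cgroup the test set up.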

diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 9d59c3990ca8..065270b2387f 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -33,49 +33,51 @@
 #define CGROUP_MOUNT_DFLT		"/sys/fs/cgroup"
 #define NETCLS_MOUNT_PATH		CGROUP_MOUNT_DFLT "/net_cls"
 #define CGROUP_WORK_DIR			"/cgroup-test-work-dir"
-#define format_cgroup_path(buf, path) \
+
+#define format_cgroup_path_pid(buf, path, pid) \
 	snprintf(buf, sizeof(buf), "%s%s%d%s", CGROUP_MOUNT_PATH, \
-	CGROUP_WORK_DIR, getpid(), path)
+	CGROUP_WORK_DIR, pid, path)
+
+#define format_cgroup_path(buf, path) \
+	format_cgroup_path_pid(buf, path, getpid())
+
+#define format_parent_cgroup_path(buf, path) \
+	format_cgroup_path_pid(buf, path, getppid())
 
 #define format_classid_path(buf)				\
 	snprintf(buf, sizeof(buf), "%s%s", NETCLS_MOUNT_PATH,	\
 		 CGROUP_WORK_DIR)
 
-/**
- * enable_all_controllers() - Enable all available cgroup v2 controllers
- *
- * Enable all available cgroup v2 controllers in order to increase
- * the code coverage.
- *
- * If successful, 0 is returned.
- */
-static int enable_all_controllers(char *cgroup_path)
+
+static int __enable_controllers(const char *cgroup_path, const char *controllers)
 {
 	char path[PATH_MAX + 1];
-	char buf[PATH_MAX];
+	char enable[PATH_MAX + 1];
 	char *c, *c2;
 	int fd, cfd;
 	ssize_t len;
 
-	snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
-		log_err("Opening cgroup.controllers: %s", path);
-		return 1;
-	}
+	/* If no controllers are passed, enable all available controllers */
+	if (!controllers) {
+		snprintf(path, sizeof(path), "%s/cgroup.controllers",
+			 cgroup_path);
+		fd = open(path, O_RDONLY);
+		if (fd < 0) {
+			log_err("Opening cgroup.controllers: %s", path);
+			return 1;
+		}
 
-	len = read(fd, buf, sizeof(buf) - 1);
-	if (len < 0) {
+		len = read(fd, enable, sizeof(enable) - 1);
+		if (len <= 0) {
+			close(fd);
+			if (len < 0)
+				log_err("Reading cgroup.controllers: %s", path);
+			return len < 0; /* len == 0: no controllers to enable */
+		}
+		enable[len] = 0;
 		close(fd);
-		log_err("Reading cgroup.controllers: %s", path);
-		return 1;
-	}
-	buf[len] = 0;
-	close(fd);
-
-	/* No controllers available? We're probably on cgroup v1. */
-	if (len == 0)
-		return 0;
+	} else
+		snprintf(enable, sizeof(enable), "%s", controllers);
 
 	snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path);
 	cfd = open(path, O_RDWR);
@@ -84,7 +86,7 @@ static int enable_all_controllers(char *cgroup_path)
 		return 1;
 	}
 
-	for (c = strtok_r(buf, " ", &c2); c; c = strtok_r(NULL, " ", &c2)) {
+	for (c = strtok_r(enable, " ", &c2); c; c = strtok_r(NULL, " ", &c2)) {
 		if (dprintf(cfd, "+%s\n", c) <= 0) {
 			log_err("Enabling controller %s: %s", c, path);
 			close(cfd);
@@ -95,6 +97,63 @@ static int enable_all_controllers(char *cgroup_path)
 	return 0;
 }
 
+/**
+ * enable_controllers() - Enable cgroup v2 controllers
+ * @relative_path: The cgroup path, relative to the workdir
+ * @controllers: List of controllers to enable in cgroup.controllers format
+ *
+ * Enable the given cgroup v2 controllers. If @controllers is NULL, enable
+ * all available controllers. The list uses the cgroup.controllers format:
+ * controller names separated by spaces.
+ *
+ * If successful, 0 is returned.
+ */
+int enable_controllers(const char *relative_path, const char *controllers)
+{
+	char cgroup_path[PATH_MAX + 1];
+
+	format_cgroup_path(cgroup_path, relative_path);
+	return __enable_controllers(cgroup_path, controllers);
+}
+
+
+
+/**
+ * write_cgroup_file() - Write to a cgroup file
+ * @relative_path: The cgroup path, relative to the workdir
+ * @file: The name of the file in the cgroup's directory to write to
+ * @buf: Buffer to write to the file
+ *
+ * Write to a file in the given cgroup's directory.
+ * If successful, 0 is returned.
+ */
+int write_cgroup_file(const char *relative_path, const char *file,
+		      const char *buf)
+{
+	char cgroup_path[PATH_MAX - 24];
+	char file_path[PATH_MAX + 1];
+	int fd;
+
+	format_cgroup_path(cgroup_path, relative_path);
+
+	snprintf(file_path, sizeof(file_path), "%s/%s", cgroup_path, file);
+	fd = open(file_path, O_RDWR);
+	if (fd < 0) {
+		log_err("Opening %s", file_path);
+		return 1;
+	}
+
+	if (dprintf(fd, "%s", buf) <= 0) {
+		log_err("Writing to %s", file_path);
+		close(fd);
+		return 1;
+	}
+	close(fd);
+	return 0;
+}
+
+
+
 /**
  * setup_cgroup_environment() - Setup the cgroup environment
  *
@@ -133,7 +192,8 @@ int setup_cgroup_environment(void)
 		return 1;
 	}
 
-	if (enable_all_controllers(cgroup_workdir))
+	/* Enable all available controllers to increase test coverage */
+	if (__enable_controllers(cgroup_workdir, NULL))
 		return 1;
 
 	return 0;
@@ -173,7 +233,7 @@ static int join_cgroup_from_top(const char *cgroup_path)
 
 /**
  * join_cgroup() - Join a cgroup
- * @path: The cgroup path, relative to the workdir, to join
+ * @relative_path: The cgroup path, relative to the workdir, to join
  *
  * This function expects a cgroup to already be created, relative to the cgroup
  * work dir, and it joins it. For example, passing "/my-cgroup" as the path
@@ -182,11 +242,27 @@ static int join_cgroup_from_top(const char *cgroup_path)
  *
  * On success, it returns 0, otherwise on failure it returns 1.
  */
-int join_cgroup(const char *path)
+int join_cgroup(const char *relative_path)
+{
+	char cgroup_path[PATH_MAX + 1];
+
+	format_cgroup_path(cgroup_path, relative_path);
+	return join_cgroup_from_top(cgroup_path);
+}
+
+/**
+ * join_parent_cgroup() - Join a cgroup in the parent process workdir
+ * @relative_path: The cgroup path, relative to parent process workdir, to join
+ *
+ * See join_cgroup().
+ *
+ * On success, it returns 0, otherwise on failure it returns 1.
+ */
+int join_parent_cgroup(const char *relative_path)
 {
 	char cgroup_path[PATH_MAX + 1];
 
-	format_cgroup_path(cgroup_path, path);
+	format_parent_cgroup_path(cgroup_path, relative_path);
 	return join_cgroup_from_top(cgroup_path);
 }
 
@@ -214,7 +290,7 @@ void cleanup_cgroup_environment(void)
 
 /**
  * create_and_get_cgroup() - Create a cgroup, relative to workdir, and get the FD
- * @path: The cgroup path, relative to the workdir, to join
+ * @relative_path: The cgroup path, relative to the workdir, to join
  *
  * This function creates a cgroup under the top level workdir and returns the
  * file descriptor. It is idempotent.
@@ -222,14 +298,14 @@ void cleanup_cgroup_environment(void)
  * On success, it returns the file descriptor. On failure it returns -1.
  * If there is a failure, it prints the error to stderr.
  */
-int create_and_get_cgroup(const char *path)
+int create_and_get_cgroup(const char *relative_path)
 {
 	char cgroup_path[PATH_MAX + 1];
 	int fd;
 
-	format_cgroup_path(cgroup_path, path);
+	format_cgroup_path(cgroup_path, relative_path);
 	if (mkdir(cgroup_path, 0777) && errno != EEXIST) {
-		log_err("mkdiring cgroup %s .. %s", path, cgroup_path);
+		log_err("mkdiring cgroup %s .. %s", relative_path, cgroup_path);
 		return -1;
 	}
 
@@ -244,13 +320,13 @@ int create_and_get_cgroup(const char *path)
 
 /**
  * get_cgroup_id() - Get cgroup id for a particular cgroup path
- * @path: The cgroup path, relative to the workdir, to join
+ * @relative_path: The cgroup path, relative to the workdir, to join
  *
  * On success, it returns the cgroup id. On failure it returns 0,
  * which is an invalid cgroup id.
  * If there is a failure, it prints the error to stderr.
  */
-unsigned long long get_cgroup_id(const char *path)
+unsigned long long get_cgroup_id(const char *relative_path)
 {
 	int dirfd, err, flags, mount_id, fhsize;
 	union {
@@ -261,7 +337,7 @@ unsigned long long get_cgroup_id(const char *path)
 	struct file_handle *fhp, *fhp2;
 	unsigned long long ret = 0;
 
-	format_cgroup_path(cgroup_workdir, path);
+	format_cgroup_path(cgroup_workdir, relative_path);
 
 	dirfd = AT_FDCWD;
 	flags = 0;
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index fcc9cb91b211..6b1d905557c7 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -10,11 +10,15 @@
 	__FILE__, __LINE__, clean_errno(), ##__VA_ARGS__)
 
 /* cgroupv2 related */
-int cgroup_setup_and_join(const char *path);
-int create_and_get_cgroup(const char *path);
-unsigned long long get_cgroup_id(const char *path);
+int enable_controllers(const char *relative_path, const char *controllers);
+int write_cgroup_file(const char *relative_path, const char *file,
+		      const char *buf);
+int cgroup_setup_and_join(const char *relative_path);
+int create_and_get_cgroup(const char *relative_path);
+unsigned long long get_cgroup_id(const char *relative_path);
 
-int join_cgroup(const char *path);
+int join_cgroup(const char *relative_path);
+int join_parent_cgroup(const char *relative_path);
 
 int setup_cgroup_environment(void);
 void cleanup_cgroup_environment(void);
@@ -26,4 +30,4 @@ int join_classid(void);
 int setup_classid_environment(void);
 void cleanup_classid_environment(void);
 
-#endif /* __CGROUP_HELPERS_H */
\ No newline at end of file
+#endif /* __CGROUP_HELPERS_H */

From patchwork Sun May 15 02:35:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed <yosryahmed@google.com>
X-Patchwork-Id: 12849966
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Sun, 15 May 2022 02:35:04 +0000
In-Reply-To: <20220515023504.1823463-1-yosryahmed@google.com>
Message-Id: <20220515023504.1823463-8-yosryahmed@google.com>
Mime-Version: 1.0
References: <20220515023504.1823463-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
Subject: [RFC PATCH bpf-next v2 7/7] bpf: add a selftest for cgroup
 hierarchical stats collection
From: Yosry Ahmed <yosryahmed@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andrii Nakryiko <andrii@kernel.org>,
        Martin KaFai Lau <kafai@fb.com>,
        Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
        John Fastabend <john.fastabend@gmail.com>,
        KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
        Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Shuah Khan <shuah@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>,
        David Rientjes <rientjes@google.com>,
        Greg Thelen <gthelen@google.com>,
        Shakeel Butt <shakeelb@google.com>,
        linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
        bpf@vger.kernel.org, cgroups@vger.kernel.org,
        Yosry Ahmed <yosryahmed@google.com>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Add a selftest that tests the whole workflow for collecting,
aggregating, and displaying cgroup hierarchical stats.

The test loads tracing bpf programs at the beginning and end of
direct reclaim to measure the vmscan latency. Per-cgroup readings are
stored in percpu maps for efficiency. When a cgroup reading is updated,
bpf_cgroup_rstat_updated() is called to add the cgroup (and the current
cpu) to the rstat updated tree. When a cgroup is added to the rstat
updated tree, all its parents are added as well. rstat makes sure
cgroups are popped in a bottom up fashion.

When an rstat flush is invoked, an rstat flusher program is called for
each (cgroup, cpu) pair on the updated tree. The program aggregates
percpu readings to a total reading, and also propagates them to the
parent. After rstat flushing is over, the program will have been invoked
for all (cgroup, cpu) pairs that have updates as well as their parents,
so the whole hierarchy will have updated (flushed) stats.
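
The aggregation step can be modelled in plain userspace C as follows.
This is only a sketch of the arithmetic performed by the flusher in
this patch (per-cpu delta, pending from children, propagation to the
parent); struct model_stat, model_flush() and MODEL_NR_CPUS are
hypothetical names, and the fixed CPU count is an assumption made to
keep the example small.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 2	/* assumption: a fixed, tiny CPU count */

/* Hypothetical userspace mirror of the per-cgroup state kept in the maps. */
struct model_stat {
	struct model_stat *parent;			/* NULL for the root */
	unsigned long long pcpu_state[MODEL_NR_CPUS];	/* running per-cpu total */
	unsigned long long pcpu_prev[MODEL_NR_CPUS];	/* value at last flush */
	unsigned long long pending;			/* pushed up by children */
	unsigned long long state;			/* flushed subtree total */
};

/* Flush one (cgroup, cpu) pair; callers must flush children before parents. */
static void model_flush(struct model_stat *cg, int cpu)
{
	unsigned long long delta;

	/* New per-cpu time accumulated on this level since the last flush */
	delta = cg->pcpu_state[cpu] - cg->pcpu_prev[cpu];
	cg->pcpu_prev[cpu] = cg->pcpu_state[cpu];

	/* Collect whatever the children propagated since the last flush */
	delta += cg->pending;
	cg->pending = 0;

	/* Fold the delta into this cgroup's total and hand it to the parent */
	cg->state += delta;
	if (delta && cg->parent)
		cg->parent->pending += delta;
}
```

Because each delta is consumed exactly once (prev catches up with
state, pending is reset to 0), flushing again without new updates
leaves every total unchanged in this model.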

Finally, a cgroup_iter program is pinned to a file for each cgroup.
Reading this file invokes the cgroup_iter program to flush the stats and
display them to the user.
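
For reference, a consumer of such a file could parse the line emitted
by the iterator program roughly as sketched below. The format string is
the one used by dump_vmscan() in this patch; parse_vmscan_line() itself
is a hypothetical helper that is not part of the series.

```c
#include <stdio.h>

/*
 * Parse one line in the "cg_id: <id>, total_vmscan_delay: <ns>" format.
 * Returns 0 on success and -1 if the line does not match.
 */
static int parse_vmscan_line(const char *line, unsigned long long *id,
			     unsigned long long *delay)
{
	int n = sscanf(line, "cg_id: %llu, total_vmscan_delay: %llu",
		       id, delay);

	return n == 2 ? 0 : -1;
}
```

The selftest performs an equivalent sscanf() check and additionally
compares the parsed id against the expected cgroup id.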

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 .../test_cgroup_hierarchical_stats.c          | 339 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../selftests/bpf/progs/cgroup_vmscan.c       | 222 ++++++++++++
 3 files changed, 568 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
new file mode 100644
index 000000000000..feb325e7fc39
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Functions to manage eBPF programs attached to cgroup subsystems
+ *
+ * Copyright 2022 Google LLC.
+ */
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+#include <test_progs.h>
+
+#include "cgroup_helpers.h"
+#include "cgroup_vmscan.skel.h"
+
+#define PAGE_SIZE 4096
+#define MB(x) ((x) << 20)
+
+#define BPFFS_ROOT "/sys/fs/bpf/"
+#define BPFFS_VMSCAN BPFFS_ROOT"vmscan/"
+
+#define CG_ROOT_NAME "root"
+#define CG_ROOT_ID 1
+
+#define CGROUP_PATH(p, n) {.name = #n, .path = #p"/"#n}
+
+static struct {
+	const char *name, *path;
+	unsigned long long id;
+	int fd;
+} cgroups[] = {
+	CGROUP_PATH(/, test),
+	CGROUP_PATH(/test, child1),
+	CGROUP_PATH(/test, child2),
+	CGROUP_PATH(/test/child1, child1_1),
+	CGROUP_PATH(/test/child1, child1_2),
+	CGROUP_PATH(/test/child2, child2_1),
+	CGROUP_PATH(/test/child2, child2_2),
+};
+
+#define N_CGROUPS (sizeof(cgroups)/sizeof(cgroups[0]))
+#define N_NON_LEAF_CGROUPS 3
+
+static bool mounted_bpffs;
+static int duration;
+
+static int read_from_file(const char *path, char *buf, size_t size)
+{
+	int fd, len;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		log_err("Open %s", path);
+		return -errno;
+	}
+	len = read(fd, buf, size - 1);
+	if (len < 0)
+		log_err("Read %s", path);
+	else
+		buf[len] = 0;
+	close(fd);
+	return len < 0 ? -errno : 0;
+}
+
+static int setup_bpffs(void)
+{
+	int err;
+
+	/* Mount bpffs */
+	err = mount("bpf", BPFFS_ROOT, "bpf", 0, NULL);
+	mounted_bpffs = !err;
+	if (CHECK(err && errno != EBUSY, "mount bpffs",
+	      "failed to mount bpffs at %s (%s)\n", BPFFS_ROOT,
+	      strerror(errno)))
+		return err;
+
+	/* Create a directory to contain stat files in bpffs */
+	err = mkdir(BPFFS_VMSCAN, 0755);
+	CHECK(err, "mkdir bpffs", "failed to mkdir %s (%s)\n",
+	      BPFFS_VMSCAN, strerror(errno));
+	return err;
+}
+
+static void cleanup_bpffs(void)
+{
+	/* Remove created directory in bpffs */
+	CHECK(rmdir(BPFFS_VMSCAN), "rmdir", "failed to rmdir %s (%s)\n",
+	      BPFFS_VMSCAN, strerror(errno));
+
+	/* Unmount bpffs, if it wasn't already mounted when we started */
+	if (!mounted_bpffs)
+		return;
+	CHECK(umount(BPFFS_ROOT), "umount", "failed to unmount bpffs (%s)\n",
+	      strerror(errno));
+}
+
+static int setup_cgroups(void)
+{
+	int i, err;
+
+	err = setup_cgroup_environment();
+	if (CHECK(err, "setup_cgroup_environment", "failed: %d\n", err))
+		return err;
+
+	for (i = 0; i < N_CGROUPS; i++) {
+		int fd;
+
+		fd = create_and_get_cgroup(cgroups[i].path);
+		if (!ASSERT_GE(fd, 0, "create_and_get_cgroup"))
+			return fd;
+
+		cgroups[i].fd = fd;
+		cgroups[i].id = get_cgroup_id(cgroups[i].path);
+		if (i < N_NON_LEAF_CGROUPS) {
+			err = enable_controllers(cgroups[i].path, "memory");
+			if (!ASSERT_OK(err, "enable_controllers"))
+				return err;
+		}
+	}
+	return 0;
+}
+
+static void cleanup_cgroups(void)
+{
+	for (int i = 0; i < N_CGROUPS; i++)
+		close(cgroups[i].fd);
+	cleanup_cgroup_environment();
+}
+
+
+static int setup_hierarchy(void)
+{
+	return setup_bpffs() || setup_cgroups();
+}
+
+static void destroy_hierarchy(void)
+{
+	cleanup_cgroups();
+	cleanup_bpffs();
+}
+
+static void alloc_anon(size_t size)
+{
+	char *buf, *ptr;
+
+	buf = malloc(size);
+	for (ptr = buf; ptr < buf + size; ptr += PAGE_SIZE)
+		*ptr = 0;
+	free(buf);
+}
+
+static int induce_vmscan(void)
+{
+	char size[128];
+	int i, err;
+
+	/*
+	 * Set memory.high for test parent cgroup to 1 MB to throttle
+	 * allocations and invoke reclaim in children.
+	 */
+	snprintf(size, 128, "%d", MB(1));
+	err = write_cgroup_file(cgroups[0].path, "memory.high", size);
+	if (!ASSERT_OK(err, "write memory.high"))
+		return err;
+	/*
+	 * In every leaf cgroup, run a memory hog for a few seconds to induce
+	 * reclaim then kill it.
+	 */
+	for (i = N_NON_LEAF_CGROUPS; i < N_CGROUPS; i++) {
+		pid_t pid = fork();
+
+		if (pid == 0) {
+			/* Join cgroup in the parent process workdir */
+			join_parent_cgroup(cgroups[i].path);
+
+			/* Allocate more memory than memory.high */
+			alloc_anon(MB(2));
+			exit(0);
+		} else {
+			/* Wait for child to cause reclaim then kill it */
+			if (!ASSERT_GT(pid, 0, "fork"))
+				return pid;
+			sleep(2);
+			kill(pid, SIGKILL);
+			waitpid(pid, NULL, 0);
+		}
+	}
+	return 0;
+}
+
+static unsigned long long get_cgroup_vmscan(unsigned long long cgroup_id,
+					    const char *file_name)
+{
+	char buf[128], path[128];
+	unsigned long long vmscan = 0, id = 0;
+	int err;
+
+	/* For every cgroup, read the file generated by cgroup_iter */
+	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
+	err = read_from_file(path, buf, 128);
+	if (CHECK(err, "read", "failed to read from %s (%s)\n",
+		   path, strerror(errno)))
+		return 0;
+
+	/* Check the output file formatting */
+	ASSERT_EQ(sscanf(buf, "cg_id: %llu, total_vmscan_delay: %llu\n",
+			 &id, &vmscan), 2, "output format");
+
+	/* Check that the cgroup_id is displayed correctly */
+	ASSERT_EQ(cgroup_id, id, "cgroup_id");
+	/* Check that the vmscan reading is non-zero */
+	ASSERT_NEQ(vmscan, 0, "vmscan_reading");
+	return vmscan;
+}
+
+static void check_vmscan_stats(void)
+{
+	int i;
+	unsigned long long vmscan_readings[N_CGROUPS], vmscan_root;
+
+	for (i = 0; i < N_CGROUPS; i++)
+		vmscan_readings[i] = get_cgroup_vmscan(cgroups[i].id,
+						       cgroups[i].name);
+
+	/* Read stats for root too */
+	vmscan_root = get_cgroup_vmscan(CG_ROOT_ID, CG_ROOT_NAME);
+
+	/* Check that child1 == child1_1 + child1_2 */
+	ASSERT_EQ(vmscan_readings[1], vmscan_readings[3] + vmscan_readings[4],
+		  "child1_vmscan");
+	/* Check that child2 == child2_1 + child2_2 */
+	ASSERT_EQ(vmscan_readings[2], vmscan_readings[5] + vmscan_readings[6],
+		  "child2_vmscan");
+	/* Check that test == child1 + child2 */
+	ASSERT_EQ(vmscan_readings[0], vmscan_readings[1] + vmscan_readings[2],
+		  "test_vmscan");
+	/* Check that root >= test */
+	ASSERT_GE(vmscan_root, vmscan_readings[0], "root_vmscan");
+}
+
+static int setup_cgroup_iter(struct cgroup_vmscan *obj,
+			     unsigned long long cgroup_id,
+			     const char *file_name)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo = {};
+	struct bpf_link *link;
+	char path[128];
+	int err;
+
+	/* Create an iter link, parameterized by cgroup id */
+	linfo.cgroup.cgroup_id = cgroup_id;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(obj->progs.dump_vmscan, &opts);
+	if (!ASSERT_OK_PTR(link, "attach iter"))
+		return libbpf_get_error(link);
+
+	/* Pin the link to a bpffs file */
+	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
+	err = bpf_link__pin(link, path);
+	CHECK(err, "pin iter", "failed to pin iter at %s", path);
+	return err;
+}
+
+static int setup_progs(struct cgroup_vmscan **skel)
+{
+	int i;
+	struct bpf_link *link;
+	struct cgroup_vmscan *obj;
+
+	obj = cgroup_vmscan__open_and_load();
+	if (!ASSERT_OK_PTR(obj, "open_and_load"))
+		return libbpf_get_error(obj);
+
+	/* Attach cgroup_iter program that will dump the stats to cgroups */
+	for (i = 0; i < N_CGROUPS; i++)
+		setup_cgroup_iter(obj, cgroups[i].id, cgroups[i].name);
+	/* Also dump stats for root */
+	setup_cgroup_iter(obj, CG_ROOT_ID, CG_ROOT_NAME);
+
+	/* Attach rstat flusher */
+	link = bpf_program__attach(obj->progs.vmscan_flush);
+	if (!ASSERT_OK_PTR(link, "attach rstat"))
+		return libbpf_get_error(link);
+
+	/* Attach tracing programs that will calculate vmscan delays */
+	link = bpf_program__attach(obj->progs.vmscan_start);
+	if (!ASSERT_OK_PTR(link, "attach raw_tracepoint"))
+		return libbpf_get_error(link);
+
+	link = bpf_program__attach(obj->progs.vmscan_end);
+	if (!ASSERT_OK_PTR(link, "attach raw_tracepoint"))
+		return libbpf_get_error(link);
+
+	*skel = obj;
+	return 0;
+}
+
+static void destroy_progs(struct cgroup_vmscan *skel)
+{
+	char path[128];
+	int i;
+
+	for (i = 0; i < N_CGROUPS; i++) {
+		/* Delete files in bpffs that cgroup_iters are pinned in */
+		snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
+			 cgroups[i].name);
+		CHECK(remove(path), "remove", "failed to remove %s (%s)\n",
+		      path, strerror(errno));
+	}
+
+	/* Delete root file in bpffs */
+	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, CG_ROOT_NAME);
+	CHECK(remove(path), "remove", "failed to remove %s (%s)\n", path,
+	      strerror(errno));
+	cgroup_vmscan__destroy(skel);
+}
+
+void test_cgroup_hierarchical_stats(void)
+{
+	struct cgroup_vmscan *skel = NULL;
+
+	if (setup_hierarchy())
+		goto hierarchy_cleanup;
+	if (setup_progs(&skel))
+		goto cleanup;
+	if (induce_vmscan())
+		goto cleanup;
+	check_vmscan_stats();
+cleanup:
+	destroy_progs(skel);
+hierarchy_cleanup:
+	destroy_hierarchy();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
index 97ec8bc76ae6..df91f1daf74d 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter.h
+++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
@@ -17,6 +17,7 @@
 #define bpf_iter__bpf_sk_storage_map bpf_iter__bpf_sk_storage_map___not_used
 #define bpf_iter__sockmap bpf_iter__sockmap___not_used
 #define bpf_iter__bpf_link bpf_iter__bpf_link___not_used
+#define bpf_iter__cgroup bpf_iter__cgroup___not_used
 #define btf_ptr btf_ptr___not_used
 #define BTF_F_COMPACT BTF_F_COMPACT___not_used
 #define BTF_F_NONAME BTF_F_NONAME___not_used
@@ -39,6 +40,7 @@
 #undef bpf_iter__bpf_sk_storage_map
 #undef bpf_iter__sockmap
 #undef bpf_iter__bpf_link
+#undef bpf_iter__cgroup
 #undef btf_ptr
 #undef BTF_F_COMPACT
 #undef BTF_F_NONAME
@@ -139,6 +141,11 @@ struct bpf_iter__bpf_link {
 	struct bpf_link *link;
 };
 
+struct bpf_iter__cgroup {
+	struct bpf_iter_meta *meta;
+	struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
 struct btf_ptr {
 	void *ptr;
 	__u32 type_id;
diff --git a/tools/testing/selftests/bpf/progs/cgroup_vmscan.c b/tools/testing/selftests/bpf/progs/cgroup_vmscan.c
new file mode 100644
index 000000000000..96aa62f7b260
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cgroup_vmscan.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Functions to manage eBPF programs attached to cgroup subsystems
+ *
+ * Copyright 2022 Google LLC.
+ */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+char _license[] SEC("license") = "GPL";
+
+/*
+ * Start times are stored per-task, not per-cgroup, as multiple tasks in one
+ * cgroup can perform reclaim concurrently.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, __u64);
+} vmscan_start_time SEC(".maps");
+
+struct vmscan_percpu {
+	/* Previous percpu state, to figure out if we have new updates */
+	__u64 prev;
+	/* Current percpu state */
+	__u64 state;
+};
+
+struct vmscan {
+	/* State propagated through children, pending aggregation */
+	__u64 pending;
+	/* Total state, including all cpus and all children */
+	__u64 state;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(max_entries, 10);
+	__type(key, __u64);
+	__type(value, struct vmscan_percpu);
+} pcpu_cgroup_vmscan_elapsed SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 10);
+	__type(key, __u64);
+	__type(value, struct vmscan);
+} cgroup_vmscan_elapsed SEC(".maps");
+
+
+static inline bool memory_subsys_enabled(struct cgroup *cgrp)
+{
+	return cgrp->subsys[memory_cgrp_id] != NULL;
+}
+
+static inline struct cgroup *task_memcg(struct task_struct *task)
+{
+	return task->cgroups->subsys[memory_cgrp_id]->cgroup;
+}
+
+static inline uint64_t cgroup_id(struct cgroup *cgrp)
+{
+	return cgrp->kn->id;
+}
+
+static inline int create_vmscan_percpu_elem(__u64 cg_id, __u64 state)
+{
+	struct vmscan_percpu pcpu_init = {.state = state, .prev = 0};
+
+	if (bpf_map_update_elem(&pcpu_cgroup_vmscan_elapsed, &cg_id,
+				&pcpu_init, BPF_NOEXIST)) {
+		bpf_printk("failed to create pcpu entry for cgroup %llu\n",
+			   cg_id);
+		return 1;
+	}
+	return 0;
+}
+
+static inline int create_vmscan_elem(__u64 cg_id, __u64 state, __u64 pending)
+{
+	struct vmscan init = {.state = state, .pending = pending};
+
+	if (bpf_map_update_elem(&cgroup_vmscan_elapsed, &cg_id,
+				&init, BPF_NOEXIST)) {
+		bpf_printk("failed to create entry for cgroup %llu\n",
+			   cg_id);
+		return 1;
+	}
+	return 0;
+}
+
+SEC("raw_tp/mm_vmscan_memcg_reclaim_begin")
+int vmscan_start(struct lruvec *lruvec, struct scan_control *sc)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	__u64 *start_time_ptr;
+
+	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, task, 0,
+					  BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!start_time_ptr) {
+		bpf_printk("error retrieving storage\n");
+		return 0;
+	}
+
+	*start_time_ptr = bpf_ktime_get_ns();
+	return 0;
+}
+
+SEC("raw_tp/mm_vmscan_memcg_reclaim_end")
+int vmscan_end(struct lruvec *lruvec, struct scan_control *sc)
+{
+	struct vmscan_percpu *pcpu_stat;
+	struct task_struct *current = bpf_get_current_task_btf();
+	struct cgroup *cgrp = task_memcg(current);
+	__u64 *start_time_ptr;
+	__u64 current_elapsed, cg_id;
+	__u64 end_time = bpf_ktime_get_ns();
+
+	/* cgrp may not have memory controller enabled */
+	if (!cgrp)
+		return 0;
+
+	cg_id = cgroup_id(cgrp);
+	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, current, 0,
+					      BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!start_time_ptr) {
+		bpf_printk("error retrieving task local storage\n");
+		return 0;
+	}
+
+	current_elapsed = end_time - *start_time_ptr;
+	pcpu_stat = bpf_map_lookup_elem(&pcpu_cgroup_vmscan_elapsed,
+					&cg_id);
+	if (pcpu_stat)
+		__sync_fetch_and_add(&pcpu_stat->state, current_elapsed);
+	else
+		create_vmscan_percpu_elem(cg_id, current_elapsed);
+
+	bpf_cgroup_rstat_updated(cgrp);
+	return 0;
+}
+
+SEC("rstat/flush")
+int vmscan_flush(struct bpf_rstat_flush_ctx *ctx)
+{
+	struct vmscan_percpu *pcpu_stat;
+	struct vmscan *total_stat, *parent_stat;
+	struct cgroup *cgrp = ctx->cgrp, *parent = ctx->parent;
+	__u64 cg_id = cgroup_id(ctx->cgrp);
+	__u64 parent_cg_id = parent ? cgroup_id(parent) : 0;
+	__s32 cpu = ctx->cpu;
+	__u64 *pcpu_vmscan;
+	__u64 state;
+	__u64 delta = 0;
+
+	if (!memory_subsys_enabled(cgrp))
+		return 0;
+
+	/* Add CPU changes on this level since the last flush */
+	pcpu_stat = bpf_map_lookup_percpu_elem(&pcpu_cgroup_vmscan_elapsed,
+					       &cg_id, cpu);
+	if (pcpu_stat) {
+		state = pcpu_stat->state;
+		delta += state - pcpu_stat->prev;
+		pcpu_stat->prev = state;
+	}
+
+	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
+	if (!total_stat) {
+		create_vmscan_elem(cg_id, delta, 0);
+		goto update_parent;
+	}
+
+	/* Collect pending stats from subtree */
+	if (total_stat->pending) {
+		delta += total_stat->pending;
+		total_stat->pending = 0;
+	}
+
+	/* Propagate changes to this cgroup's total */
+	total_stat->state += delta;
+
+update_parent:
+	/* Skip if there are no changes to propagate, or no parent */
+	if (!delta || !parent_cg_id)
+		return 0;
+
+	/* Propagate changes to cgroup's parent */
+	parent_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed,
+					  &parent_cg_id);
+	if (parent_stat)
+		parent_stat->pending += delta;
+	else
+		create_vmscan_elem(parent_cg_id, 0, delta);
+
+	return 0;
+}
+
+SEC("iter/cgroup")
+int dump_vmscan(struct bpf_iter__cgroup *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct cgroup *cgrp = ctx->cgroup;
+	struct vmscan *total_stat;
+	__u64 cg_id = cgroup_id(cgrp);
+
+	/* Flush the stats to make sure we get the most up-to-date numbers */
+	bpf_cgroup_rstat_flush(cgrp);
+
+	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
+	if (!total_stat) {
+		bpf_printk("error finding stats for cgroup %llu\n", cg_id);
+		BPF_SEQ_PRINTF(seq, "cg_id: -1, total_vmscan_delay: -1\n");
+		return 0;
+	}
+	BPF_SEQ_PRINTF(seq, "cg_id: %llu, total_vmscan_delay: %llu\n",
+		       cg_id, total_stat->state);
+	return 0;
+}
+