From patchwork Wed Mar 19 21:53:50 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 14023200 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2A4721C9F2; Wed, 19 Mar 2025 21:54:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742421254; cv=none; b=iqzaMcCBHL1kLrTJ1v/z/NsSjhDaKQzoUWx8oWiMTOZZ4oCIDUKyr0TiwRqjfTwfyf27mqruX9zCJpicQzEd8gyILyFMyVwFKEgUtEwdIVkWBLef7Eq1fSyvp50bQRBnUML2YyzcvfMQeK+4GwglOdAiz4iy7ybKODJCiy0wlkE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742421254; c=relaxed/simple; bh=rs06dbIdpXiRETb6ghtLt3iLOXAScwV8g+6KzPVmcew=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nqWGob+DtmJmKsy44E7W+MwUVa347qMNO2fHn8fEm1QQ/flpk/ax7ledCPaCfZ5nlENsYNaeryXneRL1nYAC9Olt7CaSWRxJvVqWqS0/74sxl01DxFoVWMw9n5+dbtS8KFzKYsQl500k+WmNtj42wbDHFUKhqRbZSxiC9GA0+2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BQbknyZE; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BQbknyZE" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-223fd89d036so804365ad.1; Wed, 19 Mar 2025 14:54:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742421252; x=1743026052; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vaTPExmOh7fK7MqnAPa6Vkti2097rUmISDNMFNX1BdE=; b=BQbknyZEuOZ28MDwyjuwjXz6UJoNN0BbNrEcKa7yY6pltmAWLD/s5CfM5PRiRpv2v5 eiC74n3mI2iAb0tw+iXcW8xLnaq6xjm2AuLFqTtjH3VyzkauI5hUQ/UexoKQW6yAOsPe 3f59GlAJzIOlO6FIRzihOXNeUfDcdafKwlsA1eVJJZ+QDGDkzZi4DWHdZ+oGcZ6XG4+f Q+bBwJJlh/gPs0UXz/mMivWXW6rYCUyax5eU+BfnVgvr2n0pgLDv6UZM1Sc3k5X118rP SdwyhRS4J7w9DYQk71ZACazDQgJQQn798wVM58kiArq31f6V6baHRdQPoRVqnigMmrXI ej1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742421252; x=1743026052; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vaTPExmOh7fK7MqnAPa6Vkti2097rUmISDNMFNX1BdE=; b=qlj+uIncZ+f3q1Fau3hPkxC/i9B35N/oxHBl13NjJQGYZGhyIHEceddFT4P42tFMqr +b3Tb7y1KfJsLbNO2l9/DZzO+zD0xtlc3uTLI5uzOOTzzERqJTPJwZWgR9sV5cw0lwuS LBsWXVaDcpOmzY0JPRomYPinWFG7+OYiPQCWgcXzAvalT7PuPPiRaqSiX9vKoUbAANsS icMgNSVw3Lr0DbDvpkbfAERPyI0sWDlq7cnop4sG+vtoNeeXg48SSqEDKWapJhkGkGZ5 I9lNmF64vegdghl/b5wuuKoLdSi7cQAmAq4ePKtE3quLS+Mjg5zeP8lbLLax8u8rXX8P SuGA== X-Gm-Message-State: AOJu0Yw902AKuMPAB3JjNOKVyCDiOpqHrZo7LwCsuinRgskfSnmxIZME H+PlLLJDFVO8o/znVN+XbXIyPlmgqWM34nTUm3VmswI3MwXemIRnDbekQfi+tRM= X-Gm-Gg: ASbGncsPVVG7lRycy52sf9jCcAM2q2wqKU1CATMK5lpFdC/mUBwgMuui+t74+alfK++ 971dmVoo5wHjVBTJZFENvDljDUE3yYo9MuBufQ9LWxaLmUkKHdZhj8BV9nfCDEnF9fU7lk9NGkP Mav++AoLUqej72V2Yc8uCMR4Dao9fqzXMXOLhs5PiGpHYVC8Y7y76t69y2LHI609Q6TgOFvT3Si qmxLrvVmfDNCiYR05ih5drbYN7FbQKj7XEXO427v9AUG6/xc82liUmiTDbY7hnJplz/DU+662Kt sqwwOGRaUQVq7y/23NNGAjqEcz716It6wiZ6BG1Wk2AZ2O7dU2omramYxzYq/+v/TifbVi/4KD7 8UuNm74mavaBrWBqGrJD57ep11IQQfA== X-Google-Smtp-Source: AGHT+IH0nGjvh9omc092pi5lyE1PduIKQP57z0gmOUFzxe2ge6mGygZ63vQM7frnnHrFfXN8KOVadg== X-Received: by 2002:a05:6a20:9f88:b0:1f5:8dea:bb93 with SMTP id adf61e73a8af0-1fbeb184152mr8290997637.7.1742421251757; Wed, 19 Mar 2025 14:54:11 -0700 (PDT) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-737116b0e8asm12175596b3a.158.2025.03.19.14.54.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Mar 2025 14:54:11 -0700 (PDT) From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, kuba@kernel.org, edumazet@google.com, xiyou.wangcong@gmail.com, jhs@mojatatu.com, sinquersw@gmail.com, toke@redhat.com, juntong.deng@outlook.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, yepeilin.cs@gmail.com, ameryhung@gmail.com, kernel-team@meta.com Subject: [PATCH bpf-next v6 03/11] bpf: net_sched: Support implementation of Qdisc_ops in bpf Date: Wed, 19 Mar 2025 14:53:50 -0700 Message-ID: <20250319215358.2287371-4-ameryhung@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250319215358.2287371-1-ameryhung@gmail.com> References: <20250319215358.2287371-1-ameryhung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung The recent advancement in bpf such as allocated objects, bpf list and bpf rbtree has provided powerful and flexible building blocks to realize sophisticated packet scheduling algorithms. As struct_ops now supports core operators in Qdisc_ops, start allowing qdisc to be implemented using bpf struct_ops with this patch. Users can implement Qdisc_ops.{enqueue, dequeue, init, reset, destroy} in bpf and register the qdisc dynamically into the kernel. Co-developed-by: Cong Wang Signed-off-by: Cong Wang Signed-off-by: Amery Hung Acked-by: Cong Wang Acked-by: Toke Høiland-Jørgensen --- net/sched/Kconfig | 12 +++ net/sched/Makefile | 1 + net/sched/bpf_qdisc.c | 232 ++++++++++++++++++++++++++++++++++++++++ net/sched/sch_api.c | 7 +- net/sched/sch_generic.c | 3 +- 5 files changed, 251 insertions(+), 4 deletions(-) create mode 100644 net/sched/bpf_qdisc.c diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 8180d0c12fce..ccd0255da5a5 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -403,6 +403,18 @@ config NET_SCH_ETS If unsure, say N. +config NET_SCH_BPF + bool "BPF-based Qdisc" + depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF + help + This option allows BPF-based queueing disiplines. With BPF struct_ops, + users can implement supported operators in Qdisc_ops using BPF programs. + The queue holding skb can be built with BPF maps or graphs. + + Say Y here if you want to use BPF-based Qdisc. + + If unsure, say N. + menuconfig NET_SCH_DEFAULT bool "Allow override default queue discipline" help diff --git a/net/sched/Makefile b/net/sched/Makefile index 82c3f78ca486..904d784902d1 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_NET_SCH_FQ_PIE) += sch_fq_pie.o obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o +obj-$(CONFIG_NET_SCH_BPF) += bpf_qdisc.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c new file mode 100644 index 000000000000..7eca556a3782 --- /dev/null +++ b/net/sched/bpf_qdisc.c @@ -0,0 +1,232 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include + +static struct bpf_struct_ops bpf_Qdisc_ops; + +struct bpf_sk_buff_ptr { + struct sk_buff *skb; +}; + +static int bpf_qdisc_init(struct btf *btf) +{ + return 0; +} + +BTF_ID_LIST_SINGLE(bpf_qdisc_ids, struct, Qdisc) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) + +static bool bpf_qdisc_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + struct btf *btf = prog->aux->attach_btf; + u32 arg; + + arg = btf_ctx_arg_idx(btf, prog->aux->attach_func_proto, off); + if (prog->aux->attach_st_ops_member_off == offsetof(struct Qdisc_ops, enqueue)) { + if (arg == 2 && type == BPF_READ) { + info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED; + info->btf = btf; + info->btf_id = bpf_sk_buff_ptr_ids[0]; + return true; + } + } + + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_qdisc_qdisc_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, size_t *end) +{ + switch (off) { + case offsetof(struct Qdisc, limit): + *end = offsetofend(struct Qdisc, limit); + break; + case offsetof(struct Qdisc, q) + offsetof(struct qdisc_skb_head, qlen): + *end = offsetof(struct Qdisc, q) + offsetofend(struct qdisc_skb_head, qlen); + break; + case offsetof(struct Qdisc, qstats) ... offsetofend(struct Qdisc, qstats) - 1: + *end = offsetofend(struct Qdisc, qstats); + break; + default: + return -EACCES; + } + + return 0; +} + +static int bpf_qdisc_sk_buff_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, size_t *end) +{ + switch (off) { + case offsetof(struct sk_buff, tstamp): + *end = offsetofend(struct sk_buff, tstamp); + break; + case offsetof(struct sk_buff, priority): + *end = offsetofend(struct sk_buff, priority); + break; + case offsetof(struct sk_buff, mark): + *end = offsetofend(struct sk_buff, mark); + break; + case offsetof(struct sk_buff, queue_mapping): + *end = offsetofend(struct sk_buff, queue_mapping); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, tc_classid): + *end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, tc_classid); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, data[0]) ... + offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, + data[QDISC_CB_PRIV_LEN - 1]): + *end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, data[QDISC_CB_PRIV_LEN - 1]); + break; + case offsetof(struct sk_buff, tc_index): + *end = offsetofend(struct sk_buff, tc_index); + break; + default: + return -EACCES; + } + + return 0; +} + +static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *t, *skbt, *qdisct; + size_t end; + int err; + + skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); + qdisct = btf_type_by_id(reg->btf, bpf_qdisc_ids[0]); + t = btf_type_by_id(reg->btf, reg->btf_id); + + if (t == skbt) { + err = bpf_qdisc_sk_buff_access(log, reg, off, &end); + } else if (t == qdisct) { + err = bpf_qdisc_qdisc_access(log, reg, off, &end); + } else { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + if (err) { + bpf_log(log, "no write support to %s at off %d\n", + btf_name_by_offset(reg->btf, t->name_off), off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of %s ended at %zu\n", + off, size, btf_name_by_offset(reg->btf, t->name_off), end); + return -EACCES; + } + + return 0; +} + +static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { + .get_func_proto = bpf_base_func_proto, + .is_valid_access = bpf_qdisc_is_valid_access, + .btf_struct_access = bpf_qdisc_btf_struct_access, +}; + +static int bpf_qdisc_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + const struct Qdisc_ops *uqdisc_ops; + struct Qdisc_ops *qdisc_ops; + u32 moff; + + uqdisc_ops = (const struct Qdisc_ops *)udata; + qdisc_ops = (struct Qdisc_ops *)kdata; + + moff = __btf_member_bit_offset(t, member) / 8; + switch (moff) { + case offsetof(struct Qdisc_ops, peek): + qdisc_ops->peek = qdisc_peek_dequeued; + return 0; + case offsetof(struct Qdisc_ops, id): + if (bpf_obj_name_cpy(qdisc_ops->id, uqdisc_ops->id, + sizeof(qdisc_ops->id)) <= 0) + return -EINVAL; + return 1; + } + + return 0; +} + +static int bpf_qdisc_reg(void *kdata, struct bpf_link *link) +{ + return register_qdisc(kdata); +} + +static void bpf_qdisc_unreg(void *kdata, struct bpf_link *link) +{ + return unregister_qdisc(kdata); +} + +static int Qdisc_ops__enqueue(struct sk_buff *skb__ref, struct Qdisc *sch, + struct sk_buff **to_free) +{ + return 0; +} + +static struct sk_buff *Qdisc_ops__dequeue(struct Qdisc *sch) +{ + return NULL; +} + +static int Qdisc_ops__init(struct Qdisc *sch, struct nlattr *arg, + struct netlink_ext_ack *extack) +{ + return 0; +} + +static void Qdisc_ops__reset(struct Qdisc *sch) +{ +} + +static void Qdisc_ops__destroy(struct Qdisc *sch) +{ +} + +static struct Qdisc_ops __bpf_ops_qdisc_ops = { + .enqueue = Qdisc_ops__enqueue, + .dequeue = Qdisc_ops__dequeue, + .init = Qdisc_ops__init, + .reset = Qdisc_ops__reset, + .destroy = Qdisc_ops__destroy, +}; + +static struct bpf_struct_ops bpf_Qdisc_ops = { + .verifier_ops = &bpf_qdisc_verifier_ops, + .reg = bpf_qdisc_reg, + .unreg = bpf_qdisc_unreg, + .init_member = bpf_qdisc_init_member, + .init = bpf_qdisc_init, + .name = "Qdisc_ops", + .cfi_stubs = &__bpf_ops_qdisc_ops, + .owner = THIS_MODULE, +}; + +static int __init bpf_qdisc_kfunc_init(void) +{ + return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); +} +late_initcall(bpf_qdisc_kfunc_init); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index e3e91cf867eb..1aad41b7d5a8 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -358,7 +359,7 @@ static struct Qdisc_ops *qdisc_lookup_ops(struct nlattr *kind) read_lock(&qdisc_mod_lock); for (q = qdisc_base; q; q = q->next) { if (nla_strcmp(kind, q->id) == 0) { - if (!try_module_get(q->owner)) + if (!bpf_try_module_get(q, q->owner)) q = NULL; break; } @@ -1287,7 +1288,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, /* We will try again qdisc_lookup_ops, * so don't keep a reference. */ - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err = -EAGAIN; goto err_out; } @@ -1398,7 +1399,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, netdev_put(dev, &sch->dev_tracker); qdisc_free(sch); err_out2: - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err_out: *errp = err; return NULL; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 14ab2f4c190a..e6fda9f20272 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -1078,7 +1079,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc) ops->destroy(qdisc); lockdep_unregister_key(&qdisc->root_lock_key); - module_put(ops->owner); + bpf_module_put(ops, ops->owner); netdev_put(dev, &qdisc->dev_tracker); trace_qdisc_destroy(qdisc);