From patchwork Sun Jul 14 17:51:20 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13732772
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 01/11] bpf: Support getting referenced kptr from struct_ops argument
Date: Sun, 14 Jul 2024 17:51:20 +0000
Message-Id: <20240714175130.4051012-2-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Allow struct_ops programs to acquire referenced kptrs from arguments by directly reading the argument. The verifier will automatically acquire a reference for a struct_ops argument tagged with "__ref" in the stub function. The user can then access the referenced kptr directly by reading the context, as long as it has not been released by the program.

This new mechanism for acquiring a referenced kptr (compared with the existing "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons. In the first use case, Qdisc_ops, an skb is passed to .enqueue as the first argument. The qdisc becomes the sole owner of the skb and must either enqueue or drop it. Representing skbs in a bpf qdisc as referenced kptrs ensures that 1) the qdisc always enqueues or drops the skb in .enqueue, and 2) the qdisc cannot make up invalid skb pointers in .dequeue. The new mechanism provides a natural way for users to get a referenced kptr in struct_ops programs.

More importantly, we would also like to make sure that there is only a single reference to the same skb in the qdisc.
Since skb->rbnode will in the future be utilized to support adding skbs to bpf lists and rbtrees, allowing multiple references may lead to racy accesses to this field when the user adds references to the skb to different bpf graphs. The new mechanism provides a better way to enforce this unique-pointer semantic than forbidding users from calling a KF_ACQUIRE kfunc multiple times.

Signed-off-by: Amery Hung
---
 include/linux/bpf.h         |  3 +++
 kernel/bpf/bpf_struct_ops.c | 26 ++++++++++++++++++++------
 kernel/bpf/btf.c            |  1 +
 kernel/bpf/verifier.c       | 34 +++++++++++++++++++++++++++++++---
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cc460786da9b..3891e45ded4e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -924,6 +924,7 @@ struct bpf_insn_access_aux {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			u32 ref_obj_id;
 		};
 	};
 	struct bpf_verifier_log *log; /* for verbose logs */
@@ -1427,6 +1428,8 @@ struct bpf_ctx_arg_aux {
 	enum bpf_reg_type reg_type;
 	struct btf *btf;
 	u32 btf_id;
+	u32 ref_obj_id;
+	bool refcounted;
 };
 
 struct btf_mod_pair {
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 0d515ec57aa5..05f16f21981e 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -145,6 +145,7 @@ void bpf_struct_ops_image_free(void *image)
 }
 
 #define MAYBE_NULL_SUFFIX "__nullable"
+#define REFCOUNTED_SUFFIX "__ref"
 #define MAX_STUB_NAME 128
 
 /* Return the type info of a stub function, if it exists.
@@ -206,9 +207,11 @@ static int prepare_arg_info(struct btf *btf,
 			    struct bpf_struct_ops_arg_info *arg_info)
 {
 	const struct btf_type *stub_func_proto, *pointed_type;
+	bool is_nullable = false, is_refcounted = false;
 	const struct btf_param *stub_args, *args;
 	struct bpf_ctx_arg_aux *info, *info_buf;
 	u32 nargs, arg_no, info_cnt = 0;
+	const char *suffix;
 	u32 arg_btf_id;
 	int offset;
 
@@ -240,12 +243,19 @@ static int prepare_arg_info(struct btf *btf,
 	info = info_buf;
 	for (arg_no = 0; arg_no < nargs; arg_no++) {
 		/* Skip arguments that is not suffixed with
-		 * "__nullable".
+		 * "__nullable or __ref".
 		 */
-		if (!btf_param_match_suffix(btf, &stub_args[arg_no],
-					    MAYBE_NULL_SUFFIX))
+		is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no],
+						     MAYBE_NULL_SUFFIX);
+		is_refcounted = btf_param_match_suffix(btf, &stub_args[arg_no],
+						       REFCOUNTED_SUFFIX);
+		if (!is_nullable && !is_refcounted)
 			continue;
 
+		if (is_nullable)
+			suffix = MAYBE_NULL_SUFFIX;
+		else if (is_refcounted)
+			suffix = REFCOUNTED_SUFFIX;
 		/* Should be a pointer to struct */
 		pointed_type = btf_type_resolve_ptr(btf,
 						    args[arg_no].type,
@@ -253,7 +263,7 @@ static int prepare_arg_info(struct btf *btf,
 		if (!pointed_type ||
 		    !btf_type_is_struct(pointed_type)) {
 			pr_warn("stub function %s__%s has %s tagging to an unsupported type\n",
-				st_ops_name, member_name, MAYBE_NULL_SUFFIX);
+				st_ops_name, member_name, suffix);
 			goto err_out;
 		}
 
@@ -271,11 +281,15 @@ static int prepare_arg_info(struct btf *btf,
 		}
 
 		/* Fill the information of the new argument */
-		info->reg_type =
-			PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
 		info->btf_id = arg_btf_id;
 		info->btf = btf;
 		info->offset = offset;
+		if (is_nullable) {
+			info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
+		} else if (is_refcounted) {
+			info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID;
+			info->refcounted = true;
+		}
 
 		info++;
 		info_cnt++;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index de15e8b12fae..52be35b30308 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6516,6 +6516,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 			info->reg_type = ctx_arg_info->reg_type;
 			info->btf = ctx_arg_info->btf ? : btf_vmlinux;
 			info->btf_id = ctx_arg_info->btf_id;
+			info->ref_obj_id = ctx_arg_info->ref_obj_id;
 			return true;
 		}
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 37053cc4defe..f614ab283c37 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1367,6 +1367,17 @@ static int release_reference_state(struct bpf_func_state *state, int ptr_id)
 	return -EINVAL;
 }
 
+static bool find_reference_state(struct bpf_func_state *state, int ptr_id)
+{
+	int i;
+
+	for (i = 0; i < state->acquired_refs; i++)
+		if (state->refs[i].id == ptr_id)
+			return true;
+
+	return false;
+}
+
 static void free_func_state(struct bpf_func_state *state)
 {
 	if (!state)
@@ -5587,7 +5598,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
-			    struct btf **btf, u32 *btf_id)
+			    struct btf **btf, u32 *btf_id, u32 *ref_obj_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
@@ -5606,8 +5617,16 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 	*reg_type = info.reg_type;
 
 	if (base_type(*reg_type) == PTR_TO_BTF_ID) {
+		if (info.ref_obj_id &&
+		    !find_reference_state(cur_func(env), info.ref_obj_id)) {
+			verbose(env, "bpf_context off=%d ref_obj_id=%d is no longer valid\n",
+				off, info.ref_obj_id);
+			return -EACCES;
+		}
+
 		*btf = info.btf;
 		*btf_id = info.btf_id;
+		*ref_obj_id = info.ref_obj_id;
 	} else {
 		env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 	}
@@ -6878,7 +6897,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
 		struct btf *btf = NULL;
-		u32 btf_id = 0;
+		u32 btf_id = 0, ref_obj_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -6891,7 +6910,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			return err;
 
 		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf,
-				       &btf_id);
+				       &btf_id, &ref_obj_id);
 		if (err)
 			verbose_linfo(env, insn_idx, "; ");
 		if (!err && t == BPF_READ && value_regno >= 0) {
@@ -6915,6 +6934,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			if (base_type(reg_type) == PTR_TO_BTF_ID) {
 				regs[value_regno].btf = btf;
 				regs[value_regno].btf_id = btf_id;
+				regs[value_regno].ref_obj_id = ref_obj_id;
 			}
 		}
 		regs[value_regno].type = reg_type;
@@ -20897,6 +20917,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
 	struct bpf_subprog_info *sub = subprog_info(env, subprog);
+	struct bpf_ctx_arg_aux *ctx_arg_info;
 	struct bpf_verifier_state *state;
 	struct bpf_reg_state *regs;
 	int ret, i;
@@ -21004,6 +21025,13 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 		mark_reg_known_zero(env, regs, BPF_REG_1);
 	}
 
+	if (env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
+		ctx_arg_info = (struct bpf_ctx_arg_aux *)env->prog->aux->ctx_arg_info;
+		for (i = 0; i < env->prog->aux->ctx_arg_info_size; i++)
+			if (ctx_arg_info[i].refcounted)
+				ctx_arg_info[i].ref_obj_id = acquire_reference_state(env, 0);
+	}
+
 	ret = do_check(env);
 out:
 	/* check for NULL is necessary, since cur_state can be freed inside

From patchwork Sun Jul 14 17:51:21 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13732773
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 02/11] selftests/bpf: Test referenced kptr arguments of struct_ops programs
Date: Sun, 14 Jul 2024 17:51:21 +0000
Message-Id: <20240714175130.4051012-3-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Test referenced kptrs acquired through struct_ops arguments tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed from multiple paths as long as it has not been released. In the failure case, we confirm that a referenced kptr acquired through a struct_ops argument, just like one acquired via a kfunc, cannot leak.
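For readers unfamiliar with the stub-function convention, the pattern under test can be sketched as follows. This is an illustrative reduction of the selftest code in this patch, not an exact copy of its diffs; it omits the required includes and assumes the bpf_testmod struct_ops map is registered as in the series.

```c
/* Kernel-module side: the stub's parameter name carries the "__ref"
 * suffix, telling the verifier to treat this argument as an already
 * acquired referenced kptr in any program attached to this op.
 */
static int bpf_testmod_ops__test_refcounted(int dummy,
					    struct task_struct *task__ref)
{
	return 0;	/* stub body is never executed */
}

/* BPF-program side: "task" arrives with ref_obj_id > 0, so the program
 * must release it exactly once on every path; a path that returns
 * without releasing fails verification with "Unreleased reference".
 */
extern void bpf_task_release(struct task_struct *p) __ksym;

SEC("struct_ops/test_refcounted")
int BPF_PROG(test_refcounted, int dummy, struct task_struct *task)
{
	bpf_task_release(task);
	return 0;
}
```

A leaking variant that returns without calling bpf_task_release() is expected to be rejected at load time, which is what the fail__ref_leak selftest below checks.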
Signed-off-by: Amery Hung
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  7 ++
 .../selftests/bpf/bpf_testmod/bpf_testmod.h   |  2 +
 .../prog_tests/test_struct_ops_refcounted.c   | 41 ++++++++++++
 .../bpf/progs/struct_ops_refcounted.c         | 67 +++++++++++++++++++
 .../struct_ops_refcounted_fail__ref_leak.c    | 17 +++++
 5 files changed, 134 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index f8962a1dd397..316a4c3d3a88 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -916,10 +916,17 @@ static int bpf_testmod_ops__test_maybe_null(int dummy,
 	return 0;
 }
 
+static int bpf_testmod_ops__test_refcounted(int dummy,
+					    struct task_struct *task__ref)
+{
+	return 0;
+}
+
 static struct bpf_testmod_ops __bpf_testmod_ops = {
 	.test_1 = bpf_testmod_test_1,
 	.test_2 = bpf_testmod_test_2,
 	.test_maybe_null = bpf_testmod_ops__test_maybe_null,
+	.test_refcounted = bpf_testmod_ops__test_refcounted,
 };
 
 struct bpf_struct_ops bpf_bpf_testmod_ops = {
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
index 23fa1872ee67..bfef5f382d01 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
@@ -35,6 +35,8 @@ struct bpf_testmod_ops {
 	void (*test_2)(int a, int b);
 	/* Used to test nullable arguments. */
 	int (*test_maybe_null)(int dummy, struct task_struct *task);
+	/* Used to test ref_acquired arguments. */
+	int (*test_refcounted)(int dummy, struct task_struct *task);
 
 	/* The following fields are used to test shadow copies.
 */
	char onebyte;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c
new file mode 100644
index 000000000000..c463b46538d2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c
@@ -0,0 +1,41 @@
+#include
+
+#include "struct_ops_refcounted.skel.h"
+#include "struct_ops_refcounted_fail__ref_leak.skel.h"
+
+/* Test that the verifier accepts a program that acquires a referenced
+ * kptr and releases the reference
+ */
+static void refcounted(void)
+{
+	struct struct_ops_refcounted *skel;
+
+	skel = struct_ops_refcounted__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load"))
+		return;
+
+	struct_ops_refcounted__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that acquires a referenced
+ * kptr without releasing the reference
+ */
+static void refcounted_fail__ref_leak(void)
+{
+	struct struct_ops_refcounted_fail__ref_leak *skel;
+
+	skel = struct_ops_refcounted_fail__ref_leak__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load"))
+		return;
+
+	struct_ops_refcounted_fail__ref_leak__destroy(skel);
+}
+
+void test_struct_ops_refcounted(void)
+{
+	if (test__start_subtest("refcounted"))
+		refcounted();
+	if (test__start_subtest("refcounted_fail__ref_leak"))
+		refcounted_fail__ref_leak();
+}
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
new file mode 100644
index 000000000000..2c1326668b92
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
@@ -0,0 +1,67 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+extern void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This is a test BPF program that uses struct_ops to access a referenced
+ * kptr argument. This is a test for the verifier to ensure that it
+ * 1) recognizes the task as a referenced object (i.e., ref_obj_id > 0), and
+ * 2) the same reference can be acquired from multiple paths as long as it
+ *    has not been released.
+ *
+ * test_refcounted() is equivalent to the C code below. It is written in
+ * assembly to avoid reads from task (i.e., getting referenced kptrs to task)
+ * being merged into a single path by the compiler.
+ *
+ * int test_refcounted(int dummy, struct task_struct *task)
+ * {
+ *	if (dummy % 2)
+ *		bpf_task_release(task);
+ *	else
+ *		bpf_task_release(task);
+ *	return 0;
+ * }
+ */
+SEC("struct_ops/test_refcounted")
+int test_refcounted(unsigned long long *ctx)
+{
+	asm volatile ("				\
+	/* r6 = dummy */			\
+	r6 = *(u64 *)(r1 + 0x0);		\
+	/* if (r6 & 0x1 != 0) */		\
+	r6 &= 0x1;				\
+	if r6 == 0 goto l0_%=;			\
+	/* r1 = task */				\
+	r1 = *(u64 *)(r1 + 0x8);		\
+	call %[bpf_task_release];		\
+	goto l1_%=;				\
+l0_%=:	/* r1 = task */				\
+	r1 = *(u64 *)(r1 + 0x8);		\
+	call %[bpf_task_release];		\
+l1_%=:	/* return 0 */				\
+"	:
+	: __imm(bpf_task_release)
+	: __clobber_all);
+	return 0;
+}
+
+/* BTF FUNC records are not generated for kfuncs referenced
+ * from inline assembly. These records are necessary for
+ * libbpf to link the program. The function below is a hack
+ * to ensure that BTF FUNC records are generated.
+ */
+void __btf_root(void)
+{
+	bpf_task_release(NULL);
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_refcounted = {
+	.test_refcounted = (void *)test_refcounted,
+};
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c
new file mode 100644
index 000000000000..6e82859eb187
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c
@@ -0,0 +1,17 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/test_refcounted")
+int BPF_PROG(test_refcounted, int dummy,
+	     struct task_struct *task)
+{
+	return 0;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_ref_acquire = {
+	.test_refcounted = (void *)test_refcounted,
+};

From patchwork Sun Jul 14 17:51:22 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13732774
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 03/11] bpf: Allow struct_ops prog to return referenced kptr
Date: Sun, 14 Jul 2024 17:51:22 +0000
Message-Id: <20240714175130.4051012-4-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Allow a struct_ops program to return a referenced kptr if the struct_ops operator's return type is a pointer to struct.
To make sure the returned pointer continues to be valid in the kernel, several constraints are required:
1) The type of the pointer must match the return type
2) The pointer originally comes from the kernel (not locally allocated)
3) The pointer is in its unmodified form

In addition, since the first user, Qdisc_ops::dequeue, allows a NULL pointer to be returned when there is no skb to be dequeued, we allow a scalar value equal to NULL to be returned. In the future, when there is a struct_ops user that always expects a valid pointer to be returned from an operator, we may extend tagging to the return value. We can then tell the verifier to allow a NULL pointer return only if the return value is tagged with MAY_BE_NULL.

The check is split into two parts since check_reference_leak() happens before check_return_code(). We first allow a reference object to leak through return if it is in the return register and its type matches the return type. Then, we check whether the pointer to be returned is valid in check_return_code().

Signed-off-by: Amery Hung
---
 kernel/bpf/verifier.c | 50 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f614ab283c37..e7f356098902 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10188,16 +10188,36 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
+	enum bpf_prog_type type = resolve_prog_type(env->prog);
+	u32 regno = exception_exit ? BPF_REG_1 : BPF_REG_0;
+	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_func_state *state = cur_func(env);
+	const struct bpf_prog *prog = env->prog;
+	const struct btf_type *ret_type = NULL;
 	bool refs_lingering = false;
+	struct btf *btf;
 	int i;

 	if (!exception_exit && state->frameno && !state->in_callback_fn)
 		return 0;

+	if (type == BPF_PROG_TYPE_STRUCT_OPS &&
+	    reg->type & PTR_TO_BTF_ID && reg->ref_obj_id) {
+		btf = bpf_prog_get_target_btf(prog);
+		ret_type = btf_type_by_id(btf, prog->aux->attach_func_proto->type);
+		if (reg->btf_id != ret_type->type) {
+			verbose(env, "Return kptr type, struct %s, doesn't match function prototype, struct %s\n",
+				btf_type_name(reg->btf, reg->btf_id),
+				btf_type_name(btf, ret_type->type));
+			return -EINVAL;
+		}
+	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
 		if (!exception_exit && state->in_callback_fn &&
 		    state->refs[i].callback_ref != state->frameno)
 			continue;
+		if (ret_type && reg->ref_obj_id == state->refs[i].id)
+			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
 		refs_lingering = true;
@@ -15677,12 +15697,15 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	const char *exit_ctx = "At program exit";
 	struct tnum enforce_attach_type_range = tnum_unknown;
 	const struct bpf_prog *prog = env->prog;
-	struct bpf_reg_state *reg;
+	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_retval_range range = retval_range(0, 1);
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	int err;
 	struct bpf_func_state *frame = env->cur_state->frame[0];
 	const bool is_subprog = frame->subprogno;
+	struct btf *btf = bpf_prog_get_target_btf(prog);
+	bool st_ops_ret_is_kptr = false;
+	const struct btf_type *t;

 	/* LSM and struct_ops func-ptr's return type could be "void" */
 	if (!is_subprog || frame->in_exception_callback_fn) {
@@ -15691,10 +15714,26 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		if (prog->expected_attach_type == BPF_LSM_CGROUP)
 			/* See below, can be 0 or 0-1 depending on hook. */
 			break;
-		fallthrough;
+		if (!prog->aux->attach_func_proto->type)
+			return 0;
+		break;
 	case BPF_PROG_TYPE_STRUCT_OPS:
 		if (!prog->aux->attach_func_proto->type)
 			return 0;
+
+		t = btf_type_by_id(btf, prog->aux->attach_func_proto->type);
+		if (btf_type_is_ptr(t)) {
+			/* Allow struct_ops programs to return kptr or null if
+			 * the return type is a pointer type.
+			 * check_reference_leak has ensured the returning kptr
+			 * matches the type of the function prototype and is
+			 * the only leaking reference. Thus, we can safely return
+			 * if the pointer is in its unmodified form
+			 */
+			if (reg->type & PTR_TO_BTF_ID)
+				return __check_ptr_off_reg(env, reg, regno, false);
+			st_ops_ret_is_kptr = true;
+		}
 		break;
 	default:
 		break;
@@ -15716,8 +15755,6 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		return -EACCES;
 	}

-	reg = cur_regs(env) + regno;
-
 	if (frame->in_async_callback_fn) {
 		/* enforce return zero from async callbacks like timer */
 		exit_ctx = "At async callback return";
@@ -15804,6 +15841,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	case BPF_PROG_TYPE_NETFILTER:
 		range = retval_range(NF_DROP, NF_ACCEPT);
 		break;
+	case BPF_PROG_TYPE_STRUCT_OPS:
+		if (!st_ops_ret_is_kptr)
+			return 0;
+		range = retval_range(0, 0);
+		break;
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
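The rule the patch enforces — return either a NULL scalar or an unmodified, correctly typed kernel pointer — can be sketched as a small userspace model. This is not kernel code; `struct ret_reg`, its fields, and `struct_ops_ret_ok()` are hypothetical stand-ins for the verifier's register state and checks:

```c
#include <stdbool.h>

/* Hypothetical model of the register being returned from a struct_ops
 * program whose operator prototype returns a pointer. */
struct ret_reg {
	bool is_scalar;     /* register holds a scalar, not a pointer */
	long scalar_val;    /* only meaningful when is_scalar is set */
	bool is_kernel_ptr; /* PTR_TO_BTF_ID, i.e. not locally allocated */
	int  btf_id;        /* BTF type of the pointed-to object */
	int  off;           /* nonzero once the pointer has been modified */
};

static bool struct_ops_ret_ok(const struct ret_reg *r, int proto_btf_id)
{
	if (r->is_scalar)              /* only NULL may be returned as a scalar */
		return r->scalar_val == 0;
	if (!r->is_kernel_ptr)         /* reject local (bpf_obj_new) kptrs */
		return false;
	if (r->btf_id != proto_btf_id) /* type must match the return type */
		return false;
	return r->off == 0;            /* pointer must be in its unmodified form */
}
```

The four rejection branches correspond one-to-one to the failure selftests added later in the series (invalid scalar, local kptr, wrong type, nonzero offset).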
From patchwork Sun Jul 14 17:51:23 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13732775
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 04/11] selftests/bpf: Test returning referenced kptr from struct_ops programs
Date: Sun, 14 Jul 2024 17:51:23 +0000
Message-Id: <20240714175130.4051012-5-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
References: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Test struct_ops programs returning referenced kptr. When the return type of a struct_ops operator is a pointer to struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form.
Signed-off-by: Amery Hung --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 8 ++ .../selftests/bpf/bpf_testmod/bpf_testmod.h | 4 + .../prog_tests/test_struct_ops_kptr_return.c | 87 +++++++++++++++++++ .../bpf/progs/struct_ops_kptr_return.c | 29 +++++++ ...uct_ops_kptr_return_fail__invalid_scalar.c | 24 +++++ .../struct_ops_kptr_return_fail__local_kptr.c | 30 +++++++ ...uct_ops_kptr_return_fail__nonzero_offset.c | 23 +++++ .../struct_ops_kptr_return_fail__wrong_type.c | 28 ++++++ 8 files changed, 233 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 316a4c3d3a88..c90bb3a5e86a 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -922,11 +922,19 @@ static int bpf_testmod_ops__test_refcounted(int dummy, return 0; } +static struct task_struct * +bpf_testmod_ops__test_return_ref_kptr(int dummy, struct task_struct *task__ref, + struct cgroup *cgrp) +{ + return NULL; +} + static struct bpf_testmod_ops __bpf_testmod_ops = { .test_1 = bpf_testmod_test_1, .test_2 = bpf_testmod_test_2, .test_maybe_null = bpf_testmod_ops__test_maybe_null, .test_refcounted = bpf_testmod_ops__test_refcounted, + .test_return_ref_kptr = bpf_testmod_ops__test_return_ref_kptr, }; struct bpf_struct_ops bpf_bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h 
b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index bfef5f382d01..2289ecd38401 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct cgroup; struct bpf_testmod_test_read_ctx { char *buf; @@ -37,6 +38,9 @@ struct bpf_testmod_ops { int (*test_maybe_null)(int dummy, struct task_struct *task); /* Used to test ref_acquired arguments. */ int (*test_refcounted)(int dummy, struct task_struct *task); + /* Used to test returning referenced kptr. */ + struct task_struct *(*test_return_ref_kptr)(int dummy, struct task_struct *task, + struct cgroup *cgrp); /* The following fields are used to test shadow copies. */ char onebyte; diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c new file mode 100644 index 000000000000..bc2fac39215a --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c @@ -0,0 +1,87 @@ +#include + +#include "struct_ops_kptr_return.skel.h" +#include "struct_ops_kptr_return_fail__wrong_type.skel.h" +#include "struct_ops_kptr_return_fail__invalid_scalar.skel.h" +#include "struct_ops_kptr_return_fail__nonzero_offset.skel.h" +#include "struct_ops_kptr_return_fail__local_kptr.skel.h" + +/* Test that the verifier accepts a program that acquires a referenced + * kptr and releases the reference through return + */ +static void kptr_return(void) +{ + struct struct_ops_kptr_return *skel; + + skel = struct_ops_kptr_return__open_and_load(); + if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load")) + return; + + struct_ops_kptr_return__destroy(skel); +} + +/* Test that the verifier rejects a program that returns a kptr of the + * wrong type + */ +static void kptr_return_fail__wrong_type(void) +{ + struct struct_ops_kptr_return_fail__wrong_type *skel; + + skel = 
struct_ops_kptr_return_fail__wrong_type__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__wrong_type__open_and_load")) + return; + + struct_ops_kptr_return_fail__wrong_type__destroy(skel); +} + +/* Test that the verifier rejects a program that returns a non-null scalar */ +static void kptr_return_fail__invalid_scalar(void) +{ + struct struct_ops_kptr_return_fail__invalid_scalar *skel; + + skel = struct_ops_kptr_return_fail__invalid_scalar__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__invalid_scalar__open_and_load")) + return; + + struct_ops_kptr_return_fail__invalid_scalar__destroy(skel); +} + +/* Test that the verifier rejects a program that returns kptr with non-zero offset */ +static void kptr_return_fail__nonzero_offset(void) +{ + struct struct_ops_kptr_return_fail__nonzero_offset *skel; + + skel = struct_ops_kptr_return_fail__nonzero_offset__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__nonzero_offset__open_and_load")) + return; + + struct_ops_kptr_return_fail__nonzero_offset__destroy(skel); +} + +/* Test that the verifier rejects a program that returns local kptr */ +static void kptr_return_fail__local_kptr(void) +{ + struct struct_ops_kptr_return_fail__local_kptr *skel; + + skel = struct_ops_kptr_return_fail__local_kptr__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__local_kptr__open_and_load")) + return; + + struct_ops_kptr_return_fail__local_kptr__destroy(skel); +} + +void test_struct_ops_kptr_return(void) +{ + if (test__start_subtest("kptr_return")) + kptr_return(); + if (test__start_subtest("kptr_return_fail__wrong_type")) + kptr_return_fail__wrong_type(); + if (test__start_subtest("kptr_return_fail__invalid_scalar")) + kptr_return_fail__invalid_scalar(); + if (test__start_subtest("kptr_return_fail__nonzero_offset")) + kptr_return_fail__nonzero_offset(); + if (test__start_subtest("kptr_return_fail__local_kptr")) + kptr_return_fail__local_kptr(); +} + + diff --git 
a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
new file mode 100644
index 000000000000..29b7719cd4c9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
@@ -0,0 +1,29 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * allow a referenced kptr or a NULL pointer to be returned. A referenced kptr to task
+ * here is acquired automatically as the task argument is tagged with "__ref".
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	if (dummy % 2) {
+		bpf_task_release(task);
+		return NULL;
+	}
+	return task;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
new file mode 100644
index 000000000000..d67982ba8224
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
@@ -0,0 +1,24 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a non-zero scalar value.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	bpf_task_release(task);
+	return (struct task_struct *)1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
new file mode 100644
index 000000000000..9a4247432539
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
@@ -0,0 +1,30 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a local kptr.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *t;
+
+	t = bpf_obj_new(typeof(*task));
+	if (!t)
+		return task;
+
+	return t;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
new file mode 100644
index 000000000000..5bb0b4029d11
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
@@ -0,0 +1,23 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a modified referenced kptr.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	return (struct task_struct *)&task->jobctl;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
new file mode 100644
index 000000000000..32365cb7af49
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
@@ -0,0 +1,28 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a referenced kptr of the wrong type.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *ret;
+
+	ret = (struct task_struct *)bpf_cgroup_acquire(cgrp);
+	bpf_task_release(task);
+
+	return ret;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};

From patchwork Sun Jul 14 17:51:24 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13732776
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 05/11] bpf: net_sched: Support implementation of Qdisc_ops in bpf
Date: Sun, 14 Jul 2024 17:51:24 +0000
Message-Id: <20240714175130.4051012-6-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
References: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Enable users to implement a classless qdisc using bpf. The last few patches in this series have prepared struct_ops to support the core operators in Qdisc_ops. Recent advancements in bpf, such as allocated objects, bpf list, and bpf rbtree, also provide powerful and flexible building blocks for realizing sophisticated scheduling algorithms. Therefore, this patch starts allowing a qdisc to be implemented using bpf struct_ops. Users can implement .enqueue and .dequeue of Qdisc_ops in bpf and register the qdisc dynamically into the kernel.
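The .enqueue/.dequeue contract such a bpf qdisc implements — enqueue either queues the packet and reports success or drops it, dequeue hands back the next packet or NULL when empty — can be sketched as a plain-C userspace model. This is not part of the patch; `struct pkt` and `struct fifo` are hypothetical stand-ins for sk_buff and the qdisc's private state:

```c
#include <stddef.h>

#define NET_XMIT_SUCCESS 0
#define NET_XMIT_DROP    1

struct pkt { struct pkt *next; };

struct fifo {
	struct pkt *head, *tail;
	unsigned int len, limit;
};

/* Model of .enqueue: queue the packet or report a drop when over limit. */
static int fifo_enqueue(struct fifo *q, struct pkt *p)
{
	if (q->len >= q->limit)
		return NET_XMIT_DROP;	/* a real qdisc would also free the skb */
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
	return NET_XMIT_SUCCESS;
}

/* Model of .dequeue: return the next packet, or NULL when the queue is empty
 * (the NULL return the verifier change earlier in the series permits). */
static struct pkt *fifo_dequeue(struct fifo *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	q->len--;
	return p;
}
```

A bpf qdisc would express the same FIFO with bpf list or rbtree kfuncs instead of raw pointers, but the operator semantics are the same.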
Signed-off-by: Cong Wang Co-developed-by: Amery Hung Signed-off-by: Amery Hung --- include/linux/btf.h | 1 + include/net/sch_generic.h | 1 + kernel/bpf/btf.c | 2 +- net/sched/Makefile | 4 + net/sched/bpf_qdisc.c | 352 ++++++++++++++++++++++++++++++++++++++ net/sched/sch_api.c | 7 +- net/sched/sch_generic.c | 3 +- 7 files changed, 365 insertions(+), 5 deletions(-) create mode 100644 net/sched/bpf_qdisc.c diff --git a/include/linux/btf.h b/include/linux/btf.h index cffb43133c68..730ec304f787 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -562,6 +562,7 @@ const char *btf_name_by_offset(const struct btf *btf, u32 offset); const char *btf_str_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off); u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id, const struct bpf_prog *prog); u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 79edd5b5e3c9..214ed2e34faa 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -95,6 +95,7 @@ struct Qdisc { #define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */ #define TCQ_F_NOLOCK 0x100 /* qdisc does not require locking */ #define TCQ_F_OFFLOADED 0x200 /* qdisc is offloaded to HW */ +#define TCQ_F_BPF 0x400 /* BPF qdisc */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 52be35b30308..059bcc365f10 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6314,7 +6314,7 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t) return btf_type_is_int(t); } -static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off) { 
const struct btf_param *args; diff --git a/net/sched/Makefile b/net/sched/Makefile index 82c3f78ca486..2094e6e74158 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -63,6 +63,10 @@ obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o +ifeq ($(CONFIG_BPF_JIT),y) +obj-$(CONFIG_BPF_SYSCALL) += bpf_qdisc.o +endif + obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c new file mode 100644 index 000000000000..a68fc115d8f8 --- /dev/null +++ b/net/sched/bpf_qdisc.c @@ -0,0 +1,352 @@ +#include +#include +#include +#include +#include +#include +#include + +static struct bpf_struct_ops bpf_Qdisc_ops; + +static u32 unsupported_ops[] = { + offsetof(struct Qdisc_ops, init), + offsetof(struct Qdisc_ops, reset), + offsetof(struct Qdisc_ops, destroy), + offsetof(struct Qdisc_ops, change), + offsetof(struct Qdisc_ops, attach), + offsetof(struct Qdisc_ops, change_real_num_tx), + offsetof(struct Qdisc_ops, dump), + offsetof(struct Qdisc_ops, dump_stats), + offsetof(struct Qdisc_ops, ingress_block_set), + offsetof(struct Qdisc_ops, egress_block_set), + offsetof(struct Qdisc_ops, ingress_block_get), + offsetof(struct Qdisc_ops, egress_block_get), +}; + +struct bpf_sched_data { + struct qdisc_watchdog watchdog; +}; + +struct bpf_sk_buff_ptr { + struct sk_buff *skb; +}; + +static int bpf_qdisc_init(struct btf *btf) +{ + return 0; +} + +static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt, + struct netlink_ext_ack *extack) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_init(&q->watchdog, sch); + return 0; +} + +static void bpf_qdisc_reset_op(struct Qdisc *sch) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_cancel(&q->watchdog); +} + +static void bpf_qdisc_destroy_op(struct Qdisc *sch) +{ + struct bpf_sched_data *q = 
qdisc_priv(sch); + + qdisc_watchdog_cancel(&q->watchdog); +} + +static const struct bpf_func_proto * +bpf_qdisc_get_func_proto(enum bpf_func_id func_id, + const struct bpf_prog *prog) +{ + switch (func_id) { + default: + return bpf_base_func_proto(func_id, prog); + } +} + +BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) + +static bool bpf_qdisc_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + struct btf *btf = prog->aux->attach_btf; + u32 arg; + + arg = get_ctx_arg_idx(btf, prog->aux->attach_func_proto, off); + if (!strcmp(prog->aux->attach_func_name, "enqueue")) { + if (arg == 2) { + info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED; + info->btf = btf; + info->btf_id = bpf_sk_buff_ptr_ids[0]; + return true; + } + } + + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *t, *skbt; + size_t end; + + skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); + t = btf_type_by_id(reg->btf, reg->btf_id); + if (t != skbt) { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + switch (off) { + case offsetof(struct sk_buff, tstamp): + end = offsetofend(struct sk_buff, tstamp); + break; + case offsetof(struct sk_buff, priority): + end = offsetofend(struct sk_buff, priority); + break; + case offsetof(struct sk_buff, mark): + end = offsetofend(struct sk_buff, mark); + break; + case offsetof(struct sk_buff, queue_mapping): + end = offsetofend(struct sk_buff, queue_mapping); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, tc_classid): + end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, tc_classid); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, data[0]) ... 
+		offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb,
+			 data[QDISC_CB_PRIV_LEN - 1]):
+		end = offsetof(struct sk_buff, cb) +
+		      offsetofend(struct qdisc_skb_cb, data[QDISC_CB_PRIV_LEN - 1]);
+		break;
+	case offsetof(struct sk_buff, tc_index):
+		end = offsetofend(struct sk_buff, tc_index);
+		break;
+	default:
+		bpf_log(log, "no write support to sk_buff at off %d\n", off);
+		return -EACCES;
+	}
+
+	if (off + size > end) {
+		bpf_log(log,
+			"write access at off %d with size %d beyond the member of sk_buff ended at %zu\n",
+			off, size, end);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
+static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = {
+	.get_func_proto = bpf_qdisc_get_func_proto,
+	.is_valid_access = bpf_qdisc_is_valid_access,
+	.btf_struct_access = bpf_qdisc_btf_struct_access,
+};
+
+static int bpf_qdisc_init_member(const struct btf_type *t,
+				 const struct btf_member *member,
+				 void *kdata, const void *udata)
+{
+	const struct Qdisc_ops *uqdisc_ops;
+	struct Qdisc_ops *qdisc_ops;
+	u32 moff;
+
+	uqdisc_ops = (const struct Qdisc_ops *)udata;
+	qdisc_ops = (struct Qdisc_ops *)kdata;
+
+	moff = __btf_member_bit_offset(t, member) / 8;
+	switch (moff) {
+	case offsetof(struct Qdisc_ops, priv_size):
+		if (uqdisc_ops->priv_size)
+			return -EINVAL;
+		qdisc_ops->priv_size = sizeof(struct bpf_sched_data);
+		return 1;
+	case offsetof(struct Qdisc_ops, static_flags):
+		if (uqdisc_ops->static_flags)
+			return -EINVAL;
+		qdisc_ops->static_flags = TCQ_F_BPF;
+		return 1;
+	case offsetof(struct Qdisc_ops, init):
+		qdisc_ops->init = bpf_qdisc_init_op;
+		return 1;
+	case offsetof(struct Qdisc_ops, reset):
+		qdisc_ops->reset = bpf_qdisc_reset_op;
+		return 1;
+	case offsetof(struct Qdisc_ops, destroy):
+		qdisc_ops->destroy = bpf_qdisc_destroy_op;
+		return 1;
+	case offsetof(struct Qdisc_ops, peek):
+		if (!uqdisc_ops->peek)
+			qdisc_ops->peek = qdisc_peek_dequeued;
+		return 1;
+	case offsetof(struct Qdisc_ops, id):
+		if (bpf_obj_name_cpy(qdisc_ops->id, uqdisc_ops->id,
+				     sizeof(qdisc_ops->id)) <= 0)
+			return -EINVAL;
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool is_unsupported(u32 member_offset)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(unsupported_ops); i++) {
+		if (member_offset == unsupported_ops[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int bpf_qdisc_check_member(const struct btf_type *t,
+				  const struct btf_member *member,
+				  const struct bpf_prog *prog)
+{
+	if (is_unsupported(__btf_member_bit_offset(t, member) / 8))
+		return -ENOTSUPP;
+	return 0;
+}
+
+static int bpf_qdisc_validate(void *kdata)
+{
+	return 0;
+}
+
+static int bpf_qdisc_reg(void *kdata, struct bpf_link *link)
+{
+	return register_qdisc(kdata);
+}
+
+static void bpf_qdisc_unreg(void *kdata, struct bpf_link *link)
+{
+	return unregister_qdisc(kdata);
+}
+
+static int Qdisc_ops__enqueue(struct sk_buff *skb__ref, struct Qdisc *sch,
+			      struct sk_buff **to_free)
+{
+	return 0;
+}
+
+static struct sk_buff *Qdisc_ops__dequeue(struct Qdisc *sch)
+{
+	return NULL;
+}
+
+static struct sk_buff *Qdisc_ops__peek(struct Qdisc *sch)
+{
+	return NULL;
+}
+
+static int Qdisc_ops__init(struct Qdisc *sch, struct nlattr *arg,
+			   struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static void Qdisc_ops__reset(struct Qdisc *sch)
+{
+}
+
+static void Qdisc_ops__destroy(struct Qdisc *sch)
+{
+}
+
+static int Qdisc_ops__change(struct Qdisc *sch, struct nlattr *arg,
+			     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static void Qdisc_ops__attach(struct Qdisc *sch)
+{
+}
+
+static int Qdisc_ops__change_tx_queue_len(struct Qdisc *sch, unsigned int new_len)
+{
+	return 0;
+}
+
+static void Qdisc_ops__change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
+{
+}
+
+static int Qdisc_ops__dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static int Qdisc_ops__dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	return 0;
+}
+
+static void Qdisc_ops__ingress_block_set(struct Qdisc *sch, u32 block_index)
+{
+}
+
+static void Qdisc_ops__egress_block_set(struct Qdisc *sch, u32 block_index)
+{
+}
+
+static u32 Qdisc_ops__ingress_block_get(struct Qdisc *sch)
+{
+	return 0;
+}
+
+static u32 Qdisc_ops__egress_block_get(struct Qdisc *sch)
+{
+	return 0;
+}
+
+static struct Qdisc_ops __bpf_ops_qdisc_ops = {
+	.enqueue = Qdisc_ops__enqueue,
+	.dequeue = Qdisc_ops__dequeue,
+	.peek = Qdisc_ops__peek,
+	.init = Qdisc_ops__init,
+	.reset = Qdisc_ops__reset,
+	.destroy = Qdisc_ops__destroy,
+	.change = Qdisc_ops__change,
+	.attach = Qdisc_ops__attach,
+	.change_tx_queue_len = Qdisc_ops__change_tx_queue_len,
+	.change_real_num_tx = Qdisc_ops__change_real_num_tx,
+	.dump = Qdisc_ops__dump,
+	.dump_stats = Qdisc_ops__dump_stats,
+	.ingress_block_set = Qdisc_ops__ingress_block_set,
+	.egress_block_set = Qdisc_ops__egress_block_set,
+	.ingress_block_get = Qdisc_ops__ingress_block_get,
+	.egress_block_get = Qdisc_ops__egress_block_get,
+};
+
+static struct bpf_struct_ops bpf_Qdisc_ops = {
+	.verifier_ops = &bpf_qdisc_verifier_ops,
+	.reg = bpf_qdisc_reg,
+	.unreg = bpf_qdisc_unreg,
+	.check_member = bpf_qdisc_check_member,
+	.init_member = bpf_qdisc_init_member,
+	.init = bpf_qdisc_init,
+	.validate = bpf_qdisc_validate,
+	.name = "Qdisc_ops",
+	.cfi_stubs = &__bpf_ops_qdisc_ops,
+	.owner = THIS_MODULE,
+};
+
+static int __init bpf_qdisc_kfunc_init(void)
+{
+	return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+}
+late_initcall(bpf_qdisc_kfunc_init);

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 74afc210527d..5064b6d2d1ec 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -358,7 +359,7 @@ static struct Qdisc_ops *qdisc_lookup_ops(struct nlattr *kind)
 	read_lock(&qdisc_mod_lock);
 	for (q = qdisc_base; q; q = q->next) {
 		if (nla_strcmp(kind, q->id) == 0) {
-			if (!try_module_get(q->owner))
+			if (!bpf_try_module_get(q, q->owner))
 				q = NULL;
 			break;
 		}
@@ -1282,7 +1283,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 			/* We will try again qdisc_lookup_ops,
 			 * so don't keep a reference.
 			 */
-			module_put(ops->owner);
+			bpf_module_put(ops, ops->owner);
 			err = -EAGAIN;
 			goto err_out;
 		}
@@ -1393,7 +1394,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	netdev_put(dev, &sch->dev_tracker);
 	qdisc_free(sch);
 err_out2:
-	module_put(ops->owner);
+	bpf_module_put(ops, ops->owner);
 err_out:
 	*errp = err;
 	return NULL;

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 2af24547a82c..76e4a6efd17c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1077,7 +1078,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 		ops->destroy(qdisc);
 
 	lockdep_unregister_key(&qdisc->root_lock_key);
-	module_put(ops->owner);
+	bpf_module_put(ops, ops->owner);
 	netdev_put(dev, &qdisc->dev_tracker);
 
 	trace_qdisc_destroy(qdisc);

From patchwork Sun Jul 14 17:51:25 2024
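The write-access bounds check in bpf_qdisc_btf_struct_access() above boils down to offsetof()/offsetofend() arithmetic: a write at (off, size) is accepted only if it ends within the member it targets. A minimal userspace sketch of that check, using a made-up `toy_skb` struct rather than the kernel's sk_buff layout:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for sk_buff; field names and layout are made up. */
struct toy_skb {
    unsigned long queue_mapping;
    unsigned short tc_index;
};

/* Same idea as the kernel's offsetofend() macro. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Mirrors the check above: the write must start at a writable member and
 * must not run past that member's end. */
static int write_ok(size_t off, size_t size)
{
    size_t end;

    if (off == offsetof(struct toy_skb, tc_index))
        end = offsetofend(struct toy_skb, tc_index);
    else
        return 0; /* no write support at this offset */

    return off + size <= end;
}
```

This simplifies the kernel version (which also accepts writes starting inside a member and logs a verifier message on rejection), but the accept/reject arithmetic is the same.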
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 06/11] bpf: net_sched: Add bpf qdisc kfuncs
Date: Sun, 14 Jul 2024 17:51:25 +0000
Message-Id: <20240714175130.4051012-7-amery.hung@bytedance.com>

Add kfuncs for working on skb in qdisc.
Both bpf_qdisc_skb_drop() and bpf_skb_release() can be used to release a reference to an skb. bpf_qdisc_skb_drop() can only be called in .enqueue, where a to_free skb list is available from the kernel to defer the release; bpf_skb_release() should be used everywhere else. bpf_skb_release() is also used in bpf_obj_free_fields() when cleaning up skbs in maps and collections.

bpf_qdisc_watchdog_schedule() can be used to schedule the execution of the qdisc at a later time. An example use case is to throttle a qdisc when the time to dequeue the next packet is known.

bpf_skb_get_hash() returns the flow hash of an skb, which can be used to build flow-based queueing algorithms.

Signed-off-by: Amery Hung

---
 net/sched/bpf_qdisc.c | 74 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index a68fc115d8f8..eff7559aa346 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -148,6 +148,64 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
 	return 0;
 }
 
+__bpf_kfunc_start_defs();
+
+/* bpf_skb_get_hash - Get the flow hash of an skb.
+ * @skb: The skb to get the flow hash from.
+ */
+__bpf_kfunc u32 bpf_skb_get_hash(struct sk_buff *skb)
+{
+	return skb_get_hash(skb);
+}
+
+/* bpf_skb_release - Release an skb reference acquired on an skb immediately.
+ * @skb: The skb on which a reference is being released.
+ */
+__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
+{
+	consume_skb(skb);
+}
+
+/* bpf_qdisc_skb_drop - Add an skb to be dropped later to a list.
+ * @skb: The skb on which a reference is being released and dropped.
+ * @to_free_list: The list of skbs to be dropped.
+ */
+__bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb,
+				    struct bpf_sk_buff_ptr *to_free_list)
+{
+	__qdisc_drop(skb, (struct sk_buff **)to_free_list);
+}
+
+/* bpf_qdisc_watchdog_schedule - Schedule a qdisc to a later time using a timer.
+ * @sch: The qdisc to be scheduled.
+ * @expire: The expiry time of the timer.
+ * @delta_ns: The slack range of the timer.
+ */
+__bpf_kfunc void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns)
+{
+	struct bpf_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_schedule_range_ns(&q->watchdog, expire, delta_ns);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_qdisc_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_skb_get_hash)
+BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_qdisc_skb_drop, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_qdisc_watchdog_schedule)
+BTF_KFUNCS_END(bpf_qdisc_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_qdisc_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_qdisc_kfunc_ids,
+};
+
+BTF_ID_LIST(skb_kfunc_dtor_ids)
+BTF_ID(struct, sk_buff)
+BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
+
 static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = {
 	.get_func_proto = bpf_qdisc_get_func_proto,
 	.is_valid_access = bpf_qdisc_is_valid_access,
@@ -347,6 +405,20 @@ static struct bpf_struct_ops bpf_Qdisc_ops = {
 
 static int __init bpf_qdisc_kfunc_init(void)
 {
-	return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+	int ret;
+	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
+		{
+			.btf_id = skb_kfunc_dtor_ids[0],
+			.kfunc_btf_id = skb_kfunc_dtor_ids[1]
+		},
+	};
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_qdisc_kfunc_set);
+	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
+						 ARRAY_SIZE(skb_kfunc_dtors),
+						 THIS_MODULE);
+	ret = ret ?: register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+
+	return ret;
 }
 late_initcall(bpf_qdisc_kfunc_init);

From patchwork Sun Jul 14 17:51:26 2024
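As a sketch of how bpf_skb_get_hash() enables flow-based queueing: a program would typically reduce the 32-bit flow hash to a per-flow queue index. The band count and helper name below are illustrative only, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define NBANDS 4 /* arbitrary power-of-two number of per-flow queues */

/* Map a flow hash (as returned by something like bpf_skb_get_hash())
 * to one of NBANDS bands. With a power-of-two band count the modulo
 * reduces to a bit mask, so all packets of one flow land in one band. */
static uint32_t flow_band(uint32_t flow_hash)
{
    return flow_hash & (NBANDS - 1);
}
```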
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 07/11] bpf: net_sched: Allow more optional operators in Qdisc_ops
Date: Sun, 14 Jul 2024 17:51:26 +0000
Message-Id: <20240714175130.4051012-8-amery.hung@bytedance.com>

So far, init, reset, and destroy are implemented by the bpf qdisc infrastructure as fixed operators that manipulate the watchdog as the occasion requires. This patch allows users to implement these three operators themselves, performing the desired work alongside the predefined ones.

Signed-off-by: Amery Hung

---
 include/net/sch_generic.h |  6 ++++++
 net/sched/bpf_qdisc.c     | 20 ++++----------------
 net/sched/sch_api.c       | 11 +++++++++++
 net/sched/sch_generic.c   |  8 ++++++++
 4 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 214ed2e34faa..3041782b7527 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1359,4 +1359,10 @@ static inline void qdisc_synchronize(const struct Qdisc *q)
 		msleep(1);
 }
 
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT)
+int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack);
+void bpf_qdisc_destroy_post_op(struct Qdisc *sch);
+void bpf_qdisc_reset_post_op(struct Qdisc *sch);
+#endif
+
 #endif

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index eff7559aa346..903b4eb54510 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -9,9 +9,6 @@
 static struct bpf_struct_ops bpf_Qdisc_ops;
 
 static u32 unsupported_ops[] = {
-	offsetof(struct Qdisc_ops, init),
-	offsetof(struct Qdisc_ops, reset),
-	offsetof(struct Qdisc_ops, destroy),
 	offsetof(struct Qdisc_ops, change),
 	offsetof(struct Qdisc_ops, attach),
 	offsetof(struct Qdisc_ops, change_real_num_tx),
@@ -36,8 +33,8 @@ static int bpf_qdisc_init(struct btf *btf)
 	return 0;
 }
 
-static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt,
-			     struct netlink_ext_ack *extack)
+int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt,
+			  struct netlink_ext_ack *extack)
 {
 	struct bpf_sched_data *q = qdisc_priv(sch);
@@ -45,14 +42,14 @@ static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt,
 	return 0;
 }
 
-static void bpf_qdisc_reset_op(struct Qdisc *sch)
+void bpf_qdisc_reset_post_op(struct Qdisc *sch)
 {
 	struct bpf_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
 }
 
-static void bpf_qdisc_destroy_op(struct Qdisc *sch)
+void bpf_qdisc_destroy_post_op(struct Qdisc *sch)
 {
 	struct bpf_sched_data *q = qdisc_priv(sch);
@@ -235,15 +232,6 @@ static int bpf_qdisc_init_member(const struct btf_type *t,
 			return -EINVAL;
 		qdisc_ops->static_flags = TCQ_F_BPF;
 		return 1;
-	case offsetof(struct Qdisc_ops, init):
-		qdisc_ops->init = bpf_qdisc_init_op;
-		return 1;
-	case offsetof(struct Qdisc_ops, reset):
-		qdisc_ops->reset = bpf_qdisc_reset_op;
-		return 1;
-	case offsetof(struct Qdisc_ops, destroy):
-		qdisc_ops->destroy = bpf_qdisc_destroy_op;
-		return 1;
 	case offsetof(struct Qdisc_ops, peek):
 		if (!uqdisc_ops->peek)
 			qdisc_ops->peek = qdisc_peek_dequeued;

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 5064b6d2d1ec..9fb9375e2793 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1352,6 +1352,13 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 		rcu_assign_pointer(sch->stab, stab);
 	}
 
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT)
+	if (sch->flags & TCQ_F_BPF) {
+		err = bpf_qdisc_init_pre_op(sch, tca[TCA_OPTIONS], extack);
+		if (err != 0)
+			goto err_out4;
+	}
+#endif
 	if (ops->init) {
 		err = ops->init(sch, tca[TCA_OPTIONS], extack);
 		if (err != 0)
@@ -1388,6 +1395,10 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	 */
 	if (ops->destroy)
 		ops->destroy(sch);
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT)
+	if (sch->flags & TCQ_F_BPF)
+		bpf_qdisc_destroy_post_op(sch);
+#endif
 	qdisc_put_stab(rtnl_dereference(sch->stab));
 err_out3:
 	lockdep_unregister_key(&sch->root_lock_key);

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 76e4a6efd17c..0ac05665c69f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1033,6 +1033,10 @@ void qdisc_reset(struct Qdisc *qdisc)
 	if (ops->reset)
 		ops->reset(qdisc);
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT)
+	if (qdisc->flags & TCQ_F_BPF)
+		bpf_qdisc_reset_post_op(qdisc);
+#endif
 
 	__skb_queue_purge(&qdisc->gso_skb);
 	__skb_queue_purge(&qdisc->skb_bad_txq);
@@ -1076,6 +1080,10 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 	if (ops->destroy)
 		ops->destroy(qdisc);
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT)
+	if (qdisc->flags & TCQ_F_BPF)
+		bpf_qdisc_destroy_post_op(qdisc);
+#endif
 
 	lockdep_unregister_key(&qdisc->root_lock_key);
 	bpf_module_put(ops, ops->owner);

From patchwork Sun Jul 14 17:51:27 2024
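The call ordering this patch establishes can be simulated in plain C: the fixed infra operator runs before the user-supplied init (a "pre" op) and after the user-supplied reset/destroy ("post" ops). The function names below are stand-ins that record call order, not kernel symbols:

```c
#include <assert.h>
#include <string.h>

static char trace[64];

static void record(const char *s) { strcat(trace, s); }

/* Stand-ins for the fixed infra operators and user-supplied ones. */
static void init_pre_op(void)     { record("pre,"); }     /* watchdog setup */
static void user_init(void)       { record("init,"); }    /* user's ops->init */
static void user_destroy(void)    { record("destroy,"); } /* user's ops->destroy */
static void destroy_post_op(void) { record("post,"); }    /* watchdog cancel */

/* Sequence mirroring qdisc_create()/__qdisc_destroy() after this patch:
 * infra work brackets the optional user-defined operators. */
static const char *create_and_destroy(void)
{
    trace[0] = '\0';
    init_pre_op();     /* runs first, initializes bpf_sched_data */
    user_init();       /* user init runs afterwards */
    user_destroy();    /* on teardown, user destroy runs first */
    destroy_post_op(); /* infra cleanup runs last */
    return trace;
}
```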
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 08/11] libbpf: Support creating and destroying qdisc
Date: Sun, 14 Jul 2024 17:51:27 +0000
Message-Id: <20240714175130.4051012-9-amery.hung@bytedance.com>

Extend struct bpf_tc_hook with handle, qdisc name, and a new attach type, BPF_TC_QDISC, to allow users to add or remove any specified qdisc, in addition to clsact.

Signed-off-by: Amery Hung

---
 tools/lib/bpf/libbpf.h  |  5 ++++-
 tools/lib/bpf/netlink.c | 20 +++++++++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 64a6a3d323e3..f6329a901c9b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1258,6 +1258,7 @@ enum bpf_tc_attach_point {
 	BPF_TC_INGRESS = 1 << 0,
 	BPF_TC_EGRESS  = 1 << 1,
 	BPF_TC_CUSTOM  = 1 << 2,
+	BPF_TC_QDISC   = 1 << 3,
 };
 
 #define BPF_TC_PARENT(a, b) \
@@ -1272,9 +1273,11 @@ struct bpf_tc_hook {
 	int ifindex;
 	enum bpf_tc_attach_point attach_point;
 	__u32 parent;
+	__u32 handle;
+	char *qdisc;
 	size_t :0;
 };
-#define bpf_tc_hook__last_field parent
+#define bpf_tc_hook__last_field qdisc
 
 struct bpf_tc_opts {
 	size_t sz;

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 68a2def17175..72db8c0add21 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -529,9 +529,9 @@ int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id)
 }
 
-typedef int (*qdisc_config_t)(struct libbpf_nla_req *req);
+typedef int (*qdisc_config_t)(struct libbpf_nla_req *req, struct bpf_tc_hook *hook);
 
-static int clsact_config(struct libbpf_nla_req *req)
+static int clsact_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook)
 {
 	req->tc.tcm_parent = TC_H_CLSACT;
 	req->tc.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
@@ -539,6 +539,16 @@ static int clsact_config(struct libbpf_nla_req *req)
 	return nlattr_add(req, TCA_KIND, "clsact", sizeof("clsact"));
 }
 
+static int qdisc_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook)
+{
+	char *qdisc = OPTS_GET(hook, qdisc, NULL);
+
+	req->tc.tcm_parent = OPTS_GET(hook, parent, TC_H_ROOT);
+	req->tc.tcm_handle = OPTS_GET(hook, handle, 0);
+
+	return nlattr_add(req, TCA_KIND, qdisc, strlen(qdisc) + 1);
+}
+
 static int attach_point_to_config(struct bpf_tc_hook *hook,
 				  qdisc_config_t *config)
 {
@@ -552,6 +562,9 @@ static int attach_point_to_config(struct bpf_tc_hook *hook,
 		return 0;
 	case BPF_TC_CUSTOM:
 		return -EOPNOTSUPP;
+	case BPF_TC_QDISC:
+		*config = &qdisc_config;
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -596,7 +609,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
 	req.tc.tcm_family = AF_UNSPEC;
 	req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
 
-	ret = config(&req);
+	ret = config(&req, hook);
 	if (ret < 0)
 		return ret;
@@ -639,6 +652,7 @@ int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
 	case BPF_TC_INGRESS:
 	case BPF_TC_EGRESS:
 		return libbpf_err(__bpf_tc_detach(hook, NULL, true));
+	case BPF_TC_QDISC:
 	case BPF_TC_INGRESS | BPF_TC_EGRESS:
 		return libbpf_err(tc_qdisc_delete(hook));
 	case BPF_TC_CUSTOM:

From patchwork Sun Jul 14 17:51:28 2024
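The new handle and parent fields in struct bpf_tc_hook carry standard 32-bit TC identifiers: the major number lives in the upper 16 bits and the minor in the lower 16. A small self-contained check of that packing, with the macros reproduced to match the semantics of the kernel's TC_H_* macros in `<linux/pkt_sched.h>`:

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the kernel's TC_H_* macros. */
#define TC_H_MAKE(maj, min) (((maj) & 0xFFFF0000U) | ((min) & 0x0000FFFFU))
#define TC_H_MAJ(h)         ((h) & 0xFFFF0000U)
#define TC_H_MIN(h)         ((h) & 0x0000FFFFU)

/* A qdisc handle is "major:0": the 16-bit major number shifted into the
 * upper half of the word, minor 0. */
static uint32_t make_qdisc_handle(uint16_t major)
{
    return TC_H_MAKE((uint32_t)major << 16, 0);
}
```

So a handle written as `8000:` in tc syntax is the value `0x80000000` as stored in `tcm_handle`.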
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 09/11] selftests: Add a basic fifo qdisc test
Date: Sun, 14 Jul 2024 17:51:28 +0000
Message-Id: <20240714175130.4051012-10-amery.hung@bytedance.com>

This selftest shows a bare-minimum fifo qdisc, which simply enqueues skbs into the back of a bpf list and dequeues from the front of the list.
Signed-off-by: Amery Hung
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c      | 161 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_qdisc_common.h    |  16 ++
 .../selftests/bpf/progs/bpf_qdisc_fifo.c      | 102 +++++++++++
 3 files changed, 279 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
new file mode 100644
index 000000000000..295d0216e70f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -0,0 +1,161 @@
+#include
+#include
+#include
+
+#include "network_helpers.h"
+#include "bpf_qdisc_fifo.skel.h"
+
+#ifndef ENOTSUPP
+#define ENOTSUPP 524
+#endif
+
+#define LO_IFINDEX 1
+
+static const unsigned int total_bytes = 10 * 1024 * 1024;
+static int stop;
+
+static void *server(void *arg)
+{
+	int lfd = (int)(long)arg, err = 0, fd;
+	ssize_t nr_sent = 0, bytes = 0;
+	char batch[1500];
+
+	fd = accept(lfd, NULL, NULL);
+	while (fd == -1) {
+		if (errno == EINTR)
+			continue;
+		err = -errno;
+		goto done;
+	}
+
+	if (settimeo(fd, 0)) {
+		err = -errno;
+		goto done;
+	}
+
+	while (bytes < total_bytes && !READ_ONCE(stop)) {
+		nr_sent = send(fd, &batch,
+			       MIN(total_bytes - bytes, sizeof(batch)), 0);
+		if (nr_sent == -1 && errno == EINTR)
+			continue;
+		if (nr_sent == -1) {
+			err = -errno;
+			break;
+		}
+		bytes += nr_sent;
+	}
+
+	ASSERT_EQ(bytes, total_bytes, "send");
+
+done:
+	if (fd >= 0)
+		close(fd);
+	if (err) {
+		WRITE_ONCE(stop, 1);
+		return ERR_PTR(err);
+	}
+	return NULL;
+}
+
+static void do_test(char *qdisc)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
+			    .attach_point = BPF_TC_QDISC,
+			    .parent = TC_H_ROOT,
+			    .handle = 0x8000000,
+			    .qdisc = qdisc);
+	struct sockaddr_in6 sa6 = {};
+	ssize_t nr_recv = 0, bytes = 0;
+	int lfd = -1, fd = -1;
+	pthread_t srv_thread;
+	socklen_t addrlen = sizeof(sa6);
+	void *thread_ret;
+	char batch[1500];
+	int err;
+
+	WRITE_ONCE(stop, 0);
+
+	err = bpf_tc_hook_create(&hook);
+	if (!ASSERT_OK(err, "attach qdisc"))
+		return;
+
+	lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_NEQ(lfd, -1, "socket")) {
+		bpf_tc_hook_destroy(&hook);
+		return;
+	}
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_NEQ(fd, -1, "socket")) {
+		bpf_tc_hook_destroy(&hook);
+		close(lfd);
+		return;
+	}
+
+	if (settimeo(lfd, 0) || settimeo(fd, 0))
+		goto done;
+
+	err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen);
+	if (!ASSERT_NEQ(err, -1, "getsockname"))
+		goto done;
+
+	/* connect to server */
+	err = connect(fd, (struct sockaddr *)&sa6, addrlen);
+	if (!ASSERT_NEQ(err, -1, "connect"))
+		goto done;
+
+	err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd);
+	if (!ASSERT_OK(err, "pthread_create"))
+		goto done;
+
+	/* recv total_bytes */
+	while (bytes < total_bytes && !READ_ONCE(stop)) {
+		nr_recv = recv(fd, &batch,
+			       MIN(total_bytes - bytes, sizeof(batch)), 0);
+		if (nr_recv == -1 && errno == EINTR)
+			continue;
+		if (nr_recv == -1)
+			break;
+		bytes += nr_recv;
+	}
+
+	ASSERT_EQ(bytes, total_bytes, "recv");
+
+	WRITE_ONCE(stop, 1);
+	pthread_join(srv_thread, &thread_ret);
+	ASSERT_OK(IS_ERR(thread_ret), "thread_ret");
+
+done:
+	close(lfd);
+	close(fd);
+
+	bpf_tc_hook_destroy(&hook);
+	return;
+}
+
+static void test_fifo(void)
+{
+	struct bpf_qdisc_fifo *fifo_skel;
+	struct bpf_link *link;
+
+	fifo_skel = bpf_qdisc_fifo__open_and_load();
+	if (!ASSERT_OK_PTR(fifo_skel, "bpf_qdisc_fifo__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(fifo_skel->maps.fifo);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		bpf_qdisc_fifo__destroy(fifo_skel);
+		return;
+	}
+
+	do_test("bpf_fifo");
+
+	bpf_link__destroy(link);
+	bpf_qdisc_fifo__destroy(fifo_skel);
+}
+
+void test_bpf_qdisc(void)
+{
+	if (test__start_subtest("fifo"))
+		test_fifo();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
new file mode 100644
index 000000000000..6ffefbd43f0c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
@@ -0,0 +1,16 @@
+#ifndef _BPF_QDISC_COMMON_H
+#define _BPF_QDISC_COMMON_H
+
+#define NET_XMIT_SUCCESS 0x00
+#define NET_XMIT_DROP 0x01	/* skb dropped */
+#define NET_XMIT_CN 0x02	/* congestion notification */
+
+#define TC_PRIO_CONTROL 7
+#define TC_PRIO_MAX 15
+
+u32 bpf_skb_get_hash(struct sk_buff *p) __ksym;
+void bpf_skb_release(struct sk_buff *p) __ksym;
+void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;
+void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) __ksym;
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
new file mode 100644
index 000000000000..eb6272d36c77
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
@@ -0,0 +1,102 @@
+#include
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct skb_node {
+	struct sk_buff __kptr *skb;
+	struct bpf_list_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock q_fifo_lock;
+private(A) struct bpf_list_head q_fifo __contains(skb_node, node);
+
+unsigned int q_limit = 1000;
+unsigned int q_qlen = 0;
+
+SEC("struct_ops/bpf_fifo_enqueue")
+int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct skb_node *skbn;
+
+	if (q_qlen == q_limit)
+		goto drop;
+
+	skbn = bpf_obj_new(typeof(*skbn));
+	if (!skbn)
+		goto drop;
+
+	q_qlen++;
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	if (skb) /* unexpected */
+		bpf_qdisc_skb_drop(skb, to_free);
+
+	bpf_spin_lock(&q_fifo_lock);
+	bpf_list_push_back(&q_fifo, &skbn->node);
+	bpf_spin_unlock(&q_fifo_lock);
+
+	return NET_XMIT_SUCCESS;
+drop:
+	bpf_qdisc_skb_drop(skb, to_free);
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops/bpf_fifo_dequeue")
+struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch)
+{
+	struct bpf_list_node *node;
+	struct sk_buff *skb = NULL;
+	struct skb_node *skbn;
+
+	bpf_spin_lock(&q_fifo_lock);
+	node = bpf_list_pop_front(&q_fifo);
+	bpf_spin_unlock(&q_fifo_lock);
+	if (!node)
+		return NULL;
+
+	skbn = container_of(node, struct skb_node, node);
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	bpf_obj_drop(skbn);
+	q_qlen--;
+
+	return skb;
+}
+
+SEC("struct_ops/bpf_fifo_reset")
+void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch)
+{
+	struct bpf_list_node *node;
+	struct skb_node *skbn;
+	int i;
+
+	bpf_for(i, 0, q_qlen) {
+		struct sk_buff *skb = NULL;
+
+		bpf_spin_lock(&q_fifo_lock);
+		node = bpf_list_pop_front(&q_fifo);
+		bpf_spin_unlock(&q_fifo_lock);
+
+		if (!node)
+			break;
+
+		skbn = container_of(node, struct skb_node, node);
+		skb = bpf_kptr_xchg(&skbn->skb, skb);
+		if (skb)
+			bpf_skb_release(skb);
+		bpf_obj_drop(skbn);
+	}
+	q_qlen = 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops fifo = {
+	.enqueue = (void *)bpf_fifo_enqueue,
+	.dequeue = (void *)bpf_fifo_dequeue,
+	.reset = (void *)bpf_fifo_reset,
+	.id = "bpf_fifo",
+};

From
patchwork Sun Jul 14 17:51:29 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 10/11] selftests: Add a bpf fq qdisc to selftest
Date: Sun, 14 Jul 2024 17:51:29 +0000
Message-Id: <20240714175130.4051012-11-amery.hung@bytedance.com>
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
References: <20240714175130.4051012-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This test implements a more sophisticated qdisc using bpf. The bpf fair-queueing (fq) qdisc gives each flow an equal chance to transmit data. It also respects the skb timestamp for rate limiting. The implementation does not prevent hash collisions between flows, nor does it recycle flows.
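The flow classification at the heart of the fq qdisc boils down to hashing a socket hash into one of NUM_QUEUE buckets. The patch does this with multiplicative (Fibonacci) hashing; a stand-alone sketch of that helper, using the same multiplier and shift as the patch's `hash64()`, shows the range guarantee:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUE_LOG 10	/* 1 << 10 = 1024 flow buckets, as in the patch */

/* Multiplicative hash: the multiplier is 2^64 divided by the golden ratio;
 * keeping the top `bits` bits of the product spreads nearby keys across
 * buckets, so the result is always in [0, 2^bits). */
static uint32_t hash64(uint64_t val, int bits)
{
	return (uint32_t)((val * 0x61C8864680B583EBULL) >> (64 - bits));
}
```

Collisions are still possible — two socket hashes can land in the same bucket — which is exactly the limitation the commit message calls out.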
Signed-off-by: Amery Hung --- .../selftests/bpf/prog_tests/bpf_qdisc.c | 24 + .../selftests/bpf/progs/bpf_qdisc_fq.c | 623 ++++++++++++++++++ 2 files changed, 647 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c index 295d0216e70f..394bf5a4adae 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c @@ -4,6 +4,7 @@ #include "network_helpers.h" #include "bpf_qdisc_fifo.skel.h" +#include "bpf_qdisc_fq.skel.h" #ifndef ENOTSUPP #define ENOTSUPP 524 @@ -154,8 +155,31 @@ static void test_fifo(void) bpf_qdisc_fifo__destroy(fifo_skel); } +static void test_fq(void) +{ + struct bpf_qdisc_fq *fq_skel; + struct bpf_link *link; + + fq_skel = bpf_qdisc_fq__open_and_load(); + if (!ASSERT_OK_PTR(fq_skel, "bpf_qdisc_fq__open_and_load")) + return; + + link = bpf_map__attach_struct_ops(fq_skel->maps.fq); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_qdisc_fq__destroy(fq_skel); + return; + } + + do_test("bpf_fq"); + + bpf_link__destroy(link); + bpf_qdisc_fq__destroy(fq_skel); +} + void test_bpf_qdisc(void) { if (test__start_subtest("fifo")) test_fifo(); + if (test__start_subtest("fq")) + test_fq(); } diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c new file mode 100644 index 000000000000..5debb045b6e2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c @@ -0,0 +1,623 @@ +#include +#include +#include "bpf_experimental.h" +#include "bpf_qdisc_common.h" + +char _license[] SEC("license") = "GPL"; + +#define NSEC_PER_USEC 1000L +#define NSEC_PER_SEC 1000000000L +#define PSCHED_MTU (64 * 1024 + 14) + +#define NUM_QUEUE_LOG 10 +#define NUM_QUEUE (1 << NUM_QUEUE_LOG) +#define PRIO_QUEUE (NUM_QUEUE + 1) +#define COMP_DROP_PKT_DELAY 1 +#define THROTTLED 0xffffffffffffffff + +/* fq 
configuration */ +__u64 q_flow_refill_delay = 40; +__u64 q_horizon = 10ULL * NSEC_PER_SEC; +__u32 q_initial_quantum = 10 * PSCHED_MTU; +__u32 q_quantum = 2 * PSCHED_MTU; +__u32 q_orphan_mask = 1023; +__u32 q_flow_plimit = 100; +__u32 q_plimit = 10000; +__u32 q_timer_slack = 10 * NSEC_PER_USEC; +bool q_horizon_drop = true; + +unsigned long time_next_delayed_flow = ~0ULL; +unsigned long unthrottle_latency_ns = 0ULL; +unsigned long ktime_cache = 0; +unsigned long dequeue_now; +unsigned int fq_qlen = 0; + +struct skb_node { + u64 tstamp; + struct sk_buff __kptr *skb; + struct bpf_rb_node node; +}; + +struct fq_flow_node { + u32 hash; + int credit; + u32 qlen; + u32 socket_hash; + u64 age; + u64 time_next_packet; + struct bpf_list_node list_node; + struct bpf_rb_node rb_node; + struct bpf_rb_root queue __contains(skb_node, node); + struct bpf_spin_lock lock; + struct bpf_refcount refcount; +}; + +struct dequeue_nonprio_ctx { + bool stop_iter; + u64 expire; +}; + +struct fq_stashed_flow { + struct fq_flow_node __kptr *flow; +}; + +/* [NUM_QUEUE] for TC_PRIO_CONTROL + * [0, NUM_QUEUE - 1] for other flows + */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, struct fq_stashed_flow); + __uint(max_entries, NUM_QUEUE + 1); +} fq_stashed_flows SEC(".maps"); + +#define private(name) SEC(".data." 
#name) __hidden __attribute__((aligned(8))) + +private(A) struct bpf_spin_lock fq_delayed_lock; +private(A) struct bpf_rb_root fq_delayed __contains(fq_flow_node, rb_node); + +private(B) struct bpf_spin_lock fq_new_flows_lock; +private(B) struct bpf_list_head fq_new_flows __contains(fq_flow_node, list_node); + +private(C) struct bpf_spin_lock fq_old_flows_lock; +private(C) struct bpf_list_head fq_old_flows __contains(fq_flow_node, list_node); + +static bool bpf_kptr_xchg_back(void *map_val, void *ptr) +{ + void *ret; + + ret = bpf_kptr_xchg(map_val, ptr); + if (ret) { //unexpected + bpf_obj_drop(ret); + return false; + } + return true; +} + +static struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb) +{ + return (struct qdisc_skb_cb *)skb->cb; +} + +static int hash64(u64 val, int bits) +{ + return val * 0x61C8864680B583EBull >> (64 - bits); +} + +static bool skb_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct skb_node *skbn_a; + struct skb_node *skbn_b; + + skbn_a = container_of(a, struct skb_node, node); + skbn_b = container_of(b, struct skb_node, node); + + return skbn_a->tstamp < skbn_b->tstamp; +} + +static bool fn_time_next_packet_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct fq_flow_node *flow_a; + struct fq_flow_node *flow_b; + + flow_a = container_of(a, struct fq_flow_node, rb_node); + flow_b = container_of(b, struct fq_flow_node, rb_node); + + return flow_a->time_next_packet < flow_b->time_next_packet; +} + +static void +fq_flows_add_head(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow) +{ + bpf_spin_lock(lock); + bpf_list_push_front(head, &flow->list_node); + bpf_spin_unlock(lock); +} + +static void +fq_flows_add_tail(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow) +{ + bpf_spin_lock(lock); + bpf_list_push_back(head, &flow->list_node); + bpf_spin_unlock(lock); +} + +static bool +fq_flows_is_empty(struct bpf_list_head *head, 
struct bpf_spin_lock *lock) +{ + struct bpf_list_node *node; + + bpf_spin_lock(lock); + node = bpf_list_pop_front(head); + if (node) { + bpf_list_push_front(head, node); + bpf_spin_unlock(lock); + return false; + } + bpf_spin_unlock(lock); + + return true; +} + +static void fq_flow_set_detached(struct fq_flow_node *flow) +{ + flow->age = bpf_jiffies64(); + bpf_obj_drop(flow); +} + +static bool fq_flow_is_detached(struct fq_flow_node *flow) +{ + return flow->age != 0 && flow->age != THROTTLED; +} + +static bool fq_flow_is_throttled(struct fq_flow_node *flow) +{ + return flow->age == THROTTLED; +} + +static bool sk_listener(struct sock *sk) +{ + return (1 << sk->__sk_common.skc_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV); +} + +static int +fq_classify(struct sk_buff *skb, u32 *hash, struct fq_stashed_flow **sflow, + bool *connected, u32 *sk_hash) +{ + struct fq_flow_node *flow; + struct sock *sk = skb->sk; + + *connected = false; + + if ((skb->priority & TC_PRIO_MAX) == TC_PRIO_CONTROL) { + *hash = PRIO_QUEUE; + } else { + if (!sk || sk_listener(sk)) { + *sk_hash = bpf_skb_get_hash(skb) & q_orphan_mask; + *sk_hash = (*sk_hash << 1 | 1); + } else if (sk->__sk_common.skc_state == TCP_CLOSE) { + *sk_hash = bpf_skb_get_hash(skb) & q_orphan_mask; + *sk_hash = (*sk_hash << 1 | 1); + } else { + *sk_hash = sk->__sk_common.skc_hash; + *connected = true; + } + *hash = hash64(*sk_hash, NUM_QUEUE_LOG); + } + + *sflow = bpf_map_lookup_elem(&fq_stashed_flows, hash); + if (!*sflow) + return -1; + + if ((*sflow)->flow) + return 0; + + flow = bpf_obj_new(typeof(*flow)); + if (!flow) + return -1; + + flow->hash = *hash; + flow->credit = q_initial_quantum; + flow->qlen = 0; + flow->age = 1UL; + flow->time_next_packet = 0; + + bpf_kptr_xchg_back(&(*sflow)->flow, flow); + + return 0; +} + +static bool fq_packet_beyond_horizon(struct sk_buff *skb) +{ + return (s64)skb->tstamp > (s64)(ktime_cache + q_horizon); +} + +SEC("struct_ops/bpf_fq_enqueue") +int BPF_PROG(bpf_fq_enqueue, struct 
sk_buff *skb, struct Qdisc *sch, + struct bpf_sk_buff_ptr *to_free) +{ + struct fq_flow_node *flow = NULL, *flow_copy; + struct fq_stashed_flow *sflow; + u64 time_to_send, jiffies; + u32 hash, sk_hash; + struct skb_node *skbn; + bool connected; + + if (fq_qlen >= q_plimit) + goto drop; + + if (!skb->tstamp) { + time_to_send = ktime_cache = bpf_ktime_get_ns(); + } else { + if (fq_packet_beyond_horizon(skb)) { + ktime_cache = bpf_ktime_get_ns(); + if (fq_packet_beyond_horizon(skb)) { + if (q_horizon_drop) + goto drop; + + skb->tstamp = ktime_cache + q_horizon; + } + } + time_to_send = skb->tstamp; + } + + if (fq_classify(skb, &hash, &sflow, &connected, &sk_hash) < 0) + goto drop; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (!flow) + goto drop; + + if (hash != PRIO_QUEUE) { + if (connected && flow->socket_hash != sk_hash) { + flow->credit = q_initial_quantum; + flow->socket_hash = sk_hash; + if (fq_flow_is_throttled(flow)) { + /* mark the flow as undetached. The reference to the + * throttled flow in fq_delayed will be removed later. 
+ */ + flow_copy = bpf_refcount_acquire(flow); + flow_copy->age = 0; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow_copy); + } + flow->time_next_packet = 0ULL; + } + + if (flow->qlen >= q_flow_plimit) { + bpf_kptr_xchg_back(&sflow->flow, flow); + goto drop; + } + + if (fq_flow_is_detached(flow)) { + if (connected) + flow->socket_hash = sk_hash; + + flow_copy = bpf_refcount_acquire(flow); + + jiffies = bpf_jiffies64(); + if ((s64)(jiffies - (flow_copy->age + q_flow_refill_delay)) > 0) { + if (flow_copy->credit < q_quantum) + flow_copy->credit = q_quantum; + } + flow_copy->age = 0; + fq_flows_add_tail(&fq_new_flows, &fq_new_flows_lock, flow_copy); + } + } + + skbn = bpf_obj_new(typeof(*skbn)); + if (!skbn) { + bpf_kptr_xchg_back(&sflow->flow, flow); + goto drop; + } + + skbn->tstamp = skb->tstamp = time_to_send; + + skb = bpf_kptr_xchg(&skbn->skb, skb); + if (skb) + bpf_qdisc_skb_drop(skb, to_free); + + bpf_spin_lock(&flow->lock); + bpf_rbtree_add(&flow->queue, &skbn->node, skb_tstamp_less); + bpf_spin_unlock(&flow->lock); + + flow->qlen++; + bpf_kptr_xchg_back(&sflow->flow, flow); + + fq_qlen++; + return NET_XMIT_SUCCESS; + +drop: + bpf_qdisc_skb_drop(skb, to_free); + return NET_XMIT_DROP; +} + +static int fq_unset_throttled_flows(u32 index, bool *unset_all) +{ + struct bpf_rb_node *node = NULL; + struct fq_flow_node *flow; + + bpf_spin_lock(&fq_delayed_lock); + + node = bpf_rbtree_first(&fq_delayed); + if (!node) { + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + flow = container_of(node, struct fq_flow_node, rb_node); + if (!*unset_all && flow->time_next_packet > dequeue_now) { + time_next_delayed_flow = flow->time_next_packet; + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + node = bpf_rbtree_remove(&fq_delayed, &flow->rb_node); + + bpf_spin_unlock(&fq_delayed_lock); + + if (!node) + return 1; + + flow = container_of(node, struct fq_flow_node, rb_node); + + /* the flow was recycled during enqueue() */ + if (flow->age != THROTTLED) { + 
bpf_obj_drop(flow); + return 0; + } + + flow->age = 0; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow); + + return 0; +} + +static void fq_flow_set_throttled(struct fq_flow_node *flow) +{ + flow->age = THROTTLED; + + if (time_next_delayed_flow > flow->time_next_packet) + time_next_delayed_flow = flow->time_next_packet; + + bpf_spin_lock(&fq_delayed_lock); + bpf_rbtree_add(&fq_delayed, &flow->rb_node, fn_time_next_packet_less); + bpf_spin_unlock(&fq_delayed_lock); +} + +static void fq_check_throttled(void) +{ + bool unset_all = false; + unsigned long sample; + + if (time_next_delayed_flow > dequeue_now) + return; + + sample = (unsigned long)(dequeue_now - time_next_delayed_flow); + unthrottle_latency_ns -= unthrottle_latency_ns >> 3; + unthrottle_latency_ns += sample >> 3; + + time_next_delayed_flow = ~0ULL; + bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &unset_all, 0); +} + +static struct sk_buff* +fq_dequeue_nonprio_flows(u32 index, struct dequeue_nonprio_ctx *ctx) +{ + u64 time_next_packet, time_to_send; + struct bpf_rb_node *rb_node; + struct sk_buff *skb = NULL; + struct bpf_list_head *head; + struct bpf_list_node *node; + struct bpf_spin_lock *lock; + struct fq_flow_node *flow; + struct skb_node *skbn; + bool is_empty; + + head = &fq_new_flows; + lock = &fq_new_flows_lock; + bpf_spin_lock(&fq_new_flows_lock); + node = bpf_list_pop_front(&fq_new_flows); + bpf_spin_unlock(&fq_new_flows_lock); + if (!node) { + head = &fq_old_flows; + lock = &fq_old_flows_lock; + bpf_spin_lock(&fq_old_flows_lock); + node = bpf_list_pop_front(&fq_old_flows); + bpf_spin_unlock(&fq_old_flows_lock); + if (!node) { + if (time_next_delayed_flow != ~0ULL) + ctx->expire = time_next_delayed_flow; + ctx->stop_iter = true; + return NULL; + } + } + + flow = container_of(node, struct fq_flow_node, list_node); + if (flow->credit <= 0) { + flow->credit += q_quantum; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow); + return NULL; + } + + bpf_spin_lock(&flow->lock); + 
rb_node = bpf_rbtree_first(&flow->queue); + if (!rb_node) { + bpf_spin_unlock(&flow->lock); + is_empty = fq_flows_is_empty(&fq_old_flows, &fq_old_flows_lock); + if (head == &fq_new_flows && !is_empty) + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow); + else + fq_flow_set_detached(flow); + + return NULL; + } + + skbn = container_of(rb_node, struct skb_node, node); + time_to_send = skbn->tstamp; + + time_next_packet = (time_to_send > flow->time_next_packet) ? + time_to_send : flow->time_next_packet; + if (dequeue_now < time_next_packet) { + bpf_spin_unlock(&flow->lock); + flow->time_next_packet = time_next_packet; + fq_flow_set_throttled(flow); + return NULL; + } + + rb_node = bpf_rbtree_remove(&flow->queue, rb_node); + bpf_spin_unlock(&flow->lock); + + if (!rb_node) + goto out; + + skbn = container_of(rb_node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + bpf_obj_drop(skbn); + + if (!skb) + goto out; + + flow->credit -= qdisc_skb_cb(skb)->pkt_len; + flow->qlen--; + fq_qlen--; + + ctx->stop_iter = true; + +out: + fq_flows_add_head(head, lock, flow); + return skb; +} + +static struct sk_buff *fq_dequeue_prio(void) +{ + struct fq_flow_node *flow = NULL; + struct fq_stashed_flow *sflow; + struct bpf_rb_node *rb_node; + struct sk_buff *skb = NULL; + struct skb_node *skbn; + u32 hash = NUM_QUEUE; + + sflow = bpf_map_lookup_elem(&fq_stashed_flows, &hash); + if (!sflow) + return NULL; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (!flow) + return NULL; + + bpf_spin_lock(&flow->lock); + rb_node = bpf_rbtree_first(&flow->queue); + if (!rb_node) { + bpf_spin_unlock(&flow->lock); + goto xchg_flow_back; + } + + skbn = container_of(rb_node, struct skb_node, node); + rb_node = bpf_rbtree_remove(&flow->queue, &skbn->node); + bpf_spin_unlock(&flow->lock); + + if (!rb_node) { + skb = NULL; + goto xchg_flow_back; + } + + skbn = container_of(rb_node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + bpf_obj_drop(skbn); + + fq_qlen--; 
+ +xchg_flow_back: + bpf_kptr_xchg_back(&sflow->flow, flow); + + return skb; +} + +SEC("struct_ops/bpf_fq_dequeue") +struct sk_buff *BPF_PROG(bpf_fq_dequeue, struct Qdisc *sch) +{ + struct dequeue_nonprio_ctx cb_ctx = {}; + struct sk_buff *skb = NULL; + int i; + + skb = fq_dequeue_prio(); + if (skb) + return skb; + + ktime_cache = dequeue_now = bpf_ktime_get_ns(); + fq_check_throttled(); + bpf_for(i, 0, q_plimit) { + skb = fq_dequeue_nonprio_flows(i, &cb_ctx); + if (cb_ctx.stop_iter) + break; + }; + + if (skb) + return skb; + + if (cb_ctx.expire) + bpf_qdisc_watchdog_schedule(sch, cb_ctx.expire, q_timer_slack); + + return NULL; +} + +static int +fq_reset_flows(u32 index, void *ctx) +{ + struct bpf_list_node *node; + struct fq_flow_node *flow; + + bpf_spin_lock(&fq_new_flows_lock); + node = bpf_list_pop_front(&fq_new_flows); + bpf_spin_unlock(&fq_new_flows_lock); + if (!node) { + bpf_spin_lock(&fq_old_flows_lock); + node = bpf_list_pop_front(&fq_old_flows); + bpf_spin_unlock(&fq_old_flows_lock); + if (!node) + return 1; + } + + flow = container_of(node, struct fq_flow_node, list_node); + bpf_obj_drop(flow); + + return 0; +} + +static int +fq_reset_stashed_flows(u32 index, void *ctx) +{ + struct fq_flow_node *flow = NULL; + struct fq_stashed_flow *sflow; + + sflow = bpf_map_lookup_elem(&fq_stashed_flows, &index); + if (!sflow) + return 0; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (flow) + bpf_obj_drop(flow); + + return 0; +} + +SEC("struct_ops/bpf_fq_reset") +void BPF_PROG(bpf_fq_reset, struct Qdisc *sch) +{ + bool unset_all = true; + fq_qlen = 0; + bpf_loop(NUM_QUEUE + 1, fq_reset_stashed_flows, NULL, 0); + bpf_loop(NUM_QUEUE, fq_reset_flows, NULL, 0); + bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &unset_all, 0); + return; +} + +SEC(".struct_ops") +struct Qdisc_ops fq = { + .enqueue = (void *)bpf_fq_enqueue, + .dequeue = (void *)bpf_fq_dequeue, + .reset = (void *)bpf_fq_reset, + .id = "bpf_fq", +}; From patchwork Sun Jul 14 17:51:30 2024 Content-Type: 
text/plain; charset="utf-8"
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v9 11/11] selftests: Add a bpf netem qdisc to selftest
Date: Sun, 14 Jul 2024 17:51:30 +0000
Message-Id: <20240714175130.4051012-12-amery.hung@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20240714175130.4051012-1-amery.hung@bytedance.com>
References: <20240714175130.4051012-1-amery.hung@bytedance.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

This test implements a simple network emulator qdisc that simulates packet loss and delay. The qdisc uses the Gilbert-Elliott model to decide packet drops. When used with the mq qdisc, the bpf netem qdiscs on different tx queues maintain a global loss-state machine through a shared bpf array map.
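As a rough illustration of the loss model the commit message refers to, here is a plain userspace C sketch of the Gilbert-Elliott two-state Markov chain (an assumption-laden sketch, not the BPF code in this patch; the selftest implements the same logic in BPF against a shared array map). The state names and the `a1`..`a4` parameters mirror the patch's `struct clg_state`: `a1` is the GOOD-to-BAD transition probability, `a2` the BAD-to-GOOD transition probability, `a4` the loss probability while GOOD, and in BAD a packet is lost when the draw exceeds `a3`; all probabilities are scaled onto the full u32 range.

```c
/* Userspace sketch of the Gilbert-Elliott loss model (illustration only).
 * The type and helper names here are hypothetical; only the GOOD_STATE /
 * BAD_STATE names and the a1..a4 semantics come from the patch itself. */
#include <stdint.h>

enum ge_state { GOOD_STATE, BAD_STATE };

struct ge_model {
        enum ge_state state;
        uint32_t a1;    /* P(GOOD -> BAD), scaled to the u32 range */
        uint32_t a2;    /* P(BAD -> GOOD) */
        uint32_t a3;    /* in BAD, packet survives if draw <= a3 */
        uint32_t a4;    /* loss probability while in GOOD */
};

/* r1 drives the state transition, r2 the drop decision, as in the
 * patch's loss_gilb_ell(). Returns nonzero if the packet is dropped. */
static int ge_loss(struct ge_model *m, uint32_t r1, uint32_t r2)
{
        int drop = 0;

        switch (m->state) {
        case GOOD_STATE:
                if (r1 < m->a1)
                        m->state = BAD_STATE;   /* channel degrades */
                if (r2 < m->a4)
                        drop = 1;               /* rare loss while GOOD */
                break;
        case BAD_STATE:
                if (r1 < m->a2)
                        m->state = GOOD_STATE;  /* channel recovers */
                if (r2 > m->a3)
                        drop = 1;               /* bursty loss while BAD */
                break;
        }
        return drop;
}
```

In the BPF version the state lives in a single-entry array map so that the netem instances attached to different mq tx queues share one chain, which is why the patch flips the state with `__sync_val_compare_and_swap` rather than a plain store.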
Signed-off-by: Amery Hung
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |  30 ++
 .../selftests/bpf/progs/bpf_qdisc_netem.c     | 258 ++++++++++++++++++
 2 files changed, 288 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index 394bf5a4adae..ec9c0d166e89 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -6,6 +6,13 @@
 #include "bpf_qdisc_fifo.skel.h"
 #include "bpf_qdisc_fq.skel.h"
 
+struct crndstate {
+        u32 last;
+        u32 rho;
+};
+
+#include "bpf_qdisc_netem.skel.h"
+
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
 #endif
@@ -176,10 +183,33 @@ static void test_fq(void)
 	bpf_qdisc_fq__destroy(fq_skel);
 }
 
+static void test_netem(void)
+{
+        struct bpf_qdisc_netem *netem_skel;
+        struct bpf_link *link;
+
+        netem_skel = bpf_qdisc_netem__open_and_load();
+        if (!ASSERT_OK_PTR(netem_skel, "bpf_qdisc_netem__open_and_load"))
+                return;
+
+        link = bpf_map__attach_struct_ops(netem_skel->maps.netem);
+        if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+                bpf_qdisc_netem__destroy(netem_skel);
+                return;
+        }
+
+        do_test("bpf_netem");
+
+        bpf_link__destroy(link);
+        bpf_qdisc_netem__destroy(netem_skel);
+}
+
 void test_bpf_qdisc(void)
 {
 	if (test__start_subtest("fifo"))
 		test_fifo();
 	if (test__start_subtest("fq"))
 		test_fq();
+        if (test__start_subtest("netem"))
+                test_netem();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c
new file mode 100644
index 000000000000..39be88a5f16a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c
@@ -0,0 +1,258 @@
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int q_loss_model = CLG_GILB_ELL;
+unsigned int q_limit = 1000;
+signed long q_latency = 0;
+signed long q_jitter = 0;
+unsigned int q_loss = 1;
+unsigned int q_qlen = 0;
+
+struct crndstate q_loss_cor = {.last = 0, .rho = 0,};
+struct crndstate q_delay_cor = {.last = 0, .rho = 0,};
+
+struct skb_node {
+        u64 tstamp;
+        struct sk_buff __kptr *skb;
+        struct bpf_rb_node node;
+};
+
+struct clg_state {
+        u64 state;
+        u32 a1;
+        u32 a2;
+        u32 a3;
+        u32 a4;
+        u32 a5;
+};
+
+struct {
+        __uint(type, BPF_MAP_TYPE_ARRAY);
+        __type(key, __u32);
+        __type(value, struct clg_state);
+        __uint(max_entries, 1);
+} g_clg_state SEC(".maps");
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock t_root_lock;
+private(A) struct bpf_rb_root t_root __contains(skb_node, node);
+
+static bool skb_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+        struct skb_node *skbn_a;
+        struct skb_node *skbn_b;
+
+        skbn_a = container_of(a, struct skb_node, node);
+        skbn_b = container_of(b, struct skb_node, node);
+
+        return skbn_a->tstamp < skbn_b->tstamp;
+}
+
+static u32 get_crandom(struct crndstate *state)
+{
+        u64 value, rho;
+        unsigned long answer;
+
+        if (!state || state->rho == 0) /* no correlation */
+                return bpf_get_prandom_u32();
+
+        value = bpf_get_prandom_u32();
+        rho = (u64)state->rho + 1;
+        answer = (value * ((1ull<<32) - rho) + state->last * rho) >> 32;
+        state->last = answer;
+        return answer;
+}
+
+static s64 tabledist(s64 mu, s32 sigma, struct crndstate *state)
+{
+        u32 rnd;
+
+        if (sigma == 0)
+                return mu;
+
+        rnd = get_crandom(state);
+
+        /* default uniform distribution */
+        return ((rnd % (2 * (u32)sigma)) + mu) - sigma;
+}
+
+static bool loss_gilb_ell(void)
+{
+        struct clg_state *clg;
+        u32 r1, r2, key = 0;
+        bool ret = false;
+
+        clg = bpf_map_lookup_elem(&g_clg_state, &key);
+        if (!clg)
+                return false;
+
+        r1 = bpf_get_prandom_u32();
+        r2 = bpf_get_prandom_u32();
+
+        switch (clg->state) {
+        case GOOD_STATE:
+                if (r1 < clg->a1)
+                        __sync_val_compare_and_swap(&clg->state,
+                                                    GOOD_STATE, BAD_STATE);
+                if (r2 < clg->a4)
+                        ret = true;
+                break;
+        case BAD_STATE:
+                if (r1 < clg->a2)
+                        __sync_val_compare_and_swap(&clg->state,
+                                                    BAD_STATE, GOOD_STATE);
+                if (r2 > clg->a3)
+                        ret = true;
+        }
+
+        return ret;
+}
+
+static bool loss_event(void)
+{
+        switch (q_loss_model) {
+        case CLG_RANDOM:
+                return q_loss && q_loss >= get_crandom(&q_loss_cor);
+        case CLG_GILB_ELL:
+                return loss_gilb_ell();
+        }
+
+        return false;
+}
+
+SEC("struct_ops/bpf_netem_enqueue")
+int BPF_PROG(bpf_netem_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+             struct bpf_sk_buff_ptr *to_free)
+{
+        struct skb_node *skbn;
+        int count = 1;
+        s64 delay = 0;
+        u64 now;
+
+        if (loss_event())
+                --count;
+
+        if (count == 0) {
+                bpf_qdisc_skb_drop(skb, to_free);
+                return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+        }
+
+        q_qlen++;
+        if (q_qlen > q_limit) {
+                bpf_qdisc_skb_drop(skb, to_free);
+                return NET_XMIT_DROP;
+        }
+
+        skbn = bpf_obj_new(typeof(*skbn));
+        if (!skbn) {
+                bpf_qdisc_skb_drop(skb, to_free);
+                return NET_XMIT_DROP;
+        }
+
+        skb = bpf_kptr_xchg(&skbn->skb, skb);
+        if (skb)
+                bpf_qdisc_skb_drop(skb, to_free);
+
+        delay = tabledist(q_latency, q_jitter, &q_delay_cor);
+        now = bpf_ktime_get_ns();
+        skbn->tstamp = now + delay;
+
+        bpf_spin_lock(&t_root_lock);
+        bpf_rbtree_add(&t_root, &skbn->node, skb_tstamp_less);
+        bpf_spin_unlock(&t_root_lock);
+
+        return NET_XMIT_SUCCESS;
+}
+
+SEC("struct_ops/bpf_netem_dequeue")
+struct sk_buff *BPF_PROG(bpf_netem_dequeue, struct Qdisc *sch)
+{
+        struct sk_buff *skb = NULL;
+        struct bpf_rb_node *node;
+        struct skb_node *skbn;
+        u64 now, tstamp;
+
+        now = bpf_ktime_get_ns();
+
+        bpf_spin_lock(&t_root_lock);
+        node = bpf_rbtree_first(&t_root);
+        if (!node) {
+                bpf_spin_unlock(&t_root_lock);
+                return NULL;
+        }
+
+        skbn = container_of(node, struct skb_node, node);
+        tstamp = skbn->tstamp;
+        if (tstamp <= now) {
+                node = bpf_rbtree_remove(&t_root, node);
+                bpf_spin_unlock(&t_root_lock);
+
+                if (!node)
+                        return NULL;
+
+                skbn = container_of(node, struct skb_node, node);
+                skb = bpf_kptr_xchg(&skbn->skb, skb);
+                bpf_obj_drop(skbn);
+
+                q_qlen--;
+                return skb;
+        }
+
+        bpf_spin_unlock(&t_root_lock);
+        bpf_qdisc_watchdog_schedule(sch, tstamp, 0);
+        return NULL;
+}
+
+SEC("struct_ops/bpf_netem_init")
+int BPF_PROG(bpf_netem_init, struct Qdisc *sch, struct nlattr *opt,
+             struct netlink_ext_ack *extack)
+{
+        return 0;
+}
+
+SEC("struct_ops/bpf_netem_reset")
+void BPF_PROG(bpf_netem_reset, struct Qdisc *sch)
+{
+        struct bpf_rb_node *node;
+        struct skb_node *skbn;
+        int i;
+
+        bpf_for(i, 0, q_limit) {
+                struct sk_buff *skb = NULL;
+
+                bpf_spin_lock(&t_root_lock);
+                node = bpf_rbtree_first(&t_root);
+                if (!node) {
+                        bpf_spin_unlock(&t_root_lock);
+                        break;
+                }
+
+                skbn = container_of(node, struct skb_node, node);
+                node = bpf_rbtree_remove(&t_root, node);
+                bpf_spin_unlock(&t_root_lock);
+
+                if (!node)
+                        continue;
+
+                skbn = container_of(node, struct skb_node, node);
+                skb = bpf_kptr_xchg(&skbn->skb, skb);
+                if (skb)
+                        bpf_skb_release(skb);
+                bpf_obj_drop(skbn);
+        }
+        q_qlen = 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops netem = {
+        .enqueue = (void *)bpf_netem_enqueue,
+        .dequeue = (void *)bpf_netem_dequeue,
+        .init    = (void *)bpf_netem_init,
+        .reset   = (void *)bpf_netem_reset,
+        .id      = "bpf_netem",
+};