From patchwork Fri Dec 13 23:29:46 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908023
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 01/13] bpf: Support getting referenced kptr from struct_ops argument
Date: Fri, 13 Dec 2024 23:29:46 +0000
Message-Id: <20241213232958.2388301-2-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Allow struct_ops programs to acquire referenced kptrs from arguments by directly reading the argument. At the beginning of the main program, the verifier will acquire a reference for a struct_ops argument tagged with "__ref" in the stub function. The user will be able to access the referenced kptr directly by reading the context, as long as it has not been released by the program.

This new mechanism for acquiring referenced kptrs (compared with the existing "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons. In the first use case, Qdisc_ops, an skb is passed to .enqueue in the first argument. 
This mechanism provides a natural way for users to get a referenced kptr in the .enqueue struct_ops programs and makes sure that a qdisc will always enqueue or drop the skb. Signed-off-by: Amery Hung --- include/linux/bpf.h | 3 +++ kernel/bpf/bpf_struct_ops.c | 26 ++++++++++++++++++++------ kernel/bpf/btf.c | 1 + kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++++++++++--- 4 files changed, 56 insertions(+), 9 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1b84613b10ac..72bf941d1daf 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -968,6 +968,7 @@ struct bpf_insn_access_aux { struct { struct btf *btf; u32 btf_id; + u32 ref_obj_id; }; }; struct bpf_verifier_log *log; /* for verbose logs */ @@ -1480,6 +1481,8 @@ struct bpf_ctx_arg_aux { enum bpf_reg_type reg_type; struct btf *btf; u32 btf_id; + u32 ref_obj_id; + bool refcounted; }; struct btf_mod_pair { diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index fda3dd2ee984..6e7795744f6a 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -145,6 +145,7 @@ void bpf_struct_ops_image_free(void *image) } #define MAYBE_NULL_SUFFIX "__nullable" +#define REFCOUNTED_SUFFIX "__ref" #define MAX_STUB_NAME 128 /* Return the type info of a stub function, if it exists. @@ -206,9 +207,11 @@ static int prepare_arg_info(struct btf *btf, struct bpf_struct_ops_arg_info *arg_info) { const struct btf_type *stub_func_proto, *pointed_type; + bool is_nullable = false, is_refcounted = false; const struct btf_param *stub_args, *args; struct bpf_ctx_arg_aux *info, *info_buf; u32 nargs, arg_no, info_cnt = 0; + const char *suffix; u32 arg_btf_id; int offset; @@ -240,12 +243,19 @@ static int prepare_arg_info(struct btf *btf, info = info_buf; for (arg_no = 0; arg_no < nargs; arg_no++) { /* Skip arguments that is not suffixed with - * "__nullable". + * "__nullable or __ref". 
*/ - if (!btf_param_match_suffix(btf, &stub_args[arg_no], - MAYBE_NULL_SUFFIX)) + is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no], + MAYBE_NULL_SUFFIX); + is_refcounted = btf_param_match_suffix(btf, &stub_args[arg_no], + REFCOUNTED_SUFFIX); + if (!is_nullable && !is_refcounted) continue; + if (is_nullable) + suffix = MAYBE_NULL_SUFFIX; + else if (is_refcounted) + suffix = REFCOUNTED_SUFFIX; /* Should be a pointer to struct */ pointed_type = btf_type_resolve_ptr(btf, args[arg_no].type, @@ -253,7 +263,7 @@ static int prepare_arg_info(struct btf *btf, if (!pointed_type || !btf_type_is_struct(pointed_type)) { pr_warn("stub function %s__%s has %s tagging to an unsupported type\n", - st_ops_name, member_name, MAYBE_NULL_SUFFIX); + st_ops_name, member_name, suffix); goto err_out; } @@ -271,11 +281,15 @@ static int prepare_arg_info(struct btf *btf, } /* Fill the information of the new argument */ - info->reg_type = - PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL; info->btf_id = arg_btf_id; info->btf = btf; info->offset = offset; + if (is_nullable) { + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL; + } else if (is_refcounted) { + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID; + info->refcounted = true; + } info++; info_cnt++; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index e7a59e6462a9..a05ccf9ee032 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6580,6 +6580,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, info->reg_type = ctx_arg_info->reg_type; info->btf = ctx_arg_info->btf ? 
: btf_vmlinux; info->btf_id = ctx_arg_info->btf_id; + info->ref_obj_id = ctx_arg_info->ref_obj_id; return true; } } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9f5de8d4fbd0..69753096075f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1402,6 +1402,17 @@ static int release_reference_state(struct bpf_func_state *state, int ptr_id) return -EINVAL; } +static bool find_reference_state(struct bpf_func_state *state, int ptr_id) +{ + int i; + + for (i = 0; i < state->acquired_refs; i++) + if (state->refs[i].id == ptr_id) + return true; + + return false; +} + static int release_lock_state(struct bpf_func_state *state, int type, int id, void *ptr) { int i, last_idx; @@ -5798,7 +5809,8 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off, /* check access to 'struct bpf_context' fields. Supports fixed offsets only */ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size, enum bpf_access_type t, enum bpf_reg_type *reg_type, - struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx) + struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx, + u32 *ref_obj_id) { struct bpf_insn_access_aux info = { .reg_type = *reg_type, @@ -5820,8 +5832,16 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, *is_retval = info.is_retval; if (base_type(*reg_type) == PTR_TO_BTF_ID) { + if (info.ref_obj_id && + !find_reference_state(cur_func(env), info.ref_obj_id)) { + verbose(env, "invalid bpf_context access off=%d. 
Reference may already be released\n", + off); + return -EACCES; + } + *btf = info.btf; *btf_id = info.btf_id; + *ref_obj_id = info.ref_obj_id; } else { env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size; } @@ -7135,7 +7155,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn struct bpf_retval_range range; enum bpf_reg_type reg_type = SCALAR_VALUE; struct btf *btf = NULL; - u32 btf_id = 0; + u32 btf_id = 0, ref_obj_id = 0; if (t == BPF_WRITE && value_regno >= 0 && is_pointer_value(env, value_regno)) { @@ -7148,7 +7168,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn return err; err = check_ctx_access(env, insn_idx, off, size, t, ®_type, &btf, - &btf_id, &is_retval, is_ldsx); + &btf_id, &is_retval, is_ldsx, &ref_obj_id); if (err) verbose_linfo(env, insn_idx, "; "); if (!err && t == BPF_READ && value_regno >= 0) { @@ -7179,6 +7199,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn if (base_type(reg_type) == PTR_TO_BTF_ID) { regs[value_regno].btf = btf; regs[value_regno].btf_id = btf_id; + regs[value_regno].ref_obj_id = ref_obj_id; } } regs[value_regno].type = reg_type; @@ -21662,6 +21683,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog) { bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); struct bpf_subprog_info *sub = subprog_info(env, subprog); + struct bpf_ctx_arg_aux *ctx_arg_info; struct bpf_verifier_state *state; struct bpf_reg_state *regs; int ret, i; @@ -21769,6 +21791,13 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog) mark_reg_known_zero(env, regs, BPF_REG_1); } + if (!subprog && env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) { + ctx_arg_info = (struct bpf_ctx_arg_aux *)env->prog->aux->ctx_arg_info; + for (i = 0; i < env->prog->aux->ctx_arg_info_size; i++) + if (ctx_arg_info[i].refcounted) + ctx_arg_info[i].ref_obj_id = acquire_reference_state(env, 0); + } + ret = do_check(env); out: /* check 
for NULL is necessary, since cur_state can be freed inside

From patchwork Fri Dec 13 23:29:47 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908024
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 02/13] selftests/bpf: Test referenced kptr arguments of struct_ops programs
Date: Fri, 13 Dec 2024 23:29:47 +0000
Message-Id: <20241213232958.2388301-3-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Test referenced kptrs acquired through struct_ops arguments tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed in multiple paths as long as it hasn't been released. In the fail cases, we first confirm that a referenced kptr acquired through a struct_ops argument is not allowed to be leaked. Then, we make sure this new referenced kptr acquiring mechanism does not accidentally allow referenced kptrs to flow into global subprograms through their arguments. 
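In C (rather than the inline-assembly form used by the selftest below), the shape of a program that consumes a "__ref"-tagged argument looks roughly like this. This is a hypothetical sketch modeled on the selftests in this patch, not code from the series; the include paths and the use of BPF_PROG are assumptions:

```c
/* Sketch: a struct_ops program whose task argument was tagged
 * "task__ref" in the kernel-side stub function.  Reading the argument
 * from the context yields a referenced kptr (ref_obj_id > 0), which
 * the program must release before exiting -- otherwise the verifier
 * rejects the program with an "Unreleased reference" error.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

extern void bpf_task_release(struct task_struct *p) __ksym;

SEC("struct_ops/test_refcounted")
int BPF_PROG(test_refcounted, int dummy, struct task_struct *task)
{
	/* task is trusted and referenced; release it on every path. */
	bpf_task_release(task);
	return 0;
}
```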
Signed-off-by: Amery Hung --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 7 ++ .../selftests/bpf/bpf_testmod/bpf_testmod.h | 2 + .../prog_tests/test_struct_ops_refcounted.c | 58 ++++++++++++++++ .../bpf/progs/struct_ops_refcounted.c | 67 +++++++++++++++++++ ...ruct_ops_refcounted_fail__global_subprog.c | 32 +++++++++ .../struct_ops_refcounted_fail__ref_leak.c | 17 +++++ 6 files changed, 183 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 987d41af71d2..244234546ae2 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -1135,10 +1135,17 @@ static int bpf_testmod_ops__test_maybe_null(int dummy, return 0; } +static int bpf_testmod_ops__test_refcounted(int dummy, + struct task_struct *task__ref) +{ + return 0; +} + static struct bpf_testmod_ops __bpf_testmod_ops = { .test_1 = bpf_testmod_test_1, .test_2 = bpf_testmod_test_2, .test_maybe_null = bpf_testmod_ops__test_maybe_null, + .test_refcounted = bpf_testmod_ops__test_refcounted, }; struct bpf_struct_ops bpf_bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index fb7dff47597a..0e31586c1353 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -36,6 +36,8 @@ struct bpf_testmod_ops { /* Used to test nullable arguments. 
*/ int (*test_maybe_null)(int dummy, struct task_struct *task); int (*unsupported_ops)(void); + /* Used to test ref_acquired arguments. */ + int (*test_refcounted)(int dummy, struct task_struct *task); /* The following fields are used to test shadow copies. */ char onebyte; diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c new file mode 100644 index 000000000000..976df951b700 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c @@ -0,0 +1,58 @@ +#include + +#include "struct_ops_refcounted.skel.h" +#include "struct_ops_refcounted_fail__ref_leak.skel.h" +#include "struct_ops_refcounted_fail__global_subprog.skel.h" + +/* Test that the verifier accepts a program that first acquires a referenced + * kptr through context and then releases the reference + */ +static void refcounted(void) +{ + struct struct_ops_refcounted *skel; + + skel = struct_ops_refcounted__open_and_load(); + if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load")) + return; + + struct_ops_refcounted__destroy(skel); +} + +/* Test that the verifier rejects a program that acquires a referenced + * kptr through context without releasing the reference + */ +static void refcounted_fail__ref_leak(void) +{ + struct struct_ops_refcounted_fail__ref_leak *skel; + + skel = struct_ops_refcounted_fail__ref_leak__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load")) + return; + + struct_ops_refcounted_fail__ref_leak__destroy(skel); +} + +/* Test that the verifier rejects a program that contains a global + * subprogram with referenced kptr arguments + */ +static void refcounted_fail__global_subprog(void) +{ + struct struct_ops_refcounted_fail__global_subprog *skel; + + skel = struct_ops_refcounted_fail__global_subprog__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load")) + return; + + 
struct_ops_refcounted_fail__global_subprog__destroy(skel); +} + +void test_struct_ops_refcounted(void) +{ + if (test__start_subtest("refcounted")) + refcounted(); + if (test__start_subtest("refcounted_fail__ref_leak")) + refcounted_fail__ref_leak(); + if (test__start_subtest("refcounted_fail__global_subprog")) + refcounted_fail__global_subprog(); +} + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c new file mode 100644 index 000000000000..2c1326668b92 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c @@ -0,0 +1,67 @@ +#include +#include +#include "../bpf_testmod/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +extern void bpf_task_release(struct task_struct *p) __ksym; + +/* This is a test BPF program that uses struct_ops to access a referenced + * kptr argument. This is a test for the verifier to ensure that it + * 1) recognizes the task as a referenced object (i.e., ref_obj_id > 0), and + * 2) the same reference can be acquired from multiple paths as long as it + * has not been released. + * + * test_refcounted() is equivalent to the C code below. It is written in assembly + * to avoid reads from task (i.e., getting referenced kptrs to task) being merged + * into a single path by the compiler. 
+ * + * int test_refcounted(int dummy, struct task_struct *task) + * { + * if (dummy % 2) + * bpf_task_release(task); + * else + * bpf_task_release(task); + * return 0; + * } + */ +SEC("struct_ops/test_refcounted") +int test_refcounted(unsigned long long *ctx) +{ + asm volatile (" \ + /* r6 = dummy */ \ + r6 = *(u64 *)(r1 + 0x0); \ + /* if (r6 & 0x1 != 0) */ \ + r6 &= 0x1; \ + if r6 == 0 goto l0_%=; \ + /* r1 = task */ \ + r1 = *(u64 *)(r1 + 0x8); \ + call %[bpf_task_release]; \ + goto l1_%=; \ +l0_%=: /* r1 = task */ \ + r1 = *(u64 *)(r1 + 0x8); \ + call %[bpf_task_release]; \ +l1_%=: /* return 0 */ \ +" : + : __imm(bpf_task_release) + : __clobber_all); + return 0; +} + +/* BTF FUNC records are not generated for kfuncs referenced + * from inline assembly. These records are necessary for + * libbpf to link the program. The function below is a hack + * to ensure that BTF FUNC records are generated. + */ +void __btf_root(void) +{ + bpf_task_release(NULL); +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_refcounted = { + .test_refcounted = (void *)test_refcounted, +}; + + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c new file mode 100644 index 000000000000..c7e84e63b053 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c @@ -0,0 +1,32 @@ +#include +#include +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +extern void bpf_task_release(struct task_struct *p) __ksym; + +__noinline int subprog_release(__u64 *ctx __arg_ctx) +{ + struct task_struct *task = (struct task_struct *)ctx[1]; + int dummy = (int)ctx[0]; + + bpf_task_release(task); + + return dummy + 1; +} + +SEC("struct_ops/test_refcounted") +int test_refcounted(unsigned long long *ctx) +{ + struct task_struct *task = (struct task_struct *)ctx[1]; + + bpf_task_release(task); + + return 
subprog_release(ctx); +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_ref_acquire = { + .test_refcounted = (void *)test_refcounted, +}; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c new file mode 100644 index 000000000000..6e82859eb187 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c @@ -0,0 +1,17 @@ +#include +#include +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +SEC("struct_ops/test_refcounted") +int BPF_PROG(test_refcounted, int dummy, + struct task_struct *task) +{ + return 0; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_ref_acquire = { + .test_refcounted = (void *)test_refcounted, +};

From patchwork Fri Dec 13 23:29:48 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908025
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 03/13] bpf: Allow struct_ops prog to return referenced kptr
Date: Fri, 13 Dec 2024 23:29:48 +0000
Message-Id: <20241213232958.2388301-4-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: 
<20241213232958.2388301-1-amery.hung@bytedance.com>

Allow a struct_ops program to return a referenced kptr if the struct_ops operator's return type is a struct pointer. To make sure the returned pointer remains valid in the kernel, several constraints are required:

1) The type of the pointer must match the return type
2) The pointer originally comes from the kernel (not locally allocated)
3) The pointer is in its unmodified form

Implementation-wise, a referenced kptr first needs to be allowed to leak in check_reference_leak() if it is in the return register. Then, in check_return_code(), constraints 1-3 are checked. In addition, since the first user, Qdisc_ops::dequeue, allows a NULL pointer to be returned when there is no skb to be dequeued, we will also allow a scalar value equal to NULL to be returned. In the future, when there is a struct_ops user that always expects a valid pointer to be returned from an operator, we may extend tagging to the return value. We can then tell the verifier to only allow a NULL pointer return if the return value is tagged with MAY_BE_NULL. 
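A dequeue-style operator using this facility could be sketched as below. This is purely illustrative: the Qdisc_ops wiring and the skb-retrieval kfunc arrive later in this series, so the section name, operator signature, and get_queued_skb() helper here are assumptions, not code from this patch:

```c
/* Sketch: returning a referenced kptr from a struct_ops operator whose
 * return type is a struct pointer.  Per the constraints above, the
 * returned pointer must match the operator's return type, originate
 * from the kernel, and be unmodified; returning NULL (scalar zero) is
 * also accepted by the verifier.
 */
SEC("struct_ops/dequeue")
struct sk_buff *BPF_PROG(sketch_dequeue, struct Qdisc *sch)
{
	/* get_queued_skb() is a hypothetical helper returning a
	 * referenced skb, or NULL when the queue is empty. */
	struct sk_buff *skb = get_queued_skb(sch);

	if (!skb)
		return NULL;	/* scalar zero: allowed */
	return skb;		/* unmodified referenced kptr: allowed */
}
```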
Signed-off-by: Amery Hung
---
 kernel/bpf/verifier.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 69753096075f..c04028106710 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10453,6 +10453,8 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
+	enum bpf_prog_type type = resolve_prog_type(env->prog);
+	struct bpf_reg_state *reg = reg_state(env, BPF_REG_0);
 	struct bpf_func_state *state = cur_func(env);
 	bool refs_lingering = false;
 	int i;
@@ -10463,6 +10465,12 @@ static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exi
 	for (i = 0; i < state->acquired_refs; i++) {
 		if (state->refs[i].type != REF_TYPE_PTR)
 			continue;
+		/* Allow struct_ops programs to leak referenced kptr through return value.
+		 * Type checks are performed later in check_return_code.
+		 */
+		if (type == BPF_PROG_TYPE_STRUCT_OPS && !exception_exit &&
+		    reg->ref_obj_id == state->refs[i].id)
+			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
 		refs_lingering = true;
@@ -15993,13 +16001,15 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	const char *exit_ctx = "At program exit";
 	struct tnum enforce_attach_type_range = tnum_unknown;
 	const struct bpf_prog *prog = env->prog;
-	struct bpf_reg_state *reg;
+	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_retval_range range = retval_range(0, 1);
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	int err;
 	struct bpf_func_state *frame = env->cur_state->frame[0];
 	const bool is_subprog = frame->subprogno;
 	bool return_32bit = false;
+	struct btf *btf = bpf_prog_get_target_btf(prog);
+	const struct btf_type *ret_type = NULL;
 
 	/* LSM and struct_ops func-ptr's return type could be "void" */
 	if (!is_subprog || frame->in_exception_callback_fn) {
@@ -16008,10 +16018,31 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		if (prog->expected_attach_type == BPF_LSM_CGROUP)
 			/* See below, can be 0 or 0-1 depending on hook. */
 			break;
-		fallthrough;
+		if (!prog->aux->attach_func_proto->type)
+			return 0;
+		break;
 	case BPF_PROG_TYPE_STRUCT_OPS:
 		if (!prog->aux->attach_func_proto->type)
 			return 0;
+
+		if (frame->in_exception_callback_fn)
+			break;
+
+		/* Allow a struct_ops program to return a referenced kptr if it
+		 * matches the operator's return type and is in its unmodified
+		 * form. A scalar zero (i.e., a null pointer) is also allowed.
+		 */
+		ret_type = btf_type_by_id(btf, prog->aux->attach_func_proto->type);
+		if (btf_type_is_ptr(ret_type) && reg->type & PTR_TO_BTF_ID &&
+		    reg->ref_obj_id) {
+			if (reg->btf_id != ret_type->type) {
+				verbose(env, "Return kptr type, struct %s, doesn't match function prototype, struct %s\n",
+					btf_type_name(reg->btf, reg->btf_id),
+					btf_type_name(btf, ret_type->type));
+				return -EINVAL;
+			}
+			return __check_ptr_off_reg(env, reg, regno, false);
+		}
 		break;
 	default:
 		break;
 	}
@@ -16033,8 +16064,6 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		return -EACCES;
 	}
 
-	reg = cur_regs(env) + regno;
-
 	if (frame->in_async_callback_fn) {
 		/* enforce return zero from async callbacks like timer */
 		exit_ctx = "At async callback return";
@@ -16133,6 +16162,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	case BPF_PROG_TYPE_NETFILTER:
 		range = retval_range(NF_DROP, NF_ACCEPT);
 		break;
+	case BPF_PROG_TYPE_STRUCT_OPS:
+		if (!ret_type || !btf_type_is_ptr(ret_type))
+			return 0;
+		range = retval_range(0, 0);
+		break;
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
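The acceptance rules described in the commit message can be summarized in a small userspace model. This is only a hedged sketch for illustration: `model_struct_ops_retval` and its flag parameters are invented here, and merely mirror the decisions the patch adds to check_return_code() and check_reference_leak(); they are not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the struct_ops return-value checks. Each flag
 * stands in for verifier state that the real code derives from the
 * return register and the operator's BTF function prototype. */
int model_struct_ops_retval(bool ret_type_is_ptr, /* operator returns a struct pointer */
			    bool is_scalar_null,  /* program returned a scalar 0 (NULL) */
			    bool is_kernel_kptr,  /* referenced kptr acquired from the kernel */
			    bool type_matches,    /* kptr struct type equals the prototype's */
			    int reg_off)          /* offset applied to the pointer */
{
	if (!ret_type_is_ptr)
		return 0;	/* non-pointer return types follow the existing rules */
	if (is_scalar_null)
		return 0;	/* e.g. Qdisc_ops::dequeue with an empty queue */
	if (!is_kernel_kptr)
		return -EINVAL;	/* locally allocated objects (bpf_obj_new) are rejected */
	if (!type_matches)
		return -EINVAL;	/* struct type must match the function prototype */
	if (reg_off != 0)
		return -EINVAL;	/* pointer must be in its unmodified form */
	return 0;
}
```

The four rejection paths correspond one-to-one to the failing selftests added in the next patch of the series (local kptr, wrong type, non-zero offset, non-null scalar).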
From patchwork Fri Dec 13 23:29:49 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908026
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 04/13] selftests/bpf: Test returning referenced kptr from struct_ops programs
Date: Fri, 13 Dec 2024 23:29:49 +0000
Message-Id: <20241213232958.2388301-5-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Test struct_ops programs that return a referenced kptr. When the return
type of a struct_ops operator is a pointer to struct, the verifier should
only allow programs that return a scalar NULL or a non-local kptr with
the correct type in its unmodified form.
Signed-off-by: Amery Hung
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  8 ++
 .../selftests/bpf/bpf_testmod/bpf_testmod.h   |  4 +
 .../prog_tests/test_struct_ops_kptr_return.c  | 87 +++++++++++++++++++
 .../bpf/progs/struct_ops_kptr_return.c        | 29 +++++++
 ...uct_ops_kptr_return_fail__invalid_scalar.c | 24 +++++
 .../struct_ops_kptr_return_fail__local_kptr.c | 30 +++++++
 ...uct_ops_kptr_return_fail__nonzero_offset.c | 23 +++++
 .../struct_ops_kptr_return_fail__wrong_type.c | 28 ++++++
 8 files changed, 233 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 244234546ae2..cfab09f16cc2 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -1141,11 +1141,19 @@ static int bpf_testmod_ops__test_refcounted(int dummy,
 	return 0;
 }
 
+static struct task_struct *
+bpf_testmod_ops__test_return_ref_kptr(int dummy, struct task_struct *task__ref,
+				      struct cgroup *cgrp)
+{
+	return NULL;
+}
+
 static struct bpf_testmod_ops __bpf_testmod_ops = {
 	.test_1 = bpf_testmod_test_1,
 	.test_2 = bpf_testmod_test_2,
 	.test_maybe_null = bpf_testmod_ops__test_maybe_null,
 	.test_refcounted = bpf_testmod_ops__test_refcounted,
+	.test_return_ref_kptr = bpf_testmod_ops__test_return_ref_kptr,
 };
 
 struct bpf_struct_ops bpf_bpf_testmod_ops = {
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
index 0e31586c1353..a66659314e67 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
@@ -6,6 +6,7 @@
 #include
 
 struct task_struct;
+struct cgroup;
 
 struct bpf_testmod_test_read_ctx {
 	char *buf;
@@ -38,6 +39,9 @@ struct bpf_testmod_ops {
 	int (*unsupported_ops)(void);
 	/* Used to test ref_acquired arguments. */
 	int (*test_refcounted)(int dummy, struct task_struct *task);
+	/* Used to test returning referenced kptr. */
+	struct task_struct *(*test_return_ref_kptr)(int dummy, struct task_struct *task,
+						    struct cgroup *cgrp);
 
 	/* The following fields are used to test shadow copies. */
 	char onebyte;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
new file mode 100644
index 000000000000..bc2fac39215a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
@@ -0,0 +1,87 @@
+#include
+
+#include "struct_ops_kptr_return.skel.h"
+#include "struct_ops_kptr_return_fail__wrong_type.skel.h"
+#include "struct_ops_kptr_return_fail__invalid_scalar.skel.h"
+#include "struct_ops_kptr_return_fail__nonzero_offset.skel.h"
+#include "struct_ops_kptr_return_fail__local_kptr.skel.h"
+
+/* Test that the verifier accepts a program that acquires a referenced
+ * kptr and releases the reference through return
+ */
+static void kptr_return(void)
+{
+	struct struct_ops_kptr_return *skel;
+
+	skel = struct_ops_kptr_return__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load"))
+		return;
+
+	struct_ops_kptr_return__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns a kptr of the
+ * wrong type
+ */
+static void kptr_return_fail__wrong_type(void)
+{
+	struct struct_ops_kptr_return_fail__wrong_type *skel;
+
+	skel = struct_ops_kptr_return_fail__wrong_type__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__wrong_type__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__wrong_type__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns a non-null scalar */
+static void kptr_return_fail__invalid_scalar(void)
+{
+	struct struct_ops_kptr_return_fail__invalid_scalar *skel;
+
+	skel = struct_ops_kptr_return_fail__invalid_scalar__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__invalid_scalar__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__invalid_scalar__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns kptr with non-zero offset */
+static void kptr_return_fail__nonzero_offset(void)
+{
+	struct struct_ops_kptr_return_fail__nonzero_offset *skel;
+
+	skel = struct_ops_kptr_return_fail__nonzero_offset__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__nonzero_offset__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__nonzero_offset__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns local kptr */
+static void kptr_return_fail__local_kptr(void)
+{
+	struct struct_ops_kptr_return_fail__local_kptr *skel;
+
+	skel = struct_ops_kptr_return_fail__local_kptr__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__local_kptr__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__local_kptr__destroy(skel);
+}
+
+void test_struct_ops_kptr_return(void)
+{
+	if (test__start_subtest("kptr_return"))
+		kptr_return();
+	if (test__start_subtest("kptr_return_fail__wrong_type"))
+		kptr_return_fail__wrong_type();
+	if (test__start_subtest("kptr_return_fail__invalid_scalar"))
+		kptr_return_fail__invalid_scalar();
+	if (test__start_subtest("kptr_return_fail__nonzero_offset"))
+		kptr_return_fail__nonzero_offset();
+	if (test__start_subtest("kptr_return_fail__local_kptr"))
+		kptr_return_fail__local_kptr();
+}
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
new file mode 100644
index 000000000000..29b7719cd4c9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
@@ -0,0 +1,29 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * allow a referenced kptr or a NULL pointer to be returned. A referenced kptr to task
+ * here is acquired automatically as the task argument is tagged with "__ref".
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	if (dummy % 2) {
+		bpf_task_release(task);
+		return NULL;
+	}
+	return task;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
new file mode 100644
index 000000000000..d67982ba8224
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
@@ -0,0 +1,24 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a non-zero scalar value.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	bpf_task_release(task);
+	return (struct task_struct *)1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
new file mode 100644
index 000000000000..9a4247432539
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
@@ -0,0 +1,30 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a local kptr.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *t;
+
+	t = bpf_obj_new(typeof(*task));
+	if (!t)
+		return task;
+
+	return t;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
new file mode 100644
index 000000000000..5bb0b4029d11
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
@@ -0,0 +1,23 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a modified referenced kptr.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	return (struct task_struct *)&task->jobctl;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
new file mode 100644
index 000000000000..32365cb7af49
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
@@ -0,0 +1,28 @@
+#include
+#include
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning referenced kptr. The verifier should
+ * reject programs returning a referenced kptr of the wrong type.
+ */
+SEC("struct_ops/test_return_ref_kptr")
+struct task_struct *BPF_PROG(test_return_ref_kptr, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *ret;
+
+	ret = (struct task_struct *)bpf_cgroup_acquire(cgrp);
+	bpf_task_release(task);
+
+	return ret;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_return_ref_kptr = (void *)test_return_ref_kptr,
+};

From patchwork Fri Dec 13 23:29:50 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908027
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 05/13] bpf: net_sched: Support implementation of Qdisc_ops in bpf
Date: Fri, 13 Dec 2024 23:29:50 +0000
Message-Id: <20241213232958.2388301-6-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Enable users to implement a classless qdisc using bpf. The last few
patches in this series have prepared struct_ops to support the core
operators in Qdisc_ops.
The recent advancement in bpf such as allocated objects, bpf list and bpf rbtree has also provided powerful and flexible building blocks to realize sophisticated scheduling algorithms. Therefore, in this patch, we start allowing qdisc to be implemented using bpf struct_ops. Users can implement Qdisc_ops.{enqueue, dequeue, init, reset, and .destroy in Qdisc_ops in bpf and register the qdisc dynamically into the kernel. Signed-off-by: Cong Wang Co-developed-by: Amery Hung Signed-off-by: Amery Hung --- include/linux/btf.h | 1 + kernel/bpf/btf.c | 4 +- net/sched/Kconfig | 12 +++ net/sched/Makefile | 1 + net/sched/bpf_qdisc.c | 214 ++++++++++++++++++++++++++++++++++++++++ net/sched/sch_api.c | 7 +- net/sched/sch_generic.c | 3 +- 7 files changed, 236 insertions(+), 6 deletions(-) create mode 100644 net/sched/bpf_qdisc.c diff --git a/include/linux/btf.h b/include/linux/btf.h index 4214e76c9168..eb16218fdf52 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -563,6 +563,7 @@ const char *btf_name_by_offset(const struct btf *btf, u32 offset); const char *btf_str_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off); u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id, const struct bpf_prog *prog); u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id, diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index a05ccf9ee032..f733dbf24261 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6375,8 +6375,8 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t) return btf_type_is_int(t); } -static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, - int off) +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, + int off) { const struct btf_param *args; const struct btf_type *t; diff --git 
a/net/sched/Kconfig b/net/sched/Kconfig index 8180d0c12fce..ccd0255da5a5 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -403,6 +403,18 @@ config NET_SCH_ETS If unsure, say N. +config NET_SCH_BPF + bool "BPF-based Qdisc" + depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF + help + This option allows BPF-based queueing disiplines. With BPF struct_ops, + users can implement supported operators in Qdisc_ops using BPF programs. + The queue holding skb can be built with BPF maps or graphs. + + Say Y here if you want to use BPF-based Qdisc. + + If unsure, say N. + menuconfig NET_SCH_DEFAULT bool "Allow override default queue discipline" help diff --git a/net/sched/Makefile b/net/sched/Makefile index 82c3f78ca486..904d784902d1 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_NET_SCH_FQ_PIE) += sch_fq_pie.o obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o +obj-$(CONFIG_NET_SCH_BPF) += bpf_qdisc.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c new file mode 100644 index 000000000000..a2e2db29e5fc --- /dev/null +++ b/net/sched/bpf_qdisc.c @@ -0,0 +1,214 @@ +#include +#include +#include +#include +#include +#include +#include + +static struct bpf_struct_ops bpf_Qdisc_ops; + +struct bpf_sk_buff_ptr { + struct sk_buff *skb; +}; + +static int bpf_qdisc_init(struct btf *btf) +{ + return 0; +} + +static const struct bpf_func_proto * +bpf_qdisc_get_func_proto(enum bpf_func_id func_id, + const struct bpf_prog *prog) +{ + switch (func_id) { + default: + return bpf_base_func_proto(func_id, prog); + } +} + +BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) + +static bool bpf_qdisc_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux 
				       struct bpf_insn_access_aux *info)
+{
+	struct btf *btf = prog->aux->attach_btf;
+	u32 arg;
+
+	arg = get_ctx_arg_idx(btf, prog->aux->attach_func_proto, off);
+	if (!strcmp(prog->aux->attach_func_name, "enqueue")) {
+		if (arg == 2 && type == BPF_READ) {
+			info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED;
+			info->btf = btf;
+			info->btf_id = bpf_sk_buff_ptr_ids[0];
+			return true;
+		}
+	}
+
+	return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
+}
+
+static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
+				       const struct bpf_reg_state *reg,
+				       int off, int size)
+{
+	const struct btf_type *t, *skbt;
+	size_t end;
+
+	skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
+	if (t != skbt) {
+		bpf_log(log, "only read is supported\n");
+		return -EACCES;
+	}
+
+	switch (off) {
+	case offsetof(struct sk_buff, tstamp):
+		end = offsetofend(struct sk_buff, tstamp);
+		break;
+	case offsetof(struct sk_buff, priority):
+		end = offsetofend(struct sk_buff, priority);
+		break;
+	case offsetof(struct sk_buff, mark):
+		end = offsetofend(struct sk_buff, mark);
+		break;
+	case offsetof(struct sk_buff, queue_mapping):
+		end = offsetofend(struct sk_buff, queue_mapping);
+		break;
+	case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, tc_classid):
+		end = offsetof(struct sk_buff, cb) +
+		      offsetofend(struct qdisc_skb_cb, tc_classid);
+		break;
+	case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, data[0]) ...
+	     offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb,
+						     data[QDISC_CB_PRIV_LEN - 1]):
+		end = offsetof(struct sk_buff, cb) +
+		      offsetofend(struct qdisc_skb_cb, data[QDISC_CB_PRIV_LEN - 1]);
+		break;
+	case offsetof(struct sk_buff, tc_index):
+		end = offsetofend(struct sk_buff, tc_index);
+		break;
+	default:
+		bpf_log(log, "no write support to sk_buff at off %d\n", off);
+		return -EACCES;
+	}
+
+	if (off + size > end) {
+		bpf_log(log,
+			"write access at off %d with size %d beyond the member of sk_buff ended at %zu\n",
+			off, size, end);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
+static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = {
+	.get_func_proto = bpf_qdisc_get_func_proto,
+	.is_valid_access = bpf_qdisc_is_valid_access,
+	.btf_struct_access = bpf_qdisc_btf_struct_access,
+};
+
+static int bpf_qdisc_init_member(const struct btf_type *t,
+				 const struct btf_member *member,
+				 void *kdata, const void *udata)
+{
+	const struct Qdisc_ops *uqdisc_ops;
+	struct Qdisc_ops *qdisc_ops;
+	u32 moff;
+
+	uqdisc_ops = (const struct Qdisc_ops *)udata;
+	qdisc_ops = (struct Qdisc_ops *)kdata;
+
+	moff = __btf_member_bit_offset(t, member) / 8;
+	switch (moff) {
+	case offsetof(struct Qdisc_ops, priv_size):
+		if (uqdisc_ops->priv_size)
+			return -EINVAL;
+		return 1;
+	case offsetof(struct Qdisc_ops, static_flags):
+		if (uqdisc_ops->static_flags)
+			return -EINVAL;
+		return 1;
+	case offsetof(struct Qdisc_ops, peek):
+		if (!uqdisc_ops->peek)
+			qdisc_ops->peek = qdisc_peek_dequeued;
+		return 1;
+	case offsetof(struct Qdisc_ops, id):
+		if (bpf_obj_name_cpy(qdisc_ops->id, uqdisc_ops->id,
+				     sizeof(qdisc_ops->id)) <= 0)
+			return -EINVAL;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int bpf_qdisc_reg(void *kdata, struct bpf_link *link)
+{
+	return register_qdisc(kdata);
+}
+
+static void bpf_qdisc_unreg(void *kdata, struct bpf_link *link)
+{
+	return unregister_qdisc(kdata);
+}
+
+static int Qdisc_ops__enqueue(struct sk_buff *skb__ref, struct Qdisc *sch,
+			      struct sk_buff **to_free)
+{
+	return 0;
+}
+
+static struct sk_buff *Qdisc_ops__dequeue(struct Qdisc *sch)
+{
+	return NULL;
+}
+
+static struct sk_buff *Qdisc_ops__peek(struct Qdisc *sch)
+{
+	return NULL;
+}
+
+static int Qdisc_ops__init(struct Qdisc *sch, struct nlattr *arg,
+			   struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static void Qdisc_ops__reset(struct Qdisc *sch)
+{
+}
+
+static void Qdisc_ops__destroy(struct Qdisc *sch)
+{
+}
+
+static struct Qdisc_ops __bpf_ops_qdisc_ops = {
+	.enqueue = Qdisc_ops__enqueue,
+	.dequeue = Qdisc_ops__dequeue,
+	.peek = Qdisc_ops__peek,
+	.init = Qdisc_ops__init,
+	.reset = Qdisc_ops__reset,
+	.destroy = Qdisc_ops__destroy,
+};
+
+static struct bpf_struct_ops bpf_Qdisc_ops = {
+	.verifier_ops = &bpf_qdisc_verifier_ops,
+	.reg = bpf_qdisc_reg,
+	.unreg = bpf_qdisc_unreg,
+	.init_member = bpf_qdisc_init_member,
+	.init = bpf_qdisc_init,
+	.name = "Qdisc_ops",
+	.cfi_stubs = &__bpf_ops_qdisc_ops,
+	.owner = THIS_MODULE,
+};
+
+static int __init bpf_qdisc_kfunc_init(void)
+{
+	return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+}
+late_initcall(bpf_qdisc_kfunc_init);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2eefa4783879..f074053c4232 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -358,7 +359,7 @@ static struct Qdisc_ops *qdisc_lookup_ops(struct nlattr *kind)
 	read_lock(&qdisc_mod_lock);
 	for (q = qdisc_base; q; q = q->next) {
 		if (nla_strcmp(kind, q->id) == 0) {
-			if (!try_module_get(q->owner))
+			if (!bpf_try_module_get(q, q->owner))
 				q = NULL;
 			break;
 		}
@@ -1287,7 +1288,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 			/* We will try again qdisc_lookup_ops,
 			 * so don't keep a reference.
			 */
-			module_put(ops->owner);
+			bpf_module_put(ops, ops->owner);
 			err = -EAGAIN;
 			goto err_out;
 		}
@@ -1398,7 +1399,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	netdev_put(dev, &sch->dev_tracker);
 	qdisc_free(sch);
 err_out2:
-	module_put(ops->owner);
+	bpf_module_put(ops, ops->owner);
 err_out:
 	*errp = err;
 	return NULL;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 38ec18f73de4..1e770ec251a0 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1083,7 +1084,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 		ops->destroy(qdisc);
 
 	lockdep_unregister_key(&qdisc->root_lock_key);
-	module_put(ops->owner);
+	bpf_module_put(ops, ops->owner);
 	netdev_put(dev, &qdisc->dev_tracker);
 
 	trace_qdisc_destroy(qdisc);

From patchwork Fri Dec 13 23:29:51 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908028
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 06/13] bpf: net_sched: Add basic bpf qdisc kfuncs
Date: Fri, 13 Dec 2024 23:29:51 +0000
Message-Id: <20241213232958.2388301-7-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Add basic kfuncs for working on an skb in a qdisc.

Both bpf_qdisc_skb_drop() and bpf_kfree_skb() can be used to release a
reference to an skb. However, bpf_qdisc_skb_drop() can only be called in
.enqueue, where a to_free skb list is available from the kernel to defer
the release. bpf_kfree_skb() should be used elsewhere. It is also used
in bpf_obj_free_fields() when cleaning up skbs in maps and collections.

bpf_skb_get_hash() returns the flow hash of an skb, which can be used to
build flow-based queueing algorithms.

Finally, allow users to create a read-only dynptr via
bpf_dynptr_from_skb().

Signed-off-by: Amery Hung
---
 net/sched/bpf_qdisc.c | 77 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index a2e2db29e5fc..28959424eab0 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -106,6 +106,67 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
 	return 0;
 }
 
+__bpf_kfunc_start_defs();
+
+/* bpf_skb_get_hash - Get the flow hash of an skb.
+ * @skb: The skb to get the flow hash from.
+ */
+__bpf_kfunc u32 bpf_skb_get_hash(struct sk_buff *skb)
+{
+	return skb_get_hash(skb);
+}
+
+/* bpf_kfree_skb - Release an skb's reference and drop it immediately.
+ * @skb: The skb whose reference to be released and dropped.
+ */
+__bpf_kfunc void bpf_kfree_skb(struct sk_buff *skb)
+{
+	kfree_skb(skb);
+}
+
+/* bpf_qdisc_skb_drop - Drop an skb by adding it to a deferred free list.
+ * @skb: The skb whose reference to be released and dropped.
+ * @to_free_list: The list of skbs to be dropped.
+ */
+__bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb,
+				    struct bpf_sk_buff_ptr *to_free_list)
+{
+	__qdisc_drop(skb, (struct sk_buff **)to_free_list);
+}
+
+__bpf_kfunc_end_defs();
+
+#define BPF_QDISC_KFUNC_xxx \
+	BPF_QDISC_KFUNC(bpf_skb_get_hash, KF_TRUSTED_ARGS) \
+	BPF_QDISC_KFUNC(bpf_kfree_skb, KF_RELEASE) \
+	BPF_QDISC_KFUNC(bpf_qdisc_skb_drop, KF_RELEASE) \
+
+BTF_KFUNCS_START(bpf_qdisc_kfunc_ids)
+#define BPF_QDISC_KFUNC(name, flag) BTF_ID_FLAGS(func, name, flag)
+BPF_QDISC_KFUNC_xxx
+#undef BPF_QDISC_KFUNC
+BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_qdisc_kfunc_ids)
+
+#define BPF_QDISC_KFUNC(name, _) BTF_ID_LIST_SINGLE(name##_ids, func, name)
+BPF_QDISC_KFUNC_xxx
+#undef BPF_QDISC_KFUNC
+
+static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id)
+{
+	if (kfunc_id == bpf_qdisc_skb_drop_ids[0])
+		if (strcmp(prog->aux->attach_func_name, "enqueue"))
+			return -EACCES;
+
+	return 0;
+}
+
+static const struct btf_kfunc_id_set bpf_qdisc_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_qdisc_kfunc_ids,
+	.filter = bpf_qdisc_kfunc_filter,
+};
+
 static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = {
 	.get_func_proto = bpf_qdisc_get_func_proto,
 	.is_valid_access = bpf_qdisc_is_valid_access,
@@ -209,6 +270,20 @@ static struct bpf_struct_ops bpf_Qdisc_ops = {
 
 static int __init bpf_qdisc_kfunc_init(void)
 {
-	return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+	int ret;
+	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
+		{
+			.btf_id = bpf_sk_buff_ids[0],
+			.kfunc_btf_id = bpf_kfree_skb_ids[0]
+		},
+	};
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_qdisc_kfunc_set);
+	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
+						 ARRAY_SIZE(skb_kfunc_dtors),
+						 THIS_MODULE);
+	ret = ret ?: register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops);
+
+	return ret;
 }
 late_initcall(bpf_qdisc_kfunc_init);

From patchwork Fri Dec 13 23:29:52 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908029
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 07/13] bpf: net_sched: Add a qdisc watchdog timer
Date: Fri, 13 Dec 2024 23:29:52 +0000
Message-Id: <20241213232958.2388301-8-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Add a watchdog timer to bpf qdisc. The watchdog can be used to schedule
the execution of the qdisc through the kfunc
bpf_qdisc_watchdog_schedule(). It can be useful for building
traffic-shaping scheduling algorithms, where the time at which the next
packet will be dequeued is known.
Signed-off-by: Amery Hung
---
 include/net/sch_generic.h |  4 +++
 net/sched/bpf_qdisc.c     | 51 ++++++++++++++++++++++++++++++++++++++-
 net/sched/sch_api.c       | 11 +++++++++
 net/sched/sch_generic.c   |  8 ++++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5d74fa7e694c..6a252b1b0680 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1357,4 +1357,8 @@ static inline void qdisc_synchronize(const struct Qdisc *q)
 		msleep(1);
 }
 
+int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack);
+void bpf_qdisc_destroy_post_op(struct Qdisc *sch);
+void bpf_qdisc_reset_post_op(struct Qdisc *sch);
+
 #endif
diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index 28959424eab0..7c155207fe1e 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -8,6 +8,10 @@
 
 static struct bpf_struct_ops bpf_Qdisc_ops;
 
+struct bpf_sched_data {
+	struct qdisc_watchdog watchdog;
+};
+
 struct bpf_sk_buff_ptr {
 	struct sk_buff *skb;
 };
@@ -17,6 +21,32 @@ static int bpf_qdisc_init(struct btf *btf)
 	return 0;
 }
 
+int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt,
+			  struct netlink_ext_ack *extack)
+{
+	struct bpf_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_init(&q->watchdog, sch);
+	return 0;
+}
+EXPORT_SYMBOL(bpf_qdisc_init_pre_op);
+
+void bpf_qdisc_reset_post_op(struct Qdisc *sch)
+{
+	struct bpf_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_cancel(&q->watchdog);
+}
+EXPORT_SYMBOL(bpf_qdisc_reset_post_op);
+
+void bpf_qdisc_destroy_post_op(struct Qdisc *sch)
+{
+	struct bpf_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_cancel(&q->watchdog);
+}
+EXPORT_SYMBOL(bpf_qdisc_destroy_post_op);
+
 static const struct bpf_func_proto *
 bpf_qdisc_get_func_proto(enum bpf_func_id func_id,
 			 const struct bpf_prog *prog)
@@ -134,12 +164,25 @@ __bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb,
 	__qdisc_drop(skb, (struct sk_buff **)to_free_list);
 }
 
+/* bpf_qdisc_watchdog_schedule - Schedule a qdisc to a later time using a timer.
+ * @sch: The qdisc to be scheduled.
+ * @expire: The expiry time of the timer.
+ * @delta_ns: The slack range of the timer.
+ */
+__bpf_kfunc void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns)
+{
+	struct bpf_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_schedule_range_ns(&q->watchdog, expire, delta_ns);
+}
+
 __bpf_kfunc_end_defs();
 
 #define BPF_QDISC_KFUNC_xxx \
 	BPF_QDISC_KFUNC(bpf_skb_get_hash, KF_TRUSTED_ARGS) \
 	BPF_QDISC_KFUNC(bpf_kfree_skb, KF_RELEASE) \
 	BPF_QDISC_KFUNC(bpf_qdisc_skb_drop, KF_RELEASE) \
+	BPF_QDISC_KFUNC(bpf_qdisc_watchdog_schedule, KF_TRUSTED_ARGS) \
 
 BTF_KFUNCS_START(bpf_qdisc_kfunc_ids)
 #define BPF_QDISC_KFUNC(name, flag) BTF_ID_FLAGS(func, name, flag)
@@ -154,9 +197,14 @@ BPF_QDISC_KFUNC_xxx
 
 static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id)
 {
-	if (kfunc_id == bpf_qdisc_skb_drop_ids[0])
+	if (kfunc_id == bpf_qdisc_skb_drop_ids[0]) {
 		if (strcmp(prog->aux->attach_func_name, "enqueue"))
 			return -EACCES;
+	} else if (kfunc_id == bpf_qdisc_watchdog_schedule_ids[0]) {
+		if (strcmp(prog->aux->attach_func_name, "enqueue") &&
+		    strcmp(prog->aux->attach_func_name, "dequeue"))
+			return -EACCES;
+	}
 
 	return 0;
 }
@@ -189,6 +237,7 @@ static int bpf_qdisc_init_member(const struct btf_type *t,
 	case offsetof(struct Qdisc_ops, priv_size):
 		if (uqdisc_ops->priv_size)
 			return -EINVAL;
+		qdisc_ops->priv_size = sizeof(struct bpf_sched_data);
 		return 1;
 	case offsetof(struct Qdisc_ops, static_flags):
 		if (uqdisc_ops->static_flags)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f074053c4232..507abddcdafd 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1357,6 +1357,13 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 		rcu_assign_pointer(sch->stab, stab);
 	}
 
+#ifdef CONFIG_NET_SCH_BPF
+	if (ops->owner == BPF_MODULE_OWNER) {
+		err = bpf_qdisc_init_pre_op(sch, tca[TCA_OPTIONS],
 					    extack);
+		if (err != 0)
+			goto err_out4;
+	}
+#endif
 	if (ops->init) {
 		err = ops->init(sch, tca[TCA_OPTIONS], extack);
 		if (err != 0)
@@ -1393,6 +1400,10 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	 */
 	if (ops->destroy)
 		ops->destroy(sch);
+#ifdef CONFIG_NET_SCH_BPF
+	if (ops->owner == BPF_MODULE_OWNER)
+		bpf_qdisc_destroy_post_op(sch);
+#endif
 	qdisc_put_stab(rtnl_dereference(sch->stab));
 err_out3:
 	lockdep_unregister_key(&sch->root_lock_key);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 1e770ec251a0..ea4ee7f914be 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1039,6 +1039,10 @@ void qdisc_reset(struct Qdisc *qdisc)
 	if (ops->reset)
 		ops->reset(qdisc);
+#ifdef CONFIG_NET_SCH_BPF
+	if (ops->owner == BPF_MODULE_OWNER)
+		bpf_qdisc_reset_post_op(qdisc);
+#endif
 
 	__skb_queue_purge(&qdisc->gso_skb);
 	__skb_queue_purge(&qdisc->skb_bad_txq);
@@ -1082,6 +1086,10 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 	if (ops->destroy)
 		ops->destroy(qdisc);
+#ifdef CONFIG_NET_SCH_BPF
+	if (ops->owner == BPF_MODULE_OWNER)
+		bpf_qdisc_destroy_post_op(qdisc);
+#endif
 
 	lockdep_unregister_key(&qdisc->root_lock_key);
 	bpf_module_put(ops, ops->owner);

From patchwork Fri Dec 13 23:29:53 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908030
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 08/13] bpf: net_sched: Support updating bstats
Date: Fri, 13 Dec 2024 23:29:53 +0000
Message-Id: <20241213232958.2388301-9-amery.hung@bytedance.com>
In-Reply-To: <20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

Add a kfunc to update Qdisc bstats when an skb is dequeued. The kfunc is
only available in .dequeue programs.

Signed-off-by: Amery Hung
---
 net/sched/bpf_qdisc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index 7c155207fe1e..b5ac3b9923fb 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -176,6 +176,15 @@ __bpf_kfunc void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64
 	qdisc_watchdog_schedule_range_ns(&q->watchdog, expire, delta_ns);
 }
 
+/* bpf_qdisc_bstats_update - Update Qdisc basic statistics
+ * @sch: The qdisc from which an skb is dequeued.
+ * @skb: The skb to be dequeued.
+ */
+__bpf_kfunc void bpf_qdisc_bstats_update(struct Qdisc *sch, const struct sk_buff *skb)
+{
+	bstats_update(&sch->bstats, skb);
+}
+
 __bpf_kfunc_end_defs();
 
 #define BPF_QDISC_KFUNC_xxx \
@@ -183,6 +192,7 @@ __bpf_kfunc_end_defs();
 	BPF_QDISC_KFUNC(bpf_kfree_skb, KF_RELEASE) \
 	BPF_QDISC_KFUNC(bpf_qdisc_skb_drop, KF_RELEASE) \
 	BPF_QDISC_KFUNC(bpf_qdisc_watchdog_schedule, KF_TRUSTED_ARGS) \
+	BPF_QDISC_KFUNC(bpf_qdisc_bstats_update, KF_TRUSTED_ARGS) \
 
 BTF_KFUNCS_START(bpf_qdisc_kfunc_ids)
 #define BPF_QDISC_KFUNC(name, flag) BTF_ID_FLAGS(func, name, flag)
@@ -204,6 +214,9 @@ static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id)
 		if (strcmp(prog->aux->attach_func_name, "enqueue") &&
 		    strcmp(prog->aux->attach_func_name, "dequeue"))
 			return -EACCES;
+	} else if (kfunc_id == bpf_qdisc_bstats_update_ids[0]) {
+		if (strcmp(prog->aux->attach_func_name, "dequeue"))
+			return -EACCES;
 	}
 
 	return 0;

From patchwork Fri Dec 13 23:29:54 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908031
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com,
 ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 09/13] bpf: net_sched: Support updating qstats
Date: Fri, 13 Dec 2024 23:29:54 +0000
Message-Id: <20241213232958.2388301-10-amery.hung@bytedance.com>
Allow bpf qdisc programs to update Qdisc qstats directly with btf
struct access.

Signed-off-by: Amery Hung
---
 net/sched/bpf_qdisc.c | 53 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index b5ac3b9923fb..3901f855effc 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -57,6 +57,7 @@ bpf_qdisc_get_func_proto(enum bpf_func_id func_id,
 	}
 }
 
+BTF_ID_LIST_SINGLE(bpf_qdisc_ids, struct, Qdisc)
 BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff)
 BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr)
 
@@ -81,20 +82,37 @@ static bool bpf_qdisc_is_valid_access(int off, int size,
 	return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
 }
 
-static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
-				       const struct bpf_reg_state *reg,
-				       int off, int size)
+static int bpf_qdisc_qdisc_access(struct bpf_verifier_log *log,
+				  const struct bpf_reg_state *reg,
+				  int off, int size)
 {
-	const struct btf_type *t, *skbt;
 	size_t end;
 
-	skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]);
-	t = btf_type_by_id(reg->btf, reg->btf_id);
-	if (t != skbt) {
-		bpf_log(log, "only read is supported\n");
+	switch (off) {
+	case offsetof(struct Qdisc, qstats) ... offsetofend(struct Qdisc, qstats) - 1:
+		end = offsetofend(struct Qdisc, qstats);
+		break;
+	default:
+		bpf_log(log, "no write support to Qdisc at off %d\n", off);
+		return -EACCES;
+	}
+
+	if (off + size > end) {
+		bpf_log(log,
+			"write access at off %d with size %d beyond the member of Qdisc ended at %zu\n",
+			off, size, end);
 		return -EACCES;
 	}
 
+	return 0;
+}
+
+static int bpf_qdisc_sk_buff_access(struct bpf_verifier_log *log,
+				    const struct bpf_reg_state *reg,
+				    int off, int size)
+{
+	size_t end;
+
 	switch (off) {
 	case offsetof(struct sk_buff, tstamp):
 		end = offsetofend(struct sk_buff, tstamp);
@@ -136,6 +154,25 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
 	return 0;
 }
 
+static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log,
+				       const struct bpf_reg_state *reg,
+				       int off, int size)
+{
+	const struct btf_type *t, *skbt, *qdisct;
+
+	skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]);
+	qdisct = btf_type_by_id(reg->btf, bpf_qdisc_ids[0]);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
+
+	if (t == skbt)
+		return bpf_qdisc_sk_buff_access(log, reg, off, size);
+	else if (t == qdisct)
+		return bpf_qdisc_qdisc_access(log, reg, off, size);
+
+	bpf_log(log, "only read is supported\n");
+	return -EACCES;
+}
+
 __bpf_kfunc_start_defs();
 
 /* bpf_skb_get_hash - Get the flow hash of an skb.
From patchwork Fri Dec 13 23:29:55 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908032
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com,
 ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 10/13] bpf: net_sched: Allow writing to more Qdisc members
Date: Fri, 13 Dec 2024 23:29:55 +0000
Message-Id: <20241213232958.2388301-11-amery.hung@bytedance.com>

Allow bpf qdisc to write to Qdisc->limit and Qdisc->q.qlen.
Signed-off-by: Amery Hung
---
 net/sched/bpf_qdisc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c
index 3901f855effc..1caa9f696d2d 100644
--- a/net/sched/bpf_qdisc.c
+++ b/net/sched/bpf_qdisc.c
@@ -89,6 +89,12 @@ static int bpf_qdisc_qdisc_access(struct bpf_verifier_log *log,
 	size_t end;
 
 	switch (off) {
+	case offsetof(struct Qdisc, limit):
+		end = offsetofend(struct Qdisc, limit);
+		break;
+	case offsetof(struct Qdisc, q) + offsetof(struct qdisc_skb_head, qlen):
+		end = offsetof(struct Qdisc, q) + offsetofend(struct qdisc_skb_head, qlen);
+		break;
 	case offsetof(struct Qdisc, qstats) ... offsetofend(struct Qdisc, qstats) - 1:
 		end = offsetofend(struct Qdisc, qstats);
 		break;

From patchwork Fri Dec 13 23:29:56 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908033
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com,
 ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 11/13] libbpf: Support creating and destroying qdisc
Date: Fri, 13 Dec 2024 23:29:56 +0000
Message-Id: <20241213232958.2388301-12-amery.hung@bytedance.com>
Extend struct bpf_tc_hook with handle, qdisc name and a new attach type,
BPF_TC_QDISC, to allow users to add or remove any qdisc specified in
addition to clsact.

Signed-off-by: Amery Hung
---
 tools/lib/bpf/libbpf.h  |  5 ++++-
 tools/lib/bpf/netlink.c | 20 +++++++++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b2ce3a72b11d..b05d95814776 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1268,6 +1268,7 @@ enum bpf_tc_attach_point {
 	BPF_TC_INGRESS = 1 << 0,
 	BPF_TC_EGRESS  = 1 << 1,
 	BPF_TC_CUSTOM  = 1 << 2,
+	BPF_TC_QDISC   = 1 << 3,
 };
 
 #define BPF_TC_PARENT(a, b) \
@@ -1282,9 +1283,11 @@ struct bpf_tc_hook {
 	int ifindex;
 	enum bpf_tc_attach_point attach_point;
 	__u32 parent;
+	__u32 handle;
+	char *qdisc;
 	size_t :0;
 };
-#define bpf_tc_hook__last_field parent
+#define bpf_tc_hook__last_field qdisc
 
 struct bpf_tc_opts {
 	size_t sz;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 68a2def17175..72db8c0add21 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -529,9 +529,9 @@ int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id)
 }
 
-typedef int (*qdisc_config_t)(struct libbpf_nla_req *req);
+typedef int (*qdisc_config_t)(struct libbpf_nla_req *req, struct bpf_tc_hook *hook);
 
-static int clsact_config(struct libbpf_nla_req *req)
+static int clsact_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook)
 {
 	req->tc.tcm_parent = TC_H_CLSACT;
 	req->tc.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
@@ -539,6 +539,16 @@ static int clsact_config(struct libbpf_nla_req *req)
 	return nlattr_add(req, TCA_KIND, "clsact", sizeof("clsact"));
 }
 
+static int qdisc_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook)
+{
+	char *qdisc = OPTS_GET(hook, qdisc, NULL);
+
+	req->tc.tcm_parent = OPTS_GET(hook, parent, TC_H_ROOT);
+	req->tc.tcm_handle = OPTS_GET(hook, handle, 0);
+
+	return nlattr_add(req, TCA_KIND, qdisc, strlen(qdisc) + 1);
+}
+
 static int attach_point_to_config(struct bpf_tc_hook *hook,
 				  qdisc_config_t *config)
 {
@@ -552,6 +562,9 @@ static int attach_point_to_config(struct bpf_tc_hook *hook,
 		return 0;
 	case BPF_TC_CUSTOM:
 		return -EOPNOTSUPP;
+	case BPF_TC_QDISC:
+		*config = &qdisc_config;
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -596,7 +609,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
 	req.tc.tcm_family = AF_UNSPEC;
 	req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
 
-	ret = config(&req);
+	ret = config(&req, hook);
 	if (ret < 0)
 		return ret;
 
@@ -639,6 +652,7 @@ int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
 	case BPF_TC_INGRESS:
 	case BPF_TC_EGRESS:
 		return libbpf_err(__bpf_tc_detach(hook, NULL, true));
+	case BPF_TC_QDISC:
 	case BPF_TC_INGRESS | BPF_TC_EGRESS:
 		return libbpf_err(tc_qdisc_delete(hook));
 	case BPF_TC_CUSTOM:
From patchwork Fri Dec 13 23:29:57 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908034
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com,
 ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 12/13] selftests: Add a basic fifo qdisc test
Date: Fri, 13 Dec 2024 23:29:57 +0000
Message-Id: <20241213232958.2388301-13-amery.hung@bytedance.com>
This selftest shows a bare minimum fifo qdisc, which simply enqueues skbs
into the back of a bpf list and dequeues from the front of the list.

Signed-off-by: Amery Hung
---
 tools/testing/selftests/bpf/config            |   1 +
 .../selftests/bpf/prog_tests/bpf_qdisc.c      | 161 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_qdisc_common.h    |  27 +++
 .../selftests/bpf/progs/bpf_qdisc_fifo.c      | 117 +++++++++++++
 4 files changed, 306 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 4ca84c8d9116..cf35e7e473d4 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -70,6 +70,7 @@ CONFIG_NET_IPGRE=y
 CONFIG_NET_IPGRE_DEMUX=y
 CONFIG_NET_IPIP=y
 CONFIG_NET_MPLS_GSO=y
+CONFIG_NET_SCH_BPF=y
 CONFIG_NET_SCH_FQ=y
 CONFIG_NET_SCH_INGRESS=y
 CONFIG_NET_SCHED=y
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
new file mode 100644
index 000000000000..295d0216e70f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -0,0 +1,161 @@
+#include
+#include
+#include
+
+#include "network_helpers.h"
+#include "bpf_qdisc_fifo.skel.h"
+
+#ifndef ENOTSUPP
+#define ENOTSUPP 524
+#endif
+
+#define LO_IFINDEX 1
+
+static const unsigned int total_bytes = 10 * 1024 * 1024;
+static int stop;
+
+static void *server(void *arg)
+{
+	int lfd = (int)(long)arg, err = 0, fd;
+	ssize_t nr_sent = 0, bytes = 0;
+	char batch[1500];
+
+	fd = accept(lfd, NULL, NULL);
+	while (fd == -1) {
+		if (errno == EINTR)
+			continue;
+		err = -errno;
+		goto done;
+	}
+
+	if (settimeo(fd, 0)) {
+		err = -errno;
+		goto done;
+	}
+
+	while (bytes < total_bytes && !READ_ONCE(stop)) {
+		nr_sent = send(fd, &batch,
+			       MIN(total_bytes - bytes, sizeof(batch)), 0);
+		if (nr_sent == -1 && errno == EINTR)
+			continue;
+		if (nr_sent == -1) {
+			err = -errno;
+			break;
+		}
+		bytes += nr_sent;
+	}
+
+	ASSERT_EQ(bytes, total_bytes, "send");
+
+done:
+	if (fd >= 0)
+		close(fd);
+	if (err) {
+		WRITE_ONCE(stop, 1);
+		return ERR_PTR(err);
+	}
+	return NULL;
+}
+
+static void do_test(char *qdisc)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
+			    .attach_point = BPF_TC_QDISC,
+			    .parent = TC_H_ROOT,
+			    .handle = 0x8000000,
+			    .qdisc = qdisc);
+	struct sockaddr_in6 sa6 = {};
+	ssize_t nr_recv = 0, bytes = 0;
+	int lfd = -1, fd = -1;
+	pthread_t srv_thread;
+	socklen_t addrlen = sizeof(sa6);
+	void *thread_ret;
+	char batch[1500];
+	int err;
+
+	WRITE_ONCE(stop, 0);
+
+	err = bpf_tc_hook_create(&hook);
+	if (!ASSERT_OK(err, "attach qdisc"))
+		return;
+
+	lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_NEQ(lfd, -1, "socket")) {
+		bpf_tc_hook_destroy(&hook);
+		return;
+	}
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_NEQ(fd, -1, "socket")) {
+		bpf_tc_hook_destroy(&hook);
+		close(lfd);
+		return;
+	}
+
+	if (settimeo(lfd, 0) || settimeo(fd, 0))
+		goto done;
+
+	err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen);
+	if (!ASSERT_NEQ(err, -1, "getsockname"))
+		goto done;
+
+	/* connect to server */
+	err = connect(fd, (struct sockaddr *)&sa6, addrlen);
+	if (!ASSERT_NEQ(err, -1, "connect"))
+		goto done;
+
+	err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd);
+	if (!ASSERT_OK(err, "pthread_create"))
+		goto done;
+
+	/* recv total_bytes */
+	while (bytes < total_bytes && !READ_ONCE(stop)) {
+		nr_recv = recv(fd, &batch,
+			       MIN(total_bytes - bytes, sizeof(batch)), 0);
+		if (nr_recv == -1 && errno == EINTR)
+			continue;
+		if (nr_recv == -1)
+			break;
+		bytes += nr_recv;
+	}
+
+	ASSERT_EQ(bytes, total_bytes, "recv");
+
+	WRITE_ONCE(stop, 1);
+	pthread_join(srv_thread, &thread_ret);
+	ASSERT_OK(IS_ERR(thread_ret), "thread_ret");
+
+done:
+	close(lfd);
+	close(fd);
+
+	bpf_tc_hook_destroy(&hook);
+	return;
+}
+
+static void test_fifo(void)
+{
+	struct bpf_qdisc_fifo *fifo_skel;
+	struct bpf_link *link;
+
+	fifo_skel = bpf_qdisc_fifo__open_and_load();
+	if (!ASSERT_OK_PTR(fifo_skel, "bpf_qdisc_fifo__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(fifo_skel->maps.fifo);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		bpf_qdisc_fifo__destroy(fifo_skel);
+		return;
+	}
+
+	do_test("bpf_fifo");
+
+	bpf_link__destroy(link);
+	bpf_qdisc_fifo__destroy(fifo_skel);
+}
+
+void test_bpf_qdisc(void)
+{
+	if (test__start_subtest("fifo"))
+		test_fifo();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
new file mode 100644
index 000000000000..62a778f94908
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
@@ -0,0 +1,27 @@
+#ifndef _BPF_QDISC_COMMON_H
+#define _BPF_QDISC_COMMON_H
+
+#define NET_XMIT_SUCCESS	0x00
+#define NET_XMIT_DROP		0x01	/* skb dropped */
+#define NET_XMIT_CN		0x02	/* congestion notification */
+
+#define TC_PRIO_CONTROL		7
+#define TC_PRIO_MAX		15
+
+u32 bpf_skb_get_hash(struct sk_buff *p) __ksym;
+void bpf_kfree_skb(struct sk_buff *p) __ksym;
+void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;
+void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) __ksym;
+void bpf_qdisc_bstats_update(struct Qdisc *sch, const struct sk_buff *skb) __ksym;
+
+static struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
+{
+	return (struct qdisc_skb_cb *)skb->cb;
+}
+
+static inline unsigned int qdisc_pkt_len(const struct sk_buff *skb)
+{
+	return qdisc_skb_cb(skb)->pkt_len;
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
new file mode 100644
index 000000000000..705e7da325da
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
@@ -0,0 +1,117 @@
+#include
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct skb_node {
+	struct sk_buff __kptr * skb;
+	struct bpf_list_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock q_fifo_lock;
+private(A) struct bpf_list_head q_fifo __contains(skb_node, node);
+
+SEC("struct_ops/bpf_fifo_enqueue")
+int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct skb_node *skbn;
+	u32 pkt_len;
+
+	if (sch->q.qlen == sch->limit)
+		goto drop;
+
+	skbn = bpf_obj_new(typeof(*skbn));
+	if (!skbn)
+		goto drop;
+
+	pkt_len = qdisc_pkt_len(skb);
+
+	sch->q.qlen++;
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	if (skb)
+		bpf_qdisc_skb_drop(skb, to_free);
+
+	bpf_spin_lock(&q_fifo_lock);
+	bpf_list_push_back(&q_fifo, &skbn->node);
+	bpf_spin_unlock(&q_fifo_lock);
+
+	sch->qstats.backlog += pkt_len;
+	return NET_XMIT_SUCCESS;
+drop:
+	bpf_qdisc_skb_drop(skb, to_free);
+	return NET_XMIT_DROP;
+}
+
+SEC("struct_ops/bpf_fifo_dequeue")
+struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch)
+{
+	struct bpf_list_node *node;
+	struct sk_buff *skb = NULL;
+	struct skb_node *skbn;
+
+	bpf_spin_lock(&q_fifo_lock);
+	node = bpf_list_pop_front(&q_fifo);
+	bpf_spin_unlock(&q_fifo_lock);
+	if (!node)
+		return NULL;
+
+	skbn = container_of(node, struct skb_node, node);
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	bpf_obj_drop(skbn);
+	if (!skb)
+		return NULL;
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	bpf_qdisc_bstats_update(sch, skb);
+	sch->q.qlen--;
+
+	return skb;
+}
+
+SEC("struct_ops/bpf_fifo_init")
+int BPF_PROG(bpf_fifo_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	sch->limit = 1000;
+	return 0;
+}
+
+SEC("struct_ops/bpf_fifo_reset")
+void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch)
+{
+	struct bpf_list_node *node;
+	struct skb_node *skbn;
+	int i;
+
+	bpf_for(i, 0, sch->q.qlen) {
+		struct sk_buff *skb = NULL;
+
+		bpf_spin_lock(&q_fifo_lock);
+		node = bpf_list_pop_front(&q_fifo);
+		bpf_spin_unlock(&q_fifo_lock);
+
+		if (!node)
+			break;
+
+		skbn = container_of(node, struct skb_node, node);
+		skb = bpf_kptr_xchg(&skbn->skb, skb);
+		if (skb)
+			bpf_kfree_skb(skb);
+		bpf_obj_drop(skbn);
+	}
+	sch->q.qlen = 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops fifo = {
+	.enqueue = (void *)bpf_fifo_enqueue,
+	.dequeue = (void *)bpf_fifo_dequeue,
+	.init = (void *)bpf_fifo_init,
+	.reset = (void *)bpf_fifo_reset,
+	.id = "bpf_fifo",
+};
+

From patchwork Fri Dec 13 23:29:58 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13908035
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [PATCH bpf-next v1 13/13] selftests: Add a bpf fq qdisc to selftest
Date: Fri, 13 Dec 2024 23:29:58 +0000
Message-Id: <20241213232958.2388301-14-amery.hung@bytedance.com>
In-Reply-To:
<20241213232958.2388301-1-amery.hung@bytedance.com>
References: <20241213232958.2388301-1-amery.hung@bytedance.com>

This test implements a more sophisticated qdisc using bpf. The bpf fair-
queueing (fq) qdisc gives each flow an equal chance to transmit data. It
also respects the timestamp of the skb for rate limiting.

Signed-off-by: Amery Hung
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |  24 +
 .../selftests/bpf/progs/bpf_qdisc_fq.c        | 726 ++++++++++++++++++
 2 files changed, 750 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index 295d0216e70f..394bf5a4adae 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -4,6 +4,7 @@
 #include "network_helpers.h"
 #include "bpf_qdisc_fifo.skel.h"
+#include "bpf_qdisc_fq.skel.h"
 
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
@@ -154,8 +155,31 @@ static void test_fifo(void)
 	bpf_qdisc_fifo__destroy(fifo_skel);
 }
 
+static void test_fq(void)
+{
+	struct bpf_qdisc_fq *fq_skel;
+	struct bpf_link *link;
+
+	fq_skel = bpf_qdisc_fq__open_and_load();
+	if (!ASSERT_OK_PTR(fq_skel, "bpf_qdisc_fq__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(fq_skel->maps.fq);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		bpf_qdisc_fq__destroy(fq_skel);
+		return;
+	}
+
+	do_test("bpf_fq");
+
+	bpf_link__destroy(link);
+	bpf_qdisc_fq__destroy(fq_skel);
+}
+
 void test_bpf_qdisc(void)
 {
 	if (test__start_subtest("fifo"))
 		test_fifo();
+	if (test__start_subtest("fq"))
+		test_fq();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c
new file mode 100644
index 000000000000..38a72fde3c5a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c
@@ -0,0 +1,726 @@
+#include
+#include
+#include
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define NSEC_PER_USEC 1000L
+#define NSEC_PER_SEC 1000000000L
+
+#define NUM_QUEUE (1 << 20)
+
+struct fq_bpf_data {
+	u32 quantum;
+	u32 initial_quantum;
+	u32 flow_refill_delay;
+	u32 flow_plimit;
+	u64 horizon;
+	u32 orphan_mask;
+	u32 timer_slack;
+	u64 time_next_delayed_flow;
+	u64 unthrottle_latency_ns;
+	u8 horizon_drop;
+	u32 new_flow_cnt;
+	u32 old_flow_cnt;
+	u64 ktime_cache;
+};
+
+enum {
+	CLS_RET_PRIO = 0,
+	CLS_RET_NONPRIO = 1,
+	CLS_RET_ERR = 2,
+};
+
+struct skb_node {
+	u64 tstamp;
+	struct sk_buff __kptr * skb;
+	struct bpf_rb_node node;
+};
+
+struct fq_flow_node {
+	int credit;
+	u32 qlen;
+	u64 age;
+	u64 time_next_packet;
+	struct bpf_list_node list_node;
+	struct bpf_rb_node rb_node;
+	struct bpf_rb_root queue __contains(skb_node, node);
+	struct bpf_spin_lock lock;
+	struct bpf_refcount refcount;
+};
+
+struct dequeue_nonprio_ctx {
+	bool stop_iter;
+	u64 expire;
+	u64 now;
+};
+
+struct remove_flows_ctx {
+	bool gc_only;
+	u32 reset_cnt;
+	u32 reset_max;
+};
+
+struct unset_throttled_flows_ctx {
+	bool unset_all;
+	u64 now;
+};
+
+struct fq_stashed_flow {
+	struct fq_flow_node __kptr * flow;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u64);
+	__type(value, struct fq_stashed_flow);
+	__uint(max_entries, NUM_QUEUE);
+} fq_nonprio_flows SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u64);
+	__type(value, struct fq_stashed_flow);
+	__uint(max_entries, 1);
+} fq_prio_flows SEC(".maps");
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock fq_delayed_lock;
+private(A) struct bpf_rb_root fq_delayed __contains(fq_flow_node, rb_node);
+
+private(B) struct bpf_spin_lock fq_new_flows_lock;
+private(B) struct bpf_list_head fq_new_flows __contains(fq_flow_node, list_node);
+
+private(C) struct bpf_spin_lock fq_old_flows_lock;
+private(C) struct bpf_list_head fq_old_flows __contains(fq_flow_node, list_node);
+
+private(D) struct fq_bpf_data q;
+
+/* Wrapper for bpf_kptr_xchg that expects NULL dst */
+static void bpf_kptr_xchg_back(void *map_val, void *ptr)
+{
+	void *ret;
+
+	ret = bpf_kptr_xchg(map_val, ptr);
+	if (ret)
+		bpf_obj_drop(ret);
+}
+
+static bool skbn_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct skb_node *skbn_a;
+	struct skb_node *skbn_b;
+
+	skbn_a = container_of(a, struct skb_node, node);
+	skbn_b = container_of(b, struct skb_node, node);
+
+	return skbn_a->tstamp < skbn_b->tstamp;
+}
+
+static bool fn_time_next_packet_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct fq_flow_node *flow_a;
+	struct fq_flow_node *flow_b;
+
+	flow_a = container_of(a, struct fq_flow_node, rb_node);
+	flow_b = container_of(b, struct fq_flow_node, rb_node);
+
+	return flow_a->time_next_packet < flow_b->time_next_packet;
+}
+
+static void
+fq_flows_add_head(struct bpf_list_head *head, struct bpf_spin_lock *lock,
+		  struct fq_flow_node *flow, u32 *flow_cnt)
+{
+	bpf_spin_lock(lock);
+	bpf_list_push_front(head, &flow->list_node);
+	bpf_spin_unlock(lock);
+	*flow_cnt += 1;
+}
+
+static void
+fq_flows_add_tail(struct bpf_list_head *head, struct bpf_spin_lock *lock,
+		  struct fq_flow_node *flow, u32 *flow_cnt)
+{
+	bpf_spin_lock(lock);
+	bpf_list_push_back(head, &flow->list_node);
+	bpf_spin_unlock(lock);
+	*flow_cnt += 1;
+}
+
+static void
+fq_flows_remove_front(struct bpf_list_head *head, struct bpf_spin_lock *lock,
+		      struct bpf_list_node **node, u32 *flow_cnt)
+{
+	bpf_spin_lock(lock);
+	*node = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	*flow_cnt -= 1;
+}
+
+static bool
+fq_flows_is_empty(struct bpf_list_head *head, struct bpf_spin_lock *lock)
+{
+	struct bpf_list_node *node;
+
+	bpf_spin_lock(lock);
+	node = bpf_list_pop_front(head);
+	if (node) {
+		bpf_list_push_front(head, node);
+		bpf_spin_unlock(lock);
+		return false;
+	}
+	bpf_spin_unlock(lock);
+
+	return true;
+}
+
+/* flow->age is used to denote the state of the flow (not-detached, detached, throttled)
+ * as well as the timestamp when the flow is detached.
+ *
+ * 0: not-detached
+ * 1 - (~0ULL-1): detached
+ * ~0ULL: throttled
+ */
+static void fq_flow_set_detached(struct fq_flow_node *flow)
+{
+	flow->age = bpf_jiffies64();
+}
+
+static bool fq_flow_is_detached(struct fq_flow_node *flow)
+{
+	return flow->age != 0 && flow->age != ~0ULL;
+}
+
+static bool sk_listener(struct sock *sk)
+{
+	return (1 << sk->__sk_common.skc_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV);
+}
+
+static void fq_gc(void);
+
+static int fq_new_flow(void *flow_map, struct fq_stashed_flow **sflow, u64 hash)
+{
+	struct fq_stashed_flow tmp = {};
+	struct fq_flow_node *flow;
+	int ret;
+
+	flow = bpf_obj_new(typeof(*flow));
+	if (!flow)
+		return -ENOMEM;
+
+	flow->credit = q.initial_quantum;
+	flow->qlen = 0;
+	flow->age = 1;
+	flow->time_next_packet = 0;
+
+	ret = bpf_map_update_elem(flow_map, &hash, &tmp, 0);
+	if (ret == -ENOMEM) {
+		fq_gc();
+		bpf_map_update_elem(flow_map, &hash, &tmp, 0);
+	}
+
+	*sflow = bpf_map_lookup_elem(flow_map, &hash);
+	if (!*sflow) {
+		bpf_obj_drop(flow);
+		return -ENOMEM;
+	}
+
+	bpf_kptr_xchg_back(&(*sflow)->flow, flow);
+	return 0;
+}
+
+static int
+fq_classify(struct sk_buff *skb, struct fq_stashed_flow **sflow)
+{
+	struct sock *sk = skb->sk;
+	int ret = CLS_RET_NONPRIO;
+	u64 hash = 0;
+
+	if ((skb->priority & TC_PRIO_MAX) == TC_PRIO_CONTROL) {
+		*sflow = bpf_map_lookup_elem(&fq_prio_flows, &hash);
+		ret = CLS_RET_PRIO;
+	} else {
+		if (!sk || sk_listener(sk)) {
+			hash = bpf_skb_get_hash(skb) & q.orphan_mask;
+			/* Avoid collision with an existing flow hash, which
+			 * only uses the lower 32 bits of hash, by setting the
+			 * upper half of hash to 1.
+			 */
+			hash |= (1ULL << 32);
+		} else if (sk->__sk_common.skc_state == TCP_CLOSE) {
+			hash = bpf_skb_get_hash(skb) & q.orphan_mask;
+			hash |= (1ULL << 32);
+		} else {
+			hash = sk->__sk_common.skc_hash;
+		}
+		*sflow = bpf_map_lookup_elem(&fq_nonprio_flows, &hash);
+	}
+
+	if (!*sflow)
+		ret = fq_new_flow(&fq_nonprio_flows, sflow, hash) < 0 ?
+			CLS_RET_ERR : CLS_RET_NONPRIO;
+
+	return ret;
+}
+
+static bool fq_packet_beyond_horizon(struct sk_buff *skb)
+{
+	return (s64)skb->tstamp > (s64)(q.ktime_cache + q.horizon);
+}
+
+SEC("struct_ops/bpf_fq_enqueue")
+int BPF_PROG(bpf_fq_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	struct fq_flow_node *flow = NULL, *flow_copy;
+	struct fq_stashed_flow *sflow;
+	u64 time_to_send, jiffies;
+	struct skb_node *skbn;
+	int ret;
+
+	if (sch->q.qlen >= sch->limit)
+		goto drop;
+
+	if (!skb->tstamp) {
+		time_to_send = q.ktime_cache = bpf_ktime_get_ns();
+	} else {
+		if (fq_packet_beyond_horizon(skb)) {
+			q.ktime_cache = bpf_ktime_get_ns();
+			if (fq_packet_beyond_horizon(skb)) {
+				if (q.horizon_drop)
+					goto drop;
+
+				skb->tstamp = q.ktime_cache + q.horizon;
+			}
+		}
+		time_to_send = skb->tstamp;
+	}
+
+	ret = fq_classify(skb, &sflow);
+	if (ret == CLS_RET_ERR)
+		goto drop;
+
+	flow = bpf_kptr_xchg(&sflow->flow, flow);
+	if (!flow)
+		goto drop;
+
+	if (ret == CLS_RET_NONPRIO) {
+		if (flow->qlen >= q.flow_plimit) {
+			bpf_kptr_xchg_back(&sflow->flow, flow);
+			goto drop;
+		}
+
+		if (fq_flow_is_detached(flow)) {
+			flow_copy = bpf_refcount_acquire(flow);
+
+			jiffies = bpf_jiffies64();
+			if ((s64)(jiffies - (flow_copy->age + q.flow_refill_delay)) > 0) {
+				if (flow_copy->credit < q.quantum)
+					flow_copy->credit = q.quantum;
+			}
+			flow_copy->age = 0;
+			fq_flows_add_tail(&fq_new_flows, &fq_new_flows_lock, flow_copy,
+					  &q.new_flow_cnt);
+		}
+	}
+
+	skbn = bpf_obj_new(typeof(*skbn));
+	if (!skbn) {
+		bpf_kptr_xchg_back(&sflow->flow, flow);
+		goto drop;
+	}
+
+	skbn->tstamp = skb->tstamp = time_to_send;
+
+	sch->qstats.backlog += qdisc_pkt_len(skb);
+
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	if (skb)
+		bpf_qdisc_skb_drop(skb, to_free);
+
+	bpf_spin_lock(&flow->lock);
+	bpf_rbtree_add(&flow->queue, &skbn->node, skbn_tstamp_less);
+	bpf_spin_unlock(&flow->lock);
+
+	flow->qlen++;
+	bpf_kptr_xchg_back(&sflow->flow, flow);
+
+	sch->q.qlen++;
+	return NET_XMIT_SUCCESS;
+
+drop:
+	bpf_qdisc_skb_drop(skb, to_free);
+	sch->qstats.drops++;
+	return NET_XMIT_DROP;
+}
+
+static int fq_unset_throttled_flows(u32 index, struct unset_throttled_flows_ctx *ctx)
+{
+	struct bpf_rb_node *node = NULL;
+	struct fq_flow_node *flow;
+
+	bpf_spin_lock(&fq_delayed_lock);
+
+	node = bpf_rbtree_first(&fq_delayed);
+	if (!node) {
+		bpf_spin_unlock(&fq_delayed_lock);
+		return 1;
+	}
+
+	flow = container_of(node, struct fq_flow_node, rb_node);
+	if (!ctx->unset_all && flow->time_next_packet > ctx->now) {
+		q.time_next_delayed_flow = flow->time_next_packet;
+		bpf_spin_unlock(&fq_delayed_lock);
+		return 1;
+	}
+
+	node = bpf_rbtree_remove(&fq_delayed, &flow->rb_node);
+
+	bpf_spin_unlock(&fq_delayed_lock);
+
+	if (!node)
+		return 1;
+
+	flow = container_of(node, struct fq_flow_node, rb_node);
+	flow->age = 0;
+	fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt);
+
+	return 0;
+}
+
+static void fq_flow_set_throttled(struct fq_flow_node *flow)
+{
+	flow->age = ~0ULL;
+
+	if (q.time_next_delayed_flow > flow->time_next_packet)
+		q.time_next_delayed_flow = flow->time_next_packet;
+
+	bpf_spin_lock(&fq_delayed_lock);
+	bpf_rbtree_add(&fq_delayed, &flow->rb_node, fn_time_next_packet_less);
+	bpf_spin_unlock(&fq_delayed_lock);
+}
+
+static void fq_check_throttled(u64 now)
+{
+	struct unset_throttled_flows_ctx ctx = {
+		.unset_all = false,
+		.now = now,
+	};
+	unsigned long sample;
+
+	if (q.time_next_delayed_flow > now)
+		return;
+
+	sample = (unsigned long)(now - q.time_next_delayed_flow);
+	q.unthrottle_latency_ns -= q.unthrottle_latency_ns >> 3;
+	q.unthrottle_latency_ns += sample >> 3;
+
+	q.time_next_delayed_flow = ~0ULL;
+	bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &ctx, 0);
+}
+
+static struct sk_buff*
+fq_dequeue_nonprio_flows(u32 index, struct dequeue_nonprio_ctx *ctx)
+{
+	u64 time_next_packet, time_to_send;
+	struct bpf_rb_node *rb_node;
+	struct sk_buff *skb = NULL;
+	struct bpf_list_head *head;
+	struct bpf_list_node *node;
+	struct bpf_spin_lock *lock;
+	struct fq_flow_node *flow;
+	struct skb_node *skbn;
+	bool is_empty;
+	u32 *cnt;
+
+	if (q.new_flow_cnt) {
+		head = &fq_new_flows;
+		lock = &fq_new_flows_lock;
+		cnt = &q.new_flow_cnt;
+	} else if (q.old_flow_cnt) {
+		head = &fq_old_flows;
+		lock = &fq_old_flows_lock;
+		cnt = &q.old_flow_cnt;
+	} else {
+		if (q.time_next_delayed_flow != ~0ULL)
+			ctx->expire = q.time_next_delayed_flow;
+		goto break_loop;
+	}
+
+	fq_flows_remove_front(head, lock, &node, cnt);
+	if (!node)
+		goto break_loop;
+
+	flow = container_of(node, struct fq_flow_node, list_node);
+	if (flow->credit <= 0) {
+		flow->credit += q.quantum;
+		fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt);
+		return NULL;
+	}
+
+	bpf_spin_lock(&flow->lock);
+	rb_node = bpf_rbtree_first(&flow->queue);
+	if (!rb_node) {
+		bpf_spin_unlock(&flow->lock);
+		is_empty = fq_flows_is_empty(&fq_old_flows, &fq_old_flows_lock);
+		if (head == &fq_new_flows && !is_empty) {
+			fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt);
+		} else {
+			fq_flow_set_detached(flow);
+			bpf_obj_drop(flow);
+		}
+		return NULL;
+	}
+
+	skbn = container_of(rb_node, struct skb_node, node);
+	time_to_send = skbn->tstamp;
+
+	time_next_packet = (time_to_send > flow->time_next_packet) ?
+		time_to_send : flow->time_next_packet;
+	if (ctx->now < time_next_packet) {
+		bpf_spin_unlock(&flow->lock);
+		flow->time_next_packet = time_next_packet;
+		fq_flow_set_throttled(flow);
+		return NULL;
+	}
+
+	rb_node = bpf_rbtree_remove(&flow->queue, rb_node);
+	bpf_spin_unlock(&flow->lock);
+
+	if (!rb_node)
+		goto add_flow_and_break;
+
+	skbn = container_of(rb_node, struct skb_node, node);
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	bpf_obj_drop(skbn);
+
+	if (!skb)
+		goto add_flow_and_break;
+
+	flow->credit -= qdisc_skb_cb(skb)->pkt_len;
+	flow->qlen--;
+
+add_flow_and_break:
+	fq_flows_add_head(head, lock, flow, cnt);
+
+break_loop:
+	ctx->stop_iter = true;
+	return skb;
+}
+
+static struct sk_buff *fq_dequeue_prio(void)
+{
+	struct fq_flow_node *flow = NULL;
+	struct fq_stashed_flow *sflow;
+	struct bpf_rb_node *rb_node;
+	struct sk_buff *skb = NULL;
+	struct skb_node *skbn;
+	u64 hash = 0;
+
+	sflow = bpf_map_lookup_elem(&fq_prio_flows, &hash);
+	if (!sflow)
+		return NULL;
+
+	flow = bpf_kptr_xchg(&sflow->flow, flow);
+	if (!flow)
+		return NULL;
+
+	bpf_spin_lock(&flow->lock);
+	rb_node = bpf_rbtree_first(&flow->queue);
+	if (!rb_node) {
+		bpf_spin_unlock(&flow->lock);
+		goto out;
+	}
+
+	skbn = container_of(rb_node, struct skb_node, node);
+	rb_node = bpf_rbtree_remove(&flow->queue, &skbn->node);
+	bpf_spin_unlock(&flow->lock);
+
+	if (!rb_node)
+		goto out;
+
+	skbn = container_of(rb_node, struct skb_node, node);
+	skb = bpf_kptr_xchg(&skbn->skb, skb);
+	bpf_obj_drop(skbn);
+
+out:
+	bpf_kptr_xchg_back(&sflow->flow, flow);
+
+	return skb;
+}
+
+SEC("struct_ops/bpf_fq_dequeue")
+struct sk_buff *BPF_PROG(bpf_fq_dequeue, struct Qdisc *sch)
+{
+	struct dequeue_nonprio_ctx cb_ctx = {};
+	struct sk_buff *skb = NULL;
+	int i;
+
+	if (!sch->q.qlen)
+		goto out;
+
+	skb = fq_dequeue_prio();
+	if (skb)
+		goto dequeue;
+
+	q.ktime_cache = cb_ctx.now = bpf_ktime_get_ns();
+	fq_check_throttled(q.ktime_cache);
+	bpf_for(i, 0, sch->limit) {
+		skb = fq_dequeue_nonprio_flows(i, &cb_ctx);
+		if (cb_ctx.stop_iter)
+			break;
+	}
+
+dequeue:
+	if (skb) {
+		sch->q.qlen--;
+		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		bpf_qdisc_bstats_update(sch, skb);
+		return skb;
+	}
+
+	if (cb_ctx.expire)
+		bpf_qdisc_watchdog_schedule(sch, cb_ctx.expire, q.timer_slack);
+out:
+	return NULL;
+}
+
+static int fq_remove_flows_in_list(u32 index, void *ctx)
+{
+	struct bpf_list_node *node;
+	struct fq_flow_node *flow;
+
+	bpf_spin_lock(&fq_new_flows_lock);
+	node = bpf_list_pop_front(&fq_new_flows);
+	bpf_spin_unlock(&fq_new_flows_lock);
+	if (!node) {
+		bpf_spin_lock(&fq_old_flows_lock);
+		node = bpf_list_pop_front(&fq_old_flows);
+		bpf_spin_unlock(&fq_old_flows_lock);
+		if (!node)
+			return 1;
+	}
+
+	flow = container_of(node, struct fq_flow_node, list_node);
+	bpf_obj_drop(flow);
+
+	return 0;
+}
+
+extern unsigned CONFIG_HZ __kconfig;
+
+/* limit number of collected flows per round */
+#define FQ_GC_MAX 8
+#define FQ_GC_AGE (3*CONFIG_HZ)
+
+static bool fq_gc_candidate(struct fq_flow_node *flow)
+{
+	u64 jiffies = bpf_jiffies64();
+
+	return fq_flow_is_detached(flow) &&
+	       ((s64)(jiffies - (flow->age + FQ_GC_AGE)) > 0);
+}
+
+static int
+fq_remove_flows(struct bpf_map *flow_map, u64 *hash,
+		struct fq_stashed_flow *sflow, struct remove_flows_ctx *ctx)
+{
+	struct fq_flow_node *flow = NULL;
+
+	flow = bpf_kptr_xchg(&sflow->flow, flow);
+	if (flow) {
+		if (!ctx->gc_only || fq_gc_candidate(flow)) {
+			bpf_obj_drop(flow);
+			ctx->reset_cnt++;
+		} else {
+			bpf_kptr_xchg_back(&sflow->flow, flow);
+		}
+	}
+
+	return ctx->reset_cnt < ctx->reset_max ? 0 : 1;
+}
+
+static void fq_gc(void)
+{
+	struct remove_flows_ctx cb_ctx = {
+		.gc_only = true,
+		.reset_cnt = 0,
+		.reset_max = FQ_GC_MAX,
+	};
+
+	bpf_for_each_map_elem(&fq_nonprio_flows, fq_remove_flows, &cb_ctx, 0);
+}
+
+SEC("struct_ops/bpf_fq_reset")
+void BPF_PROG(bpf_fq_reset, struct Qdisc *sch)
+{
+	struct unset_throttled_flows_ctx utf_ctx = {
+		.unset_all = true,
+	};
+	struct remove_flows_ctx rf_ctx = {
+		.gc_only = false,
+		.reset_cnt = 0,
+		.reset_max = NUM_QUEUE,
+	};
+	struct fq_stashed_flow *sflow;
+	u64 hash = 0;
+
+	sch->q.qlen = 0;
+	sch->qstats.backlog = 0;
+
+	bpf_for_each_map_elem(&fq_nonprio_flows, fq_remove_flows, &rf_ctx, 0);
+
+	rf_ctx.reset_cnt = 0;
+	bpf_for_each_map_elem(&fq_prio_flows, fq_remove_flows, &rf_ctx, 0);
+	fq_new_flow(&fq_prio_flows, &sflow, hash);
+
+	bpf_loop(NUM_QUEUE, fq_remove_flows_in_list, NULL, 0);
+	q.new_flow_cnt = 0;
+	q.old_flow_cnt = 0;
+
+	bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &utf_ctx, 0);
+}
+
+SEC("struct_ops/bpf_fq_init")
+int BPF_PROG(bpf_fq_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	struct net_device *dev = sch->dev_queue->dev;
+	u32 psched_mtu = dev->mtu + dev->hard_header_len;
+	struct fq_stashed_flow *sflow;
+	u64 hash = 0;
+
+	if (fq_new_flow(&fq_prio_flows, &sflow, hash) < 0)
+		return -ENOMEM;
+
+	sch->limit = 10000;
+	q.initial_quantum = 10 * psched_mtu;
+	q.quantum = 2 * psched_mtu;
+	q.flow_refill_delay = 40;
+	q.flow_plimit = 100;
+	q.horizon = 10ULL * NSEC_PER_SEC;
+	q.horizon_drop = 1;
+	q.orphan_mask = 1024 - 1;
+	q.timer_slack = 10 * NSEC_PER_USEC;
+	q.time_next_delayed_flow = ~0ULL;
+	q.unthrottle_latency_ns = 0ULL;
+	q.new_flow_cnt = 0;
+	q.old_flow_cnt = 0;
+
+	return 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops fq = {
+	.enqueue = (void *)bpf_fq_enqueue,
+	.dequeue = (void *)bpf_fq_dequeue,
+	.reset = (void *)bpf_fq_reset,
+	.init = (void *)bpf_fq_init,
+	.id = "bpf_fq",
+};