From patchwork Fri Dec 20 19:55:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13917346
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com
Subject: [PATCH bpf-next v2 01/14] bpf: Support getting referenced kptr from struct_ops argument
Date: Fri, 20 Dec 2024 11:55:27 -0800
Message-ID: <20241220195619.2022866-2-amery.hung@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
References: <20241220195619.2022866-1-amery.hung@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

From: Amery Hung

Allow struct_ops programs to acquire referenced kptrs from arguments by directly reading the argument. The verifier will acquire a reference for a struct_ops argument tagged with "__ref" in the stub function at the beginning of the main program. The user can then access the referenced kptr directly by reading the context, as long as it has not been released by the program.

This new mechanism for acquiring referenced kptrs (compared with the existing "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons. In the first use case, Qdisc_ops, an skb is passed to .enqueue as the first argument. This mechanism provides a natural way for users to get a referenced kptr in .enqueue struct_ops programs and makes sure that a qdisc will always enqueue or drop the skb.
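To make the mechanism concrete, here is a minimal sketch; the struct_ops name, its member, and the program names are hypothetical and not part of this patch. The kernel-side stub tags the parameter with the "__ref" suffix, and the BPF program then reads the already-referenced pointer straight from its context and must release it on every path, otherwise the verifier reports an unreleased reference:

  /* Kernel side: stub function for a hypothetical struct_ops member. The
   * "__ref" suffix on the task parameter tells the verifier to acquire a
   * reference for the argument at the beginning of the main program.
   */
  static int my_ops__run(int dummy, struct task_struct *task__ref)
  {
          return 0;
  }

  /* BPF side: the program reads the already-referenced task directly from
   * its context and must release it before exiting.
   */
  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  void bpf_task_release(struct task_struct *p) __ksym;

  SEC("struct_ops/run")
  int BPF_PROG(run, int dummy, struct task_struct *task)
  {
          bpf_task_release(task); /* leaking the reference is rejected */
          return 0;
  }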
Signed-off-by: Amery Hung --- include/linux/bpf.h | 2 ++ kernel/bpf/bpf_struct_ops.c | 26 ++++++++++++++++++++------ kernel/bpf/btf.c | 3 ++- kernel/bpf/verifier.c | 37 ++++++++++++++++++++++++++++++++++--- 4 files changed, 58 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index feda0ce90f5a..2556f8043276 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -968,6 +968,7 @@ struct bpf_insn_access_aux { struct { struct btf *btf; u32 btf_id; + u32 ref_obj_id; }; }; struct bpf_verifier_log *log; /* for verbose logs */ @@ -1481,6 +1482,7 @@ struct bpf_ctx_arg_aux { enum bpf_reg_type reg_type; struct btf *btf; u32 btf_id; + bool refcounted; }; struct btf_mod_pair { diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 606efe32485a..d9e0af00580b 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -146,6 +146,7 @@ void bpf_struct_ops_image_free(void *image) } #define MAYBE_NULL_SUFFIX "__nullable" +#define REFCOUNTED_SUFFIX "__ref" #define MAX_STUB_NAME 128 /* Return the type info of a stub function, if it exists. @@ -207,9 +208,11 @@ static int prepare_arg_info(struct btf *btf, struct bpf_struct_ops_arg_info *arg_info) { const struct btf_type *stub_func_proto, *pointed_type; + bool is_nullable = false, is_refcounted = false; const struct btf_param *stub_args, *args; struct bpf_ctx_arg_aux *info, *info_buf; u32 nargs, arg_no, info_cnt = 0; + const char *suffix; u32 arg_btf_id; int offset; @@ -241,12 +244,19 @@ static int prepare_arg_info(struct btf *btf, info = info_buf; for (arg_no = 0; arg_no < nargs; arg_no++) { /* Skip arguments that is not suffixed with - * "__nullable". + * "__nullable or __ref". */ - if (!btf_param_match_suffix(btf, &stub_args[arg_no], - MAYBE_NULL_SUFFIX)) + is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no], + MAYBE_NULL_SUFFIX); + is_refcounted = btf_param_match_suffix(btf, &stub_args[arg_no], + REFCOUNTED_SUFFIX); + if (!is_nullable && !is_refcounted) continue; + if (is_nullable) + suffix = MAYBE_NULL_SUFFIX; + else if (is_refcounted) + suffix = REFCOUNTED_SUFFIX; /* Should be a pointer to struct */ pointed_type = btf_type_resolve_ptr(btf, args[arg_no].type, @@ -254,7 +264,7 @@ static int prepare_arg_info(struct btf *btf, if (!pointed_type || !btf_type_is_struct(pointed_type)) { pr_warn("stub function %s__%s has %s tagging to an unsupported type\n", - st_ops_name, member_name, MAYBE_NULL_SUFFIX); + st_ops_name, member_name, suffix); goto err_out; } @@ -272,11 +282,15 @@ static int prepare_arg_info(struct btf *btf, } /* Fill the information of the new argument */ - info->reg_type = - PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL; info->btf_id = arg_btf_id; info->btf = btf; info->offset = offset; + if (is_nullable) { + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL; + } else if (is_refcounted) { + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID; + info->refcounted = true; + } info++; info_cnt++; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 28246c59e12e..c2f4f84e539d 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6546,7 +6546,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct btf_param *args; bool ptr_err_raw_tp = false; const char *tag_value; - u32 nr_args, arg; + u32 nr_args, arg, nr_ref_args = 0; int i, ret; if (off % 8) { @@ -6682,6 +6682,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, info->reg_type = ctx_arg_info->reg_type; info->btf = ctx_arg_info->btf ? 
: btf_vmlinux; info->btf_id = ctx_arg_info->btf_id; + info->ref_obj_id = ctx_arg_info->refcounted ? ++nr_ref_args : 0; return true; } } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f27274e933e5..26305571e377 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1542,6 +1542,17 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx) return; } +static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) +{ + int i; + + for (i = 0; i < state->acquired_refs; i++) + if (state->refs[i].id == ptr_id) + return true; + + return false; +} + static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { int i; @@ -5980,7 +5991,8 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off, /* check access to 'struct bpf_context' fields. Supports fixed offsets only */ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size, enum bpf_access_type t, enum bpf_reg_type *reg_type, - struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx) + struct btf **btf, u32 *btf_id, bool *is_retval, bool is_ldsx, + u32 *ref_obj_id) { struct bpf_insn_access_aux info = { .reg_type = *reg_type, @@ -6002,8 +6014,16 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, *is_retval = info.is_retval; if (base_type(*reg_type) == PTR_TO_BTF_ID) { + if (info.ref_obj_id && + !find_reference_state(env->cur_state, info.ref_obj_id)) { + verbose(env, "invalid bpf_context access off=%d. Reference may already be released\n", + off); + return -EACCES; + } + *btf = info.btf; *btf_id = info.btf_id; + *ref_obj_id = info.ref_obj_id; } else { env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size; } @@ -7369,7 +7389,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn struct bpf_retval_range range; enum bpf_reg_type reg_type = SCALAR_VALUE; struct btf *btf = NULL; - u32 btf_id = 0; + u32 btf_id = 0, ref_obj_id = 0; if (t == BPF_WRITE && value_regno >= 0 && is_pointer_value(env, value_regno)) { @@ -7382,7 +7402,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn return err; err = check_ctx_access(env, insn_idx, off, size, t, ®_type, &btf, - &btf_id, &is_retval, is_ldsx); + &btf_id, &is_retval, is_ldsx, &ref_obj_id); if (err) verbose_linfo(env, insn_idx, "; "); if (!err && t == BPF_READ && value_regno >= 0) { @@ -7413,6 +7433,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn if (base_type(reg_type) == PTR_TO_BTF_ID) { regs[value_regno].btf = btf; regs[value_regno].btf_id = btf_id; + regs[value_regno].ref_obj_id = ref_obj_id; } } regs[value_regno].type = reg_type; @@ -22161,6 +22182,16 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog) mark_reg_known_zero(env, regs, BPF_REG_1); } + /* Acquire references for struct_ops program arguments tagged with "__ref". + * These should be the earliest references acquired. btf_ctx_access() will + * assume the ref_obj_id of the n-th __ref-tagged argument to be n. 
+ */ + if (!subprog && env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) { + for (i = 0; i < env->prog->aux->ctx_arg_info_size; i++) + if (env->prog->aux->ctx_arg_info[i].refcounted) + acquire_reference(env, 0); + } + ret = do_check(env); out: /* check for NULL is necessary, since cur_state can be freed inside

From patchwork Fri Dec 20 19:55:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13917347
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com
Subject: [PATCH bpf-next v2 02/14] selftests/bpf: Test referenced kptr arguments of struct_ops programs
Date: Fri, 20 Dec 2024 11:55:28 -0800
Message-ID: <20241220195619.2022866-3-amery.hung@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
References: <20241220195619.2022866-1-amery.hung@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

From: Amery Hung

Test referenced kptrs acquired through struct_ops arguments tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed on multiple paths as long as it has not been released. In the fail cases, we first confirm that a referenced kptr acquired through a struct_ops argument is not allowed to be leaked. Then, we make sure this new mechanism for acquiring referenced kptrs does not accidentally allow referenced kptrs to flow into global subprograms through their arguments.
Signed-off-by: Amery Hung --- .../prog_tests/test_struct_ops_refcounted.c | 12 ++++++ .../bpf/progs/struct_ops_refcounted.c | 31 ++++++++++++++++ ...ruct_ops_refcounted_fail__global_subprog.c | 37 +++++++++++++++++++ .../struct_ops_refcounted_fail__ref_leak.c | 22 +++++++++++ .../selftests/bpf/test_kmods/bpf_testmod.c | 7 ++++ .../selftests/bpf/test_kmods/bpf_testmod.h | 2 + 6 files changed, 111 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c new file mode 100644 index 000000000000..e290a2f6db95 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_refcounted.c @@ -0,0 +1,12 @@ +#include + +#include "struct_ops_refcounted.skel.h" +#include "struct_ops_refcounted_fail__ref_leak.skel.h" +#include "struct_ops_refcounted_fail__global_subprog.skel.h" + +void test_struct_ops_refcounted(void) +{ + RUN_TESTS(struct_ops_refcounted); + RUN_TESTS(struct_ops_refcounted_fail__ref_leak); + RUN_TESTS(struct_ops_refcounted_fail__global_subprog); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c new file mode 100644 index 000000000000..76dcb6089d7f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c @@ -0,0 +1,31 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +__attribute__((nomerge)) extern void bpf_task_release(struct task_struct *p) __ksym; + +/* This is a test BPF program that uses struct_ops to access a referenced + * kptr argument. This is a test for the verifier to ensure that it + * 1) recongnizes the task as a referenced object (i.e., ref_obj_id > 0), and + * 2) the same reference can be acquired from multiple paths as long as it + * has not been released. + */ +SEC("struct_ops/test_refcounted") +int BPF_PROG(refcounted, int dummy, struct task_struct *task) +{ + if (dummy == 1) + bpf_task_release(task); + else + bpf_task_release(task); + return 0; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_refcounted = { + .test_refcounted = (void *)refcounted, +}; + + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c new file mode 100644 index 000000000000..43493a7ead39 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__global_subprog.c @@ -0,0 +1,37 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +extern void bpf_task_release(struct task_struct *p) __ksym; + +__noinline int subprog_release(__u64 *ctx __arg_ctx) +{ + struct task_struct *task = (struct task_struct *)ctx[1]; + int dummy = (int)ctx[0]; + + bpf_task_release(task); + + return dummy + 1; +} + +/* Test that the verifier rejects a program that contains a global + * subprogram with referenced kptr arguments + */ +SEC("struct_ops/test_refcounted") +__failure __msg("invalid bpf_context access off=8. 
Reference may already be released") +int refcounted_fail__global_subprog(unsigned long long *ctx) +{ + struct task_struct *task = (struct task_struct *)ctx[1]; + + bpf_task_release(task); + + return subprog_release(ctx); +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_ref_acquire = { + .test_refcounted = (void *)refcounted_fail__global_subprog, +}; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c new file mode 100644 index 000000000000..e945b1a04294 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted_fail__ref_leak.c @@ -0,0 +1,22 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +/* Test that the verifier rejects a program that acquires a referenced + * kptr through context without releasing the reference + */ +SEC("struct_ops/test_refcounted") +__failure __msg("Unreleased reference id=1 alloc_insn=0") +int BPF_PROG(refcounted_fail__ref_leak, int dummy, + struct task_struct *task) +{ + return 0; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_ref_acquire = { + .test_refcounted = (void *)refcounted_fail__ref_leak, +}; diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index cc9dde507aba..802cbd871035 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -1176,10 +1176,17 @@ static int bpf_testmod_ops__test_maybe_null(int dummy, return 0; } +static int bpf_testmod_ops__test_refcounted(int dummy, + struct task_struct *task__ref) +{ + return 0; +} + static struct bpf_testmod_ops __bpf_testmod_ops = { .test_1 = bpf_testmod_test_1, .test_2 = bpf_testmod_test_2, .test_maybe_null = bpf_testmod_ops__test_maybe_null, + .test_refcounted = bpf_testmod_ops__test_refcounted, }; struct bpf_struct_ops bpf_bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h index 356803d1c10e..c57b2f9dab10 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h @@ -36,6 +36,8 @@ struct bpf_testmod_ops { /* Used to test nullable arguments. */ int (*test_maybe_null)(int dummy, struct task_struct *task); int (*unsupported_ops)(void); + /* Used to test ref_acquired arguments. */ + int (*test_refcounted)(int dummy, struct task_struct *task); /* The following fields are used to test shadow copies. 
*/ char onebyte;

From patchwork Fri Dec 20 19:55:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13917348
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com
Subject: [PATCH bpf-next v2 03/14] bpf: Allow struct_ops prog to return referenced kptr
Date: Fri, 20 Dec 2024 11:55:29 -0800
Message-ID: <20241220195619.2022866-4-amery.hung@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
References: <20241220195619.2022866-1-amery.hung@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

From: Amery Hung

Allow a struct_ops program to return a referenced kptr if the struct_ops operator's return type is a struct pointer. To make sure the returned pointer continues to be valid in the kernel, several constraints are required:

1) The type of the pointer must match the return type
2) The pointer originally comes from the kernel (not locally allocated)
3) The pointer is in its unmodified form

Implementation-wise, a referenced kptr first needs to be allowed to _leak_ in check_reference_leak() if it is in the return register. Then, in check_return_code(), constraints 1-3 are checked. During struct_ops registration, a check is also added to warn about operators with a non-struct pointer return type.

In addition, since the first user, Qdisc_ops::dequeue, allows a NULL pointer to be returned when there is no skb to be dequeued, we also allow a scalar value equal to NULL (i.e., zero) to be returned. In the future, when there is a struct_ops user that always expects a valid pointer to be returned from an operator, we may extend the tagging to the return value, telling the verifier to only allow a NULL pointer return if the return value is tagged with MAY_BE_NULL.
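For intuition, a program satisfying these constraints looks roughly like the sketch below; it mirrors the selftests added in the next patch, and the operator name and task/cgroup arguments are illustrative. The referenced task argument may be returned in its unmodified form, or released and replaced by a NULL return:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  void bpf_task_release(struct task_struct *p) __ksym;

  SEC("struct_ops/test_return_ref_kptr")
  struct task_struct *BPF_PROG(ret_task, int dummy,
                               struct task_struct *task, struct cgroup *cgrp)
  {
          if (!dummy) {
                  bpf_task_release(task);
                  return NULL;    /* a scalar zero is accepted */
          }
          return task;            /* unmodified referenced kptr of the matching type */
  }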
Signed-off-by: Amery Hung --- kernel/bpf/bpf_struct_ops.c | 12 +++++++++++- kernel/bpf/verifier.c | 36 ++++++++++++++++++++++++++++++++---- 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index d9e0af00580b..27d4a170df84 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -386,7 +386,7 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, st_ops_desc->value_type = btf_type_by_id(btf, value_id); for_each_member(i, t, member) { - const struct btf_type *func_proto; + const struct btf_type *func_proto, *ret_type; mname = btf_name_by_offset(btf, member->name_off); if (!*mname) { @@ -409,6 +409,16 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (!func_proto) continue; + if (func_proto->type) { + ret_type = btf_type_resolve_ptr(btf, func_proto->type, NULL); + if (ret_type && !__btf_type_is_struct(ret_type)) { + pr_warn("func ptr %s in struct %s returns non-struct pointer, which is not supported\n", + mname, st_ops->name); + err = -EOPNOTSUPP; + goto errout; + } + } + if (btf_distill_func_proto(log, btf, func_proto, mname, &st_ops->func_models[i])) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 26305571e377..0e6a3c4daa7d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10707,6 +10707,8 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit) { struct bpf_verifier_state *state = env->cur_state; + enum bpf_prog_type type = resolve_prog_type(env->prog); + struct bpf_reg_state *reg = reg_state(env, BPF_REG_0); bool refs_lingering = false; int i; @@ -10716,6 +10718,12 @@ static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exi for (i = 0; i < state->acquired_refs; i++) { if (state->refs[i].type != REF_TYPE_PTR) continue; + /* Allow struct_ops programs to return a referenced kptr back to + * kernel. Type checks are performed later in check_return_code. + */ + if (type == BPF_PROG_TYPE_STRUCT_OPS && !exception_exit && + reg->ref_obj_id == state->refs[i].id) + continue; verbose(env, "Unreleased reference id=%d alloc_insn=%d\n", state->refs[i].id, state->refs[i].insn_idx); refs_lingering = true; @@ -16320,13 +16328,14 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char const char *exit_ctx = "At program exit"; struct tnum enforce_attach_type_range = tnum_unknown; const struct bpf_prog *prog = env->prog; - struct bpf_reg_state *reg; + struct bpf_reg_state *reg = reg_state(env, regno); struct bpf_retval_range range = retval_range(0, 1); enum bpf_prog_type prog_type = resolve_prog_type(env->prog); int err; struct bpf_func_state *frame = env->cur_state->frame[0]; const bool is_subprog = frame->subprogno; bool return_32bit = false; + const struct btf_type *reg_type, *ret_type = NULL; /* LSM and struct_ops func-ptr's return type could be "void" */ if (!is_subprog || frame->in_exception_callback_fn) { @@ -16335,10 +16344,26 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char if (prog->expected_attach_type == BPF_LSM_CGROUP) /* See below, can be 0 or 0-1 depending on hook. 
*/ break; - fallthrough; + if (!prog->aux->attach_func_proto->type) return 0; + break; case BPF_PROG_TYPE_STRUCT_OPS: if (!prog->aux->attach_func_proto->type) return 0; + + if (frame->in_exception_callback_fn) + break; + + /* Allow a struct_ops program to return a referenced kptr if it + * matches the operator's return type and is in its unmodified + * form. A scalar zero (i.e., a null pointer) is also allowed. + */ + reg_type = reg->btf ? btf_type_by_id(reg->btf, reg->btf_id) : NULL; + ret_type = btf_type_resolve_ptr(prog->aux->attach_btf, + prog->aux->attach_func_proto->type, + NULL); + if (ret_type && ret_type == reg_type && reg->ref_obj_id) + return __check_ptr_off_reg(env, reg, regno, false); break; default: break; @@ -16360,8 +16385,6 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char return -EACCES; } - reg = cur_regs(env) + regno; - if (frame->in_async_callback_fn) { /* enforce return zero from async callbacks like timer */ exit_ctx = "At async callback return"; @@ -16460,6 +16483,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char case BPF_PROG_TYPE_NETFILTER: range = retval_range(NF_DROP, NF_ACCEPT); break; + case BPF_PROG_TYPE_STRUCT_OPS: + if (!ret_type) + return 0; + range = retval_range(0, 0); + break; case BPF_PROG_TYPE_EXT: /* freplace program can return anything as its return value * depends on the to-be-replaced kernel func or bpf program.

From patchwork Fri Dec 20 19:55:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13917349
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com
Subject: [PATCH bpf-next v2 04/14] selftests/bpf: Test returning referenced kptr from struct_ops programs
Date: Fri, 20 Dec 2024 11:55:30 -0800
Message-ID: <20241220195619.2022866-5-amery.hung@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
References: <20241220195619.2022866-1-amery.hung@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

From: Amery Hung

Test struct_ops programs returning referenced kptrs. When the return type of a struct_ops operator is a pointer to a struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form.
Signed-off-by: Amery Hung --- .../prog_tests/test_struct_ops_kptr_return.c | 16 +++++++++ .../bpf/progs/struct_ops_kptr_return.c | 30 ++++++++++++++++ ...uct_ops_kptr_return_fail__invalid_scalar.c | 26 ++++++++++++++ .../struct_ops_kptr_return_fail__local_kptr.c | 34 +++++++++++++++++++ ...uct_ops_kptr_return_fail__nonzero_offset.c | 25 ++++++++++++++ .../struct_ops_kptr_return_fail__wrong_type.c | 30 ++++++++++++++++ .../selftests/bpf/test_kmods/bpf_testmod.c | 8 +++++ .../selftests/bpf/test_kmods/bpf_testmod.h | 4 +++ 8 files changed, 173 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c new file mode 100644 index 000000000000..467cc72a3588 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c @@ -0,0 +1,16 @@ +#include + +#include "struct_ops_kptr_return.skel.h" +#include "struct_ops_kptr_return_fail__wrong_type.skel.h" +#include "struct_ops_kptr_return_fail__invalid_scalar.skel.h" +#include "struct_ops_kptr_return_fail__nonzero_offset.skel.h" +#include "struct_ops_kptr_return_fail__local_kptr.skel.h" + +void test_struct_ops_kptr_return(void) +{ + RUN_TESTS(struct_ops_kptr_return); + RUN_TESTS(struct_ops_kptr_return_fail__wrong_type); + RUN_TESTS(struct_ops_kptr_return_fail__invalid_scalar); + RUN_TESTS(struct_ops_kptr_return_fail__nonzero_offset); + RUN_TESTS(struct_ops_kptr_return_fail__local_kptr); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c new file mode 100644 index 000000000000..36386b3c23a1 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c @@ -0,0 +1,30 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +void bpf_task_release(struct task_struct *p) __ksym; + +/* This test struct_ops BPF programs returning referenced kptr. The verifier should + * allow a referenced kptr or a NULL pointer to be returned. A referenced kptr to task + * here is acquried automatically as the task argument is tagged with "__ref". 
+ */ +SEC("struct_ops/test_return_ref_kptr") +struct task_struct *BPF_PROG(kptr_return, int dummy, + struct task_struct *task, struct cgroup *cgrp) +{ + if (dummy % 2) { + bpf_task_release(task); + return NULL; + } + return task; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_kptr_return = { + .test_return_ref_kptr = (void *)kptr_return, +}; + + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c new file mode 100644 index 000000000000..caeea158ef69 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c @@ -0,0 +1,26 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym; +void bpf_task_release(struct task_struct *p) __ksym; + +/* This test struct_ops BPF programs returning referenced kptr. The verifier should + * reject programs returning a non-zero scalar value. + */ +SEC("struct_ops/test_return_ref_kptr") +__failure __msg("At program exit the register R0 has smin=1 smax=1 should have been in [0, 0]") +struct task_struct *BPF_PROG(kptr_return_fail__invalid_scalar, int dummy, + struct task_struct *task, struct cgroup *cgrp) +{ + bpf_task_release(task); + return (struct task_struct *)1; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_kptr_return = { + .test_return_ref_kptr = (void *)kptr_return_fail__invalid_scalar, +}; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c new file mode 100644 index 000000000000..b8b4f05c3d7f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c @@ -0,0 +1,34 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_experimental.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym; +void bpf_task_release(struct task_struct *p) __ksym; + +/* This test struct_ops BPF programs returning referenced kptr. The verifier should + * reject programs returning a local kptr. + */ +SEC("struct_ops/test_return_ref_kptr") +__failure __msg("At program exit the register R0 is not a known value (ptr_or_null_)") +struct task_struct *BPF_PROG(kptr_return_fail__local_kptr, int dummy, + struct task_struct *task, struct cgroup *cgrp) +{ + struct task_struct *t; + + bpf_task_release(task); + + t = bpf_obj_new(typeof(*task)); + if (!t) + return NULL; + + return t; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_kptr_return = { + .test_return_ref_kptr = (void *)kptr_return_fail__local_kptr, +}; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c new file mode 100644 index 000000000000..7ddeb28c2329 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c @@ -0,0 +1,25 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym; +void bpf_task_release(struct task_struct *p) __ksym; + +/* This test struct_ops BPF programs returning referenced kptr. 
The verifier should + * reject programs returning a modified referenced kptr. + */ +SEC("struct_ops/test_return_ref_kptr") +__failure __msg("dereference of modified trusted_ptr_ ptr R0 off={{[0-9]+}} disallowed") +struct task_struct *BPF_PROG(kptr_return_fail__nonzero_offset, int dummy, + struct task_struct *task, struct cgroup *cgrp) +{ + return (struct task_struct *)&task->jobctl; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_kptr_return = { + .test_return_ref_kptr = (void *)kptr_return_fail__nonzero_offset, +}; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c new file mode 100644 index 000000000000..6a2dd5367802 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c @@ -0,0 +1,30 @@ +#include +#include +#include "../test_kmods/bpf_testmod.h" +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym; +void bpf_task_release(struct task_struct *p) __ksym; + +/* This test struct_ops BPF programs returning referenced kptr. The verifier should + * reject programs returning a referenced kptr of the wrong type. + */ +SEC("struct_ops/test_return_ref_kptr") +__failure __msg("At program exit the register R0 is not a known value (ptr_or_null_)") +struct task_struct *BPF_PROG(kptr_return_fail__wrong_type, int dummy, + struct task_struct *task, struct cgroup *cgrp) +{ + struct task_struct *ret; + + ret = (struct task_struct *)bpf_cgroup_acquire(cgrp); + bpf_task_release(task); + + return ret; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_kptr_return = { + .test_return_ref_kptr = (void *)kptr_return_fail__wrong_type, +}; diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index 802cbd871035..89dc502de9d4 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -1182,11 +1182,19 @@ static int bpf_testmod_ops__test_refcounted(int dummy, return 0; } +static struct task_struct * +bpf_testmod_ops__test_return_ref_kptr(int dummy, struct task_struct *task__ref, + struct cgroup *cgrp) +{ + return NULL; +} + static struct bpf_testmod_ops __bpf_testmod_ops = { .test_1 = bpf_testmod_test_1, .test_2 = bpf_testmod_test_2, .test_maybe_null = bpf_testmod_ops__test_maybe_null, .test_refcounted = bpf_testmod_ops__test_refcounted, + .test_return_ref_kptr = bpf_testmod_ops__test_return_ref_kptr, }; struct bpf_struct_ops bpf_bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h index c57b2f9dab10..c9fab51f16e2 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct cgroup; struct bpf_testmod_test_read_ctx { char *buf; @@ -38,6 +39,9 @@ struct bpf_testmod_ops { int (*unsupported_ops)(void); /* Used to test ref_acquired arguments. */ int (*test_refcounted)(int dummy, struct task_struct *task); + /* Used to test returning referenced kptr. */ + struct task_struct *(*test_return_ref_kptr)(int dummy, struct task_struct *task, + struct cgroup *cgrp); /* The following fields are used to test shadow copies. 
*/ char onebyte;

From patchwork Fri Dec 20 19:55:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13917350
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
X-Google-Original-From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com
Subject: [PATCH bpf-next v2 05/14] bpf: net_sched: Support implementation of Qdisc_ops in bpf
Date: Fri, 20 Dec 2024 11:55:31 -0800
Message-ID: <20241220195619.2022866-6-amery.hung@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
References: <20241220195619.2022866-1-amery.hung@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net

From: Amery Hung

Enable users to implement a classless qdisc using bpf. The previous patches in this series have prepared struct_ops to support the core operators in Qdisc_ops. Recent advancements in bpf, such as allocated objects, bpf list, and bpf rbtree, also provide powerful and flexible building blocks for realizing sophisticated scheduling algorithms. Therefore, in this patch, we start allowing qdiscs to be implemented using bpf struct_ops. Users can implement Qdisc_ops.{enqueue, dequeue, init, reset, destroy} in bpf and register the qdisc dynamically into the kernel, as sketched below.
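To give a feel for the end result, a do-nothing BPF qdisc could look roughly like the following sketch, assuming a kernel with this series applied (so struct Qdisc_ops and struct bpf_sk_buff_ptr appear in vmlinux.h). The kfunc used to drop the referenced skb is only added by a later patch in the series, so its name and signature here (bpf_qdisc_skb_drop) are an assumption, and NET_XMIT_DROP is defined locally because macros are not carried in BTF:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  #define NET_XMIT_DROP 0x01      /* mirrors include/linux/netdevice.h */

  /* Assumed to be provided by a later patch in this series. */
  void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;

  /* The skb arrives as a referenced kptr (the "__ref" mechanism earlier in
   * this series), so every path must either queue it or drop it.
   */
  SEC("struct_ops/enqueue")
  int BPF_PROG(noop_enqueue, struct sk_buff *skb, struct Qdisc *sch,
               struct bpf_sk_buff_ptr *to_free)
  {
          bpf_qdisc_skb_drop(skb, to_free);
          return NET_XMIT_DROP;
  }

  /* Returning NULL (a scalar zero) is allowed when there is nothing to
   * dequeue; otherwise an unmodified referenced skb must be returned.
   */
  SEC("struct_ops/dequeue")
  struct sk_buff *BPF_PROG(noop_dequeue, struct Qdisc *sch)
  {
          return NULL;
  }

  SEC(".struct_ops.link")
  struct Qdisc_ops noop_qdisc_ops = {
          .enqueue = (void *)noop_enqueue,
          .dequeue = (void *)noop_dequeue,
          .id      = "bpf_noop",
  };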
Co-developed-by: Cong Wang Signed-off-by: Cong Wang Signed-off-by: Amery Hung --- include/linux/btf.h | 1 + kernel/bpf/btf.c | 4 +- net/sched/Kconfig | 12 +++ net/sched/Makefile | 1 + net/sched/bpf_qdisc.c | 207 ++++++++++++++++++++++++++++++++++++++++ net/sched/sch_api.c | 7 +- net/sched/sch_generic.c | 3 +- 7 files changed, 229 insertions(+), 6 deletions(-) create mode 100644 net/sched/bpf_qdisc.c diff --git a/include/linux/btf.h b/include/linux/btf.h index 4214e76c9168..eb16218fdf52 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -563,6 +563,7 @@ const char *btf_name_by_offset(const struct btf *btf, u32 offset); const char *btf_str_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off); u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id, const struct bpf_prog *prog); u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id, diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index c2f4f84e539d..78476cebefe3 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6375,8 +6375,8 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t) return btf_type_is_int(t); } -static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, - int off) +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, + int off) { const struct btf_param *args; const struct btf_type *t; diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 8180d0c12fce..ccd0255da5a5 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -403,6 +403,18 @@ config NET_SCH_ETS If unsure, say N. +config NET_SCH_BPF + bool "BPF-based Qdisc" + depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF + help + This option allows BPF-based queueing disiplines. With BPF struct_ops, + users can implement supported operators in Qdisc_ops using BPF programs. + The queue holding skb can be built with BPF maps or graphs. + + Say Y here if you want to use BPF-based Qdisc. + + If unsure, say N. 
+ menuconfig NET_SCH_DEFAULT bool "Allow override default queue discipline" help diff --git a/net/sched/Makefile b/net/sched/Makefile index 82c3f78ca486..904d784902d1 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_NET_SCH_FQ_PIE) += sch_fq_pie.o obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o +obj-$(CONFIG_NET_SCH_BPF) += bpf_qdisc.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c new file mode 100644 index 000000000000..4b7d013f4f5c --- /dev/null +++ b/net/sched/bpf_qdisc.c @@ -0,0 +1,207 @@ +#include +#include +#include +#include +#include +#include +#include + +static struct bpf_struct_ops bpf_Qdisc_ops; + +struct bpf_sk_buff_ptr { + struct sk_buff *skb; +}; + +static int bpf_qdisc_init(struct btf *btf) +{ + return 0; +} + +static const struct bpf_func_proto * +bpf_qdisc_get_func_proto(enum bpf_func_id func_id, + const struct bpf_prog *prog) +{ + switch (func_id) { + case BPF_FUNC_tail_call: + return NULL; + default: + return bpf_base_func_proto(func_id, prog); + } +} + +BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) + +static bool bpf_qdisc_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + struct btf *btf = prog->aux->attach_btf; + u32 arg; + + arg = get_ctx_arg_idx(btf, prog->aux->attach_func_proto, off); + if (!strcmp(prog->aux->attach_func_name, "enqueue")) { + if (arg == 2 && type == BPF_READ) { + info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED; + info->btf = btf; + info->btf_id = bpf_sk_buff_ptr_ids[0]; + return true; + } + } + + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *t, *skbt; + size_t end; + + skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); + t = btf_type_by_id(reg->btf, reg->btf_id); + if (t != skbt) { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + switch (off) { + case offsetof(struct sk_buff, tstamp): + end = offsetofend(struct sk_buff, tstamp); + break; + case offsetof(struct sk_buff, priority): + end = offsetofend(struct sk_buff, priority); + break; + case offsetof(struct sk_buff, mark): + end = offsetofend(struct sk_buff, mark); + break; + case offsetof(struct sk_buff, queue_mapping): + end = offsetofend(struct sk_buff, queue_mapping); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, tc_classid): + end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, tc_classid); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, data[0]) ... 
+ offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, + data[QDISC_CB_PRIV_LEN - 1]): + end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, data[QDISC_CB_PRIV_LEN - 1]); + break; + case offsetof(struct sk_buff, tc_index): + end = offsetofend(struct sk_buff, tc_index); + break; + default: + bpf_log(log, "no write support to sk_buff at off %d\n", off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of sk_buff ended at %zu\n", + off, size, end); + return -EACCES; + } + + return 0; +} + +static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { + .get_func_proto = bpf_qdisc_get_func_proto, + .is_valid_access = bpf_qdisc_is_valid_access, + .btf_struct_access = bpf_qdisc_btf_struct_access, +}; + +static int bpf_qdisc_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + const struct Qdisc_ops *uqdisc_ops; + struct Qdisc_ops *qdisc_ops; + u32 moff; + + uqdisc_ops = (const struct Qdisc_ops *)udata; + qdisc_ops = (struct Qdisc_ops *)kdata; + + moff = __btf_member_bit_offset(t, member) / 8; + switch (moff) { + case offsetof(struct Qdisc_ops, peek): + qdisc_ops->peek = qdisc_peek_dequeued; + return 0; + case offsetof(struct Qdisc_ops, id): + if (bpf_obj_name_cpy(qdisc_ops->id, uqdisc_ops->id, + sizeof(qdisc_ops->id)) <= 0) + return -EINVAL; + return 1; + } + + return 0; +} + +static int bpf_qdisc_reg(void *kdata, struct bpf_link *link) +{ + return register_qdisc(kdata); +} + +static void bpf_qdisc_unreg(void *kdata, struct bpf_link *link) +{ + return unregister_qdisc(kdata); +} + +static int Qdisc_ops__enqueue(struct sk_buff *skb__ref, struct Qdisc *sch, + struct sk_buff **to_free) +{ + return 0; +} + +static struct sk_buff *Qdisc_ops__dequeue(struct Qdisc *sch) +{ + return NULL; +} + +static struct sk_buff *Qdisc_ops__peek(struct Qdisc *sch) +{ + return NULL; +} + +static int Qdisc_ops__init(struct Qdisc *sch, struct nlattr *arg, + struct netlink_ext_ack *extack) +{ + return 0; +} + +static void Qdisc_ops__reset(struct Qdisc *sch) +{ +} + +static void Qdisc_ops__destroy(struct Qdisc *sch) +{ +} + +static struct Qdisc_ops __bpf_ops_qdisc_ops = { + .enqueue = Qdisc_ops__enqueue, + .dequeue = Qdisc_ops__dequeue, + .peek = Qdisc_ops__peek, + .init = Qdisc_ops__init, + .reset = Qdisc_ops__reset, + .destroy = Qdisc_ops__destroy, +}; + +static struct bpf_struct_ops bpf_Qdisc_ops = { + .verifier_ops = &bpf_qdisc_verifier_ops, + .reg = bpf_qdisc_reg, + .unreg = bpf_qdisc_unreg, + .init_member = bpf_qdisc_init_member, + .init = bpf_qdisc_init, + .name = "Qdisc_ops", + .cfi_stubs = &__bpf_ops_qdisc_ops, + .owner = THIS_MODULE, +}; + +static int __init bpf_qdisc_kfunc_init(void) +{ + return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); +} +late_initcall(bpf_qdisc_kfunc_init); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 300430b8c4d2..b35c73c82342 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -358,7 +359,7 @@ static struct Qdisc_ops *qdisc_lookup_ops(struct nlattr *kind) read_lock(&qdisc_mod_lock); for (q = qdisc_base; q; q = q->next) { if (nla_strcmp(kind, q->id) == 0) { - if (!try_module_get(q->owner)) + if (!bpf_try_module_get(q, q->owner)) q = NULL; break; } @@ -1287,7 +1288,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, /* We will try again qdisc_lookup_ops, * so don't keep a reference. 
*/ - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err = -EAGAIN; goto err_out; } @@ -1398,7 +1399,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, netdev_put(dev, &sch->dev_tracker); qdisc_free(sch); err_out2: - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err_out: *errp = err; return NULL; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 38ec18f73de4..1e770ec251a0 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -1083,7 +1084,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc) ops->destroy(qdisc); lockdep_unregister_key(&qdisc->root_lock_key); - module_put(ops->owner); + bpf_module_put(ops, ops->owner); netdev_put(dev, &qdisc->dev_tracker); trace_qdisc_destroy(qdisc); From patchwork Fri Dec 20 19:55:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917351 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65379226888; Fri, 20 Dec 2024 19:56:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724597; cv=none; b=QbfEbpzNsKG+vOFGtyMX+WCyBEwV9iJmbDXq9REa8gkI6W8cWRpxPJEBbPEAurVpXNZ32dynn7siobIj8vRe6RqyBCx/hKm8BJj11bCit+9PdEUtqGGdkC1G2Dqik+N1dVU0+f3+go6NYAjvp9oPq+W80IHv0yNnVTez/qY3ngA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724597; c=relaxed/simple; bh=oTrWwuAlVXAkJcNs71LYPamD2VKPjLQ4QAnfcLW8TC4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XqCocIlDDDUHQpPHaYw8yPcWJlaFQiPNA3s/fAiD330dq07CC5vAYB1bLQcXba/4xBMQQIHNxocxhI3lv02kwzBbA/scqy0eIROqrSONdE6ow2gVBSw+85UShLDYilr/wxQBrPZcUYnLaAQgM3JF6BEfW3aSxC5CBxmP8gyo/L0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QfgX/pNc; arc=none smtp.client-ip=209.85.210.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QfgX/pNc" Received: by mail-pf1-f178.google.com with SMTP id d2e1a72fcca58-7273967f2f0so2873288b3a.1; Fri, 20 Dec 2024 11:56:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724594; x=1735329394; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QSuCNtOVV+R11g8QPC6TJRQ60HKfpVL/OeZ5X11d4SE=; b=QfgX/pNcKqbvsdqvBHoQjglToDtSaG1/QhBL5Wp8Ad5pvxjBhwqmKhYgHgVK84m9c6 ljs3pSpZOfEZShtyq9ovcSsEH4knbeTwKKNYTMofltb4V4nxPEFzkxKN2LhrRUdbobio 6elR5vwBfO0ieSlD92PfGHQSgSIxENDXALeThQWrtq1odxd3srlun/JcYJwtmsqz8mDJ 8Mc3f3A4NAmW9mmDkij/IZXPBcTF2Fi89dYO/yApmFeHGtb8XnJhgOZj3+ZOBd5bgDKh Jk0hsJxdli3pDWwciHGw76ps6lr90/8sgsPo0zN8JXcEJpQIrciWTfNUDgt1xN2opGR1 PYLw== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734724594; x=1735329394; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QSuCNtOVV+R11g8QPC6TJRQ60HKfpVL/OeZ5X11d4SE=; b=G4jTDxXSUwOOp7z6yVHrXOH1sQqYUvfSE6aCFEmIEH2LaMgltAiujzdaut8JAOi4wL t38e+NLGL7qsbdSjxl0hDFaKg+8GImJ/3+l3bIrHNWN606gwCflZBrGCu1x3mFcz39kn gXn6Uf4Kb8nDD38fBWpfaKw9CvXsv0764rQ2b7dH7RzvzsYDlqml7vT8iY4b7e7J7UzM Vg5mSYlwRU5CA+MKcML6ZxQ3Yvgow+EO/Pn9zbdtQ1cpF8XDQm4UmZJ//gOpHh2FbsNa DT2S4pVcz25PhP8unI5bVsU1LU79eRwjnMo7MC3JMjOpxL89AxoPNeaPo826xA99wqut QLmw== X-Gm-Message-State: AOJu0Yzjr1jNl7w4WqzJlUZjO26RvDHaAu6JZHu6qgivV1b2JxKWXNfp F+DM4BEJKNsjW3qPusYnoDRIPMbtFu5bduKOpUztKR35R2Z/9dfP+N5uMg== X-Gm-Gg: ASbGncs0dBZnM7sLWg7R11ZSG9G3Oi/4zM5S/9pgPLI967ai5T03sl54wEysizWfmRD pnqW+dtd1Nwbdm7hvy/uZd+IUt3cZN46RUGdGSMO/cWyey1hOtvsJ6PJfO1o7xiOTUuxDtRUu4Z XEUzgmBRi75aqanfmUsVUkLJ9s+s9w+eVMQN+AoBf56mmLDn+eqVmKeptJhV+CrlLOdHrPfmCzB YdBbZTFwsGkC+zNlCPkXsFtv9BgfNZInNo7XDKu7l4YPO8ZepPkGdlAWwIF+/oR6oSGMqLtDOuC drMb8qH2cTkSXQRrmVkyV9O1Jaus4aVF X-Google-Smtp-Source: AGHT+IH0L9jh+hfs+3LzLnyzUs4dnFLtf7/ED1k4Oc55emX6aPRrk1FnRljVMfMq3BvZw/0zm0SuBw== X-Received: by 2002:a05:6a21:680b:b0:1e1:ae9a:6311 with SMTP id adf61e73a8af0-1e5e0458dcamr6655696637.4.1734724594499; Fri, 20 Dec 2024 11:56:34 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:34 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 06/14] bpf: net_sched: Add basic bpf qdisc kfuncs Date: Fri, 20 Dec 2024 11:55:32 -0800 Message-ID: <20241220195619.2022866-7-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Add basic kfuncs for working on skb in qdisc. Both bpf_qdisc_skb_drop() and bpf_kfree_skb() can be used to release a reference to an skb. However, bpf_qdisc_skb_drop() can only be called in .enqueue where a to_free skb list is available from kernel to defer the release. bpf_kfree_skb() should be used elsewhere. It is also used in bpf_obj_free_fields() when cleaning up skb in maps and collections. bpf_skb_get_hash() returns the flow hash of an skb, which can be used to build flow-based queueing algorithms. Finally, allow users to create read-only dynptr via bpf_dynptr_from_skb(). 
Signed-off-by: Amery Hung --- include/linux/bpf.h | 1 + kernel/bpf/bpf_struct_ops.c | 2 + net/sched/bpf_qdisc.c | 92 ++++++++++++++++++++++++++++++++++++- 3 files changed, 94 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2556f8043276..87ecee12af21 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1809,6 +1809,7 @@ struct bpf_struct_ops { void *cfi_stubs; struct module *owner; const char *name; + const struct btf_type *type; struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; }; diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 27d4a170df84..65542d8f064c 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -442,6 +442,8 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, goto errout; } + st_ops->type = t; + return 0; errout: diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 4b7d013f4f5c..1c92bfcc3847 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -108,6 +108,79 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, return 0; } +__bpf_kfunc_start_defs(); + +/* bpf_skb_get_hash - Get the flow hash of an skb. + * @skb: The skb to get the flow hash from. + */ +__bpf_kfunc u32 bpf_skb_get_hash(struct sk_buff *skb) +{ + return skb_get_hash(skb); +} + +/* bpf_kfree_skb - Release an skb's reference and drop it immediately. + * @skb: The skb whose reference to be released and dropped. + */ +__bpf_kfunc void bpf_kfree_skb(struct sk_buff *skb) +{ + kfree_skb(skb); +} + +/* bpf_qdisc_skb_drop - Drop an skb by adding it to a deferred free list. + * @skb: The skb whose reference to be released and dropped. + * @to_free_list: The list of skbs to be dropped. + */ +__bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb, + struct bpf_sk_buff_ptr *to_free_list) +{ + __qdisc_drop(skb, (struct sk_buff **)to_free_list); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(qdisc_kfunc_ids) +BTF_ID_FLAGS(func, bpf_skb_get_hash, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_kfree_skb, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_qdisc_skb_drop, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS) +BTF_KFUNCS_END(qdisc_kfunc_ids) + +BTF_SET_START(qdisc_common_kfunc_set) +BTF_ID(func, bpf_skb_get_hash) +BTF_ID(func, bpf_kfree_skb) +BTF_SET_END(qdisc_common_kfunc_set) + +BTF_SET_START(qdisc_enqueue_kfunc_set) +BTF_ID(func, bpf_qdisc_skb_drop) +BTF_SET_END(qdisc_enqueue_kfunc_set) + +static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id) +{ + if (bpf_Qdisc_ops.type != btf_type_by_id(prog->aux->attach_btf, + prog->aux->attach_btf_id)) + return 0; + + /* Skip the check when prog->attach_func_name is not yet available + * during check_cfg(). + */ + if (!btf_id_set8_contains(&qdisc_kfunc_ids, kfunc_id) || + !prog->aux->attach_func_name) + return 0; + + if (!strcmp(prog->aux->attach_func_name, "enqueue")) { + if (btf_id_set_contains(&qdisc_enqueue_kfunc_set, kfunc_id)) + return 0; + } + + return btf_id_set_contains(&qdisc_common_kfunc_set, kfunc_id) ? 
0 : -EACCES; +} + +static const struct btf_kfunc_id_set bpf_qdisc_kfunc_set = { + .owner = THIS_MODULE, + .set = &qdisc_kfunc_ids, + .filter = bpf_qdisc_kfunc_filter, +}; + static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { .get_func_proto = bpf_qdisc_get_func_proto, .is_valid_access = bpf_qdisc_is_valid_access, @@ -200,8 +273,25 @@ static struct bpf_struct_ops bpf_Qdisc_ops = { .owner = THIS_MODULE, }; +BTF_ID_LIST(bpf_sk_buff_dtor_ids) +BTF_ID(func, bpf_kfree_skb) + static int __init bpf_qdisc_kfunc_init(void) { - return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); + int ret; + const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = { + { + .btf_id = bpf_sk_buff_ids[0], + .kfunc_btf_id = bpf_sk_buff_dtor_ids[0] + }, + }; + + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_qdisc_kfunc_set); + ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors, + ARRAY_SIZE(skb_kfunc_dtors), + THIS_MODULE); + ret = ret ?: register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); + + return ret; } late_initcall(bpf_qdisc_kfunc_init); From patchwork Fri Dec 20 19:55:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917352 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2015C227567; Fri, 20 Dec 2024 19:56:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724597; cv=none; b=lqI5nAumTO3haHPyn6fj7mauwMRs/kvv1WzvtSUOpLVW7nX04wvCgdn8JRL3tZDzsMSY+8uMY6eiIlTWQeF1HT+wvMt2ro2Ol9JiQdF4z5UI913ZtdSpcf9BO4BU448ruCs0UoU0YAq3nRAc6gaKf3+IDG9fCy+4zkfD82zbnyQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724597; c=relaxed/simple; bh=7VTKmqevgwD2T3vHTnzqDkUUJE0B8xtaL03/gd0M7oM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KtFArYOejjSkMfaTcpk88NJjMjDnvJMgeH81rE12j1fBDM3CfjmCFwEoeeQrEEk4MCbtkXh0CqNYENm5sx6F2J//9SYdf+qWjGRsDeBE8vnCf+Wx4WiB4kCADohXDpuFkVTHgOtHigSKBscyTDDmyXdz+GdT/p/yUJtJ2GqllNY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hD3JGs+B; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hD3JGs+B" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-7fd526d4d9eso1766103a12.2; Fri, 20 Dec 2024 11:56:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724595; x=1735329395; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3pVqW4kyTwbSxeSE3OUqtXEyeML27KshGWKHBkIuQrc=; b=hD3JGs+BWqaz93RgHNJSejc6x5ECShPcd4iQWW0n+Li+iB7xJUJWnEmspbsSrPhp38 WIk2TfHbT0ZQa8eZrY4SNY9vAFHVR+Bam6S7x5foPXC5Qnh6BNAStDysfhvFniwogwAw 
3Oxce2QmBU5L4XicoN9PCBQSsA1xaPyVd+m5sdQY2hlQJcg9PCa1xz9GKw1hITobq52i icelhXC1B5w/gzUb4WQjVAaQMjYyMNmEBReXg/UJP/haOSwBCxnaNSAf4B2sFDKobyUN zLnnDCg+5iKCXZEZjgM3AuXnzJ0hRwY+PiY1U5Yb2R+rUFdzAY7qPQJSyYPqejZA3wHp HC2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734724595; x=1735329395; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3pVqW4kyTwbSxeSE3OUqtXEyeML27KshGWKHBkIuQrc=; b=rQuDjTAwcbrDitzVzw7RtZcSeDxY/XKnTiStz37ZPpB7d4xM4hJwdfNeDZ1/3MAzkE FckKrh2X8YFJSYaoYCtXWgxwK2NIF5lfW9devfHt7y1uHTbGYCVZa7Y/FxfNhlUgYagD 48lyhDrkJSInju3KkWynp2RZldmws2W4yVXEjQ6OoH5LMjDTnDLSZfhTOIEQzGhn6bpw KtumZ+sA2+JXSniD29IQ3MYXc+fvsgoiral0hbClpiRUSP6zun5sNR3j16izlMqIa+LR KRdevWC94KO3qBPOo45T4Czc20gkzouWdkquM1b7nPKs0XMxITUZqofttVqSDs953kPt 3ANg== X-Gm-Message-State: AOJu0Ywa1/bsa55Xyx8RM3H+yZvI6zIVfahoiWWrEWhv2hImFTa2BMSn VyR9wTpRkFl0ENRiVpfk18dpJICddo7e/+sHfWt1HLSpL7p2d2wRFjlABA== X-Gm-Gg: ASbGncteUX1CJY1dWmPwLMM8WSu45ZLYqjlA6p+z4VO3FHxmcXcWl1ShMfmqELJHT6N Jc97k6f6+ZYxSkjL1a8yfdQakYif0lCvZs+zo4LrQHx35tLZVRS18pJuUVR6Zq8ZLNMO76BvuZ/ xS95hzEiPjbPghD2WuzhK4gpEe6FbROHUx1eBwBN6Mf3s0rhnbFuv5e7v//n72+bdOqrx2KI55b p5P/S1gKToNfHbaweemK4Df8fOP/0eN/i7cqJvk45/BfaaNyxb4u2H5dfB11amdbfpBFltk6Kfj lLuXPRu9fP+LbQZUdSQPbey6mGohc0Ea X-Google-Smtp-Source: AGHT+IEEz5M/z8tYwRcmPtDwIciVMVaTLXI5ltjtZLcP2vze1Rmfe8O6DYxFr/gp/wwBuC44qziTZw== X-Received: by 2002:a05:6a21:339e:b0:1e1:aba4:209c with SMTP id adf61e73a8af0-1e5e07ffc0amr7662403637.29.1734724595385; Fri, 20 Dec 2024 11:56:35 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:35 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 07/14] bpf: Search and add kfuncs in struct_ops prologue and epilogue Date: Fri, 20 Dec 2024 11:55:33 -0800 Message-ID: <20241220195619.2022866-8-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Currently, add_kfunc_call() is only invoked once before the main verification loop. Therefore, the verifier could not find the bpf_kfunc_btf_tab of a new kfunc call which is not seen in user defined struct_ops operators but introduced in gen_prologue or gen_epilogue during do_misc_fixup(). Fix this by searching kfuncs in the patching instruction buffer and add them to prog->aux->kfunc_tab. 
Signed-off-by: Amery Hung --- kernel/bpf/verifier.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0e6a3c4daa7d..949812d955ca 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3214,6 +3214,21 @@ bpf_jit_find_kfunc_model(const struct bpf_prog *prog, return res ? &res->func_model : NULL; } +static int add_kfunc_in_insns(struct bpf_verifier_env *env, + struct bpf_insn *insn, int cnt) +{ + int i, ret; + + for (i = 0; i < cnt; i++, insn++) { + if (bpf_pseudo_kfunc_call(insn)) { + ret = add_kfunc_call(env, insn->imm, insn->off); + if (ret < 0) + return ret; + } + } + return 0; +} + static int add_subprog_and_kfunc(struct bpf_verifier_env *env) { struct bpf_subprog_info *subprog = env->subprog_info; @@ -20278,7 +20293,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) { struct bpf_subprog_info *subprogs = env->subprog_info; const struct bpf_verifier_ops *ops = env->ops; - int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0; + int i, cnt, size, ctx_field_size, ret, delta = 0, epilogue_cnt = 0; const int insn_cnt = env->prog->len; struct bpf_insn *epilogue_buf = env->epilogue_buf; struct bpf_insn *insn_buf = env->insn_buf; @@ -20307,6 +20322,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) return -ENOMEM; env->prog = new_prog; delta += cnt - 1; + + ret = add_kfunc_in_insns(env, epilogue_buf, epilogue_cnt - 1); + if (ret < 0) + return ret; } } @@ -20327,6 +20346,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) env->prog = new_prog; delta += cnt - 1; + + ret = add_kfunc_in_insns(env, insn_buf, cnt - 1); + if (ret < 0) + return ret; } } From patchwork Fri Dec 20 19:55:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917354 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f171.google.com (mail-pf1-f171.google.com [209.85.210.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35A58227B8B; Fri, 20 Dec 2024 19:56:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724599; cv=none; b=c6WCmTqguhjh2o+TtBVtcpb7Y1VxiHkBD3MXnFsZojj2gg0uXBuVFT75ABlMp5BsayyIQCz1hyguBVI1M/eV/aPFvSEh6mVtDaGzn+v66BAPpag4NDZUWOvF6K4bKDkH32uhxnRtMlur2hCtLeUbrCPfPp8l7LlTOo0JBxzwWWE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724599; c=relaxed/simple; bh=84pJMQgiagIW1z54qE51DOsuDzjf1/1ePpxf0Iks4RY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lzye7stJgEd/m6SYkgBkb9qNBUvmqGPa+0FRgfUnVAldWD/ZSOwQI9DT7DQ4E3VfmjcrFEqk8StZnQAE0vRZczlRsR1LyuKrMSetyTq1CKTkqf71sX8LEP1IKq6AZOh/i7k9iiom1lYukRsmEPCmJraO5K1t4p5oUxAcunmMsEQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QQ9EllfB; arc=none smtp.client-ip=209.85.210.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=gmail.com header.i=@gmail.com header.b="QQ9EllfB" Received: by mail-pf1-f171.google.com with SMTP id d2e1a72fcca58-725ea1e19f0so2060622b3a.3; Fri, 20 Dec 2024 11:56:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724596; x=1735329396; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KQleOtsoMTigSQpuxD19q3FCZQss7kN8SAwfnWhREag=; b=QQ9EllfBzdLqag/PeMM3Q2AmWK5WG9Me2EYg8xbAAC7FsWx0if6bWkq5ZfA4378Ssv uGFWe4UiGiHDMVDeamSMZG/GV92BdWI6leYGhoerI1aofuV/AGoXhnvwiPDd2fhGSb9R L5Etn/Pn95WOlcZZ0EOtkb/mmlQt8u+ruoPzFZGFXIlV4IHl1MDJJTm6NT5mrWAsZv4x 9BrKWjUpg5RdOQkTVtyDsE6YYbQQTf9O6+TGdmc57+NcacDUtaoAkNPl3SbS3uK6bw4b /psUP67MG5xjP5PXg5ML0o7TiV1IsE01+9zFHSnonMO6Z58inpXFx/eiY3qviBrVCjgx sssA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734724596; x=1735329396; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KQleOtsoMTigSQpuxD19q3FCZQss7kN8SAwfnWhREag=; b=DuY/iegFgKpXT0SIZNUCcVH4Kq9cptcU+KJpm9J/zR6PJTrj/b8cPE/DDmBq4plbsl KnmxBNUEEqXdleSj5fQFRU5WU2/uVc1q8xSeQe688cUkuAa8ywdXQ/03i6Mx3HA/Qw7B Wih2BF5LcKAuGjrNjHsdqwGbFy+STfQli0zc06dwb/B7jZIP7JrAOHKnUaiVE8VcApAm po/SlnPxiEykewdbd4p2B8GsKvoIOoezOhYjF/i8Tl7jTdNqHws1O49nzxlHk7BL7rGW vrJbY90lV0Oo9NZenJMK8HSLX6HhzmXK8is3zajUd2Jk21+wOxg+/smx51xsngnzy6xF IQVQ== X-Gm-Message-State: AOJu0YxVu0JNyFFeAsuyWO5V5ZTRxZ1UgtnwO/cMO/asL9EgpmyZLlNS Eo+R8yqqhSTjNfF2c5DXgok8l/Peb8IPXPqm1Jcd1hkXC2jYtELeu27jdA== X-Gm-Gg: ASbGncuqFBwzE/++fJE9HwQQ34DvVI3u1stegBJ/rYcyPFSIPm7LCUWEeFyr1T9Njon sn5Jr6sCRREiCegoYl5HGU8Cz4DBu+dSpkTr85lJPZi48UeuLnwta2UImFIbN0phDUmpRkCpsAN HZJUC2qK8N+N8BurMEWj2quvTF0u3hn6Gkpf62x/oXit8UKAQvSCUaZTE79psgkSb3jCHbbL9/5 POPyVyY7Ue7yPbJSXUxqd87rYfMeGTEfC87Q10eKChvxwNbhOE9OstjIZ4M2v2t+J0QfWq1mOw8 43rRefl+1fbjgBq0miaytdDJLcakCsq+ X-Google-Smtp-Source: AGHT+IGKP+4ByVkH/VVGeF/pDsTx1qdhcrUWh7ix99KzyYukzDiBSSFXM+k4s62OwtcNmahc+80NWg== X-Received: by 2002:a05:6a21:2d05:b0:1e1:ae68:d8f5 with SMTP id adf61e73a8af0-1e5e04958edmr6739364637.26.1734724596427; Fri, 20 Dec 2024 11:56:36 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:36 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 08/14] bpf: net_sched: Add a qdisc watchdog timer Date: Fri, 20 Dec 2024 11:55:34 -0800 Message-ID: <20241220195619.2022866-9-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Add a watchdog timer to bpf qdisc. 
The watchdog can be used to schedule the execution of qdisc through kfunc, bpf_qdisc_schedule(). It can be useful for building traffic shaping scheduling algorithm, where the time the next packet will be dequeued is known. Signed-off-by: Amery Hung --- include/linux/filter.h | 10 +++++ net/sched/bpf_qdisc.c | 92 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/include/linux/filter.h b/include/linux/filter.h index 0477254bc2d3..3bc9b741a120 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -469,6 +469,16 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn) .off = 0, \ .imm = BPF_CALL_IMM(FUNC) }) +/* Kfunc call */ + +#define BPF_CALL_KFUNC(OFF, IMM) \ + ((struct bpf_insn) { \ + .code = BPF_JMP | BPF_CALL, \ + .dst_reg = 0, \ + .src_reg = BPF_PSEUDO_KFUNC_CALL, \ + .off = OFF, \ + .imm = IMM }) + /* Raw code statement block */ #define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \ diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 1c92bfcc3847..bbe7aded6f24 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -8,6 +8,10 @@ static struct bpf_struct_ops bpf_Qdisc_ops; +struct bpf_sched_data { + struct qdisc_watchdog watchdog; +}; + struct bpf_sk_buff_ptr { struct sk_buff *skb; }; @@ -108,6 +112,46 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, return 0; } +BTF_ID_LIST(bpf_qdisc_init_prologue_ids) +BTF_ID(func, bpf_qdisc_init_prologue) + +static int bpf_qdisc_gen_prologue(struct bpf_insn *insn_buf, bool direct_write, + const struct bpf_prog *prog) +{ + struct bpf_insn *insn = insn_buf; + + if (strcmp(prog->aux->attach_func_name, "init")) + return 0; + + *insn++ = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1); + *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0); + *insn++ = BPF_CALL_KFUNC(0, bpf_qdisc_init_prologue_ids[0]); + *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6); + *insn++ = prog->insnsi[0]; + + return insn - insn_buf; +} + +BTF_ID_LIST(bpf_qdisc_reset_destroy_epilogue_ids) +BTF_ID(func, bpf_qdisc_reset_destroy_epilogue) + +static int bpf_qdisc_gen_epilogue(struct bpf_insn *insn_buf, const struct bpf_prog *prog, + s16 ctx_stack_off) +{ + struct bpf_insn *insn = insn_buf; + + if (strcmp(prog->aux->attach_func_name, "reset") && + strcmp(prog->aux->attach_func_name, "destroy")) + return 0; + + *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_FP, ctx_stack_off); + *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0); + *insn++ = BPF_CALL_KFUNC(0, bpf_qdisc_reset_destroy_epilogue_ids[0]); + *insn++ = BPF_EXIT_INSN(); + + return insn - insn_buf; +} + __bpf_kfunc_start_defs(); /* bpf_skb_get_hash - Get the flow hash of an skb. @@ -136,6 +180,36 @@ __bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb, __qdisc_drop(skb, (struct sk_buff **)to_free_list); } +/* bpf_qdisc_watchdog_schedule - Schedule a qdisc to a later time using a timer. + * @sch: The qdisc to be scheduled. + * @expire: The expiry time of the timer. + * @delta_ns: The slack range of the timer. + */ +__bpf_kfunc void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_schedule_range_ns(&q->watchdog, expire, delta_ns); +} + +/* bpf_qdisc_init_prologue - Hidden kfunc called in prologue of .init. 
*/ +__bpf_kfunc void bpf_qdisc_init_prologue(struct Qdisc *sch) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_init(&q->watchdog, sch); +} + +/* bpf_qdisc_reset_destroy_epilogue - Hidden kfunc called in epilogue of .reset + * and .destroy + */ +__bpf_kfunc void bpf_qdisc_reset_destroy_epilogue(struct Qdisc *sch) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_cancel(&q->watchdog); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(qdisc_kfunc_ids) @@ -143,6 +217,9 @@ BTF_ID_FLAGS(func, bpf_skb_get_hash, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_kfree_skb, KF_RELEASE) BTF_ID_FLAGS(func, bpf_qdisc_skb_drop, KF_RELEASE) BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_qdisc_watchdog_schedule, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_qdisc_init_prologue, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_qdisc_reset_destroy_epilogue, KF_TRUSTED_ARGS) BTF_KFUNCS_END(qdisc_kfunc_ids) BTF_SET_START(qdisc_common_kfunc_set) @@ -152,8 +229,13 @@ BTF_SET_END(qdisc_common_kfunc_set) BTF_SET_START(qdisc_enqueue_kfunc_set) BTF_ID(func, bpf_qdisc_skb_drop) +BTF_ID(func, bpf_qdisc_watchdog_schedule) BTF_SET_END(qdisc_enqueue_kfunc_set) +BTF_SET_START(qdisc_dequeue_kfunc_set) +BTF_ID(func, bpf_qdisc_watchdog_schedule) +BTF_SET_END(qdisc_dequeue_kfunc_set) + static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id) { if (bpf_Qdisc_ops.type != btf_type_by_id(prog->aux->attach_btf, @@ -170,6 +252,9 @@ static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id) if (!strcmp(prog->aux->attach_func_name, "enqueue")) { if (btf_id_set_contains(&qdisc_enqueue_kfunc_set, kfunc_id)) return 0; + } else if (!strcmp(prog->aux->attach_func_name, "dequeue")) { + if (btf_id_set_contains(&qdisc_dequeue_kfunc_set, kfunc_id)) + return 0; } return btf_id_set_contains(&qdisc_common_kfunc_set, kfunc_id) ? 
0 : -EACCES; @@ -185,6 +270,8 @@ static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { .get_func_proto = bpf_qdisc_get_func_proto, .is_valid_access = bpf_qdisc_is_valid_access, .btf_struct_access = bpf_qdisc_btf_struct_access, + .gen_prologue = bpf_qdisc_gen_prologue, + .gen_epilogue = bpf_qdisc_gen_epilogue, }; static int bpf_qdisc_init_member(const struct btf_type *t, @@ -200,6 +287,11 @@ static int bpf_qdisc_init_member(const struct btf_type *t, moff = __btf_member_bit_offset(t, member) / 8; switch (moff) { + case offsetof(struct Qdisc_ops, priv_size): + if (uqdisc_ops->priv_size) + return -EINVAL; + qdisc_ops->priv_size = sizeof(struct bpf_sched_data); + return 1; case offsetof(struct Qdisc_ops, peek): qdisc_ops->peek = qdisc_peek_dequeued; return 0; From patchwork Fri Dec 20 19:55:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917353 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D0AE227BB9; Fri, 20 Dec 2024 19:56:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724599; cv=none; b=uT3lQDPa5k9pe1GENAeee+4Ih5gCCIXw50lGvs8kyCacXKfUQ3asYuTpCw6l51gCKQCpQftsvPRLaMYKbI9d9NsIZxt7Zb6crEOZynVXaV7HNj56RaqPWcxLQBuGqmzqw5jWTLl2QZM5xlCfZZjFTt4/SgiTID424HCsVWibDig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724599; c=relaxed/simple; bh=bJ4fRTQDsJLP+TAqkbvy8ATF64cWWM0drr4CpUrcOHA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Pf68XJriguzfKFNo1vrjLb76P7mntfV5VAsuC05sf04uwGJpaqJ3ka/yVkpBRU6U47042pK8FGedVMzmHZs+C3i+4IV4vZypfiWSdl+v0U5V1kFUSf13ONNpMF3Z5REVHCH6CDVb07ik3LC5fbStapoFwM5eFmZ/ShU3kN20w3k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=D252skAF; arc=none smtp.client-ip=209.85.210.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="D252skAF" Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-728ea1e0bdbso2239767b3a.0; Fri, 20 Dec 2024 11:56:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724597; x=1735329397; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pFyFpVy1vmyR3zXTCfx76JfAwKafHxDwBgzFEvy/ujc=; b=D252skAFeuHFRndOTfnS4K9ar+90Jk33B2xA3jwskFo8sGuig7EYu6RsCsbZbJyqyj nKXlrgTasGD8yWALYBkU3fOHYICnSnT5cfhDH6G1SIOC2pktapVHS8E5gvWeFjkTp7MY xSfxZDo3HnC/otgvWW5gI8YTVDrSeDF/KP12K2Q8FncKYgtmSqvwp60+EcVSSzgtJiZs LW0/ScewONv8iKO+l9Q/tZyAus26Nr0TOZaCVg7A/AgUmTF6mqFAH5wtJ07RuMl/qxSL 1boqdOPTE3K4s/fcXIQHqCSm+lILNsT2ub8/qlvacY9g2Fh+oVH5+rWOk2hw84GIbAx8 km9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; 
t=1734724597; x=1735329397; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pFyFpVy1vmyR3zXTCfx76JfAwKafHxDwBgzFEvy/ujc=; b=TEiGOsJVYHuR1XNTSASO1v6xU3+HnFZD+ungQc4aqMH/AsN0fWd0a22yThX0UgYI+j 7YCRj0gIM9phSBqlUX2oCclMa6iU4vb8dr8+VHE6RLcEbRQot6txpyQykuIEjvoABAIR Nxbn0XjCeeLMCYFDr85pArf+oRamQJZ9HQfrwPffIB7FZzbnVS1bMqOuu7/8NLNHNGxz loXLsgKc9E5LR1npHdkJPpofcnHDPXmCvAFgfAqzeV4c6RaXxO5cjMHWUgKIaTKi1wHA XwxRTMYEXXc84QSkvqWEWjgwzU77KbkQb2Vt/780amJWMS97lsa80KQM7m2SK+ewA+4d UYow== X-Gm-Message-State: AOJu0YwEgV+XDzclXZE0hverxuMHtMFOgSz4hDIMZkJJu8ALhX8u781O CeEZbmbhF+VRWwXFf/wZrlQmjy4zFRRaFQ4os9k7ixIbN/oNLsB1Ev53Aw== X-Gm-Gg: ASbGncthTmT9z+OFSrqwMsEaRffyrcI8HbVhsyEUdiVHQ58b8y/4GNk93OWPbpg6g5H TpqDnZ+ILUXrR26micMhjl/5TnBw1okMeSXtC4OWEtXS4tHC0t5cHBcSisi/DdrtF23mLuL7AVN QluzV0UrsZ6iBysLvS2OjhNYoEg9pqKpNr4UESKp/Ma0BSTg5kavrMFZ2s+ypBAaAKTqjOXc+bQ SjUpSDIY9+cXxYSJJj7JldZVAuVXOrpmLeHDCsSkXm3q2qRd5mZzcNHISfSWxcXQog6x20+SFPA yX4uyzw4HUSbiAdMnPVaAKviNQTXz1KP X-Google-Smtp-Source: AGHT+IER6AZe7MjnhGMPynqqes2HIr6gfFCPX7hGgmJQ9JUs1nV8DmF7k11hW3eHKR8q9+3MfcLpVQ== X-Received: by 2002:a05:6a21:900e:b0:1db:ec0f:5cf4 with SMTP id adf61e73a8af0-1e5e081c8bbmr7501500637.39.1734724597307; Fri, 20 Dec 2024 11:56:37 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:37 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 09/14] bpf: net_sched: Support updating bstats Date: Fri, 20 Dec 2024 11:55:35 -0800 Message-ID: <20241220195619.2022866-10-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Add a kfunc to update Qdisc bstats when an skb is dequeued. The kfunc is only available in .dequeue programs. Signed-off-by: Amery Hung --- net/sched/bpf_qdisc.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index bbe7aded6f24..39f01daed48a 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -210,6 +210,15 @@ __bpf_kfunc void bpf_qdisc_reset_destroy_epilogue(struct Qdisc *sch) qdisc_watchdog_cancel(&q->watchdog); } +/* bpf_qdisc_bstats_update - Update Qdisc basic statistics + * @sch: The qdisc from which an skb is dequeued. + * @skb: The skb to be dequeued. 
+ */ +__bpf_kfunc void bpf_qdisc_bstats_update(struct Qdisc *sch, const struct sk_buff *skb) +{ + bstats_update(&sch->bstats, skb); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(qdisc_kfunc_ids) @@ -220,6 +229,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_qdisc_watchdog_schedule, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_qdisc_init_prologue, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_qdisc_reset_destroy_epilogue, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_qdisc_bstats_update, KF_TRUSTED_ARGS) BTF_KFUNCS_END(qdisc_kfunc_ids) BTF_SET_START(qdisc_common_kfunc_set) @@ -234,6 +244,7 @@ BTF_SET_END(qdisc_enqueue_kfunc_set) BTF_SET_START(qdisc_dequeue_kfunc_set) BTF_ID(func, bpf_qdisc_watchdog_schedule) +BTF_ID(func, bpf_qdisc_bstats_update) BTF_SET_END(qdisc_dequeue_kfunc_set) static int bpf_qdisc_kfunc_filter(const struct bpf_prog *prog, u32 kfunc_id) From patchwork Fri Dec 20 19:55:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917355 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27142227B85; Fri, 20 Dec 2024 19:56:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724600; cv=none; b=ROhtXM4/Zb288Dn7NU5brctgy0TTJt6LGYsVHl1G35RpTwpblzQKVAXEPHad+JVGqx1dAQEnuEa1A1Laniv97QmbcXa0bnQli65aEPgGPV9MOqFoCtb5yUT+TdCJ9K+Ak1Mlq7R7rK1tnTzdRrGIJEz3XaQbeQNqQOzOcaH39FY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724600; c=relaxed/simple; bh=Cgx59rfQ7uTrwAt2fviLdB9a8k5VgQaOm7OFEtfhO5I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uo405rPgzzCxxV31WePxl5+kdg/KwmoFyUirkOy6DtJHm+bnuESLEtkXiGgDpEFYBMi47NzQMb+NGY6I7J09+wQeDviCLCciWrkNIz6N9MJu6j+LmoUeZR/gTPcrikp23YAufeHQ0hCxuHWa41F+wmgEo8cixaHgFgXtZmRAfGo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=cXdPh9QK; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="cXdPh9QK" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-725f4623df7so2322700b3a.2; Fri, 20 Dec 2024 11:56:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724598; x=1735329398; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dZT2YZ64lp7kr6Ks7lrnL2X7foJtTo0Jit/ACU1Pr64=; b=cXdPh9QKDbe8djpCFCDtZIkD4y+DN8knt3wFWA7BpmmeDLfuC9SwS3/lOxPvXxn2lG W2gHPuXuMztUxiURVhbw65rWMTo6tw9p9COkO9swhJG3MBHjhIfHpeDrBCp6qLy5lLZR VCqZLeo0i5aoPzcvriBqqqf4293QJ9fDkPEPZTMH95M0H5DLsxdHkZ31TFjNHrOMyV83 kmmpzsRHiyalek5lsMKdTbQ5nXTVl95qorZ0MglPTAW4/C8GUhF+W+O5KyCsME+6WXYE 
1nyNTrQb3jPdY7NuepkADznArLNLM/WLtyS/2wTTJcvvnnc9TwVgsHhRw8KQPodTqvdP lTuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734724598; x=1735329398; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dZT2YZ64lp7kr6Ks7lrnL2X7foJtTo0Jit/ACU1Pr64=; b=tDbB8VlWtSELSImFYQg91kGLussuAdSJIq2zQQ7rZrI6iOIjc/AumJtJK8ZQNU5zBv W5+To3P/C4cMfgnYSXdkMPQhxcUUAfdOh+7Gjfuv6Ymxfs/5PIAIfE0/cA4gNqfO16AZ 54M9uLlaizty1PgddPEnhfbG6Ln8wMwBJIabx0sD3VhMsrJW4dpGgz3f92SizwJ35WaX /zX0bcgK5RnaCUkbRJU2oHqROgKr/oN0h6Epz7QB5KyhMI1JPr10ykvteQS2Yb/oGvvb U6lxuN89Xw4MEQojdP6u7JfVUVXrmfy2ge/HFWf+7TVoZLToWUNiVomuOERcJYVGXKgq ZtLw== X-Gm-Message-State: AOJu0Yz9uYJyI0eA2/IdpfuWemigfpGQGem3kn3hfeIACQdxUDvMmfhf Qr28iKABU8ntlAxsUfkAjNar+1bN5EcBwJRIFRVgCo/WzMkBpG1k22199A== X-Gm-Gg: ASbGncsObQanDyim+xqQ/9vKmGCnQoHz5xm5YtwxLUPOXGXeNFH1riTOicyP6qyzini Wc1yPvk11+MR0E16DKcAUFmPhhMin/9WvU2om/ees2aKmwjE6li998u52kju8ahfJ8YWSB4J0Mw CUUzeZF2Se4Ac1T7tQtk/fMT79A1Set+mhfYWvt3ad6ZYURMk5xbUkkKyuZWgcJIhw0i3+ZWhsW CKmb/XDFs2aVuWd47FPmHpXj28J8/5yFUkKUNGXp+VLJKSnD1IPfEa0YLy1KHErMohI3CVlwQw7 ez0Yp2SQ0WS5DxMnlXeIUISFlVGM6VfN X-Google-Smtp-Source: AGHT+IHt+sIOm5AzBnZJjLKQRCn7gqoUYL6/KAY1Eg2b7FDWJY31wXUbw6+TxDV6kT7bYfIFKMmy3Q== X-Received: by 2002:a05:6a00:8085:b0:725:ae5f:7f06 with SMTP id d2e1a72fcca58-72abe096383mr6459461b3a.23.1734724598359; Fri, 20 Dec 2024 11:56:38 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. [76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:38 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 10/14] bpf: net_sched: Support updating qstats Date: Fri, 20 Dec 2024 11:55:36 -0800 Message-ID: <20241220195619.2022866-11-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Allow bpf qdisc programs to update Qdisc qstats directly with btf struct access. 
Signed-off-by: Amery Hung --- net/sched/bpf_qdisc.c | 53 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 8 deletions(-) diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 39f01daed48a..04ad3676448f 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -33,6 +33,7 @@ bpf_qdisc_get_func_proto(enum bpf_func_id func_id, } } +BTF_ID_LIST_SINGLE(bpf_qdisc_ids, struct, Qdisc) BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) @@ -57,20 +58,37 @@ static bool bpf_qdisc_is_valid_access(int off, int size, return bpf_tracing_btf_ctx_access(off, size, type, prog, info); } -static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, - const struct bpf_reg_state *reg, - int off, int size) +static int bpf_qdisc_qdisc_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) { - const struct btf_type *t, *skbt; size_t end; - skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); - t = btf_type_by_id(reg->btf, reg->btf_id); - if (t != skbt) { - bpf_log(log, "only read is supported\n"); + switch (off) { + case offsetof(struct Qdisc, qstats) ... offsetofend(struct Qdisc, qstats) - 1: + end = offsetofend(struct Qdisc, qstats); + break; + default: + bpf_log(log, "no write support to Qdisc at off %d\n", off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of Qdisc ended at %zu\n", + off, size, end); return -EACCES; } + return 0; +} + +static int bpf_qdisc_sk_buff_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + size_t end; + switch (off) { case offsetof(struct sk_buff, tstamp): end = offsetofend(struct sk_buff, tstamp); @@ -112,6 +130,25 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, return 0; } +static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *t, *skbt, *qdisct; + + skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); + qdisct = btf_type_by_id(reg->btf, bpf_qdisc_ids[0]); + t = btf_type_by_id(reg->btf, reg->btf_id); + + if (t == skbt) + return bpf_qdisc_sk_buff_access(log, reg, off, size); + else if (t == qdisct) + return bpf_qdisc_qdisc_access(log, reg, off, size); + + bpf_log(log, "only read is supported\n"); + return -EACCES; +} + BTF_ID_LIST(bpf_qdisc_init_prologue_ids) BTF_ID(func, bpf_qdisc_init_prologue) From patchwork Fri Dec 20 19:55:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917356 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 241342288D6; Fri, 20 Dec 2024 19:56:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724601; cv=none; b=RydAu6ebdOHvuWxHL7ltofn4aqNqt1EuBnD+eB7Id/pYUITfj9yefL6md0LMVqi0aieUd0EkwAEdpg+rqgwf/OXfXLgxLYWy2MisLP2+eOyGXnXDzu6gR5Hm+OOZYFt2MTqqV9bbLfTGUYknedVlYoyT6Bg0ZdlJKzXOY7IJztc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724601; c=relaxed/simple; 
bh=AuQ3ZCg2NNe30v+Ze3MFue3SVQxUbW9Y1NvlIhdZxK0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=M4jwcrtZYO7KiwsN5g8x0zo0CqHbGFRpESdEaA+A4Dy3bGwP4uudwqK0PUZMKABnSerJzEwLp7aLLnBIueFaFX5IM5NFwix5fgEjQ1nT93mLNAd5C72JWghNPPMc9G0RWpeZpUnt5PGQxFPKdkN9JZeEwiiAUrMEJL/dMb9qUnk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PTHratCg; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PTHratCg" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-725d9f57d90so1805536b3a.1; Fri, 20 Dec 2024 11:56:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734724599; x=1735329399; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7ZLKP2is0OI6w79iMXmmz62S0UgMKX3j5QlN+oXd+zM=; b=PTHratCgf/TxoYxFdoKJA14OazUYaJZLr6acif59UHcybZj9ypRfEDIy77LXWkF7H8 eDaYN+WiLBtCb3DKZNrNvWra15HqwADshGGsxbvuTkxx/Yevylg88AB5+VUnG+IkrJe8 NARdgnoNJNL/MFPhsN7Ng3lhO3D6qK78+h3ZjUZR4wQYaj4+oOa0Q/b1QCpKwdECg60D t/o9lG/Xn57NQiSLivR4vxF5JjW/IrD/KqkZHWcB3uvXJthbfuZVmIh2U9EUYVItfUDA JmbZitZ11tN4HWi4gxv3AozRgXNePu89qY3JiRJbHjyz7RE1gFJh2+lTKcnA+raxYpBl siwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734724599; x=1735329399; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7ZLKP2is0OI6w79iMXmmz62S0UgMKX3j5QlN+oXd+zM=; b=hJCzWnQgvhXIUPOae/6KRhxMFT0pNykgywLsd0poDDdpwBLZyRYgOho1cvqZL2CyyW 9S3LuhWvrMShXVBgHcl1Tr33xBSw6n7CJe4nmsU8MVwH9LcmR53KTzuX69fXoygzD3oB l1zdPaQPN0ZRcQJSV+IanGDYH+3yyeNIul5opb/bBbWZ0l61nUhdzNOPWCTC8QH6aA0Y kaAUDyXwpFzFyYekobXylvlkn8V3gnymM+rRy7T82HCGFf+KQ+yheNCCTnUlQCeDNrJM +xX6mb8zweTq7D2GSagL3XPLxNTa8lMujV6mH8prrTOgO1iSzzYmIdI/rI3hkTbKiNEq uB0w== X-Gm-Message-State: AOJu0YwwPIcWRHxSZtR0uknUu2v4Mjl5W3Ezd0wP2iouE2Y5fP5yB+qI Ih13UgZcxxHNXqOqzWfsOaCzGZbPm0aj3g83yMBR0baqqc+jPE5rmJnCXg== X-Gm-Gg: ASbGncvu8kDKdSYEZO7iUggPf6WaLIGIzkzgnHQwkrx1NEnpew+6uRSdVc8HBstLJEb kN16Ip0dUZ2bwspVdZ/mUj8vAfUp0Krmtt+LnPqgCyUI7DZy7qRR9M7rtBo/da62Qz/Yk+JCVWy CkR0RmfvS8PMfWuCmqSF+X9+FbQWbKg7NpTt5jCIME9DBpXQkqOP+EMegnD8vbFGx9am3n1XhQB s3mXjD3BfKCOaT9ySBef0ut/mqF4f4v+irF6FhN3kFhtwbe7nWdsAjd1Jal3PyVAYpGGddVL56m 4/z9+wT/7GfmsEHR7YQPyNQl9Upha7Qt X-Google-Smtp-Source: AGHT+IGrOECkEeVN4/8ravghWHofV6sc1P57PrfWXiTarVoU3iXl9AP276ivf2YpYODlIFb4JhllpA== X-Received: by 2002:a05:6a20:e68b:b0:1e1:72ce:fefc with SMTP id adf61e73a8af0-1e5e05a9edemr8395718637.22.1734724599215; Fri, 20 Dec 2024 11:56:39 -0800 (PST) Received: from localhost.localdomain (c-76-146-13-146.hsd1.wa.comcast.net. 
[76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:38 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 11/14] bpf: net_sched: Allow writing to more Qdisc members Date: Fri, 20 Dec 2024 11:55:37 -0800 Message-ID: <20241220195619.2022866-12-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung Allow bpf qdisc to write to Qdisc->limit and Qdisc->q.qlen. Signed-off-by: Amery Hung --- net/sched/bpf_qdisc.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 04ad3676448f..925624c47c3e 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -65,6 +65,12 @@ static int bpf_qdisc_qdisc_access(struct bpf_verifier_log *log, size_t end; switch (off) { + case offsetof(struct Qdisc, limit): + end = offsetofend(struct Qdisc, limit); + break; + case offsetof(struct Qdisc, q) + offsetof(struct qdisc_skb_head, qlen): + end = offsetof(struct Qdisc, q) + offsetofend(struct qdisc_skb_head, qlen); + break; case offsetof(struct Qdisc, qstats) ... 
offsetofend(struct Qdisc, qstats) - 1: end = offsetofend(struct Qdisc, qstats); break; From patchwork Fri Dec 20 19:55:38 2024 X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917357 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 12/14] libbpf: Support creating and destroying qdisc Date: Fri, 20 Dec 2024 11:55:38 -0800 Message-ID: <20241220195619.2022866-13-amery.hung@gmail.com> In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> From: Amery Hung Extend struct bpf_tc_hook with handle, qdisc name and a new attach type, BPF_TC_QDISC, to allow users to add or remove any qdisc specified in addition to clsact.
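For illustration only (not part of this patch): a minimal sketch of how a caller could use the extended hook to create and later destroy a qdisc. The ifindex argument, the 0x8000000 handle and the "bpf_fifo" kind are assumptions borrowed from the selftest added later in this series, and the struct_ops map registering that kind must already be attached for the netlink request to succeed.

	#include <bpf/libbpf.h>
	#include <linux/pkt_sched.h>	/* TC_H_ROOT */

	static int add_and_remove_qdisc(int ifindex)
	{
		/* .attach_point = BPF_TC_QDISC plus .handle/.qdisc are the
		 * new fields; the qdisc kind ("bpf_fifo" here) must already
		 * be registered by an attached struct_ops map.
		 */
		DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook,
				    .ifindex = ifindex,
				    .attach_point = BPF_TC_QDISC,
				    .parent = TC_H_ROOT,
				    .handle = 0x8000000,
				    .qdisc = "bpf_fifo");
		int err;

		err = bpf_tc_hook_create(&hook);	/* RTM_NEWQDISC */
		if (err)
			return err;

		/* ... traffic now goes through the bpf qdisc ... */

		return bpf_tc_hook_destroy(&hook);	/* RTM_DELQDISC */
	}
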
Signed-off-by: Amery Hung --- tools/lib/bpf/libbpf.h | 5 ++++- tools/lib/bpf/netlink.c | 20 +++++++++++++++++--- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index d45807103565..062ed3f273a1 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1268,6 +1268,7 @@ enum bpf_tc_attach_point { BPF_TC_INGRESS = 1 << 0, BPF_TC_EGRESS = 1 << 1, BPF_TC_CUSTOM = 1 << 2, + BPF_TC_QDISC = 1 << 3, }; #define BPF_TC_PARENT(a, b) \ @@ -1282,9 +1283,11 @@ struct bpf_tc_hook { int ifindex; enum bpf_tc_attach_point attach_point; __u32 parent; + __u32 handle; + const char *qdisc; size_t :0; }; -#define bpf_tc_hook__last_field parent +#define bpf_tc_hook__last_field qdisc struct bpf_tc_opts { size_t sz; diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 68a2def17175..c997e69d507f 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -529,9 +529,9 @@ int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id) } -typedef int (*qdisc_config_t)(struct libbpf_nla_req *req); +typedef int (*qdisc_config_t)(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook); -static int clsact_config(struct libbpf_nla_req *req) +static int clsact_config(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook) { req->tc.tcm_parent = TC_H_CLSACT; req->tc.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0); @@ -539,6 +539,16 @@ static int clsact_config(struct libbpf_nla_req *req) return nlattr_add(req, TCA_KIND, "clsact", sizeof("clsact")); } +static int qdisc_config(struct libbpf_nla_req *req, const struct bpf_tc_hook *hook) +{ + const char *qdisc = OPTS_GET(hook, qdisc, NULL); + + req->tc.tcm_parent = OPTS_GET(hook, parent, TC_H_ROOT); + req->tc.tcm_handle = OPTS_GET(hook, handle, 0); + + return nlattr_add(req, TCA_KIND, qdisc, strlen(qdisc) + 1); +} + static int attach_point_to_config(struct bpf_tc_hook *hook, qdisc_config_t *config) { @@ -552,6 +562,9 @@ static int attach_point_to_config(struct bpf_tc_hook *hook, return 0; case BPF_TC_CUSTOM: return -EOPNOTSUPP; + case BPF_TC_QDISC: + *config = &qdisc_config; + return 0; default: return -EINVAL; } @@ -596,7 +609,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags) req.tc.tcm_family = AF_UNSPEC; req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0); - ret = config(&req); + ret = config(&req, hook); if (ret < 0) return ret; @@ -639,6 +652,7 @@ int bpf_tc_hook_destroy(struct bpf_tc_hook *hook) case BPF_TC_INGRESS: case BPF_TC_EGRESS: return libbpf_err(__bpf_tc_detach(hook, NULL, true)); + case BPF_TC_QDISC: case BPF_TC_INGRESS | BPF_TC_EGRESS: return libbpf_err(tc_qdisc_delete(hook)); case BPF_TC_CUSTOM: From patchwork Fri Dec 20 19:55:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917358 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f176.google.com (mail-pf1-f176.google.com [209.85.210.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0505F229128; Fri, 20 Dec 2024 19:56:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724603; cv=none; 
[76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:40 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 13/14] selftests: Add a basic fifo qdisc test Date: Fri, 20 Dec 2024 11:55:39 -0800 Message-ID: <20241220195619.2022866-14-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung This selftest shows a bare minimum fifo qdisc, which simply enqueues skbs into the back of a bpf list and dequeues from the front of the list. Signed-off-by: Amery Hung --- tools/testing/selftests/bpf/config | 1 + .../selftests/bpf/prog_tests/bpf_qdisc.c | 161 ++++++++++++++++++ .../selftests/bpf/progs/bpf_qdisc_common.h | 27 +++ .../selftests/bpf/progs/bpf_qdisc_fifo.c | 117 +++++++++++++ 4 files changed, 306 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index c378d5d07e02..6b0cab55bd2d 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -71,6 +71,7 @@ CONFIG_NET_IPGRE=y CONFIG_NET_IPGRE_DEMUX=y CONFIG_NET_IPIP=y CONFIG_NET_MPLS_GSO=y +CONFIG_NET_SCH_BPF=y CONFIG_NET_SCH_FQ=y CONFIG_NET_SCH_INGRESS=y CONFIG_NET_SCHED=y diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c new file mode 100644 index 000000000000..295d0216e70f --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c @@ -0,0 +1,161 @@ +#include +#include +#include + +#include "network_helpers.h" +#include "bpf_qdisc_fifo.skel.h" + +#ifndef ENOTSUPP +#define ENOTSUPP 524 +#endif + +#define LO_IFINDEX 1 + +static const unsigned int total_bytes = 10 * 1024 * 1024; +static int stop; + +static void *server(void *arg) +{ + int lfd = (int)(long)arg, err = 0, fd; + ssize_t nr_sent = 0, bytes = 0; + char batch[1500]; + + fd = accept(lfd, NULL, NULL); + while (fd == -1) { + if (errno == EINTR) + continue; + err = -errno; + goto done; + } + + if (settimeo(fd, 0)) { + err = -errno; + goto done; + } + + while (bytes < total_bytes && !READ_ONCE(stop)) { + nr_sent = send(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_sent == -1 && errno == EINTR) + continue; + if (nr_sent == -1) { + err = -errno; + break; + } + bytes += nr_sent; + } + + ASSERT_EQ(bytes, total_bytes, "send"); + +done: + if (fd >= 0) + close(fd); + if (err) { + WRITE_ONCE(stop, 1); + return ERR_PTR(err); + } + return NULL; +} + +static void do_test(char *qdisc) +{ + DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX, + .attach_point = 
BPF_TC_QDISC, + .parent = TC_H_ROOT, + .handle = 0x8000000, + .qdisc = qdisc); + struct sockaddr_in6 sa6 = {}; + ssize_t nr_recv = 0, bytes = 0; + int lfd = -1, fd = -1; + pthread_t srv_thread; + socklen_t addrlen = sizeof(sa6); + void *thread_ret; + char batch[1500]; + int err; + + WRITE_ONCE(stop, 0); + + err = bpf_tc_hook_create(&hook); + if (!ASSERT_OK(err, "attach qdisc")) + return; + + lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0); + if (!ASSERT_NEQ(lfd, -1, "socket")) { + bpf_tc_hook_destroy(&hook); + return; + } + + fd = socket(AF_INET6, SOCK_STREAM, 0); + if (!ASSERT_NEQ(fd, -1, "socket")) { + bpf_tc_hook_destroy(&hook); + close(lfd); + return; + } + + if (settimeo(lfd, 0) || settimeo(fd, 0)) + goto done; + + err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen); + if (!ASSERT_NEQ(err, -1, "getsockname")) + goto done; + + /* connect to server */ + err = connect(fd, (struct sockaddr *)&sa6, addrlen); + if (!ASSERT_NEQ(err, -1, "connect")) + goto done; + + err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd); + if (!ASSERT_OK(err, "pthread_create")) + goto done; + + /* recv total_bytes */ + while (bytes < total_bytes && !READ_ONCE(stop)) { + nr_recv = recv(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_recv == -1 && errno == EINTR) + continue; + if (nr_recv == -1) + break; + bytes += nr_recv; + } + + ASSERT_EQ(bytes, total_bytes, "recv"); + + WRITE_ONCE(stop, 1); + pthread_join(srv_thread, &thread_ret); + ASSERT_OK(IS_ERR(thread_ret), "thread_ret"); + +done: + close(lfd); + close(fd); + + bpf_tc_hook_destroy(&hook); + return; +} + +static void test_fifo(void) +{ + struct bpf_qdisc_fifo *fifo_skel; + struct bpf_link *link; + + fifo_skel = bpf_qdisc_fifo__open_and_load(); + if (!ASSERT_OK_PTR(fifo_skel, "bpf_qdisc_fifo__open_and_load")) + return; + + link = bpf_map__attach_struct_ops(fifo_skel->maps.fifo); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_qdisc_fifo__destroy(fifo_skel); + return; + } + + do_test("bpf_fifo"); + + bpf_link__destroy(link); + bpf_qdisc_fifo__destroy(fifo_skel); +} + +void test_bpf_qdisc(void) +{ + if (test__start_subtest("fifo")) + test_fifo(); +} diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h new file mode 100644 index 000000000000..62a778f94908 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h @@ -0,0 +1,27 @@ +#ifndef _BPF_QDISC_COMMON_H +#define _BPF_QDISC_COMMON_H + +#define NET_XMIT_SUCCESS 0x00 +#define NET_XMIT_DROP 0x01 /* skb dropped */ +#define NET_XMIT_CN 0x02 /* congestion notification */ + +#define TC_PRIO_CONTROL 7 +#define TC_PRIO_MAX 15 + +u32 bpf_skb_get_hash(struct sk_buff *p) __ksym; +void bpf_kfree_skb(struct sk_buff *p) __ksym; +void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; +void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) __ksym; +void bpf_qdisc_bstats_update(struct Qdisc *sch, const struct sk_buff *skb) __ksym; + +static struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb) +{ + return (struct qdisc_skb_cb *)skb->cb; +} + +static inline unsigned int qdisc_pkt_len(const struct sk_buff *skb) +{ + return qdisc_skb_cb(skb)->pkt_len; +} + +#endif diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c new file mode 100644 index 000000000000..705e7da325da --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c @@ -0,0 +1,117 @@ 
+#include +#include "bpf_experimental.h" +#include "bpf_qdisc_common.h" + +char _license[] SEC("license") = "GPL"; + +struct skb_node { + struct sk_buff __kptr * skb; + struct bpf_list_node node; +}; + +#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) + +private(A) struct bpf_spin_lock q_fifo_lock; +private(A) struct bpf_list_head q_fifo __contains(skb_node, node); + +SEC("struct_ops/bpf_fifo_enqueue") +int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch, + struct bpf_sk_buff_ptr *to_free) +{ + struct skb_node *skbn; + u32 pkt_len; + + if (sch->q.qlen == sch->limit) + goto drop; + + skbn = bpf_obj_new(typeof(*skbn)); + if (!skbn) + goto drop; + + pkt_len = qdisc_pkt_len(skb); + + sch->q.qlen++; + skb = bpf_kptr_xchg(&skbn->skb, skb); + if (skb) + bpf_qdisc_skb_drop(skb, to_free); + + bpf_spin_lock(&q_fifo_lock); + bpf_list_push_back(&q_fifo, &skbn->node); + bpf_spin_unlock(&q_fifo_lock); + + sch->qstats.backlog += pkt_len; + return NET_XMIT_SUCCESS; +drop: + bpf_qdisc_skb_drop(skb, to_free); + return NET_XMIT_DROP; +} + +SEC("struct_ops/bpf_fifo_dequeue") +struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch) +{ + struct bpf_list_node *node; + struct sk_buff *skb = NULL; + struct skb_node *skbn; + + bpf_spin_lock(&q_fifo_lock); + node = bpf_list_pop_front(&q_fifo); + bpf_spin_unlock(&q_fifo_lock); + if (!node) + return NULL; + + skbn = container_of(node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + bpf_obj_drop(skbn); + if (!skb) + return NULL; + + sch->qstats.backlog -= qdisc_pkt_len(skb); + bpf_qdisc_bstats_update(sch, skb); + sch->q.qlen--; + + return skb; +} + +SEC("struct_ops/bpf_fifo_init") +int BPF_PROG(bpf_fifo_init, struct Qdisc *sch, struct nlattr *opt, + struct netlink_ext_ack *extack) +{ + sch->limit = 1000; + return 0; +} + +SEC("struct_ops/bpf_fifo_reset") +void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch) +{ + struct bpf_list_node *node; + struct skb_node *skbn; + int i; + + bpf_for(i, 0, sch->q.qlen) { + struct sk_buff *skb = NULL; + + bpf_spin_lock(&q_fifo_lock); + node = bpf_list_pop_front(&q_fifo); + bpf_spin_unlock(&q_fifo_lock); + + if (!node) + break; + + skbn = container_of(node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + if (skb) + bpf_kfree_skb(skb); + bpf_obj_drop(skbn); + } + sch->q.qlen = 0; +} + +SEC(".struct_ops") +struct Qdisc_ops fifo = { + .enqueue = (void *)bpf_fifo_enqueue, + .dequeue = (void *)bpf_fifo_dequeue, + .init = (void *)bpf_fifo_init, + .reset = (void *)bpf_fifo_reset, + .id = "bpf_fifo", +}; + From patchwork Fri Dec 20 19:55:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13917359 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 173BB229148; Fri, 20 Dec 2024 19:56:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734724605; cv=none; b=tZ5ve5Lnc+jZEOYNb2J+yOCFiiaYfqXtF9ODmk2QP+oVUo4Qe/Yv0glOOG3Iv0CaRo7s+dYu39b7eH4kvtN1t/fDNPU9CYeN98Ecm1kVQciaywSEAT3juMLMOkL8S02QCr5kANCcfI7EK7g34j8fq3XREsaqRNmYKLw8lwZgi/w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
[76.146.13.146]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-842b17273dasm3240342a12.19.2024.12.20.11.56.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 11:56:41 -0800 (PST) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org, alexei.starovoitov@gmail.com, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com, amery.hung@bytedance.com Subject: [PATCH bpf-next v2 14/14] selftests: Add a bpf fq qdisc to selftest Date: Fri, 20 Dec 2024 11:55:40 -0800 Message-ID: <20241220195619.2022866-15-amery.hung@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com> References: <20241220195619.2022866-1-amery.hung@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Amery Hung This test implements a more sophisticated qdisc using bpf. The bpf fair- queueing (fq) qdisc gives each flow an equal chance to transmit data. It also respects the timestamp of skb for rate limiting. Signed-off-by: Amery Hung --- .../selftests/bpf/prog_tests/bpf_qdisc.c | 24 + .../selftests/bpf/progs/bpf_qdisc_fq.c | 726 ++++++++++++++++++ 2 files changed, 750 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c index 295d0216e70f..394bf5a4adae 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c @@ -4,6 +4,7 @@ #include "network_helpers.h" #include "bpf_qdisc_fifo.skel.h" +#include "bpf_qdisc_fq.skel.h" #ifndef ENOTSUPP #define ENOTSUPP 524 @@ -154,8 +155,31 @@ static void test_fifo(void) bpf_qdisc_fifo__destroy(fifo_skel); } +static void test_fq(void) +{ + struct bpf_qdisc_fq *fq_skel; + struct bpf_link *link; + + fq_skel = bpf_qdisc_fq__open_and_load(); + if (!ASSERT_OK_PTR(fq_skel, "bpf_qdisc_fq__open_and_load")) + return; + + link = bpf_map__attach_struct_ops(fq_skel->maps.fq); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_qdisc_fq__destroy(fq_skel); + return; + } + + do_test("bpf_fq"); + + bpf_link__destroy(link); + bpf_qdisc_fq__destroy(fq_skel); +} + void test_bpf_qdisc(void) { if (test__start_subtest("fifo")) test_fifo(); + if (test__start_subtest("fq")) + test_fq(); } diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c new file mode 100644 index 000000000000..2af2e39f9ed7 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c @@ -0,0 +1,726 @@ +#include +#include +#include +#include "bpf_experimental.h" +#include "bpf_qdisc_common.h" + +char _license[] SEC("license") = "GPL"; + +#define NSEC_PER_USEC 1000L +#define NSEC_PER_SEC 1000000000L + +#define NUM_QUEUE (1 << 20) + +struct fq_bpf_data { + u32 quantum; + u32 initial_quantum; + u32 flow_refill_delay; + u32 flow_plimit; + u64 horizon; + u32 orphan_mask; + u32 timer_slack; + u64 time_next_delayed_flow; + u64 unthrottle_latency_ns; + u8 horizon_drop; + u32 new_flow_cnt; + u32 old_flow_cnt; + u64 ktime_cache; +}; + +enum { + CLS_RET_PRIO = 0, + CLS_RET_NONPRIO = 1, + CLS_RET_ERR = 2, +}; + 
+struct skb_node { + u64 tstamp; + struct sk_buff __kptr * skb; + struct bpf_rb_node node; +}; + +struct fq_flow_node { + int credit; + u32 qlen; + u64 age; + u64 time_next_packet; + struct bpf_list_node list_node; + struct bpf_rb_node rb_node; + struct bpf_rb_root queue __contains(skb_node, node); + struct bpf_spin_lock lock; + struct bpf_refcount refcount; +}; + +struct dequeue_nonprio_ctx { + bool stop_iter; + u64 expire; + u64 now; +}; + +struct remove_flows_ctx { + bool gc_only; + u32 reset_cnt; + u32 reset_max; +}; + +struct unset_throttled_flows_ctx { + bool unset_all; + u64 now; +}; + +struct fq_stashed_flow { + struct fq_flow_node __kptr * flow; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, __u64); + __type(value, struct fq_stashed_flow); + __uint(max_entries, NUM_QUEUE); +} fq_nonprio_flows SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, __u64); + __type(value, struct fq_stashed_flow); + __uint(max_entries, 1); +} fq_prio_flows SEC(".maps"); + +#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) + +private(A) struct bpf_spin_lock fq_delayed_lock; +private(A) struct bpf_rb_root fq_delayed __contains(fq_flow_node, rb_node); + +private(B) struct bpf_spin_lock fq_new_flows_lock; +private(B) struct bpf_list_head fq_new_flows __contains(fq_flow_node, list_node); + +private(C) struct bpf_spin_lock fq_old_flows_lock; +private(C) struct bpf_list_head fq_old_flows __contains(fq_flow_node, list_node); + +private(D) struct fq_bpf_data q; + +/* Wrapper for bpf_kptr_xchg that expects NULL dst */ +static void bpf_kptr_xchg_back(void *map_val, void *ptr) +{ + void *ret; + + ret = bpf_kptr_xchg(map_val, ptr); + if (ret) + bpf_obj_drop(ret); +} + +static bool skbn_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct skb_node *skbn_a; + struct skb_node *skbn_b; + + skbn_a = container_of(a, struct skb_node, node); + skbn_b = container_of(b, struct skb_node, node); + + return skbn_a->tstamp < skbn_b->tstamp; +} + +static bool fn_time_next_packet_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct fq_flow_node *flow_a; + struct fq_flow_node *flow_b; + + flow_a = container_of(a, struct fq_flow_node, rb_node); + flow_b = container_of(b, struct fq_flow_node, rb_node); + + return flow_a->time_next_packet < flow_b->time_next_packet; +} + +static void +fq_flows_add_head(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow, u32 *flow_cnt) +{ + bpf_spin_lock(lock); + bpf_list_push_front(head, &flow->list_node); + bpf_spin_unlock(lock); + *flow_cnt += 1; +} + +static void +fq_flows_add_tail(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow, u32 *flow_cnt) +{ + bpf_spin_lock(lock); + bpf_list_push_back(head, &flow->list_node); + bpf_spin_unlock(lock); + *flow_cnt += 1; +} + +static void +fq_flows_remove_front(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct bpf_list_node **node, u32 *flow_cnt) +{ + bpf_spin_lock(lock); + *node = bpf_list_pop_front(head); + bpf_spin_unlock(lock); + *flow_cnt -= 1; +} + +static bool +fq_flows_is_empty(struct bpf_list_head *head, struct bpf_spin_lock *lock) +{ + struct bpf_list_node *node; + + bpf_spin_lock(lock); + node = bpf_list_pop_front(head); + if (node) { + bpf_list_push_front(head, node); + bpf_spin_unlock(lock); + return false; + } + bpf_spin_unlock(lock); + + return true; +} + +/* flow->age is used to denote the state of the flow (not-detached, detached, throttled) + * as well as 
the timestamp when the flow is detached. + * + * 0: not-detached + * 1 - (~0ULL-1): detached + * ~0ULL: throttled + */ +static void fq_flow_set_detached(struct fq_flow_node *flow) +{ + flow->age = bpf_jiffies64(); +} + +static bool fq_flow_is_detached(struct fq_flow_node *flow) +{ + return flow->age != 0 && flow->age != ~0ULL; +} + +static bool sk_listener(struct sock *sk) +{ + return (1 << sk->__sk_common.skc_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV); +} + +static void fq_gc(void); + +static int fq_new_flow(void *flow_map, struct fq_stashed_flow **sflow, u64 hash) +{ + struct fq_stashed_flow tmp = {}; + struct fq_flow_node *flow; + int ret; + + flow = bpf_obj_new(typeof(*flow)); + if (!flow) + return -ENOMEM; + + flow->credit = q.initial_quantum, + flow->qlen = 0, + flow->age = 1, + flow->time_next_packet = 0, + + ret = bpf_map_update_elem(flow_map, &hash, &tmp, 0); + if (ret == -ENOMEM) { + fq_gc(); + bpf_map_update_elem(&fq_nonprio_flows, &hash, &tmp, 0); + } + + *sflow = bpf_map_lookup_elem(flow_map, &hash); + if (!*sflow) { + bpf_obj_drop(flow); + return -ENOMEM; + } + + bpf_kptr_xchg_back(&(*sflow)->flow, flow); + return 0; +} + +static int +fq_classify(struct sk_buff *skb, struct fq_stashed_flow **sflow) +{ + struct sock *sk = skb->sk; + int ret = CLS_RET_NONPRIO; + u64 hash = 0; + + if ((skb->priority & TC_PRIO_MAX) == TC_PRIO_CONTROL) { + *sflow = bpf_map_lookup_elem(&fq_prio_flows, &hash); + ret = CLS_RET_PRIO; + } else { + if (!sk || sk_listener(sk)) { + hash = bpf_skb_get_hash(skb) & q.orphan_mask; + /* Avoid collision with an existing flow hash, which + * only uses the lower 32 bits of hash, by setting the + * upper half of hash to 1. + */ + hash |= (1ULL << 32); + } else if (sk->__sk_common.skc_state == TCP_CLOSE) { + hash = bpf_skb_get_hash(skb) & q.orphan_mask; + hash |= (1ULL << 32); + } else { + hash = sk->__sk_common.skc_hash; + } + *sflow = bpf_map_lookup_elem(&fq_nonprio_flows, &hash); + } + + if (!*sflow) + ret = fq_new_flow(&fq_nonprio_flows, sflow, hash) < 0 ? 
+ CLS_RET_ERR : CLS_RET_NONPRIO; + + return ret; +} + +static bool fq_packet_beyond_horizon(struct sk_buff *skb) +{ + return (s64)skb->tstamp > (s64)(q.ktime_cache + q.horizon); +} + +SEC("struct_ops/bpf_fq_enqueue") +int BPF_PROG(bpf_fq_enqueue, struct sk_buff *skb, struct Qdisc *sch, + struct bpf_sk_buff_ptr *to_free) +{ + struct fq_flow_node *flow = NULL, *flow_copy; + struct fq_stashed_flow *sflow; + u64 time_to_send, jiffies; + struct skb_node *skbn; + int ret; + + if (sch->q.qlen >= sch->limit) + goto drop; + + if (!skb->tstamp) { + time_to_send = q.ktime_cache = bpf_ktime_get_ns(); + } else { + if (fq_packet_beyond_horizon(skb)) { + q.ktime_cache = bpf_ktime_get_ns(); + if (fq_packet_beyond_horizon(skb)) { + if (q.horizon_drop) + goto drop; + + skb->tstamp = q.ktime_cache + q.horizon; + } + } + time_to_send = skb->tstamp; + } + + ret = fq_classify(skb, &sflow); + if (ret == CLS_RET_ERR) + goto drop; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (!flow) + goto drop; + + if (ret == CLS_RET_NONPRIO) { + if (flow->qlen >= q.flow_plimit) { + bpf_kptr_xchg_back(&sflow->flow, flow); + goto drop; + } + + if (fq_flow_is_detached(flow)) { + flow_copy = bpf_refcount_acquire(flow); + + jiffies = bpf_jiffies64(); + if ((s64)(jiffies - (flow_copy->age + q.flow_refill_delay)) > 0) { + if (flow_copy->credit < q.quantum) + flow_copy->credit = q.quantum; + } + flow_copy->age = 0; + fq_flows_add_tail(&fq_new_flows, &fq_new_flows_lock, flow_copy, + &q.new_flow_cnt); + } + } + + skbn = bpf_obj_new(typeof(*skbn)); + if (!skbn) { + bpf_kptr_xchg_back(&sflow->flow, flow); + goto drop; + } + + skbn->tstamp = skb->tstamp = time_to_send; + + sch->qstats.backlog += qdisc_pkt_len(skb); + + skb = bpf_kptr_xchg(&skbn->skb, skb); + if (skb) + bpf_qdisc_skb_drop(skb, to_free); + + bpf_spin_lock(&flow->lock); + bpf_rbtree_add(&flow->queue, &skbn->node, skbn_tstamp_less); + bpf_spin_unlock(&flow->lock); + + flow->qlen++; + bpf_kptr_xchg_back(&sflow->flow, flow); + + sch->q.qlen++; + return NET_XMIT_SUCCESS; + +drop: + bpf_qdisc_skb_drop(skb, to_free); + sch->qstats.drops++; + return NET_XMIT_DROP; +} + +static int fq_unset_throttled_flows(u32 index, struct unset_throttled_flows_ctx *ctx) +{ + struct bpf_rb_node *node = NULL; + struct fq_flow_node *flow; + + bpf_spin_lock(&fq_delayed_lock); + + node = bpf_rbtree_first(&fq_delayed); + if (!node) { + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + flow = container_of(node, struct fq_flow_node, rb_node); + if (!ctx->unset_all && flow->time_next_packet > ctx->now) { + q.time_next_delayed_flow = flow->time_next_packet; + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + node = bpf_rbtree_remove(&fq_delayed, &flow->rb_node); + + bpf_spin_unlock(&fq_delayed_lock); + + if (!node) + return 1; + + flow = container_of(node, struct fq_flow_node, rb_node); + flow->age = 0; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt); + + return 0; +} + +static void fq_flow_set_throttled(struct fq_flow_node *flow) +{ + flow->age = ~0ULL; + + if (q.time_next_delayed_flow > flow->time_next_packet) + q.time_next_delayed_flow = flow->time_next_packet; + + bpf_spin_lock(&fq_delayed_lock); + bpf_rbtree_add(&fq_delayed, &flow->rb_node, fn_time_next_packet_less); + bpf_spin_unlock(&fq_delayed_lock); +} + +static void fq_check_throttled(u64 now) +{ + struct unset_throttled_flows_ctx ctx = { + .unset_all = false, + .now = now, + }; + unsigned long sample; + + if (q.time_next_delayed_flow > now) + return; + + sample = (unsigned long)(now - 
q.time_next_delayed_flow); + q.unthrottle_latency_ns -= q.unthrottle_latency_ns >> 3; + q.unthrottle_latency_ns += sample >> 3; + + q.time_next_delayed_flow = ~0ULL; + bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &ctx, 0); +} + +static struct sk_buff* +fq_dequeue_nonprio_flows(u32 index, struct dequeue_nonprio_ctx *ctx) +{ + u64 time_next_packet, time_to_send; + struct bpf_rb_node *rb_node; + struct sk_buff *skb = NULL; + struct bpf_list_head *head; + struct bpf_list_node *node; + struct bpf_spin_lock *lock; + struct fq_flow_node *flow; + struct skb_node *skbn; + bool is_empty; + u32 *cnt; + + if (q.new_flow_cnt) { + head = &fq_new_flows; + lock = &fq_new_flows_lock; + cnt = &q.new_flow_cnt; + } else if (q.old_flow_cnt) { + head = &fq_old_flows; + lock = &fq_old_flows_lock; + cnt = &q.old_flow_cnt; + } else { + if (q.time_next_delayed_flow != ~0ULL) + ctx->expire = q.time_next_delayed_flow; + goto break_loop; + } + + fq_flows_remove_front(head, lock, &node, cnt); + if (!node) + goto break_loop; + + flow = container_of(node, struct fq_flow_node, list_node); + if (flow->credit <= 0) { + flow->credit += q.quantum; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt); + return NULL; + } + + bpf_spin_lock(&flow->lock); + rb_node = bpf_rbtree_first(&flow->queue); + if (!rb_node) { + bpf_spin_unlock(&flow->lock); + is_empty = fq_flows_is_empty(&fq_old_flows, &fq_old_flows_lock); + if (head == &fq_new_flows && !is_empty) { + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow, &q.old_flow_cnt); + } else { + fq_flow_set_detached(flow); + bpf_obj_drop(flow); + } + return NULL; + } + + skbn = container_of(rb_node, struct skb_node, node); + time_to_send = skbn->tstamp; + + time_next_packet = (time_to_send > flow->time_next_packet) ? 
+ time_to_send : flow->time_next_packet; + if (ctx->now < time_next_packet) { + bpf_spin_unlock(&flow->lock); + flow->time_next_packet = time_next_packet; + fq_flow_set_throttled(flow); + return NULL; + } + + rb_node = bpf_rbtree_remove(&flow->queue, rb_node); + bpf_spin_unlock(&flow->lock); + + if (!rb_node) + goto add_flow_and_break; + + skbn = container_of(rb_node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + bpf_obj_drop(skbn); + + if (!skb) + goto add_flow_and_break; + + flow->credit -= qdisc_skb_cb(skb)->pkt_len; + flow->qlen--; + +add_flow_and_break: + fq_flows_add_head(head, lock, flow, cnt); + +break_loop: + ctx->stop_iter = true; + return skb; +} + +static struct sk_buff *fq_dequeue_prio(void) +{ + struct fq_flow_node *flow = NULL; + struct fq_stashed_flow *sflow; + struct bpf_rb_node *rb_node; + struct sk_buff *skb = NULL; + struct skb_node *skbn; + u64 hash = 0; + + sflow = bpf_map_lookup_elem(&fq_prio_flows, &hash); + if (!sflow) + return NULL; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (!flow) + return NULL; + + bpf_spin_lock(&flow->lock); + rb_node = bpf_rbtree_first(&flow->queue); + if (!rb_node) { + bpf_spin_unlock(&flow->lock); + goto out; + } + + skbn = container_of(rb_node, struct skb_node, node); + rb_node = bpf_rbtree_remove(&flow->queue, &skbn->node); + bpf_spin_unlock(&flow->lock); + + if (!rb_node) + goto out; + + skbn = container_of(rb_node, struct skb_node, node); + skb = bpf_kptr_xchg(&skbn->skb, skb); + bpf_obj_drop(skbn); + +out: + bpf_kptr_xchg_back(&sflow->flow, flow); + + return skb; +} + +SEC("struct_ops/bpf_fq_dequeue") +struct sk_buff *BPF_PROG(bpf_fq_dequeue, struct Qdisc *sch) +{ + struct dequeue_nonprio_ctx cb_ctx = {}; + struct sk_buff *skb = NULL; + int i; + + if (!sch->q.qlen) + goto out; + + skb = fq_dequeue_prio(); + if (skb) + goto dequeue; + + q.ktime_cache = cb_ctx.now = bpf_ktime_get_ns(); + fq_check_throttled(q.ktime_cache); + bpf_for(i, 0, sch->limit) { + skb = fq_dequeue_nonprio_flows(i, &cb_ctx); + if (cb_ctx.stop_iter) + break; + }; + + if (skb) { +dequeue: + sch->q.qlen--; + sch->qstats.backlog -= qdisc_pkt_len(skb); + bpf_qdisc_bstats_update(sch, skb); + return skb; + } + + if (cb_ctx.expire) + bpf_qdisc_watchdog_schedule(sch, cb_ctx.expire, q.timer_slack); +out: + return NULL; +} + +static int fq_remove_flows_in_list(u32 index, void *ctx) +{ + struct bpf_list_node *node; + struct fq_flow_node *flow; + + bpf_spin_lock(&fq_new_flows_lock); + node = bpf_list_pop_front(&fq_new_flows); + bpf_spin_unlock(&fq_new_flows_lock); + if (!node) { + bpf_spin_lock(&fq_old_flows_lock); + node = bpf_list_pop_front(&fq_old_flows); + bpf_spin_unlock(&fq_old_flows_lock); + if (!node) + return 1; + } + + flow = container_of(node, struct fq_flow_node, list_node); + bpf_obj_drop(flow); + + return 0; +} + +extern unsigned CONFIG_HZ __kconfig; + +/* limit number of collected flows per round */ +#define FQ_GC_MAX 8 +#define FQ_GC_AGE (3*CONFIG_HZ) + +static bool fq_gc_candidate(struct fq_flow_node *flow) +{ + u64 jiffies = bpf_jiffies64(); + + return fq_flow_is_detached(flow) && + ((s64)(jiffies - (flow->age + FQ_GC_AGE)) > 0); +} + +static int +fq_remove_flows(struct bpf_map *flow_map, u64 *hash, + struct fq_stashed_flow *sflow, struct remove_flows_ctx *ctx) +{ + struct fq_flow_node *flow = NULL; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (flow) { + if (!ctx->gc_only || fq_gc_candidate(flow)) { + bpf_obj_drop(flow); + ctx->reset_cnt++; + } else { + bpf_kptr_xchg_back(&sflow->flow, flow); + } + } + + return ctx->reset_cnt < 
ctx->reset_max ? 0 : 1; +} + +static void fq_gc(void) +{ + struct remove_flows_ctx cb_ctx = { + .gc_only = true, + .reset_cnt = 0, + .reset_max = FQ_GC_MAX, + }; + + bpf_for_each_map_elem(&fq_nonprio_flows, fq_remove_flows, &cb_ctx, 0); +} + +SEC("struct_ops/bpf_fq_reset") +void BPF_PROG(bpf_fq_reset, struct Qdisc *sch) +{ + struct unset_throttled_flows_ctx utf_ctx = { + .unset_all = true, + }; + struct remove_flows_ctx rf_ctx = { + .gc_only = false, + .reset_cnt = 0, + .reset_max = NUM_QUEUE, + }; + struct fq_stashed_flow *sflow; + u64 hash = 0; + + sch->q.qlen = 0; + sch->qstats.backlog = 0; + + bpf_for_each_map_elem(&fq_nonprio_flows, fq_remove_flows, &rf_ctx, 0); + + rf_ctx.reset_cnt = 0; + bpf_for_each_map_elem(&fq_prio_flows, fq_remove_flows, &rf_ctx, 0); + fq_new_flow(&fq_prio_flows, &sflow, hash); + + bpf_loop(NUM_QUEUE, fq_remove_flows_in_list, NULL, 0); + q.new_flow_cnt = 0; + q.old_flow_cnt = 0; + + bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &utf_ctx, 0); + + return; +} + +SEC("struct_ops/bpf_fq_init") +int BPF_PROG(bpf_fq_init, struct Qdisc *sch, struct nlattr *opt, + struct netlink_ext_ack *extack) +{ + struct net_device *dev = sch->dev_queue->dev; + u32 psched_mtu = dev->mtu + dev->hard_header_len; + struct fq_stashed_flow *sflow; + u64 hash = 0; + + if (fq_new_flow(&fq_prio_flows, &sflow, hash) < 0) + return -ENOMEM; + + sch->limit = 10000; + q.initial_quantum = 10 * psched_mtu; + q.quantum = 2 * psched_mtu; + q.flow_refill_delay = 40; + q.flow_plimit = 100; + q.horizon = 10ULL * NSEC_PER_SEC; + q.horizon_drop = 1; + q.orphan_mask = 1024 - 1; + q.timer_slack = 10 * NSEC_PER_USEC; + q.time_next_delayed_flow = ~0ULL; + q.unthrottle_latency_ns = 0ULL; + q.new_flow_cnt = 0; + q.old_flow_cnt = 0; + + return 0; +} + +SEC(".struct_ops") +struct Qdisc_ops fq = { + .enqueue = (void *)bpf_fq_enqueue, + .dequeue = (void *)bpf_fq_dequeue, + .reset = (void *)bpf_fq_reset, + .init = (void *)bpf_fq_init, + .id = "bpf_fq", +};
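A worked example of the defaults chosen in bpf_fq_init above, assuming an Ethernet device with a 1500-byte MTU and a 14-byte hard header: psched_mtu = 1500 + 14 = 1514, so initial_quantum = 10 * 1514 = 15140 bytes and quantum = 2 * 1514 = 3028 bytes. A new flow can therefore send roughly ten full-size packets on its initial credit before it is moved to the old-flows list, after which each refill is worth about two full-size packets per dequeue round, which is how competing flows end up with roughly equal shares of the link.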