From patchwork Fri May 10 19:23:53 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661875
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net,
 andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 01/20] bpf: Support passing referenced kptr to
 struct_ops programs
Date: Fri, 10 May 2024 19:23:53 +0000
Message-Id: <20240510192412.3297104-2-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This patch supports struct_ops programs that acquire referenced kptrs
through arguments.

In Qdisc_ops, an skb is passed to ".enqueue" in the first argument. The
qdisc becomes the sole owner of the skb and must enqueue or drop the
skb. This matches the referenced kptr semantic in bpf. However, the
existing practice of acquiring a referenced kptr via a kfunc with
KF_ACQUIRE does not play well in this case. Calling kfuncs repeatedly
allows the user to acquire multiple references, while there should be
only one reference to a unique skb in a qdisc.

The solution is to make a struct_ops program automatically acquire a
referenced kptr through a tagged argument in the stub function. When
tagged with "__ref_acquired" (suggestion for a better name?), a
referenced kptr (ref_obj_id > 0) will be acquired automatically when
entering the program. In addition, only the first read of the argument
is allowed, and it will yield a referenced kptr.
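
To make the interface concrete, below is a minimal sketch (adapted from
the selftest added in the next patch; the usual selftest includes and
license boilerplate are omitted):

  /* Kernel-side stub: the "__ref_acquired" suffix on the argument name
   * tells the verifier to treat this argument as a referenced kptr.
   */
  static int bpf_testmod_ops__test_ref_acquire(int dummy,
  		struct task_struct *task__ref_acquired)
  {
  	return 0;
  }

  /* BPF-side implementation: "task" enters the program with
   * ref_obj_id > 0, so the program must release the reference exactly
   * once, e.g. via the bpf_task_release() kfunc. A second read of the
   * context argument would be rejected.
   */
  SEC("struct_ops/test_ref_acquire")
  int BPF_PROG(test_ref_acquire, int dummy, struct task_struct *task)
  {
  	bpf_task_release(task);
  	return 0;
  }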

Signed-off-by: Amery Hung
---
 include/linux/bpf.h         |  3 +++
 kernel/bpf/bpf_struct_ops.c | 17 +++++++++++++----
 kernel/bpf/btf.c            | 10 +++++++++-
 kernel/bpf/verifier.c       | 16 +++++++++++++---
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9c6a7b8ff963..6aabca1581fe 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -914,6 +914,7 @@ struct bpf_insn_access_aux {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			u32 ref_obj_id;
 		};
 	};
 	struct bpf_verifier_log *log; /* for verbose logs */
@@ -1416,6 +1417,8 @@ struct bpf_ctx_arg_aux {
 	enum bpf_reg_type reg_type;
 	struct btf *btf;
 	u32 btf_id;
+	u32 ref_obj_id;
+	bool ref_acquired;
 };
 
 struct btf_mod_pair {
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 86c7884abaf8..bca8e5936846 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -143,6 +143,7 @@ void bpf_struct_ops_image_free(void *image)
 }
 
 #define MAYBE_NULL_SUFFIX "__nullable"
+#define REF_ACQUIRED_SUFFIX "__ref_acquired"
 #define MAX_STUB_NAME 128
 
 /* Return the type info of a stub function, if it exists. */
@@ -204,6 +205,7 @@ static int prepare_arg_info(struct btf *btf,
			    struct bpf_struct_ops_arg_info *arg_info)
 {
 	const struct btf_type *stub_func_proto, *pointed_type;
+	bool is_nullable = false, is_ref_acquired = false;
 	const struct btf_param *stub_args, *args;
 	struct bpf_ctx_arg_aux *info, *info_buf;
 	u32 nargs, arg_no, info_cnt = 0;
@@ -240,8 +242,11 @@ static int prepare_arg_info(struct btf *btf,
 		/* Skip arguments that is not suffixed with
 		 * "__nullable".
 		 */
-		if (!btf_param_match_suffix(btf, &stub_args[arg_no],
-					    MAYBE_NULL_SUFFIX))
+		is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no],
+						     MAYBE_NULL_SUFFIX);
+		is_ref_acquired = btf_param_match_suffix(btf, &stub_args[arg_no],
+							 REF_ACQUIRED_SUFFIX);
+		if (!(is_nullable || is_ref_acquired))
 			continue;
 
 		/* Should be a pointer to struct */
@@ -269,11 +274,15 @@ static int prepare_arg_info(struct btf *btf,
 		}
 
 		/* Fill the information of the new argument */
-		info->reg_type =
-			PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
 		info->btf_id = arg_btf_id;
 		info->btf = btf;
 		info->offset = offset;
+		if (is_nullable) {
+			info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
+		} else if (is_ref_acquired) {
+			info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID;
+			info->ref_acquired = true;
+		}
 		info++;
 		info_cnt++;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 8c95392214ed..e462fb4a4598 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6316,7 +6316,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 
 	/* this is a pointer to another type */
 	for (i = 0; i < prog->aux->ctx_arg_info_size; i++) {
-		const struct bpf_ctx_arg_aux *ctx_arg_info = &prog->aux->ctx_arg_info[i];
+		struct bpf_ctx_arg_aux *ctx_arg_info =
+			(struct bpf_ctx_arg_aux *)&prog->aux->ctx_arg_info[i];
 
 		if (ctx_arg_info->offset == off) {
 			if (!ctx_arg_info->btf_id) {
@@ -6324,9 +6325,16 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 				return false;
 			}
 
+			if (ctx_arg_info->ref_acquired && !ctx_arg_info->ref_obj_id) {
+				bpf_log(log, "cannot acquire a reference to context argument offset %u\n", off);
+				return false;
+			}
+
 			info->reg_type = ctx_arg_info->reg_type;
 			info->btf = ctx_arg_info->btf ? : btf_vmlinux;
 			info->btf_id = ctx_arg_info->btf_id;
+			info->ref_obj_id = ctx_arg_info->ref_obj_id;
+			ctx_arg_info->ref_obj_id = 0;
 			return true;
 		}
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9f867fca9fbe..06a6edd306fd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5557,7 +5557,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
-			    struct btf **btf, u32 *btf_id)
+			    struct btf **btf, u32 *btf_id, u32 *ref_obj_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
@@ -5578,6 +5578,7 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 		if (base_type(*reg_type) == PTR_TO_BTF_ID) {
 			*btf = info.btf;
 			*btf_id = info.btf_id;
+			*ref_obj_id = info.ref_obj_id;
 		} else {
 			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 		}
@@ -6833,7 +6834,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
 		struct btf *btf = NULL;
-		u32 btf_id = 0;
+		u32 btf_id = 0, ref_obj_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -6846,7 +6847,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			return err;
 
 		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf,
-				       &btf_id);
+				       &btf_id, &ref_obj_id);
 		if (err)
 			verbose_linfo(env, insn_idx, "; ");
 		if (!err && t == BPF_READ && value_regno >= 0) {
@@ -6870,6 +6871,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				if (base_type(reg_type) == PTR_TO_BTF_ID) {
 					regs[value_regno].btf = btf;
 					regs[value_regno].btf_id = btf_id;
+					regs[value_regno].ref_obj_id = ref_obj_id;
 				}
 			}
 			regs[value_regno].type = reg_type;
@@ -20426,6 +20428,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
 	struct bpf_subprog_info *sub = subprog_info(env, subprog);
+	struct bpf_ctx_arg_aux *ctx_arg_info;
 	struct bpf_verifier_state *state;
 	struct bpf_reg_state *regs;
 	int ret, i;
@@ -20533,6 +20536,13 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 		mark_reg_known_zero(env, regs, BPF_REG_1);
 	}
 
+	if (env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
+		ctx_arg_info = (struct bpf_ctx_arg_aux *)env->prog->aux->ctx_arg_info;
+		for (i = 0; i < env->prog->aux->ctx_arg_info_size; i++)
+			if (ctx_arg_info[i].ref_acquired)
+				ctx_arg_info[i].ref_obj_id = acquire_reference_state(env, 0);
+	}
+
 	ret = do_check(env);
out:
 	/* check for NULL is necessary, since cur_state can be freed inside

From patchwork Fri May 10 19:23:54 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661876
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net,
 andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 02/20] selftests/bpf: Test referenced kptr arguments
 of struct_ops programs
Date: Fri, 10 May 2024 19:23:54 +0000
Message-Id: <20240510192412.3297104-3-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

A reference is automatically acquired for a referenced kptr argument
annotated via the stub function with "__ref_acquired" in a struct_ops
program. It must be released and cannot be acquired more than once.

The test first checks whether a reference to the correct type is
acquired in "ref_acquire". Then, we check if the verifier correctly
rejects the program that fails to release the reference (i.e., a
reference leak) in "ref_acquire_ref_leak". Finally, we check if the
reference can only be acquired once through the argument in
"ref_acquire_dup_ref".

Signed-off-by: Amery Hung
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  7 +++
 .../selftests/bpf/bpf_testmod/bpf_testmod.h   |  2 +
 .../prog_tests/test_struct_ops_ref_acquire.c  | 58 +++++++++++++++++++
 .../bpf/progs/struct_ops_ref_acquire.c        | 27 +++++++++
 .../progs/struct_ops_ref_acquire_dup_ref.c    | 24 ++++++++
 .../progs/struct_ops_ref_acquire_ref_leak.c   | 19 ++++++
 6 files changed, 137 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_ref_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_dup_ref.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_ref_leak.c

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 39ad96a18123..64dcab25b539 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -594,10 +594,17 @@ static int bpf_testmod_ops__test_maybe_null(int dummy,
 	return 0;
 }
 
+static int bpf_testmod_ops__test_ref_acquire(int dummy,
+					     struct task_struct *task__ref_acquired)
+{
+	return 0;
+}
+
 static struct bpf_testmod_ops __bpf_testmod_ops = {
 	.test_1 = bpf_testmod_test_1,
 	.test_2 = bpf_testmod_test_2,
 	.test_maybe_null = bpf_testmod_ops__test_maybe_null,
+	.test_ref_acquire = bpf_testmod_ops__test_ref_acquire,
 };
 
 struct bpf_struct_ops bpf_bpf_testmod_ops = {
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
index 23fa1872ee67..a0233990fb0e 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
@@ -35,6 +35,8 @@ struct bpf_testmod_ops {
 	void (*test_2)(int a, int b);
 	/* Used to test nullable arguments. */
 	int (*test_maybe_null)(int dummy, struct task_struct *task);
+	/* Used to test ref_acquired arguments. */
+	int (*test_ref_acquire)(int dummy, struct task_struct *task);
 
 	/* The following fields are used to test shadow copies.
 	 */
 	char onebyte;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_ref_acquire.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_ref_acquire.c
new file mode 100644
index 000000000000..779287a00ed8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_ref_acquire.c
@@ -0,0 +1,58 @@
+#include <test_progs.h>
+
+#include "struct_ops_ref_acquire.skel.h"
+#include "struct_ops_ref_acquire_ref_leak.skel.h"
+#include "struct_ops_ref_acquire_dup_ref.skel.h"
+
+/* Test that the verifier accepts a program that acquires a referenced
+ * kptr and releases the reference
+ */
+static void ref_acquire(void)
+{
+	struct struct_ops_ref_acquire *skel;
+
+	skel = struct_ops_ref_acquire__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load"))
+		return;
+
+	struct_ops_ref_acquire__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that acquires a referenced
+ * kptr without releasing the reference
+ */
+static void ref_acquire_ref_leak(void)
+{
+	struct struct_ops_ref_acquire_ref_leak *skel;
+
+	skel = struct_ops_ref_acquire_ref_leak__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load"))
+		return;
+
+	struct_ops_ref_acquire_ref_leak__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that tries to acquire a
+ * reference twice
+ */
+static void ref_acquire_dup_ref(void)
+{
+	struct struct_ops_ref_acquire_dup_ref *skel;
+
+	skel = struct_ops_ref_acquire_dup_ref__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load"))
+		return;
+
+	struct_ops_ref_acquire_dup_ref__destroy(skel);
+}
+
+void test_struct_ops_ref_acquire(void)
+{
+	if (test__start_subtest("ref_acquire"))
+		ref_acquire();
+	if (test__start_subtest("ref_acquire_ref_leak"))
+		ref_acquire_ref_leak();
+	if (test__start_subtest("ref_acquire_dup_ref"))
+		ref_acquire_dup_ref();
+}
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire.c b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire.c
new file mode 100644
index 000000000000..bae342db0fdb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire.c
@@ -0,0 +1,27 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This is a test BPF program that uses struct_ops to access a referenced
+ * kptr argument. This is a test for the verifier to ensure that it recognizes
+ * the task as a referenced object (i.e., ref_obj_id > 0).
+ */
+SEC("struct_ops/test_ref_acquire")
+int BPF_PROG(test_ref_acquire, int dummy,
+	     struct task_struct *task)
+{
+	bpf_task_release(task);
+
+	return 0;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_ref_acquire = {
+	.test_ref_acquire = (void *)test_ref_acquire,
+};
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_dup_ref.c b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_dup_ref.c
new file mode 100644
index 000000000000..489db98a47fb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_dup_ref.c
@@ -0,0 +1,24 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+void bpf_task_release(struct task_struct *p) __ksym;
+
+SEC("struct_ops/test_ref_acquire")
+int BPF_PROG(test_ref_acquire, int dummy,
+	     struct task_struct *task)
+{
+	bpf_task_release(task);
+	bpf_task_release(task);
+
+	return 0;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_ref_acquire = {
+	.test_ref_acquire = (void *)test_ref_acquire,
+};
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_ref_leak.c b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_ref_leak.c
new file mode 100644
index 000000000000..c5b9a1d748a1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_ref_leak.c
@@ -0,0 +1,19 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/test_ref_acquire")
+int BPF_PROG(test_ref_acquire, int dummy,
+	     struct task_struct *task)
+{
+	return 0;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_ref_acquire = {
+	.test_ref_acquire = (void *)test_ref_acquire,
+};
+
+

From patchwork Fri May 10 19:23:55 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661877
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net,
 andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 03/20] bpf: Allow struct_ops prog to return
 referenced kptr
Date: Fri, 10 May 2024 19:23:55 +0000
Message-Id: <20240510192412.3297104-4-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This patch allows a struct_ops program to return a referenced kptr if
the struct_ops member has a pointer return type. To make sure the
pointer returned to the kernel is valid, it needs to be referenced and
must originally come from the kernel. That is, it should be acquired
through kfuncs or struct_ops "ref_acquired" arguments, but not
allocated locally. Besides, a null pointer is allowed. Therefore, the
kernel caller of the struct_ops function consuming the pointer needs to
take care of the potential null pointer.

The first use case will be Qdisc_ops::dequeue, where a qdisc returns a
pointer to the skb to be dequeued.

To achieve this, we first allow a reference object to leak through the
return if it is in the return register and the type matches the return
type of the function. Then, we check whether the pointer to-be-returned
is valid in check_return_code().
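
As a sketch (taken from the selftest added in patch 04; the qdisc usage
itself only arrives later in the series), a struct_ops program with a
pointer return type transfers its reference back to the kernel by
returning the kptr instead of calling a release kfunc:

  SEC("struct_ops/test_kptr_return")
  struct task_struct *BPF_PROG(test_kptr_return, int dummy,
  			     struct task_struct *task, struct cgroup *cgrp)
  {
  	/* "task" is a referenced kptr acquired through a "__ref_acquired"
  	 * argument. Returning it hands the reference to the kernel
  	 * caller, which must handle a possible NULL return value.
  	 */
  	return task;
  }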

Signed-off-by: Amery Hung
---
 kernel/bpf/verifier.c | 50 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 06a6edd306fd..2d4a55ead85b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10081,16 +10081,36 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
+	enum bpf_prog_type type = resolve_prog_type(env->prog);
+	u32 regno = exception_exit ? BPF_REG_1 : BPF_REG_0;
+	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_func_state *state = cur_func(env);
+	const struct bpf_prog *prog = env->prog;
+	const struct btf_type *ret_type = NULL;
 	bool refs_lingering = false;
+	struct btf *btf;
 	int i;
 
 	if (!exception_exit && state->frameno && !state->in_callback_fn)
 		return 0;
 
+	if (type == BPF_PROG_TYPE_STRUCT_OPS &&
+	    reg->type & PTR_TO_BTF_ID && reg->ref_obj_id) {
+		btf = bpf_prog_get_target_btf(prog);
+		ret_type = btf_type_by_id(btf, prog->aux->attach_func_proto->type);
+		if (reg->btf_id != ret_type->type) {
+			verbose(env, "Return kptr type, struct %s, doesn't match function prototype, struct %s\n",
+				btf_type_name(reg->btf, reg->btf_id),
+				btf_type_name(btf, ret_type->type));
+			return -EINVAL;
+		}
+	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
 		if (!exception_exit && state->in_callback_fn &&
 		    state->refs[i].callback_ref != state->frameno)
 			continue;
+		if (ret_type && reg->ref_obj_id == state->refs[i].id)
+			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
 		refs_lingering = true;
@@ -15395,12 +15415,15 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	const char *exit_ctx = "At program exit";
 	struct tnum enforce_attach_type_range = tnum_unknown;
 	const struct bpf_prog *prog = env->prog;
-	struct bpf_reg_state *reg;
+	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_retval_range range = retval_range(0, 1);
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	int err;
 	struct bpf_func_state *frame = env->cur_state->frame[0];
 	const bool is_subprog = frame->subprogno;
+	struct btf *btf = bpf_prog_get_target_btf(prog);
+	bool st_ops_ret_is_kptr = false;
+	const struct btf_type *t;
 
 	/* LSM and struct_ops func-ptr's return type could be "void" */
 	if (!is_subprog || frame->in_exception_callback_fn) {
@@ -15409,10 +15432,26 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		if (prog->expected_attach_type == BPF_LSM_CGROUP)
 			/* See below, can be 0 or 0-1 depending on hook. */
 			break;
-		fallthrough;
+		if (!prog->aux->attach_func_proto->type)
+			return 0;
+		break;
 	case BPF_PROG_TYPE_STRUCT_OPS:
 		if (!prog->aux->attach_func_proto->type)
 			return 0;
+
+		t = btf_type_by_id(btf, prog->aux->attach_func_proto->type);
+		if (btf_type_is_ptr(t)) {
+			/* Allow struct_ops programs to return kptr or null if
+			 * the return type is a pointer type.
+			 * check_reference_leak has ensured the returning kptr
+			 * matches the type of the function prototype and is
+			 * the only leaking reference. Thus, we can safely return
+			 * if the pointer is in its unmodified form
+			 */
+			if (reg->type & PTR_TO_BTF_ID)
+				return __check_ptr_off_reg(env, reg, regno, false);
+			st_ops_ret_is_kptr = true;
+		}
 		break;
 	default:
 		break;
@@ -15434,8 +15473,6 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 		return -EACCES;
 	}
 
-	reg = cur_regs(env) + regno;
-
 	if (frame->in_async_callback_fn) {
 		/* enforce return zero from async callbacks like timer */
 		exit_ctx = "At async callback return";
@@ -15522,6 +15559,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	case BPF_PROG_TYPE_NETFILTER:
 		range = retval_range(NF_DROP, NF_ACCEPT);
 		break;
+	case BPF_PROG_TYPE_STRUCT_OPS:
+		if (!st_ops_ret_is_kptr)
+			return 0;
+		range = retval_range(0, 0);
+		break;
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.

From patchwork Fri May 10 19:23:56 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661878
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net,
 andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 04/20] selftests/bpf: Test returning kptr from
 struct_ops programs
Date: Fri, 10 May 2024 19:23:56 +0000
Message-Id: <20240510192412.3297104-5-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Test struct_ops programs that return a kptr. The verifier should only
allow programs returning NULL or a non-local kptr of the correct type.

Signed-off-by: Amery Hung
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  8 ++
 .../selftests/bpf/bpf_testmod/bpf_testmod.h   |  4 +
 .../prog_tests/test_struct_ops_kptr_return.c  | 87 +++++++++++++++++++
 .../bpf/progs/struct_ops_kptr_return.c        | 24 +++++
 ...uct_ops_kptr_return_fail__invalid_scalar.c | 24 +++++
 .../struct_ops_kptr_return_fail__local_kptr.c | 30 +++++++
 ...uct_ops_kptr_return_fail__nonzero_offset.c | 23 +++++
 .../struct_ops_kptr_return_fail__wrong_type.c | 28 ++++++
 8 files changed, 228 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 64dcab25b539..097a8d1c2ef8 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -600,11 +600,19 @@ static int bpf_testmod_ops__test_ref_acquire(int dummy,
 	return 0;
 }
 
+static struct task_struct *
+bpf_testmod_ops__test_kptr_return(int dummy, struct task_struct *task__ref_acquired,
+				  struct cgroup *cgrp)
+{
+	return NULL;
+}
+
 static struct bpf_testmod_ops __bpf_testmod_ops = {
 	.test_1 = bpf_testmod_test_1,
 	.test_2 = bpf_testmod_test_2,
 	.test_maybe_null = bpf_testmod_ops__test_maybe_null,
 	.test_ref_acquire = bpf_testmod_ops__test_ref_acquire,
+	.test_kptr_return = bpf_testmod_ops__test_kptr_return,
 };
 
 struct bpf_struct_ops bpf_bpf_testmod_ops = {
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
index a0233990fb0e..6d24e1307b64 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 
 struct task_struct;
+struct cgroup;
 
 struct bpf_testmod_test_read_ctx {
 	char *buf;
@@ -37,6 +38,9 @@ struct bpf_testmod_ops {
 	int (*test_maybe_null)(int dummy, struct task_struct *task);
 	/* Used to test ref_acquired arguments. */
 	int (*test_ref_acquire)(int dummy, struct task_struct *task);
+	/* Used to test returning kptr. */
+	struct task_struct *(*test_kptr_return)(int dummy, struct task_struct *task,
+						struct cgroup *cgrp);
 
 	/* The following fields are used to test shadow copies.
 	 */
 	char onebyte;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
new file mode 100644
index 000000000000..bc2fac39215a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
@@ -0,0 +1,87 @@
+#include <test_progs.h>
+
+#include "struct_ops_kptr_return.skel.h"
+#include "struct_ops_kptr_return_fail__wrong_type.skel.h"
+#include "struct_ops_kptr_return_fail__invalid_scalar.skel.h"
+#include "struct_ops_kptr_return_fail__nonzero_offset.skel.h"
+#include "struct_ops_kptr_return_fail__local_kptr.skel.h"
+
+/* Test that the verifier accepts a program that acquires a referenced
+ * kptr and releases the reference through return
+ */
+static void kptr_return(void)
+{
+	struct struct_ops_kptr_return *skel;
+
+	skel = struct_ops_kptr_return__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load"))
+		return;
+
+	struct_ops_kptr_return__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns a kptr of the
+ * wrong type
+ */
+static void kptr_return_fail__wrong_type(void)
+{
+	struct struct_ops_kptr_return_fail__wrong_type *skel;
+
+	skel = struct_ops_kptr_return_fail__wrong_type__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__wrong_type__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__wrong_type__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns a non-null scalar */
+static void kptr_return_fail__invalid_scalar(void)
+{
+	struct struct_ops_kptr_return_fail__invalid_scalar *skel;
+
+	skel = struct_ops_kptr_return_fail__invalid_scalar__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__invalid_scalar__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__invalid_scalar__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns kptr with non-zero offset */
+static void kptr_return_fail__nonzero_offset(void)
+{
+	struct struct_ops_kptr_return_fail__nonzero_offset *skel;
+
+	skel = struct_ops_kptr_return_fail__nonzero_offset__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__nonzero_offset__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__nonzero_offset__destroy(skel);
+}
+
+/* Test that the verifier rejects a program that returns local kptr */
+static void kptr_return_fail__local_kptr(void)
+{
+	struct struct_ops_kptr_return_fail__local_kptr *skel;
+
+	skel = struct_ops_kptr_return_fail__local_kptr__open_and_load();
+	if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__local_kptr__open_and_load"))
+		return;
+
+	struct_ops_kptr_return_fail__local_kptr__destroy(skel);
+}
+
+void test_struct_ops_kptr_return(void)
+{
+	if (test__start_subtest("kptr_return"))
+		kptr_return();
+	if (test__start_subtest("kptr_return_fail__wrong_type"))
+		kptr_return_fail__wrong_type();
+	if (test__start_subtest("kptr_return_fail__invalid_scalar"))
+		kptr_return_fail__invalid_scalar();
+	if (test__start_subtest("kptr_return_fail__nonzero_offset"))
+		kptr_return_fail__nonzero_offset();
+	if (test__start_subtest("kptr_return_fail__local_kptr"))
+		kptr_return_fail__local_kptr();
+}
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
new file mode 100644
index 000000000000..34933a88e1f9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
@@ -0,0 +1,24 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning a referenced kptr. The verifier
+ * should allow a referenced kptr (acquired with "ref_acquired") to be leaked
+ * through return.
+ */
+SEC("struct_ops/test_kptr_return")
+struct task_struct *BPF_PROG(test_kptr_return, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	return task;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_kptr_return = (void *)test_kptr_return,
+};
+
+
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
new file mode 100644
index 000000000000..d479e3377496
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
@@ -0,0 +1,24 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning a referenced kptr. The verifier
+ * should reject programs returning a non-zero scalar value.
+ */
+SEC("struct_ops/test_kptr_return")
+struct task_struct *BPF_PROG(test_kptr_return, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	bpf_task_release(task);
+	return (struct task_struct *)1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_kptr_return = (void *)test_kptr_return,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
new file mode 100644
index 000000000000..9266987798ca
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
@@ -0,0 +1,30 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning a referenced kptr. The verifier
+ * should reject programs returning a local kptr.
+ */
+SEC("struct_ops/test_kptr_return")
+struct task_struct *BPF_PROG(test_kptr_return, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *t;
+
+	t = bpf_obj_new(typeof(*task));
+	if (!t)
+		return task;
+
+	return t;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_kptr_return = (void *)test_kptr_return,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
new file mode 100644
index 000000000000..1a369e9839f3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
@@ -0,0 +1,23 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning a referenced kptr. The verifier
+ * should reject programs returning a modified referenced kptr.
+ */
+SEC("struct_ops/test_kptr_return")
+struct task_struct *BPF_PROG(test_kptr_return, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	return (struct task_struct *)&task->jobctl;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_kptr_return = (void *)test_kptr_return,
+};
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
new file mode 100644
index 000000000000..4128ea0b77f1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
@@ -0,0 +1,28 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* This tests struct_ops BPF programs returning a referenced kptr. The verifier
+ * should reject programs returning a referenced kptr of the wrong type.
+ */
+SEC("struct_ops/test_kptr_return")
+struct task_struct *BPF_PROG(test_kptr_return, int dummy,
+			     struct task_struct *task, struct cgroup *cgrp)
+{
+	struct task_struct *ret;
+
+	ret = (struct task_struct *)bpf_cgroup_acquire(cgrp);
+	bpf_task_release(task);
+
+	return ret;
+}
+
+SEC(".struct_ops.link")
+struct bpf_testmod_ops testmod_kptr_return = {
+	.test_kptr_return = (void *)test_kptr_return,
+};

From patchwork Fri May 10 19:23:57 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661881
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net,
 andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com,
 toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
 xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 05/20] bpf: Generate btf_struct_metas for kernel BTF
Date: Fri, 10 May 2024 19:23:57 +0000
Message-Id: <20240510192412.3297104-6-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

Currently, only program BTF from the user may contain special BTF
fields (e.g., bpf_list_head, bpf_spin_lock, and bpf_timer). To support
adding kernel objects to collections, we will need special BTF fields
(i.e., graph nodes) in kernel structures as well. This patch takes the
first step by finding these fields and building metadata for kernel
BTF.

Unlike parsing program BTF, where we go through all types, an allowlist
specifying kernel structures that contain special BTF fields is used.
This is to avoid wasting time parsing the majority of kernel types,
which do not have any special BTF fields.
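
For example (a sketch only: this patch adds just an "unused"
placeholder; the "sk_buff" entry below is hypothetical and assumes a
graph node is added to the structure by later patches), opting a kernel
structure into metadata generation would mean listing its name, which
btf_parse_struct_metas() then resolves by name instead of scanning
every type:

  /* kernel structures with special BTF fields */
  static const char *kstructs_with_special_btf[] = {
  	"sk_buff",	/* hypothetical example entry */
  };

  /* ... for kernel BTF, look the struct up by name: */
  id = btf_find_by_name_kind(btf, kstructs_with_special_btf[i],
  			     BTF_KIND_STRUCT);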

Signed-off-by: Amery Hung
---
 kernel/bpf/btf.c | 63 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e462fb4a4598..5ee6ccc2fab7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5380,6 +5380,11 @@ static const char *alloc_obj_fields[] = {
 	"bpf_refcount",
 };
 
+/* kernel structures with special BTF fields */
+static const char *kstructs_with_special_btf[] = {
+	"unused",
+};
+
 static struct btf_struct_metas *
 btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 {
@@ -5391,6 +5396,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 		} _arr;
 	} aof;
 	struct btf_struct_metas *tab = NULL;
+	bool btf_is_base_kernel;
 	int i, n, id, ret;
 
 	BUILD_BUG_ON(offsetof(struct btf_id_set, cnt) != 0);
@@ -5412,16 +5418,25 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 		return NULL;
 	sort(&aof.set.ids, aof.set.cnt, sizeof(aof.set.ids[0]), btf_id_cmp_func, NULL);
 
-	n = btf_nr_types(btf);
+	btf_is_base_kernel = btf_is_kernel(btf) && !btf_is_module(btf);
+	n = btf_is_base_kernel ? ARRAY_SIZE(kstructs_with_special_btf) : btf_nr_types(btf);
 	for (i = 1; i < n; i++) {
 		struct btf_struct_metas *new_tab;
 		const struct btf_member *member;
 		struct btf_struct_meta *type;
 		struct btf_record *record;
 		const struct btf_type *t;
-		int j, tab_cnt;
+		int j, tab_cnt, id;
 
-		t = btf_type_by_id(btf, i);
+		id = btf_is_base_kernel ?
+		     btf_find_by_name_kind(btf, kstructs_with_special_btf[i],
+					   BTF_KIND_STRUCT) : i;
+		if (id < 0) {
+			ret = -EINVAL;
+			goto free;
+		}
+
+		t = btf_type_by_id(btf, id);
 		if (!t) {
 			ret = -EINVAL;
 			goto free;
@@ -5449,7 +5464,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 		tab = new_tab;
 
 		type = &tab->types[tab->cnt];
-		type->btf_id = i;
+		type->btf_id = id;
 		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
						  BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT |
						  BPF_KPTR, t->size);
@@ -5967,6 +5982,7 @@ BTF_ID(struct, bpf_ctx_convert)
 
 struct btf *btf_parse_vmlinux(void)
 {
+	struct btf_struct_metas *struct_meta_tab;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
 	struct btf *btf = NULL;
@@ -6009,6 +6025,23 @@ struct btf *btf_parse_vmlinux(void)
 	if (err)
 		goto errout;
 
+	struct_meta_tab = btf_parse_struct_metas(&env->log, btf);
+	if (IS_ERR(struct_meta_tab)) {
+		err = PTR_ERR(struct_meta_tab);
+		goto errout;
+	}
+	btf->struct_meta_tab = struct_meta_tab;
+
+	if (struct_meta_tab) {
+		int i;
+
+		for (i = 0; i < struct_meta_tab->cnt; i++) {
+			err = btf_check_and_fixup_fields(struct_meta_tab->types[i].record);
+			if (err < 0)
+				goto errout_meta;
+		}
+	}
+
 	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
 	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
 
@@ -6021,6 +6054,8 @@ struct btf *btf_parse_vmlinux(void)
 	btf_verifier_env_free(env);
 	return btf;
 
+errout_meta:
+	btf_free_struct_meta_tab(btf);
 errout:
 	btf_verifier_env_free(env);
 	if (btf) {
@@ -6034,6 +6069,7 @@ struct btf *btf_parse_vmlinux(void)
 static struct btf *btf_parse_module(const char *module_name, const void *data,
				    unsigned int data_size)
 {
+	struct btf_struct_metas *struct_meta_tab;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
 	struct btf *btf = NULL, *base_btf;
@@ -6091,10 +6127,29 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	if (err)
 		goto errout;
 
+	struct_meta_tab = btf_parse_struct_metas(&env->log, btf);
+	if (IS_ERR(struct_meta_tab)) {
+		err = PTR_ERR(struct_meta_tab);
PTR_ERR(struct_meta_tab); + goto errout; + } + btf->struct_meta_tab = struct_meta_tab; + + if (struct_meta_tab) { + int i; + + for (i = 0; i < struct_meta_tab->cnt; i++) { + err = btf_check_and_fixup_fields(struct_meta_tab->types[i].record); + if (err < 0) + goto errout_meta; + } + } + btf_verifier_env_free(env); refcount_set(&btf->refcnt, 1); return btf; +errout_meta: + btf_free_struct_meta_tab(btf); errout: btf_verifier_env_free(env); if (btf) { From patchwork Fri May 10 19:23:58 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 06/20] bpf: Recognize kernel types as graph values
Date: Fri, 10 May 2024 19:23:58 +0000
Message-Id: <20240510192412.3297104-7-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>

This patch teaches bpf to recognize graphs that contain kernel objects as graph values. BPF programs can use a new BTF declaration tag, "contains_kptr", to signal that the value of a graph will be a kernel type. "contains_kptr" follows the same annotation format as "contains". For the implementation, when the value is a kernel type, we use kernel BTF for nodes and roots as well, so that we don't need to match the same type across different BTFs. Since graph values can be kernel types, we can no longer assume that the BTF is from programs when finding and parsing graph nodes and roots. Therefore, we record the BTF of a node in btf_field_info and use it later. No kernel object can be added to bpf graphs yet. In later patches, we will teach the verifier to allow moving kptrs into and out of collections.
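As a usage sketch (mirroring the selftest added later in this series), a BPF program declares a list head whose value type is a kernel struct with the new tag; the struct and its 'node' member come from bpf_testmod:

/* __contains_kptr expands to the "contains_kptr:" decl tag. */
#define __contains_kptr(name, node) \
	__attribute__((btf_decl_tag("contains_kptr:" #name ":" #node)))

private(C) struct bpf_spin_lock glock3;
private(C) struct bpf_list_head ghead3 __contains_kptr(bpf_testmod_linked_list_obj, node);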
Signed-off-by: Amery Hung --- include/linux/btf.h | 4 +- kernel/bpf/btf.c | 49 ++++++++++++------- kernel/bpf/syscall.c | 2 +- .../testing/selftests/bpf/bpf_experimental.h | 1 + 4 files changed, 36 insertions(+), 20 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index f9e56fd12a9f..2579b8a51172 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -219,7 +219,7 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, u32 expected_offset, u32 expected_size); struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, u32 field_mask, u32 value_size); -int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec); +int btf_check_and_fixup_fields(struct btf_record *rec); bool btf_type_is_void(const struct btf_type *t); s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind); s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p); @@ -569,7 +569,7 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt { return 0; } -static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id) +static inline struct btf_struct_meta *btf_find_struct_meta(u32 btf_id) { return NULL; } diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 5ee6ccc2fab7..37fb6143da79 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3296,6 +3296,7 @@ struct btf_field_info { struct { const char *node_name; u32 value_btf_id; + const struct btf *btf; } graph_root; }; }; @@ -3405,7 +3406,9 @@ btf_find_graph_root(const struct btf *btf, const struct btf_type *pt, enum btf_field_type head_type) { const char *node_field_name; + bool value_is_kptr = false; const char *value_type; + struct btf *kptr_btf; s32 id; if (!__btf_type_is_struct(t)) @@ -3413,15 +3416,26 @@ btf_find_graph_root(const struct btf *btf, const struct btf_type *pt, if (t->size != sz) return BTF_FIELD_IGNORE; value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:"); - if (IS_ERR(value_type)) - return -EINVAL; + if (!IS_ERR(value_type)) + goto found; + value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains_kptr:"); + if (!IS_ERR(value_type)) { + value_is_kptr = true; + goto found; + } + return -EINVAL; +found: node_field_name = strstr(value_type, ":"); if (!node_field_name) return -EINVAL; value_type = kstrndup(value_type, node_field_name - value_type, GFP_KERNEL | __GFP_NOWARN); if (!value_type) return -ENOMEM; - id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT); + if (value_is_kptr) + id = bpf_find_btf_id(value_type, BTF_KIND_STRUCT, &kptr_btf); + else + id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT); + kfree(value_type); if (id < 0) return id; @@ -3431,6 +3445,7 @@ btf_find_graph_root(const struct btf *btf, const struct btf_type *pt, info->type = head_type; info->off = off; info->graph_root.value_btf_id = id; + info->graph_root.btf = value_is_kptr ? 
kptr_btf : btf; info->graph_root.node_name = node_field_name; return BTF_FIELD_FOUND; } @@ -3722,13 +3737,13 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field, return ret; } -static int btf_parse_graph_root(const struct btf *btf, - struct btf_field *field, +static int btf_parse_graph_root(struct btf_field *field, struct btf_field_info *info, const char *node_type_name, size_t node_type_align) { const struct btf_type *t, *n = NULL; + const struct btf *btf = info->graph_root.btf; const struct btf_member *member; u32 offset; int i; @@ -3766,18 +3781,16 @@ static int btf_parse_graph_root(const struct btf *btf, return 0; } -static int btf_parse_list_head(const struct btf *btf, struct btf_field *field, - struct btf_field_info *info) +static int btf_parse_list_head(struct btf_field *field, struct btf_field_info *info) { - return btf_parse_graph_root(btf, field, info, "bpf_list_node", - __alignof__(struct bpf_list_node)); + return btf_parse_graph_root(field, info, "bpf_list_node", + __alignof__(struct bpf_list_node)); } -static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field, - struct btf_field_info *info) +static int btf_parse_rb_root(struct btf_field *field, struct btf_field_info *info) { - return btf_parse_graph_root(btf, field, info, "bpf_rb_node", - __alignof__(struct bpf_rb_node)); + return btf_parse_graph_root(field, info, "bpf_rb_node", + __alignof__(struct bpf_rb_node)); } static int btf_field_cmp(const void *_a, const void *_b, const void *priv) @@ -3859,12 +3872,12 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type goto end; break; case BPF_LIST_HEAD: - ret = btf_parse_list_head(btf, &rec->fields[i], &info_arr[i]); + ret = btf_parse_list_head(&rec->fields[i], &info_arr[i]); if (ret < 0) goto end; break; case BPF_RB_ROOT: - ret = btf_parse_rb_root(btf, &rec->fields[i], &info_arr[i]); + ret = btf_parse_rb_root(&rec->fields[i], &info_arr[i]); if (ret < 0) goto end; break; @@ -3901,7 +3914,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(ret); } -int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec) +int btf_check_and_fixup_fields(struct btf_record *rec) { int i; @@ -3917,11 +3930,13 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec) return 0; for (i = 0; i < rec->cnt; i++) { struct btf_struct_meta *meta; + struct btf *btf; u32 btf_id; if (!(rec->fields[i].type & BPF_GRAPH_ROOT)) continue; btf_id = rec->fields[i].graph_root.value_btf_id; + btf = rec->fields[i].graph_root.btf; meta = btf_find_struct_meta(btf, btf_id); if (!meta) return -EFAULT; @@ -5630,7 +5645,7 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat int i; for (i = 0; i < struct_meta_tab->cnt; i++) { - err = btf_check_and_fixup_fields(btf, struct_meta_tab->types[i].record); + err = btf_check_and_fixup_fields(struct_meta_tab->types[i].record); if (err < 0) goto errout_meta; } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e44c276e8617..9e93d48efe19 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1157,7 +1157,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, } } - ret = btf_check_and_fixup_fields(btf, map->record); + ret = btf_check_and_fixup_fields(map->record); if (ret < 0) goto free_map_tab; diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index a5b9df38c162..a4da75df819c 100644 --- 
a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -7,6 +7,7 @@ #include #define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node))) +#define __contains_kptr(name, node) __attribute__((btf_decl_tag("contains_kptr:" #name ":" #node))) /* Description * Allocates an object of the type represented by 'local_type_id' in From patchwork Fri May 10 19:23:59 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 07/20] bpf: Allow adding kernel objects to collections
Date: Fri, 10 May 2024 19:23:59 +0000
Message-Id: <20240510192412.3297104-8-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>

To allow adding kernel objects to and removing them from collections, we teach the verifier that a graph node can live in a trusted kptr, in addition to local objects. Besides, a kernel graph value removed from a collection should still be a trusted kptr.
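A minimal sketch of what this enables, taken from the selftest later in this series: a referenced kernel object acquired from a kfunc (a trusted kptr) is pushed into a bpf list:

SEC("tc")
int push_to_kptr_list(void *ctx)
{
	struct bpf_testmod_linked_list_obj *f;

	/* KF_ACQUIRE kfunc returns a referenced, trusted kptr */
	f = bpf_kfunc_call_test_acq_linked_list_obj();
	if (!f)
		return 0;

	bpf_spin_lock(&glock3);
	/* the verifier now accepts the trusted kptr's graph node */
	bpf_list_push_back(&ghead3, &f->node);
	bpf_spin_unlock(&glock3);
	return 0;
}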
Signed-off-by: Amery Hung --- include/linux/bpf_verifier.h | 8 +++++++- kernel/bpf/verifier.c | 18 ++++++++++++------ 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 7cb1b75eee38..edb306ef4c61 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -864,9 +864,15 @@ static inline bool type_is_ptr_alloc_obj(u32 type) return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC; } +static inline bool type_is_ptr_trusted(u32 type) +{ + return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & PTR_TRUSTED; +} + static inline bool type_is_non_owning_ref(u32 type) { - return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF; + return (type_is_ptr_alloc_obj(type) || type_is_ptr_trusted(type)) && + type_flag(type) & NON_OWN_REF; } static inline bool type_is_pkt_pointer(enum bpf_reg_type type) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2d4a55ead85b..f01d2b876a2e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -413,7 +413,8 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg) if (reg->type == PTR_TO_MAP_VALUE) { rec = reg->map_ptr->record; - } else if (type_is_ptr_alloc_obj(reg->type)) { + } else if (type_is_ptr_alloc_obj(reg->type) || type_is_ptr_trusted(reg->type) || + reg->type == PTR_TO_BTF_ID) { meta = btf_find_struct_meta(reg->btf, reg->btf_id); if (meta) rec = meta->record; @@ -1860,7 +1861,8 @@ static void mark_reg_graph_node(struct bpf_reg_state *regs, u32 regno, struct btf_field_graph_root *ds_head) { __mark_reg_known_zero(&regs[regno]); - regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC; + regs[regno].type = btf_is_kernel(ds_head->btf) ? PTR_TO_BTF_ID | PTR_TRUSTED : + PTR_TO_BTF_ID | MEM_ALLOC; regs[regno].btf = ds_head->btf; regs[regno].btf_id = ds_head->value_btf_id; regs[regno].off = ds_head->node_offset; @@ -11931,8 +11933,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return ret; break; case KF_ARG_PTR_TO_LIST_NODE: - if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { - verbose(env, "arg#%d expected pointer to allocated object\n", i); + if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) && + reg->type != (PTR_TO_BTF_ID | PTR_TRUSTED) && + reg->type != PTR_TO_BTF_ID) { + verbose(env, "arg#%d expected pointer to allocated object or trusted pointer\n", i); return -EINVAL; } if (!reg->ref_obj_id) { @@ -11954,8 +11958,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return -EINVAL; } } else { - if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { - verbose(env, "arg#%d expected pointer to allocated object\n", i); + if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) && + reg->type != (PTR_TO_BTF_ID | PTR_TRUSTED) && + reg->type != PTR_TO_BTF_ID) { + verbose(env, "arg#%d expected pointer to allocated object or trusted pointer\n", i); return -EINVAL; } if (!reg->ref_obj_id) { From patchwork Fri May 10 19:24:00 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn,
 daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 08/20] selftests/bpf: Test adding kernel object to bpf graph
Date: Fri, 10 May 2024 19:24:00 +0000
Message-Id: <20240510192412.3297104-9-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>

This patch adds selftests for bpf graphs that store kernel objects.

Signed-off-by: Amery Hung --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 14 +++++++++ .../selftests/bpf/bpf_testmod/bpf_testmod.h | 5 ++++ .../selftests/bpf/prog_tests/linked_list.c | 6 ++-- .../testing/selftests/bpf/progs/linked_list.c | 15 ++++++++++ .../testing/selftests/bpf/progs/linked_list.h | 8 +++++ .../selftests/bpf/progs/linked_list_fail.c | 29 +++++++++++++++++++ 6 files changed, 75 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 097a8d1c2ef8..90dda6335c04 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -494,6 +494,18 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused return arg; } +__bpf_kfunc static struct bpf_testmod_linked_list_obj * +bpf_kfunc_call_test_acq_linked_list_obj(void) +{ + return kzalloc(sizeof(struct bpf_testmod_linked_list_obj), GFP_ATOMIC); +} + +__bpf_kfunc static void +bpf_kfunc_call_test_rel_linked_list_obj(struct bpf_testmod_linked_list_obj *obj) +{ + kvfree(obj); +} + BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc) BTF_ID_FLAGS(func, bpf_kfunc_call_test1) @@ -520,6 +532,8 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU) BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_acq_linked_list_obj, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_rel_linked_list_obj, KF_RELEASE) BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids) static int bpf_testmod_ops_init(struct btf *btf) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index 6d24e1307b64..77c36fc016e3 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -99,4 +99,9 @@ struct bpf_testmod_ops2 { int (*test_1)(void); }; +struct bpf_testmod_linked_list_obj { + int val; + struct bpf_list_node node; +}; + #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c index 2fb89de63bd2..813c2e9a2346 100644 --- a/tools/testing/selftests/bpf/prog_tests/linked_list.c +++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c @@ -80,8 +80,8 @@ static struct { { "direct_write_node", "direct access to bpf_list_node is disallowed" }, { "use_after_unlock_push_front", "invalid mem access 'scalar'" }, { "use_after_unlock_push_back", "invalid mem
access 'scalar'" }, - { "double_push_front", "arg#1 expected pointer to allocated object" }, - { "double_push_back", "arg#1 expected pointer to allocated object" }, + { "double_push_front", "arg#1 expected pointer to allocated object or trusted pointer" }, + { "double_push_back", "arg#1 expected pointer to allocated object or trusted pointer" }, { "no_node_value_type", "bpf_list_node not found at offset=0" }, { "incorrect_value_type", "operation on bpf_list_head expects arg#1 bpf_list_node at offset=48 in struct foo, " @@ -96,6 +96,8 @@ static struct { { "incorrect_head_off2", "bpf_list_head not found at offset=1" }, { "pop_front_off", "off 48 doesn't point to 'struct bpf_spin_lock' that is at 40" }, { "pop_back_off", "off 48 doesn't point to 'struct bpf_spin_lock' that is at 40" }, + { "direct_write_node_kernel", "" }, + { "push_local_node_to_kptr_list", "operation on bpf_list_head expects arg#1 bpf_list_node at offset=8 in struct bpf_testmod_linked_list_obj, but arg is at offset=8 in struct bpf_testmod_linked_list_obj" }, }; static void test_linked_list_fail_prog(const char *prog_name, const char *err_msg) diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c index 26205ca80679..148ec67feaf7 100644 --- a/tools/testing/selftests/bpf/progs/linked_list.c +++ b/tools/testing/selftests/bpf/progs/linked_list.c @@ -378,4 +378,19 @@ int global_list_in_list(void *ctx) return test_list_in_list(&glock, &ghead); } +SEC("tc") +int push_to_kptr_list(void *ctx) +{ + struct bpf_testmod_linked_list_obj *f; + + f = bpf_kfunc_call_test_acq_linked_list_obj(); + if (!f) + return 0; + + bpf_spin_lock(&glock3); + bpf_list_push_back(&ghead3, &f->node); + bpf_spin_unlock(&glock3); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/linked_list.h b/tools/testing/selftests/bpf/progs/linked_list.h index c0f3609a7ffa..14bd92cfdb6f 100644 --- a/tools/testing/selftests/bpf/progs/linked_list.h +++ b/tools/testing/selftests/bpf/progs/linked_list.h @@ -5,6 +5,7 @@ #include #include #include "bpf_experimental.h" +#include "../bpf_testmod/bpf_testmod.h" struct bar { struct bpf_list_node node; @@ -52,5 +53,12 @@ struct { private(A) struct bpf_spin_lock glock; private(A) struct bpf_list_head ghead __contains(foo, node2); private(B) struct bpf_spin_lock glock2; +private(C) struct bpf_spin_lock glock3; +private(C) struct bpf_list_head ghead3 __contains_kptr(bpf_testmod_linked_list_obj, node); + +struct bpf_testmod_linked_list_obj *bpf_kfunc_call_test_acq_linked_list_obj(void) __ksym; +void bpf_kfunc_call_test_rel_linked_list_obj(struct bpf_testmod_linked_list_obj *obj) __ksym; +struct bpf_testmod_rb_tree_obj *bpf_kfunc_call_test_acq_rb_tree_obj(void) __ksym; +void bpf_kfunc_call_test_rel_rb_tree_obj(struct bpf_testmod_rb_tree_obj *obj) __ksym; #endif diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c index 6438982b928b..5f8063ecc448 100644 --- a/tools/testing/selftests/bpf/progs/linked_list_fail.c +++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c @@ -609,4 +609,33 @@ int pop_back_off(void *ctx) return pop_ptr_off((void *)bpf_list_pop_back); } +SEC("?tc") +int direct_write_node_kernel(void *ctx) +{ + struct bpf_testmod_linked_list_obj *f; + + f = bpf_kfunc_call_test_acq_linked_list_obj(); + if (!f) + return 0; + + *(__u64 *)&f->node = 0; + bpf_kfunc_call_test_rel_linked_list_obj(f); + return 0; +} + +SEC("?tc") +int 
push_local_node_to_kptr_list(void *ctx) +{ + struct bpf_testmod_linked_list_obj *f; + + f = bpf_obj_new(typeof(*f)); + if (!f) + return 0; + + bpf_spin_lock(&glock3); + bpf_list_push_back(&ghead3, &f->node); + bpf_spin_unlock(&glock3); + return 0; +} + char _license[] SEC("license") = "GPL"; From patchwork Fri May 10 19:24:01 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 09/20] bpf: Find special BTF fields in union
Date: Fri, 10 May 2024 19:24:01 +0000
Message-Id: <20240510192412.3297104-10-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>

This patch looks into unions when parsing BTF. While we would like to support adding an skb to bpf collections, the bpf graph node in sk_buff happens to live in a union due to space constraints.
Therefore, this patch teaches the BTF parser to also look for special BTF fields inside unions. Signed-off-by: Amery Hung --- kernel/bpf/btf.c | 74 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 64 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 37fb6143da79..25a5dc840ac3 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3305,7 +3305,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t, u32 off, int sz, enum btf_field_type field_type, struct btf_field_info *info) { - if (!__btf_type_is_struct(t)) + if (!btf_type_is_struct(t)) return BTF_FIELD_IGNORE; if (t->size != sz) return BTF_FIELD_IGNORE; @@ -3497,6 +3497,24 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, return type; } +static int btf_get_union_field_types(const struct btf *btf, const struct btf_type *u, + u32 field_mask, u32 *seen_mask, int *align, int *sz) +{ + int i, field_type, field_types = 0; + const struct btf_member *member; + const struct btf_type *t; + + for_each_member(i, u, member) { + t = btf_type_by_id(btf, member->type); + field_type = btf_get_field_type(__btf_name_by_offset(btf, t->name_off), + field_mask, seen_mask, align, sz); + if (field_type == 0 || field_type == BPF_KPTR_REF) + continue; + field_types = field_types | field_type; + } + return field_types; +} + #undef field_mask_test_name static int btf_find_struct_field(const struct btf *btf, @@ -3512,8 +3530,12 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *member_type = btf_type_by_id(btf, member->type); - field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off), - field_mask, &seen_mask, &align, &sz); + field_type = BTF_INFO_KIND(member_type->info) == BTF_KIND_UNION ? + btf_get_union_field_types(btf, member_type, field_mask, + &seen_mask, &align, &sz) : + btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off), + field_mask, &seen_mask, &align, &sz); + if (field_type == 0) continue; if (field_type < 0) @@ -3521,8 +3543,7 @@ off = __btf_member_bit_offset(t, member); if (off % 8) - /* valid C code cannot generate such BTF */ - return -EINVAL; + continue; off /= 8; if (off % align) continue; @@ -3737,6 +3758,20 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field, return ret; } +static const struct btf_type * +btf_find_member_by_name(const struct btf *btf, const struct btf_type *t, + const char *member_name) +{ + const struct btf_member *member; + int i; + + for_each_member(i, t, member) { + if (!strcmp(member_name, __btf_name_by_offset(btf, member->name_off))) + return btf_type_by_id(btf, member->type); + } + return NULL; +} + static int btf_parse_graph_root(struct btf_field *field, struct btf_field_info *info, const char *node_type_name, @@ -3754,18 +3789,27 @@ static int btf_parse_graph_root(struct btf_field *field, * verify its type.
*/ for_each_member(i, t, member) { - if (strcmp(info->graph_root.node_name, - __btf_name_by_offset(btf, member->name_off)) + const struct btf_type *member_type = btf_type_by_id(btf, member->type); + + if (BTF_INFO_KIND(member_type->info) == BTF_KIND_UNION) { + member_type = btf_find_member_by_name(btf, member_type, + info->graph_root.node_name); + if (!member_type) + continue; + } else if (strcmp(info->graph_root.node_name, + __btf_name_by_offset(btf, member->name_off))) { continue; + } + /* Invalid BTF, two members with same name */ if (n) return -EINVAL; - n = btf_type_by_id(btf, member->type); + n = member_type; if (!__btf_type_is_struct(n)) return -EINVAL; if (strcmp(node_type_name, __btf_name_by_offset(btf, n->name_off))) return -EINVAL; - offset = __btf_member_bit_offset(n, member); + offset = __btf_member_bit_offset(member_type, member); if (offset % 8) return -EINVAL; offset /= 8; @@ -5440,7 +5484,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) const struct btf_member *member; struct btf_struct_meta *type; struct btf_record *record; - const struct btf_type *t; + const struct btf_type *t, *member_type; int j, tab_cnt, id; id = btf_is_base_kernel ? @@ -5462,6 +5506,16 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) cond_resched(); for_each_member(j, t, member) { + member_type = btf_type_by_id(btf, member->type); + if (BTF_INFO_KIND(member_type->info) == BTF_KIND_UNION) { + const struct btf_member *umember; + int k; + + for_each_member(k, member_type, umember) { + if (btf_id_set_contains(&aof.set, umember->type)) + goto parse; + } + } if (btf_id_set_contains(&aof.set, member->type)) goto parse; }
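For illustration, this is the kind of layout the parser can now handle (a hypothetical struct; the real target, sk_buff, is discussed in the next patches):

/* The graph node shares a union with other members; btf_find_struct_field()
 * now descends into the union to find it instead of rejecting the struct.
 */
struct kstruct_with_union {
	int val;
	union {
		struct bpf_list_node node;
		u64 scratch;	/* hypothetical overlapping member */
	};
};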
From patchwork Fri May 10 19:24:02 2024
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 10/20] bpf: Introduce exclusive-ownership list and rbtree nodes
Date: Fri, 10 May 2024 19:24:02 +0000
Message-Id: <20240510192412.3297104-11-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>

This patch reintroduces the semantics of exclusive ownership of a reference. The main motivation is to save space and avoid changing kernel structure layouts. Existing bpf graph nodes add an additional owner field to list_head and rb_node to safely support shared ownership of a reference. The previous patch supports adding kernel objects to collections by including bpf_list_node or bpf_rb_node in a kernel structure, the same as in user-defined local objects.
However, some kernel objects' layouts have been optimized throughout the years and cannot be easily changed. For example, a bpf_rb_node cannot be added to the union at offset=0 in sk_buff since bpf_rb_node is larger than the other members. Exclusive ownership solves the problem: the "owner" field is no longer needed, and both graph nodes can sit at the same offset. To achieve this, bpf_list_excl_node and bpf_rb_excl_node are first introduced. They simply wrap list_head and rb_node, and serve as annotations in BTF. Then, we make sure that they cannot co-exist with bpf_refcount, bpf_list_node, and bpf_rb_node in the same structure when parsing BTF. This prevents the user from acquiring more than one reference to an object with an exclusive node. No exclusive node can be added to a collection yet. We will teach the verifier to accept exclusive nodes as valid nodes and then skip the ownership checks in graph kfuncs. Signed-off-by: Amery Hung --- include/linux/bpf.h | 27 ++++++++++++--- include/linux/rbtree_types.h | 4 +++ include/linux/types.h | 4 +++ kernel/bpf/btf.c | 64 +++++++++++++++++++++++++++++++++--- kernel/bpf/syscall.c | 20 +++++++++-- 5 files changed, 108 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6aabca1581fe..49c29c823fb3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -197,11 +197,16 @@ enum btf_field_type { BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_PERCPU, BPF_LIST_HEAD = (1 << 5), BPF_LIST_NODE = (1 << 6), - BPF_RB_ROOT = (1 << 7), - BPF_RB_NODE = (1 << 8), - BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE, + BPF_LIST_EXCL_NODE = (1 << 7), + BPF_RB_ROOT = (1 << 8), + BPF_RB_NODE = (1 << 9), + BPF_RB_EXCL_NODE = (1 << 10), + BPF_GRAPH_EXCL_NODE = BPF_RB_EXCL_NODE | BPF_LIST_EXCL_NODE, + BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE | + BPF_RB_EXCL_NODE | BPF_LIST_EXCL_NODE, BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD, - BPF_REFCOUNT = (1 << 9), + BPF_GRAPH_NODE_OR_ROOT = BPF_GRAPH_NODE | BPF_GRAPH_ROOT, + BPF_REFCOUNT = (1 << 11), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -321,10 +326,14 @@ static inline const char *btf_field_type_name(enum btf_field_type type) return "bpf_list_head"; case BPF_LIST_NODE: return "bpf_list_node"; + case BPF_LIST_EXCL_NODE: + return "bpf_list_excl_node"; case BPF_RB_ROOT: return "bpf_rb_root"; case BPF_RB_NODE: return "bpf_rb_node"; + case BPF_RB_EXCL_NODE: + return "bpf_rb_excl_node"; case BPF_REFCOUNT: return "bpf_refcount"; default: @@ -348,10 +357,14 @@ static inline u32 btf_field_type_size(enum btf_field_type type) return sizeof(struct bpf_list_head); case BPF_LIST_NODE: return sizeof(struct bpf_list_node); + case BPF_LIST_EXCL_NODE: + return sizeof(struct bpf_list_excl_node); case BPF_RB_ROOT: return sizeof(struct bpf_rb_root); case BPF_RB_NODE: return sizeof(struct bpf_rb_node); + case BPF_RB_EXCL_NODE: + return sizeof(struct bpf_rb_excl_node); case BPF_REFCOUNT: return sizeof(struct bpf_refcount); default: @@ -375,10 +388,14 @@ static inline u32 btf_field_type_align(enum btf_field_type type) return __alignof__(struct bpf_list_head); case BPF_LIST_NODE: return __alignof__(struct bpf_list_node); + case BPF_LIST_EXCL_NODE: + return __alignof__(struct bpf_list_excl_node); case BPF_RB_ROOT: return __alignof__(struct bpf_rb_root); case BPF_RB_NODE: return __alignof__(struct bpf_rb_node); + case BPF_RB_EXCL_NODE: + return __alignof__(struct bpf_rb_excl_node); case BPF_REFCOUNT: return __alignof__(struct bpf_refcount); default: @@ -396,10 +413,12 @@ static inline void
bpf_obj_init_field(const struct btf_field *field, void *addr) refcount_set((refcount_t *)addr, 1); break; case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: RB_CLEAR_NODE((struct rb_node *)addr); break; case BPF_LIST_HEAD: case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: INIT_LIST_HEAD((struct list_head *)addr); break; case BPF_RB_ROOT: diff --git a/include/linux/rbtree_types.h b/include/linux/rbtree_types.h index 45b6ecde3665..fc5185991fb1 100644 --- a/include/linux/rbtree_types.h +++ b/include/linux/rbtree_types.h @@ -28,6 +28,10 @@ struct rb_root_cached { struct rb_node *rb_leftmost; }; +struct bpf_rb_excl_node { + struct rb_node rb_node; +}; + #define RB_ROOT (struct rb_root) { NULL, } #define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL } diff --git a/include/linux/types.h b/include/linux/types.h index 2bc8766ba20c..71429cd80ce2 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -202,6 +202,10 @@ struct hlist_node { struct hlist_node *next, **pprev; }; +struct bpf_list_excl_node { + struct list_head list_head; +}; + struct ustat { __kernel_daddr_t f_tfree; #ifdef CONFIG_ARCH_32BIT_USTAT_F_TINODE diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 25a5dc840ac3..a641c716e0fa 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3484,6 +3484,8 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root"); field_mask_test_name(BPF_RB_NODE, "bpf_rb_node"); field_mask_test_name(BPF_REFCOUNT, "bpf_refcount"); + field_mask_test_name(BPF_LIST_EXCL_NODE, "bpf_list_excl_node"); + field_mask_test_name(BPF_RB_EXCL_NODE, "bpf_rb_excl_node"); /* Only return BPF_KPTR when all other types with matchable names fail */ if (field_mask & BPF_KPTR) { @@ -3504,6 +3506,8 @@ static int btf_get_union_field_types(const struct btf *btf, const struct btf_typ const struct btf_member *member; const struct btf_type *t; + field_mask &= BPF_GRAPH_EXCL_NODE; + for_each_member(i, u, member) { t = btf_type_by_id(btf, member->type); field_type = btf_get_field_type(__btf_name_by_offset(btf, t->name_off), @@ -3552,13 +3556,28 @@ static int btf_find_struct_field(const struct btf *btf, case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: case BPF_REFCOUNT: ret = btf_find_struct(btf, member_type, off, sz, field_type, idx < info_cnt ? &info[idx] : &tmp); if (ret < 0) return ret; break; + case BPF_GRAPH_EXCL_NODE: + ret = btf_find_struct(btf, member_type, off, sz, + BPF_LIST_EXCL_NODE, + idx < info_cnt ? &info[idx] : &tmp); + if (ret < 0) + return ret; + ++idx; + ret = btf_find_struct(btf, member_type, off, sz, + BPF_RB_EXCL_NODE, + idx < info_cnt ? &info[idx] : &tmp); + if (ret < 0) + return ret; + break; case BPF_KPTR_UNREF: case BPF_KPTR_REF: case BPF_KPTR_PERCPU: @@ -3619,7 +3638,9 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: case BPF_REFCOUNT: ret = btf_find_struct(btf, var_type, off, sz, field_type, idx < info_cnt ? 
&info[idx] : &tmp); @@ -3827,14 +3848,24 @@ static int btf_parse_graph_root(struct btf_field *field, static int btf_parse_list_head(struct btf_field *field, struct btf_field_info *info) { - return btf_parse_graph_root(field, info, "bpf_list_node", - __alignof__(struct bpf_list_node)); + int err; + + err = btf_parse_graph_root(field, info, "bpf_list_node", + __alignof__(struct bpf_list_node)); + + return err ? btf_parse_graph_root(field, info, "bpf_list_excl_node", + __alignof__(struct bpf_list_excl_node)) : 0; } static int btf_parse_rb_root(struct btf_field *field, struct btf_field_info *info) { - return btf_parse_graph_root(field, info, "bpf_rb_node", - __alignof__(struct bpf_rb_node)); + int err; + + err = btf_parse_graph_root(field, info, "bpf_rb_node", + __alignof__(struct bpf_rb_node)); + + return err ? btf_parse_graph_root(field, info, "bpf_rb_excl_node", + __alignof__(struct bpf_rb_excl_node)) : 0; } static int btf_field_cmp(const void *_a, const void *_b, const void *priv) @@ -3864,6 +3895,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return NULL; cnt = ret; + /* This needs to be kzalloc to zero out padding and unused fields, see * comment in btf_record_equal. */ @@ -3881,7 +3913,9 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type ret = -EFAULT; goto end; } - if (info_arr[i].off < next_off) { + if (info_arr[i].off < next_off && + !(info_arr[i].off == info_arr[i - 1].off && + (info_arr[i].type | info_arr[i - 1].type) == BPF_GRAPH_EXCL_NODE)) { ret = -EEXIST; goto end; } @@ -3925,6 +3959,8 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type if (ret < 0) goto end; break; + case BPF_LIST_EXCL_NODE: + case BPF_RB_EXCL_NODE: case BPF_LIST_NODE: case BPF_RB_NODE: break; @@ -3949,6 +3985,21 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type goto end; } + if (rec->refcount_off >= 0 && + (btf_record_has_field(rec, BPF_LIST_EXCL_NODE) || + btf_record_has_field(rec, BPF_RB_EXCL_NODE))) { + ret = -EINVAL; + goto end; + } + + if ((btf_record_has_field(rec, BPF_LIST_EXCL_NODE) || + btf_record_has_field(rec, BPF_RB_EXCL_NODE)) && + (btf_record_has_field(rec, BPF_LIST_NODE) || + btf_record_has_field(rec, BPF_RB_NODE))) { + ret = -EINVAL; + goto end; + } + sort_r(rec->fields, rec->cnt, sizeof(struct btf_field), btf_field_cmp, NULL, rec); @@ -5434,8 +5485,10 @@ static const char *alloc_obj_fields[] = { "bpf_spin_lock", "bpf_list_head", "bpf_list_node", + "bpf_list_excl_node", "bpf_rb_root", "bpf_rb_node", + "bpf_rb_excl_node", "bpf_refcount", }; @@ -5536,6 +5589,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type->btf_id = id; record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | + BPF_LIST_EXCL_NODE | BPF_RB_EXCL_NODE | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ if (IS_ERR_OR_NULL(record)) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 9e93d48efe19..25fad6293720 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -528,13 +528,23 @@ struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset, u32 field_mask) { struct btf_field *field; + u32 i; if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask)) return NULL; field = bsearch(&offset, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp); - if (!field || !(field->type & field_mask)) + if (!field) return NULL; - return 
field; + if (field->type & field_mask) + return field; + if (field->type & BPF_GRAPH_EXCL_NODE && field_mask & BPF_GRAPH_EXCL_NODE) { + i = field - rec->fields; + if (i > 0 && (field - 1)->type & field_mask) + return field - 1; + if (i < rec->cnt - 1 && (field + 1)->type & field_mask) + return field + 1; + } + return NULL; } void btf_record_free(struct btf_record *rec) @@ -554,8 +564,10 @@ void btf_record_free(struct btf_record *rec) break; case BPF_LIST_HEAD: case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: case BPF_RB_ROOT: case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: @@ -603,8 +615,10 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) break; case BPF_LIST_HEAD: case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: case BPF_RB_ROOT: case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: @@ -711,7 +725,9 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off); break; case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: case BPF_RB_NODE: + case BPF_RB_EXCL_NODE: case BPF_REFCOUNT: break; default: From patchwork Fri May 10 19:24:03 2024 X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13661885 X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com Subject: [RFC PATCH v8 11/20] bpf: Allow adding exclusive nodes to bpf list and rbtree Date: Fri, 10 May 2024 19:24:03 +0000 Message-Id: <20240510192412.3297104-12-amery.hung@bytedance.com> In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com> References: <20240510192412.3297104-1-amery.hung@bytedance.com> X-Patchwork-State: RFC This patch first teaches the verifier to accept exclusive nodes (bpf_list_excl_node and bpf_rb_excl_node) as valid graph nodes. Graph kfuncs can now skip ownership tracking and checks for graphs containing exclusive nodes, since we already make sure that an exclusive node cannot be owned by more than one collection at the same time. Graph kfuncs use struct_meta to tell whether a node is exclusive or not. Therefore, we pass struct_meta as an additional argument to the graph remove kfuncs and let the verifier fix up the instructions. The first user of exclusive-ownership nodes is sk_buff. In a bpf qdisc, an sk_buff can be enqueued into either a bpf_list or a bpf_rbtree. This significantly simplifies how users write the code and improves qdisc performance, since we no longer need to allocate local objects to store skb kptrs.
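For illustration, a minimal enqueue/dequeue pair using the convenience wrappers this patch adds to bpf_experimental.h might look as follows. This is a sketch only: the `q` struct holding the bpf_list_head and bpf_spin_lock, the __contains-style annotation tying the head to sk_buff, and the container_of() helper are assumed here and are not defined by this patch.

/* Sketch: q.head is assumed to be a bpf_list_head declared to contain
 * sk_buff via the new bpf_list field, protected by q.lock.
 */
static int skb_fifo_enqueue(struct sk_buff *skb)
{
	bpf_spin_lock(&q.lock);
	/* Exclusive nodes need no owner tracking: BTF parsing already
	 * guarantees the skb sits in at most one collection.
	 */
	bpf_list_excl_push_back(&q.head, &skb->bpf_list);
	bpf_spin_unlock(&q.lock);
	return 0;
}

static struct sk_buff *skb_fifo_dequeue(void)
{
	struct bpf_list_excl_node *n;

	bpf_spin_lock(&q.lock);
	n = bpf_list_excl_pop_front(&q.head);
	bpf_spin_unlock(&q.lock);

	return n ? container_of(n, struct sk_buff, bpf_list) : NULL;
}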
Signed-off-by: Amery Hung --- include/linux/skbuff.h | 2 + kernel/bpf/btf.c | 1 + kernel/bpf/helpers.c | 63 +++++++---- kernel/bpf/verifier.c | 101 ++++++++++++++---- .../testing/selftests/bpf/bpf_experimental.h | 58 +++++++++- 5 files changed, 180 insertions(+), 45 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 03ea36a82cdd..fefc82542a3c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -871,6 +871,8 @@ struct sk_buff { struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */ struct list_head list; struct llist_node ll_node; + struct bpf_list_excl_node bpf_list; + struct bpf_rb_excl_node bpf_rbnode; }; struct sock *sk; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index a641c716e0fa..6a9c1671c8f4 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -5495,6 +5495,7 @@ static const char *alloc_obj_fields[] = { /* kernel structures with special BTF fields*/ static const char *kstructs_with_special_btf[] = { "unused", + "sk_buff", }; static struct btf_struct_metas * diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 70655cec452c..7acdd8899304 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1988,6 +1988,9 @@ static int __bpf_list_add(struct bpf_list_node_kern *node, bool tail, struct btf_record *rec, u64 off) { struct list_head *n = &node->list_head, *h = (void *)head; + bool exclusive; + + exclusive = btf_record_has_field(rec, BPF_LIST_EXCL_NODE); /* If list_head was 0-initialized by map, bpf_obj_init_field wasn't * called on its fields, so init here @@ -1998,14 +2001,15 @@ static int __bpf_list_add(struct bpf_list_node_kern *node, /* node->owner != NULL implies !list_empty(n), no need to separately * check the latter */ - if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) { + if (!exclusive && cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) { /* Only called from BPF prog, no need to migrate_disable */ __bpf_obj_drop_impl((void *)n - off, rec, false); return -EINVAL; } tail ? list_add_tail(n, h) : list_add(n, h); - WRITE_ONCE(node->owner, head); + if (!exclusive) + WRITE_ONCE(node->owner, head); return 0; } @@ -2030,10 +2034,14 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head, return __bpf_list_add(n, head, true, meta ? meta->record : NULL, off); } -static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail) +static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, + struct btf_record *rec, bool tail) { struct list_head *n, *h = (void *)head; struct bpf_list_node_kern *node; + bool exclusive; + + exclusive = btf_record_has_field(rec, BPF_LIST_EXCL_NODE); /* If list_head was 0-initialized by map, bpf_obj_init_field wasn't * called on its fields, so init here @@ -2045,40 +2053,55 @@ static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tai n = tail ? h->prev : h->next; node = container_of(n, struct bpf_list_node_kern, list_head); - if (WARN_ON_ONCE(READ_ONCE(node->owner) != head)) + if (!exclusive && WARN_ON_ONCE(READ_ONCE(node->owner) != head)) return NULL; list_del_init(n); - WRITE_ONCE(node->owner, NULL); + if (!exclusive) + WRITE_ONCE(node->owner, NULL); return (struct bpf_list_node *)n; } -__bpf_kfunc struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) +__bpf_kfunc struct bpf_list_node *bpf_list_pop_front_impl(struct bpf_list_head *head, + void *meta__ign) { - return __bpf_list_del(head, false); + struct btf_struct_meta *meta = meta__ign; + + return __bpf_list_del(head, meta ? 
meta->record : NULL, false); } -__bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) +__bpf_kfunc struct bpf_list_node *bpf_list_pop_back_impl(struct bpf_list_head *head, + void *meta__ign) { - return __bpf_list_del(head, true); + struct btf_struct_meta *meta = meta__ign; + + return __bpf_list_del(head, meta ? meta->record : NULL, true); } -__bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, - struct bpf_rb_node *node) +__bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove_impl(struct bpf_rb_root *root, + struct bpf_rb_node *node, + void *meta__ign) { struct bpf_rb_node_kern *node_internal = (struct bpf_rb_node_kern *)node; struct rb_root_cached *r = (struct rb_root_cached *)root; struct rb_node *n = &node_internal->rb_node; + struct btf_struct_meta *meta = meta__ign; + struct btf_record *rec; + bool exclusive; + + rec = meta ? meta->record : NULL; + exclusive = btf_record_has_field(rec, BPF_RB_EXCL_NODE); /* node_internal->owner != root implies either RB_EMPTY_NODE(n) or * n is owned by some other tree. No need to check RB_EMPTY_NODE(n) */ - if (READ_ONCE(node_internal->owner) != root) + if (!exclusive && READ_ONCE(node_internal->owner) != root) return NULL; rb_erase_cached(n, r); RB_CLEAR_NODE(n); - WRITE_ONCE(node_internal->owner, NULL); + if (!exclusive) + WRITE_ONCE(node_internal->owner, NULL); return (struct bpf_rb_node *)n; } @@ -2093,11 +2116,14 @@ static int __bpf_rbtree_add(struct bpf_rb_root *root, struct rb_node *parent = NULL, *n = &node->rb_node; bpf_callback_t cb = (bpf_callback_t)less; bool leftmost = true; + bool exclusive; + + exclusive = btf_record_has_field(rec, BPF_RB_EXCL_NODE); /* node->owner != NULL implies !RB_EMPTY_NODE(n), no need to separately * check the latter */ - if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) { + if (!exclusive && cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) { /* Only called from BPF prog, no need to migrate_disable */ __bpf_obj_drop_impl((void *)n - off, rec, false); return -EINVAL; @@ -2115,7 +2141,8 @@ static int __bpf_rbtree_add(struct bpf_rb_root *root, rb_link_node(n, parent, link); rb_insert_color_cached(n, (struct rb_root_cached *)root, leftmost); - WRITE_ONCE(node->owner, root); + if (!exclusive) + WRITE_ONCE(node->owner, root); return 0; } @@ -2562,11 +2589,11 @@ BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE) BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL | KF_RCU) BTF_ID_FLAGS(func, bpf_list_push_front_impl) BTF_ID_FLAGS(func, bpf_list_push_back_impl) -BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) -BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_list_pop_front_impl, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_list_pop_back_impl, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE) -BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_rbtree_remove_impl, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_rbtree_add_impl) BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f01d2b876a2e..ffab9b6048cd 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11005,13 +11005,13 @@ enum special_kfunc_type { KF_bpf_refcount_acquire_impl, KF_bpf_list_push_front_impl, KF_bpf_list_push_back_impl, - KF_bpf_list_pop_front, - KF_bpf_list_pop_back, + KF_bpf_list_pop_front_impl, + 
KF_bpf_list_pop_back_impl, KF_bpf_cast_to_kern_ctx, KF_bpf_rdonly_cast, KF_bpf_rcu_read_lock, KF_bpf_rcu_read_unlock, - KF_bpf_rbtree_remove, + KF_bpf_rbtree_remove_impl, KF_bpf_rbtree_add_impl, KF_bpf_rbtree_first, KF_bpf_dynptr_from_skb, @@ -11031,11 +11031,11 @@ BTF_ID(func, bpf_obj_drop_impl) BTF_ID(func, bpf_refcount_acquire_impl) BTF_ID(func, bpf_list_push_front_impl) BTF_ID(func, bpf_list_push_back_impl) -BTF_ID(func, bpf_list_pop_front) -BTF_ID(func, bpf_list_pop_back) +BTF_ID(func, bpf_list_pop_front_impl) +BTF_ID(func, bpf_list_pop_back_impl) BTF_ID(func, bpf_cast_to_kern_ctx) BTF_ID(func, bpf_rdonly_cast) -BTF_ID(func, bpf_rbtree_remove) +BTF_ID(func, bpf_rbtree_remove_impl) BTF_ID(func, bpf_rbtree_add_impl) BTF_ID(func, bpf_rbtree_first) BTF_ID(func, bpf_dynptr_from_skb) @@ -11057,13 +11057,13 @@ BTF_ID(func, bpf_obj_drop_impl) BTF_ID(func, bpf_refcount_acquire_impl) BTF_ID(func, bpf_list_push_front_impl) BTF_ID(func, bpf_list_push_back_impl) -BTF_ID(func, bpf_list_pop_front) -BTF_ID(func, bpf_list_pop_back) +BTF_ID(func, bpf_list_pop_front_impl) +BTF_ID(func, bpf_list_pop_back_impl) BTF_ID(func, bpf_cast_to_kern_ctx) BTF_ID(func, bpf_rdonly_cast) BTF_ID(func, bpf_rcu_read_lock) BTF_ID(func, bpf_rcu_read_unlock) -BTF_ID(func, bpf_rbtree_remove) +BTF_ID(func, bpf_rbtree_remove_impl) BTF_ID(func, bpf_rbtree_add_impl) BTF_ID(func, bpf_rbtree_first) BTF_ID(func, bpf_dynptr_from_skb) @@ -11382,14 +11382,14 @@ static bool is_bpf_list_api_kfunc(u32 btf_id) { return btf_id == special_kfunc_list[KF_bpf_list_push_front_impl] || btf_id == special_kfunc_list[KF_bpf_list_push_back_impl] || - btf_id == special_kfunc_list[KF_bpf_list_pop_front] || - btf_id == special_kfunc_list[KF_bpf_list_pop_back]; + btf_id == special_kfunc_list[KF_bpf_list_pop_front_impl] || + btf_id == special_kfunc_list[KF_bpf_list_pop_back_impl]; } static bool is_bpf_rbtree_api_kfunc(u32 btf_id) { return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl] || - btf_id == special_kfunc_list[KF_bpf_rbtree_remove] || + btf_id == special_kfunc_list[KF_bpf_rbtree_remove_impl] || btf_id == special_kfunc_list[KF_bpf_rbtree_first]; } @@ -11448,11 +11448,13 @@ static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env, switch (node_field_type) { case BPF_LIST_NODE: + case BPF_LIST_EXCL_NODE: ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_front_impl] || kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_back_impl]); break; case BPF_RB_NODE: - ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove] || + case BPF_RB_EXCL_NODE: + ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove_impl] || kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl]); break; default: @@ -11515,6 +11517,9 @@ __process_kf_arg_ptr_to_graph_root(struct bpf_verifier_env *env, return -EFAULT; } *head_field = field; + meta->arg_btf = field->graph_root.btf; + meta->arg_btf_id = field->graph_root.value_btf_id; + return 0; } @@ -11603,18 +11608,30 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno, struct bpf_kfunc_call_arg_meta *meta) { - return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, - BPF_LIST_HEAD, BPF_LIST_NODE, - &meta->arg_list_head.field); + int err; + + err = __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, + BPF_LIST_HEAD, BPF_LIST_NODE, + &meta->arg_list_head.field); + + return err ? 
__process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, + BPF_LIST_HEAD, BPF_LIST_EXCL_NODE, + &meta->arg_list_head.field) : 0; } static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno, struct bpf_kfunc_call_arg_meta *meta) { - return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, - BPF_RB_ROOT, BPF_RB_NODE, - &meta->arg_rbtree_root.field); + int err; + + err = __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, + BPF_RB_ROOT, BPF_RB_NODE, + &meta->arg_rbtree_root.field); + + return err ? __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta, + BPF_RB_ROOT, BPF_RB_EXCL_NODE, + &meta->arg_rbtree_root.field) : 0; } /* @@ -11948,7 +11965,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return ret; break; case KF_ARG_PTR_TO_RB_NODE: - if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove]) { + if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove_impl]) { if (!type_is_non_owning_ref(reg->type) || reg->ref_obj_id) { verbose(env, "rbtree_remove node input must be non-owning ref\n"); return -EINVAL; @@ -12255,6 +12272,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, } } + if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front_impl] || + meta.func_id == special_kfunc_list[KF_bpf_list_pop_back_impl] || + meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove_impl]) + insn_aux->kptr_struct_meta = btf_find_struct_meta(meta.arg_btf, meta.arg_btf_id); + if (meta.func_id == special_kfunc_list[KF_bpf_throw]) { if (!bpf_jit_supports_exceptions()) { verbose(env, "JIT does not support calling kfunc %s#%d\n", @@ -12386,12 +12408,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->kptr_struct_meta = btf_find_struct_meta(meta.arg_btf, meta.arg_btf_id); - } else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front] || - meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) { + } else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front_impl] || + meta.func_id == special_kfunc_list[KF_bpf_list_pop_back_impl]) { struct btf_field *field = meta.arg_list_head.field; mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root); - } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] || + } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove_impl] || meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) { struct btf_field *field = meta.arg_rbtree_root.field; @@ -19526,6 +19548,21 @@ static void __fixup_collection_insert_kfunc(struct bpf_insn_aux_data *insn_aux, *cnt = 4; } +static void __fixup_collection_remove_kfunc(struct bpf_insn_aux_data *insn_aux, + u16 struct_meta_reg, + struct bpf_insn *insn, + struct bpf_insn *insn_buf, + int *cnt) +{ + struct btf_struct_meta *kptr_struct_meta = insn_aux->kptr_struct_meta; + struct bpf_insn addr[2] = { BPF_LD_IMM64(struct_meta_reg, (long)kptr_struct_meta) }; + + insn_buf[0] = addr[0]; + insn_buf[1] = addr[1]; + insn_buf[2] = *insn; + *cnt = 3; +} + static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, struct bpf_insn *insn_buf, int insn_idx, int *cnt) { @@ -19614,6 +19651,24 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg, node_offset_reg, insn, insn_buf, cnt); + } else if (desc->func_id == special_kfunc_list[KF_bpf_list_pop_back_impl] || + desc->func_id == 
special_kfunc_list[KF_bpf_list_pop_front_impl] || + desc->func_id == special_kfunc_list[KF_bpf_rbtree_remove_impl]) { + struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta; + int struct_meta_reg = BPF_REG_2; + + /* rbtree_remove has extra 'node' arg, so args-to-fixup are in diff regs */ + if (desc->func_id == special_kfunc_list[KF_bpf_rbtree_remove_impl]) + struct_meta_reg = BPF_REG_3; + + if (!kptr_struct_meta) { + verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n", + insn_idx); + return -EFAULT; + } + + __fixup_collection_remove_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg, + insn, insn_buf, cnt); } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] || desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1); diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index a4da75df819c..27f6d1fec793 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -91,22 +91,34 @@ extern int bpf_list_push_back_impl(struct bpf_list_head *head, * Returns * Pointer to bpf_list_node of deleted entry, or NULL if list is empty. */ -extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ksym; +extern struct bpf_list_node *bpf_list_pop_front_impl(struct bpf_list_head *head, + void *meta) __ksym; + +/* Convenience macro to wrap over bpf_list_pop_front_impl */ +#define bpf_list_pop_front(head) bpf_list_pop_front_impl(head, NULL) /* Description * Remove the entry at the end of the BPF linked list. * Returns * Pointer to bpf_list_node of deleted entry, or NULL if list is empty. */ -extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym; +extern struct bpf_list_node *bpf_list_pop_back_impl(struct bpf_list_head *head, + void *meta) __ksym; + +/* Convenience macro to wrap over bpf_list_pop_back_impl */ +#define bpf_list_pop_back(head) bpf_list_pop_back_impl(head, NULL) /* Description * Remove 'node' from rbtree with root 'root' * Returns * Pointer to the removed node, or NULL if 'root' didn't contain 'node' */ -extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, - struct bpf_rb_node *node) __ksym; +extern struct bpf_rb_node *bpf_rbtree_remove_impl(struct bpf_rb_root *root, + struct bpf_rb_node *node, + void *meta) __ksym; + +/* Convenience macro to wrap over bpf_rbtree_remove_impl */ +#define bpf_rbtree_remove(head, node) bpf_rbtree_remove_impl(head, node, NULL) /* Description * Add 'node' to rbtree with root 'root' using comparator 'less' @@ -132,6 +144,44 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod */ extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym; +/* Convenience single-ownership graph functions */ +int bpf_list_excl_push_front(struct bpf_list_head *head, struct bpf_list_excl_node *node) +{ + return bpf_list_push_front(head, (struct bpf_list_node *)node); +} + +int bpf_list_excl_push_back(struct bpf_list_head *head, struct bpf_list_excl_node *node) +{ + return bpf_list_push_back(head, (struct bpf_list_node *)node); +} + +struct bpf_list_excl_node *bpf_list_excl_pop_front(struct bpf_list_head *head) +{ + return (struct bpf_list_excl_node *)bpf_list_pop_front(head); +} + +struct bpf_list_excl_node *bpf_list_excl_pop_back(struct bpf_list_head *head) +{ + return (struct bpf_list_excl_node *)bpf_list_pop_back(head); +} + 
+struct bpf_rb_excl_node *bpf_rbtree_excl_remove(struct bpf_rb_root *root, + struct bpf_rb_excl_node *node) +{ + return (struct bpf_rb_excl_node *)bpf_rbtree_remove(root, (struct bpf_rb_node *)node); +} + +int bpf_rbtree_excl_add(struct bpf_rb_root *root, struct bpf_rb_excl_node *node, + bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) +{ + return bpf_rbtree_add(root, (struct bpf_rb_node *)node, less); +} + +struct bpf_rb_excl_node *bpf_rbtree_excl_first(struct bpf_rb_root *root) +{ + return (struct bpf_rb_excl_node *)bpf_rbtree_first(root); +} + /* Description * Allocates a percpu object of the type represented by 'local_type_id' in * program BTF. User may use the bpf_core_type_id_local macro to pass the From patchwork Fri May 10 19:24:04 2024 X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13661884 X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com Subject: [RFC PATCH v8 12/20] selftests/bpf: Modify linked_list tests to work with macro-ified removes Date: Fri, 10 May 2024 19:24:04 +0000 Message-Id: <20240510192412.3297104-13-amery.hung@bytedance.com> In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com> References: <20240510192412.3297104-1-amery.hung@bytedance.com> X-Patchwork-State: RFC Since a hidden argument is added to the bpf list remove kfuncs, and bpf_list_pop_back/front are macro-ified, modify the selftests so that they still compile.
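The recurring pattern in the diff below: the pop kfuncs are now macros over their _impl counterparts, so their addresses can no longer be taken and passed as function pointers. Roughly, under the new calling convention:

/* Before: indirect call through a pointer to the kfunc. */
void *(*op)(void *head) = (void *)&bpf_list_pop_front;
n = op(&p->head);

/* After: bpf_list_pop_front() expands to
 * bpf_list_pop_front_impl(head, NULL), so the operation is selected
 * with a plain branch instead.
 */
n = pop_front ? bpf_list_pop_front(&p->head) : bpf_list_pop_back(&p->head);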
Signed-off-by: Amery Hung --- .../selftests/bpf/progs/linked_list_fail.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c index 5f8063ecc448..d260f80ea64d 100644 --- a/tools/testing/selftests/bpf/progs/linked_list_fail.c +++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c @@ -49,8 +49,7 @@ int test##_missing_lock_##op(void *ctx) \ { \ INIT; \ - void (*p)(void *) = (void *)&bpf_list_##op; \ - p(hexpr); \ + bpf_list_##op(hexpr); \ return 0; \ } @@ -96,9 +95,8 @@ CHECK(inner_map, push_back, &iv->head, &f->node2); int test##_incorrect_lock_##op(void *ctx) \ { \ INIT; \ - void (*p)(void *) = (void *)&bpf_list_##op; \ bpf_spin_lock(lexpr); \ - p(hexpr); \ + bpf_list_##op(hexpr); \ return 0; \ } @@ -576,7 +574,7 @@ int incorrect_head_off2(void *ctx) } static __always_inline -int pop_ptr_off(void *(*op)(void *head)) +int pop_ptr_off(bool pop_front) { struct { struct bpf_list_head head __contains(foo, node2); @@ -588,7 +586,10 @@ int pop_ptr_off(void *(*op)(void *head)) if (!p) return 0; bpf_spin_lock(&p->lock); - n = op(&p->head); + if (pop_front) + n = bpf_list_pop_front(&p->head); + else + n = bpf_list_pop_back(&p->head); bpf_spin_unlock(&p->lock); if (!n) @@ -600,13 +601,13 @@ int pop_ptr_off(void *(*op)(void *head)) SEC("?tc") int pop_front_off(void *ctx) { - return pop_ptr_off((void *)bpf_list_pop_front); + return pop_ptr_off(true); } SEC("?tc") int pop_back_off(void *ctx) { - return pop_ptr_off((void *)bpf_list_pop_back); + return pop_ptr_off(false); } SEC("?tc") From patchwork Fri May 10 19:24:05 2024 X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13661886 X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com Subject: [RFC PATCH v8 13/20] bpf: net_sched: Support implementation of Qdisc_ops in bpf Date: Fri, 10 May 2024 19:24:05 +0000 Message-Id: <20240510192412.3297104-14-amery.hung@bytedance.com> In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com> References: <20240510192412.3297104-1-amery.hung@bytedance.com> X-Patchwork-State: RFC This patch enables users to implement a qdisc using bpf. The last few patches in this series have prepared struct_ops to support the core methods in Qdisc_ops. Recent advancements in bpf, such as local objects, bpf list, and bpf rbtree, have also provided powerful and flexible building blocks for realizing sophisticated scheduling algorithms. Therefore, in this patch, we start allowing qdiscs to be implemented using bpf struct_ops. Users can implement .enqueue and .dequeue in Qdisc_ops in bpf and register the qdisc dynamically into the kernel.
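As a sketch of the intended usage (program names and section conventions here are illustrative, not defined by this patch), a bpf qdisc implementing the two core ops could be structured as:

SEC("struct_ops/bpf_fifo_enqueue")
int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
	     struct bpf_sk_buff_ptr *to_free)
{
	/* take ownership of skb and queue it, or drop it to to_free */
	return NET_XMIT_SUCCESS;
}

SEC("struct_ops/bpf_fifo_dequeue")
struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch)
{
	/* return the next skb to transmit, or NULL if empty */
	return NULL;
}

SEC(".struct_ops")
struct Qdisc_ops fifo = {
	.enqueue = (void *)bpf_fifo_enqueue,
	.dequeue = (void *)bpf_fifo_dequeue,
	.id = "bpf_fifo",
};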
To further make bpf qdisc easy to use, a qdisc watchdog and a class hash table are included by default. They are taken care of by bpf qdisc infra with predefined Qdisc_ops and Qdisc_class_ops methods. In the next few patches, kfuncs will be introduced for users to really make use of them, and more ops will be supported. Signed-off-by: Cong Wang Co-developed-by: Amery Hung Signed-off-by: Amery Hung --- include/linux/btf.h | 1 + kernel/bpf/btf.c | 2 +- net/sched/Makefile | 4 + net/sched/bpf_qdisc.c | 563 ++++++++++++++++++++++++++++++++++++++++ net/sched/sch_api.c | 7 +- net/sched/sch_generic.c | 3 +- 6 files changed, 575 insertions(+), 5 deletions(-) create mode 100644 net/sched/bpf_qdisc.c diff --git a/include/linux/btf.h b/include/linux/btf.h index 2579b8a51172..2d01a921f604 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -520,6 +520,7 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const char *btf_name_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off); u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id, const struct bpf_prog *prog); u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id, diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 6a9c1671c8f4..edfaba046427 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6304,7 +6304,7 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t) return btf_type_is_int(t); } -static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, +u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, int off) { const struct btf_param *args; diff --git a/net/sched/Makefile b/net/sched/Makefile index 82c3f78ca486..2094e6e74158 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -63,6 +63,10 @@ obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o +ifeq ($(CONFIG_BPF_JIT),y) +obj-$(CONFIG_BPF_SYSCALL) += bpf_qdisc.o +endif + obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c new file mode 100644 index 000000000000..53e9b0f1fbd8 --- /dev/null +++ b/net/sched/bpf_qdisc.c @@ -0,0 +1,563 @@ +#include +#include +#include +#include +#include +#include +#include + +static struct bpf_struct_ops bpf_Qdisc_ops; + +static u32 unsupported_ops[] = { + offsetof(struct Qdisc_ops, init), + offsetof(struct Qdisc_ops, reset), + offsetof(struct Qdisc_ops, destroy), + offsetof(struct Qdisc_ops, change), + offsetof(struct Qdisc_ops, attach), + offsetof(struct Qdisc_ops, change_real_num_tx), + offsetof(struct Qdisc_ops, dump), + offsetof(struct Qdisc_ops, dump_stats), + offsetof(struct Qdisc_ops, ingress_block_set), + offsetof(struct Qdisc_ops, egress_block_set), + offsetof(struct Qdisc_ops, ingress_block_get), + offsetof(struct Qdisc_ops, egress_block_get), +}; + +struct sch_bpf_class { + struct Qdisc_class_common common; + struct Qdisc *qdisc; + + unsigned int drops; + unsigned int overlimits; + struct gnet_stats_basic_sync bstats; +}; + +struct bpf_sched_data { + struct tcf_proto __rcu *filter_list; /* optional external classifier */ + struct tcf_block *block; + struct Qdisc_class_hash clhash; + struct qdisc_watchdog watchdog; +}; + +struct bpf_sk_buff_ptr { + struct 
sk_buff *skb; +}; + +static int bpf_qdisc_init(struct btf *btf) +{ + return 0; +} + +static int sch_bpf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, + struct Qdisc **old, struct netlink_ext_ack *extack) +{ + struct sch_bpf_class *cl = (struct sch_bpf_class *)arg; + + if (new) + *old = qdisc_replace(sch, new, &cl->qdisc); + return 0; +} + +static struct Qdisc *sch_bpf_leaf(struct Qdisc *sch, unsigned long arg) +{ + struct sch_bpf_class *cl = (struct sch_bpf_class *)arg; + + return cl->qdisc; +} + +static struct sch_bpf_class *sch_bpf_find(struct Qdisc *sch, u32 classid) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + struct Qdisc_class_common *clc; + + clc = qdisc_class_find(&q->clhash, classid); + if (!clc) + return NULL; + return container_of(clc, struct sch_bpf_class, common); +} + +static unsigned long sch_bpf_search(struct Qdisc *sch, u32 handle) +{ + return (unsigned long)sch_bpf_find(sch, handle); +} + +static int sch_bpf_change_class(struct Qdisc *sch, u32 classid, + u32 parentid, struct nlattr **tca, + unsigned long *arg, + struct netlink_ext_ack *extack) +{ + struct sch_bpf_class *cl = (struct sch_bpf_class *)*arg; + struct bpf_sched_data *q = qdisc_priv(sch); + + if (!cl) { + if (classid == 0 || TC_H_MAJ(classid ^ sch->handle) != 0 || + sch_bpf_find(sch, classid)) + return -EINVAL; + + cl = kzalloc(sizeof(*cl), GFP_KERNEL); + if (!cl) + return -ENOBUFS; + + cl->common.classid = classid; + gnet_stats_basic_sync_init(&cl->bstats); + qdisc_class_hash_insert(&q->clhash, &cl->common); + } + + qdisc_class_hash_grow(sch, &q->clhash); + *arg = (unsigned long)cl; + return 0; +} + +static int sch_bpf_delete(struct Qdisc *sch, unsigned long arg, + struct netlink_ext_ack *extack) +{ + struct sch_bpf_class *cl = (struct sch_bpf_class *)arg; + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_class_hash_remove(&q->clhash, &cl->common); + if (cl->qdisc) + qdisc_put(cl->qdisc); + return 0; +} + +static struct tcf_block *sch_bpf_tcf_block(struct Qdisc *sch, unsigned long cl, + struct netlink_ext_ack *extack) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + if (cl) + return NULL; + return q->block; +} + +static unsigned long sch_bpf_bind(struct Qdisc *sch, unsigned long parent, + u32 classid) +{ + return 0; +} + +static void sch_bpf_unbind(struct Qdisc *q, unsigned long cl) +{ +} + +static int sch_bpf_dump_class(struct Qdisc *sch, unsigned long arg, + struct sk_buff *skb, struct tcmsg *tcm) +{ + return 0; +} + +static int +sch_bpf_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d) +{ + struct sch_bpf_class *cl = (struct sch_bpf_class *)arg; + struct gnet_stats_queue qs = { + .drops = cl->drops, + .overlimits = cl->overlimits, + }; + __u32 qlen = 0; + + if (cl->qdisc) + qdisc_qstats_qlen_backlog(cl->qdisc, &qlen, &qs.backlog); + else + qlen = 0; + + if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0) + return -1; + return 0; +} + +static void sch_bpf_walk(struct Qdisc *sch, struct qdisc_walker *arg) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + struct sch_bpf_class *cl; + unsigned int i; + + if (arg->stop) + return; + + for (i = 0; i < q->clhash.hashsize; i++) { + hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) { + if (arg->count < arg->skip) { + arg->count++; + continue; + } + if (arg->fn(sch, (unsigned long)cl, arg) < 0) { + arg->stop = 1; + return; + } + arg->count++; + } + } +} + +static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt, + struct 
netlink_ext_ack *extack) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + int err; + + qdisc_watchdog_init(&q->watchdog, sch); + + err = tcf_block_get(&q->block, &q->filter_list, sch, extack); + if (err) + return err; + + err = qdisc_class_hash_init(&q->clhash); + if (err < 0) + return err; + + return 0; +} + +static void bpf_qdisc_reset_op(struct Qdisc *sch) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + struct sch_bpf_class *cl; + unsigned int i; + + for (i = 0; i < q->clhash.hashsize; i++) { + hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) { + if (cl->qdisc) + qdisc_reset(cl->qdisc); + } + } + + qdisc_watchdog_cancel(&q->watchdog); +} + +static void bpf_qdisc_destroy_class(struct Qdisc *sch, struct sch_bpf_class *cl) +{ + if (cl->qdisc) + qdisc_put(cl->qdisc); + kfree(cl); +} + +static void bpf_qdisc_destroy_op(struct Qdisc *sch) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + struct sch_bpf_class *cl; + struct hlist_node *next; + unsigned int i; + + qdisc_watchdog_cancel(&q->watchdog); + tcf_block_put(q->block); + + for (i = 0; i < q->clhash.hashsize; i++) { + hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i], + common.hnode) { + qdisc_class_hash_remove(&q->clhash, + &cl->common); + bpf_qdisc_destroy_class(sch, cl); + } + } + + qdisc_class_hash_destroy(&q->clhash); +} + +static const struct Qdisc_class_ops sch_bpf_class_ops = { + .graft = sch_bpf_graft, + .leaf = sch_bpf_leaf, + .find = sch_bpf_search, + .change = sch_bpf_change_class, + .delete = sch_bpf_delete, + .tcf_block = sch_bpf_tcf_block, + .bind_tcf = sch_bpf_bind, + .unbind_tcf = sch_bpf_unbind, + .dump = sch_bpf_dump_class, + .dump_stats = sch_bpf_dump_class_stats, + .walk = sch_bpf_walk, +}; + +static const struct bpf_func_proto * +bpf_qdisc_get_func_proto(enum bpf_func_id func_id, + const struct bpf_prog *prog) +{ + switch (func_id) { + default: + return bpf_base_func_proto(func_id, prog); + } +} + +BTF_ID_LIST_SINGLE(bpf_sk_buff_ids, struct, sk_buff) +BTF_ID_LIST_SINGLE(bpf_sk_buff_ptr_ids, struct, bpf_sk_buff_ptr) + +static bool bpf_qdisc_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + struct btf *btf = prog->aux->attach_btf; + u32 arg; + + arg = get_ctx_arg_idx(btf, prog->aux->attach_func_proto, off); + if (!strcmp(prog->aux->attach_func_name, "enqueue")) { + if (arg == 2) { + info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED; + info->btf = btf; + info->btf_id = bpf_sk_buff_ptr_ids[0]; + return true; + } + } + + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *t, *skbt; + size_t end; + + skbt = btf_type_by_id(reg->btf, bpf_sk_buff_ids[0]); + t = btf_type_by_id(reg->btf, reg->btf_id); + if (t != skbt) { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + switch (off) { + case offsetof(struct sk_buff, tstamp): + end = offsetofend(struct sk_buff, tstamp); + break; + case offsetof(struct sk_buff, priority): + end = offsetofend(struct sk_buff, priority); + break; + case offsetof(struct sk_buff, mark): + end = offsetofend(struct sk_buff, mark); + break; + case offsetof(struct sk_buff, queue_mapping): + end = offsetofend(struct sk_buff, queue_mapping); + break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, tc_classid): + end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, tc_classid); + 
break; + case offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, data[0]) ... + offsetof(struct sk_buff, cb) + offsetof(struct qdisc_skb_cb, + data[QDISC_CB_PRIV_LEN - 1]): + end = offsetof(struct sk_buff, cb) + + offsetofend(struct qdisc_skb_cb, data[QDISC_CB_PRIV_LEN - 1]); + break; + case offsetof(struct sk_buff, tc_index): + end = offsetofend(struct sk_buff, tc_index); + break; + default: + bpf_log(log, "no write support to sk_buff at off %d\n", off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of sk_buff ended at %zu\n", + off, size, end); + return -EACCES; + } + + return 0; +} + +static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { + .get_func_proto = bpf_qdisc_get_func_proto, + .is_valid_access = bpf_qdisc_is_valid_access, + .btf_struct_access = bpf_qdisc_btf_struct_access, +}; + +static int bpf_qdisc_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + const struct Qdisc_ops *uqdisc_ops; + struct Qdisc_ops *qdisc_ops; + u32 moff; + + uqdisc_ops = (const struct Qdisc_ops *)udata; + qdisc_ops = (struct Qdisc_ops *)kdata; + + moff = __btf_member_bit_offset(t, member) / 8; + switch (moff) { + case offsetof(struct Qdisc_ops, cl_ops): + if (uqdisc_ops->cl_ops) + return -EINVAL; + + qdisc_ops->cl_ops = &sch_bpf_class_ops; + return 1; + case offsetof(struct Qdisc_ops, priv_size): + if (uqdisc_ops->priv_size) + return -EINVAL; + qdisc_ops->priv_size = sizeof(struct bpf_sched_data); + return 1; + case offsetof(struct Qdisc_ops, init): + qdisc_ops->init = bpf_qdisc_init_op; + return 1; + case offsetof(struct Qdisc_ops, reset): + qdisc_ops->reset = bpf_qdisc_reset_op; + return 1; + case offsetof(struct Qdisc_ops, destroy): + qdisc_ops->destroy = bpf_qdisc_destroy_op; + return 1; + case offsetof(struct Qdisc_ops, peek): + if (!uqdisc_ops->peek) + qdisc_ops->peek = qdisc_peek_dequeued; + return 1; + case offsetof(struct Qdisc_ops, id): + if (bpf_obj_name_cpy(qdisc_ops->id, uqdisc_ops->id, + sizeof(qdisc_ops->id)) <= 0) + return -EINVAL; + return 1; + } + + return 0; +} + +static bool is_unsupported(u32 member_offset) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(unsupported_ops); i++) { + if (member_offset == unsupported_ops[i]) + return true; + } + + return false; +} + +static int bpf_qdisc_check_member(const struct btf_type *t, + const struct btf_member *member, + const struct bpf_prog *prog) +{ + if (is_unsupported(__btf_member_bit_offset(t, member) / 8)) + return -ENOTSUPP; + return 0; +} + +static int bpf_qdisc_validate(void *kdata) +{ + return 0; +} + +static int bpf_qdisc_reg(void *kdata) +{ + return register_qdisc(kdata); +} + +static void bpf_qdisc_unreg(void *kdata) +{ + return unregister_qdisc(kdata); +} + +static int Qdisc_ops__enqueue(struct sk_buff *skb__ref_acquired, struct Qdisc *sch, + struct sk_buff **to_free) +{ + return 0; +} + +static struct sk_buff *Qdisc_ops__dequeue(struct Qdisc *sch) +{ + return NULL; +} + +static struct sk_buff *Qdisc_ops__peek(struct Qdisc *sch) +{ + return NULL; +} + +static int Qdisc_ops__init(struct Qdisc *sch, struct nlattr *arg, + struct netlink_ext_ack *extack) +{ + return 0; +} + +static void Qdisc_ops__reset(struct Qdisc *sch) +{ +} + +static void Qdisc_ops__destroy(struct Qdisc *sch) +{ +} + +static int Qdisc_ops__change(struct Qdisc *sch, struct nlattr *arg, + struct netlink_ext_ack *extack) +{ + return 0; +} + +static void Qdisc_ops__attach(struct Qdisc *sch) +{ +} + +static int 
Qdisc_ops__change_tx_queue_len(struct Qdisc *sch, unsigned int new_len) +{ + return 0; +} + +static void Qdisc_ops__change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx) +{ +} + +static int Qdisc_ops__dump(struct Qdisc *sch, struct sk_buff *skb) +{ + return 0; +} + +static int Qdisc_ops__dump_stats(struct Qdisc *sch, struct gnet_dump *d) +{ + return 0; +} + +static void Qdisc_ops__ingress_block_set(struct Qdisc *sch, u32 block_index) +{ +} + +static void Qdisc_ops__egress_block_set(struct Qdisc *sch, u32 block_index) +{ +} + +static u32 Qdisc_ops__ingress_block_get(struct Qdisc *sch) +{ + return 0; +} + +static u32 Qdisc_ops__egress_block_get(struct Qdisc *sch) +{ + return 0; +} + +static struct Qdisc_ops __bpf_ops_qdisc_ops = { + .enqueue = Qdisc_ops__enqueue, + .dequeue = Qdisc_ops__dequeue, + .peek = Qdisc_ops__peek, + .init = Qdisc_ops__init, + .reset = Qdisc_ops__reset, + .destroy = Qdisc_ops__destroy, + .change = Qdisc_ops__change, + .attach = Qdisc_ops__attach, + .change_tx_queue_len = Qdisc_ops__change_tx_queue_len, + .change_real_num_tx = Qdisc_ops__change_real_num_tx, + .dump = Qdisc_ops__dump, + .dump_stats = Qdisc_ops__dump_stats, + .ingress_block_set = Qdisc_ops__ingress_block_set, + .egress_block_set = Qdisc_ops__egress_block_set, + .ingress_block_get = Qdisc_ops__ingress_block_get, + .egress_block_get = Qdisc_ops__egress_block_get, +}; + +static struct bpf_struct_ops bpf_Qdisc_ops = { + .verifier_ops = &bpf_qdisc_verifier_ops, + .reg = bpf_qdisc_reg, + .unreg = bpf_qdisc_unreg, + .check_member = bpf_qdisc_check_member, + .init_member = bpf_qdisc_init_member, + .init = bpf_qdisc_init, + .validate = bpf_qdisc_validate, + .name = "Qdisc_ops", + .cfi_stubs = &__bpf_ops_qdisc_ops, + .owner = THIS_MODULE, +}; + +static int __init bpf_qdisc_kfunc_init(void) +{ + return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); +} +late_initcall(bpf_qdisc_kfunc_init); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 65e05b0c98e4..3b5ada5830cd 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -358,7 +359,7 @@ static struct Qdisc_ops *qdisc_lookup_ops(struct nlattr *kind) read_lock(&qdisc_mod_lock); for (q = qdisc_base; q; q = q->next) { if (nla_strcmp(kind, q->id) == 0) { - if (!try_module_get(q->owner)) + if (!bpf_try_module_get(q, q->owner)) q = NULL; break; } @@ -1282,7 +1283,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, /* We will try again qdisc_lookup_ops, * so don't keep a reference. 
*/ - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err = -EAGAIN; goto err_out; } @@ -1392,7 +1393,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, netdev_put(dev, &sch->dev_tracker); qdisc_free(sch); err_out2: - module_put(ops->owner); + bpf_module_put(ops, ops->owner); err_out: *errp = err; return NULL; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index ff5336493777..f4343653db0f 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -1067,7 +1068,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc) if (ops->destroy) ops->destroy(qdisc); - module_put(ops->owner); + bpf_module_put(ops, ops->owner); netdev_put(dev, &qdisc->dev_tracker); trace_qdisc_destroy(qdisc); From patchwork Fri May 10 19:24:06 2024 X-Patchwork-Submitter: Amery Hung X-Patchwork-Id: 13661888 X-Patchwork-Delegate: kuba@kernel.org
From patchwork Fri May 10 19:24:06 2024
From: Amery Hung
Subject: [RFC PATCH v8 14/20] bpf: net_sched: Add bpf qdisc kfuncs
Date: Fri, 10 May 2024 19:24:06 +0000
Message-Id: <20240510192412.3297104-15-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This patch adds kfuncs for working on an skb and for manipulating a child class/qdisc.

Both bpf_qdisc_skb_drop() and bpf_skb_release() can be used to release a reference to an skb. However, bpf_qdisc_skb_drop() can only be called in .enqueue, where a to_free skb list is available from the kernel to defer the release; bpf_skb_release() should be used everywhere else. It is also used in bpf_obj_free_fields() when cleaning up skbs in maps and collections.

In bpf_qdisc_enqueue() and bpf_qdisc_dequeue(), the kfuncs that pass an skb between the current qdisc and a child qdisc, a classid is used to refer to the specific child qdisc instead of a "struct Qdisc *", so that it is impossible to recursively enqueue or dequeue an skb into the qdisc itself. More specifically, while bpf_qdisc_find_class() could be made to return a pointer to a child qdisc for the enqueue and dequeue kfuncs to consume instead of a classid, it would be hard to make sure the pointer never points to the current qdisc, which would cause indefinite recursive calls.

bpf_qdisc_create_child() is introduced to make deployment easier and more robust. It can be called in .init to populate the class hierarchy the scheduling algorithm expects. This saves extra tc calls and prevents user errors when creating classes. An example can be found in the bpf prio qdisc in selftests.

bpf_skb_set_dev() is temporarily added to restore skb->dev after removing an skb from a collection, since we cannot rely on the user to always call it after every removal. This will be addressed in the next revision.
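For illustration, a minimal sketch of a classful enqueue/dequeue pair built on these kfuncs (a hypothetical program, not part of this patch; it assumes child classes 1 and 2 were created in .init via bpf_qdisc_create_child(), and that the kfunc declarations match the selftest headers later in this series):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* kfunc declarations as added by this patch */
int bpf_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, u32 classid,
		      struct bpf_sk_buff_ptr *to_free_list) __ksym;
struct sk_buff *bpf_qdisc_dequeue(struct Qdisc *sch, u32 classid) __ksym;

/* TC_H_MAKE() as in include/uapi/linux/pkt_sched.h (vmlinux.h has no macros) */
#define TC_H_MAKE(maj, min) (((maj) & 0xFFFF0000U) | ((min) & 0x0000FFFFU))

SEC("struct_ops/sketch_enqueue")
int BPF_PROG(sketch_enqueue, struct sk_buff *skb, struct Qdisc *sch,
	     struct bpf_sk_buff_ptr *to_free)
{
	/* Route by priority into child class 1 or 2. Only a classid can be
	 * named, never a struct Qdisc *, so the program cannot recursively
	 * enqueue into sch itself; a nonexistent classid degrades to a drop.
	 */
	u32 band = (skb->priority & 1) ? 1 : 2;

	return bpf_qdisc_enqueue(skb, sch, TC_H_MAKE(sch->handle, band), to_free);
}

SEC("struct_ops/sketch_dequeue")
struct sk_buff *BPF_PROG(sketch_dequeue, struct Qdisc *sch)
{
	/* Strict priority: drain band 1 before band 2. */
	struct sk_buff *skb;

	skb = bpf_qdisc_dequeue(sch, TC_H_MAKE(sch->handle, 1));
	if (skb)
		return skb;
	return bpf_qdisc_dequeue(sch, TC_H_MAKE(sch->handle, 2));
}

char _license[] SEC("license") = "GPL";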
Signed-off-by: Cong Wang Co-developed-by: Amery Hung Signed-off-by: Amery Hung --- net/sched/bpf_qdisc.c | 239 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 238 insertions(+), 1 deletion(-) diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 53e9b0f1fbd8..2a40452c2c9a 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -358,6 +358,229 @@ static int bpf_qdisc_btf_struct_access(struct bpf_verifier_log *log, return 0; } +__bpf_kfunc_start_defs(); + +/* bpf_skb_set_dev - A temporary kfunc to restore skb->dev after removing an + * skb from collections. + * @skb: The skb whose dev will be restored. + * @sch: The qdisc the skb belongs to. + */ +__bpf_kfunc void bpf_skb_set_dev(struct sk_buff *skb, struct Qdisc *sch) +{ + skb->dev = qdisc_dev(sch); +} + +/* bpf_skb_get_hash - Get the flow hash of an skb. + * @skb: The skb to get the flow hash from. + */ +__bpf_kfunc u32 bpf_skb_get_hash(struct sk_buff *skb) +{ + return skb_get_hash(skb); +} + +/* bpf_skb_release - Release a reference on an skb immediately. + * @skb: The skb on which a reference is being released. + */ +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb) +{ + consume_skb(skb); +} + +/* bpf_qdisc_skb_drop - Add an skb to be dropped later to a list. + * @skb: The skb on which a reference is being released and dropped. + * @to_free_list: The list of skbs to be dropped. + */ +__bpf_kfunc void bpf_qdisc_skb_drop(struct sk_buff *skb, + struct bpf_sk_buff_ptr *to_free_list) +{ + __qdisc_drop(skb, (struct sk_buff **)to_free_list); +} + +/* bpf_qdisc_watchdog_schedule - Schedule a qdisc to a later time using a timer. + * @sch: The qdisc to be scheduled. + * @expire: The expiry time of the timer. + * @delta_ns: The slack range of the timer. + */ +__bpf_kfunc void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + + qdisc_watchdog_schedule_range_ns(&q->watchdog, expire, delta_ns); +} + +/* bpf_skb_tc_classify - Classify an skb using an existing filter referred + * to by the specified handle on the net device of index ifindex. + * @skb: The skb to be classified. + * @ifindex: The ifindex of the net device where the filter is attached. + * @handle: The handle of the filter to be referenced. + * + * Returns a 64-bit integer containing the tc action verdict and the classid, + * created as classid << 32 | action.
+ */ +__bpf_kfunc u64 bpf_skb_tc_classify(struct sk_buff *skb, int ifindex, u32 handle) +{ + struct net *net = dev_net(skb->dev); + const struct Qdisc_class_ops *cops; + struct tcf_result res = {}; + struct tcf_block *block; + struct tcf_chain *chain; + struct net_device *dev; + int result = TC_ACT_OK; + unsigned long cl = 0; + struct Qdisc *q; + + rcu_read_lock(); + dev = dev_get_by_index_rcu(net, ifindex); + if (!dev) + goto out; + q = qdisc_lookup_rcu(dev, handle); + if (!q) + goto out; + + cops = q->ops->cl_ops; + if (!cops) + goto out; + if (!cops->tcf_block) + goto out; + if (TC_H_MIN(handle)) { + cl = cops->find(q, handle); + if (cl == 0) + goto out; + } + block = cops->tcf_block(q, cl, NULL); + if (!block) + goto out; + + for (chain = tcf_get_next_chain(block, NULL); + chain; + chain = tcf_get_next_chain(block, chain)) { + struct tcf_proto *tp; + + for (tp = tcf_get_next_proto(chain, NULL); + tp; tp = tcf_get_next_proto(chain, tp)) { + + result = tcf_classify(skb, NULL, tp, &res, false); + if (result >= 0) { + switch (result) { + case TC_ACT_QUEUED: + case TC_ACT_STOLEN: + case TC_ACT_TRAP: + fallthrough; + case TC_ACT_SHOT: + rcu_read_unlock(); + return result; + } + } + } + } +out: + rcu_read_unlock(); + return (res.class << 32 | result); +} + +/* bpf_qdisc_create_child - Create a default child qdisc during init. + * A qdisc can use this kfunc to populate the desired class topology during + * initialization without relying on the user to do this correctly. A default + * pfifo will be added to the child class. + * + * @sch: The parent qdisc of the to-be-created child qdisc. + * @min: The minor number of the child qdisc. + * @extack: Netlink extended ACK report. + */ +__bpf_kfunc int bpf_qdisc_create_child(struct Qdisc *sch, u32 min, + struct netlink_ext_ack *extack) +{ + struct bpf_sched_data *q = qdisc_priv(sch); + struct sch_bpf_class *cl; + struct Qdisc *new_q; + + cl = kzalloc(sizeof(*cl), GFP_KERNEL); + if (!cl) + return -ENOMEM; + + cl->common.classid = TC_H_MAKE(sch->handle, TC_H_MIN(min)); + + new_q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, + TC_H_MAKE(sch->handle, min), extack); + if (!new_q) + return -ENOMEM; + + cl->qdisc = new_q; + + qdisc_class_hash_insert(&q->clhash, &cl->common); + qdisc_hash_add(new_q, true); + return 0; +} + +/* bpf_qdisc_find_class - Check if a specific class exists in a qdisc. + * @sch: The qdisc the class belongs to. + * @classid: The classid of the class. + */ +__bpf_kfunc bool bpf_qdisc_find_class(struct Qdisc *sch, u32 classid) +{ + struct sch_bpf_class *cl = sch_bpf_find(sch, classid); + + if (!cl || !cl->qdisc) + return false; + + return true; +} + +/* bpf_qdisc_enqueue - Enqueue an skb into a child qdisc. + * @skb: The skb to be enqueued into another qdisc. + * @sch: The qdisc the skb currently belongs to. + * @classid: The handle of the child qdisc where the skb will be enqueued. + * @to_free_list: The list of skbs to which a to-be-dropped skb will be added. + */ +__bpf_kfunc int bpf_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, u32 classid, + struct bpf_sk_buff_ptr *to_free_list) +{ + struct sch_bpf_class *cl = sch_bpf_find(sch, classid); + + if (!cl || !cl->qdisc) + return qdisc_drop(skb, sch, (struct sk_buff **)to_free_list); + + return qdisc_enqueue(skb, cl->qdisc, (struct sk_buff **)to_free_list); +} + +/* bpf_qdisc_dequeue - Dequeue an skb from a child qdisc. + * @sch: The parent qdisc of the child qdisc. + * @classid: The handle of the child qdisc where we try to dequeue an skb.
+ */ +__bpf_kfunc struct sk_buff *bpf_qdisc_dequeue(struct Qdisc *sch, u32 classid) +{ + struct sch_bpf_class *cl = sch_bpf_find(sch, classid); + + if (!cl || !cl->qdisc) + return NULL; + + return cl->qdisc->dequeue(cl->qdisc); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(bpf_qdisc_kfunc_ids) +BTF_ID_FLAGS(func, bpf_skb_set_dev) +BTF_ID_FLAGS(func, bpf_skb_get_hash) +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_qdisc_skb_drop, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_qdisc_watchdog_schedule) +BTF_ID_FLAGS(func, bpf_skb_tc_classify) +BTF_ID_FLAGS(func, bpf_qdisc_create_child) +BTF_ID_FLAGS(func, bpf_qdisc_find_class) +BTF_ID_FLAGS(func, bpf_qdisc_enqueue, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_qdisc_dequeue, KF_ACQUIRE | KF_RET_NULL) +BTF_KFUNCS_END(bpf_qdisc_kfunc_ids) + +static const struct btf_kfunc_id_set bpf_qdisc_kfunc_set = { + .owner = THIS_MODULE, + .set = &bpf_qdisc_kfunc_ids, +}; + +BTF_ID_LIST(skb_kfunc_dtor_ids) +BTF_ID(struct, sk_buff) +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE) + static const struct bpf_verifier_ops bpf_qdisc_verifier_ops = { .get_func_proto = bpf_qdisc_get_func_proto, .is_valid_access = bpf_qdisc_is_valid_access, @@ -558,6 +781,20 @@ static struct bpf_struct_ops bpf_Qdisc_ops = { static int __init bpf_qdisc_kfunc_init(void) { - return register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); + int ret; + const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = { + { + .btf_id = skb_kfunc_dtor_ids[0], + .kfunc_btf_id = skb_kfunc_dtor_ids[1] + }, + }; + + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_qdisc_kfunc_set); + ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors, + ARRAY_SIZE(skb_kfunc_dtors), + THIS_MODULE); + ret = ret ?: register_bpf_struct_ops(&bpf_Qdisc_ops, Qdisc_ops); + + return ret; } late_initcall(bpf_qdisc_kfunc_init);
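As a usage note, the packed value returned by bpf_skb_tc_classify() above can be unpacked as in this sketch (hypothetical helper, not part of the patch); keep in mind that for TC_ACT_QUEUED, TC_ACT_STOLEN, TC_ACT_TRAP and TC_ACT_SHOT the kfunc returns the bare action with no classid in the upper half:

static __always_inline void unpack_tc_classify(u64 ret, u32 *classid, int *action)
{
	*classid = ret >> 32;		/* upper half: classid from struct tcf_result */
	*action = (int)(u32)ret;	/* lower half: TC_ACT_* verdict */
}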
From patchwork Fri May 10 19:24:07 2024
From: Amery Hung
Subject: [RFC PATCH v8 15/20] bpf: net_sched: Allow more optional methods in Qdisc_ops
Date: Fri, 10 May 2024 19:24:07 +0000
Message-Id: <20240510192412.3297104-16-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

So far, init, reset, and destroy have been implemented by the bpf qdisc infrastructure as fixed methods that set up and tear down the watchdog and the class hash table as needed. This patch allows users to supply these three ops themselves, to perform their desired work alongside the predefined methods.
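For illustration, a bpf qdisc can now pair its own init/reset logic with the fixed steps, roughly as in this sketch (hypothetical programs; bpf_qdisc_create_child() is the kfunc from the previous patch, and the SEC()/BPF_PROG() usage follows the selftests later in this series):

int bpf_qdisc_create_child(struct Qdisc *sch, u32 min,
			   struct netlink_ext_ack *extack) __ksym;

SEC("struct_ops/sketch_init")
int BPF_PROG(sketch_init, struct Qdisc *sch, struct nlattr *opt,
	     struct netlink_ext_ack *extack)
{
	/* bpf_qdisc_init_pre_op() has already set up the watchdog and the
	 * class hash table by the time this runs; only qdisc-specific
	 * setup belongs here.
	 */
	return bpf_qdisc_create_child(sch, 1, extack);
}

SEC("struct_ops/sketch_reset")
void BPF_PROG(sketch_reset, struct Qdisc *sch)
{
	/* Release qdisc-private skb references here; afterwards,
	 * bpf_qdisc_reset_post_op() resets the watchdog and the child
	 * qdiscs hanging off the class hash table.
	 */
}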
Signed-off-by: Amery Hung --- include/net/sch_generic.h | 8 ++++++++ net/sched/bpf_qdisc.c | 22 +++++----------------- net/sched/sch_api.c | 12 +++++++++++- net/sched/sch_generic.c | 8 ++++++++ 4 files changed, 32 insertions(+), 18 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 76db6be16083..71e54cfa0d41 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -1356,4 +1356,12 @@ static inline void qdisc_synchronize(const struct Qdisc *q) msleep(1); } +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT) +extern const struct Qdisc_class_ops sch_bpf_class_ops; + +int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack); +void bpf_qdisc_destroy_post_op(struct Qdisc *sch); +void bpf_qdisc_reset_post_op(struct Qdisc *sch); +#endif + #endif diff --git a/net/sched/bpf_qdisc.c b/net/sched/bpf_qdisc.c index 2a40452c2c9a..cb9088d0571a 100644 --- a/net/sched/bpf_qdisc.c +++ b/net/sched/bpf_qdisc.c @@ -9,9 +9,6 @@ static struct bpf_struct_ops bpf_Qdisc_ops; static u32 unsupported_ops[] = { - offsetof(struct Qdisc_ops, init), - offsetof(struct Qdisc_ops, reset), - offsetof(struct Qdisc_ops, destroy), offsetof(struct Qdisc_ops, change), offsetof(struct Qdisc_ops, attach), offsetof(struct Qdisc_ops, change_real_num_tx), @@ -191,8 +188,8 @@ static void sch_bpf_walk(struct Qdisc *sch, struct qdisc_walker *arg) } } -static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) +int bpf_qdisc_init_pre_op(struct Qdisc *sch, struct nlattr *opt, + struct netlink_ext_ack *extack) { struct bpf_sched_data *q = qdisc_priv(sch); int err; @@ -210,7 +207,7 @@ static int bpf_qdisc_init_op(struct Qdisc *sch, struct nlattr *opt, return 0; } -static void bpf_qdisc_reset_op(struct Qdisc *sch) +void bpf_qdisc_reset_post_op(struct Qdisc *sch) { struct bpf_sched_data *q = qdisc_priv(sch); struct sch_bpf_class *cl; @@ -233,7 +230,7 @@ static void bpf_qdisc_destroy_class(struct Qdisc *sch, struct sch_bpf_class *cl) kfree(cl); } -static void bpf_qdisc_destroy_op(struct Qdisc *sch) +void bpf_qdisc_destroy_post_op(struct Qdisc *sch) { struct bpf_sched_data *q = qdisc_priv(sch); struct sch_bpf_class *cl; @@ -255,7 +252,7 @@ static void bpf_qdisc_destroy_op(struct Qdisc *sch) qdisc_class_hash_destroy(&q->clhash); } -static const struct Qdisc_class_ops sch_bpf_class_ops = { +const struct Qdisc_class_ops sch_bpf_class_ops = { .graft = sch_bpf_graft, .leaf = sch_bpf_leaf, .find = sch_bpf_search, @@ -611,15 +608,6 @@ static int bpf_qdisc_init_member(const struct btf_type *t, return -EINVAL; qdisc_ops->priv_size = sizeof(struct bpf_sched_data); return 1; - case offsetof(struct Qdisc_ops, init): - qdisc_ops->init = bpf_qdisc_init_op; - return 1; - case offsetof(struct Qdisc_ops, reset): - qdisc_ops->reset = bpf_qdisc_reset_op; - return 1; - case offsetof(struct Qdisc_ops, destroy): - qdisc_ops->destroy = bpf_qdisc_destroy_op; - return 1; case offsetof(struct Qdisc_ops, peek): if (!uqdisc_ops->peek) qdisc_ops->peek = qdisc_peek_dequeued; diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 3b5ada5830cd..a81ceee55755 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1249,7 +1249,6 @@ static int qdisc_block_indexes_set(struct Qdisc *sch, struct nlattr **tca, Parameters are passed via opt. 
*/ - static struct Qdisc *qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue, u32 parent, u32 handle, @@ -1352,6 +1351,13 @@ static struct Qdisc *qdisc_create(struct net_device *dev, rcu_assign_pointer(sch->stab, stab); } +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT) + if (ops->cl_ops == &sch_bpf_class_ops) { + err = bpf_qdisc_init_pre_op(sch, tca[TCA_OPTIONS], extack); + if (err != 0) + goto err_out4; + } +#endif if (ops->init) { err = ops->init(sch, tca[TCA_OPTIONS], extack); if (err != 0) @@ -1388,6 +1394,10 @@ static struct Qdisc *qdisc_create(struct net_device *dev, */ if (ops->destroy) ops->destroy(sch); +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT) + if (ops->cl_ops == &sch_bpf_class_ops) + bpf_qdisc_destroy_post_op(sch); +#endif qdisc_put_stab(rtnl_dereference(sch->stab)); err_out3: netdev_put(dev, &sch->dev_tracker); diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index f4343653db0f..385ae2974f00 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -1024,6 +1024,10 @@ void qdisc_reset(struct Qdisc *qdisc) if (ops->reset) ops->reset(qdisc); +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT) + if (ops->cl_ops == &sch_bpf_class_ops) + bpf_qdisc_reset_post_op(qdisc); +#endif __skb_queue_purge(&qdisc->gso_skb); __skb_queue_purge(&qdisc->skb_bad_txq); @@ -1067,6 +1071,10 @@ static void __qdisc_destroy(struct Qdisc *qdisc) if (ops->destroy) ops->destroy(qdisc); +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_BPF_JIT) + if (ops->cl_ops == &sch_bpf_class_ops) + bpf_qdisc_destroy_post_op(qdisc); +#endif bpf_module_put(ops, ops->owner); netdev_put(dev, &qdisc->dev_tracker);
header.b="jyZqvE8m" Received: by mail-vs1-f46.google.com with SMTP id ada2fe7eead31-47ef5a51829so813086137.2; Fri, 10 May 2024 12:24:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715369064; x=1715973864; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Ie8ysy/g3R/YaoIh+WlTihyZrX28jNlDoRj210xQ3/w=; b=jyZqvE8mqV1YPeZrqy3B/I4cjvaXeF2occFBNnbJhll1Jzjg0l0se8WJlDarxvKLIt MzCX+VYFiltIjip3myA4nIZg9kk+YpMcriGTGeueYO31nGXYMbeimHnhuLvtzHL2uNdh +KS5mwjDbEgXZKKqpJ0RCMt0f17+qkRdq74iF8OUSOvcXI56hEWLPCXZrXzsWeRBTvi+ JS6WK6IeZ24bl38KIZff76iFUIMqBrvHh6tCJH9Z4NvKpbvfUMfPWAD/YIh9wZ+L7mKi xFikLn5Is+t/kCs9q0W9/ZBJv+fvRnACshzBRGygtI9zcmRTjb8aMeOqklqSBX3XepP9 5dCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715369064; x=1715973864; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ie8ysy/g3R/YaoIh+WlTihyZrX28jNlDoRj210xQ3/w=; b=jwv2rcIZTIYbIBJV2wZX4S/bv+YQzN62FB554IBj+oSzxn1/FsiVwD92Ct3v9+eNYh oGkRCR6VC/O+AtJHo6Exz6JSQEcTaAGZNjshTcEMxpzfcc7kHwal29eiQwDFQvIoSOF0 sk/CF7Xa5YGQhO995wVvdsdOheKh2OkFZRkZrR8Bn0ThfulwRGOl5EHGR9wGA3Qb4ncz USlq9U+W2/Kz4eqjGuQSlIN2eKQtKIZlDfiteM1u4IxayUQE+Cpu7pmBjECeyMtbR/g8 WXkZAS9wiRmG92QVKhkLPN0bM118lUVVGek/WXDFm51y1WNWnTRXcS2wfJlRgopXj9fX WaAQ== X-Gm-Message-State: AOJu0Yw94pn9u9KsYbAjT4r/Xs3LKN43iwjQEztZvJzbITMygEhzCTqO eYxCSbc3M29QLwq8p3CNrUPdIm6BYGicbqUtsdABKZ3dhPYaXeje+gplZA== X-Google-Smtp-Source: AGHT+IGE9+RnmgGdFHRG0ftNAu564MzVV15w4WCa7/TQbCs/npIkKtrhZ8d2fS3eYt0WY7DVBTc09A== X-Received: by 2002:a05:6102:6ca:b0:47e:f3af:c569 with SMTP id ada2fe7eead31-48077e273b3mr3974508137.21.1715369064371; Fri, 10 May 2024 12:24:24 -0700 (PDT) Received: from n36-183-057.byted.org ([147.160.184.83]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-43df5b46a26sm23863251cf.80.2024.05.10.12.24.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 May 2024 12:24:24 -0700 (PDT) From: Amery Hung X-Google-Original-From: Amery Hung To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com Subject: [RFC PATCH v8 16/20] libbpf: Support creating and destroying qdisc Date: Fri, 10 May 2024 19:24:08 +0000 Message-Id: <20240510192412.3297104-17-amery.hung@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com> References: <20240510192412.3297104-1-amery.hung@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch extends support of adding and removing qdiscs beyond clsact qdisc. In bpf_tc_hook_create() and bpf_tc_hook_destroy(), a user can first set "attach_point" to BPF_TC_QDISC, and then specify the qdisc with "qdisc". 
Signed-off-by: Amery Hung --- tools/lib/bpf/libbpf.h | 5 ++++- tools/lib/bpf/netlink.c | 20 +++++++++++++++++--- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index f88ab50c0229..2da4bc6f0cc1 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1234,6 +1234,7 @@ enum bpf_tc_attach_point { BPF_TC_INGRESS = 1 << 0, BPF_TC_EGRESS = 1 << 1, BPF_TC_CUSTOM = 1 << 2, + BPF_TC_QDISC = 1 << 3, }; #define BPF_TC_PARENT(a, b) \ @@ -1248,9 +1249,11 @@ struct bpf_tc_hook { int ifindex; enum bpf_tc_attach_point attach_point; __u32 parent; + __u32 handle; + char *qdisc; size_t :0; }; -#define bpf_tc_hook__last_field parent +#define bpf_tc_hook__last_field qdisc struct bpf_tc_opts { size_t sz; diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 68a2def17175..72db8c0add21 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -529,9 +529,9 @@ int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id) } -typedef int (*qdisc_config_t)(struct libbpf_nla_req *req); +typedef int (*qdisc_config_t)(struct libbpf_nla_req *req, struct bpf_tc_hook *hook); -static int clsact_config(struct libbpf_nla_req *req) +static int clsact_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook) { req->tc.tcm_parent = TC_H_CLSACT; req->tc.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0); @@ -539,6 +539,16 @@ static int clsact_config(struct libbpf_nla_req *req) return nlattr_add(req, TCA_KIND, "clsact", sizeof("clsact")); } +static int qdisc_config(struct libbpf_nla_req *req, struct bpf_tc_hook *hook) +{ + char *qdisc = OPTS_GET(hook, qdisc, NULL); + + req->tc.tcm_parent = OPTS_GET(hook, parent, TC_H_ROOT); + req->tc.tcm_handle = OPTS_GET(hook, handle, 0); + + return nlattr_add(req, TCA_KIND, qdisc, strlen(qdisc) + 1); +} + static int attach_point_to_config(struct bpf_tc_hook *hook, qdisc_config_t *config) { @@ -552,6 +562,9 @@ static int attach_point_to_config(struct bpf_tc_hook *hook, return 0; case BPF_TC_CUSTOM: return -EOPNOTSUPP; + case BPF_TC_QDISC: + *config = &qdisc_config; + return 0; default: return -EINVAL; } @@ -596,7 +609,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags) req.tc.tcm_family = AF_UNSPEC; req.tc.tcm_ifindex = OPTS_GET(hook, ifindex, 0); - ret = config(&req); + ret = config(&req, hook); if (ret < 0) return ret; @@ -639,6 +652,7 @@ int bpf_tc_hook_destroy(struct bpf_tc_hook *hook) case BPF_TC_INGRESS: case BPF_TC_EGRESS: return libbpf_err(__bpf_tc_detach(hook, NULL, true)); + case BPF_TC_QDISC: case BPF_TC_INGRESS | BPF_TC_EGRESS: return libbpf_err(tc_qdisc_delete(hook)); case BPF_TC_CUSTOM:
From patchwork Fri May 10 19:24:09 2024
From: Amery Hung
Subject: [RFC PATCH v8 17/20] selftests: Add a basic fifo qdisc test
Date: Fri, 10 May 2024 19:24:09 +0000
Message-Id: <20240510192412.3297104-18-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This selftest shows a bare-minimum fifo qdisc, which simply enqueues skbs into the back of a bpf list and dequeues from the front of the list.

Signed-off-by: Amery Hung --- .../selftests/bpf/prog_tests/bpf_qdisc.c | 161 ++++++++++++++++++ .../selftests/bpf/progs/bpf_qdisc_common.h | 23 +++ .../selftests/bpf/progs/bpf_qdisc_fifo.c | 83 +++++++++ 3 files changed, 267 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c new file mode 100644 index 000000000000..295d0216e70f --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c @@ -0,0 +1,161 @@ +#include +#include +#include + +#include "network_helpers.h" +#include "bpf_qdisc_fifo.skel.h" + +#ifndef ENOTSUPP +#define ENOTSUPP 524 +#endif + +#define LO_IFINDEX 1 + +static const unsigned int total_bytes = 10 * 1024 * 1024; +static int stop; + +static void *server(void *arg) +{ + int lfd = (int)(long)arg, err = 0, fd; + ssize_t nr_sent = 0, bytes = 0; + char batch[1500]; + + fd = accept(lfd, NULL, NULL); + while (fd == -1) { + if (errno == EINTR) + continue; + err = -errno; + goto done; + } + + if (settimeo(fd, 0)) { + err = -errno; + goto done; + } + + while (bytes < total_bytes && !READ_ONCE(stop)) { + nr_sent = send(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_sent == -1 && errno == EINTR) + continue; + if (nr_sent == -1) { + err = -errno; + break; + } + bytes += nr_sent; + } + + ASSERT_EQ(bytes, total_bytes, "send"); + +done: + if (fd >= 0) + close(fd); + if (err) { + WRITE_ONCE(stop, 1); + return ERR_PTR(err); + } + return NULL; +} + +static void do_test(char *qdisc) +{ + DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX, + .attach_point = BPF_TC_QDISC, + .parent = TC_H_ROOT, + .handle = 0x8000000, + .qdisc = qdisc); + struct sockaddr_in6 sa6 = {}; + ssize_t nr_recv = 0, bytes = 0; + int lfd = -1, fd = -1; + pthread_t srv_thread; + socklen_t addrlen = sizeof(sa6); + void *thread_ret; + char batch[1500]; + int err; + + WRITE_ONCE(stop, 0); + + err = bpf_tc_hook_create(&hook); + if (!ASSERT_OK(err, "attach qdisc")) + return; + + lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0); + if (!ASSERT_NEQ(lfd, -1, "socket")) { + bpf_tc_hook_destroy(&hook); + return; + } + + fd = socket(AF_INET6, SOCK_STREAM, 0); + if (!ASSERT_NEQ(fd, -1, "socket")) { + bpf_tc_hook_destroy(&hook); + close(lfd); + return; + } + + if (settimeo(lfd, 0) || settimeo(fd, 0)) + goto done; + + err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen); + if (!ASSERT_NEQ(err, -1, "getsockname")) + goto done; + + /* connect to server */ + err = connect(fd, (struct sockaddr *)&sa6, addrlen); + if (!ASSERT_NEQ(err, -1, "connect")) + goto done; + + err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd); + if (!ASSERT_OK(err, "pthread_create")) + goto done; + + /* recv total_bytes */ + while (bytes < total_bytes &&
!READ_ONCE(stop)) { + nr_recv = recv(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_recv == -1 && errno == EINTR) + continue; + if (nr_recv == -1) + break; + bytes += nr_recv; + } + + ASSERT_EQ(bytes, total_bytes, "recv"); + + WRITE_ONCE(stop, 1); + pthread_join(srv_thread, &thread_ret); + ASSERT_OK(IS_ERR(thread_ret), "thread_ret"); + +done: + close(lfd); + close(fd); + + bpf_tc_hook_destroy(&hook); + return; +} + +static void test_fifo(void) +{ + struct bpf_qdisc_fifo *fifo_skel; + struct bpf_link *link; + + fifo_skel = bpf_qdisc_fifo__open_and_load(); + if (!ASSERT_OK_PTR(fifo_skel, "bpf_qdisc_fifo__open_and_load")) + return; + + link = bpf_map__attach_struct_ops(fifo_skel->maps.fifo); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_qdisc_fifo__destroy(fifo_skel); + return; + } + + do_test("bpf_fifo"); + + bpf_link__destroy(link); + bpf_qdisc_fifo__destroy(fifo_skel); +} + +void test_bpf_qdisc(void) +{ + if (test__start_subtest("fifo")) + test_fifo(); +} diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h new file mode 100644 index 000000000000..96ab357de28e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h @@ -0,0 +1,23 @@ +#ifndef _BPF_QDISC_COMMON_H +#define _BPF_QDISC_COMMON_H + +#define NET_XMIT_SUCCESS 0x00 +#define NET_XMIT_DROP 0x01 /* skb dropped */ +#define NET_XMIT_CN 0x02 /* congestion notification */ + +#define TC_PRIO_CONTROL 7 +#define TC_PRIO_MAX 15 + +void bpf_skb_set_dev(struct sk_buff *skb, struct Qdisc *sch) __ksym; +u32 bpf_skb_get_hash(struct sk_buff *p) __ksym; +void bpf_skb_release(struct sk_buff *p) __ksym; +void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; +void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) __ksym; +bool bpf_qdisc_find_class(struct Qdisc *sch, u32 classid) __ksym; +int bpf_qdisc_create_child(struct Qdisc *sch, u32 min, + struct netlink_ext_ack *extack) __ksym; +int bpf_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, u32 classid, + struct bpf_sk_buff_ptr *to_free_list) __ksym; +struct sk_buff *bpf_qdisc_dequeue(struct Qdisc *sch, u32 classid) __ksym; + +#endif diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c new file mode 100644 index 000000000000..433fd9c3639c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c @@ -0,0 +1,83 @@ +#include +#include "bpf_experimental.h" +#include "bpf_qdisc_common.h" + +char _license[] SEC("license") = "GPL"; + +#define private(name) SEC(".data." 
#name) __hidden __attribute__((aligned(8))) + +private(B) struct bpf_spin_lock q_fifo_lock; +private(B) struct bpf_list_head q_fifo __contains_kptr(sk_buff, bpf_list); + +unsigned int q_limit = 1000; +unsigned int q_qlen = 0; + +SEC("struct_ops/bpf_fifo_enqueue") +int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch, + struct bpf_sk_buff_ptr *to_free) +{ + if (q_qlen >= q_limit) { + bpf_qdisc_skb_drop(skb, to_free); + return NET_XMIT_DROP; + } + q_qlen++; + + bpf_spin_lock(&q_fifo_lock); + bpf_list_excl_push_back(&q_fifo, &skb->bpf_list); + bpf_spin_unlock(&q_fifo_lock); + + return NET_XMIT_SUCCESS; +} + +SEC("struct_ops/bpf_fifo_dequeue") +struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch) +{ + struct sk_buff *skb; + struct bpf_list_excl_node *node; + + bpf_spin_lock(&q_fifo_lock); + node = bpf_list_excl_pop_front(&q_fifo); + bpf_spin_unlock(&q_fifo_lock); + if (!node) + return NULL; + + skb = container_of(node, struct sk_buff, bpf_list); + bpf_skb_set_dev(skb, sch); + q_qlen--; + + return skb; +} + +static int reset_fifo(u32 index, void *ctx) +{ + struct bpf_list_excl_node *node; + struct sk_buff *skb; + + bpf_spin_lock(&q_fifo_lock); + node = bpf_list_excl_pop_front(&q_fifo); + bpf_spin_unlock(&q_fifo_lock); + + if (!node) { + return 1; + } + + skb = container_of(node, struct sk_buff, bpf_list); + bpf_skb_release(skb); + return 0; +} + +SEC("struct_ops/bpf_fifo_reset") +void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch) +{ + bpf_loop(q_qlen, reset_fifo, NULL, 0); + q_qlen = 0; +} + +SEC(".struct_ops") +struct Qdisc_ops fifo = { + .enqueue = (void *)bpf_fifo_enqueue, + .dequeue = (void *)bpf_fifo_dequeue, + .reset = (void *)bpf_fifo_reset, + .id = "bpf_fifo", +};
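With the series applied, the new subtest can be exercised through the usual BPF selftest runner, e.g.:

  $ cd tools/testing/selftests/bpf
  $ ./test_progs -t bpf_qdisc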
From patchwork Fri May 10 19:24:10 2024
From: Amery Hung
Subject: [RFC PATCH v8 18/20] selftests: Add a bpf fq qdisc to selftest
Date: Fri, 10 May 2024 19:24:10 +0000
Message-Id: <20240510192412.3297104-19-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This test implements a more sophisticated qdisc using bpf. The bpf fair-queueing (fq) qdisc gives each flow an equal chance to transmit data. It also respects the skb timestamp (skb->tstamp) when rate limiting. The implementation does not prevent hash collisions between flows, nor does it recycle flows.

The bpf fq also shares packet-drop information with a bpf clsact EDT rate limiter through bpf maps. With this information, the rate limiter can compensate for the delay caused by packet drops in the qdisc to maintain throughput.
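For illustration, the clsact side of this scheme might look roughly like the following sketch (a hypothetical program, not part of this patch). It paces packets by writing an earliest departure time into skb->tstamp and folds the per-destination compensation that the fq program below accumulates in comp_map back into its pacing clock. The map names and value layout follow the fq program; everything else (the extra next_tstamp map, the header parsing, the exact arithmetic) is an assumption:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

#define NSEC_PER_SEC 1000000000ULL

/* Key/value layout and pinning match rate_map/comp_map in the fq program
 * below, so the shaper and the qdisc operate on the same entries.
 */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, __u32);
	__type(value, __u64);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
	__uint(max_entries, 16);
} rate_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, __u32);
	__type(value, __u64);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
	__uint(max_entries, 16);
} comp_map SEC(".maps");

/* Hypothetical per-destination pacing clock, private to the shaper. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, __u32);
	__type(value, __u64);
	__uint(max_entries, 16);
} next_tstamp SEC(".maps");

SEC("tc")
int edt_shaper(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct iphdr *iph = data + sizeof(struct ethhdr);
	__u64 *rate, *comp, *next, now, delay_ns, comp_ns;
	__u32 daddr;

	if ((void *)(iph + 1) > data_end)
		return TC_ACT_OK;
	daddr = iph->daddr;

	rate = bpf_map_lookup_elem(&rate_map, &daddr);
	comp = bpf_map_lookup_elem(&comp_map, &daddr);
	next = bpf_map_lookup_elem(&next_tstamp, &daddr);
	if (!rate || !comp || !next)
		return TC_ACT_OK;

	/* Consume the delay credit the qdisc accumulated for dropped packets. */
	comp_ns = *comp;
	if (comp_ns)
		__sync_fetch_and_sub(comp, comp_ns);

	/* Serialization delay of this packet at the configured rate. */
	delay_ns = (__u64)skb->len * NSEC_PER_SEC / *rate;

	now = bpf_ktime_get_ns();
	if (*next < now)
		*next = now;
	skb->tstamp = *next;	/* earliest departure time */

	/* Advance the pacing clock, shortened by the compensation so that
	 * drops inside the qdisc do not reduce achieved throughput.
	 */
	*next += delay_ns > comp_ns ? delay_ns - comp_ns : 0;

	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";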
Signed-off-by: Amery Hung --- .../selftests/bpf/prog_tests/bpf_qdisc.c | 24 + .../selftests/bpf/progs/bpf_qdisc_fq.c | 660 ++++++++++++++++++ 2 files changed, 684 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c index 295d0216e70f..394bf5a4adae 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c @@ -4,6 +4,7 @@ #include "network_helpers.h" #include "bpf_qdisc_fifo.skel.h" +#include "bpf_qdisc_fq.skel.h" #ifndef ENOTSUPP #define ENOTSUPP 524 @@ -154,8 +155,31 @@ static void test_fifo(void) bpf_qdisc_fifo__destroy(fifo_skel); } +static void test_fq(void) +{ + struct bpf_qdisc_fq *fq_skel; + struct bpf_link *link; + + fq_skel = bpf_qdisc_fq__open_and_load(); + if (!ASSERT_OK_PTR(fq_skel, "bpf_qdisc_fq__open_and_load")) + return; + + link = bpf_map__attach_struct_ops(fq_skel->maps.fq); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_qdisc_fq__destroy(fq_skel); + return; + } + + do_test("bpf_fq"); + + bpf_link__destroy(link); + bpf_qdisc_fq__destroy(fq_skel); +} + void test_bpf_qdisc(void) { if (test__start_subtest("fifo")) test_fifo(); + if (test__start_subtest("fq")) + test_fq(); } diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c new file mode 100644 index 000000000000..5118237da9e4 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c @@ -0,0 +1,660 @@ +#include +#include +#include "bpf_experimental.h" +#include "bpf_qdisc_common.h" + +char _license[] SEC("license") = "GPL"; + +#define NSEC_PER_USEC 1000L +#define NSEC_PER_SEC 1000000000L +#define PSCHED_MTU (64 * 1024 + 14) + +#define NUM_QUEUE_LOG 10 +#define NUM_QUEUE (1 << NUM_QUEUE_LOG) +#define PRIO_QUEUE (NUM_QUEUE + 1) +#define COMP_DROP_PKT_DELAY 1 +#define THROTTLED 0xffffffffffffffff + +/* fq configuration */ +__u64 q_flow_refill_delay = 40 * 10000; //40us +__u64 q_horizon = 10ULL * NSEC_PER_SEC; +__u32 q_initial_quantum = 10 * PSCHED_MTU; +__u32 q_quantum = 2 * PSCHED_MTU; +__u32 q_orphan_mask = 1023; +__u32 q_flow_plimit = 100; +__u32 q_plimit = 10000; +__u32 q_timer_slack = 10 * NSEC_PER_USEC; +bool q_horizon_drop = true; + +bool q_compensate_tstamp; +bool q_random_drop; + +unsigned long time_next_delayed_flow = ~0ULL; +unsigned long unthrottle_latency_ns = 0ULL; +unsigned long ktime_cache = 0; +unsigned long dequeue_now; +unsigned int fq_qlen = 0; + +struct fq_flow_node { + u32 hash; + int credit; + u32 qlen; + u32 socket_hash; + u64 age; + u64 time_next_packet; + struct bpf_list_node list_node; + struct bpf_rb_node rb_node; + struct bpf_rb_root queue __contains_kptr(sk_buff, bpf_rbnode); + struct bpf_spin_lock lock; + struct bpf_refcount refcount; +}; + +struct dequeue_nonprio_ctx { + bool dequeued; + u64 expire; +}; + +struct fq_stashed_flow { + struct fq_flow_node __kptr *flow; +}; + +struct stashed_skb { + struct sk_buff __kptr *skb; +}; + +/* [NUM_QUEUE] for TC_PRIO_CONTROL + * [0, NUM_QUEUE - 1] for other flows + */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, struct fq_stashed_flow); + __uint(max_entries, NUM_QUEUE + 1); +} fq_stashed_flows SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, __u32); + __type(value, __u64); +
__uint(pinning, LIBBPF_PIN_BY_NAME); + __uint(max_entries, 16); +} rate_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, __u32); + __type(value, __u64); + __uint(pinning, LIBBPF_PIN_BY_NAME); + __uint(max_entries, 16); +} comp_map SEC(".maps"); + +#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) + +private(A) struct bpf_spin_lock fq_delayed_lock; +private(A) struct bpf_rb_root fq_delayed __contains(fq_flow_node, rb_node); + +private(B) struct bpf_spin_lock fq_new_flows_lock; +private(B) struct bpf_list_head fq_new_flows __contains(fq_flow_node, list_node); + +private(C) struct bpf_spin_lock fq_old_flows_lock; +private(C) struct bpf_list_head fq_old_flows __contains(fq_flow_node, list_node); + +private(D) struct bpf_spin_lock fq_stashed_skb_lock; +private(D) struct bpf_list_head fq_stashed_skb __contains_kptr(sk_buff, bpf_list); + +static __always_inline bool bpf_kptr_xchg_back(void *map_val, void *ptr) +{ + void *ret; + + ret = bpf_kptr_xchg(map_val, ptr); + if (ret) { //unexpected + bpf_obj_drop(ret); + return false; + } + return true; +} + +static __always_inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb) +{ + return (struct qdisc_skb_cb *)skb->cb; +} + +static __always_inline int hash64(u64 val, int bits) +{ + return val * 0x61C8864680B583EBull >> (64 - bits); +} + +static bool skb_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct sk_buff *skb_a; + struct sk_buff *skb_b; + + skb_a = container_of(a, struct sk_buff, bpf_rbnode); + skb_b = container_of(b, struct sk_buff, bpf_rbnode); + + return skb_a->tstamp < skb_b->tstamp; +} + +static bool fn_time_next_packet_less(struct bpf_rb_node *a, const struct bpf_rb_node *b) +{ + struct fq_flow_node *flow_a; + struct fq_flow_node *flow_b; + + flow_a = container_of(a, struct fq_flow_node, rb_node); + flow_b = container_of(b, struct fq_flow_node, rb_node); + + return flow_a->time_next_packet < flow_b->time_next_packet; +} + +static __always_inline void +fq_flows_add_head(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow) +{ + bpf_spin_lock(lock); + bpf_list_push_front(head, &flow->list_node); + bpf_spin_unlock(lock); +} + +static __always_inline void +fq_flows_add_tail(struct bpf_list_head *head, struct bpf_spin_lock *lock, + struct fq_flow_node *flow) +{ + bpf_spin_lock(lock); + bpf_list_push_back(head, &flow->list_node); + bpf_spin_unlock(lock); +} + +static __always_inline bool +fq_flows_is_empty(struct bpf_list_head *head, struct bpf_spin_lock *lock) +{ + struct bpf_list_node *node; + + bpf_spin_lock(lock); + node = bpf_list_pop_front(head); + if (node) { + bpf_list_push_front(head, node); + bpf_spin_unlock(lock); + return false; + } + bpf_spin_unlock(lock); + + return true; +} + +static __always_inline void fq_flow_set_detached(struct fq_flow_node *flow) +{ + flow->age = bpf_jiffies64(); + bpf_obj_drop(flow); +} + +static __always_inline bool fq_flow_is_detached(struct fq_flow_node *flow) +{ + return flow->age != 0 && flow->age != THROTTLED; +} + +static __always_inline bool fq_flow_is_throttled(struct fq_flow_node *flow) +{ + return flow->age == THROTTLED; +} + +static __always_inline bool sk_listener(struct sock *sk) +{ + return (1 << sk->__sk_common.skc_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV); +} + +static __always_inline int +fq_classify(struct sk_buff *skb, u32 *hash, struct fq_stashed_flow **sflow, + bool *connected, u32 *sk_hash) +{ + struct fq_flow_node *flow; + struct sock *sk = skb->sk; +
*connected = false; + + if ((skb->priority & TC_PRIO_MAX) == TC_PRIO_CONTROL) { + *hash = PRIO_QUEUE; + } else { + if (!sk || sk_listener(sk)) { + *sk_hash = bpf_skb_get_hash(skb) & q_orphan_mask; + *sk_hash = (*sk_hash << 1 | 1); + } else if (sk->__sk_common.skc_state == TCP_CLOSE) { + *sk_hash = bpf_skb_get_hash(skb) & q_orphan_mask; + *sk_hash = (*sk_hash << 1 | 1); + } else { + *sk_hash = sk->__sk_common.skc_hash; + *connected = true; + } + *hash = hash64(*sk_hash, NUM_QUEUE_LOG); + } + + *sflow = bpf_map_lookup_elem(&fq_stashed_flows, hash); + if (!*sflow) + return -1; //unexpected + + if ((*sflow)->flow) + return 0; + + flow = bpf_obj_new(typeof(*flow)); + if (!flow) + return -1; + + flow->hash = *hash; + flow->credit = q_initial_quantum; + flow->qlen = 0; + flow->age = 1UL; + flow->time_next_packet = 0; + + bpf_kptr_xchg_back(&(*sflow)->flow, flow); + + return 0; +} + +static __always_inline bool fq_packet_beyond_horizon(struct sk_buff *skb) +{ + return (s64)skb->tstamp > (s64)(ktime_cache + q_horizon); +} + +SEC("struct_ops/bpf_fq_enqueue") +int BPF_PROG(bpf_fq_enqueue, struct sk_buff *skb, struct Qdisc *sch, + struct bpf_sk_buff_ptr *to_free) +{ + struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); + u64 time_to_send, jiffies, delay_ns, *comp_ns, *rate; + struct fq_flow_node *flow = NULL, *flow_copy; + struct fq_stashed_flow *sflow; + u32 hash, daddr, sk_hash; + bool connected; + + if (q_random_drop & (bpf_get_prandom_u32() > ~0U * 0.90)) + goto drop; + + if (fq_qlen >= q_plimit) + goto drop; + + if (!skb->tstamp) { + time_to_send = ktime_cache = bpf_ktime_get_ns(); + } else { + if (fq_packet_beyond_horizon(skb)) { + ktime_cache = bpf_ktime_get_ns(); + if (fq_packet_beyond_horizon(skb)) { + if (q_horizon_drop) + goto drop; + + skb->tstamp = ktime_cache + q_horizon; + } + } + time_to_send = skb->tstamp; + } + + if (fq_classify(skb, &hash, &sflow, &connected, &sk_hash) < 0) + goto drop; + + flow = bpf_kptr_xchg(&sflow->flow, flow); + if (!flow) + goto drop; //unexpected + + if (hash != PRIO_QUEUE) { + if (connected && flow->socket_hash != sk_hash) { + flow->credit = q_initial_quantum; + flow->socket_hash = sk_hash; + if (fq_flow_is_throttled(flow)) { + /* mark the flow as undetached. The reference to the + * throttled flow in fq_delayed will be removed later. 
+ */ + flow_copy = bpf_refcount_acquire(flow); + flow_copy->age = 0; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow_copy); + } + flow->time_next_packet = 0ULL; + } + + if (flow->qlen >= q_flow_plimit) { + bpf_kptr_xchg_back(&sflow->flow, flow); + goto drop; + } + + if (fq_flow_is_detached(flow)) { + if (connected) + flow->socket_hash = sk_hash; + + flow_copy = bpf_refcount_acquire(flow); + + jiffies = bpf_jiffies64(); + if ((s64)(jiffies - (flow_copy->age + q_flow_refill_delay)) > 0) { + if (flow_copy->credit < q_quantum) + flow_copy->credit = q_quantum; + } + flow_copy->age = 0; + fq_flows_add_tail(&fq_new_flows, &fq_new_flows_lock, flow_copy); + } + } + + skb->tstamp = time_to_send; + + bpf_spin_lock(&flow->lock); + bpf_rbtree_excl_add(&flow->queue, &skb->bpf_rbnode, skb_tstamp_less); + bpf_spin_unlock(&flow->lock); + + flow->qlen++; + bpf_kptr_xchg_back(&sflow->flow, flow); + + fq_qlen++; + return NET_XMIT_SUCCESS; + +drop: + if (q_compensate_tstamp) { + bpf_probe_read_kernel(&daddr, sizeof(daddr), &iph->daddr); + rate = bpf_map_lookup_elem(&rate_map, &daddr); + comp_ns = bpf_map_lookup_elem(&comp_map, &daddr); + if (rate && comp_ns) { + delay_ns = (u64)qdisc_skb_cb(skb)->pkt_len * NSEC_PER_SEC / (*rate); + __sync_fetch_and_add(comp_ns, delay_ns); + } + } + bpf_qdisc_skb_drop(skb, to_free); + return NET_XMIT_DROP; +} + +static int fq_unset_throttled_flows(u32 index, bool *unset_all) +{ + struct bpf_rb_node *node = NULL; + struct fq_flow_node *flow; + + bpf_spin_lock(&fq_delayed_lock); + + node = bpf_rbtree_first(&fq_delayed); + if (!node) { + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + flow = container_of(node, struct fq_flow_node, rb_node); + if (!*unset_all && flow->time_next_packet > dequeue_now) { + time_next_delayed_flow = flow->time_next_packet; + bpf_spin_unlock(&fq_delayed_lock); + return 1; + } + + node = bpf_rbtree_remove(&fq_delayed, &flow->rb_node); + + bpf_spin_unlock(&fq_delayed_lock); + + if (!node) + return 1; //unexpected + + flow = container_of(node, struct fq_flow_node, rb_node); + + /* the flow was recycled during enqueue() */ + if (flow->age != THROTTLED) { + bpf_obj_drop(flow); + return 0; + } + + flow->age = 0; + fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow); + + return 0; +} + +static __always_inline void fq_flow_set_throttled(struct fq_flow_node *flow) +{ + flow->age = THROTTLED; + + if (time_next_delayed_flow > flow->time_next_packet) + time_next_delayed_flow = flow->time_next_packet; + + bpf_spin_lock(&fq_delayed_lock); + bpf_rbtree_add(&fq_delayed, &flow->rb_node, fn_time_next_packet_less); + bpf_spin_unlock(&fq_delayed_lock); +} + +static __always_inline void fq_check_throttled(void) +{ + bool unset_all = false; + unsigned long sample; + + if (time_next_delayed_flow > dequeue_now) + return; + + sample = (unsigned long)(dequeue_now - time_next_delayed_flow); + unthrottle_latency_ns -= unthrottle_latency_ns >> 3; + unthrottle_latency_ns += sample >> 3; + + time_next_delayed_flow = ~0ULL; + bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &unset_all, 0); +} + +static __always_inline void stash_skb(struct sk_buff *skb) +{ + bpf_spin_lock(&fq_stashed_skb_lock); + bpf_list_excl_push_back(&fq_stashed_skb, &skb->bpf_list); + bpf_spin_unlock(&fq_stashed_skb_lock); +} + +static __always_inline struct sk_buff *get_stashed_skb() +{ + struct bpf_list_excl_node *node; + struct sk_buff *skb; + + bpf_spin_lock(&fq_stashed_skb_lock); + node = bpf_list_excl_pop_front(&fq_stashed_skb); + bpf_spin_unlock(&fq_stashed_skb_lock); + if (!node) + 
+		return NULL;
+
+	skb = container_of(node, struct sk_buff, bpf_list);
+	return skb;
+}
+
+static int
+fq_dequeue_nonprio_flows(u32 index, struct dequeue_nonprio_ctx *ctx)
+{
+	u64 time_next_packet, time_to_send;
+	struct bpf_rb_excl_node *rb_node;
+	struct sk_buff *skb = NULL;
+	struct bpf_list_head *head;
+	struct bpf_list_node *node;
+	struct bpf_spin_lock *lock;
+	struct fq_flow_node *flow;
+	bool is_empty;
+
+	head = &fq_new_flows;
+	lock = &fq_new_flows_lock;
+	bpf_spin_lock(&fq_new_flows_lock);
+	node = bpf_list_pop_front(&fq_new_flows);
+	bpf_spin_unlock(&fq_new_flows_lock);
+	if (!node) {
+		head = &fq_old_flows;
+		lock = &fq_old_flows_lock;
+		bpf_spin_lock(&fq_old_flows_lock);
+		node = bpf_list_pop_front(&fq_old_flows);
+		bpf_spin_unlock(&fq_old_flows_lock);
+		if (!node) {
+			if (time_next_delayed_flow != ~0ULL)
+				ctx->expire = time_next_delayed_flow;
+			return 1;
+		}
+	}
+
+	flow = container_of(node, struct fq_flow_node, list_node);
+	if (flow->credit <= 0) {
+		flow->credit += q_quantum;
+		fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow);
+		return 0;
+	}
+
+	bpf_spin_lock(&flow->lock);
+	rb_node = bpf_rbtree_excl_first(&flow->queue);
+	if (!rb_node) {
+		bpf_spin_unlock(&flow->lock);
+		is_empty = fq_flows_is_empty(&fq_old_flows, &fq_old_flows_lock);
+		if (head == &fq_new_flows && !is_empty)
+			fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow);
+		else
+			fq_flow_set_detached(flow);
+
+		return 0;
+	}
+
+	skb = container_of(rb_node, struct sk_buff, bpf_rbnode);
+	time_to_send = skb->tstamp;
+
+	time_next_packet = (time_to_send > flow->time_next_packet) ?
+		time_to_send : flow->time_next_packet;
+	if (dequeue_now < time_next_packet) {
+		bpf_spin_unlock(&flow->lock);
+		flow->time_next_packet = time_next_packet;
+		fq_flow_set_throttled(flow);
+		return 0;
+	}
+
+	rb_node = bpf_rbtree_excl_remove(&flow->queue, rb_node);
+	bpf_spin_unlock(&flow->lock);
+
+	if (!rb_node) {
+		fq_flows_add_tail(head, lock, flow);
+		return 0; /* unexpected */
+	}
+
+	skb = container_of(rb_node, struct sk_buff, bpf_rbnode);
+
+	flow->credit -= qdisc_skb_cb(skb)->pkt_len;
+	flow->qlen--;
+	fq_qlen--;
+
+	ctx->dequeued = true;
+	stash_skb(skb);
+
+	fq_flows_add_head(head, lock, flow);
+
+	return 1;
+}
+
+static __always_inline struct sk_buff *fq_dequeue_prio(void)
+{
+	struct fq_flow_node *flow = NULL;
+	struct fq_stashed_flow *sflow;
+	struct sk_buff *skb = NULL;
+	struct bpf_rb_excl_node *node;
+	u32 hash = NUM_QUEUE;
+
+	sflow = bpf_map_lookup_elem(&fq_stashed_flows, &hash);
+	if (!sflow)
+		return NULL; /* unexpected */
+
+	flow = bpf_kptr_xchg(&sflow->flow, flow);
+	if (!flow)
+		return NULL;
+
+	bpf_spin_lock(&flow->lock);
+	node = bpf_rbtree_excl_first(&flow->queue);
+	if (!node) {
+		bpf_spin_unlock(&flow->lock);
+		goto xchg_flow_back;
+	}
+
+	skb = container_of(node, struct sk_buff, bpf_rbnode);
+	node = bpf_rbtree_excl_remove(&flow->queue, &skb->bpf_rbnode);
+	bpf_spin_unlock(&flow->lock);
+
+	if (!node) {
+		skb = NULL;
+		goto xchg_flow_back;
+	}
+
+	skb = container_of(node, struct sk_buff, bpf_rbnode);
+	fq_qlen--;
+
+xchg_flow_back:
+	bpf_kptr_xchg_back(&sflow->flow, flow);
+
+	return skb;
+}
+
+SEC("struct_ops/bpf_fq_dequeue")
+struct sk_buff *BPF_PROG(bpf_fq_dequeue, struct Qdisc *sch)
+{
+	struct dequeue_nonprio_ctx cb_ctx = {};
+	struct sk_buff *skb = NULL;
+
+	skb = fq_dequeue_prio();
+	if (skb) {
+		bpf_skb_set_dev(skb, sch);
+		return skb;
+	}
+
+	ktime_cache = dequeue_now = bpf_ktime_get_ns();
+	fq_check_throttled();
+	bpf_loop(q_plimit, fq_dequeue_nonprio_flows, &cb_ctx, 0);
+
+	skb = get_stashed_skb();
+
+	if (skb) {
+		bpf_skb_set_dev(skb, sch);
+		return skb;
+	}
+
+	if (cb_ctx.expire)
+		bpf_qdisc_watchdog_schedule(sch, cb_ctx.expire, q_timer_slack);
+
+	return NULL;
+}
+
+static int
+fq_reset_flows(u32 index, void *ctx)
+{
+	struct bpf_list_node *node;
+	struct fq_flow_node *flow;
+
+	bpf_spin_lock(&fq_new_flows_lock);
+	node = bpf_list_pop_front(&fq_new_flows);
+	bpf_spin_unlock(&fq_new_flows_lock);
+	if (!node) {
+		bpf_spin_lock(&fq_old_flows_lock);
+		node = bpf_list_pop_front(&fq_old_flows);
+		bpf_spin_unlock(&fq_old_flows_lock);
+		if (!node)
+			return 1;
+	}
+
+	flow = container_of(node, struct fq_flow_node, list_node);
+	bpf_obj_drop(flow);
+
+	return 0;
+}
+
+static int
+fq_reset_stashed_flows(u32 index, void *ctx)
+{
+	struct fq_flow_node *flow = NULL;
+	struct fq_stashed_flow *sflow;
+
+	sflow = bpf_map_lookup_elem(&fq_stashed_flows, &index);
+	if (!sflow)
+		return 0;
+
+	flow = bpf_kptr_xchg(&sflow->flow, flow);
+	if (flow)
+		bpf_obj_drop(flow);
+
+	return 0;
+}
+
+SEC("struct_ops/bpf_fq_reset")
+void BPF_PROG(bpf_fq_reset, struct Qdisc *sch)
+{
+	bool unset_all = true;
+
+	fq_qlen = 0;
+	bpf_loop(NUM_QUEUE + 1, fq_reset_stashed_flows, NULL, 0);
+	bpf_loop(NUM_QUEUE, fq_reset_flows, NULL, 0);
+	bpf_loop(NUM_QUEUE, fq_unset_throttled_flows, &unset_all, 0);
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops fq = {
+	.enqueue = (void *)bpf_fq_enqueue,
+	.dequeue = (void *)bpf_fq_dequeue,
+	.reset = (void *)bpf_fq_reset,
+	.id = "bpf_fq",
+};

From patchwork Fri May 10 19:24:11 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661894
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 19/20] selftests: Add a bpf netem qdisc to selftest
Date: Fri, 10 May 2024 19:24:11 +0000
Message-Id: <20240510192412.3297104-20-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This test implements a simple network emulator qdisc that simulates packet loss and delay. The qdisc uses the Gilbert-Elliott model to decide when to drop packets. When used under the mq qdisc, the bpf netem qdiscs on different tx queues maintain a global state machine through a shared bpf map.
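For readers unfamiliar with the model: Gilbert-Elliott is a two-state Markov chain. In the program below, a1 is the good-to-bad transition probability, a2 the bad-to-good transition probability, a4 the loss probability while in the good state, and a3 one minus the loss probability in the bad state (a packet is lost when the second random draw exceeds a3). A minimal sketch of how user space might seed the shared state map before attaching the qdisc follows. The map and struct names come from the patch below; mirroring struct clg_state and the GOOD_STATE constant in user space, and scaling probabilities to the full u32 range, are assumptions of this sketch, not part of the patch:

	/* Hypothetical loader helpers, not part of the patch.
	 * bpf_get_prandom_u32() is compared directly against a1..a4 in
	 * loss_gilb_ell(), so probabilities in [0, 1] are scaled to u32.
	 */
	static __u32 prob_to_u32(double p)
	{
		return (__u32)(p * 0xffffffffu);
	}

	static int seed_gilbert_elliott(struct bpf_qdisc_netem *skel)
	{
		struct clg_state state = {
			.state = GOOD_STATE,		/* assumed initial state */
			.a1 = prob_to_u32(0.05),	/* P(good -> bad) */
			.a2 = prob_to_u32(0.95),	/* P(bad -> good) */
			.a3 = prob_to_u32(0.70),	/* 1 - P(loss | bad) */
			.a4 = prob_to_u32(0.01),	/* P(loss | good) */
		};
		__u32 key = 0;

		return bpf_map__update_elem(skel->maps.g_clg_state, &key,
					    sizeof(key), &state, sizeof(state),
					    BPF_ANY);
	}

With all qdiscs sharing one array map entry, every tx queue observes the same good/bad state, which is what makes the mq scenario in this patch behave like a single emulated link.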
Signed-off-by: Amery Hung
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c   |  30 +++
 .../selftests/bpf/progs/bpf_qdisc_netem.c  | 236 ++++++++++++++++++
 2 files changed, 266 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index 394bf5a4adae..ec9c0d166e89 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -6,6 +6,13 @@
 #include "bpf_qdisc_fifo.skel.h"
 #include "bpf_qdisc_fq.skel.h"
 
+struct crndstate {
+	u32 last;
+	u32 rho;
+};
+
+#include "bpf_qdisc_netem.skel.h"
+
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
 #endif
@@ -176,10 +183,33 @@ static void test_fq(void)
 	bpf_qdisc_fq__destroy(fq_skel);
 }
 
+static void test_netem(void)
+{
+	struct bpf_qdisc_netem *netem_skel;
+	struct bpf_link *link;
+
+	netem_skel = bpf_qdisc_netem__open_and_load();
+	if (!ASSERT_OK_PTR(netem_skel, "bpf_qdisc_netem__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(netem_skel->maps.netem);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		bpf_qdisc_netem__destroy(netem_skel);
+		return;
+	}
+
+	do_test("bpf_netem");
+
+	bpf_link__destroy(link);
+	bpf_qdisc_netem__destroy(netem_skel);
+}
+
 void test_bpf_qdisc(void)
 {
 	if (test__start_subtest("fifo"))
 		test_fifo();
 	if (test__start_subtest("fq"))
 		test_fq();
+	if (test__start_subtest("netem"))
+		test_netem();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c
new file mode 100644
index 000000000000..c1df73cdbd3e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c
@@ -0,0 +1,236 @@
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock t_root_lock;
+private(A) struct bpf_rb_root t_root __contains_kptr(sk_buff, bpf_rbnode);
+
+int q_loss_model = CLG_GILB_ELL;
+unsigned int q_limit = 1000;
+signed long q_latency = 0;
+signed long q_jitter = 0;
+unsigned int q_loss = 1;
+unsigned int q_qlen = 0;
+
+struct crndstate q_loss_cor = {.last = 0, .rho = 0,};
+struct crndstate q_delay_cor = {.last = 0, .rho = 0,};
+
+struct clg_state {
+	u64 state;
+	u32 a1;
+	u32 a2;
+	u32 a3;
+	u32 a4;
+	u32 a5;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, __u32);
+	__type(value, struct clg_state);
+	__uint(max_entries, 1);
+} g_clg_state SEC(".maps");
+
+static bool skb_tstamp_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct sk_buff *skb_a;
+	struct sk_buff *skb_b;
+
+	skb_a = container_of(a, struct sk_buff, bpf_rbnode);
+	skb_b = container_of(b, struct sk_buff, bpf_rbnode);
+
+	return skb_a->tstamp < skb_b->tstamp;
+}
+
+static __always_inline u32 get_crandom(struct crndstate *state)
+{
+	u64 value, rho;
+	unsigned long answer;
+
+	if (!state || state->rho == 0) /* no correlation */
+		return bpf_get_prandom_u32();
+
+	value = bpf_get_prandom_u32();
+	rho = (u64)state->rho + 1;
+	answer = (value * ((1ull << 32) - rho) + state->last * rho) >> 32;
+	state->last = answer;
+	return answer;
+}
+
+static __always_inline s64 tabledist(s64 mu, s32 sigma, struct crndstate *state)
+{
+	u32 rnd;
+
+	if (sigma == 0)
+		return mu;
+
+	rnd = get_crandom(state);
+
+	/* default uniform distribution */
+	return ((rnd % (2 * (u32)sigma)) + mu) - sigma;
+}
+
+static __always_inline bool loss_gilb_ell(void)
+{
+	struct clg_state *clg;
+	u32 r1, r2, key = 0;
+	bool ret = false;
+
+	clg = bpf_map_lookup_elem(&g_clg_state, &key);
+	if (!clg)
+		return false;
+
+	r1 = bpf_get_prandom_u32();
+	r2 = bpf_get_prandom_u32();
+
+	switch (clg->state) {
+	case GOOD_STATE:
+		if (r1 < clg->a1)
+			__sync_val_compare_and_swap(&clg->state,
+						    GOOD_STATE, BAD_STATE);
+		if (r2 < clg->a4)
+			ret = true;
+		break;
+	case BAD_STATE:
+		if (r1 < clg->a2)
+			__sync_val_compare_and_swap(&clg->state,
+						    BAD_STATE, GOOD_STATE);
+		if (r2 > clg->a3)
+			ret = true;
+	}
+
+	return ret;
+}
+
+static __always_inline bool loss_event(void)
+{
+	switch (q_loss_model) {
+	case CLG_RANDOM:
+		return q_loss && q_loss >= get_crandom(&q_loss_cor);
+	case CLG_GILB_ELL:
+		return loss_gilb_ell();
+	}
+
+	return false;
+}
+
+SEC("struct_ops/bpf_netem_enqueue")
+int BPF_PROG(bpf_netem_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	int count = 1;
+	s64 delay = 0;
+	u64 now;
+
+	if (loss_event())
+		--count;
+
+	if (count == 0) {
+		bpf_qdisc_skb_drop(skb, to_free);
+		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	}
+
+	/* Check the limit before counting the packet so that a dropped
+	 * packet does not inflate q_qlen permanently.
+	 */
+	if (q_qlen >= q_limit) {
+		bpf_qdisc_skb_drop(skb, to_free);
+		return NET_XMIT_DROP;
+	}
+	q_qlen++;
+
+	delay = tabledist(q_latency, q_jitter, &q_delay_cor);
+	now = bpf_ktime_get_ns();
+	skb->tstamp = now + delay;
+
+	bpf_spin_lock(&t_root_lock);
+	bpf_rbtree_excl_add(&t_root, &skb->bpf_rbnode, skb_tstamp_less);
+	bpf_spin_unlock(&t_root_lock);
+
+	return NET_XMIT_SUCCESS;
+}
+
+SEC("struct_ops/bpf_netem_dequeue")
+struct sk_buff *BPF_PROG(bpf_netem_dequeue, struct Qdisc *sch)
+{
+	struct bpf_rb_excl_node *node;
+	struct sk_buff *skb;
+	u64 now, tstamp;
+
+	now = bpf_ktime_get_ns();
+
+	bpf_spin_lock(&t_root_lock);
+	node = bpf_rbtree_excl_first(&t_root);
+	if (!node) {
+		bpf_spin_unlock(&t_root_lock);
+		return NULL;
+	}
+
+	skb = container_of(node, struct sk_buff, bpf_rbnode);
+	tstamp = skb->tstamp;
+	if (tstamp <= now) {
+		node = bpf_rbtree_excl_remove(&t_root, node);
+		bpf_spin_unlock(&t_root_lock);
+
+		if (!node)
+			return NULL;
+
+		skb = container_of(node, struct sk_buff, bpf_rbnode);
+		bpf_skb_set_dev(skb, sch);
+		q_qlen--;
+		return skb;
+	}
+
+	bpf_spin_unlock(&t_root_lock);
+	bpf_qdisc_watchdog_schedule(sch, tstamp, 0);
+	return NULL;
+}
+
+SEC("struct_ops/bpf_netem_init")
+int BPF_PROG(bpf_netem_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static int reset_queue(u32 index, void *ctx)
+{
+	struct bpf_rb_excl_node *node;
+	struct sk_buff *skb;
+
+	bpf_spin_lock(&t_root_lock);
+	node = bpf_rbtree_excl_first(&t_root);
+	if (!node) {
+		bpf_spin_unlock(&t_root_lock);
+		return 1;
+	}
+
+	skb = container_of(node, struct sk_buff, bpf_rbnode);
+	node = bpf_rbtree_excl_remove(&t_root, node);
+	bpf_spin_unlock(&t_root_lock);
+
+	if (!node)
+		return 1;
+
+	skb = container_of(node, struct sk_buff, bpf_rbnode);
+	bpf_skb_release(skb);
+	return 0;
+}
+
+SEC("struct_ops/bpf_netem_reset")
+void BPF_PROG(bpf_netem_reset, struct Qdisc *sch)
+{
+	bpf_loop(q_limit, reset_queue, NULL, 0);
+	q_qlen = 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops netem = {
+	.enqueue = (void *)bpf_netem_enqueue,
+	.dequeue = (void *)bpf_netem_dequeue,
+	.init = (void *)bpf_netem_init,
+	.reset = (void *)bpf_netem_reset,
+	.id = "bpf_netem",
+};

From patchwork Fri May 10 19:24:12 2024
X-Patchwork-Submitter: Amery Hung
X-Patchwork-Id: 13661893
X-Patchwork-Delegate: bpf@iogearbox.net
From: Amery Hung
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com, xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com, ameryhung@gmail.com
Subject: [RFC PATCH v8 20/20] selftests: Add a prio bpf qdisc
Date: Fri, 10 May 2024 19:24:12 +0000
Message-Id: <20240510192412.3297104-21-amery.hung@bytedance.com>
In-Reply-To: <20240510192412.3297104-1-amery.hung@bytedance.com>
References: <20240510192412.3297104-1-amery.hung@bytedance.com>
X-Patchwork-State: RFC

This test implements a classful qdisc in bpf. Like its native counterpart, the bpf prio qdisc has 16 bands, and an skb is classified into a band based on its priority. During dequeue, the band with the lowest priority value is tried first. The bpf prio qdisc populates its classes with pfifo qdiscs during initialization, and the test later replaces them with fq qdiscs. A direct queue backed by a bpf list is provided so that traffic keeps flowing even if the qdiscs in all bands are removed.
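To make the classification concrete, here is a small worked sketch of the classid arithmetic performed in bpf_prio_enqueue() below. The specific handle value is an assumption chosen to match the "800:" handle used by the test's tc commands:

	/* Hypothetical worked example of the band selection in
	 * bpf_prio_enqueue(). TC_PRIO_MAX is 15, so the low bits of
	 * skb->priority select one of 16 bands under the qdisc handle.
	 */
	__u32 handle = 0x08000000;		/* "800:" in tc notation (assumed) */
	__u32 prio = 5;				/* skb->priority */
	__u32 classid = handle | (prio & 15);	/* 0x08000005, i.e. "800:5" */

If no class exists for the computed classid, the skb falls back to the direct queue, which is what keeps traffic flowing when the per-band qdiscs are removed.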
Signed-off-by: Amery Hung
---
 .../selftests/bpf/prog_tests/bpf_qdisc.c   |  52 +++++++-
 .../selftests/bpf/progs/bpf_qdisc_prio.c   | 112 ++++++++++++++++++
 2 files changed, 160 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_prio.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
index ec9c0d166e89..e1e80fb3c52d 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
@@ -2,9 +2,11 @@
 #include
 #include
+#include "netlink_helpers.h"
 #include "network_helpers.h"
 #include "bpf_qdisc_fifo.skel.h"
 #include "bpf_qdisc_fq.skel.h"
+#include "bpf_qdisc_prio.skel.h"
 
 struct crndstate {
 	u32 last;
@@ -65,7 +67,7 @@ static void *server(void *arg)
 	return NULL;
 }
 
-static void do_test(char *qdisc)
+static void do_test(char *qdisc, int (*setup)(void))
 {
 	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
 			    .attach_point = BPF_TC_QDISC,
@@ -87,6 +89,12 @@ static void do_test(char *qdisc)
 	if (!ASSERT_OK(err, "attach qdisc"))
 		return;
 
+	if (setup) {
+		err = setup();
+		if (!ASSERT_OK(err, "setup qdisc"))
+			return;
+	}
+
 	lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
 	if (!ASSERT_NEQ(lfd, -1, "socket")) {
 		bpf_tc_hook_destroy(&hook);
@@ -156,7 +164,7 @@ static void test_fifo(void)
 		return;
 	}
 
-	do_test("bpf_fifo");
+	do_test("bpf_fifo", NULL);
 
 	bpf_link__destroy(link);
 	bpf_qdisc_fifo__destroy(fifo_skel);
@@ -177,7 +185,7 @@ static void test_fq(void)
 		return;
 	}
 
-	do_test("bpf_fq");
+	do_test("bpf_fq", NULL);
 
 	bpf_link__destroy(link);
 	bpf_qdisc_fq__destroy(fq_skel);
@@ -198,12 +206,46 @@ static void test_netem(void)
 		return;
 	}
 
-	do_test("bpf_netem");
+	do_test("bpf_netem", NULL);
 
 	bpf_link__destroy(link);
 	bpf_qdisc_netem__destroy(netem_skel);
 }
 
+static int setup_prio_bands(void)
+{
+	char cmd[128];
+	int i;
+
+	for (i = 1; i <= 16; i++) {
+		snprintf(cmd, sizeof(cmd), "tc qdisc add dev lo parent 800:%x handle %x0: fq", i, i);
+		if (!ASSERT_OK(system(cmd), cmd))
+			return -1;
+	}
+	return 0;
+}
+
+static void test_prio_qdisc(void)
+{
+	struct bpf_qdisc_prio *prio_skel;
+	struct bpf_link *link;
+
+	prio_skel = bpf_qdisc_prio__open_and_load();
+	if (!ASSERT_OK_PTR(prio_skel, "bpf_qdisc_prio__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(prio_skel->maps.prio);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		bpf_qdisc_prio__destroy(prio_skel);
+		return;
+	}
+
+	do_test("bpf_prio", &setup_prio_bands);
+
+	bpf_link__destroy(link);
+	bpf_qdisc_prio__destroy(prio_skel);
+}
+
 void test_bpf_qdisc(void)
 {
 	if (test__start_subtest("fifo"))
@@ -212,4 +254,6 @@ void test_bpf_qdisc(void)
 		test_fq();
 	if (test__start_subtest("netem"))
 		test_netem();
+	if (test__start_subtest("prio"))
+		test_prio_qdisc();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_prio.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_prio.c
new file mode 100644
index 000000000000..9a7797a7ed9d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_prio.c
@@ -0,0 +1,112 @@
+#include <vmlinux.h>
+#include "bpf_experimental.h"
+#include "bpf_qdisc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(B) struct bpf_spin_lock direct_queue_lock;
+private(B) struct bpf_list_head direct_queue __contains_kptr(sk_buff, bpf_list);
+
+unsigned int q_limit = 1000;
+unsigned int q_qlen = 0;
+
+SEC("struct_ops/bpf_prio_enqueue")
+int BPF_PROG(bpf_prio_enqueue, struct sk_buff *skb, struct Qdisc *sch,
+	     struct bpf_sk_buff_ptr *to_free)
+{
+	u32 classid = sch->handle | (skb->priority & TC_PRIO_MAX);
+
+	if (bpf_qdisc_find_class(sch, classid))
+		return bpf_qdisc_enqueue(skb, sch, classid, to_free);
+
+	/* Check the limit before counting the packet so that a dropped
+	 * packet does not inflate q_qlen permanently.
+	 */
+	if (q_qlen >= q_limit) {
+		bpf_qdisc_skb_drop(skb, to_free);
+		return NET_XMIT_DROP;
+	}
+	q_qlen++;
+
+	bpf_spin_lock(&direct_queue_lock);
+	bpf_list_excl_push_back(&direct_queue, &skb->bpf_list);
+	bpf_spin_unlock(&direct_queue_lock);
+
+	return NET_XMIT_SUCCESS;
+}
+
+SEC("struct_ops/bpf_prio_dequeue")
+struct sk_buff *BPF_PROG(bpf_prio_dequeue, struct Qdisc *sch)
+{
+	struct bpf_list_excl_node *node;
+	struct sk_buff *skb;
+	u32 i, classid;
+
+	bpf_spin_lock(&direct_queue_lock);
+	node = bpf_list_excl_pop_front(&direct_queue);
+	bpf_spin_unlock(&direct_queue_lock);
+	if (!node) {
+		for (i = 0; i <= TC_PRIO_MAX; i++) {
+			classid = sch->handle | i;
+			skb = bpf_qdisc_dequeue(sch, classid);
+			if (skb)
+				return skb;
+		}
+		return NULL;
+	}
+
+	skb = container_of(node, struct sk_buff, bpf_list);
+	bpf_skb_set_dev(skb, sch);
+	q_qlen--;
+
+	return skb;
+}
+
+SEC("struct_ops/bpf_prio_init")
+int BPF_PROG(bpf_prio_init, struct Qdisc *sch, struct nlattr *opt,
+	     struct netlink_ext_ack *extack)
+{
+	int i, err;
+
+	for (i = 1; i <= TC_PRIO_MAX + 1; i++) {
+		err = bpf_qdisc_create_child(sch, i, extack);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int reset_direct_queue(u32 index, void *ctx)
+{
+	struct bpf_list_excl_node *node;
+	struct sk_buff *skb;
+
+	bpf_spin_lock(&direct_queue_lock);
+	node = bpf_list_excl_pop_front(&direct_queue);
+	bpf_spin_unlock(&direct_queue_lock);
+
+	if (!node)
+		return 1;
+
+	skb = container_of(node, struct sk_buff, bpf_list);
+	bpf_skb_release(skb);
+	return 0;
+}
+
+SEC("struct_ops/bpf_prio_reset")
+void BPF_PROG(bpf_prio_reset, struct Qdisc *sch)
+{
+	bpf_loop(q_qlen, reset_direct_queue, NULL, 0);
+	q_qlen = 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops prio = {
+	.enqueue = (void *)bpf_prio_enqueue,
+	.dequeue = (void *)bpf_prio_dequeue,
+	.init = (void *)bpf_prio_init,
+	.reset = (void *)bpf_prio_reset,
+	.id = "bpf_prio",
+};