From patchwork Thu Nov 17 04:28:39 2022
X-Patchwork-Submitter: Yonghong Song
X-Patchwork-Id: 13046113
X-Patchwork-Delegate: bpf@iogearbox.net
From: Yonghong Song
CC: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next v6 4/7] bpf: Add bpf_rcu_read_lock() verifier support
Date: Wed, 16 Nov 2022 20:28:39 -0800
Message-ID: <20221117042839.1090807-1-yhs@fb.com>
In-Reply-To: <20221117042818.1086954-1-yhs@fb.com>
References: <20221117042818.1086954-1-yhs@fb.com>
X-Mailing-List: bpf@vger.kernel.org

To simplify the design and support the common practice, no nested
bpf_rcu_read_lock() is allowed. A new bpf_type_flag, MEM_RCU, is added
to indicate that a PTR_TO_BTF_ID object access needs rcu_read_lock
protection. Note that rcu protection is not needed for non-sleepable
programs, but it is supported to make cross sleepable/non-sleepable
development easier.

For sleepable programs, the following insns can be inside the rcu lock
region:
  - any non-call insns except BPF_ABS/BPF_IND
  - non-sleepable helpers or kfuncs

The rcu pointer is invalidated at bpf_rcu_read_unlock(), so it cannot
be used outside the current rcu read lock region. Also, the
bpf_*_storage_get() helper's 5th hidden argument (the memory allocation
flag) should be GFP_ATOMIC inside an rcu read lock region.
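[Editor's illustration, not part of the patch: the locking discipline above amounts to tracking a single active_rcu_lock bit per verifier state. The standalone C sketch below models those accept/reject decisions; all names except active_rcu_lock are invented for the example.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the verifier's per-state active_rcu_lock bit. */
struct verifier_state { bool active_rcu_lock; };

enum verdict { OK, ERR_NESTED, ERR_UNMATCHED, ERR_SLEEPABLE_IN_RCU };

/* Mirror of the rules described above: no nested bpf_rcu_read_lock(),
 * no unmatched bpf_rcu_read_unlock(), and no sleepable helper or kfunc
 * inside the rcu read lock region. */
static enum verdict check_call(struct verifier_state *st,
                               bool is_lock, bool is_unlock, bool sleepable)
{
        if (st->active_rcu_lock) {
                if (is_lock)
                        return ERR_NESTED;           /* nested lock rejected */
                if (is_unlock) {
                        st->active_rcu_lock = false; /* rcu pointers invalidated */
                        return OK;
                }
                if (sleepable)
                        return ERR_SLEEPABLE_IN_RCU; /* no sleeping under rcu */
                return OK;
        }
        if (is_lock) {
                st->active_rcu_lock = true;
                return OK;
        }
        if (is_unlock)
                return ERR_UNMATCHED;                /* unlock without lock */
        return OK;
}
```

A lock/unlock pair around non-sleepable calls passes; a second lock, a sleepable call inside the region, or a stray unlock each fails, mirroring the verifier errors this patch introduces.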
If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of this
pointer, and the load which gets this pointer, needs to be protected by
bpf_rcu_read_lock(). The following shows a couple of examples:

  struct task_struct {
      ...
      struct task_struct __rcu *real_parent;
      struct css_set __rcu *cgroups;
      ...
  };
  struct css_set {
      ...
      struct cgroup *dfl_cgrp;
      ...
  };
  ...
  task = bpf_get_current_task_btf();
  cgroups = task->cgroups;
  dfl_cgroup = cgroups->dfl_cgrp;
  ... using dfl_cgroup ...

The bpf_rcu_read_lock/unlock() should be added like below to avoid
verification failures:

  task = bpf_get_current_task_btf();
  bpf_rcu_read_lock();
  cgroups = task->cgroups;
  dfl_cgroup = cgroups->dfl_cgrp;
  bpf_rcu_read_unlock();
  ... using dfl_cgroup ...

The following is another example for task->real_parent:

  task = bpf_get_current_task_btf();
  bpf_rcu_read_lock();
  real_parent = task->real_parent;
  ...
  bpf_task_storage_get(&map, real_parent, 0, 0);
  bpf_rcu_read_unlock();

There is another case observed in selftest bpf_iter_ipv6_route.c:

  struct fib6_info *rt = ctx->rt;
  ...
  fib6_nh = &rt->fib6_nh[0];  // not rcu protected
  ...
  if (rt->nh)
      fib6_nh = &nh->nh_info->fib6_nh;  // rcu protected
  ...
  ... using fib6_nh ...

Note that the use of fib6_nh is tagged with rcu on one path but not on
the other. Current verification will fail since the same insn cannot be
used with different pointer types. The above use case is a valid one,
so the verifier is changed to ignore the MEM_RCU type tag in such cases.

Signed-off-by: Yonghong Song
---
 include/linux/bpf.h          |   3 +
 include/linux/bpf_verifier.h |   4 ++
 kernel/bpf/btf.c             |  31 ++++++++-
 kernel/bpf/verifier.c        | 122 ++++++++++++++++++++++++++++++++---
 4 files changed, 149 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8c89d6020fb3..08895ccabc6d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -524,6 +524,9 @@ enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),

+	/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
+	MEM_RCU			= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1a32baa78ce2..484baeffbfb0 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -325,6 +325,7 @@ struct bpf_verifier_state {
 	u32 curframe;
 	u32 active_spin_lock;
 	bool speculative;
+	bool active_rcu_lock;

 	/* first and last insn idx of this verifier state */
 	u32 first_insn_idx;
@@ -424,6 +425,7 @@ struct bpf_insn_aux_data {
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
 	bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
 	bool zext_dst; /* this insn zero extends dst reg */
+	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	u8 alu_state; /* used in combination with alu_limit */

 	/* below fields are initialized once */
@@ -627,6 +629,8 @@ void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab);

 int mark_chain_precision(struct bpf_verifier_env *env, int regno);

+void clear_all_rcu_pointers(struct bpf_verifier_env *env);
+
 #define BPF_BASE_TYPE_MASK	GENMASK(BPF_BASE_TYPE_BITS - 1, 0)

 /* extract base type from bpf_{arg, return, reg}_type. */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index fc4df69cfbf9..687bc66fe911 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5989,6 +5989,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 			/* check __percpu tag */
 			if (strcmp(tag_value, "percpu") == 0)
 				tmp_flag = MEM_PERCPU;
+			/* check __rcu tag */
+			if (strcmp(tag_value, "rcu") == 0)
+				tmp_flag = MEM_RCU;
 		}

 		stype = btf_type_skip_modifiers(btf, mtype->type, &id);
@@ -6451,6 +6454,9 @@ static bool btf_is_kfunc_arg_mem_size(const struct btf *btf,
 	return true;
 }

+BTF_ID_LIST_SINGLE(bpf_rcu_read_lock_id, func, bpf_rcu_read_lock)
+BTF_ID_LIST_SINGLE(bpf_rcu_read_unlock_id, func, bpf_rcu_read_unlock)
+
 static int btf_check_func_arg_match(struct bpf_verifier_env *env,
@@ -6460,7 +6466,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	bool rel = false, kptr_get = false, trusted_args = false;
-	bool sleepable = false;
+	bool sleepable = false, rcu_lock = false, rcu_unlock = false;
 	struct bpf_verifier_log *log = &env->log;
 	u32 i, nargs, ref_id, ref_obj_id = 0;
 	bool is_kfunc = btf_is_kernel(btf);
@@ -6469,6 +6475,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	const struct btf_param *args;
 	int ref_regno = 0, ret;

+
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
 		/* These checks were already done by the verifier while loading
@@ -6499,6 +6506,28 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
 		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
 		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
+		rcu_lock = func_id == *bpf_rcu_read_lock_id;
+		rcu_unlock = func_id == *bpf_rcu_read_unlock_id;
 	}

+	/* checking rcu read lock/unlock */
+	if (env->cur_state->active_rcu_lock) {
+		if (rcu_lock) {
+			bpf_log(log, "nested rcu read lock (kernel function %s)\n", func_name);
+			return -EINVAL;
+		} else if (rcu_unlock) {
+			clear_all_rcu_pointers(env);
+			env->cur_state->active_rcu_lock = false;
+		} else if (sleepable) {
+			bpf_log(log, "kernel func %s is sleepable within rcu_read_lock region\n",
+				func_name);
+			return -EINVAL;
+		}
+	} else if (rcu_lock) {
+		env->cur_state->active_rcu_lock = true;
+	} else if (rcu_unlock) {
+		bpf_log(log, "unmatched rcu read unlock (kernel function %s)\n", func_name);
+		return -EINVAL;
+	}
+
 	/* check that BTF function arguments match actual types that the
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0312d9ce292f..44633a1a2565 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23,6 +23,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 #include <...>

 #include "disasm.h"
@@ -513,6 +514,14 @@ static bool is_callback_calling_function(enum bpf_func_id func_id)
 	       func_id == BPF_FUNC_user_ringbuf_drain;
 }

+static bool is_storage_get_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_sk_storage_get ||
+	       func_id == BPF_FUNC_inode_storage_get ||
+	       func_id == BPF_FUNC_task_storage_get ||
+	       func_id == BPF_FUNC_cgrp_storage_get;
+}
+
 static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
 					const struct bpf_map *map)
 {
@@ -583,6 +592,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "user_", 32);
 	if (type & MEM_PERCPU)
 		strncpy(prefix, "percpu_", 32);
+	if (type & MEM_RCU)
+		strncpy(prefix, "rcu_", 32);
 	if (type & PTR_UNTRUSTED)
 		strncpy(prefix, "untrusted_", 32);

@@ -1208,6 +1219,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->speculative = src->speculative;
 	dst_state->curframe = src->curframe;
 	dst_state->active_spin_lock = src->active_spin_lock;
+	dst_state->active_rcu_lock = src->active_rcu_lock;
 	dst_state->branches = src->branches;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
@@ -4687,6 +4699,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}

+	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
+	    !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is ptr_%s access rcu-protected memory with off=%d, not rcu protected\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	if (env->ops->btf_struct_access) {
 		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	} else {
@@ -4701,6 +4721,16 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;

+	/* The value is a rcu pointer. For a sleepable program, the load needs to be
+	 * in a rcu lock region, similar to rcu_dereference().
+	 */
+	if ((flag & MEM_RCU) && env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	/* If this is an untrusted pointer, all pointers formed by walking it
 	 * also inherit the untrusted flag.
 	 */
@@ -5807,7 +5837,12 @@
 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
+static const struct bpf_reg_types btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | MEM_RCU,
+	}
+};
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
@@ -5881,6 +5916,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	if (arg_type & PTR_MAYBE_NULL)
 		type &= ~PTR_MAYBE_NULL;

+	/* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
+	 * region, and remove MEM_RCU from the type since the arg_type won't encode
+	 * MEM_RCU.
+	 */
+	if (type & MEM_RCU) {
+		if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
+			verbose(env,
+				"R%d is arg type %s needs rcu protection\n",
+				regno, reg_type_str(env, reg->type));
+			return -EACCES;
+		}
+		type &= ~MEM_RCU;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
 		expected = compatible->types[i];
 		if (expected == NOT_INIT)
@@ -5897,7 +5946,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return -EACCES;

 found:
-	if (reg->type == PTR_TO_BTF_ID) {
+	/* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
+	if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
 		/* For bpf_sk_release, it needs to match against first member
 		 * 'struct sock_common', hence make an exception for it. This
 		 * allows bpf_sk_release to work for multiple socket types.
 		 */
@@ -5973,6 +6023,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_RCU:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0. In the other cases, fixed offset
 		 * can be non-zero.
@@ -6700,6 +6751,17 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 	}));
 }

+void clear_all_rcu_pointers(struct bpf_verifier_env *env)
+{
+	struct bpf_func_state *state;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+		if (reg->type & MEM_RCU)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 enum {
 	AT_PKT_END = -1,
 	BEYOND_PKT_END = -2,
@@ -7429,6 +7491,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}

+	if (env->cur_state->active_rcu_lock) {
+		if (bpf_lsm_sleepable_func_proto(func_id) ||
+		    bpf_tracing_sleepable_func_proto(func_id)) {
+			verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
+				func_id_name(func_id), func_id);
+			return -EINVAL;
+		}
+
+		if (env->prog->aux->sleepable && is_storage_get_function(func_id))
+			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
+	}
+
 	meta.func_id = func_id;
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -10647,6 +10721,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return -EINVAL;
 	}

+	if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
+		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
+		return -EINVAL;
+	}
+
 	if (regs[ctx_reg].type != PTR_TO_CTX) {
 		verbose(env,
 			"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
@@ -11911,6 +11990,9 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->active_spin_lock != cur->active_spin_lock)
 		return false;

+	if (old->active_rcu_lock != cur->active_rcu_lock)
+		return false;
+
 	/* for states to be equal callsites have to be the same
 	 * and all frame states need to be equivalent
 	 */
@@ -12324,6 +12406,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
 		!reg_type_mismatch_ok(prev));
 }

+static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
+{
+	return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
@@ -12449,6 +12536,17 @@ static int do_check(struct bpf_verifier_env *env)

 			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;

+			/* For code like below,
+			 *   struct foo *f;
+			 *   if (...)
+			 *           f = ...; // f with MEM_RCU type tag.
+			 *   else
+			 *           f = ...; // f without MEM_RCU type tag.
+			 *   ... f ...  // Here f could be with/without MEM_RCU
+			 *
+			 * It is safe to ignore MEM_RCU type tag here since
+			 * base types are the same.
+			 */
 			if (*prev_src_type == NOT_INIT) {
 				/* saw a valid insn
 				 * dst_reg = *(u32 *)(src_reg + off)
@@ -12456,7 +12554,7 @@ static int do_check(struct bpf_verifier_env *env)
 				 */
 				*prev_src_type = src_reg_type;

-			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
+			} else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
 				/* ABuser program is trying to use the same insn
 				 * dst_reg = *(u32*) (src_reg + off)
 				 * with different pointer types:
@@ -12595,6 +12693,11 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}

+				if (env->cur_state->active_rcu_lock) {
+					verbose(env, "bpf_rcu_read_unlock is missing\n");
+					return -EINVAL;
+				}
+
 				/* We must do check_reference_leak here before
 				 * prepare_func_exit to handle the case when
 				 * state->curframe > 0, it may be a callback
@@ -13690,6 +13793,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+		case PTR_TO_BTF_ID | MEM_RCU:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
@@ -14338,14 +14442,12 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}

-		if (insn->imm == BPF_FUNC_task_storage_get ||
-		    insn->imm == BPF_FUNC_sk_storage_get ||
-		    insn->imm == BPF_FUNC_inode_storage_get ||
-		    insn->imm == BPF_FUNC_cgrp_storage_get) {
-			if (env->prog->aux->sleepable)
-				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
-			else
+		if (is_storage_get_function(insn->imm)) {
+			if (!env->prog->aux->sleepable ||
+			    env->insn_aux_data[i + delta].storage_get_func_atomic)
 				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
+			else
+				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
 			insn_buf[1] = *insn;
 			cnt = 2;
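[Editor's illustration, not part of the patch: the do_misc_fixups() change above picks the hidden 5th argument of bpf_*_storage_get() per call site. The standalone C sketch below models that decision; the flag values and the helper name are placeholders, not the kernel's.]

```c
#include <assert.h>
#include <stdbool.h>

#define GFP_ATOMIC 0x1 /* placeholder values for illustration only */
#define GFP_KERNEL 0x2

/* Models the patched fixup logic: sleepable programs get GFP_KERNEL for
 * bpf_*_storage_get(), unless this particular call site was seen inside
 * an rcu read lock region (storage_get_func_atomic was set during
 * verification), in which case it must fall back to GFP_ATOMIC. */
static int storage_get_gfp_flag(bool prog_sleepable, bool call_in_rcu_region)
{
        if (!prog_sleepable || call_in_rcu_region)
                return GFP_ATOMIC;
        return GFP_KERNEL;
}
```

Note the design change from the removed code: the flag is now chosen per call site rather than per program, which is why the new storage_get_func_atomic bit lives in bpf_insn_aux_data.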