
[RFC,bpf-next,08/11] bpf: Add OBJ_NON_OWNING_REF type flag

Message ID 20220722183438.3319790-9-davemarchevsky@fb.com (mailing list archive)
State RFC
Delegated to: BPF
Series bpf: Introduce rbtree map

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail merge-conflict
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/apply fail Patch does not apply to bpf-next

Commit Message

Dave Marchevsky July 22, 2022, 6:34 p.m. UTC
Consider a pointer to a type that would normally need acquire / release
semantics to be safely held. There may be scenarios where such a pointer
can be safely held without the need to acquire a reference.

For example, although a PTR_TO_BTF_ID for an rbtree_map node is
released via the bpf_rbtree_add helper, the helper doesn't change the
address of the node and must be called with the rbtree_map's spinlock
held. Since the only way to remove a node from the rbtree - the
bpf_rbtree_remove helper - requires the same lock, the newly-added node
cannot be removed by a concurrently-running program until the lock is
released. Therefore it is safe to hold a reference to this node until
bpf_rbtree_unlock is called.
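
Concretely, the pattern looks roughly like the following BPF program
fragment (a sketch only: the bpf_rbtree_lock/unlock arguments and the
bpf_rbtree_alloc_node signature are simplified from earlier patches in
this series, and node_cmp is an arbitrary comparator callback):

  struct node_data *node;

  node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
  if (!node)
          return 0;
  node->key = 123;

  bpf_rbtree_lock(&rbtree); /* simplified; takes the map's lock */
  /* add releases node's 'owning' reference but returns the same
   * pointer tagged PTR_TO_BTF_ID | OBJ_NON_OWNING_REF: the node
   * can't be removed while we hold the lock, so using it is safe
   */
  node = bpf_rbtree_add(&rbtree, node, (void *)node_cmp);
  if (node)
          node->data = 42; /* ok: lock still held */
  bpf_rbtree_unlock(&rbtree); /* all non-owning refs invalidated */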

This patch introduces a new type flag and associated verifier logic to
handle such "non-owning" references.

Currently the rbtree example above is my only use case, so the
verifier logic is straightforward:
  * Tag return types of bpf_rbtree_{add,find} with OBJ_NON_OWNING_REF
    * Both require the rbtree lock to be held to return anything
    non-NULL
    * Since the ret type for both is PTR_TO_BTF_ID_OR_NULL, if the
    lock is not held and NULL is returned, the existing
    mark_ptr_or_null_reg logic will clear the reg type.
    * So if the mark_ptr_or_null_reg logic turns the returned reg into
    a PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, the verifier knows the lock
    is held.

  * When the lock is released, the verifier invalidates any regs
  holding non-owning refs, similar to the existing release_reference
  logic - but there's no need to clear ref_obj_id, as an 'owning'
  reference was never acquired. (See the sketch after the TODO below.)

[ TODO: Currently the invalidation logic in
clear_rbtree_node_non_owning_refs is not parametrized by map, so
unlocking any rbtree lock will invalidate all non-owning refs ]
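
For example, this use-after-unlock is now rejected (same sketch
conventions as the fragment above):

  bpf_rbtree_lock(&rbtree);
  node = bpf_rbtree_find(&rbtree, &key, (void *)node_cmp);
  bpf_rbtree_unlock(&rbtree); /* node's reg marked unknown here */
  if (node)
          node->data = 42; /* rejected: node is now a scalar */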

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/rbtree.c   |  4 +--
 kernel/bpf/verifier.c | 63 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 61 insertions(+), 7 deletions(-)

Comments

Alexei Starovoitov Aug. 1, 2022, 10:41 p.m. UTC | #1
On 7/22/22 11:34 AM, Dave Marchevsky wrote:
> Consider a pointer to a type that would normally need acquire / release
> semantics to be safely held. There may be scenarios where such a pointer
> can be safely held without the need to acquire a reference.
> 
> For example, although a PTR_TO_BTF_ID for an rbtree_map node is
> released via the bpf_rbtree_add helper, the helper doesn't change the
> address of the node and must be called with the rbtree_map's spinlock
> held. Since the only way to remove a node from the rbtree - the
> bpf_rbtree_remove helper - requires the same lock, the newly-added
> node cannot be removed by a concurrently-running program until the
> lock is released. Therefore it is safe to hold a reference to this
> node until bpf_rbtree_unlock is called.
> 
> This patch introduces a new type flag and associated verifier logic to
> handle such "non-owning" references.
> 
> Currently the rbtree example above is my only use case, so the
> verifier logic is straightforward:
>    * Tag return types of bpf_rbtree_{add,find} with OBJ_NON_OWNING_REF
>      * Both require the rbtree lock to be held to return anything
>      non-NULL
>      * Since the ret type for both is PTR_TO_BTF_ID_OR_NULL, if the
>      lock is not held and NULL is returned, the existing
>      mark_ptr_or_null_reg logic will clear the reg type.
>      * So if the mark_ptr_or_null_reg logic turns the returned reg
>      into a PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, the verifier knows
>      the lock is held.
> 
>    * When the lock is released, the verifier invalidates any regs
>    holding non-owning refs, similar to the existing release_reference
>    logic - but there's no need to clear ref_obj_id, as an 'owning'
>    reference was never acquired.
> 
> [ TODO: Currently the invalidation logic in
> clear_rbtree_node_non_owning_refs is not parametrized by map, so
> unlocking any rbtree lock will invalidate all non-owning refs ]

probably should be parametrized by 'lock' instead of by 'map'.
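
(A hypothetical sketch of that parametrization - not part of this
patch - might record which lock a non-owning ref was created under and
clear only the matching refs on unlock:)

  /* Hypothetical non_own_lock_id field on bpf_reg_state, recorded
   * when the OBJ_NON_OWNING_REF reg is created.
   */
  static void clear_non_owning_ref_regs(struct bpf_verifier_env *env,
                                        struct bpf_func_state *state,
                                        u32 released_lock_id)
  {
          struct bpf_reg_state *regs = state->regs;
          int i;

          for (i = 0; i < MAX_BPF_REG; i++)
                  if (type_is_non_owning_ref(regs[i].type) &&
                      regs[i].non_own_lock_id == released_lock_id)
                          mark_reg_unknown(env, regs, i);
  }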

> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   include/linux/bpf.h   |  1 +
>   kernel/bpf/rbtree.c   |  4 +--
>   kernel/bpf/verifier.c | 63 +++++++++++++++++++++++++++++++++++++++----
>   3 files changed, 61 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index eb8c550db0e2..c9c4b4fb019c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -412,6 +412,7 @@ enum bpf_type_flag {
>   	/* Size is known at compile time. */
>   	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>   
> +	OBJ_NON_OWNING_REF	= BIT(11 + BPF_BASE_TYPE_BITS),
>   	__BPF_TYPE_FLAG_MAX,
>   	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>   };
> diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
> index 5b1ab73e164f..34864fc83209 100644
> --- a/kernel/bpf/rbtree.c
> +++ b/kernel/bpf/rbtree.c
> @@ -111,7 +111,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
>   const struct bpf_func_proto bpf_rbtree_add_proto = {
>   	.func = bpf_rbtree_add,
>   	.gpl_only = true,
> -	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
> +	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
>   	.arg1_type = ARG_CONST_MAP_PTR,
>   	.arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
>   	.arg2_btf_id = &bpf_rbtree_btf_ids[0],
> @@ -133,7 +133,7 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb)
>   const struct bpf_func_proto bpf_rbtree_find_proto = {
>   	.func = bpf_rbtree_find,
>   	.gpl_only = true,
> -	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
> +	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
>   	.ret_btf_id = &bpf_rbtree_btf_ids[0],
>   	.arg1_type = ARG_CONST_MAP_PTR,
>   	.arg2_type = ARG_ANYTHING,

To prevent races, tagging only bpf_rbtree_find/add feels insufficient.
bpf_rbtree_alloc_node probably needs it too?
Otherwise, after bpf_rbtree_unlock the stashed pointer from alloc
will still be accessible?
Maybe it's actually ok for the prog to access it?
Due to the way preallocated maps work, the elements can be
used-after-free within the same map. It's similar to
SLAB_TYPESAFE_BY_RCU.
The elements won't be released into global kernel memory while progs
are looking at them.
It seems to me it's ok to do without the OBJ_NON_OWNING_REF concept.
The prog might have pointers to rbtree elements, and they will be
accessible by the prog even after bpf_rbtree_unlock().
This access will be racy, but it's safe.
So just drop this patch?
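
(If the patch is dropped, the access rejected in the earlier sketch
becomes legal, illustrating the racy-but-safe behavior described
above:)

  bpf_rbtree_lock(&rbtree);
  node = bpf_rbtree_find(&rbtree, &key, (void *)node_cmp);
  bpf_rbtree_unlock(&rbtree);
  if (node)
          /* Racy: the node may have been removed and its slot reused
           * for another element, but preallocated map memory is never
           * returned to the kernel while progs run, so no fault.
           */
          val = node->data;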

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eb8c550db0e2..c9c4b4fb019c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -412,6 +412,7 @@  enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	OBJ_NON_OWNING_REF	= BIT(11 + BPF_BASE_TYPE_BITS),
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 5b1ab73e164f..34864fc83209 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -111,7 +111,7 @@  BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
 const struct bpf_func_proto bpf_rbtree_add_proto = {
 	.func = bpf_rbtree_add,
 	.gpl_only = true,
-	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
 	.arg1_type = ARG_CONST_MAP_PTR,
 	.arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
 	.arg2_btf_id = &bpf_rbtree_btf_ids[0],
@@ -133,7 +133,7 @@  BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb)
 const struct bpf_func_proto bpf_rbtree_find_proto = {
 	.func = bpf_rbtree_find,
 	.gpl_only = true,
-	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+	.ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
 	.ret_btf_id = &bpf_rbtree_btf_ids[0],
 	.arg1_type = ARG_CONST_MAP_PTR,
 	.arg2_type = ARG_ANYTHING,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 174a355d97fd..4f46b2dfbc4b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -467,6 +467,11 @@  static bool type_is_rdonly_mem(u32 type)
 	return type & MEM_RDONLY;
 }
 
+static bool type_is_non_owning_ref(u32 type)
+{
+	return type & OBJ_NON_OWNING_REF;
+}
+
 static bool type_may_be_null(u32 type)
 {
 	return type & PTR_MAYBE_NULL;
@@ -555,7 +560,9 @@  static bool function_returns_rbtree_node(enum bpf_func_id func_id)
 static const char *reg_type_str(struct bpf_verifier_env *env,
 				enum bpf_reg_type type)
 {
-	char postfix[16] = {0}, prefix[32] = {0};
+	char postfix[32] = {0}, prefix[32] = {0};
+	unsigned int postfix_idx = 0;
+
 	static const char * const str[] = {
 		[NOT_INIT]		= "?",
 		[SCALAR_VALUE]		= "scalar",
@@ -579,11 +586,18 @@  static const char *reg_type_str(struct bpf_verifier_env *env,
 		[PTR_TO_MAP_KEY]	= "map_key",
 	};
 
-	if (type & PTR_MAYBE_NULL) {
+	if (type_may_be_null(type)) {
 		if (base_type(type) == PTR_TO_BTF_ID)
-			strncpy(postfix, "or_null_", 16);
+			postfix_idx += strlcpy(postfix + postfix_idx, "or_null_", 32 - postfix_idx);
 		else
-			strncpy(postfix, "_or_null", 16);
+			postfix_idx += strlcpy(postfix + postfix_idx, "_or_null", 32 - postfix_idx);
+	}
+
+	if (type_is_non_owning_ref(type)) {
+		if (base_type(type) == PTR_TO_BTF_ID)
+			postfix_idx += strlcpy(postfix + postfix_idx, "non_own_", 32 - postfix_idx);
+		else
+			postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx);
 	}
 
 	if (type & MEM_RDONLY)
@@ -5684,12 +5698,18 @@  static const struct bpf_reg_types int_ptr_types = {
 	},
 };
 
+static const struct bpf_reg_types btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | OBJ_NON_OWNING_REF,
+	},
+};
+
 static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
@@ -6635,6 +6655,33 @@  static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void clear_non_owning_ref_regs(struct bpf_verifier_env *env,
+				      struct bpf_func_state *state)
+{
+	struct bpf_reg_state *regs = state->regs, *reg;
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++)
+		if (type_is_non_owning_ref(regs[i].type))
+			mark_reg_unknown(env, regs, i);
+
+	bpf_for_each_spilled_reg(i, state, reg) {
+		if (!reg)
+			continue;
+		if (type_is_non_owning_ref(reg->type))
+			__mark_reg_unknown(env, reg);
+	}
+}
+
+static void clear_rbtree_node_non_owning_refs(struct bpf_verifier_env *env)
+{
+	struct bpf_verifier_state *vstate = env->cur_state;
+	int i;
+
+	for (i = 0; i <= vstate->curframe; i++)
+		clear_non_owning_ref_regs(env, vstate->frame[i]);
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -7436,6 +7483,12 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			}
 		}
 		break;
+	case BPF_FUNC_rbtree_unlock:
+		/* TODO clear_rbtree_node_non_owning_refs calls should be
+		 * parametrized by base_type or ideally by owning map
+		 */
+		clear_rbtree_node_non_owning_refs(env);
+		break;
 	}
 
 	if (err)