Message ID | 20230827152816.2000760-1-yonghong.song@linux.dev (mailing list archive) |
---|---|
State | Accepted |
Commit | 616334e5036b3f0dc1b8c9374d77785ab21deb83 |
Delegated to | BPF |
Series | bpf: Add support for local percpu kptr |
On Sun, Aug 27, 2023 at 08:28:16AM -0700, Yonghong Song wrote:
> In previous selftests/bpf patch, we have
>
>     p = bpf_percpu_obj_new(struct val_t);
>     if (!p)
>             goto out;
>
>     p1 = bpf_kptr_xchg(&e->pc, p);
>     if (p1) {
>             /* race condition */
>             bpf_percpu_obj_drop(p1);
>     }
>
>     p = e->pc;
>     if (!p)
>             goto out;
>
> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
> This is because the second argument of bpf_kptr_xchg() is marked
> OBJ_RELEASE and becomes invalid after the call.
> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
> and the bpf program needs to re-read it from the map value.
>
> This patch checks if 'p' has type MEM_ALLOC and MEM_PERCPU,
> and if 'p' is RCU protected. If this is the case, 'p' can be marked
> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
> an owning reference. Such a change makes the re-read from the
> map value unnecessary.
>
> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
> a different value from 'p' if, immediately before 'p = e->pc',
> another cpu does another bpf_kptr_xchg() and swaps another value
> into 'e->pc'. If that is the case, 'p = e->pc' may get either 'p'
> or another value, and a race condition already exists.
> So removing the direct re-read seems fine too.
...
> +	} else if (func_id == BPF_FUNC_kptr_xchg && meta.ref_obj_id) {
> +		u32 ref_obj_id = meta.ref_obj_id;
> +		bool in_rcu = in_rcu_cs(env);
> +		struct bpf_func_state *state;
> +		struct bpf_reg_state *reg;
> +
> +		err = release_reference_state(cur_func(env), ref_obj_id);
> +		if (!err) {
> +			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +				if (reg->ref_obj_id == ref_obj_id) {
> +					if (in_rcu && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
> +						reg->ref_obj_id = 0;
> +						reg->type &= ~MEM_ALLOC;
> +						reg->type |= MEM_RCU;
> +					} else {
> +						mark_reg_invalid(env, reg);
> +					}
> +				}
> +			}));
> +		}
> 	} else if (meta.ref_obj_id) {
> 		err = release_reference(env, meta.ref_obj_id);

I think this open-coded version of release_reference() can be safely
folded into release_reference(). If it's safe to do for kptr_xchg(),
then it's safe for all KF_RELEASE kfuncs that call release_reference()
too. bpf_percpu_obj_drop() is the only such kfunc, and converting its
arg1 from MEM_ALLOC | MEM_PERCPU to MEM_RCU | MEM_PERCPU should be
equally valid, since bpf_percpu_obj_drop() does bpf_mem_free_rcu().

I'm planning to apply the whole set. The above nit can be a follow-up.
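For reference, a rough sketch (not the actual follow-up patch) of what folding that logic into release_reference() might look like, assuming release_reference() keeps the (env, ref_obj_id) signature seen in the context lines of the quoted hunk; all helpers referenced here (release_reference_state(), in_rcu_cs(), bpf_for_each_reg_in_vstate(), mark_reg_invalid()) are the ones used by that hunk:

```c
/* Sketch only: fold the kptr_xchg special case into release_reference(),
 * so every KF_RELEASE path (e.g. bpf_percpu_obj_drop()) downgrades a
 * released MEM_ALLOC | MEM_PERCPU register to MEM_RCU inside an RCU
 * critical section instead of invalidating it.
 */
static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
{
	struct bpf_func_state *state;
	struct bpf_reg_state *reg;
	int err;

	err = release_reference_state(cur_func(env), ref_obj_id);
	if (err)
		return err;

	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
		if (reg->ref_obj_id == ref_obj_id) {
			if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
				/* No longer an owning ref, but still readable under RCU. */
				reg->ref_obj_id = 0;
				reg->type &= ~MEM_ALLOC;
				reg->type |= MEM_RCU;
			} else {
				mark_reg_invalid(env, reg);
			}
		}
	}));

	return 0;
}
```

The idea matches the reviewer's point: since bpf_percpu_obj_drop() frees through bpf_mem_free_rcu(), keeping the released register readable as MEM_RCU should be as valid for that kfunc as it is for bpf_kptr_xchg().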
```diff
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6b7e7ca611f3..dbba2b806017 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9660,6 +9660,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return -EFAULT;
 		}
 		err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
+	} else if (func_id == BPF_FUNC_kptr_xchg && meta.ref_obj_id) {
+		u32 ref_obj_id = meta.ref_obj_id;
+		bool in_rcu = in_rcu_cs(env);
+		struct bpf_func_state *state;
+		struct bpf_reg_state *reg;
+
+		err = release_reference_state(cur_func(env), ref_obj_id);
+		if (!err) {
+			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+				if (reg->ref_obj_id == ref_obj_id) {
+					if (in_rcu && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
+						reg->ref_obj_id = 0;
+						reg->type &= ~MEM_ALLOC;
+						reg->type |= MEM_RCU;
+					} else {
+						mark_reg_invalid(env, reg);
+					}
+				}
+			}));
+		}
 	} else if (meta.ref_obj_id) {
 		err = release_reference(env, meta.ref_obj_id);
 	} else if (register_is_null(&regs[meta.release_regno])) {
```
In previous selftests/bpf patch, we have

    p = bpf_percpu_obj_new(struct val_t);
    if (!p)
            goto out;

    p1 = bpf_kptr_xchg(&e->pc, p);
    if (p1) {
            /* race condition */
            bpf_percpu_obj_drop(p1);
    }

    p = e->pc;
    if (!p)
            goto out;

After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
This is because the second argument of bpf_kptr_xchg() is marked
OBJ_RELEASE and becomes invalid after the call.
So after bpf_kptr_xchg(), 'p' is an unknown scalar,
and the bpf program needs to re-read it from the map value.

This patch checks if 'p' has type MEM_ALLOC and MEM_PERCPU,
and if 'p' is RCU protected. If this is the case, 'p' can be marked
as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
an owning reference. Such a change makes the re-read from the
map value unnecessary.

Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
a different value from 'p' if, immediately before 'p = e->pc',
another cpu does another bpf_kptr_xchg() and swaps another value
into 'e->pc'. If that is the case, 'p = e->pc' may get either 'p'
or another value, and a race condition already exists.
So removing the direct re-read seems fine too.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
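For illustration, a minimal BPF-side sketch of the pattern the commit message describes. The map layout (struct elem with a __percpu_kptr field 'pc'), the section name, and the headers are assumptions modeled on the series' selftests, not code from this patch:

```c
/* Sketch only: assumes selftests-style headers providing
 * bpf_percpu_obj_new(), bpf_kptr_xchg() and bpf_percpu_obj_drop().
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct val_t {
	long d;
};

struct elem {
	struct val_t __percpu_kptr *pc;	/* local percpu kptr field */
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} array SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")
int xchg_percpu_kptr(void *ctx)
{
	struct val_t *p, *p1;
	struct elem *e;
	int idx = 0;

	e = bpf_map_lookup_elem(&array, &idx);
	if (!e)
		return 0;

	p = bpf_percpu_obj_new(struct val_t);
	if (!p)
		return 0;

	p1 = bpf_kptr_xchg(&e->pc, p);
	if (p1)
		/* race condition: another swap landed first */
		bpf_percpu_obj_drop(p1);

	/* Before this verifier change, 'p' became an unknown scalar here and
	 * the program had to re-read it:  p = e->pc; if (!p) ...
	 * With the change, inside an RCU critical section 'p' is downgraded
	 * from MEM_ALLOC | MEM_PERCPU to MEM_RCU | MEM_PERCPU, i.e. it stays
	 * a valid (non-owning, RCU-protected) pointer without the re-read.
	 */
	return 0;
}

char _license[] SEC("license") = "GPL";
```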