Message ID | 20230814172857.1366162-1-yonghong.song@linux.dev |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | Add support for local percpu kptr |
On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>
> In previous selftests/bpf patch, we have
>	p = bpf_percpu_obj_new(struct val_t);
>	if (!p)
>		goto out;
>
>	p1 = bpf_kptr_xchg(&e->pc, p);
>	if (p1) {
>		/* race condition */
>		bpf_percpu_obj_drop(p1);
>	}
>
>	p = e->pc;
>	if (!p)
>		goto out;
>
> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
> This is due to that the second argument of bpf_kptr_xchg() is marked
> OBJ_RELEASE and it will be marked as invalid after the call.
> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
> and the bpf program needs to reread from the map value.
>
> This patch checks if the 'p' has type MEM_ALLOC and MEM_PERCPU,
> and if 'p' is RCU protected. If this is the case, 'p' can be marked
> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is not
> an owning reference any more. Such a change makes re-read
> from the map value unnecessary.
>
> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
> a different value from 'p' if immediately before 'p = e->pc',
> another cpu may do another bpf_kptr_xchg() and swap in another value
> into 'e->pc'. If this is the case, then 'p = e->pc' may
> get either 'p' or another value, and race condition already exists.
> So removing direct re-reading seems fine too.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6fc200cb68b6..6fa458e13bfc 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
>  		return err;
>
>  	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> -		if (reg->ref_obj_id == ref_obj_id)
> -			mark_reg_invalid(env, reg);
> +		if (reg->ref_obj_id == ref_obj_id) {
> +			if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {

Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
inside RCU CS/non-sleepable prog?
Do we want to permit access to p after drop in that case? I think it
will be a bit unintuitive.
I think we should preserve normal behavior for everything except for
kptr_xchg of a percpu_kptr.

> +				reg->ref_obj_id = 0;
> +				reg->type &= ~MEM_ALLOC;
> +				reg->type |= MEM_RCU;
> +			} else {
> +				mark_reg_invalid(env, reg);
> +			}
> +		}
>  	}));
>
>  	return 0;
> --
> 2.34.1
>
>
On 8/18/23 6:44 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> [...]
>>
>>  	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>> -		if (reg->ref_obj_id == ref_obj_id)
>> -			mark_reg_invalid(env, reg);
>> +		if (reg->ref_obj_id == ref_obj_id) {
>> +			if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
>
> Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
> inside RCU CS/non-sleepable prog?
> Do we want to permit access to p after drop in that case? I think it
> will be a bit unintuitive.
> I think we should preserve normal behavior for everything except for
> kptr_xchg of a percpu_kptr.

You are correct. The above condition also applies to bpf_percpu_obj_drop(),
and we should change MEM_ALLOC to MEM_RCU only for bpf_percpu_obj_new().
Will fix.

>
>> [...]
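The narrower rule asked for in review ("preserve normal behavior for everything except for kptr_xchg of a percpu_kptr") can be sketched as a userspace toy model. Here the release path is told which operation released the reference, and only a bpf_kptr_xchg() of a percpu allocated object inside an RCU section keeps a non-owning MEM_RCU view; a bpf_percpu_obj_drop() invalidates every copy as before. The TOY_* bits, struct toy_reg, enum toy_release_kind and toy_release_reference() are hypothetical stand-ins, not kernel definitions, and this is not the code of the revised patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits only (hypothetical); the kernel's real values differ. */
#define TOY_BASE_PTR   (1u << 0)
#define TOY_MEM_ALLOC  (1u << 1)
#define TOY_MEM_PERCPU (1u << 2)
#define TOY_MEM_RCU    (1u << 3)
#define TOY_INVALID    0u /* stands in for mark_reg_invalid() */

struct toy_reg {
	uint32_t type;       /* base type plus modifier flags */
	uint32_t ref_obj_id; /* nonzero while the register holds a reference */
};

/* Hypothetical: which operation is releasing the reference. */
enum toy_release_kind {
	TOY_RELEASE_DROP,      /* e.g. bpf_percpu_obj_drop(p) */
	TOY_RELEASE_KPTR_XCHG, /* p handed to a map slot via bpf_kptr_xchg() */
};

static void toy_release_reference(struct toy_reg *regs, size_t nregs,
				  uint32_t ref_obj_id, bool in_rcu_cs,
				  enum toy_release_kind kind)
{
	assert(ref_obj_id != 0); /* 0 means "not a referenced object" */

	for (size_t i = 0; i < nregs; i++) {
		struct toy_reg *reg = &regs[i];

		if (reg->ref_obj_id != ref_obj_id)
			continue;

		if (kind == TOY_RELEASE_KPTR_XCHG && in_rcu_cs &&
		    (reg->type & TOY_MEM_ALLOC) && (reg->type & TOY_MEM_PERCPU)) {
			/* The map now owns the object; keep a non-owning RCU view. */
			reg->ref_obj_id = 0;
			reg->type = (reg->type & ~TOY_MEM_ALLOC) | TOY_MEM_RCU;
		} else {
			/* Normal behavior: every copy of the reference is unusable. */
			reg->ref_obj_id = 0;
			reg->type = TOY_INVALID;
		}
	}
}
```

Compared with the posted hunk, the only difference is the extra `kind` check, so in this model a drop inside an RCU section no longer leaves a usable pointer behind.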
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6fc200cb68b6..6fa458e13bfc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
 		return err;
 
 	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-		if (reg->ref_obj_id == ref_obj_id)
-			mark_reg_invalid(env, reg);
+		if (reg->ref_obj_id == ref_obj_id) {
+			if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
+				reg->ref_obj_id = 0;
+				reg->type &= ~MEM_ALLOC;
+				reg->type |= MEM_RCU;
+			} else {
+				mark_reg_invalid(env, reg);
+			}
+		}
 	}));
 
 	return 0;
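To make the flag arithmetic in this hunk concrete, here is a userspace toy model of the patched loop. The TOY_* bits, struct toy_reg and toy_release_reference_v1() are hypothetical stand-ins, not kernel definitions: TOY_INVALID stands in for mark_reg_invalid(), and a plain bool stands in for in_rcu_cs(env). Because the function is told nothing about which helper triggered the release, a bpf_percpu_obj_drop() inside an RCU section would take the same conversion path, which is the concern raised in the review above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits only (hypothetical); the kernel's real values differ. */
#define TOY_BASE_PTR   (1u << 0)
#define TOY_MEM_ALLOC  (1u << 1)
#define TOY_MEM_PERCPU (1u << 2)
#define TOY_MEM_RCU    (1u << 3)
#define TOY_INVALID    0u /* stands in for mark_reg_invalid() */

struct toy_reg {
	uint32_t type;       /* base type plus modifier flags */
	uint32_t ref_obj_id; /* nonzero while the register holds a reference */
};

/*
 * Mirrors the loop in the hunk: every register sharing ref_obj_id is
 * either downgraded to a non-owning RCU pointer or invalidated.
 */
static void toy_release_reference_v1(struct toy_reg *regs, size_t nregs,
				     uint32_t ref_obj_id, bool in_rcu_cs)
{
	assert(ref_obj_id != 0); /* 0 means "not a referenced object" */

	for (size_t i = 0; i < nregs; i++) {
		struct toy_reg *reg = &regs[i];

		if (reg->ref_obj_id != ref_obj_id)
			continue;

		if (in_rcu_cs && (reg->type & TOY_MEM_ALLOC) &&
		    (reg->type & TOY_MEM_PERCPU)) {
			reg->ref_obj_id = 0;        /* no longer an owning reference */
			reg->type &= ~TOY_MEM_ALLOC;
			reg->type |= TOY_MEM_RCU;
		} else {
			reg->ref_obj_id = 0;
			reg->type = TOY_INVALID;
		}
	}
}
```

For example, starting from TOY_BASE_PTR | TOY_MEM_ALLOC | TOY_MEM_PERCPU inside an RCU section, both copies of a reference end up as TOY_BASE_PTR | TOY_MEM_PERCPU | TOY_MEM_RCU with ref_obj_id cleared, while a register holding an unrelated reference is left untouched.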
In the previous selftests/bpf patch, we have

	p = bpf_percpu_obj_new(struct val_t);
	if (!p)
		goto out;

	p1 = bpf_kptr_xchg(&e->pc, p);
	if (p1) {
		/* race condition */
		bpf_percpu_obj_drop(p1);
	}

	p = e->pc;
	if (!p)
		goto out;

After bpf_kptr_xchg(), we need to re-read e->pc into 'p'. This is
because the second argument of bpf_kptr_xchg() is marked OBJ_RELEASE,
so 'p' is marked invalid after the call. 'p' is then an unknown
scalar, and the BPF program has to re-read the pointer from the map
value.

This patch checks whether 'p' has type MEM_ALLOC and MEM_PERCPU and
whether 'p' is RCU protected (i.e. in_rcu_cs(env) is true). If so,
'p' can be marked as MEM_RCU. MEM_ALLOC has to be removed since 'p'
is no longer an owning reference. With this change, re-reading from
the map value is unnecessary.

Note that re-reading 'e->pc' after bpf_kptr_xchg() may already return
a value different from 'p': immediately before 'p = e->pc', another
CPU may call bpf_kptr_xchg() and swap a different value into 'e->pc'.
In that case 'p = e->pc' yields either 'p' or the other value, so the
race already exists, and removing the explicit re-read seems fine too.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
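The race in the last paragraph of the commit message can be replayed deterministically in userspace. In this sketch C11 atomic_exchange() stands in for bpf_kptr_xchg() and an atomic load stands in for the 'p = e->pc' re-read; struct toy_map_value, toy_kptr_xchg() and toy_replay_reread() are hypothetical names. The sketch models only pointer values, not BPF object lifetimes or verifier state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a map value with a kptr field named 'pc'. */
struct toy_map_value {
	_Atomic(int *) pc;
};

/* Exchange semantics like bpf_kptr_xchg(): store new_ptr, return the old value. */
static int *toy_kptr_xchg(struct toy_map_value *e, int *new_ptr)
{
	return atomic_exchange(&e->pc, new_ptr);
}

/*
 * Sequential replay of one interleaving:
 *   CPU 0: p1 = xchg(&e->pc, p)
 *   CPU 1: old = xchg(&e->pc, q)   (lands before CPU 0 re-reads)
 *   CPU 0: p = e->pc               (explicit re-read)
 * Returns what CPU 0's re-read observes; *cpu1_old receives CPU 1's old value.
 */
static int *toy_replay_reread(struct toy_map_value *e, int *p, int *q,
			      int **cpu1_old)
{
	int *p1 = toy_kptr_xchg(e, p);

	(void)p1; /* the selftest would drop a non-NULL p1 here */
	assert(atomic_load(&e->pc) == p); /* right after the xchg, the slot holds p */

	*cpu1_old = toy_kptr_xchg(e, q);
	return atomic_load(&e->pc);
}
```

In this interleaving CPU 0's re-read returns q rather than the p it just published, and CPU 1 receives p back, so the explicit re-read never guaranteed getting 'p' back.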