Message ID | 20220827100134.1621137-1-houtao@huaweicloud.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf-next,v2,1/3] bpf: Disable preemption when increasing per-cpu map_locked |
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
> from both NMI and non-NMI contexts. But since commit 74d862b682f5
> ("sched: Make migrate_disable/enable() independent of RT"),
> migrations_disable() is also preemptible under CONFIG_PREEMPT case,

nit: migrate_disable

> so now map_locked also disallows concurrent updates from normal contexts
> (e.g. userspace processes) unexpectedly as shown below:
>
> process A                        process B
>
> htab_map_update_elem()
>   htab_lock_bucket()
>     migrate_disable()
>     /* return 1 */
>     __this_cpu_inc_return()
>     /* preempted by B */
>
>                                  htab_map_update_elem()
>                                    /* the same bucket as A */
>                                    htab_lock_bucket()
>                                      migrate_disable()
>                                      /* return 2, so lock fails */
>                                      __this_cpu_inc_return()
>                                      return -EBUSY
>
> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
> only checking the value of map_locked for nmi context. But it will
> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
> through non-tracing program (e.g. fentry program).
>
> So fixing it by using disable_preempt() instead of migrate_disable() when
> increasing htab->map_locked. However when htab_use_raw_lock() is false,
> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
> so still use migrate_disable() for spin-lock case and leave the
> concurrent map updates problem to BPF memory allocator patchset in which
> !htab_use_raw_lock() case will be removed.

I think the description needs a bit more clarity and I think you mean
preempt_disable() instead of disable_preempt

Suggestion:

One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().

Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.

>
> Reviewed-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Fixes tag here please?

> ---
>  kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index b301a63afa2f..6fb3b7fd1622 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>                                     unsigned long *pflags)
>  {
>          unsigned long flags;
> +        bool use_raw_lock;
>
>          hash = hash & HASHTAB_MAP_LOCK_MASK;
>
> -        migrate_disable();
> +        use_raw_lock = htab_use_raw_lock(htab);
> +        if (use_raw_lock)
> +                preempt_disable();
> +        else
> +                migrate_disable();
>          if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>                  __this_cpu_dec(*(htab->map_locked[hash]));
> -                migrate_enable();
> +                if (use_raw_lock)
> +                        preempt_enable();
> +                else
> +                        migrate_enable();
>                  return -EBUSY;
>          }
>
> -        if (htab_use_raw_lock(htab))
> +        if (use_raw_lock)
>                  raw_spin_lock_irqsave(&b->raw_lock, flags);
>          else
>                  spin_lock_irqsave(&b->lock, flags);
> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>                                        struct bucket *b, u32 hash,
>                                        unsigned long flags)
>  {
> +        bool use_raw_lock = htab_use_raw_lock(htab);
> +
>          hash = hash & HASHTAB_MAP_LOCK_MASK;
> -        if (htab_use_raw_lock(htab))
> +        if (use_raw_lock)
>                  raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>          else
>                  spin_unlock_irqrestore(&b->lock, flags);
>          __this_cpu_dec(*(htab->map_locked[hash]));
> -        migrate_enable();
> +        if (use_raw_lock)
> +                preempt_enable();
> +        else
> +                migrate_enable();
>  }
>
>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> --
> 2.29.2
>
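For readers skimming the archive: the in_nmi() approach that the commit message mentions and rejects is only described in prose, so a rough sketch of it is given below. This is not code from the series; the function name, the placement of the in_nmi() check, and everything outside the hunks shown in the patch are assumptions made for illustration. The commit message explains why this variant was discarded: it would re-introduce a deadlock on the bucket lock when htab_lock_bucket() is re-entered through a non-tracing program such as an fentry program.

/*
 * Sketch only (not part of the posted patch): the rejected alternative
 * keeps migrate_disable() and treats a non-zero map_locked count as a
 * lock failure only when running in NMI context. A non-NMI re-entry
 * (e.g. via an fentry program on the same CPU) would fall through and
 * spin on the already-held bucket lock, which is the deadlock the
 * commit message warns about.
 */
static inline int htab_lock_bucket_nmi_only_sketch(const struct bpf_htab *htab,
                                                   struct bucket *b, u32 hash,
                                                   unsigned long *pflags)
{
        unsigned long flags;

        hash = hash & HASHTAB_MAP_LOCK_MASK;

        migrate_disable();
        if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1) &&
            in_nmi()) {
                /* Only NMI re-entry on this CPU is treated as a failure. */
                __this_cpu_dec(*(htab->map_locked[hash]));
                migrate_enable();
                return -EBUSY;
        }

        if (htab_use_raw_lock(htab))
                raw_spin_lock_irqsave(&b->raw_lock, flags);
        else
                spin_lock_irqsave(&b->lock, flags);
        *pflags = flags;

        return 0;
}

This is the trade-off the patch resolves differently: instead of weakening the map_locked check, it disables preemption around it for the raw-lock case so that normal-context tasks can no longer interleave on the same CPU.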
On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
>> from both NMI and non-NMI contexts. But since commit 74d862b682f5
>> ("sched: Make migrate_disable/enable() independent of RT"),
>> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
> nit: migrate_disable
Will fix in v3.
>
>> so now map_locked also disallows concurrent updates from normal contexts
>> (e.g. userspace processes) unexpectedly as shown below:
>>
>> process A                        process B
>>
>> htab_map_update_elem()
>>   htab_lock_bucket()
>>     migrate_disable()
>>     /* return 1 */
>>     __this_cpu_inc_return()
>>     /* preempted by B */
>>
>>                                  htab_map_update_elem()
>>                                    /* the same bucket as A */
>>                                    htab_lock_bucket()
>>                                      migrate_disable()
>>                                      /* return 2, so lock fails */
>>                                      __this_cpu_inc_return()
>>                                      return -EBUSY
>>
>> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
>> only checking the value of map_locked for nmi context. But it will
>> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
>> through non-tracing program (e.g. fentry program).
>>
>> So fixing it by using disable_preempt() instead of migrate_disable() when
>> increasing htab->map_locked. However when htab_use_raw_lock() is false,
>> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
>> so still use migrate_disable() for spin-lock case and leave the
>> concurrent map updates problem to BPF memory allocator patchset in which
>> !htab_use_raw_lock() case will be removed.
> I think the description needs a bit more clarity and I think you mean
> preempt_disable() instead of disable_preempt
>
> Suggestion:
>
> One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
> being false causes the bucket lock to be a spin lock which can sleep and
> does not work with preempt_disable().
>
> Therefore, use migrate_disable() when using the spinlock instead of
> preempt_disable() and defer fixing concurrent updates to when the kernel
> has its own BPF memory allocator.
Will update the commit message in v3 and thanks for the suggestion.
>
>> Reviewed-by: Hao Luo <haoluo@google.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Fixes tag here please?
Will add "Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")" in v3.
>
>
>> ---
>>  kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b301a63afa2f..6fb3b7fd1622 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>                                     unsigned long *pflags)
>>  {
>>          unsigned long flags;
>> +        bool use_raw_lock;
>>
>>          hash = hash & HASHTAB_MAP_LOCK_MASK;
>>
>> -        migrate_disable();
>> +        use_raw_lock = htab_use_raw_lock(htab);
>> +        if (use_raw_lock)
>> +                preempt_disable();
>> +        else
>> +                migrate_disable();
>>          if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>                  __this_cpu_dec(*(htab->map_locked[hash]));
>> -                migrate_enable();
>> +                if (use_raw_lock)
>> +                        preempt_enable();
>> +                else
>> +                        migrate_enable();
>>                  return -EBUSY;
>>          }
>>
>> -        if (htab_use_raw_lock(htab))
>> +        if (use_raw_lock)
>>                  raw_spin_lock_irqsave(&b->raw_lock, flags);
>>          else
>>                  spin_lock_irqsave(&b->lock, flags);
>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>>                                        struct bucket *b, u32 hash,
>>                                        unsigned long flags)
>>  {
>> +        bool use_raw_lock = htab_use_raw_lock(htab);
>> +
>>          hash = hash & HASHTAB_MAP_LOCK_MASK;
>> -        if (htab_use_raw_lock(htab))
>> +        if (use_raw_lock)
>>                  raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>>          else
>>                  spin_unlock_irqrestore(&b->lock, flags);
>>          __this_cpu_dec(*(htab->map_locked[hash]));
>> -        migrate_enable();
>> +        if (use_raw_lock)
>> +                preempt_enable();
>> +        else
>> +                migrate_enable();
>>  }
>>
>>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>> --
>> 2.29.2
>>
> .
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..6fb3b7fd1622 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
                                    unsigned long *pflags)
 {
         unsigned long flags;
+        bool use_raw_lock;
 
         hash = hash & HASHTAB_MAP_LOCK_MASK;
 
-        migrate_disable();
+        use_raw_lock = htab_use_raw_lock(htab);
+        if (use_raw_lock)
+                preempt_disable();
+        else
+                migrate_disable();
         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
                 __this_cpu_dec(*(htab->map_locked[hash]));
-                migrate_enable();
+                if (use_raw_lock)
+                        preempt_enable();
+                else
+                        migrate_enable();
                 return -EBUSY;
         }
 
-        if (htab_use_raw_lock(htab))
+        if (use_raw_lock)
                 raw_spin_lock_irqsave(&b->raw_lock, flags);
         else
                 spin_lock_irqsave(&b->lock, flags);
@@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
                                       struct bucket *b, u32 hash,
                                       unsigned long flags)
 {
+        bool use_raw_lock = htab_use_raw_lock(htab);
+
         hash = hash & HASHTAB_MAP_LOCK_MASK;
-        if (htab_use_raw_lock(htab))
+        if (use_raw_lock)
                 raw_spin_unlock_irqrestore(&b->raw_lock, flags);
         else
                 spin_unlock_irqrestore(&b->lock, flags);
         __this_cpu_dec(*(htab->map_locked[hash]));
-        migrate_enable();
+        if (use_raw_lock)
+                preempt_enable();
+        else
+                migrate_enable();
 }
 
 static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
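As context for the -EBUSY in the process A/B trace above, the sketch below shows roughly how the update path consumes these two helpers. It paraphrases the caller pattern in kernel/bpf/hashtab.c rather than quoting it: the function name carries a _sketch suffix, the lookup/insert body is elided, and the exact hashing and bucket-selection helpers are assumptions.

/* Paraphrased caller pattern (illustration only, not part of this patch). */
static int htab_map_update_elem_sketch(struct bpf_map *map, void *key,
                                       void *value, u64 map_flags)
{
        struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
        struct bucket *b;
        unsigned long flags;
        u32 hash;
        int ret;

        hash = htab_map_hash(key, map->key_size, htab->hashrnd); /* assumed helper */
        b = __select_bucket(htab, hash);                         /* assumed helper */

        ret = htab_lock_bucket(htab, b, hash, &flags);
        if (ret)
                return ret; /* -EBUSY: map_locked already held on this CPU */

        /* ... look up the element and insert or overwrite it here ... */

        htab_unlock_bucket(htab, b, hash, flags);
        return ret;
}

With the patch applied, the raw-lock case holds preemption disabled while map_locked is non-zero, so a normal task like process B in the trace can no longer be scheduled onto the same CPU mid-update and hit a spurious -EBUSY; only genuine re-entry such as an NMI still takes that path.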