| Message ID | 20220829142752.330094-2-houtao@huaweicloud.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to | BPF |
| Series | make bpf_task_storage_busy preemption-safe |
On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Now migrate_disable() does not disable preemption, and on some
> architectures (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
> preemption-safe nor IRQ-safe, so concurrent lookups or updates on the
> same task local storage and the same CPU may leave
> bpf_task_storage_busy imbalanced, and bpf_task_storage_trylock() will
> always fail.
>
> Fix it by using this_cpu_{inc|dec|inc_return} when manipulating
> bpf_task_storage_busy.
>
> Fixes: bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/bpf_local_storage.c | 4 ++--
>  kernel/bpf/bpf_task_storage.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 4ee2e7286c23..802fc15b0d73 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -555,11 +555,11 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
>  				     struct bpf_local_storage_elem, map_node))) {
>  		if (busy_counter) {
>  			migrate_disable();
> -			__this_cpu_inc(*busy_counter);
> +			this_cpu_inc(*busy_counter);
>  		}
>  		bpf_selem_unlink(selem, false);
>  		if (busy_counter) {
> -			__this_cpu_dec(*busy_counter);
> +			this_cpu_dec(*busy_counter);
>  			migrate_enable();
>  		}
>  		cond_resched_rcu();
> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> index e9014dc62682..6f290623347e 100644
> --- a/kernel/bpf/bpf_task_storage.c
> +++ b/kernel/bpf/bpf_task_storage.c
> @@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
>  static void bpf_task_storage_lock(void)
>  {
>  	migrate_disable();
> -	__this_cpu_inc(bpf_task_storage_busy);
> +	this_cpu_inc(bpf_task_storage_busy);
>  }
>
>  static void bpf_task_storage_unlock(void)
>  {
> -	__this_cpu_dec(bpf_task_storage_busy);
> +	this_cpu_dec(bpf_task_storage_busy);
>  	migrate_enable();
>  }
>
>  static bool bpf_task_storage_trylock(void)
>  {
>  	migrate_disable();
> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> -		__this_cpu_dec(bpf_task_storage_busy);
> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> +		this_cpu_dec(bpf_task_storage_busy);

This change is only needed here but not in the htab fix [0],
or are you planning to address it separately?

[0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
Hi,

On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Now migrate_disable() does not disable preemption, and on some
>> architectures (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
>> preemption-safe nor IRQ-safe, so concurrent lookups or updates on the
>> same task local storage and the same CPU may leave
>> bpf_task_storage_busy imbalanced, and bpf_task_storage_trylock() will
>> always fail.
>>
>> Fix it by using this_cpu_{inc|dec|inc_return} when manipulating
>> bpf_task_storage_busy.
SNIP
>> static bool bpf_task_storage_trylock(void)
>> {
>> 	migrate_disable();
>> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> -		__this_cpu_dec(bpf_task_storage_busy);
>> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> +		this_cpu_dec(bpf_task_storage_busy);
> This change is only needed here but not in the htab fix [0],
> or are you planning to address it separately?
>
> [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/

For htab_lock_bucket() in the hash table, in theory there will be a problem as
shown below, but I can not reproduce it on an arm64 host:

*p = 0

// process A
r0 = *p
r0 += 1
// process B
r1 = *p
// *p = 1
*p = r0
r1 += 1
// *p = 1
*p = r1

// r0 = 1
r0 = *p
// r1 = 1
r1 = *p

In the hash-table fix, migrate_disable() in htab_lock_bucket() is replaced by
preempt_disable(), so the above case becomes impossible. And if process A is
preempted by an IRQ, __this_cpu_inc_return will be OK.

Besides the bpf hash table, there are also other __this_cpu_inc_return users in
bpf (e.g. __bpf_prog_enter); maybe I should fix those usages as well?
On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
> Hi,
>
> On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> > On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
SNIP
> For htab_lock_bucket() in the hash table, in theory there will be a problem
> as shown below, but I can not reproduce it on an arm64 host:
>
> *p = 0
>
> // process A
> r0 = *p
> r0 += 1
> // process B
> r1 = *p
> // *p = 1
> *p = r0
> r1 += 1
> // *p = 1
> *p = r1
>
> // r0 = 1
> r0 = *p
> // r1 = 1
> r1 = *p
>
> In the hash-table fix, migrate_disable() in htab_lock_bucket() is replaced
> by preempt_disable(), so the above case becomes impossible. And if process A
> is preempted by an IRQ, __this_cpu_inc_return will be OK.

hmm... iiuc, it is fine for the preempted-by-IRQ case because the
operations won't be interleaved as in the process A/B example above?

That should also be true for the task-storage case here,
meaning only CONFIG_PREEMPT can reproduce it?
If that is the case, please also mention that in the commit message.
Hi,

On 8/31/2022 8:41 AM, Martin KaFai Lau wrote:
> On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
SNIP
>> In the hash-table fix, migrate_disable() in htab_lock_bucket() is replaced
>> by preempt_disable(), so the above case becomes impossible. And if process A
>> is preempted by an IRQ, __this_cpu_inc_return will be OK.
> hmm... iiuc, it is fine for the preempted-by-IRQ case because the
> operations won't be interleaved as in the process A/B example above?
> That should also be true for the task-storage case here,
> meaning only CONFIG_PREEMPT can reproduce it? If that is the
> case, please also mention that in the commit message.

Yes. There will be no interleaving if it is preempted by an IRQ, and that is
also true for the task-storage case. CONFIG_PREEMPT can reproduce it, and in
theory CONFIG_PREEMPT_RT is also possible. Will add that to the commit message.
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 4ee2e7286c23..802fc15b0d73 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -555,11 +555,11 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 				     struct bpf_local_storage_elem, map_node))) {
 		if (busy_counter) {
 			migrate_disable();
-			__this_cpu_inc(*busy_counter);
+			this_cpu_inc(*busy_counter);
 		}
 		bpf_selem_unlink(selem, false);
 		if (busy_counter) {
-			__this_cpu_dec(*busy_counter);
+			this_cpu_dec(*busy_counter);
 			migrate_enable();
 		}
 		cond_resched_rcu();
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index e9014dc62682..6f290623347e 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
 static void bpf_task_storage_lock(void)
 {
 	migrate_disable();
-	__this_cpu_inc(bpf_task_storage_busy);
+	this_cpu_inc(bpf_task_storage_busy);
 }
 
 static void bpf_task_storage_unlock(void)
 {
-	__this_cpu_dec(bpf_task_storage_busy);
+	this_cpu_dec(bpf_task_storage_busy);
 	migrate_enable();
 }
 
 static bool bpf_task_storage_trylock(void)
 {
 	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
-		__this_cpu_dec(bpf_task_storage_busy);
+	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
+		this_cpu_dec(bpf_task_storage_busy);
 		migrate_enable();
 		return false;
 	}