Message ID | 20231020014214.2471419-2-houtao@huaweicloud.com |
---|---|
State | Changes Requested |
Delegated to | BPF |
Series | bpf: Fix bpf timer kmemleak |
On Thu, Oct 19, 2023 at 6:41 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> When there are concurrent uref release and bpf timer init operations,
> the following sequence diagram is possible and it will lead to a
> memory leak:
>
> bpf program X
>
> bpf_timer_init()
>   lock timer->lock
>   read timer->timer as NULL
>   read map->usercnt != 0
>
>                 process Y
>
>                 close(map_fd)
>                 // put last uref
>                 bpf_map_put_uref()
>                   atomic_dec_and_test(map->usercnt)
>                   array_map_free_timers()
>                     bpf_timer_cancel_and_free()
>                     // just return and lead to memory leak
>                     read timer->timer is NULL
>
>   t = bpf_map_kmalloc_node()
>   timer->timer = t
>   unlock timer->lock
>
> Fix the problem by checking map->usercnt again after timer->timer is
> assigned, so when there are concurrent uref release and bpf timer init,
> either bpf_timer_cancel_and_free() from uref release reads a non-NULL
> timer or the newly-added check of map->usercnt reads a zero usercnt.
>
> Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
> in bpf_timer_cancel_and_free() are not protected by a lock, add a
> memory barrier to guarantee the order between map->usercnt and
> timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the
> lockless read of timer->timer.
>
> Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
> Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 757b99c1e613f..a7d92c3ddc3dd 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>             u64, flags)
>  {
>         clockid_t clockid = flags & (MAX_CLOCKS - 1);
> -       struct bpf_hrtimer *t;
> +       struct bpf_hrtimer *t, *to_free = NULL;
>         int ret = 0;
>
>         BUILD_BUG_ON(MAX_CLOCKS != 16);
> @@ -1197,9 +1197,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>         rcu_assign_pointer(t->callback_fn, NULL);
>         hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
>         t->timer.function = bpf_timer_cb;
> -       timer->timer = t;
> +       WRITE_ONCE(timer->timer, t);
> +       /* Guarantee order between timer->timer and map->usercnt. So when
> +        * there are concurrent uref release and bpf timer init, either
> +        * bpf_timer_cancel_and_free() called by uref release reads a
> +        * non-NULL timer or atomic64_read() below reads a zero usercnt.
> +        */
> +       smp_mb();
> +       if (!atomic64_read(&map->usercnt)) {
> +               WRITE_ONCE(timer->timer, NULL);
> +               to_free = t;

just kfree(t); here.

> +               ret = -EPERM;
> +       }

This will add a second atomic64_read(&map->usercnt) in the same function.
Let's remove the first one?

> out:
>         __bpf_spin_unlock_irqrestore(&timer->lock);
> +       kfree(to_free);
>         return ret;
>  }
>
> @@ -1372,7 +1384,7 @@ void bpf_timer_cancel_and_free(void *val)
>         /* The subsequent bpf_timer_start/cancel() helpers won't be able to use
>          * this timer, since it won't be initialized.
>          */
> -       timer->timer = NULL;
> +       WRITE_ONCE(timer->timer, NULL);
> out:
>         __bpf_spin_unlock_irqrestore(&timer->lock);
>         if (!t)
> --
> 2.29.2
>
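[Editorial aside, not part of the original thread: the ordering argument above is the classic store-buffering pattern. A minimal LKMM litmus test, checkable with herd7 from the kernel's tools/memory-model, might look like the sketch below; `timer` and `ucnt` are hypothetical stand-ins for timer->timer and map->usercnt, reduced to plain ints.]

C bpf-timer-usercnt-sb

(*
 * P0 models bpf_timer_init(): publish the timer, smp_mb(), then
 * re-check usercnt. P1 models uref release: drop usercnt (the smp_mb()
 * stands in for the full ordering of atomic64_dec_and_test()), then
 * look for a published timer. The "exists" clause is the leak: both
 * sides miss each other's write. With both barriers present, herd7
 * reports the outcome as never happening.
 *)

{
	ucnt=1;
}

P0(int *timer, int *ucnt)
{
	int r0;

	WRITE_ONCE(*timer, 1);
	smp_mb();
	r0 = READ_ONCE(*ucnt);
}

P1(int *timer, int *ucnt)
{
	int r1;

	WRITE_ONCE(*ucnt, 0);
	smp_mb();
	r1 = READ_ONCE(*timer);
}

exists (0:r0=1 /\ 1:r1=0)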
Hi,

On 10/20/2023 10:14 AM, Alexei Starovoitov wrote:
> On Thu, Oct 19, 2023 at 6:41 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> When there are concurrent uref release and bpf timer init operations,
>> the following sequence diagram is possible and it will lead to a
>> memory leak:
>>
>> bpf program X
>>
>> bpf_timer_init()
>>   lock timer->lock
>>   read timer->timer as NULL
>>   read map->usercnt != 0
>>
>>                 process Y
>>
>>                 close(map_fd)
>>                 // put last uref
>>                 bpf_map_put_uref()
>>                   atomic_dec_and_test(map->usercnt)
>>                   array_map_free_timers()
>>                     bpf_timer_cancel_and_free()
>>                     // just return and lead to memory leak
>>                     read timer->timer is NULL
>>
>>   t = bpf_map_kmalloc_node()
>>   timer->timer = t
>>   unlock timer->lock
>>
>> Fix the problem by checking map->usercnt again after timer->timer is
>> assigned, so when there are concurrent uref release and bpf timer init,
>> either bpf_timer_cancel_and_free() from uref release reads a non-NULL
>> timer or the newly-added check of map->usercnt reads a zero usercnt.
>>
>> Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
>> in bpf_timer_cancel_and_free() are not protected by a lock, add a
>> memory barrier to guarantee the order between map->usercnt and
>> timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the
>> lockless read of timer->timer.
>>
>> Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
>> Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
>> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/helpers.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 757b99c1e613f..a7d92c3ddc3dd 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>>             u64, flags)
>>  {
>>         clockid_t clockid = flags & (MAX_CLOCKS - 1);
>> -       struct bpf_hrtimer *t;
>> +       struct bpf_hrtimer *t, *to_free = NULL;
>>         int ret = 0;
>>
>>         BUILD_BUG_ON(MAX_CLOCKS != 16);
>> @@ -1197,9 +1197,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>>         rcu_assign_pointer(t->callback_fn, NULL);
>>         hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
>>         t->timer.function = bpf_timer_cb;
>> -       timer->timer = t;
>> +       WRITE_ONCE(timer->timer, t);
>> +       /* Guarantee order between timer->timer and map->usercnt. So when
>> +        * there are concurrent uref release and bpf timer init, either
>> +        * bpf_timer_cancel_and_free() called by uref release reads a
>> +        * non-NULL timer or atomic64_read() below reads a zero usercnt.
>> +        */
>> +       smp_mb();
>> +       if (!atomic64_read(&map->usercnt)) {
>> +               WRITE_ONCE(timer->timer, NULL);
>> +               to_free = t;
> just kfree(t); here.

Will do. It is a slow path, so I think doing kfree() under the spin-lock
is acceptable.

>
>> +               ret = -EPERM;
>> +       }
> This will add a second atomic64_read(&map->usercnt) in the same function.
> Let's remove the first one?

I prefer to still keep it, because it can detect the release of the map
uref early, and the handling of uref release is simple compared with the
second atomic64_read(). Do you have a strong preference?
>
>> out:
>>         __bpf_spin_unlock_irqrestore(&timer->lock);
>> +       kfree(to_free);
>>         return ret;
>>  }
>>
>> @@ -1372,7 +1384,7 @@ void bpf_timer_cancel_and_free(void *val)
>>         /* The subsequent bpf_timer_start/cancel() helpers won't be able to use
>>          * this timer, since it won't be initialized.
>>          */
>> -       timer->timer = NULL;
>> +       WRITE_ONCE(timer->timer, NULL);
>> out:
>>         __bpf_spin_unlock_irqrestore(&timer->lock);
>>         if (!t)
>> --
>> 2.29.2
>>
On Fri, Oct 20, 2023 at 12:31 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/20/2023 10:14 AM, Alexei Starovoitov wrote:
> > On Thu, Oct 19, 2023 at 6:41 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> When there are concurrent uref release and bpf timer init operations,
> >> the following sequence diagram is possible and it will lead to a
> >> memory leak:
> >>
> >> bpf program X
> >>
> >> bpf_timer_init()
> >>   lock timer->lock
> >>   read timer->timer as NULL
> >>   read map->usercnt != 0
> >>
> >>                 process Y
> >>
> >>                 close(map_fd)
> >>                 // put last uref
> >>                 bpf_map_put_uref()
> >>                   atomic_dec_and_test(map->usercnt)
> >>                   array_map_free_timers()
> >>                     bpf_timer_cancel_and_free()
> >>                     // just return and lead to memory leak
> >>                     read timer->timer is NULL
> >>
> >>   t = bpf_map_kmalloc_node()
> >>   timer->timer = t
> >>   unlock timer->lock
> >>
> >> Fix the problem by checking map->usercnt again after timer->timer is
> >> assigned, so when there are concurrent uref release and bpf timer init,
> >> either bpf_timer_cancel_and_free() from uref release reads a non-NULL
> >> timer or the newly-added check of map->usercnt reads a zero usercnt.
> >>
> >> Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
> >> in bpf_timer_cancel_and_free() are not protected by a lock, add a
> >> memory barrier to guarantee the order between map->usercnt and
> >> timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the
> >> lockless read of timer->timer.
> >>
> >> Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
> >> Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
> >> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/helpers.c | 18 +++++++++++++++---
> >>  1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 757b99c1e613f..a7d92c3ddc3dd 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
> >>             u64, flags)
> >>  {
> >>         clockid_t clockid = flags & (MAX_CLOCKS - 1);
> >> -       struct bpf_hrtimer *t;
> >> +       struct bpf_hrtimer *t, *to_free = NULL;
> >>         int ret = 0;
> >>
> >>         BUILD_BUG_ON(MAX_CLOCKS != 16);
> >> @@ -1197,9 +1197,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
> >>         rcu_assign_pointer(t->callback_fn, NULL);
> >>         hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
> >>         t->timer.function = bpf_timer_cb;
> >> -       timer->timer = t;
> >> +       WRITE_ONCE(timer->timer, t);
> >> +       /* Guarantee order between timer->timer and map->usercnt. So when
> >> +        * there are concurrent uref release and bpf timer init, either
> >> +        * bpf_timer_cancel_and_free() called by uref release reads a
> >> +        * non-NULL timer or atomic64_read() below reads a zero usercnt.
> >> +        */
> >> +       smp_mb();
> >> +       if (!atomic64_read(&map->usercnt)) {
> >> +               WRITE_ONCE(timer->timer, NULL);
> >> +               to_free = t;
> > just kfree(t); here.
>
> Will do. It is a slow path, so I think doing kfree() under the spin-lock
> is acceptable.
>
> >
> >> +               ret = -EPERM;
> >> +       }
> > This will add a second atomic64_read(&map->usercnt) in the same function.
> > Let's remove the first one?
>
> I prefer to still keep it,
> because it can detect the release of the map uref early, and the
> handling of uref release is simple compared with the second
> atomic64_read(). Do you have a strong preference?

I bet somebody will send a patch to remove the first one as redundant,
so let's do it now. The only reason we do a repeated early check is to
avoid taking a lock. Here, doing an extra early check just to avoid a
kmalloc() is overkill: that check is highly unlikely to hit, while for
locks it is a likely one. Hence the extra check is justified for locks,
but not here.

Reading your other email, it looks like this patchset is incomplete anyway?
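[Editorial aside: as an illustration of the review feedback above — a rough sketch only, not the patch that was ultimately applied — the tail of bpf_timer_init() might end up looking like this once the early usercnt check is dropped and the kfree() moves under timer->lock. The surrounding validation and initialization code is elided.]

	/* ... clockid validation and the -EBUSY check on an already
	 * initialized timer are unchanged and elided ...
	 */
	t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, map->numa_node);
	if (!t) {
		ret = -ENOMEM;
		goto out;
	}
	/* ... t and hrtimer initialization unchanged ... */
	WRITE_ONCE(timer->timer, t);
	/* Pairs with the full barrier implied by the value-returning
	 * atomic64_dec_and_test(&map->usercnt) in bpf_map_put_uref():
	 * either that path observes a non-NULL timer->timer, or the
	 * read below observes a zero usercnt.
	 */
	smp_mb();
	if (!atomic64_read(&map->usercnt)) {
		/* Lost the race with the last uref going away: undo
		 * the publish and free the timer. Slow path, so
		 * freeing under the spinlock is acceptable.
		 */
		WRITE_ONCE(timer->timer, NULL);
		kfree(t);
		ret = -EPERM;
	}
out:
	__bpf_spin_unlock_irqrestore(&timer->lock);
	return ret;

With the free done under the lock, the to_free local and the kfree(to_free) after the unlock both go away.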
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 757b99c1e613f..a7d92c3ddc3dd 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 	   u64, flags)
 {
 	clockid_t clockid = flags & (MAX_CLOCKS - 1);
-	struct bpf_hrtimer *t;
+	struct bpf_hrtimer *t, *to_free = NULL;
 	int ret = 0;
 
 	BUILD_BUG_ON(MAX_CLOCKS != 16);
@@ -1197,9 +1197,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 	rcu_assign_pointer(t->callback_fn, NULL);
 	hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 	t->timer.function = bpf_timer_cb;
-	timer->timer = t;
+	WRITE_ONCE(timer->timer, t);
+	/* Guarantee order between timer->timer and map->usercnt. So when
+	 * there are concurrent uref release and bpf timer init, either
+	 * bpf_timer_cancel_and_free() called by uref release reads a
+	 * non-NULL timer or atomic64_read() below reads a zero usercnt.
+	 */
+	smp_mb();
+	if (!atomic64_read(&map->usercnt)) {
+		WRITE_ONCE(timer->timer, NULL);
+		to_free = t;
+		ret = -EPERM;
+	}
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
+	kfree(to_free);
 	return ret;
 }
 
@@ -1372,7 +1384,7 @@ void bpf_timer_cancel_and_free(void *val)
 	/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
 	 * this timer, since it won't be initialized.
 	 */
-	timer->timer = NULL;
+	WRITE_ONCE(timer->timer, NULL);
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
 	if (!t)
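[Editorial aside on why the release side of this race needs no explicit barrier: the decrement happens in bpf_map_put_uref(), shown below slightly abridged from kernel/bpf/syscall.c of roughly this era. atomic64_dec_and_test() is a value-returning atomic RMW, which per Documentation/atomic_t.txt is fully ordered, so it already provides the barrier that pairs with the smp_mb() added in bpf_timer_init().]

/* Abridged sketch of bpf_map_put_uref(); the fully ordered
 * atomic64_dec_and_test() orders the usercnt decrement against the
 * later READ_ONCE(timer->timer) done during timer teardown.
 */
static void bpf_map_put_uref(struct bpf_map *map)
{
	if (atomic64_dec_and_test(&map->usercnt)) {
		/* Last user reference gone: free timers, which is where
		 * bpf_timer_cancel_and_free() reads timer->timer.
		 */
		if (map->ops->map_release_uref)
			map->ops->map_release_uref(map);
	}
}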