
[bpf,1/2] bpf: Check map->usercnt again after timer->timer is assigned

Message ID 20231017125717.241101-2-houtao@huaweicloud.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf: Fix bpf timer kmemleak

Checks

Context Check Description
bpf/vmtest-bpf-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-PR success PR summary
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf, async
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1403 this patch: 1403
netdev/cc_maintainers fail 1 blamed authors not CCed: toke@redhat.com; 1 maintainers not CCed: toke@redhat.com
netdev/build_clang success Errors and warnings before: 1388 this patch: 1388
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1427 this patch: 1427
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-10 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-14 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-28 success Logs for veristat

Commit Message

Hou Tao Oct. 17, 2023, 12:57 p.m. UTC
From: Hou Tao <houtao1@huawei.com>

When there are concurrent uref release and bpf timer init operations,
the following sequence is possible and leads to a memory leak:

bpf program X

bpf_timer_init()
  lock timer->lock
    read timer->timer as NULL
    read map->usercnt != 0

                process Y

                close(map_fd)
                  // put last uref
                  bpf_map_put_uref()
                    atomic_dec_and_test(map->usercnt)
                      array_map_free_timers()
                        bpf_timer_cancel_and_free()
                          read timer->timer as NULL
                          // just return, which leaks the
                          // timer allocated by program X below

    t = bpf_map_kmalloc_node()
    timer->timer = t
  unlock timer->lock
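
For reference, the two pre-patch code paths involved, condensed
(abridged from kernel/bpf/helpers.c; error handling and unrelated
details are elided, and the leak window is marked):

BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer,
	   struct bpf_map *, map, u64, flags)
{
	struct bpf_hrtimer *t;
	int ret = 0;

	__bpf_spin_lock_irqsave(&timer->lock);
	if (timer->timer) {
		ret = -EBUSY;
		goto out;
	}
	/* (1) the only usercnt check: can still observe a non-zero count */
	if (!atomic64_read(&map->usercnt)) {
		ret = -EPERM;
		goto out;
	}
	/* <-- process Y may drop the last uref here; its lockless
	 *     READ_ONCE(timer->timer) in bpf_timer_cancel_and_free()
	 *     sees NULL and returns without freeing anything
	 */
	t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, map->numa_node);
	/* ... hrtimer_init() and friends ... */
	timer->timer = t;	/* (2) nothing will ever free t: the leak */
out:
	__bpf_spin_unlock_irqrestore(&timer->lock);
	return ret;
}

void bpf_timer_cancel_and_free(void *val)
{
	struct bpf_timer_kern *timer = val;

	/* lockless fast path, not ordered against bpf_timer_init() */
	if (!READ_ONCE(timer->timer))
		return;
	/* ... otherwise cancel the hrtimer and kfree() it ... */
}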

Fix the problem by checking map->usercnt again after timer->timer is
assigned, so that when there are concurrent uref release and bpf timer
init operations, either bpf_timer_cancel_and_free() from the uref
release reads a non-NULL timer or the newly-added check of map->usercnt
reads a zero usercnt. Because atomic_dec_and_test(map->usercnt) and
READ_ONCE(timer->timer) in bpf_timer_cancel_and_free() are not
protected by a lock, add a memory barrier to guarantee the order
between the write of timer->timer and the read of map->usercnt.
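
In other words this is the classic store-buffering pattern. A
litmus-style sketch of the two sides after the fix (illustrative only;
it relies on the full ordering implied by the value-returning
atomic64_dec_and_test() on the release side):

  CPU 0: bpf_timer_init()             CPU 1: bpf_map_put_uref()
  -----------------------             -------------------------
  timer->timer = t;                   atomic64_dec_and_test(&map->usercnt);
  smp_mb();                           /* full barrier implied by the RMW */
  r0 = atomic64_read(&map->usercnt);  r1 = READ_ONCE(timer->timer);

  forbidden outcome: r0 != 0 && r1 == NULL

With a full barrier on each side, at least one CPU must observe the
other's store: either CPU 0 sees usercnt == 0 and frees t itself, or
CPU 1 sees a non-NULL timer and frees it in bpf_timer_cancel_and_free().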

Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Patch

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 757b99c1e613..c9278a872e7b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 	   u64, flags)
 {
 	clockid_t clockid = flags & (MAX_CLOCKS - 1);
-	struct bpf_hrtimer *t;
+	struct bpf_hrtimer *t, *to_free = NULL;
 	int ret = 0;
 
 	BUILD_BUG_ON(MAX_CLOCKS != 16);
@@ -1198,8 +1198,20 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 	hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 	t->timer.function = bpf_timer_cb;
 	timer->timer = t;
+	/* Guarantee the order between timer->timer and map->usercnt. So
+	 * when there are concurrent uref release and bpf timer init, either
+	 * bpf_timer_cancel_and_free() called by uref release reads a
+	 * non-NULL timer or atomic64_read() below reads a zero usercnt.
+	 */
+	smp_mb();
+	if (!atomic64_read(&map->usercnt)) {
+		timer->timer = NULL;
+		to_free = t;
+		ret = -EPERM;
+	}
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
+	kfree(to_free);
 	return ret;
 }
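
For context, the window can in principle be hit by a BPF program that
calls bpf_timer_init() on a hot path while user space concurrently
close()s the last fd of the map. A hypothetical reproducer sketch, not
taken from the original report (map name, attach point and section are
made up for illustration):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} timer_map SEC(".maps");

SEC("tp/syscalls/sys_enter_getpid")
int init_timer(void *ctx)
{
	int key = 0;
	struct elem *val;

	val = bpf_map_lookup_elem(&timer_map, &key);
	if (!val)
		return 0;
	/* races against close(map_fd) dropping the last uref;
	 * 1 == CLOCK_MONOTONIC
	 */
	bpf_timer_init(&val->t, &timer_map, 1);
	return 0;
}

char _license[] SEC("license") = "GPL";

User space would attach this, trigger the tracepoint in a loop and
close(map_fd) at a random point. With the fix applied, a
bpf_timer_init() that loses the race clears timer->timer under
timer->lock, returns -EPERM and frees the just-allocated bpf_hrtimer
once the lock is dropped, instead of leaking it.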