
[bpf] bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps

Message ID: 20230522154558.2166815-1-aspsk@isovalent.com
State: Accepted
Delegated to: BPF
Series: [bpf] bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29 success Logs for veristat
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers warning 9 maintainers not CCed: daniel@iogearbox.net yhs@fb.com kpsingh@kernel.org john.fastabend@gmail.com sdf@google.com andrii@kernel.org jolsa@kernel.org haoluo@google.com ast@kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Anton Protopopov May 22, 2023, 3:45 p.m. UTC
The LRU and LRU_PERCPU maps allocate a new element on update before locking the
target hash table bucket. Right after that the maps try to lock the bucket.
If this fails, the update returns -EBUSY to the caller without releasing the
allocated element. This leaves the element untracked: it belongs neither to
any free list nor to the hash table, so it can't be re-used; this eventually
leads to a permanent -ENOMEM on LRU map updates, which is unexpected. Fix this
by returning the element to the local free list if bucket locking fails.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 kernel/bpf/hashtab.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
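
To make the leak concrete, here is a simplified sketch of the update path
described in the commit message (condensed from kernel/bpf/hashtab.c; error
handling and unrelated details are omitted):

	/* htab_lru_map_update_elem(), simplified, before the fix */
	l_new = prealloc_lru_pop(htab, key, hash); /* element taken off a free list */
	if (!l_new)
		return -ENOMEM;

	ret = htab_lock_bucket(htab, b, hash, &flags);
	if (ret)
		return ret; /* leak: l_new is now on neither a free list
			     * nor in the hash table */

	/* ... insert l_new into the bucket, or fail and set ret ... */
err:
	htab_unlock_bucket(htab, b, hash, flags);
	if (ret)
		htab_lru_push_free(htab, l_new); /* cleanup the early return skips */

The patch routes the lock failure to this same cleanup through a new
err_lock_bucket label placed right after htab_unlock_bucket().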

Comments

Anton Protopopov May 22, 2023, 4:12 p.m. UTC | #1
On Mon, May 22, 2023 at 03:45:58PM +0000, Anton Protopopov wrote:
> The LRU and LRU_PERCPU maps allocate a new element on update before locking the
> target hash table bucket. Right after that the maps try to lock the bucket.
> If this fails, the update returns -EBUSY to the caller without releasing the
> allocated element. This leaves the element untracked: it belongs neither to
> any free list nor to the hash table, so it can't be re-used; this eventually
> leads to a permanent -ENOMEM on LRU map updates, which is unexpected. Fix this
> by returning the element to the local free list if bucket locking fails.

Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")

> Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
> ---
>  kernel/bpf/hashtab.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 00c253b84bf5..9901efee4339 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1215,7 +1215,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
> 
>  	ret = htab_lock_bucket(htab, b, hash, &flags);
>  	if (ret)
> -		return ret;
> +		goto err_lock_bucket;
>  
>  	l_old = lookup_elem_raw(head, hash, key, key_size);
>  
> @@ -1236,6 +1236,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
>  err:
>  	htab_unlock_bucket(htab, b, hash, flags);
>  
> +err_lock_bucket:
>  	if (ret)
>  		htab_lru_push_free(htab, l_new);
>  	else if (l_old)
> @@ -1338,7 +1339,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
>  
>  	ret = htab_lock_bucket(htab, b, hash, &flags);
>  	if (ret)
> -		return ret;
> +		goto err_lock_bucket;
>  
>  	l_old = lookup_elem_raw(head, hash, key, key_size);
>  
> @@ -1361,6 +1362,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
>  	ret = 0;
>  err:
>  	htab_unlock_bucket(htab, b, hash, flags);
> +err_lock_bucket:
>  	if (l_new)
>  		bpf_lru_push_free(&htab->lru, &l_new->lru_node);
>  	return ret;
> -- 
> 2.34.1
>
Martin KaFai Lau May 22, 2023, 5:33 p.m. UTC | #2
On 5/22/23 9:12 AM, Anton Protopopov wrote:
> On Mon, May 22, 2023 at 03:45:58PM +0000, Anton Protopopov wrote:
>> The LRU and LRU_PERCPU maps allocate a new element on update before locking the
>> target hash table bucket. Right after that the maps try to lock the bucket.
>> If this fails, the update returns -EBUSY to the caller without releasing the
>> allocated element. This leaves the element untracked: it belongs neither to
>> any free list nor to the hash table, so it can't be re-used; this eventually
>> leads to a permanent -ENOMEM on LRU map updates, which is unexpected.

Ouch. This is very bad. :(

Excellent catch. Applied.

I am wondering whether a test could be written, but it does not seem like
anything after htab_lock_bucket() is traceable. Not sure if Song has a good idea?

>> Fix this by returning the element to the local free list if bucket locking fails.
> Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
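
One possible direction for such a test, sketched below purely as an idea and
not as a working selftest: htab_lock_bucket() returns -EBUSY only when the
bucket's per-CPU map_locked counter shows the bucket range is already held on
the same CPU, so the update has to be re-entered from NMI-like context, for
example by a perf_event BPF program interrupting a syscall-side update of the
same map. All names and sizes below are made up:

	/* Hypothetical BPF-side sketch; assumes a user-space driver pinned
	 * to one CPU that attaches this program to a high-frequency perf
	 * event and calls bpf_map_update_elem() on the same map in a loop.
	 */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_LRU_HASH);
		__uint(max_entries, 128);
		__type(key, __u32);
		__type(value, __u64);
	} lru_map SEC(".maps");

	SEC("perf_event")
	int lru_busy_repro(void *ctx)
	{
		__u32 key = bpf_get_prandom_u32(); /* spread keys across buckets */
		__u64 val = 1;

		/* May hit the -EBUSY path when it interrupts an update of the
		 * same bucket on this CPU; before the fix, every such failure
		 * leaked one preallocated element.
		 */
		bpf_map_update_elem(&lru_map, &key, &val, BPF_ANY);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";

The check would then be that updates keep succeeding (or at worst return
-EBUSY) rather than degrading into a permanent -ENOMEM while the map is not
full.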
>

Patch

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..9901efee4339 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1215,7 +1215,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value

 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
-		return ret;
+		goto err_lock_bucket;
 
 	l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1236,6 +1236,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 err:
 	htab_unlock_bucket(htab, b, hash, flags);
 
+err_lock_bucket:
 	if (ret)
 		htab_lru_push_free(htab, l_new);
 	else if (l_old)
@@ -1338,7 +1339,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
-		return ret;
+		goto err_lock_bucket;
 
 	l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1361,6 +1362,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	ret = 0;
 err:
 	htab_unlock_bucket(htab, b, hash, flags);
+err_lock_bucket:
 	if (l_new)
 		bpf_lru_push_free(&htab->lru, &l_new->lru_node);
 	return ret;