diff mbox series

[bpf-next] RFC: bpf hashmap - improve iteration in the presence of concurrent delete operations

Message ID 20220201183514.3235495-1-zenczykowski@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series [bpf-next] RFC: bpf hashmap - improve iteration in the presence of concurrent delete operations | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 13 this patch: 13
netdev/cc_maintainers warning 6 maintainers not CCed: andrii@kernel.org kpsingh@kernel.org john.fastabend@gmail.com kafai@fb.com songliubraving@fb.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff fail author Signed-off-by missing
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next fail VM_Test
bpf/vmtest-bpf-next-PR fail PR summary

Commit Message

Maciej Żenczykowski Feb. 1, 2022, 6:35 p.m. UTC
From: Maciej Żenczykowski <maze@google.com>

by resuming from the bucket the key would be in if it still existed

Note: AFAICT this an API change that would require some sort of guard...

bpf map iteration was added all the way back in v3.18-rc4-939-g0f8e4bd8a1fc
but behaviour of find first key with NULL was only done in v4.11-rc7-2042-g8fe45924387b
(before that you'd get EFAULT)

this means previously find first key was done by calling with a non existing key
(AFAICT this was hoping to get lucky, since if your non existing key actually existed,
it wouldn't actually iterate right, since you couldn't guarantee your magic value was
the first one)

ie. do we need a BPF_MAP_GET_NEXT_KEY2 or a sysctl? some other approach?

Not-Signed-off-by-Yet: Maciej Żenczykowski <maze@google.com>
---
 kernel/bpf/hashtab.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Martin KaFai Lau Feb. 1, 2022, 9:43 p.m. UTC | #1
On Tue, Feb 01, 2022 at 10:35:14AM -0800, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> by resuming from the bucket the key would be in if it still existed
> 
> Note: AFAICT this an API change that would require some sort of guard...
> 
> bpf map iteration was added all the way back in v3.18-rc4-939-g0f8e4bd8a1fc
> but behaviour of find first key with NULL was only done in v4.11-rc7-2042-g8fe45924387b
> (before that you'd get EFAULT)
> 
> this means previously find first key was done by calling with a non existing key
> (AFAICT this was hoping to get lucky, since if your non existing key actually existed,
> it wouldn't actually iterate right, since you couldn't guarantee your magic value was
> the first one)
> 
> ie. do we need a BPF_MAP_GET_NEXT_KEY2 or a sysctl? some other approach?
BPF_MAP_LOOKUP_BATCH has already been added to lookup in batches of buckets,
so the next bpf_map_lookup_batch will start from the beginning of
the next bucket.  Here is the details:

https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com/#t
diff mbox series

Patch

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d29af9988f37..0d61a9a54279 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -780,7 +780,7 @@  static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	key_size = map->key_size;
 
 	if (!key)
-		goto find_first_elem;
+		goto find_first_elem_this_bucket;
 
 	hash = htab_map_hash(key, key_size, htab->hashrnd);
 
@@ -790,7 +790,7 @@  static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
 
 	if (!l)
-		goto find_first_elem;
+		goto find_first_elem_next_bucket;
 
 	/* key was found, get next key in the same bucket */
 	next_l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_next_rcu(&l->hash_node)),
@@ -802,11 +802,12 @@  static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 		return 0;
 	}
 
+find_first_elem_next_bucket:
 	/* no more elements in this hash list, go to the next bucket */
 	i = hash & (htab->n_buckets - 1);
 	i++;
 
-find_first_elem:
+find_first_elem_this_bucket:
 	/* iterate over buckets */
 	for (; i < htab->n_buckets; i++) {
 		head = select_bucket(htab, i);