Message ID | Z6JXtA1M5jAZx8xD@debian.debian (mailing list archive) |
---|---|
State | New |
Series | handling EINTR from bpf_map_lookup_batch |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Hi,

On 2/5/2025 2:08 AM, Yan Zhai wrote:
> I am getting EINTR when trying to use bpf_map_lookup_batch on an
> array_of_maps. The error happens when there is a "hole" in the array.
> For example, say the outer map has max entries of 256, each inner map
> is used for a transport protocol, and I only populated key 6 and
> 17 for TCP and UDP. Then when I do batch lookup, I always get EINTR.
> This so far seems to only happen with array of maps. Does it make
> sense to allow skipping to the next key for this map type? Something
> like:
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..83915a8059ef 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2027,6 +2027,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  					 attr->batch.elem_flags);
>
>  		if (err == -ENOENT) {
> +			if (IS_FD_ARRAY(map))
> +				goto next_key;

It seems only BPF_MAP_TYPE_ARRAY_OF_MAPS supports batched operation, so
map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS will be enough. It is also
better to reset err to 0, otherwise generic_map_lookup_batch may return
-ENOENT.

>  			if (retry) {
>  				retry--;
>  				continue;
> @@ -2048,6 +2050,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  			goto free_buf;
>  		}
>
> +next_key:
>  		if (!prev_key)
>  			prev_key = buf_prevkey;
>

Makes sense. Please add a selftest for it. Another way is to return id 0
for these non-existent values in the fd array, but it may break existing
programs. Just skipping the empty array slot is better.

> Also the context about my scenario if anyone is curious: I am trying
> to associate each map to a userspace service in a multi tenant
> environment. This is an addition to cgroup accounting, in case the
> creator cgroup goes away, e.g. systemd service restarts always
> recreate cgroups. And we also want to monitor the utilization level of
> non-prealloc maps of different tenants. When dealing with inner maps,
> it is not always trivial. To connect dots I choose to read these IDs
> periodically and link them to the tenant of the outer map, that's
> where this EINTR occurred.
>
> best
> Yan
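
The behaviour under discussion can be reproduced in a small userspace model. The sketch below is not kernel code: `toy_fd_array`, `toy_lookup`, `toy_lookup_batch` and the `skip_holes` switch are hypothetical names, and slot value 0 is assumed to mean "no inner map". It only illustrates how a retry-then-EINTR loop gets stuck on the first hole, and how skipping the hole while resetting err lets the walk reach keys 6 and 17.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 256
#define TOY_LOOKUP_RETRIES 3

/* Hypothetical stand-in for an array of maps: id 0 means "empty slot". */
struct toy_fd_array {
	uint32_t inner_id[TOY_MAX_ENTRIES];
};

/* Per-key lookup: -ENOENT for an empty slot, 0 on success. */
static int toy_lookup(const struct toy_fd_array *map, uint32_t key,
		      uint32_t *id)
{
	if (key >= TOY_MAX_ENTRIES)
		return -E2BIG;
	if (map->inner_id[key] == 0)
		return -ENOENT;
	*id = map->inner_id[key];
	return 0;
}

/*
 * Walk the array from key 0 and copy up to max_count (key, id) pairs.
 * skip_holes == 0: an empty slot is retried, then reported as -EINTR.
 * skip_holes != 0: an empty slot is skipped and err is reset to 0.
 * Returns 0 or a negative errno; *count receives the number of pairs copied.
 */
static int toy_lookup_batch(const struct toy_fd_array *map, int skip_holes,
			    uint32_t *keys, uint32_t *ids, uint32_t max_count,
			    uint32_t *count)
{
	uint32_t key = 0, cp = 0, id = 0;
	int retry = TOY_LOOKUP_RETRIES;
	int err = 0;

	while (cp < max_count && key < TOY_MAX_ENTRIES) {
		err = toy_lookup(map, key, &id);
		if (err == -ENOENT) {
			if (skip_holes) {
				err = 0;	/* do not leak -ENOENT to the caller */
				key++;		/* advance past the hole */
				continue;
			}
			if (retry) {
				retry--;	/* same key again; a hole never fills */
				continue;
			}
			err = -EINTR;
			break;
		}
		if (err)
			break;

		keys[cp] = key;
		ids[cp] = id;
		cp++;
		key++;
		retry = TOY_LOOKUP_RETRIES;
	}

	*count = cp;
	return err;
}
```

With only keys 6 and 17 populated, the non-skipping mode gives up on key 0 with -EINTR and copies nothing, while the skipping mode returns both pairs. The model deliberately ignores how the real syscall signals the end of the map.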
On Wed, Feb 5, 2025 at 2:19 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 2/5/2025 2:08 AM, Yan Zhai wrote:
> > I am getting EINTR when trying to use bpf_map_lookup_batch on an
> > array_of_maps. The error happens when there is a "hole" in the array.
> > For example, say the outer map has max entries of 256, each inner map
> > is used for a transport protocol, and I only populated key 6 and
> > 17 for TCP and UDP. Then when I do batch lookup, I always get EINTR.
> > This so far seems to only happen with array of maps. Does it make
> > sense to allow skipping to the next key for this map type? Something
> > like:
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index c420edbfb7c8..83915a8059ef 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2027,6 +2027,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
> >  					 attr->batch.elem_flags);
> >
> >  		if (err == -ENOENT) {
> > +			if (IS_FD_ARRAY(map))
> > +				goto next_key;
>
> It seems only BPF_MAP_TYPE_ARRAY_OF_MAPS supports batched operation, so
> map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS will be enough. It is also
> better to reset err to 0, otherwise generic_map_lookup_batch may return
> -ENOENT.
>
> >  			if (retry) {
> >  				retry--;
> >  				continue;
> > @@ -2048,6 +2050,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
> >  			goto free_buf;
> >  		}
> >
> > +next_key:
> >  		if (!prev_key)
> >  			prev_key = buf_prevkey;
> >
>
> Makes sense. Please add a selftest for it. Another way is to return id 0
> for these non-existent values in the fd array, but it may break existing
> programs. Just skipping the empty array slot is better.

Let's not invent new magic return values.

But stepping back... why do we have this EINTR case at all?
Can we always goto next_key for all map types?
The command returns a set of (key, value) pairs.
It's always better to skip than to get stuck in EINTR,
since EINTR implies that user space should retry and
that the retry might succeed next time.
Here that is not the case.
I don't see any selftests for EINTR, so I suspect it was added
as an escape path in case the retry count exceeds 3, and the author
assumed that this would never happen in practice, so EINTR was
expected to be 'never happens'. Clearly that's not the case.
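
The point about EINTR can be illustrated from the caller's side. The sketch below is a userspace model with hypothetical names (`retry_on_eintr`, `transient_attempt`, `permanent_hole_attempt`); it makes no call into the real BPF syscall. It assumes a caller that follows the usual EINTR convention and retries a bounded number of times: a transient interruption eventually clears, whereas a permanent hole fails identically on every attempt.

```c
#include <errno.h>

/* Hypothetical callback: one batch-lookup attempt, 0 or a negative errno. */
typedef int (*attempt_fn)(void *ctx);

/* Retry while the attempt reports -EINTR, up to max_attempts tries in total. */
static int retry_on_eintr(attempt_fn attempt, void *ctx, int max_attempts,
			  int *attempts_made)
{
	int err = -EINTR;
	int n = 0;

	while (n < max_attempts && err == -EINTR) {
		err = attempt(ctx);
		n++;
	}
	*attempts_made = n;
	return err;
}

/* Transient interruption: fails a fixed number of times, then succeeds. */
struct transient_ctx {
	int failures_left;
};

static int transient_attempt(void *ctx)
{
	struct transient_ctx *t = ctx;

	if (t->failures_left > 0) {
		t->failures_left--;
		return -EINTR;
	}
	return 0;
}

/* Permanent hole: nothing changes between attempts, so every try fails. */
static int permanent_hole_attempt(void *ctx)
{
	(void)ctx;
	return -EINTR;
}
```

In this model a caller with two transient failures succeeds on the third attempt, while the permanent-hole case exhausts all of its attempts and still ends with -EINTR, which is why skipping the key is the more useful behaviour for the caller.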
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c420edbfb7c8..83915a8059ef 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2027,6 +2027,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
 					 attr->batch.elem_flags);
 
 		if (err == -ENOENT) {
+			if (IS_FD_ARRAY(map))
+				goto next_key;
 			if (retry) {
 				retry--;
 				continue;
@@ -2048,6 +2050,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 			goto free_buf;
 		}
 
+next_key:
 		if (!prev_key)
 			prev_key = buf_prevkey;
 