Message ID: 20230622095330.1023453-1-aspsk@isovalent.com (mailing list archive)
Series: bpf: add percpu stats for bpf_map
On 6/22/23 11:53 AM, Anton Protopopov wrote:
> This series adds a mechanism for maps to populate per-cpu counters of
> elements on insertions/deletions. The sum of these counters can be
> accessed by a new kfunc from a map iterator program.
>
> The following patches are present in the series:
>
> * Patch 1 adds a generic per-cpu counter to struct bpf_map
> * Patch 2 utilizes this mechanism for hash-based maps
> * Patch 3 extends the preloaded map iterator to dump the sum
> * Patch 4 adds a self-test for the change
>
> The reason for adding this functionality in our case (Cilium) is to get
> signals about how full some heavily used maps are and what the actual
> dynamic profile of map capacity is. In the case of LRU maps it is
> impossible to get this information in any other way. See also [1].
>
> This is a v2 for https://lore.kernel.org/bpf/20230531110511.64612-1-aspsk@isovalent.com/T/#t
> It was rewritten according to previous comments. I've turned this
> series into an RFC for two reasons:
>
> 1) This patch only works on systems where this_cpu_{inc,dec} is atomic
> for s64. Systems which might write s64 non-atomically would require
> some locking mechanism to prevent readers from reading garbage via the
> bpf_map_sum_elements_counter() kfunc (see patch 1).
>
> 2) In comparison with v1, we're adding extra instructions per map
> operation (for preallocated maps as well as for non-preallocated maps).
> The only functionality we're interested in at the moment is the number
> of elements present in a map, not per-cpu statistics. This could be
> better achieved by using the v1 version, which only adds computations
> for preallocated maps.
>
> So, the question is: would it be fine to do the changes in the
> following way:
>
> * extend the preallocated hash maps to populate percpu batch counters
>   as in v1
> * add a kfunc as in v2 to get the current sum
>
> This works because:
>
> * nobody at the moment actually requires the per-cpu statistics
> * this implementation can be transparently turned into per-cpu
>   statistics if such a need occurs in practice (the only thing to
>   change would be to re-implement the kfunc and, maybe, add more
>   kfuncs to get per-cpu stats)
> * the "v1 way" is the least intrusive: it only affects preallocated
>   maps, as other maps already provide the required functionality
>
> [1] https://lpc.events/event/16/contributions/1368/
>
> v1 -> v2:
> - make the counters a generic part of struct bpf_map
> - don't use map_info and /proc/self/fdinfo in favor of a kfunc

Tbh, I did like the v1 approach a bit better. We are trying to bend
over backwards just so that we don't add things to uapi, but in the end
we are also adding it to the maps.debug, etc (yes, it has .debug in the
name and all) ...

> for maps can be extended to print the element count, so the user can
> have a convenient 'cat /sys/fs/bpf/maps.debug' way to debug maps.

... I feel that with convenient access, some folks in the wild might
end up building apps that parse it instead of writing iterators.

I would be curious to hear from Google and Meta folks how you observe
map fill-up or other useful map-related stats in production (unless you
don't watch out for this?), and whether there is a common denominator
we could come up with that would be really useful for all parties and
thus potentially make sense for fdinfo, or as an extensible
bpf_map_info stats extension in case there is some agreement?

In our case we don't really care about exact numbers, more about
whether a user is scaling up their k8s cluster to the point where the
previous sizing configuration, e.g. around LB mappings, would hit the
limits in the foreseeable future if they dial up further. This is
exported to dashboards (e.g. Grafana) for visibility, so operators can
know ahead of time. The iterator approach is okay, just less
convenient, but if there is indeed nothing in common from other
production users that has helped map visibility over the years, then so
be it. I thought to at least ask given we have this thread. :)

Thanks,
Daniel
On Fri, Jun 23, 2023 at 2:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 6/22/23 11:53 AM, Anton Protopopov wrote:
> > This series adds a mechanism for maps to populate per-cpu counters of
> > elements on insertions/deletions. The sum of these counters can be
> > accessed by a new kfunc from a map iterator program.
> > [...]
> > v1 -> v2:
> > - make the counters a generic part of struct bpf_map
> > - don't use map_info and /proc/self/fdinfo in favor of a kfunc
>
> Tbh, I did like the v1 approach a bit better. We are trying to bend
> over backwards just so that we don't add things to uapi, but in the
> end we are also adding it to the maps.debug, etc (yes, it has .debug
> in the name and all) ...

I think we should keep bending even more backwards to avoid uapi burden.
All new features should be non-uapi no matter how many people say "I'll
definitely use it".

> or as an extensible bpf_map_info stats extension in case there is
> some agreement?

I'd rather not. bpf_map_info returns what user space sent to the kernel
earlier. Stats, or anything else that user space didn't explicitly give
earlier, are quite different. Same goes for bpf_prog_info. We only added
verified_insns there, which doesn't fit the above definition, and it
was a mistake. After almost 2 years it is still unused and cannot be
removed.

veristat is parsing the non-uapi verifier log. Tooling can live with
non-uapi map stats just fine.
On 6/24/23 2:17 AM, Alexei Starovoitov wrote:
> On Fri, Jun 23, 2023 at 2:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 6/22/23 11:53 AM, Anton Protopopov wrote:
>>> This series adds a mechanism for maps to populate per-cpu counters of
>>> elements on insertions/deletions. The sum of these counters can be
>>> accessed by a new kfunc from a map iterator program.
>>> [...]
>>
>> Tbh, I did like the v1 approach a bit better. We are trying to bend
>> over backwards just so that we don't add things to uapi, but in the
>> end we are also adding it to the maps.debug, etc (yes, it has .debug
>> in the name and all) ...
>
> I think we should keep bending even more backwards to avoid uapi
> burden. All new features should be non-uapi no matter how many people
> say "I'll definitely use it".
>
>> or as an extensible bpf_map_info stats extension in case there is
>> some agreement?
>
> I'd rather not. bpf_map_info returns what user space sent to the
> kernel earlier. Stats, or anything else that user space didn't
> explicitly give earlier, are quite different. Same goes for
> bpf_prog_info. We only added verified_insns there, which doesn't fit
> the above definition, and it was a mistake. After almost 2 years it is
> still unused and cannot be removed.

Right, hence also my question on how others observe maps in production,
to see if there are use cases and approaches similar to ours.

> veristat is parsing the non-uapi verifier log. Tooling can live with
> non-uapi map stats just fine.

Sure, that's okay.

Thanks,
Daniel