Message ID | 20211027234504.30744-1-joannekoong@fb.com (mailing list archive) |
---|---|
Series | Implement bloom filter map |
On Wed, Oct 27, 2021 at 04:44:59PM -0700, Joanne Koong wrote:
> This patchset adds a new kind of bpf map: the bloom filter map.
> Bloom filters are a space-efficient probabilistic data structure
> used to quickly test whether an element exists in a set.
> For a brief overview about how bloom filters work,
> https://en.wikipedia.org/wiki/Bloom_filter
> may be helpful.
>
> One example use-case is an application leveraging a bloom filter
> map to determine whether a computationally expensive hashmap
> lookup can be avoided. If the element was not found in the bloom
> filter map, the hashmap lookup can be skipped.
>
> This patchset includes benchmarks for testing the performance of
> the bloom filter for different entry sizes and different number of
> hash functions used, as well as comparisons for hashmap lookups
> with vs. without the bloom filter.
>
> A high level overview of this patchset is as follows:
> 1/5 - kernel changes for adding bloom filter map
> 2/5 - libbpf changes for adding map_extra flags
> 3/5 - tests for the bloom filter map
> 4/5 - benchmarks for bloom filter lookup/update throughput and false positive
>       rate
> 5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom
>       filter
>
> v5 -> v6:
> * in 1/5: remove "inline" from the hash function, add check in syscall to
>   fail out in cases where map_extra is not 0 for non-bloom-filter maps,
>   fix alignment matching issues, move "map_extra flags" comments to inside
>   the bpf_attr struct, add bpf_map_info map_extra changes here, add map_extra
>   assignment in bpf_map_get_info_by_fd, change hash value_size to u32 instead of
>   a u64
> * in 2/5: remove bpf_map_info map_extra changes, remove TODO comment about
>   extending BTF arrays to cover u64s, cast to unsigned long long for %llx when
>   printing out map_extra flags
> * in 3/5: use __type(value, ...) instead of __uint(value_size, ...) for values
>   and keys
> * in 4/5: fix wrong bounds for the index when iterating through random values,
>   update commit message to include update+lookup benchmark results for 8-byte
>   and 64-byte value sizes, remove explicit global bool initialization to false
>   for hashmap_use_bloom and count_false_hits variables

Thanks! Only have minor comments in patch 1. belated
Acked-by: Martin KaFai Lau <kafai@fb.com>

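For readers following the thread, the use case from the cover letter (consult the bloom filter first, and skip the expensive hashmap lookup when the filter says the element is absent) can be sketched in plain userspace C. This is only an illustration under stated assumptions: it is not the kernel implementation from patch 1/5. The names (struct bloom, bloom_add, lookup_with_bloom), the FNV-1a hash, the filter size and the linear scan standing in for the hashmap are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative parameters only; the real map takes these from map attributes. */
#define BLOOM_BITS   1024u	/* number of bits, a power of two */
#define BLOOM_HASHES 3u		/* number of hash functions */

struct bloom {
	uint8_t bits[BLOOM_BITS / 8];
};

/* Seeded FNV-1a: a stand-in hash, not the one the kernel uses. */
static uint32_t bloom_hash(const void *data, size_t len, uint32_t seed)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h & (BLOOM_BITS - 1);
}

void bloom_init(struct bloom *b)
{
	assert(b);
	memset(b->bits, 0, sizeof(b->bits));
}

/* Set one bit per hash function. */
void bloom_add(struct bloom *b, const void *data, size_t len)
{
	for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
		uint32_t bit = bloom_hash(data, len, i);

		b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* false: definitely absent. true: possibly present (false positives happen). */
bool bloom_maybe_contains(const struct bloom *b, const void *data, size_t len)
{
	for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
		uint32_t bit = bloom_hash(data, len, i);

		if (!(b->bits[bit / 8] & (1u << (bit % 8))))
			return false;
	}
	return true;
}

/* Counts how often the expensive path runs. */
unsigned int slow_lookups;

/* Stand-in for the costly hashmap lookup: a linear scan over the stored keys. */
bool slow_lookup(const uint32_t *keys, size_t n, uint32_t key)
{
	slow_lookups++;
	for (size_t i = 0; i < n; i++)
		if (keys[i] == key)
			return true;
	return false;
}

/* Only fall through to the expensive lookup when the filter cannot rule the key out. */
bool lookup_with_bloom(const struct bloom *b, const uint32_t *keys, size_t n,
		       uint32_t key)
{
	if (!bloom_maybe_contains(b, &key, sizeof(key)))
		return false;
	return slow_lookup(keys, n, key);
}
```

A caller would add every stored key with bloom_add() and then call lookup_with_bloom() instead of the slow lookup directly. A key that was added is never rejected by the filter, while a key that was never added usually is, which is where the saving comes from.
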
On Thu, Oct 28, 2021 at 3:10 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Oct 27, 2021 at 04:44:59PM -0700, Joanne Koong wrote:
> > This patchset adds a new kind of bpf map: the bloom filter map.
> > [...]
> Thanks! Only have minor comments in patch 1. belated
> Acked-by: Martin KaFai Lau <kafai@fb.com>

Thanks for the detailed review and sorry for pushing too soon.
I force-pushed your Ack.

Joanne, pls follow up with fixes for patch 1 asap, so we get it cleaned up
before the merge window.

On 10/28/21 4:05 PM, Alexei Starovoitov wrote:
> On Thu, Oct 28, 2021 at 3:10 PM Martin KaFai Lau <kafai@fb.com> wrote:
>> On Wed, Oct 27, 2021 at 04:44:59PM -0700, Joanne Koong wrote:
>>> This patchset adds a new kind of bpf map: the bloom filter map.
>>> [...]
>> Thanks! Only have minor comments in patch 1. belated
>> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Thanks for the detailed review and sorry for pushing too soon.
> I force-pushed your Ack.
>
> Joanne, pls follow up with fixes for patch 1 asap, so we get it cleaned up
> before the merge window.

Should the fixes be in a new separate patchset or as v7 of this existing
patchset? Thanks.

On Thu, Oct 28, 2021 at 5:24 PM Joanne Koong <joannekoong@fb.com> wrote:
> > Thanks for the detailed review and sorry for pushing too soon.
> > I force-pushed your Ack.
> >
> > Joanne, pls follow up with fixes for patch 1 asap, so we get it cleaned up
> > before the merge window.
> Should the fixes be in a new separate patchset or as v7 of this existing
> patchset? Thanks.

The v6 was applied to bpf-next.
Pls send a fix as another patch.