
[v6,bpf-next,1/5] bpf: Add bloom filter map implementation

Message ID 20211027234504.30744-2-joannekoong@fb.com (mailing list archive)
State Accepted
Delegated to: BPF
Series Implement bloom filter map

Checks

Context Check Description
netdev/cover_letter success Series has a cover letter
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 11 maintainers not CCed: john.fastabend@gmail.com revest@chromium.org yhs@fb.com jackmanb@google.com songliubraving@fb.com netdev@vger.kernel.org kpsingh@kernel.org davemarchevsky@fb.com brouer@redhat.com joe@cilium.io liuhangbin@gmail.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 11790 this patch: 11790
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_allmodconfig_warn success Errors and warnings before: 11421 this patch: 11421
netdev/header_inline success No static functions without inline keyword in header files
bpf/vmtest-bpf-next fail VM_Test
bpf/vmtest-bpf-next-PR fail PR summary

Commit Message

Joanne Koong Oct. 27, 2021, 11:45 p.m. UTC
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.

The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.
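
As an illustration, a BPF program could use the map as follows (a minimal
sketch; the map name, value type, and attach point are hypothetical, and
setting map_extra in the map definition assumes the libbpf support added
later in this series):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
	__uint(value_size, sizeof(__u32));
	__uint(max_entries, 1000);
	__uint(map_extra, 3);	/* lower 4 bits: number of hash functions */
} bloom_map SEC(".maps");

SEC("fentry/__x64_sys_getpgid")
int query_bloom(void *ctx)
{
	__u32 val = 42;

	/* push: sets the bits for each hash of val */
	bpf_map_push_elem(&bloom_map, &val, BPF_ANY);

	/* peek: returns 0 if val is (probably) present, -ENOENT if not */
	if (bpf_map_peek_elem(&bloom_map, &val) == 0) {
		/* val may be in the set; false positives are possible */
	}
	return 0;
}

char _license[] SEC("license") = "GPL";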

For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
as well, in the syscall layer, we need to copy over the user value
so that in bpf_map_peek_elem, we know which specific value to query.
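
From user space, the corresponding calls could look like this (a sketch
using libbpf's syscall wrappers from <bpf/bpf.h>; map_fd and the value
are illustrative):

	__u32 val = 42;
	int err;

	/* BPF_MAP_UPDATE_ELEM -> push: NULL key, the element as the value */
	err = bpf_map_update_elem(map_fd, NULL, &val, BPF_ANY);

	/* BPF_MAP_LOOKUP_ELEM -> peek: the queried value is copied in first */
	err = bpf_map_lookup_elem(map_fd, NULL, &val);
	/* err == 0: probably present; err != 0 with errno ENOENT: absent */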

A few things to please take note of:
 * If there are any concurrent lookups + updates, the user is
responsible for synchronizing them to ensure no false negative lookups
occur.
 * The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of using different numbers of hashes on different entry
sizes. In general, using more hashes decreases both the false positive
rate and the speed of a lookup (see the creation sketch after this
list).
 * Deleting an element in the bloom filter map is not supported.
 * The bloom filter map may be used as an inner map.
 * The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.
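
For example, creating the map directly via the bpf(2) syscall could look
like this (a sketch; the sizes are arbitrary and error handling is
elided):

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/bpf.h>

	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_BLOOM_FILTER,
		.key_size = 0,		/* bloom filter maps are keyless */
		.value_size = sizeof(__u32),
		.max_entries = 1000,	/* sizing hint, not a hard cap */
		.map_extra = 3,		/* lower 4 bits: nr of hash functions */
	};
	int map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

With these numbers, the kernel sizes the bit array as 1000 * 3 * 7 / 5 =
4200 bits, rounded up to the next power of two (8192 bits, i.e. 1 KiB).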

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/linux/bpf.h            |   1 +
 include/linux/bpf_types.h      |   1 +
 include/uapi/linux/bpf.h       |   9 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bloom_filter.c      | 195 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  24 +++-
 kernel/bpf/verifier.c          |  19 +++-
 tools/include/uapi/linux/bpf.h |   9 ++
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100644 kernel/bpf/bloom_filter.c

Comments

Andrii Nakryiko Oct. 28, 2021, 6:15 p.m. UTC | #1
On Wed, Oct 27, 2021 at 4:45 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds the kernel-side changes for the implementation of
> a bpf bloom filter map.
>
> The bloom filter map supports peek (determining whether an element
> is present in the map) and push (adding an element to the map)
> operations. These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
>
> BPF_MAP_LOOKUP_ELEM -> peek
> BPF_MAP_UPDATE_ELEM -> push
>
> The bloom filter map does not have keys, only values. In light of
> this, the bloom filter map's API matches that of queue stack maps:
> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> APIs to query or add an element to the bloom filter map. When the
> bloom filter map is created, it must be created with a key_size of 0.
>
> For updates, the user will pass in the element to add to the map
> as the value, with a NULL key. For lookups, the user will pass in the
> element to query in the map as the value, with a NULL key. In the
> verifier layer, this requires us to modify the argument type of
> a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
> as well, in the syscall layer, we need to copy over the user value
> so that in bpf_map_peek_elem, we know which specific value to query.
>
> A few things to please take note of:
>  * If there are any concurrent lookups + updates, the user is
> responsible for synchronizing this to ensure no false negative lookups
> occur.
>  * The number of hashes to use for the bloom filter is configurable from
> userspace. If no number is specified, the default used will be 5 hash
> functions. The benchmarks later in this patchset can help compare the
> performance of using different number of hashes on different entry
> sizes. In general, using more hashes decreases both the false positive
> rate and the speed of a lookup.
>  * Deleting an element in the bloom filter map is not supported.
>  * The bloom filter map may be used as an inner map.
>  * The "max_entries" size that is specified at map creation time is used
> to approximate a reasonable bitmap size for the bloom filter, and is not
> otherwise strictly enforced. If the user wishes to insert more entries
> into the bloom filter than "max_entries", they may do so but they should
> be aware that this may lead to a higher false positive rate.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---

Don't forget to keep received Acks between revisions.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  include/linux/bpf.h            |   1 +
>  include/linux/bpf_types.h      |   1 +
>  include/uapi/linux/bpf.h       |   9 ++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/bloom_filter.c      | 195 +++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c           |  24 +++-
>  kernel/bpf/verifier.c          |  19 +++-
>  tools/include/uapi/linux/bpf.h |   9 ++
>  8 files changed, 253 insertions(+), 7 deletions(-)
>  create mode 100644 kernel/bpf/bloom_filter.c

[...]
Alexei Starovoitov Oct. 28, 2021, 8:35 p.m. UTC | #2
On Wed, Oct 27, 2021 at 4:45 PM Joanne Koong <joannekoong@fb.com> wrote:
> @@ -1080,6 +1089,14 @@ static int map_lookup_elem(union bpf_attr *attr)
>         if (!value)
>                 goto free_key;
>
> +       if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
> +               if (copy_from_user(value, uvalue, value_size))
> +                       err = -EFAULT;
> +               else
> +                       err = bpf_map_copy_value(map, key, value, attr->flags);
> +               goto free_value;
> +       }
> +

Applied to bpf-next.
I couldn't find where lookup from user space is tested.
Please follow up with an extra test if that's the case.
Martin KaFai Lau Oct. 28, 2021, 9:14 p.m. UTC | #3
On Wed, Oct 27, 2021 at 04:45:00PM -0700, Joanne Koong wrote:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 31421c74ba08..50105e0b8fcc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -169,6 +169,7 @@ struct bpf_map {
The earlier context is copied here:

	struct bpf_map *inner_map_meta;
#ifdef CONFIG_SECURITY
        void *security;
#endif

>  	u32 value_size;
>  	u32 max_entries;
>  	u32 map_flags;
> +	u64 map_extra; /* any per-map-type extra fields */
There is a 4 byte hole before the new 'u64 map_extra'.  Try to move
it before map_flags.

Later in this struct, this existing comment also needs to be updated:
	/* 22 bytes hole */

>  	int spin_lock_off; /* >=0 valid offset, <0 error */
>  	int timer_off; /* >=0 valid offset, <0 error */
>  	u32 id;
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 9c81724e4b98..c4424ac2fa02 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>  #endif
>  BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
> +BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
>  
>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c10820037883..8bead4aa3ad0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -906,6 +906,7 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_RINGBUF,
>  	BPF_MAP_TYPE_INODE_STORAGE,
>  	BPF_MAP_TYPE_TASK_STORAGE,
> +	BPF_MAP_TYPE_BLOOM_FILTER,
>  };
>  
>  /* Note that tracing related programs such as
> @@ -1274,6 +1275,13 @@ union bpf_attr {
>  						   * struct stored as the
>  						   * map value
>  						   */
> +		/* Any per-map-type extra fields
> +		 *
> +		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> +		 * number of hash functions (if 0, the bloom filter will default
> +		 * to using 5 hash functions).
> +		 */
> +		__u64	map_extra;
>  	};
>  
>  	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> @@ -5638,6 +5646,7 @@ struct bpf_map_info {
>  	__u32 btf_id;
>  	__u32 btf_key_type_id;
>  	__u32 btf_value_type_id;
There is also a 4 byte hole here.  A "__u32 :32" is needed.
You can find details in 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications")

> +	__u64 map_extra;
>  } __attribute__((aligned(8)));
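
With the pad added, the tail of the struct would read (a sketch):

	__u32 btf_id;
	__u32 btf_key_type_id;
	__u32 btf_value_type_id;
	__u32 :32;	/* alignment pad */
	__u64 map_extra;
} __attribute__((aligned(8)));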

[ ... ]

> +static int peek_elem(struct bpf_map *map, void *value)
These generic map-ops names could be confusing in tracing and
in perf-report.  There was a 'bloom_filter_map_' prefix in the earlier version.
I could have missed something in the earlier discussion threads.
What was the reason for dropping the prefix?

> +{
> +	struct bpf_bloom_filter *bloom =
> +		container_of(map, struct bpf_bloom_filter, map);
> +	u32 i, h;
> +
> +	for (i = 0; i < bloom->nr_hash_funcs; i++) {
> +		h = hash(bloom, value, map->value_size, i);
> +		if (!test_bit(h, bloom->bitset))
> +			return -ENOENT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int push_elem(struct bpf_map *map, void *value, u64 flags)
> +{
> +	struct bpf_bloom_filter *bloom =
> +		container_of(map, struct bpf_bloom_filter, map);
> +	u32 i, h;
> +
> +	if (flags != BPF_ANY)
> +		return -EINVAL;
> +
> +	for (i = 0; i < bloom->nr_hash_funcs; i++) {
> +		h = hash(bloom, value, map->value_size, i);
> +		set_bit(h, bloom->bitset);
> +	}
> +
> +	return 0;
> +}
> +
> +static int pop_elem(struct bpf_map *map, void *value)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static struct bpf_map *map_alloc(union bpf_attr *attr)
> +{
> +	u32 bitset_bytes, bitset_mask, nr_hash_funcs, nr_bits;
> +	int numa_node = bpf_map_attr_numa_node(attr);
> +	struct bpf_bloom_filter *bloom;
> +
> +	if (!bpf_capable())
> +		return ERR_PTR(-EPERM);
> +
> +	if (attr->key_size != 0 || attr->value_size == 0 ||
> +	    attr->max_entries == 0 ||
> +	    attr->map_flags & ~BLOOM_CREATE_FLAG_MASK ||
> +	    !bpf_map_flags_access_ok(attr->map_flags) ||
> +	    (attr->map_extra & ~0xF))
> +		return ERR_PTR(-EINVAL);
> +
> +	/* The lower 4 bits of map_extra specify the number of hash functions */
> +	nr_hash_funcs = attr->map_extra & 0xF;
nit. "& 0xF" is unnecessary.  It has just been tested immediately above.

> +	if (nr_hash_funcs == 0)
> +		/* Default to using 5 hash functions if unspecified */
> +		nr_hash_funcs = 5;
> +
> +	/* For the bloom filter, the optimal bit array size that minimizes the
> +	 * false positive probability is n * k / ln(2) where n is the number of
> +	 * expected entries in the bloom filter and k is the number of hash
> +	 * functions. We use 7 / 5 to approximate 1 / ln(2).
> +	 *
> +	 * We round this up to the nearest power of two to enable more efficient
> +	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
> +	 *
> +	 * If this overflows a u32, the bit array size will have 2^32 (4
> +	 * GB) bits.
> +	 */
> +	if (check_mul_overflow(attr->max_entries, nr_hash_funcs, &nr_bits) ||
> +	    check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits) ||
> +	    nr_bits > (1UL << 31)) {
> +		/* The bit array size is 2^32 bits but to avoid overflowing the
> +		 * u32, we use U32_MAX, which will round up to the equivalent
> +		 * number of bytes
> +		 */
> +		bitset_bytes = BITS_TO_BYTES(U32_MAX);
> +		bitset_mask = U32_MAX;
> +	} else {
> +		if (nr_bits <= BITS_PER_LONG)
> +			nr_bits = BITS_PER_LONG;
> +		else
> +			nr_bits = roundup_pow_of_two(nr_bits);
> +		bitset_bytes = BITS_TO_BYTES(nr_bits);
> +		bitset_mask = nr_bits - 1;
> +	}
> +
> +	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
> +	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
> +
> +	if (!bloom)
> +		return ERR_PTR(-ENOMEM);
> +
> +	bpf_map_init_from_attr(&bloom->map, attr);
> +
> +	bloom->nr_hash_funcs = nr_hash_funcs;
> +	bloom->bitset_mask = bitset_mask;
> +
> +	/* Check whether the value size is u32-aligned */
> +	if ((attr->value_size & (sizeof(u32) - 1)) == 0)
> +		bloom->aligned_u32_count =
> +			attr->value_size / sizeof(u32);
> +
> +	if (!(attr->map_flags & BPF_F_ZERO_SEED))
> +		bloom->hash_seed = get_random_int();
> +
> +	return &bloom->map;
> +}
Joanne Koong Oct. 29, 2021, 12:15 a.m. UTC | #4
On 10/28/21 11:15 AM, Andrii Nakryiko wrote:

> On Wed, Oct 27, 2021 at 4:45 PM Joanne Koong <joannekoong@fb.com> wrote:
>> This patch adds the kernel-side changes for the implementation of
>> a bpf bloom filter map.
>>
>> The bloom filter map supports peek (determining whether an element
>> is present in the map) and push (adding an element to the map)
>> operations. These operations are exposed to userspace applications
>> through the already existing syscalls in the following way:
>>
>> BPF_MAP_LOOKUP_ELEM -> peek
>> BPF_MAP_UPDATE_ELEM -> push
>>
>> The bloom filter map does not have keys, only values. In light of
>> this, the bloom filter map's API matches that of queue stack maps:
>> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
>> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
>> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
>> APIs to query or add an element to the bloom filter map. When the
>> bloom filter map is created, it must be created with a key_size of 0.
>>
>> For updates, the user will pass in the element to add to the map
>> as the value, with a NULL key. For lookups, the user will pass in the
>> element to query in the map as the value, with a NULL key. In the
>> verifier layer, this requires us to modify the argument type of
>> a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
>> as well, in the syscall layer, we need to copy over the user value
>> so that in bpf_map_peek_elem, we know which specific value to query.
>>
>> A few things to please take note of:
>>   * If there are any concurrent lookups + updates, the user is
>> responsible for synchronizing this to ensure no false negative lookups
>> occur.
>>   * The number of hashes to use for the bloom filter is configurable from
>> userspace. If no number is specified, the default used will be 5 hash
>> functions. The benchmarks later in this patchset can help compare the
>> performance of using different number of hashes on different entry
>> sizes. In general, using more hashes decreases both the false positive
>> rate and the speed of a lookup.
>>   * Deleting an element in the bloom filter map is not supported.
>>   * The bloom filter map may be used as an inner map.
>>   * The "max_entries" size that is specified at map creation time is used
>> to approximate a reasonable bitmap size for the bloom filter, and is not
>> otherwise strictly enforced. If the user wishes to insert more entries
>> into the bloom filter than "max_entries", they may do so but they should
>> be aware that this may lead to a higher false positive rate.
>>
>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>> ---
> Don't forget to keep received Acks between revisions.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
Can you elaborate a little on how to keep received Acks between revisions?

Should I copy and paste the "Acked-by: Andrii Nakryiko <andrii@kernel.org>"
line into the commit message for the patch? Or should this information be
in the subject line of the email for the patch? Or in the patchset series'
cover letter? Thanks!

>
>>   include/linux/bpf.h            |   1 +
>>   include/linux/bpf_types.h      |   1 +
>>   include/uapi/linux/bpf.h       |   9 ++
>>   kernel/bpf/Makefile            |   2 +-
>>   kernel/bpf/bloom_filter.c      | 195 +++++++++++++++++++++++++++++++++
>>   kernel/bpf/syscall.c           |  24 +++-
>>   kernel/bpf/verifier.c          |  19 +++-
>>   tools/include/uapi/linux/bpf.h |   9 ++
>>   8 files changed, 253 insertions(+), 7 deletions(-)
>>   create mode 100644 kernel/bpf/bloom_filter.c
> [...]
Andrii Nakryiko Oct. 29, 2021, 12:44 a.m. UTC | #5
On Thu, Oct 28, 2021 at 5:15 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 10/28/21 11:15 AM, Andrii Nakryiko wrote:
>
> > On Wed, Oct 27, 2021 at 4:45 PM Joanne Koong <joannekoong@fb.com> wrote:
> >> This patch adds the kernel-side changes for the implementation of
> >> a bpf bloom filter map.
> >>
> >> The bloom filter map supports peek (determining whether an element
> >> is present in the map) and push (adding an element to the map)
> >> operations. These operations are exposed to userspace applications
> >> through the already existing syscalls in the following way:
> >>
> >> BPF_MAP_LOOKUP_ELEM -> peek
> >> BPF_MAP_UPDATE_ELEM -> push
> >>
> >> The bloom filter map does not have keys, only values. In light of
> >> this, the bloom filter map's API matches that of queue stack maps:
> >> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> >> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> >> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> >> APIs to query or add an element to the bloom filter map. When the
> >> bloom filter map is created, it must be created with a key_size of 0.
> >>
> >> For updates, the user will pass in the element to add to the map
> >> as the value, with a NULL key. For lookups, the user will pass in the
> >> element to query in the map as the value, with a NULL key. In the
> >> verifier layer, this requires us to modify the argument type of
> >> a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
> >> as well, in the syscall layer, we need to copy over the user value
> >> so that in bpf_map_peek_elem, we know which specific value to query.
> >>
> >> A few things to please take note of:
> >>   * If there are any concurrent lookups + updates, the user is
> >> responsible for synchronizing this to ensure no false negative lookups
> >> occur.
> >>   * The number of hashes to use for the bloom filter is configurable from
> >> userspace. If no number is specified, the default used will be 5 hash
> >> functions. The benchmarks later in this patchset can help compare the
> >> performance of using different number of hashes on different entry
> >> sizes. In general, using more hashes decreases both the false positive
> >> rate and the speed of a lookup.
> >>   * Deleting an element in the bloom filter map is not supported.
> >>   * The bloom filter map may be used as an inner map.
> >>   * The "max_entries" size that is specified at map creation time is used
> >> to approximate a reasonable bitmap size for the bloom filter, and is not
> >> otherwise strictly enforced. If the user wishes to insert more entries
> >> into the bloom filter than "max_entries", they may do so but they should
> >> be aware that this may lead to a higher false positive rate.
> >>
> >> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> >> ---
> > Don't forget to keep received Acks between revisions.
> >
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Can you elaborate a little on how to keep received Acks between revisions?
>
> Should I copy and paste the "Acked-by: Andrii Nakryiko <andrii@kernel.org>"

Yes, it's all manual. See [0] for an example.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20211028063501.2239335-9-memxor@gmail.com/
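
For example, the received tag is simply carried in the next revision's
trailer block, alongside the Signed-off-by:

	Signed-off-by: Joanne Koong <joannekoong@fb.com>
	Acked-by: Andrii Nakryiko <andrii@kernel.org>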

> line into the commit message for the patch? Or should this information be
> in the subject line of the email for the patch? Or in the patchset series'
> cover letter? Thanks!
>
> >
> >>   include/linux/bpf.h            |   1 +
> >>   include/linux/bpf_types.h      |   1 +
> >>   include/uapi/linux/bpf.h       |   9 ++
> >>   kernel/bpf/Makefile            |   2 +-
> >>   kernel/bpf/bloom_filter.c      | 195 +++++++++++++++++++++++++++++++++
> >>   kernel/bpf/syscall.c           |  24 +++-
> >>   kernel/bpf/verifier.c          |  19 +++-
> >>   tools/include/uapi/linux/bpf.h |   9 ++
> >>   8 files changed, 253 insertions(+), 7 deletions(-)
> >>   create mode 100644 kernel/bpf/bloom_filter.c
> > [...]
Joanne Koong Oct. 29, 2021, 3:17 a.m. UTC | #6
On 10/28/21 2:14 PM, Martin KaFai Lau wrote:

> On Wed, Oct 27, 2021 at 04:45:00PM -0700, Joanne Koong wrote:
[...]
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c10820037883..8bead4aa3ad0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -906,6 +906,7 @@ enum bpf_map_type {
>>   	BPF_MAP_TYPE_RINGBUF,
>>   	BPF_MAP_TYPE_INODE_STORAGE,
>>   	BPF_MAP_TYPE_TASK_STORAGE,
>> +	BPF_MAP_TYPE_BLOOM_FILTER,
>>   };
>>   
>>   /* Note that tracing related programs such as
>> @@ -1274,6 +1275,13 @@ union bpf_attr {
>>   						   * struct stored as the
>>   						   * map value
>>   						   */
>> +		/* Any per-map-type extra fields
>> +		 *
>> +		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
>> +		 * number of hash functions (if 0, the bloom filter will default
>> +		 * to using 5 hash functions).
>> +		 */
>> +		__u64	map_extra;
>>   	};
>>   
When I run pahole (on an x86-64 machine), I see that there's an 8 byte hole
right before map_extra in the "union bpf_attr" struct (above this paragraph).
It seems like this should be padded as well with a "__u64 :64;"? I will
add that in.
>>   	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
>> @@ -5638,6 +5646,7 @@ struct bpf_map_info {
>>   	__u32 btf_id;
>>   	__u32 btf_key_type_id;
>>   	__u32 btf_value_type_id;
> There is also a 4 byte hole here.  A "__u32 :32" is needed.
> You can find details in 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications")
>
>> +	__u64 map_extra;
>>   } __attribute__((aligned(8)));
> [ ... ]
>
>> +static int peek_elem(struct bpf_map *map, void *value)
> These generic map-ops names could be confusing in tracing and
> in perf-report.  There was a 'bloom_filter_map_' prefix in the earlier version.
> I could have missed something in the earlier discussion threads.
> What was the reason in dropping the prefix?
>
The reason I dropped the prefix was so that the function names would be
less verbose. Your point about it being confusing in tracing and in
perf-report makes a lot of sense - I will add it back in!

[...]
Martin KaFai Lau Oct. 29, 2021, 4:49 a.m. UTC | #7
On Thu, Oct 28, 2021 at 08:17:23PM -0700, Joanne Koong wrote:
> On 10/28/21 2:14 PM, Martin KaFai Lau wrote:
> 
> > On Wed, Oct 27, 2021 at 04:45:00PM -0700, Joanne Koong wrote:
> [...]
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index c10820037883..8bead4aa3ad0 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -906,6 +906,7 @@ enum bpf_map_type {
> > >   	BPF_MAP_TYPE_RINGBUF,
> > >   	BPF_MAP_TYPE_INODE_STORAGE,
> > >   	BPF_MAP_TYPE_TASK_STORAGE,
> > > +	BPF_MAP_TYPE_BLOOM_FILTER,
> > >   };
> > >   /* Note that tracing related programs such as
> > > @@ -1274,6 +1275,13 @@ union bpf_attr {
> > >   						   * struct stored as the
> > >   						   * map value
> > >   						   */
> > > +		/* Any per-map-type extra fields
> > > +		 *
> > > +		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> > > +		 * number of hash functions (if 0, the bloom filter will default
> > > +		 * to using 5 hash functions).
> > > +		 */
> > > +		__u64	map_extra;
> > >   	};
> When I run pahole (on an x86-64 machine), I see that there's an 8 byte hole
> right before map_extra in the "union bpf_attr" struct (above this
> paragraph).
> It seems like this should be padded as well with a "__u64 :64;"? I will add
> that in.
hmm... I don't see it.

pahole tools/lib/bpf/libbpf.a:

union bpf_attr {
	struct {
		__u32              map_type;           /*     0     4 */
		__u32              key_size;           /*     4     4 */
		__u32              value_size;         /*     8     4 */
		__u32              max_entries;        /*    12     4 */
		__u32              map_flags;          /*    16     4 */
		__u32              inner_map_fd;       /*    20     4 */
		__u32              numa_node;          /*    24     4 */
		char               map_name[16];       /*    28    16 */
		__u32              map_ifindex;        /*    44     4 */
		__u32              btf_fd;             /*    48     4 */
		__u32              btf_key_type_id;    /*    52     4 */
		__u32              btf_value_type_id;  /*    56     4 */
		__u32              btf_vmlinux_value_type_id; /*    60     4 */
		/* --- cacheline 1 boundary (64 bytes) --- */
		__u64              map_extra;          /*    64     8 */
	};                                             /*     0    72 */

or you meant another struct/union?

> > >   	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> > > @@ -5638,6 +5646,7 @@ struct bpf_map_info {
> > >   	__u32 btf_id;
> > >   	__u32 btf_key_type_id;
> > >   	__u32 btf_value_type_id;
> > There is also a 4 byte hole here.  A "__u32 :32" is needed.
> > You can find details in 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications")
> > 
> > > +	__u64 map_extra;
> > >   } __attribute__((aligned(8)));
Martin KaFai Lau Oct. 29, 2021, 6:40 a.m. UTC | #8
On Thu, Oct 28, 2021 at 10:52:22PM -0700, Joanne Koong wrote:
> On 10/28/21 9:49 PM, Martin KaFai Lau wrote:
> 
> > On Thu, Oct 28, 2021 at 08:17:23PM -0700, Joanne Koong wrote:
> > > On 10/28/21 2:14 PM, Martin KaFai Lau wrote:
> > > 
> > > On Wed, Oct 27, 2021 at 04:45:00PM -0700, Joanne Koong wrote:
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 31421c74ba08..50105e0b8fcc 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -169,6 +169,7 @@ struct bpf_map {
> > > The earlier context is copied here:
> > > 
> > > 	struct bpf_map *inner_map_meta;
> > > #ifdef CONFIG_SECURITY
> > >          void *security;
> > > #endif
> > > 
> > > >  	u32 value_size;
> > > >  	u32 max_entries;
> > > >  	u32 map_flags;
> > > > +	u64 map_extra; /* any per-map-type extra fields */
> > > There is a 4 byte hole before the new 'u64 map_extra'.  Try to move
> > > it before map_flags.
> 
> Manually resuscitating your previous comment back into this email ^.
> 
> After rebasing to the latest master, I'm not seeing a significant difference
> anymore with map_extra before/after map_flags. This is what I see when
> running "pahole vmlinux.o":
> 
> With map_extra AFTER map_flags:
> 
> struct bpf_map {
>         const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
>         struct bpf_map *           inner_map_meta;       /* 8     8 */
>         void *                     security;             /* 16     8 */
>         enum bpf_map_type          map_type;             /* 24     4 */
>         u32                        key_size;             /* 28     4 */
>         u32                        value_size;           /* 32     4 */
>         u32                        max_entries;          /* 36     4 */
>         u32                        map_flags;            /* 40     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         u64                        map_extra;            /* 48     8 */
>         int                        spin_lock_off;        /* 56     4 */
>         int                        timer_off;            /* 60     4 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         u32                        id;                   /* 64     4 */
>         int                        numa_node;            /* 68     4 */
>         u32                        btf_key_type_id;      /* 72     4 */
>         u32                        btf_value_type_id;    /* 76     4 */
>         struct btf *               btf;                  /* 80     8 */
>         struct mem_cgroup *        memcg;                /* 88     8 */
>         char                       name[16];             /* 96    16 */
>         u32                        btf_vmlinux_value_type_id; /*   112     4 */
>         bool                       bypass_spec_v1;       /* 116     1 */
>         bool                       frozen;               /* 117     1 */
> 
>         /* XXX 10 bytes hole, try to pack */
> 
>         /* --- cacheline 2 boundary (128 bytes) --- */
>         atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
>         atomic64_t                 usercnt;              /* 136     8 */
>         struct work_struct         work;                 /* 144    72 */
>         /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
>         struct mutex               freeze_mutex;         /* 216   144 */
>         /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
>         u64                        writecnt;             /* 360     8 */
> 
>         /* size: 384, cachelines: 6, members: 26 */
>         /* sum members: 354, holes: 2, sum holes: 14 */
>         /* padding: 16 */
>         /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */
> } __attribute__((__aligned__(64)));
> 
> 
> With map_extra BEFORE map_flags:
> 
> struct bpf_map {
>         const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
>         struct bpf_map *           inner_map_meta;       /* 8     8 */
>         void *                     security;             /* 16     8 */
>         enum bpf_map_type          map_type;             /* 24     4 */
>         u32                        key_size;             /* 28     4 */
>         u32                        value_size;           /* 32     4 */
>         u32                        max_entries;          /* 36     4 */
>         u64                        map_extra;            /* 40     8 */
>         u32                        map_flags;            /* 48     4 */
>         int                        spin_lock_off;        /* 52     4 */
>         int                        timer_off;            /* 56     4 */
>         u32                        id;                   /* 60     4 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         int                        numa_node;            /* 64     4 */
>         u32                        btf_key_type_id;      /* 68     4 */
>         u32                        btf_value_type_id;    /* 72     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         struct btf *               btf;                  /* 80     8 */
>         struct mem_cgroup *        memcg;                /* 88     8 */
>         char                       name[16];             /* 96    16 */
>         u32                        btf_vmlinux_value_type_id; /*   112     4 */
>         bool                       bypass_spec_v1;       /* 116     1 */
>         bool                       frozen;               /* 117     1 */
> 
>         /* XXX 10 bytes hole, try to pack */
> 
>         /* --- cacheline 2 boundary (128 bytes) --- */
>         atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
>         atomic64_t                 usercnt;              /* 136     8 */
>         struct work_struct         work;                 /* 144    72 */
>         /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
>         struct mutex               freeze_mutex;         /* 216   144 */
>         /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
>         u64                        writecnt;             /* 360     8 */
> 
>         /* size: 384, cachelines: 6, members: 26 */
>         /* sum members: 354, holes: 2, sum holes: 14 */
>         /* padding: 16 */
>         /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */
> } __attribute__((__aligned__(64)));
> 
> 
> The main difference is that the "id" field is part of the 2nd cacheline when
> "map_extra" is after "map_flags", and is part of the 1st cacheline when
> "map_extra" is before "map_flags".
> 
> Do you think it's still worth it to move "map_extra" to before "map_flags"?
It looks like there is an existing 4 byte hole.  I would take this chance
to plug it by using an existing 4 byte field.  Something like this:

diff --git i/include/linux/bpf.h w/include/linux/bpf.h
index 50105e0b8fcc..0e07c659acd4 100644
--- i/include/linux/bpf.h
+++ w/include/linux/bpf.h
@@ -169,22 +169,22 @@ struct bpf_map {
 	u32 value_size;
 	u32 max_entries;
 	u32 map_flags;
-	u64 map_extra; /* any per-map-type extra fields */
 	int spin_lock_off; /* >=0 valid offset, <0 error */
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
 	int numa_node;
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
+	u32 btf_vmlinux_value_type_id;
+	u64 map_extra; /* any per-map-type extra fields */
 	struct btf *btf;
 #ifdef CONFIG_MEMCG_KMEM
 	struct mem_cgroup *memcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 22 bytes hole */
+	/* 14 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 31421c74ba08..50105e0b8fcc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -169,6 +169,7 @@  struct bpf_map {
 	u32 value_size;
 	u32 max_entries;
 	u32 map_flags;
+	u64 map_extra; /* any per-map-type extra fields */
 	int spin_lock_off; /* >=0 valid offset, <0 error */
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..c4424ac2fa02 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -125,6 +125,7 @@  BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c10820037883..8bead4aa3ad0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@  enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1274,6 +1275,13 @@  union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		/* Any per-map-type extra fields
+		 *
+		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
+		 * number of hash functions (if 0, the bloom filter will default
+		 * to using 5 hash functions).
+		 */
+		__u64	map_extra;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -5638,6 +5646,7 @@  struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u64 map_extra;
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7f33098ca63f..cf6ca339f3cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -7,7 +7,7 @@  endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
new file mode 100644
index 000000000000..7c50232b7571
--- /dev/null
+++ b/kernel/bpf/bloom_filter.c
@@ -0,0 +1,195 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bitmap.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/random.h>
+
+#define BLOOM_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
+
+struct bpf_bloom_filter {
+	struct bpf_map map;
+	u32 bitset_mask;
+	u32 hash_seed;
+	/* If the size of the values in the bloom filter is u32 aligned,
+	 * then it is more performant to use jhash2 as the underlying hash
+	 * function, else we use jhash. This tracks the number of u32s
+	 * in an u32-aligned value size. If the value size is not u32 aligned,
+	 * this will be 0.
+	 */
+	u32 aligned_u32_count;
+	u32 nr_hash_funcs;
+	unsigned long bitset[];
+};
+
+static u32 hash(struct bpf_bloom_filter *bloom, void *value,
+		u32 value_size, u32 index)
+{
+	u32 h;
+
+	if (bloom->aligned_u32_count)
+		h = jhash2(value, bloom->aligned_u32_count,
+			   bloom->hash_seed + index);
+	else
+		h = jhash(value, value_size, bloom->hash_seed + index);
+
+	return h & bloom->bitset_mask;
+}
+
+static int peek_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 i, h;
+
+	for (i = 0; i < bloom->nr_hash_funcs; i++) {
+		h = hash(bloom, value, map->value_size, i);
+		if (!test_bit(h, bloom->bitset))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
+static int push_elem(struct bpf_map *map, void *value, u64 flags)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 i, h;
+
+	if (flags != BPF_ANY)
+		return -EINVAL;
+
+	for (i = 0; i < bloom->nr_hash_funcs; i++) {
+		h = hash(bloom, value, map->value_size, i);
+		set_bit(h, bloom->bitset);
+	}
+
+	return 0;
+}
+
+static int pop_elem(struct bpf_map *map, void *value)
+{
+	return -EOPNOTSUPP;
+}
+
+static struct bpf_map *map_alloc(union bpf_attr *attr)
+{
+	u32 bitset_bytes, bitset_mask, nr_hash_funcs, nr_bits;
+	int numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_bloom_filter *bloom;
+
+	if (!bpf_capable())
+		return ERR_PTR(-EPERM);
+
+	if (attr->key_size != 0 || attr->value_size == 0 ||
+	    attr->max_entries == 0 ||
+	    attr->map_flags & ~BLOOM_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags) ||
+	    (attr->map_extra & ~0xF))
+		return ERR_PTR(-EINVAL);
+
+	/* The lower 4 bits of map_extra specify the number of hash functions */
+	nr_hash_funcs = attr->map_extra & 0xF;
+	if (nr_hash_funcs == 0)
+		/* Default to using 5 hash functions if unspecified */
+		nr_hash_funcs = 5;
+
+	/* For the bloom filter, the optimal bit array size that minimizes the
+	 * false positive probability is n * k / ln(2) where n is the number of
+	 * expected entries in the bloom filter and k is the number of hash
+	 * functions. We use 7 / 5 to approximate 1 / ln(2).
+	 *
+	 * We round this up to the nearest power of two to enable more efficient
+	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
+	 *
+	 * If this overflows a u32, the bit array size will have 2^32 (4
+	 * GB) bits.
+	 */
+	if (check_mul_overflow(attr->max_entries, nr_hash_funcs, &nr_bits) ||
+	    check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits) ||
+	    nr_bits > (1UL << 31)) {
+		/* The bit array size is 2^32 bits but to avoid overflowing the
+		 * u32, we use U32_MAX, which will round up to the equivalent
+		 * number of bytes
+		 */
+		bitset_bytes = BITS_TO_BYTES(U32_MAX);
+		bitset_mask = U32_MAX;
+	} else {
+		if (nr_bits <= BITS_PER_LONG)
+			nr_bits = BITS_PER_LONG;
+		else
+			nr_bits = roundup_pow_of_two(nr_bits);
+		bitset_bytes = BITS_TO_BYTES(nr_bits);
+		bitset_mask = nr_bits - 1;
+	}
+
+	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
+	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
+
+	if (!bloom)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&bloom->map, attr);
+
+	bloom->nr_hash_funcs = nr_hash_funcs;
+	bloom->bitset_mask = bitset_mask;
+
+	/* Check whether the value size is u32-aligned */
+	if ((attr->value_size & (sizeof(u32) - 1)) == 0)
+		bloom->aligned_u32_count =
+			attr->value_size / sizeof(u32);
+
+	if (!(attr->map_flags & BPF_F_ZERO_SEED))
+		bloom->hash_seed = get_random_int();
+
+	return &bloom->map;
+}
+
+static void map_free(struct bpf_map *map)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+
+	bpf_map_area_free(bloom);
+}
+
+static void *lookup_elem(struct bpf_map *map, void *key)
+{
+	/* The eBPF program should use map_peek_elem instead */
+	return ERR_PTR(-EINVAL);
+}
+
+static int update_elem(struct bpf_map *map, void *key,
+		       void *value, u64 flags)
+{
+	/* The eBPF program should use map_push_elem instead */
+	return -EINVAL;
+}
+
+static int check_btf(const struct bpf_map *map, const struct btf *btf,
+		     const struct btf_type *key_type,
+		     const struct btf_type *value_type)
+{
+	/* Bloom filter maps are keyless */
+	return btf_type_is_void(key_type) ? 0 : -EINVAL;
+}
+
+static int bpf_bloom_btf_id;
+const struct bpf_map_ops bloom_filter_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc = map_alloc,
+	.map_free = map_free,
+	.map_push_elem = push_elem,
+	.map_peek_elem = peek_elem,
+	.map_pop_elem = pop_elem,
+	.map_lookup_elem = lookup_elem,
+	.map_update_elem = update_elem,
+	.map_check_btf = check_btf,
+	.map_btf_name = "bpf_bloom_filter",
+	.map_btf_id = &bpf_bloom_btf_id,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5beb321b3b3b..ff0c6f5b2ec5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -199,7 +199,8 @@  static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 flags);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_push_elem(map, value, flags);
 	} else {
 		rcu_read_lock();
@@ -238,7 +239,8 @@  static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_peek_elem(map, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
 		/* struct_ops map requires directly updating "value" */
@@ -348,6 +350,7 @@  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 	map->max_entries = attr->max_entries;
 	map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags);
 	map->numa_node = bpf_map_attr_numa_node(attr);
+	map->map_extra = attr->map_extra;
 }
 
 static int bpf_map_alloc_id(struct bpf_map *map)
@@ -553,6 +556,7 @@  static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
+		   "map_extra:\t%#llx\n"
 		   "memlock:\t%lu\n"
 		   "map_id:\t%u\n"
 		   "frozen:\t%u\n",
@@ -561,6 +565,7 @@  static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
+		   (unsigned long long)map->map_extra,
 		   bpf_map_memory_footprint(map),
 		   map->id,
 		   READ_ONCE(map->frozen));
@@ -810,7 +815,7 @@  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD btf_vmlinux_value_type_id
+#define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -831,6 +836,10 @@  static int map_create(union bpf_attr *attr)
 		return -EINVAL;
 	}
 
+	if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
+	    attr->map_extra != 0)
+		return -EINVAL;
+
 	f_flags = bpf_get_file_flag(attr->map_flags);
 	if (f_flags < 0)
 		return f_flags;
@@ -1080,6 +1089,14 @@  static int map_lookup_elem(union bpf_attr *attr)
 	if (!value)
 		goto free_key;
 
+	if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = bpf_map_copy_value(map, key, value, attr->flags);
+		goto free_value;
+	}
+
 	err = bpf_map_copy_value(map, key, value, attr->flags);
 	if (err)
 		goto free_value;
@@ -3875,6 +3892,7 @@  static int bpf_map_get_info_by_fd(struct file *file,
 	info.value_size = map->value_size;
 	info.max_entries = map->max_entries;
 	info.map_flags = map->map_flags;
+	info.map_extra = map->map_extra;
 	memcpy(info.name, map->name, sizeof(map->name));
 
 	if (map->btf) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c6616e325803..3c8aa7df1773 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5002,7 +5002,10 @@  static int resolve_map_arg_type(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 		break;
-
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (meta->func_id == BPF_FUNC_map_peek_elem)
+			*arg_type = ARG_PTR_TO_MAP_VALUE;
+		break;
 	default:
 		break;
 	}
@@ -5577,6 +5580,11 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_task_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (func_id != BPF_FUNC_map_peek_elem &&
+		    func_id != BPF_FUNC_map_push_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -5644,13 +5652,18 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_SOCKHASH)
 			goto error;
 		break;
-	case BPF_FUNC_map_peek_elem:
 	case BPF_FUNC_map_pop_elem:
-	case BPF_FUNC_map_push_elem:
 		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
 		    map->map_type != BPF_MAP_TYPE_STACK)
 			goto error;
 		break;
+	case BPF_FUNC_map_peek_elem:
+	case BPF_FUNC_map_push_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK &&
+		    map->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+			goto error;
+		break;
 	case BPF_FUNC_sk_storage_get:
 	case BPF_FUNC_sk_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c10820037883..8bead4aa3ad0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@  enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1274,6 +1275,13 @@  union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		/* Any per-map-type extra fields
+		 *
+		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
+		 * number of hash functions (if 0, the bloom filter will default
+		 * to using 5 hash functions).
+		 */
+		__u64	map_extra;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -5638,6 +5646,7 @@  struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u64 map_extra;
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {