mbox series

[bpf-next,v2,00/20] Support dynptr key for hash map

Message ID 20250125111109.732718-1-houtao@huaweicloud.com (mailing list archive)
Headers show
Series Support dynptr key for hash map | expand

Message

Hou Tao Jan. 25, 2025, 11:10 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set aims to add the basic dynptr key support for hash map as
discussed in [1]. The main motivation is to fully utilize the BTF info
of the map key and to support variable-length key (e.g., string or any
byte stream) for bpf map. The patch set uses bpf_dynptr to represent the
variable-length part in the map key and the total number of
variable-length parts in the map key is limited as 1 now. Due to the
limitation in bpf memory allocator, the max size of dynptr in map key is
limited as 4088 bytes. Beside the variable-length parts (dynptr parts),
the fixed-size part in map key is still allowed, so all of these
following map key definitions are valid:

	struct bpf_dynptr;

	struct map_key_1 {
		struct bpf_dynptr name;
	};
	struct map_key_2 {
		int pid;
		struct bpf_dynptr name;
	};
	struct map_key_3 {
		struct map_key_2 f1;
		unsigned long when;
	};

The patch set supports lookup, update, delete operations on normal hash
map with dynptr key for both bpf program and bpf syscall. It also
supports lookup_and_delete and get_next_key operations on dynptr map key
for bpf syscall.

However the following operations have not been fully supported yet on a
hash map with dynptr key:

1) batched map operation through bpf syscall
2) the memory accounting for dynptr (aka .htab_map_mem_usage)
3) btf print for the dynptr in map key
4) bpftool support
5) the iteration of elements through bpf program
When a bpf program iterates the element in a hash map with dynptr key
(e.g., bpf_for_each_map_elem() helper or map element iterator), the
dynptr in the map key has not been specially treated yet and the dynptr
is only treated as a read-only 16-bytes buffer.

The patch set is structured as follow:

Patch #1~#2 introduce BPF_DYNPTR in btf_field_type and parse the
bpf_dynptr in the map key.

Patch #3~#7 remove the need to specify BPF_F_DYNPTR_IN_KEY explicitly,
introduces an internal BPF_INT_F_DYNPTR_IN_KEY map flag, set the
internal flag when there is any bpf_dynptr in the map key btf, and also
verify the value of max_extra is valid when it is set.

Patch #8~#9 refactor check_stack_range_initialized() and support
dynptr-keyed map in verifier.

Patch #10~#12 introduce bpf_dynptr_user, support the use of
bpf_dynptr_user in bpf syscall for map lookup, lookup_delete, update,
delete and get_next_key operations.

Patch #13~#17 update the lookup, lookup_delete, update, delete and
get_next_key callback correspondingly to support dynptr-keyed hash map.

Patch #18~#19 add positive and negative test cases for hash map with
dynptr key support.

Patch #20 adds the benchmark to compare the lookup and update
performance between normal hash map and dynptr-keyed hash map.

Patch set v2 mainly address the suggestions and comments in v1. It
mainly includes:
1) remove the need to set BPF_F_DYNPTR_IN_KEY flag explicitly
2) remove bpf_dynptr_user helpers from libbpf
3) support dynptr-keyed map in verifier in a less-intrusive way
4) add always_inline for lookup_{nulls_elem|elem}_raw to alleviate
   the performance degradation

The performance results in v2 are almost the same as v1. When the max
length of str is greater than 256, the lookup performance of dynptr
hash-map will be better than the normal hash map. When the max length is
greater than 512, the update performance of dynptr hash map will be
better than the normal hash map. And the memory consumption of hash-map
with dynptr key is smaller compared with normal hash map.

a) lookup operation

max_entries = 8K (randomly generated data set)
| max length of desc | normal hash-map    | dynptr hash-map   |
| ---                |  ---               | ---               |
|  64                | 12.0 M/s (1.7 MB)  | 8.3 M/s (1.4 MB)  |
| 128                |  6.4 M/s (2.2 MB)  | 6.6 M/s (1.7 MB)  |
| 256                |  3.7 M/s (4.2 MB)  | 4.8 M/s (2.3 MB)  |
| 512                |  2.1 M/s (8.2 MB)  | 3.1 M/s (3.8 MB)  |
| 1024               |  1.1 M/s (16 MB)   | 1.9 M/s (6.5 MB)  |
| 2048               |  0.6 M/s (32 MB)   | 1.1 M/s (12 MB)   |
| 4096               |  0.3 M/s (64 MB)   | 0.6 M/s (22 MB)   |

| string in file     | normal hash-map    | dynptr hash-map   |
| ---                |  ---               | ---               |
| kallsyms           |  7.7 M/s (29 MB)   | 7.3 M/s (22 MB)   |
| string in BTF      |  8.0 M/s (22 MB)   | 7.3 M/s (16 MB)   |
| alexa top 1M sites |  3.9 M/s (191 MB)  | 3.7 M/s (138 MB)  |

b) update and delete operation

max_entries = 8K (randomly generated data set)
| max length of desc | normal hash-map    | dynptr hash-map   |
| ---                |  ---               | ---               |
|  64                |  5.0 M/s           | 3.6 M/s           |
| 128                |  3.8 M/s           | 3.4 M/s           |
| 256                |  2.7 M/s           | 2.7 M/s           |
| 512                |  1.7 M/s           | 2.1 M/s           |
| 1024               |  0.9 M/s           | 1.5 M/s           |
| 2048               |  0.5 M/s           | 0.9 M/s           |
| 4096               |  0.3 M/s           | 0.5 M/s           |

| strings in file    | normal hash-map    | dynptr hash-map   |
| ---                |  ---               | ---               |
| kallsyms           |  3.9 M/s           | 2.9 M/s           |
| strings in BTF     |  4.1 M/s           | 3.3 M/s           |
| alexa top 1M sites |  2.7 M/s           | 2.5 M/s           |

As usual, comments and suggestions are always welcome.

PS: I will soon start my long Chinese Lunar New Year holiday, so my
replies may be a bit slow.

---

Change Log:
v2:
  * remove the need to set BPF_F_DYNPTR_IN_KEY flag explicitly
  * remove bpf_dynptr_user helpers from libbpf
  * support dynptr-keyed map in verifier in a less-intrusive way
  * handle the return value of kvmemdup_bpfptr() correctly
  * add necessary comments for ->record and ->key_record
  * use __bpf_md_ptr to define the data field of bpf_dynptr_user
  * add always_inline for lookup_{nulls_elem|elem}_raw
  * add benchmark patch for dynptr-keyed hash map

v1: https://lore.kernel.org/bpf/20241008091501.8302-1-houtao@huaweicloud.com/

[1]: https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/

Hou Tao (20):
  bpf: Add two helpers to facilitate the parsing of bpf_dynptr
  bpf: Parse bpf_dynptr in map key
  bpf: Factor out get_map_btf() helper
  bpf: Move the initialization of btf before ->map_alloc_check
  bpf: Introduce an internal map flag BPF_INT_F_DYNPTR_IN_KEY
  bpf: Set BPF_INT_F_DYNPTR_IN_KEY conditionally
  bpf: Use map_extra to indicate the max data size of dynptrs in map key
  bpf: Split check_stack_range_initialized() into small functions
  bpf: Support map key with dynptr in verifier
  bpf: Introduce bpf_dynptr_user
  bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
  bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output
  bpf: Support basic operations for dynptr key in hash map
  bpf: Export bpf_dynptr_set_size
  bpf: Support get_next_key operation for dynptr key in hash map
  bpf: Disable unsupported operations for map with dynptr key
  bpf: Enable BPF_INT_F_DYNPTR_IN_KEY for hash map
  selftests/bpf: Add bpf_dynptr_user_init() helper
  selftests/bpf: Add test cases for hash map with dynptr key
  selftests/bpf: Add benchmark for dynptr key support in hash map

 include/linux/bpf.h                           |  40 +-
 include/linux/btf.h                           |   2 +
 include/uapi/linux/bpf.h                      |   6 +
 kernel/bpf/btf.c                              |  46 +-
 kernel/bpf/hashtab.c                          | 319 ++++++++-
 kernel/bpf/helpers.c                          |   2 +-
 kernel/bpf/map_in_map.c                       |  21 +-
 kernel/bpf/syscall.c                          | 363 +++++++++--
 kernel/bpf/verifier.c                         | 373 ++++++++---
 tools/include/uapi/linux/bpf.h                |   6 +
 tools/testing/selftests/bpf/Makefile          |   2 +
 tools/testing/selftests/bpf/bench.c           |  10 +
 .../selftests/bpf/benchs/bench_dynptr_key.c   | 612 ++++++++++++++++++
 .../bpf/benchs/run_bench_dynptr_key.sh        |  51 ++
 tools/testing/selftests/bpf/bpf_util.h        |   9 +
 .../bpf/prog_tests/htab_dynkey_test.c         | 427 ++++++++++++
 .../selftests/bpf/progs/cpumask_common.h      |   2 +-
 .../selftests/bpf/progs/dynptr_key_bench.c    | 250 +++++++
 .../bpf/progs/htab_dynkey_test_failure.c      | 216 +++++++
 .../bpf/progs/htab_dynkey_test_success.c      | 383 +++++++++++
 20 files changed, 2946 insertions(+), 194 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_dynptr_key.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_dynptr_key.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_key_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c