[RFC,bpf-next,v3,0/6] Handle immediate reuse in bpf memory allocator

Message ID: 20230429101215.111262-1-houtao@huaweicloud.com

Hou Tao April 29, 2023, 10:12 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

As discussed in v1, freed objects in the bpf memory allocator may
currently be reused immediately by a new allocation. This introduces a
use-after-bpf-ma-free problem for non-preallocated hash maps and can make
the lookup procedure return incorrect results. Immediate reuse also makes
it harder to introduce new use cases (e.g. qp-trie).

This patch series tries to solve these problems by introducing
BPF_MA_{REUSE|FREE}_AFTER_RCU_GP in the bpf memory allocator. With
REUSE_AFTER_RCU_GP, freed objects are reused only after one RCU grace
period and may be freed by the bpf memory allocator after another
RCU-tasks-trace grace period. So bpf programs which care about the reuse
problem can use bpf_rcu_read_{lock,unlock}() to access these objects
safely, and for programs which don't care, any use-after-bpf-ma-free is
still safe because these objects have not been freed by the bpf memory
allocator. FREE_AFTER_RCU_GP behaves differently. Instead of making the
freed elements reusable after one RCU GP, it frees these elements
directly back to slab after one RCU GP, so sleepable bpf programs must
use bpf_rcu_read_{lock,unlock}() to access elements allocated from a
FREE_AFTER_RCU_GP bpf memory allocator.
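
To make the difference between immediate reuse and reuse after a grace
period concrete, here is a minimal userspace sketch in C. It is a toy
model, not code from this series: every toy_* name is hypothetical, and
a "grace period" is just a counter advanced by hand rather than a real
RCU grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum toy_flavor { TOY_IMMEDIATE_REUSE, TOY_REUSE_AFTER_GP };

struct toy_obj {
	struct toy_obj *next;
	unsigned long freed_in_gp;	/* grace period in which it was freed */
	int value;
};

struct toy_ma {
	enum toy_flavor flavor;
	unsigned long cur_gp;		/* current grace period number */
	struct toy_obj *free_list;	/* freed objects, newest first */
	struct toy_obj pool[4];		/* backing storage for fresh objects */
	size_t next_fresh;
};

static void toy_ma_init(struct toy_ma *ma, enum toy_flavor flavor)
{
	ma->flavor = flavor;
	ma->cur_gp = 0;
	ma->free_list = NULL;
	ma->next_fresh = 0;
}

/* Assumption: calling this means all earlier readers have finished. */
static void toy_gp_elapse(struct toy_ma *ma)
{
	ma->cur_gp++;
}

static void toy_free(struct toy_ma *ma, struct toy_obj *obj)
{
	obj->freed_in_gp = ma->cur_gp;
	obj->next = ma->free_list;
	ma->free_list = obj;
}

static struct toy_obj *toy_alloc(struct toy_ma *ma)
{
	struct toy_obj **pp;

	/* Take the first freed object that this flavor allows to be reused. */
	for (pp = &ma->free_list; *pp; pp = &(*pp)->next) {
		struct toy_obj *obj = *pp;

		if (ma->flavor == TOY_IMMEDIATE_REUSE ||
		    obj->freed_in_gp < ma->cur_gp) {
			*pp = obj->next;
			return obj;
		}
	}
	/* Nothing reusable yet: hand out a fresh object if one is left. */
	if (ma->next_fresh < sizeof(ma->pool) / sizeof(ma->pool[0]))
		return &ma->pool[ma->next_fresh++];
	return NULL;
}
```

In this model, a lookup that still holds a pointer to a just-deleted
element can see it handed out again under TOY_IMMEDIATE_REUSE, while
under TOY_REUSE_AFTER_GP the element stays on the free list until
toy_gp_elapse() has been called. The cost is visible as well: until the
grace period elapses, allocations have to come from fresh memory.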

Personally I prefer FREE_AFTER_RCU_GP because its implementation is much
simpler than that of REUSE_AFTER_RCU_GP and its memory usage is also
better. But its shortcoming is also obvious, so I want to get some
feedback before putting in more effort. As usual, comments and
suggestions are always welcome.

Change Log:
v3:
 * add BPF_MA_FREE_AFTER_RCU_GP bpf memory allocator
 * update htab memory benchmark
   * move the benchmark patch to the last patch
   * remove array and useless bpf_map_lookup_elem(&array, ...) in bpf
     programs
   * add synchronization between the addition CPU and the deletion CPU
     for the add_del_on_diff_cpu case to prevent unnecessary looping
   * add the benchmark result for "extra call_rcu + bpf ma"
v2: https://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/
 * add a benchmark for bpf memory allocator to compare different
   flavors of the bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.
v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (6):
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP
  bpf: Introduce BPF_MA_FREE_AFTER_RCU_GP
  bpf: Add two module parameters in htab for memory benchmark
  selftests/bpf: Add benchmark for bpf memory allocator

 include/linux/bpf_mem_alloc.h                 |  10 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/cpumask.c                          |   2 +-
 kernel/bpf/hashtab.c                          |  43 +-
 kernel/bpf/memalloc.c                         | 529 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 352 ++++++++++++
 .../bpf/benchs/run_bench_htab_mem.sh          |  64 +++
 .../selftests/bpf/progs/htab_mem_bench.c      | 135 +++++
 10 files changed, 1090 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c