Message ID | 20230408141846.1878768-1-houtao@huaweicloud.com (mailing list archive) |
---|---|
Series | Introduce BPF_MA_REUSE_AFTER_RCU_GP |
ping?

On 4/8/2023 10:18 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> [...]
From: Hou Tao <houtao1@huawei.com>

Hi,

As discussed in v1, objects freed by the bpf memory allocator may currently be reused immediately by a new allocation. This introduces a use-after-bpf-ma-free problem for non-preallocated hash maps and can make the lookup procedure return incorrect results. Immediate reuse also makes introducing new use cases (e.g. qp-trie) more difficult.

This patch series introduces BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. For BPF_MA_REUSE_AFTER_RCU_GP, freed objects are reused only after one RCU grace period and may be freed by the bpf memory allocator after another RCU-tasks-trace grace period. So bpf programs which care about the reuse problem can use bpf_rcu_read_{lock,unlock}() to access these freed objects safely, and for those which don't care, a use-after-bpf-ma-free access is still safe, because these objects have not been freed by the bpf memory allocator.

The current implementation is far from perfect, but I think it is ready to get some feedback before I put in more effort. The implementation mainly focuses on how to speed up the transition from freed elements to reusable elements and on reducing the risk of OOM.

To accelerate the transition, it dynamically allocates an rcu_head and calls call_rcu() in a kworker to do the transition. The frequency of call_rcu() invocations could be increased by calling call_rcu() from irq work, but after doing that, I found that the RCU grace period increased a lot, and I still cannot figure out why.

To reduce the risk of OOM, these reusable elements need to be freed as well, but we cannot dynamically allocate an rcu_head to do that, because the RCU-tasks-trace grace period is slower than the RCU grace period. So the freeing of reusable elements works just like the freeing in the normal bpf memory allocator, with one difference: for a BPF_MA_REUSE_AFTER_RCU_GP bpf ma, the elements being freed are still available for reuse in unit_alloc(). Please see the individual patches for more details.
Comments and suggestions are always welcome.

Change Log:
v2:
 * add a benchmark for the bpf memory allocator to compare different flavors of bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for the bpf memory allocator.
v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (4):
  selftests/bpf: Add benchmark for bpf memory allocator
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

 include/linux/bpf_mem_alloc.h             |   9 +-
 kernel/bpf/core.c                         |   2 +-
 kernel/bpf/cpumask.c                      |   2 +-
 kernel/bpf/hashtab.c                      |   5 +-
 kernel/bpf/memalloc.c                     | 390 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile      |   3 +
 tools/testing/selftests/bpf/bench.c       |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c | 273 ++++++++++++
 .../selftests/bpf/progs/htab_mem_bench.c  | 145 +++++++
 9 files changed, 785 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c