[RFC,bpf-next,v2,0/4] Introduce BPF_MA_REUSE_AFTER_RCU_GP

Message ID 20230408141846.1878768-1-houtao@huaweicloud.com (mailing list archive)

Message

Hou Tao April 8, 2023, 2:18 p.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

As discussed in v1, freed objects in the bpf memory allocator may
currently be reused immediately by a new allocation. This introduces a
use-after-bpf-ma-free problem for non-preallocated hash maps and can make
the lookup procedure return incorrect results. The immediate reuse also
makes it harder to introduce new use cases (e.g. qp-trie).

This patch series introduces BPF_MA_REUSE_AFTER_RCU_GP to solve these
problems. With BPF_MA_REUSE_AFTER_RCU_GP, freed objects are reused only
after one RCU grace period and may be freed by the bpf memory allocator
after another RCU-tasks-trace grace period. So bpf programs which care
about the reuse problem can use bpf_rcu_read_{lock,unlock}() to access
these freed objects safely, and for programs which don't care, the
use-after-bpf-ma-free remains safe because these objects have not been
freed by the bpf memory allocator.
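
The lifecycle described above can be pictured with a small userspace toy
model. This is only a sketch under my own assumptions, not the code in
the patches: the toy_* state and event names are made up for
illustration, and grace periods are modelled as plain events rather than
real RCU callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one object; not taken from kernel/bpf/memalloc.c. */
enum toy_state {
	TOY_IN_USE,		/* handed out to a bpf program */
	TOY_WAIT_RCU_GP,	/* freed by the program, not yet reusable */
	TOY_REUSABLE,		/* one RCU grace period has elapsed */
	TOY_RELEASED,		/* freed by the allocator itself */
};

/* Hypothetical events driving the transitions. */
enum toy_event {
	TOY_EV_FREE,
	TOY_EV_RCU_GP,
	TOY_EV_ALLOC,
	TOY_EV_TASKS_TRACE_GP,
};

struct toy_obj {
	enum toy_state state;
};

/* Apply one event; return false if it is not allowed in the current state. */
static bool toy_step(struct toy_obj *o, enum toy_event ev)
{
	assert(o != NULL);

	switch (o->state) {
	case TOY_IN_USE:
		if (ev == TOY_EV_FREE) {
			o->state = TOY_WAIT_RCU_GP;
			return true;
		}
		break;
	case TOY_WAIT_RCU_GP:
		/* Reuse is refused until an RCU grace period has passed. */
		if (ev == TOY_EV_RCU_GP) {
			o->state = TOY_REUSABLE;
			return true;
		}
		break;
	case TOY_REUSABLE:
		if (ev == TOY_EV_ALLOC) {
			o->state = TOY_IN_USE;
			return true;
		}
		/* Release happens only after a further tasks-trace GP. */
		if (ev == TOY_EV_TASKS_TRACE_GP) {
			o->state = TOY_RELEASED;
			return true;
		}
		break;
	case TOY_RELEASED:
		break;
	}
	return false;
}
```

In this model an allocation request against an object in TOY_WAIT_RCU_GP
is rejected, which is the property a reader inside an RCU read-side
critical section relies on.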

The current implementation is far from perfect, but I think it is ready
for some feedback before I put in more effort. The implementation mainly
focuses on how to speed up the transition from freed elements to
reusable elements and tries to reduce the risk of OOM.

To accelerate the transition, it dynamically allocates an rcu_head and
calls call_rcu() in a kworker to do the transition. The frequency of
call_rcu() invocation could be improved by calling call_rcu() in irq
work, but after doing that, I found the RCU grace period increased a lot
and I have not yet figured out why. To reduce the risk of OOM, these
reusable elements need to be freed as well, but we cannot dynamically
allocate an rcu_head to do that, because the RCU-tasks-trace grace period
is slower than the RCU grace period. So the freeing of reusable elements
works just like the freeing in the normal bpf memory allocator, with one
difference: for a BPF_MA_REUSE_AFTER_RCU_GP bpf ma, the elements being
freed are still available for reuse in unit_alloc(). Please see
individual patches for more details.
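
The last point, that elements being freed stay available for reuse, can
be sketched as below. This is a userspace toy and not the real
unit_alloc(): the toy_cache structure, the two list names and the order
in which the lists are tried are all assumptions made for illustration,
and it ignores the locking and RCU callback handling the kernel needs.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *next;
};

/* Hypothetical per-cache lists; the names are not from the patches. */
struct toy_cache {
	struct toy_node *reusable;	/* past one RCU grace period */
	struct toy_node *freeing;	/* waiting for a tasks-trace GP */
};

static void toy_push(struct toy_node **head, struct toy_node *n)
{
	assert(n != NULL);
	n->next = *head;
	*head = n;
}

static struct toy_node *toy_pop(struct toy_node **head)
{
	struct toy_node *n = *head;

	if (n) {
		*head = n->next;
		n->next = NULL;
	}
	return n;
}

/*
 * Toy allocation path: try the reusable list first (assumed order),
 * then fall back to elements that are queued for freeing but have not
 * been released yet.
 */
static struct toy_node *toy_unit_alloc(struct toy_cache *c)
{
	struct toy_node *n = toy_pop(&c->reusable);

	if (!n)
		n = toy_pop(&c->freeing);
	return n;
}
```

With one element on each list, two calls to toy_unit_alloc() return the
reusable element and then the one that was queued for freeing, and a
third call returns NULL.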

Comments and suggestions are always welcome.

Change Log:
v2:
 * add a benchmark for the bpf memory allocator to compare the different
   flavors of bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.
v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (4):
  selftests/bpf: Add benchmark for bpf memory allocator
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

 include/linux/bpf_mem_alloc.h                 |   9 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/cpumask.c                          |   2 +-
 kernel/bpf/hashtab.c                          |   5 +-
 kernel/bpf/memalloc.c                         | 390 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 273 ++++++++++++
 .../selftests/bpf/progs/htab_mem_bench.c      | 145 +++++++
 9 files changed, 785 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

Comments

Hou Tao April 21, 2023, 6:23 a.m. UTC | #1
ping ?
