[RFC,bpf-next,v2,4/4] bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

Message ID 20230408141846.1878768-5-houtao@huaweicloud.com (mailing list archive)
State New, archived
Series Introduce BPF_MA_REUSE_AFTER_RCU_GP

Commit Message

Hou Tao April 8, 2023, 2:18 p.m. UTC
From: Hou Tao <houtao1@huawei.com>

Currently the objects freed through the bpf memory allocator may be
reused immediately by a new allocation. This introduces a
use-after-bpf-ma-free problem for non-preallocated hash maps and makes
the lookup procedure return an incorrect result. The immediate reuse
also makes introducing new use cases (e.g. qp-trie) more difficult.

So introduce BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. With
BPF_MA_REUSE_AFTER_RCU_GP, freed objects are reused only after one RCU
grace period and may be returned to the slab subsystem after a further
RCU-tasks-trace grace period. bpf programs which care about the reuse
problem can use bpf_rcu_read_{lock,unlock}() to access these freed
objects safely, and for those which don't care, a use-after-bpf-ma-free
is still safe because the objects have not yet been freed by the bpf
memory allocator.
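
As an illustration, here is a minimal sketch of a sleepable bpf program
pinning such an element with the RCU kfuncs (the map layout, attach
point and names below are assumptions for illustration, not part of
this series):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16);
	__type(key, u32);
	__type(value, u64);
} htab SEC(".maps");

SEC("fentry.s/inet_sock_destruct")
int BPF_PROG(pin_elem)
{
	u32 key = 0;
	u64 *val;

	bpf_rcu_read_lock();
	val = bpf_map_lookup_elem(&htab, &key);
	if (val) {
		/* safe: the element cannot be reused (let alone freed
		 * back to slab) before bpf_rcu_read_unlock()
		 */
		bpf_printk("val=%llu", *val);
	}
	bpf_rcu_read_unlock();
	return 0;
}

char LICENSE[] SEC("license") = "GPL";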

To make freed elements reusable quickly, BPF_MA_REUSE_AFTER_RCU_GP
dynamically allocates memory to keep many RCU callbacks in flight, each
marking a batch of freed elements as reusable. The memory used for a
bpf_reuse_batch is freed when its RCU callback completes. When no memory
is available for a new batch, synchronize_rcu_expedited() is used
instead to make the freed elements reusable. To reduce the risk of OOM,
part of the reusable memory is freed back to slab after an
RCU-tasks-trace grace period; until that happens, it remains available
for reuse.
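
Put together, an element freed under this flag moves through the
following stages (a summary of the implementation in this patch; the
names match the code below):

  unit_free()
    -> c->prepare_reuse_head        (freed, not yet reusable)
    -> bpf_ma_prepare_reuse_work() kworker batches the list and does
       call_rcu(bpf_ma_reuse_cb), falling back to
       synchronize_rcu_expedited() when kmalloc() of the batch fails
    -> c->reuse_ready_head          (reusable again via alloc_bulk())
    -> call_rcu_tasks_trace(bpf_ma_free_reusable_cb)
    -> c->wait_for_free             (still reusable while waiting)
    -> free_all()                   (returned to slab)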

The following benchmark results compare different flavors of the bpf
memory allocator. These results show:
* The performance of reuse-after-rcu-gp bpf ma is better than no bpf ma.
  Its memory usage is also better than no bpf ma except for the
  add_del_on_diff_cpu case.
* The memory usage of reuse-after-rcu-gp bpf ma increases a lot compared
  with normal bpf ma.
* The memory usage of free-after-rcu-gp bpf ma is better than
  reuse-after-rcu-gp bpf ma, but its performance is worse than
  reuse-after-rcu-gp because it doesn't do reuse.

(1) no bpf memory allocator (v6.0.19)
| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1187       | 1.05                 | 1.05              |
| overwrite           | 3.74       | 32.52                | 84.18             |
| batch_add_batch_del | 2.23       | 26.38                | 48.75             |
| add_del_on_diff_cpu | 3.92       | 33.72                | 48.96             |

(2) normal bpf memory allocator
| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1187       | 0.96                 | 1.00              |
| overwrite           | 27.12      | 2.5                  | 2.99              |
| batch_add_batch_del | 8.9        | 2.77                 | 3.24              |
| add_del_on_diff_cpu | 11.30      | 218.54               | 440.37            |

(3) reuse-after-rcu-gp bpf memory allocator
| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1276       | 0.96                 | 1.00              |
| overwrite           | 15.66      | 25.00                | 33.07             |
| batch_add_batch_del | 10.32      | 18.84                | 22.64             |
| add_del_on_diff_cpu | 13.00      | 550.50               | 748.74            |

(4) free-after-rcu-gp bpf memory allocator (free directly through call_rcu)

| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1263       | 0.96                 | 1.00              |
| overwrite           | 10.73      | 12.33                | 20.32             |
| batch_add_batch_del | 7.02       | 9.45                 | 14.07             |
| add_del_on_diff_cpu | 8.99       | 131.64               | 204.42            |

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf_mem_alloc.h |   1 +
 kernel/bpf/memalloc.c         | 353 +++++++++++++++++++++++++++++++---
 2 files changed, 326 insertions(+), 28 deletions(-)

Comments

Alexei Starovoitov April 22, 2023, 3:12 a.m. UTC | #1
On Sat, Apr 08, 2023 at 10:18:46PM +0800, Hou Tao wrote:
> [...]
>
> (1) no bpf memory allocator (v6.0.19)

meaning that htab is using kmalloc and call_rcu to free, right?

> [...]
>
> (3) reuse-after-rcu-gp bpf memory allocator

that's the one you're implementing below, right?

> [...]
> 
> (4) free-after-rcu-gp bpf memory allocator (free directly through call_rcu)

What do you mean? htab uses bpf_ma, but does call_rcu before doing bpf_mem_free ?

> 
> | name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
> | --                  | --         | --                   | --                |
> | no_op               | 1263       | 0.96                 | 1.00              |
> | overwrite           | 10.73      | 12.33                | 20.32             |
> | batch_add_batch_del | 7.02       | 9.45                 | 14.07             |
> | add_del_on_diff_cpu | 8.99       | 131.64               | 204.42            |

Depending on what we care about, all the extra complexity in bpf_ma with reuse-after-rcu-gp
buys us a bit better perf, but many times worse memory consumption?

> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  include/linux/bpf_mem_alloc.h |   1 +
>  kernel/bpf/memalloc.c         | 353 +++++++++++++++++++++++++++++++---
>  2 files changed, 326 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
> index 148347950e16..e7f68432713b 100644
> --- a/include/linux/bpf_mem_alloc.h
> +++ b/include/linux/bpf_mem_alloc.h
> @@ -18,6 +18,7 @@ struct bpf_mem_alloc {
>  /* flags for bpf_mem_alloc_init() */
>  enum {
>  	BPF_MA_PERCPU = 1U << 0,
> +	BPF_MA_REUSE_AFTER_RCU_GP = 1U << 1,
>  };
>  
>  /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects.
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 072102476019..262100f89610 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -63,6 +63,10 @@ static u8 size_index[24] __ro_after_init = {
>  	2	/* 192 */
>  };
>  
> +static struct workqueue_struct *bpf_ma_wq;
> +
> +static void bpf_ma_prepare_reuse_work(struct work_struct *work);
> +
>  static int bpf_mem_cache_idx(size_t size)
>  {
>  	if (!size || size > 4096)
> @@ -98,18 +102,36 @@ struct bpf_mem_cache {
>  	int free_cnt;
>  	int low_watermark, high_watermark, batch;
>  	int percpu_size;
> +	int cpu;
>  	unsigned int flags;
>  
> +	raw_spinlock_t reuse_lock;
> +	bool abort_reuse;
> +	struct llist_head reuse_ready_head;
> +	struct llist_node *reuse_ready_tail;
> +	struct llist_head wait_for_free;
> +	struct llist_head prepare_reuse_head;
> +	struct llist_node *prepare_reuse_tail;
> +	unsigned int prepare_reuse_cnt;
> +	atomic_t reuse_cb_in_progress;
> +	struct work_struct reuse_work;
> +
>  	struct rcu_head rcu;
>  	struct llist_head free_by_rcu;
>  	struct llist_head waiting_for_gp;
> -	atomic_t call_rcu_in_progress;
> +	atomic_t free_cb_in_progress;
>  };
>  
>  struct bpf_mem_caches {
>  	struct bpf_mem_cache cache[NUM_CACHES];
>  };
>  
> +struct bpf_reuse_batch {
> +	struct bpf_mem_cache *c;
> +	struct llist_node *head, *tail;
> +	struct rcu_head rcu;
> +};
> +
>  static struct llist_node notrace *__llist_del_first(struct llist_head *head)
>  {
>  	struct llist_node *entry, *next;
> @@ -154,6 +176,45 @@ static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
>  #endif
>  }
>  
> +static void *bpf_ma_get_reusable_obj(struct bpf_mem_cache *c)
> +{
> +	if (c->flags & BPF_MA_REUSE_AFTER_RCU_GP) {
> +		unsigned long flags;
> +		void *obj;
> +
> +		if (llist_empty(&c->reuse_ready_head) && llist_empty(&c->wait_for_free))
> +			return NULL;
> +
> +		/* reuse_ready_head and wait_for_free may be manipulated by
> +		 * kworker and RCU callbacks.
> +		 */
> +		raw_spin_lock_irqsave(&c->reuse_lock, flags);
> +		obj = __llist_del_first(&c->reuse_ready_head);
> +		if (obj) {
> +			if (llist_empty(&c->reuse_ready_head))
> +				c->reuse_ready_tail = NULL;
> +		} else {
> +			obj = __llist_del_first(&c->wait_for_free);
> +		}
> +		raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
> +		return obj;
> +	}
> +
> +	/*
> +	 * free_by_rcu is only manipulated by irq work refill_work().
> +	 * IRQ works on the same CPU are called sequentially, so it is
> +	 * safe to use __llist_del_first() here. If alloc_bulk() is
> +	 * invoked by the initial prefill, there will be no running
> +	 * refill_work(), so __llist_del_first() is fine as well.
> +	 *
> +	 * In most cases, objects on free_by_rcu are from the same CPU.
> +	 * If some objects come from other CPUs, it doesn't incur any
> +	 * harm because NUMA_NO_NODE means the preference for current
> +	 * numa node and it is not a guarantee.
> +	 */
> +	return __llist_del_first(&c->free_by_rcu);
> +}
> +
>  /* Mostly runs from irq_work except __init phase. */
>  static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>  {
> @@ -165,19 +226,7 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>  	memcg = get_memcg(c);
>  	old_memcg = set_active_memcg(memcg);
>  	for (i = 0; i < cnt; i++) {
> -		/*
> -		 * free_by_rcu is only manipulated by irq work refill_work().
> -		 * IRQ works on the same CPU are called sequentially, so it is
> -		 * safe to use __llist_del_first() here. If alloc_bulk() is
> -		 * invoked by the initial prefill, there will be no running
> -		 * refill_work(), so __llist_del_first() is fine as well.
> -		 *
> -		 * In most cases, objects on free_by_rcu are from the same CPU.
> -		 * If some objects come from other CPUs, it doesn't incur any
> -		 * harm because NUMA_NO_NODE means the preference for current
> -		 * numa node and it is not a guarantee.
> -		 */
> -		obj = __llist_del_first(&c->free_by_rcu);
> +		obj = bpf_ma_get_reusable_obj(c);
>  		if (!obj) {
>  			/* Allocate, but don't deplete atomic reserves that typical
>  			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
> @@ -236,7 +285,7 @@ static void __free_rcu(struct rcu_head *head)
>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>  
>  	free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
> -	atomic_set(&c->call_rcu_in_progress, 0);
> +	atomic_set(&c->free_cb_in_progress, 0);
>  }
>  
>  static void __free_rcu_tasks_trace(struct rcu_head *head)
> @@ -264,7 +313,7 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>  {
>  	struct llist_node *llnode, *t;
>  
> -	if (atomic_xchg(&c->call_rcu_in_progress, 1))
> +	if (atomic_xchg(&c->free_cb_in_progress, 1))
>  		return;
>  
>  	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
> @@ -409,6 +458,8 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
>  			c->objcg = objcg;
>  			c->percpu_size = percpu_size;
>  			c->flags = flags;
> +			c->cpu = cpu;
> +			INIT_WORK(&c->reuse_work, bpf_ma_prepare_reuse_work);
>  			prefill_mem_cache(c, cpu);
>  		}
>  		ma->cache = pc;
> @@ -433,6 +484,8 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
>  			c->unit_size = sizes[i];
>  			c->objcg = objcg;
>  			c->flags = flags;
> +			c->cpu = cpu;
> +			INIT_WORK(&c->reuse_work, bpf_ma_prepare_reuse_work);
>  			prefill_mem_cache(c, cpu);
>  		}
>  	}
> @@ -444,18 +497,40 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
>  static void drain_mem_cache(struct bpf_mem_cache *c)
>  {
>  	bool percpu = !!c->percpu_size;
> +	struct llist_node *head[3];
> +	unsigned long flags;
>  
>  	/* No progs are using this bpf_mem_cache, but htab_map_free() called
>  	 * bpf_mem_cache_free() for all remaining elements and they can be in
>  	 * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
>  	 *
> -	 * Except for waiting_for_gp list, there are no concurrent operations
> -	 * on these lists, so it is safe to use __llist_del_all().
> +	 * Except for waiting_for_gp and free_llist_extra list, there are no
> +	 * concurrent operations on these lists, so it is safe to use
> +	 * __llist_del_all().
>  	 */
>  	free_all(__llist_del_all(&c->free_by_rcu), percpu);
>  	free_all(llist_del_all(&c->waiting_for_gp), percpu);
>  	free_all(__llist_del_all(&c->free_llist), percpu);
> -	free_all(__llist_del_all(&c->free_llist_extra), percpu);
> +	free_all(llist_del_all(&c->free_llist_extra), percpu);
> +
> +	if (!(c->flags & BPF_MA_REUSE_AFTER_RCU_GP))
> +		return;
> +
> +	raw_spin_lock_irqsave(&c->reuse_lock, flags);
> +	/* Indicate kworker and RCU callback to free elements directly
> +	 * instead of adding new elements into these lists.
> +	 */
> +	c->abort_reuse = true;
> +	head[0] = __llist_del_all(&c->prepare_reuse_head);
> +	c->prepare_reuse_tail = NULL;
> +	head[1] = __llist_del_all(&c->reuse_ready_head);
> +	c->reuse_ready_tail = NULL;
> +	head[2] = __llist_del_all(&c->wait_for_free);
> +	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
> +
> +	free_all(head[0], percpu);
> +	free_all(head[1], percpu);
> +	free_all(head[2], percpu);
>  }
>  
>  static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
> @@ -466,10 +541,39 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
>  	ma->caches = NULL;
>  }
>  
> +static void bpf_ma_cancel_reuse_work(struct bpf_mem_alloc *ma)
> +{
> +	struct bpf_mem_caches *cc;
> +	struct bpf_mem_cache *c;
> +	int cpu, i;
> +
> +	if (ma->cache) {
> +		for_each_possible_cpu(cpu) {
> +			c = per_cpu_ptr(ma->cache, cpu);
> +			cancel_work_sync(&c->reuse_work);
> +		}
> +	}
> +	if (ma->caches) {
> +		for_each_possible_cpu(cpu) {
> +			cc = per_cpu_ptr(ma->caches, cpu);
> +			for (i = 0; i < NUM_CACHES; i++) {
> +				c = &cc->cache[i];
> +				cancel_work_sync(&c->reuse_work);
> +			}
> +		}
> +	}
> +}
> +
>  static void free_mem_alloc(struct bpf_mem_alloc *ma)
>  {
> -	/* waiting_for_gp lists was drained, but __free_rcu might
> -	 * still execute. Wait for it now before we freeing percpu caches.
> +	bool reuse_after_rcu_gp = ma->flags & BPF_MA_REUSE_AFTER_RCU_GP;
> +
> +	/* Cancel the inflight kworkers */
> +	if (reuse_after_rcu_gp)
> +		bpf_ma_cancel_reuse_work(ma);
> +
> +	/* For normal bpf ma, the waiting_for_gp lists were drained, but
> +	 * __free_rcu might still execute. Wait for it now before we free percpu caches.
>  	 *
>  	 * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
>  	 * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
> @@ -477,9 +581,13 @@ static void free_mem_alloc(struct bpf_mem_alloc *ma)
>  	 * so if call_rcu(head, __free_rcu) is skipped due to
>  	 * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
>  	 * using rcu_trace_implies_rcu_gp() as well.
> +	 *
> +	 * For reuse-after-rcu-gp bpf ma, use rcu_barrier_tasks_trace() to
> +	 * wait for the pending bpf_ma_free_reusable_cb() and use rcu_barrier()
> +	 * to wait for the pending bpf_ma_reuse_cb().
>  	 */
>  	rcu_barrier_tasks_trace();
> -	if (!rcu_trace_implies_rcu_gp())
> +	if (reuse_after_rcu_gp || !rcu_trace_implies_rcu_gp())
>  		rcu_barrier();
>  	free_mem_alloc_no_barrier(ma);
>  }
> @@ -512,6 +620,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
>  	}
>  
>  	/* Defer barriers into worker to let the rest of map memory to be freed */
> +	copy->flags = ma->flags;
>  	copy->cache = ma->cache;
>  	ma->cache = NULL;
>  	copy->caches = ma->caches;
> @@ -541,7 +650,9 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>  			 */
>  			irq_work_sync(&c->refill_work);
>  			drain_mem_cache(c);
> -			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
> +			rcu_in_progress += atomic_read(&c->free_cb_in_progress);
> +			/* Pending kworkers or RCU callbacks */
> +			rcu_in_progress += atomic_read(&c->reuse_cb_in_progress);
>  		}
>  		/* objcg is the same across cpus */
>  		if (c->objcg)
> @@ -556,7 +667,8 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>  				c = &cc->cache[i];
>  				irq_work_sync(&c->refill_work);
>  				drain_mem_cache(c);
> -				rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
> +				rcu_in_progress += atomic_read(&c->free_cb_in_progress);
> +				rcu_in_progress += atomic_read(&c->reuse_cb_in_progress);
>  			}
>  		}
>  		if (c->objcg)
> @@ -600,18 +712,183 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>  	return llnode;
>  }
>  
> +static void bpf_ma_add_to_reuse_ready_or_free(struct bpf_mem_cache *c, struct llist_node *head,
> +					      struct llist_node *tail)
> +{
> +	unsigned long flags;
> +	bool abort;
> +
> +	raw_spin_lock_irqsave(&c->reuse_lock, flags);
> +	abort = c->abort_reuse;
> +	if (!abort) {
> +		if (llist_empty(&c->reuse_ready_head))
> +			c->reuse_ready_tail = tail;
> +		__llist_add_batch(head, tail, &c->reuse_ready_head);
> +	}
> +	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
> +
> +	/* Don't move these objects to the reuse_ready list; free
> +	 * them directly instead.
> +	 */
> +	if (abort)
> +		free_all(head, !!c->percpu_size);
> +}
> +
> +static void bpf_ma_reuse_cb(struct rcu_head *rcu)
> +{
> +	struct bpf_reuse_batch *batch = container_of(rcu, struct bpf_reuse_batch, rcu);
> +	struct bpf_mem_cache *c = batch->c;
> +
> +	bpf_ma_add_to_reuse_ready_or_free(c, batch->head, batch->tail);
> +	atomic_dec(&c->reuse_cb_in_progress);
> +	kfree(batch);
> +}
> +
> +static bool bpf_ma_try_free_reuse_objs(struct bpf_mem_cache *c)
> +{
> +	struct llist_node *head, *tail;
> +	bool do_free;
> +
> +	if (llist_empty(&c->reuse_ready_head))
> +		return false;
> +
> +	do_free = !atomic_xchg(&c->free_cb_in_progress, 1);
> +	if (!do_free)
> +		return false;
> +
> +	head = __llist_del_all(&c->reuse_ready_head);
> +	tail = c->reuse_ready_tail;
> +	c->reuse_ready_tail = NULL;
> +
> +	__llist_add_batch(head, tail, &c->wait_for_free);
> +
> +	return true;
> +}
> +
> +static void bpf_ma_free_reusable_cb(struct rcu_head *rcu)
> +{
> +	struct bpf_mem_cache *c = container_of(rcu, struct bpf_mem_cache, rcu);
> +	struct llist_node *head;
> +	unsigned long flags;
> +
> +	raw_spin_lock_irqsave(&c->reuse_lock, flags);
> +	head = __llist_del_all(&c->wait_for_free);
> +	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
> +
> +	free_all(head, !!c->percpu_size);
> +	atomic_set(&c->free_cb_in_progress, 0);
> +}
> +
> +static void bpf_ma_prepare_reuse_work(struct work_struct *work)
> +{
> +	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, reuse_work);
> +	struct llist_node *head, *tail, *llnode, *tmp;
> +	struct bpf_reuse_batch *batch;
> +	unsigned long flags;
> +	bool do_free;
> +
> +	local_irq_save(flags);
> +	/* When CPU is offline, the running CPU may be different from
> +	 * the CPU which submitted the work. When these two CPUs are the same,
> +	 * the kworker may be interrupted by NMI, so increase active to protect
> +	 * against such concurrency.
> +	 */
> +	if (c->cpu == smp_processor_id())
> +		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
> +	raw_spin_lock(&c->reuse_lock);
> +	head = __llist_del_all(&c->prepare_reuse_head);
> +	tail = c->prepare_reuse_tail;
> +	c->prepare_reuse_tail = NULL;
> +	c->prepare_reuse_cnt = 0;
> +	if (c->cpu == smp_processor_id())
> +		local_dec(&c->active);
> +
> +	/* Try to free elements in the reusable list. Before these elements are
> +	 * freed in the RCU cb, they will still be available for reuse.
> +	 */
> +	do_free = bpf_ma_try_free_reuse_objs(c);
> +	raw_spin_unlock(&c->reuse_lock);
> +	local_irq_restore(flags);
> +
> +	if (do_free)
> +		call_rcu_tasks_trace(&c->rcu, bpf_ma_free_reusable_cb);
> +
> +	llist_for_each_safe(llnode, tmp, llist_del_all(&c->free_llist_extra)) {
> +		if (!head)
> +			tail = llnode;
> +		llnode->next = head;
> +		head = llnode;
> +	}
> +	/* Draining is in progress ? */
> +	if (!head) {
> +		/* kworker completes and no RCU callback */
> +		atomic_dec(&c->reuse_cb_in_progress);
> +		return;
> +	}
> +
> +	batch = kmalloc(sizeof(*batch), GFP_KERNEL);
> +	if (!batch) {
> +		synchronize_rcu_expedited();
> +		bpf_ma_add_to_reuse_ready_or_free(c, head, tail);
> +		/* kworker completes and no RCU callback */
> +		atomic_dec(&c->reuse_cb_in_progress);
> +		return;
> +	}
> +
> +	batch->c = c;
> +	batch->head = head;
> +	batch->tail = tail;
> +	call_rcu(&batch->rcu, bpf_ma_reuse_cb);
> +}
> +
> +static void notrace wait_gp_reuse_free(struct bpf_mem_cache *c, struct llist_node *llnode)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	/* In case a NMI-context bpf program is also freeing object. */
> +	if (local_inc_return(&c->active) == 1) {
> +		bool try_queue_work = false;
> +
> +		/* kworker may remove elements from prepare_reuse_head */
> +		raw_spin_lock(&c->reuse_lock);
> +		if (llist_empty(&c->prepare_reuse_head))
> +			c->prepare_reuse_tail = llnode;
> +		__llist_add(llnode, &c->prepare_reuse_head);
> +		if (++c->prepare_reuse_cnt > c->high_watermark) {
> +			/* Zero out prepare_reuse_cnt early to prevent
> +			 * unnecessary queue_work().
> +			 */
> +			c->prepare_reuse_cnt = 0;
> +			try_queue_work = true;
> +		}
> +		raw_spin_unlock(&c->reuse_lock);
> +
> +		if (try_queue_work && !work_pending(&c->reuse_work)) {
> +			/* Use reuse_cb_in_progress to indicate there is
> +			 * inflight reuse kworker or reuse RCU callback.
> +			 */
> +			atomic_inc(&c->reuse_cb_in_progress);
> +			/* Already queued */
> +			if (!queue_work(bpf_ma_wq, &c->reuse_work))

how many kthreads are spawned by wq in the peak?

> +				atomic_dec(&c->reuse_cb_in_progress);
> +		}
> +	} else {
> +		llist_add(llnode, &c->free_llist_extra);
> +	}
> +	local_dec(&c->active);
> +	local_irq_restore(flags);
> +}
> +
>  /* Though 'ptr' object could have been allocated on a different cpu
>   * add it to the free_llist of the current cpu.
>   * Let kfree() logic deal with it when it's later called from irq_work.
>   */
> -static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
> +static void notrace immediate_reuse_free(struct bpf_mem_cache *c, struct llist_node *llnode)
>  {
> -	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
>  	unsigned long flags;
>  	int cnt = 0;
>  
> -	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
> -
>  	local_irq_save(flags);
>  	if (local_inc_return(&c->active) == 1) {
>  		__llist_add(llnode, &c->free_llist);
> @@ -633,6 +910,18 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
>  		irq_work_raise(c);
>  }
>  
> +static inline void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
> +{
> +	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
> +
> +	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
> +
> +	if (c->flags & BPF_MA_REUSE_AFTER_RCU_GP)
> +		wait_gp_reuse_free(c, llnode);
> +	else
> +		immediate_reuse_free(c, llnode);
> +}
> +
>  /* Called from BPF program or from sys_bpf syscall.
>   * In both cases migration is disabled.
>   */
> @@ -724,3 +1013,11 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
>  
>  	return !ret ? NULL : ret + LLIST_NODE_SZ;
>  }
> +
> +static int __init bpf_ma_init(void)
> +{
> +	bpf_ma_wq = alloc_workqueue("bpf_ma", WQ_MEM_RECLAIM, 0);
> +	BUG_ON(!bpf_ma_wq);
> +	return 0;
> +}
> +late_initcall(bpf_ma_init);
> -- 
> 2.29.2
>
Hou Tao April 23, 2023, 7:41 a.m. UTC | #2
Hi,

On 4/22/2023 11:12 AM, Alexei Starovoitov wrote:
> On Sat, Apr 08, 2023 at 10:18:46PM +0800, Hou Tao wrote:
>> [...]
>>
>> (1) no bpf memory allocator (v6.0.19)
> meaning that htab is using kmalloc and call_rcu to free, right?
Yes.
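
(For readers: a rough sketch of that pre-bpf_ma free path for a
non-preallocated htab, reconstructed from memory of v6.0-era hashtab.c
rather than quoted from it, so details may differ:)

static void htab_elem_free_rcu(struct rcu_head *head)
{
	struct htab_elem *l = container_of(head, struct htab_elem, rcu);

	kfree(l);
}

static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
{
	/* every deleted or overwritten element sits out a full RCU
	 * grace period before kfree(), which is why memory piles up
	 * under heavy update churn in table (1)
	 */
	call_rcu(&l->rcu, htab_elem_free_rcu);
}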
>
>> [...]
>>
>> (3) reuse-after-rcu-gp bpf memory allocator
> that's the one you're implementing below, right?
Right.
>
>> [...]
>>
>> (4) free-after-rcu-gp bpf memory allocator (free directly through call_rcu)
> What do you mean? htab uses bpf_ma, but does call_rcu before doing bpf_mem_free ?
No, there is no call_rcu() before bpf_mem_free(). In the free-after-rcu-gp
flavor, bpf_mem_free() does call_rcu() in batches to free these elements back
to the slab subsystem directly. The elements in this flavor of bpf_ma are not
safe to access from a sleepable program unless bpf_rcu_read_{lock,unlock}() is
used.

But I think using call_rcu() to call bpf_mem_free() is a good candidate for
comparison, and I saw bpf_cpumask does that, so I modified the bpf hash table
to do a similar thing and pasted the benchmark result. As can be seen from the
result, the memory usage of this flavor is much bigger than both
reuse-after-rcu-gp and free-after-rcu-gp:

* use call_rcu() to call bpf_mem_free()

| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1273       | 0.99                 | 1.00              |
| overwrite           | 12.52      | 432.57               | 653.32            |
| batch_add_batch_del | 9.21       | 272.81               | 436.07            |
| add_del_on_diff_cpu | 14.45      | 681.58               | 881.92            |
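
(For concreteness, a minimal sketch of this use-call_rcu()-to-call-
bpf_mem_free() variant, modeled on bpf_cpumask's cpumask_free_cb(); the
back-pointer from the element to its map is an assumption added here so
the callback can find the allocator:)

struct htab_elem_rcu {
	struct bpf_htab *htab;	/* assumed back-pointer to the map */
	struct rcu_head rcu;
};

static void htab_elem_free_rcu_cb(struct rcu_head *head)
{
	struct htab_elem_rcu *e = container_of(head, struct htab_elem_rcu, rcu);

	/* the element re-enters the bpf_ma caches only after a full
	 * RCU grace period, so it stays accounted for the whole GP
	 */
	bpf_mem_cache_free(&e->htab->ma, e);
}

static void free_htab_elem(struct htab_elem_rcu *e)
{
	call_rcu(&e->rcu, htab_elem_free_rcu_cb);
}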

>
>> | name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
>> | --                  | --         | --                   | --                |
>> | no_op               | 1263       | 0.96                 | 1.00              |
>> | overwrite           | 10.73      | 12.33                | 20.32             |
>> | batch_add_batch_del | 7.02       | 9.45                 | 14.07             |
>> | add_del_on_diff_cpu | 8.99       | 131.64               | 204.42            |
> Depending on what we care about, all the extra complexity in bpf_ma with reuse-after-rcu-gp
> buys us a bit better perf, but many times worse memory consumption?
No. As the benchmark result above shows, the memory consumption of both
reuse-after-rcu-gp and free-after-rcu-gp is better than calling bpf_mem_free()
from call_rcu(). The memory consumption of free-after-rcu-gp is better than
reuse-after-rcu-gp, and its implementation is also simpler.
>
>> [...]
>> +		if (try_queue_work && !work_pending(&c->reuse_work)) {
>> +			/* Use reuse_cb_in_progress to indicate there is
>> +			 * inflight reuse kworker or reuse RCU callback.
>> +			 */
>> +			atomic_inc(&c->reuse_cb_in_progress);
>> +			/* Already queued */
>> +			if (!queue_work(bpf_ma_wq, &c->reuse_work))
> how many kthreads are spawned by wq in the peak?
I think it depends on the number of bpf_ma instances. Because bpf_ma_wq is a
per-CPU workqueue, for each bpf_ma there is at most one worker per CPU. And
the current limit on the number of active workers on each CPU is 256, but it
is customizable through the alloc_workqueue() API.
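
(For reference, this cap is the max_active argument of alloc_workqueue();
passing 0, as the patch does, selects the default WQ_DFL_ACTIVE of 256.
A sketch of bounding it instead:)

	/* cap this workqueue at one in-flight work item per CPU,
	 * bounding how many kworkers each bpf_ma can keep busy
	 */
	bpf_ma_wq = alloc_workqueue("bpf_ma", WQ_MEM_RECLAIM, 1);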
Alexei Starovoitov April 27, 2023, 4:24 a.m. UTC | #3
On Sun, Apr 23, 2023 at 03:41:05PM +0800, Hou Tao wrote:
> >>
> >> (3) reuse-after-rcu-gp bpf memory allocator
> > that's the one you're implementing below, right?
> Right.
> >
> >> [...]
> >>
> >> (4) free-after-rcu-gp bpf memory allocator (free directly through call_rcu)
> > What do you mean? htab uses bpf_ma, but does call_rcu before doing bpf_mem_free ?
> No, there is no call_rcu() before bpf_mem_free(). In the free-after-rcu-gp
> flavor, bpf_mem_free() does call_rcu() in batches to free these elements back
> to the slab subsystem directly. The elements in this flavor of bpf_ma are not
> safe to access from a sleepable program unless bpf_rcu_read_{lock,unlock}() is
> used.
> 
> But I think using call_rcu() to call bpf_mem_free() is a good candidate for
> comparison, and I saw bpf_cpumask does that, so I modified the bpf hash table
> to do a similar thing and pasted the benchmark result. As can be seen from the
> result, the memory usage of this flavor is much bigger than both
> reuse-after-rcu-gp and free-after-rcu-gp:

I don't follow what exactly you're doing and what you're measuring.
Please provide patches for both reuse-after-rcu-gp and free-after-rcu-gp to
have a meaningful conversation.
Right now we're stuck on what the bench tool is actually measuring.

> >> +		if (try_queue_work && !work_pending(&c->reuse_work)) {
> >> +			/* Use reuse_cb_in_progress to indicate there is
> >> +			 * inflight reuse kworker or reuse RCU callback.
> >> +			 */
> >> +			atomic_inc(&c->reuse_cb_in_progress);
> >> +			/* Already queued */
> >> +			if (!queue_work(bpf_ma_wq, &c->reuse_work))
> > how many kthreads are spawned by wq in the peak?
> I think it depends on the number of bpf_ma instances. Because bpf_ma_wq is a
> per-CPU workqueue, for each bpf_ma there is at most one worker per CPU. And
> the current limit on the number of active workers on each CPU is 256, but it
> is customizable through the alloc_workqueue() API.

Which means that on an 8 cpu system there will be 8 * 256 kthreads?
That's a lot. Please provide num_of_all_threads before/after/at_peak during the bench.

Pls trim your replies. Mailers like mutt have a hard time navigating.
Hou Tao April 28, 2023, 2:24 a.m. UTC | #4
Hi Alexei,

On 4/27/2023 12:24 PM, Alexei Starovoitov wrote:
> On Sun, Apr 23, 2023 at 03:41:05PM +0800, Hou Tao wrote:
[...]
> I don't follow what exactly you're doing and what you're measuring.
> Please provide patches for both reuse-after-rcu-gp and free-after-rcu-gp to
> have a meaningful conversation.
OK. Will add a new flavor of FREE_AFTER_RCU_GP bpf memory allocator in v3.
> Right now we're stuck on what the bench tool is actually measuring.
>
>>>> +		if (try_queue_work && !work_pending(&c->reuse_work)) {
>>>> +			/* Use reuse_cb_in_progress to indicate there is
>>>> +			 * inflight reuse kworker or reuse RCU callback.
>>>> +			 */
>>>> +			atomic_inc(&c->reuse_cb_in_progress);
>>>> +			/* Already queued */
>>>> +			if (!queue_work(bpf_ma_wq, &c->reuse_work))
>>> how many kthreads are spawned by wq in the peak?
>> I think it depends on the number of bpf_ma. Because bpf_ma_wq is a per-CPU
>> workqueue, for each bpf_ma there is at most one worker on each CPU. Currently
>> the limit on the number of active workers on each CPU is 256, but it is
>> customizable through the alloc_workqueue() API.
> Which means that on an 8 cpu system there will be 8 * 256 kthreads?
> That's a lot. Please provide num_of_all_threads before/after/at_peak during bench.
Yes, 8 * 256 is a lot, but there are at most 8 kworkers during the
benchmark, because only one bpf memory allocator is used.
>
> Pls trim your replies. Mailers like mutt have a hard time navigating.
Do you mean the email content didn't wrap automatically, or that the wrap
length is too long (my current setting is 80)?
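
For reference, bpf_ma_wq in the patch below is created with max_active = 0,
which selects the workqueue default (currently 256 active work items per
CPU). If that turns out to be too many, the cap could be lowered through
the same alloc_workqueue() call; a minimal sketch (the value 4 is only an
illustration, not something the patch does):

	static struct workqueue_struct *bpf_ma_wq;

	static int __init bpf_ma_init(void)
	{
		/* The third argument is max_active: allow at most 4
		 * concurrently active reuse work items per CPU instead
		 * of the default 256.
		 */
		bpf_ma_wq = alloc_workqueue("bpf_ma", WQ_MEM_RECLAIM, 4);
		if (!bpf_ma_wq)
			return -ENOMEM;
		return 0;
	}
	late_initcall(bpf_ma_init);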
diff mbox series

Patch

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 148347950e16..e7f68432713b 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -18,6 +18,7 @@  struct bpf_mem_alloc {
 /* flags for bpf_mem_alloc_init() */
 enum {
 	BPF_MA_PERCPU = 1U << 0,
+	BPF_MA_REUSE_AFTER_RCU_GP = 1U << 1,
 };
 
 /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects.
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 072102476019..262100f89610 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -63,6 +63,10 @@  static u8 size_index[24] __ro_after_init = {
 	2	/* 192 */
 };
 
+static struct workqueue_struct *bpf_ma_wq;
+
+static void bpf_ma_prepare_reuse_work(struct work_struct *work);
+
 static int bpf_mem_cache_idx(size_t size)
 {
 	if (!size || size > 4096)
@@ -98,18 +102,36 @@  struct bpf_mem_cache {
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
 	int percpu_size;
+	int cpu;
 	unsigned int flags;
 
+	raw_spinlock_t reuse_lock;
+	bool abort_reuse;
+	struct llist_head reuse_ready_head;
+	struct llist_node *reuse_ready_tail;
+	struct llist_head wait_for_free;
+	struct llist_head prepare_reuse_head;
+	struct llist_node *prepare_reuse_tail;
+	unsigned int prepare_reuse_cnt;
+	atomic_t reuse_cb_in_progress;
+	struct work_struct reuse_work;
+
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
 	struct llist_head waiting_for_gp;
-	atomic_t call_rcu_in_progress;
+	atomic_t free_cb_in_progress;
 };
 
 struct bpf_mem_caches {
 	struct bpf_mem_cache cache[NUM_CACHES];
 };
 
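+/* One batch of to-be-reused objects [head, tail] travelling through an
+ * RCU grace period; freed by bpf_ma_reuse_cb() once the objects have
+ * been spliced into the reuse_ready list.
+ */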
+struct bpf_reuse_batch {
+	struct bpf_mem_cache *c;
+	struct llist_node *head, *tail;
+	struct rcu_head rcu;
+};
+
 static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 {
 	struct llist_node *entry, *next;
@@ -154,6 +176,45 @@  static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
 #endif
 }
 
+static void *bpf_ma_get_reusable_obj(struct bpf_mem_cache *c)
+{
+	if (c->flags & BPF_MA_REUSE_AFTER_RCU_GP) {
+		unsigned long flags;
+		void *obj;
+
+		if (llist_empty(&c->reuse_ready_head) && llist_empty(&c->wait_for_free))
+			return NULL;
+
+		/* reuse_ready_head and wait_for_free may be manipulated by
+		 * kworker and RCU callbacks.
+		 */
+		raw_spin_lock_irqsave(&c->reuse_lock, flags);
+		obj = __llist_del_first(&c->reuse_ready_head);
+		if (obj) {
+			if (llist_empty(&c->reuse_ready_head))
+				c->reuse_ready_tail = NULL;
+		} else {
+			obj = __llist_del_first(&c->wait_for_free);
+		}
+		raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
+		return obj;
+	}
+
+	/*
+	 * free_by_rcu is only manipulated by irq work refill_work().
+	 * IRQ works on the same CPU are called sequentially, so it is
+	 * safe to use __llist_del_first() here. If alloc_bulk() is
+	 * invoked by the initial prefill, there will be no running
+	 * refill_work(), so __llist_del_first() is fine as well.
+	 *
+	 * In most cases, objects on free_by_rcu are from the same CPU.
+	 * If some objects come from other CPUs, it doesn't incur any
+	 * harm because NUMA_NO_NODE means the preference for current
+	 * numa node and it is not a guarantee.
+	 */
+	return __llist_del_first(&c->free_by_rcu);
+}
+
 /* Mostly runs from irq_work except __init phase. */
 static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 {
@@ -165,19 +226,7 @@  static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 	memcg = get_memcg(c);
 	old_memcg = set_active_memcg(memcg);
 	for (i = 0; i < cnt; i++) {
-		/*
-		 * free_by_rcu is only manipulated by irq work refill_work().
-		 * IRQ works on the same CPU are called sequentially, so it is
-		 * safe to use __llist_del_first() here. If alloc_bulk() is
-		 * invoked by the initial prefill, there will be no running
-		 * refill_work(), so __llist_del_first() is fine as well.
-		 *
-		 * In most cases, objects on free_by_rcu are from the same CPU.
-		 * If some objects come from other CPUs, it doesn't incur any
-		 * harm because NUMA_NO_NODE means the preference for current
-		 * numa node and it is not a guarantee.
-		 */
-		obj = __llist_del_first(&c->free_by_rcu);
+		obj = bpf_ma_get_reusable_obj(c);
 		if (!obj) {
 			/* Allocate, but don't deplete atomic reserves that typical
 			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
@@ -236,7 +285,7 @@  static void __free_rcu(struct rcu_head *head)
 	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
 
 	free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
-	atomic_set(&c->call_rcu_in_progress, 0);
+	atomic_set(&c->free_cb_in_progress, 0);
 }
 
 static void __free_rcu_tasks_trace(struct rcu_head *head)
@@ -264,7 +313,7 @@  static void do_call_rcu(struct bpf_mem_cache *c)
 {
 	struct llist_node *llnode, *t;
 
-	if (atomic_xchg(&c->call_rcu_in_progress, 1))
+	if (atomic_xchg(&c->free_cb_in_progress, 1))
 		return;
 
 	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
@@ -409,6 +458,8 @@  int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
 			c->objcg = objcg;
 			c->percpu_size = percpu_size;
 			c->flags = flags;
+			c->cpu = cpu;
+			INIT_WORK(&c->reuse_work, bpf_ma_prepare_reuse_work);
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
@@ -433,6 +484,8 @@  int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
 			c->unit_size = sizes[i];
 			c->objcg = objcg;
 			c->flags = flags;
+			c->cpu = cpu;
+			INIT_WORK(&c->reuse_work, bpf_ma_prepare_reuse_work);
 			prefill_mem_cache(c, cpu);
 		}
 	}
@@ -444,18 +497,40 @@  int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags)
 static void drain_mem_cache(struct bpf_mem_cache *c)
 {
 	bool percpu = !!c->percpu_size;
+	struct llist_node *head[3];
+	unsigned long flags;
 
 	/* No progs are using this bpf_mem_cache, but htab_map_free() called
 	 * bpf_mem_cache_free() for all remaining elements and they can be in
 	 * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
 	 *
-	 * Except for waiting_for_gp list, there are no concurrent operations
-	 * on these lists, so it is safe to use __llist_del_all().
+	 * Except for the waiting_for_gp and free_llist_extra lists, there
+	 * are no concurrent operations on these lists, so it is safe to use
+	 * __llist_del_all().
 	 */
 	free_all(__llist_del_all(&c->free_by_rcu), percpu);
 	free_all(llist_del_all(&c->waiting_for_gp), percpu);
 	free_all(__llist_del_all(&c->free_llist), percpu);
-	free_all(__llist_del_all(&c->free_llist_extra), percpu);
+	free_all(llist_del_all(&c->free_llist_extra), percpu);
+
+	if (!(c->flags & BPF_MA_REUSE_AFTER_RCU_GP))
+		return;
+
+	raw_spin_lock_irqsave(&c->reuse_lock, flags);
+	/* Tell the kworker and RCU callbacks to free elements directly
+	 * instead of adding new elements to these lists.
+	 */
+	c->abort_reuse = true;
+	head[0] = __llist_del_all(&c->prepare_reuse_head);
+	c->prepare_reuse_tail = NULL;
+	head[1] = __llist_del_all(&c->reuse_ready_head);
+	c->reuse_ready_tail = NULL;
+	head[2] = __llist_del_all(&c->wait_for_free);
+	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
+
+	free_all(head[0], percpu);
+	free_all(head[1], percpu);
+	free_all(head[2], percpu);
 }
 
 static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
@@ -466,10 +541,39 @@  static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 	ma->caches = NULL;
 }
 
+static void bpf_ma_cancel_reuse_work(struct bpf_mem_alloc *ma)
+{
+	struct bpf_mem_caches *cc;
+	struct bpf_mem_cache *c;
+	int cpu, i;
+
+	if (ma->cache) {
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(ma->cache, cpu);
+			cancel_work_sync(&c->reuse_work);
+		}
+	}
+	if (ma->caches) {
+		for_each_possible_cpu(cpu) {
+			cc = per_cpu_ptr(ma->caches, cpu);
+			for (i = 0; i < NUM_CACHES; i++) {
+				c = &cc->cache[i];
+				cancel_work_sync(&c->reuse_work);
+			}
+		}
+	}
+}
+
 static void free_mem_alloc(struct bpf_mem_alloc *ma)
 {
-	/* waiting_for_gp lists was drained, but __free_rcu might
-	 * still execute. Wait for it now before we freeing percpu caches.
+	bool reuse_after_rcu_gp = ma->flags & BPF_MA_REUSE_AFTER_RCU_GP;
+
+	/* Cancel the inflight kworkers */
+	if (reuse_after_rcu_gp)
+		bpf_ma_cancel_reuse_work(ma);
+
+	/* For normal bpf ma, waiting_for_gp lists were drained, but __free_rcu
+	 * might still execute. Wait for it now before freeing percpu caches.
 	 *
 	 * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
 	 * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
@@ -477,9 +581,13 @@  static void free_mem_alloc(struct bpf_mem_alloc *ma)
 	 * so if call_rcu(head, __free_rcu) is skipped due to
 	 * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
 	 * using rcu_trace_implies_rcu_gp() as well.
+	 *
+	 * For reuse-after-rcu-gp bpf ma, use rcu_barrier_tasks_trace() to
+	 * wait for the pending bpf_ma_free_reusable_cb() and use rcu_barrier()
+	 * to wait for the pending bpf_ma_reuse_cb().
 	 */
 	rcu_barrier_tasks_trace();
-	if (!rcu_trace_implies_rcu_gp())
+	if (reuse_after_rcu_gp || !rcu_trace_implies_rcu_gp())
 		rcu_barrier();
 	free_mem_alloc_no_barrier(ma);
 }
@@ -512,6 +620,7 @@  static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
 	}
 
 	/* Defer barriers into worker to let the rest of map memory to be freed */
+	copy->flags = ma->flags;
 	copy->cache = ma->cache;
 	ma->cache = NULL;
 	copy->caches = ma->caches;
@@ -541,7 +650,9 @@  void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			 */
 			irq_work_sync(&c->refill_work);
 			drain_mem_cache(c);
-			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
+			rcu_in_progress += atomic_read(&c->free_cb_in_progress);
+			/* Pending kworkers or RCU callbacks */
+			rcu_in_progress += atomic_read(&c->reuse_cb_in_progress);
 		}
 		/* objcg is the same across cpus */
 		if (c->objcg)
@@ -556,7 +667,8 @@  void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 				c = &cc->cache[i];
 				irq_work_sync(&c->refill_work);
 				drain_mem_cache(c);
-				rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
+				rcu_in_progress += atomic_read(&c->free_cb_in_progress);
+				rcu_in_progress += atomic_read(&c->reuse_cb_in_progress);
 			}
 		}
 		if (c->objcg)
@@ -600,18 +712,183 @@  static void notrace *unit_alloc(struct bpf_mem_cache *c)
 	return llnode;
 }
 
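+/* Reuse pipeline for BPF_MA_REUSE_AFTER_RCU_GP: wait_gp_reuse_free()
+ * queues freed objects on prepare_reuse_head and kicks reuse_work;
+ * bpf_ma_prepare_reuse_work() batches them and waits for one RCU GP
+ * (bpf_ma_reuse_cb) before splicing them into reuse_ready_head for
+ * reuse; it also moves already-reusable objects to wait_for_free,
+ * which are returned to slab after an RCU-tasks-trace GP
+ * (bpf_ma_free_reusable_cb) but stay reusable until then.
+ */
+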
+static void bpf_ma_add_to_reuse_ready_or_free(struct bpf_mem_cache *c, struct llist_node *head,
+					      struct llist_node *tail)
+{
+	unsigned long flags;
+	bool abort;
+
+	raw_spin_lock_irqsave(&c->reuse_lock, flags);
+	abort = c->abort_reuse;
+	if (!abort) {
+		if (llist_empty(&c->reuse_ready_head))
+			c->reuse_ready_tail = tail;
+		__llist_add_batch(head, tail, &c->reuse_ready_head);
+	}
+	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
+
+	/* Draining is in progress: don't move these objects to the
+	 * reuse_ready list; free them directly instead.
+	 */
+	if (abort)
+		free_all(head, !!c->percpu_size);
+}
+
+static void bpf_ma_reuse_cb(struct rcu_head *rcu)
+{
+	struct bpf_reuse_batch *batch = container_of(rcu, struct bpf_reuse_batch, rcu);
+	struct bpf_mem_cache *c = batch->c;
+
+	bpf_ma_add_to_reuse_ready_or_free(c, batch->head, batch->tail);
+	atomic_dec(&c->reuse_cb_in_progress);
+	kfree(batch);
+}
+
+static bool bpf_ma_try_free_reuse_objs(struct bpf_mem_cache *c)
+{
+	struct llist_node *head, *tail;
+	bool do_free;
+
+	if (llist_empty(&c->reuse_ready_head))
+		return false;
+
+	do_free = !atomic_xchg(&c->free_cb_in_progress, 1);
+	if (!do_free)
+		return false;
+
+	head = __llist_del_all(&c->reuse_ready_head);
+	tail = c->reuse_ready_tail;
+	c->reuse_ready_tail = NULL;
+
+	__llist_add_batch(head, tail, &c->wait_for_free);
+
+	return true;
+}
+
+static void bpf_ma_free_reusable_cb(struct rcu_head *rcu)
+{
+	struct bpf_mem_cache *c = container_of(rcu, struct bpf_mem_cache, rcu);
+	struct llist_node *head;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&c->reuse_lock, flags);
+	head = __llist_del_all(&c->wait_for_free);
+	raw_spin_unlock_irqrestore(&c->reuse_lock, flags);
+
+	free_all(head, !!c->percpu_size);
+	atomic_set(&c->free_cb_in_progress, 0);
+}
+
+static void bpf_ma_prepare_reuse_work(struct work_struct *work)
+{
+	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, reuse_work);
+	struct llist_node *head, *tail, *llnode, *tmp;
+	struct bpf_reuse_batch *batch;
+	unsigned long flags;
+	bool do_free;
+
+	local_irq_save(flags);
+	/* When a CPU is offline, the running CPU may be different from
+	 * the CPU which submitted the work. When these two CPUs are the same,
+	 * the kworker may be interrupted by an NMI, so increase active to
+	 * protect against such concurrency.
+	 */
+	if (c->cpu == smp_processor_id())
+		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+	raw_spin_lock(&c->reuse_lock);
+	head = __llist_del_all(&c->prepare_reuse_head);
+	tail = c->prepare_reuse_tail;
+	c->prepare_reuse_tail = NULL;
+	c->prepare_reuse_cnt = 0;
+	if (c->cpu == smp_processor_id())
+		local_dec(&c->active);
+
+	/* Try to free elements in the reusable list. Before these elements
+	 * are freed in the RCU callback, they are still available for reuse.
+	 */
+	do_free = bpf_ma_try_free_reuse_objs(c);
+	raw_spin_unlock(&c->reuse_lock);
+	local_irq_restore(flags);
+
+	if (do_free)
+		call_rcu_tasks_trace(&c->rcu, bpf_ma_free_reusable_cb);
+
+	/* Prepend elements from free_llist_extra to the [head, tail] batch */
+	llist_for_each_safe(llnode, tmp, llist_del_all(&c->free_llist_extra)) {
+		if (!head)
+			tail = llnode;
+		llnode->next = head;
+		head = llnode;
+	}
+	/* Is draining in progress? */
+	if (!head) {
+		/* kworker completes and no RCU callback */
+		atomic_dec(&c->reuse_cb_in_progress);
+		return;
+	}
+
+	batch = kmalloc(sizeof(*batch), GFP_KERNEL);
+	if (!batch) {
+		synchronize_rcu_expedited();
+		bpf_ma_add_to_reuse_ready_or_free(c, head, tail);
+		/* kworker completes and no RCU callback */
+		atomic_dec(&c->reuse_cb_in_progress);
+		return;
+	}
+
+	batch->c = c;
+	batch->head = head;
+	batch->tail = tail;
+	call_rcu(&batch->rcu, bpf_ma_reuse_cb);
+}
+
+static void notrace wait_gp_reuse_free(struct bpf_mem_cache *c, struct llist_node *llnode)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	/* In case an NMI-context bpf program is also freeing objects. */
+	if (local_inc_return(&c->active) == 1) {
+		bool try_queue_work = false;
+
+		/* kworker may remove elements from prepare_reuse_head */
+		raw_spin_lock(&c->reuse_lock);
+		if (llist_empty(&c->prepare_reuse_head))
+			c->prepare_reuse_tail = llnode;
+		__llist_add(llnode, &c->prepare_reuse_head);
+		if (++c->prepare_reuse_cnt > c->high_watermark) {
+			/* Zero out prepare_reuse_cnt early to prevent
+			 * unnecessary queue_work().
+			 */
+			c->prepare_reuse_cnt = 0;
+			try_queue_work = true;
+		}
+		raw_spin_unlock(&c->reuse_lock);
+
+		if (try_queue_work && !work_pending(&c->reuse_work)) {
+			/* Use reuse_cb_in_progress to indicate there is
+			 * inflight reuse kworker or reuse RCU callback.
+			 */
+			atomic_inc(&c->reuse_cb_in_progress);
+			/* Already queued */
+			if (!queue_work(bpf_ma_wq, &c->reuse_work))
+				atomic_dec(&c->reuse_cb_in_progress);
+		}
+	} else {
+		llist_add(llnode, &c->free_llist_extra);
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+}
+
 /* Though 'ptr' object could have been allocated on a different cpu
  * add it to the free_llist of the current cpu.
  * Let kfree() logic deal with it when it's later called from irq_work.
  */
-static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+static void notrace immediate_reuse_free(struct bpf_mem_cache *c, struct llist_node *llnode)
 {
-	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
 	unsigned long flags;
 	int cnt = 0;
 
-	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
-
 	local_irq_save(flags);
 	if (local_inc_return(&c->active) == 1) {
 		__llist_add(llnode, &c->free_llist);
@@ -633,6 +910,18 @@  static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 		irq_work_raise(c);
 }
 
+static inline void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+
+	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+	if (c->flags & BPF_MA_REUSE_AFTER_RCU_GP)
+		wait_gp_reuse_free(c, llnode);
+	else
+		immediate_reuse_free(c, llnode);
+}
+
 /* Called from BPF program or from sys_bpf syscall.
  * In both cases migration is disabled.
  */
@@ -724,3 +1013,11 @@  void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
 
 	return !ret ? NULL : ret + LLIST_NODE_SZ;
 }
+
+static int __init bpf_ma_init(void)
+{
+	bpf_ma_wq = alloc_workqueue("bpf_ma", WQ_MEM_RECLAIM, 0);
+	BUG_ON(!bpf_ma_wq);
+	return 0;
+}
+late_initcall(bpf_ma_init);
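
The diff above adds the BPF_MA_REUSE_AFTER_RCU_GP flag but contains no
caller of it, so the snippet below is a purely illustrative sketch of how
a user of bpf_mem_alloc could opt in, based on the bpf_mem_alloc_init()
signature in this patch (the 64-byte unit size and the example_* names
are made up for illustration):

	/* Hypothetical caller, not part of this patch: manage fixed-size
	 * 64-byte objects whose freed elements are reused only after one
	 * RCU grace period.
	 */
	static int example_init(struct bpf_mem_alloc *ma)
	{
		return bpf_mem_alloc_init(ma, 64, BPF_MA_REUSE_AFTER_RCU_GP);
	}

	static void example_use(struct bpf_mem_alloc *ma)
	{
		void *obj = bpf_mem_cache_alloc(ma);

		if (!obj)
			return;
		/* ... use obj ... */
		bpf_mem_cache_free(ma, obj);	/* reusable after one RCU GP */
	}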