
[RFC,bpf-next,v4,2/3] selftests/bpf: Add benchmark for bpf memory allocator

Message ID 20230606035310.4026145-3-houtao@huaweicloud.com (mailing list archive)
State Superseded
Delegated to: BPF
Series: Handle immediate reuse in bpf memory allocator

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8 success Logs for veristat
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers warning 3 maintainers not CCed: mykolal@fb.com shuah@kernel.org linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success net selftest script(s) already in Makefile
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning
  WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
  WARNING: externs should be avoided in .c files
  WARNING: line length of 100 exceeds 80 columns
  WARNING: line length of 81 exceeds 80 columns
  WARNING: line length of 83 exceeds 80 columns
  WARNING: line length of 84 exceeds 80 columns
  WARNING: line length of 86 exceeds 80 columns
  WARNING: line length of 88 exceeds 80 columns
  WARNING: line length of 90 exceeds 80 columns
  WARNING: line length of 94 exceeds 80 columns
  WARNING: line length of 95 exceeds 80 columns
  WARNING: quoted string split across lines
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hou Tao June 6, 2023, 3:53 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

The benchmark can be used to compare the performance of hash map
operations and the memory usage between different flavors of the bpf
memory allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma).
It can also be used to check the performance improvement or the memory
savings provided by an optimization.

The benchmark creates a non-preallocated hash map which uses the bpf
memory allocator and shows the operation performance and the memory
usage of the hash map under different use cases:
(1) no_op
Only create the hash map; there are no operations on the hash map. It
is used as the baseline. When each CPU completes the iteration of
nr_entries / nr_threads elements in the hash map, the loop count is
increased.
(2) overwrite
Each CPU overwrites a nonoverlapping part of the hash map. When each
CPU completes the overwriting of nr_entries / nr_threads elements in
the hash map, the loop count is increased.
(3) batch_add_batch_del
Each CPU adds and then deletes a nonoverlapping part of the hash map in
batch. When each CPU has added and deleted nr_entries / nr_threads
elements in the hash map, the loop count is increased.
(4) add_del_on_diff_cpu
Each pair of CPUs cooperatively adds and deletes a nonoverlapping part
of the hash map. When each two-CPU pair has added and deleted
nr_entries / nr_threads * 2 elements in the hash map, the loop count is
increased twice.

The following are the benchmark results when comparing the different
flavors of the bpf memory allocator. All tests are conducted on a KVM
guest with 8 CPUs and 16 GB of memory. The command line below is used
for all of the benchmarks (${OPTS} is empty unless noted otherwise;
with --max-entries 16384 and --full 50, nr_entries is 8192):

  ./bench htab-mem --use-case $name --max-entries 16384 ${OPTS} \
          --full 50 -d 10 --producers=8 --prod-affinity=0-7

These results show:
* the preallocated case has both better performance and better memory
  efficiency.
* the normal bpf memory allocator doesn't handle add_del_on_diff_cpu
  very well. The large memory usage is due to the slow RCU tasks trace
  grace period: freed elements cannot be reused until the grace period
  expires, so memory keeps accumulating in the meantime.

(1) non-preallocated + no bpf memory allocator (v6.0.19)
uses kmalloc() + call_rcu()

| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1214.42    | 0.92                 | 0.92              |
| overwrite           | 3.21       | 40.47                | 67.98             |
| batch_add_batch_del | 2.32       | 24.31                | 49.33             |
| add_del_on_diff_cpu | 2.92       | 4.03                 | 6.00              |

(2) preallocated
OPTS=--preallocated

| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1156.59    | 1.88                 | 1.88              |
| overwrite           | 36.19      | 1.88                 | 1.88              |
| batch_add_batch_del | 22.27      | 1.88                 | 1.88              |
| add_del_on_diff_cpu | 4.68       | 1.95                 | 2.05              |

(3) normal bpf memory allocator

| name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
| --                  | --         | --                   | --                |
| no_op               | 1273.55    | 0.98                 | 0.98              |
| overwrite           | 26.57      | 2.59                 | 2.74              |
| batch_add_batch_del | 11.13      | 2.59                 | 2.99              |
| add_del_on_diff_cpu | 3.72       | 15.15                | 26.04             |

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 352 ++++++++++++++++++
 .../bpf/benchs/run_bench_htab_mem.sh          |  42 +++
 .../selftests/bpf/progs/htab_mem_bench.c      | 135 +++++++
 5 files changed, 536 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

Comments

Alexei Starovoitov June 6, 2023, 9:13 p.m. UTC | #1
On Mon, Jun 5, 2023 at 8:20 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> +static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
> +{
> +       char buf[32];
> +       int fd;
> +
> +       fd = openat(ctx.fd, name, O_RDONLY);
> +       if (fd < 0) {
> +               fprintf(stderr, "no %s\n", name);
> +               *value = 0;
> +               return;
> +       }
> +
> +       buf[sizeof(buf) - 1] = 0;
> +       read(fd, buf, sizeof(buf) - 1);

Please check BPF CI. It's complaining about:

benchs/bench_htab_mem.c: In function ‘htab_mem_read_mem_cgrp_file’:
benchs/bench_htab_mem.c:290:2: error: ignoring return value of ‘read’,
declared with attribute warn_unused_result [-Werror=unused-result]
290 | read(fd, buf, sizeof(buf) - 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


> +       *value = strtoull(buf, NULL, 0);
> +
> +       close(fd);
> +}
> +
> +static void htab_mem_measure(struct bench_res *res)
> +{
> +       res->hits = atomic_swap(&ctx.skel->bss->loop_cnt, 0);

This is missing:
res->hits /= env.producer_cnt;

Doubling the number of producers should double the perf metric.
Like -p 4 should be half the speed of -p 8.
In an ideal situation, of course.
Without this normalization -p 1 vs -p 2 numbers are meaningless.
Runs with different numbers of producers cannot be compared.
Hou Tao June 7, 2023, 1:32 a.m. UTC | #2
Hi,

On 6/7/2023 5:13 AM, Alexei Starovoitov wrote:
> On Mon, Jun 5, 2023 at 8:20 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> +static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
>> +{
>> +       char buf[32];
>> +       int fd;
>> +
>> +       fd = openat(ctx.fd, name, O_RDONLY);
>> +       if (fd < 0) {
>> +               fprintf(stderr, "no %s\n", name);
>> +               *value = 0;
>> +               return;
>> +       }
>> +
>> +       buf[sizeof(buf) - 1] = 0;
>> +       read(fd, buf, sizeof(buf) - 1);
> Please check BPF CI. It's complaining about:
I didn't know that an RFC patch would go through BPF CI. Will check the
other checks and tests in CI.
>
> benchs/bench_htab_mem.c: In function ‘htab_mem_read_mem_cgrp_file’:
> benchs/bench_htab_mem.c:290:2: error: ignoring return value of ‘read’,
> declared with attribute warn_unused_result [-Werror=unused-result]
> 290 | read(fd, buf, sizeof(buf) - 1);
Will fix in v5.
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>> +       *value = strtoull(buf, NULL, 0);
>> +
>> +       close(fd);
>> +}
>> +
>> +static void htab_mem_measure(struct bench_res *res)
>> +{
>> +       res->hits = atomic_swap(&ctx.skel->bss->loop_cnt, 0);
> This is missing:
> res->hits /= env.producer_cnt;
>
> Doubling the number of producers should double the perf metric.
> Like -p 4 should be half the speed of -p 8.
> In an ideal situation, of course.
> Without this normalization -p 1 vs -p 2 numbers are meaningless.
> Runs with different numbers of producers cannot be compared.
It is a good idea to make the metric comparable between different
numbers of producers. Will do it in v5.
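Putting the two review comments together, here is a minimal sketch of
how v5 could address them, based on the v4 code quoted above (the names
ctx, env and struct bench_res come from the posted patch; the actual v5
changes may differ):

static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
{
	char buf[32];
	ssize_t got;
	int fd;

	*value = 0;
	fd = openat(ctx.fd, name, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "no %s\n", name);
		return;
	}

	/* Check the return value of read() to silence
	 * -Werror=unused-result and to avoid parsing stale bytes after
	 * a short or failed read.
	 */
	got = read(fd, buf, sizeof(buf) - 1);
	if (got > 0) {
		buf[got] = 0;
		*value = strtoull(buf, NULL, 0);
	}
	close(fd);
}

static void htab_mem_measure(struct bench_res *res)
{
	res->hits = atomic_swap(&ctx.skel->bss->loop_cnt, 0);
	/* Normalize by the number of producers so that runs with
	 * different --producers values can be compared.
	 */
	res->hits /= env.producer_cnt;
	htab_mem_read_mem_cgrp_file("memory.current", &res->gp_ct);
}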

Patch

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 538df8fb8c42..add018823ebd 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -644,11 +644,13 @@  $(OUTPUT)/bench_local_storage.o: $(OUTPUT)/local_storage_bench.skel.h
 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tasks_trace_bench.skel.h
 $(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.skel.h
 $(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h
+$(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o \
 		 $(TESTING_HELPERS) \
 		 $(TRACE_HELPERS) \
+		 $(CGROUP_HELPERS) \
 		 $(OUTPUT)/bench_count.o \
 		 $(OUTPUT)/bench_rename.o \
 		 $(OUTPUT)/bench_trigger.o \
@@ -661,6 +663,7 @@  $(OUTPUT)/bench: $(OUTPUT)/bench.o \
 		 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o \
 		 $(OUTPUT)/bench_bpf_hashmap_lookup.o \
 		 $(OUTPUT)/bench_local_storage_create.o \
+		 $(OUTPUT)/bench_htab_mem.o \
 		 #
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index d9c080ac1796..d3d9ae321b74 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -279,6 +279,7 @@  extern struct argp bench_local_storage_rcu_tasks_trace_argp;
 extern struct argp bench_strncmp_argp;
 extern struct argp bench_hashmap_lookup_argp;
 extern struct argp bench_local_storage_create_argp;
+extern struct argp bench_htab_mem_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
@@ -290,6 +291,7 @@  static const struct argp_child bench_parsers[] = {
 		"local_storage RCU Tasks Trace slowdown benchmark", 0 },
 	{ &bench_hashmap_lookup_argp, 0, "Hashmap lookup benchmark", 0 },
 	{ &bench_local_storage_create_argp, 0, "local-storage-create benchmark", 0 },
+	{ &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 },
 	{},
 };
 
@@ -518,6 +520,7 @@  extern const struct bench bench_local_storage_cache_hashmap_control;
 extern const struct bench bench_local_storage_tasks_trace;
 extern const struct bench bench_bpf_hashmap_lookup;
 extern const struct bench bench_local_storage_create;
+extern const struct bench bench_htab_mem;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -559,6 +562,7 @@  static const struct bench *benchs[] = {
 	&bench_local_storage_tasks_trace,
 	&bench_bpf_hashmap_lookup,
 	&bench_local_storage_create,
+	&bench_htab_mem,
 };
 
 static void find_benchmark(void)
diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
new file mode 100644
index 000000000000..f0c2505c0868
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -0,0 +1,352 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <argp.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include "bench.h"
+#include "cgroup_helpers.h"
+#include "htab_mem_bench.skel.h"
+
+static struct htab_mem_ctx {
+	struct htab_mem_bench *skel;
+	pthread_barrier_t *notify;
+	int fd;
+	bool do_notify_wait;
+} ctx;
+
+static struct htab_mem_args {
+	u32 max_entries;
+	u32 value_size;
+	u32 full;
+	const char *use_case;
+	bool preallocated;
+} args = {
+	.max_entries = 16384,
+	.full = 50,
+	.value_size = 8,
+	.use_case = "overwrite",
+	.preallocated = false,
+};
+
+enum {
+	ARG_MAX_ENTRIES = 10000,
+	ARG_FULL_PERCENT = 10001,
+	ARG_VALUE_SIZE = 10002,
+	ARG_USE_CASE = 10003,
+	ARG_PREALLOCATED = 10004,
+};
+
+static const struct argp_option opts[] = {
+	{ "max-entries", ARG_MAX_ENTRIES, "MAX_ENTRIES", 0,
+	  "Set the max entries of hash map (default 16384)" },
+	{ "full", ARG_FULL_PERCENT, "FULL", 0,
+	  "Set the full percent of hash map (default 50)" },
+	{ "value-size", ARG_VALUE_SIZE, "VALUE_SIZE", 0,
+	  "Set the value size of hash map (default 8)" },
+	{ "use-case", ARG_USE_CASE, "USE_CASE", 0,
+	  "Set the use case of hash map: no_op|overwrite|batch_add_batch_del|add_del_on_diff_cpu" },
+	{ "preallocated", ARG_PREALLOCATED, NULL, 0, "use preallocated hash map" },
+	{},
+};
+
+static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_MAX_ENTRIES:
+		args.max_entries = strtoul(arg, NULL, 10);
+		break;
+	case ARG_FULL_PERCENT:
+		args.full = strtoul(arg, NULL, 10);
+		if (!args.full || args.full > 100) {
+			fprintf(stderr, "invalid full percent %u\n", args.full);
+			argp_usage(state);
+		}
+		break;
+	case ARG_VALUE_SIZE:
+		args.value_size = strtoul(arg, NULL, 10);
+		if (args.value_size > 4096) {
+			fprintf(stderr, "too big value size %u\n", args.value_size);
+			argp_usage(state);
+		}
+		break;
+	case ARG_USE_CASE:
+		args.use_case = strdup(arg);
+		break;
+	case ARG_PREALLOCATED:
+		args.preallocated = true;
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+const struct argp bench_htab_mem_argp = {
+	.options = opts,
+	.parser = htab_mem_parse_arg,
+};
+
+static void htab_mem_validate(void)
+{
+	if (env.consumer_cnt != 1) {
+		fprintf(stderr, "htab mem benchmark doesn't support multi-consumer!\n");
+		exit(1);
+	}
+}
+
+static int setup_and_join_cgroup(const char *path)
+{
+	int err, fd;
+
+	err = setup_cgroup_environment();
+	if (err) {
+		fprintf(stderr, "setup cgroup env failed\n");
+		return -1;
+	}
+
+	err = create_and_get_cgroup(path);
+	if (err < 0) {
+		fprintf(stderr, "create cgroup %s failed\n", path);
+		goto out;
+	}
+	fd = err;
+
+	err = join_cgroup(path);
+	if (err) {
+		fprintf(stderr, "join cgroup %s failed\n", path);
+		close(fd);
+		goto out;
+	}
+
+	return fd;
+out:
+	cleanup_cgroup_environment();
+	return -1;
+}
+
+static int htab_mem_bench_init_barriers(void)
+{
+	unsigned int i, nr = (env.producer_cnt + 1) / 2;
+	pthread_barrier_t *barriers;
+
+	barriers = calloc(nr, sizeof(*barriers));
+	if (!barriers)
+		return -1;
+
+	/* Used for synchronization between two threads */
+	for (i = 0; i < nr; i++)
+		pthread_barrier_init(&barriers[i], NULL, 2);
+
+	ctx.notify = barriers;
+	return 0;
+}
+
+static void htab_mem_bench_exit_barriers(void)
+{
+	unsigned int i, nr;
+
+	if (!ctx.notify)
+		return;
+
+	nr = (env.producer_cnt + 1) / 2;
+	for (i = 0; i < nr; i++)
+		pthread_barrier_destroy(&ctx.notify[i]);
+	free(ctx.notify);
+}
+
+static void htab_mem_setup(void)
+{
+	struct bpf_program *prog;
+	struct bpf_map *map;
+	int err;
+
+	setup_libbpf();
+
+	err = setup_and_join_cgroup("/htab_mem");
+	if (err < 0)
+		exit(1);
+	ctx.fd = err;
+
+	ctx.skel = htab_mem_bench__open();
+	if (!ctx.skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		goto cleanup;
+	}
+
+	err = htab_mem_bench_init_barriers();
+	if (err) {
+		fprintf(stderr, "failed to init barrier\n");
+		goto cleanup;
+	}
+
+	map = ctx.skel->maps.htab;
+	bpf_map__set_max_entries(map, args.max_entries);
+	bpf_map__set_value_size(map, args.value_size);
+	if (args.preallocated)
+		bpf_map__set_map_flags(map, bpf_map__map_flags(map) & ~BPF_F_NO_PREALLOC);
+
+	/* Do synchronization between addition thread and deletion thread */
+	if (!strcmp("add_del_on_diff_cpu", args.use_case))
+		ctx.do_notify_wait = true;
+
+	prog = bpf_object__find_program_by_name(ctx.skel->obj, args.use_case);
+	if (!prog) {
+		fprintf(stderr, "no such use-case: %s\n", args.use_case);
+		fprintf(stderr, "available use case:");
+		bpf_object__for_each_program(prog, ctx.skel->obj)
+			fprintf(stderr, " %s", bpf_program__name(prog));
+		fprintf(stderr, "\n");
+		goto cleanup;
+	}
+	bpf_program__set_autoload(prog, true);
+
+	ctx.skel->bss->nr_thread = env.producer_cnt;
+	ctx.skel->bss->nr_entries = (uint64_t)args.max_entries * args.full / 100;
+
+	err = htab_mem_bench__load(ctx.skel);
+	if (err) {
+		fprintf(stderr, "failed to load skeleton\n");
+		goto cleanup;
+	}
+	err = htab_mem_bench__attach(ctx.skel);
+	if (err) {
+		fprintf(stderr, "failed to attach skeleton\n");
+		goto cleanup;
+	}
+	return;
+cleanup:
+	close(ctx.fd);
+	cleanup_cgroup_environment();
+	htab_mem_bench_exit_barriers();
+	htab_mem_bench__destroy(ctx.skel);
+	exit(1);
+}
+
+static void htab_mem_notify_wait_producer(pthread_barrier_t *notify)
+{
+	while (true) {
+		(void)syscall(__NR_getpgid);
+		/* Notify for start */
+		pthread_barrier_wait(notify);
+		/* Wait for completion */
+		pthread_barrier_wait(notify);
+	}
+}
+
+static void htab_mem_wait_notify_producer(pthread_barrier_t *notify)
+{
+	while (true) {
+		/* Wait for start */
+		pthread_barrier_wait(notify);
+		(void)syscall(__NR_getpgid);
+		/* Notify for completion */
+		pthread_barrier_wait(notify);
+	}
+}
+
+static void *htab_mem_producer(void *arg)
+{
+	pthread_barrier_t *notify;
+	int seq;
+
+	if (!ctx.do_notify_wait) {
+		while (true)
+			(void)syscall(__NR_getpgid);
+		return NULL;
+	}
+
+	seq = (long)arg;
+	notify = &ctx.notify[seq / 2];
+	if (seq & 1)
+		htab_mem_notify_wait_producer(notify);
+	else
+		htab_mem_wait_notify_producer(notify);
+	return NULL;
+}
+
+static void *htab_mem_consumer(void *arg)
+{
+	return NULL;
+}
+
+static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
+{
+	char buf[32];
+	int fd;
+
+	fd = openat(ctx.fd, name, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, "no %s\n", name);
+		*value = 0;
+		return;
+	}
+
+	buf[sizeof(buf) - 1] = 0;
+	read(fd, buf, sizeof(buf) - 1);
+	*value = strtoull(buf, NULL, 0);
+
+	close(fd);
+}
+
+static void htab_mem_measure(struct bench_res *res)
+{
+	res->hits = atomic_swap(&ctx.skel->bss->loop_cnt, 0);
+	htab_mem_read_mem_cgrp_file("memory.current", &res->gp_ct);
+}
+
+static void htab_mem_report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+	double loop, mem;
+
+	loop = res->hits / 1000.0 / (delta_ns / 1000000000.0);
+	mem = res->gp_ct / 1048576.0;
+	printf("Iter %3d (%7.3lfus): ", iter, (delta_ns - 1000000000) / 1000.0);
+	printf("loop %7.2lfk/s, memory usage %7.2lfMiB\n", loop, mem);
+}
+
+static void htab_mem_report_final(struct bench_res res[], int res_cnt)
+{
+	double mem_mean = 0.0, mem_stddev = 0.0;
+	double loop_mean = 0.0, loop_stddev = 0.0;
+	unsigned long peak_mem;
+	int i;
+
+	for (i = 0; i < res_cnt; i++) {
+		loop_mean += res[i].hits / 1000.0 / (0.0 + res_cnt);
+		mem_mean += res[i].gp_ct / 1048576.0 / (0.0 + res_cnt);
+	}
+	if (res_cnt > 1)  {
+		for (i = 0; i < res_cnt; i++) {
+			loop_stddev += (loop_mean - res[i].hits / 1000.0) *
+				       (loop_mean - res[i].hits / 1000.0) /
+				       (res_cnt - 1.0);
+			mem_stddev += (mem_mean - res[i].gp_ct / 1048576.0) *
+				      (mem_mean - res[i].gp_ct / 1048576.0) /
+				      (res_cnt - 1.0);
+		}
+		loop_stddev = sqrt(loop_stddev);
+		mem_stddev = sqrt(mem_stddev);
+	}
+
+	htab_mem_read_mem_cgrp_file("memory.peak", &peak_mem);
+	printf("Summary: loop %7.2lf \u00B1 %7.2lfk/s, memory usage %7.2lf \u00B1 %7.2lfMiB, "
+	       "peak memory usage %7.2lfMiB\n",
+	       loop_mean, loop_stddev, mem_mean, mem_stddev, peak_mem / 1048576.0);
+}
+
+const struct bench bench_htab_mem = {
+	.name = "htab-mem",
+	.argp = &bench_htab_mem_argp,
+	.validate = htab_mem_validate,
+	.setup = htab_mem_setup,
+	.producer_thread = htab_mem_producer,
+	.consumer_thread = htab_mem_consumer,
+	.measure = htab_mem_measure,
+	.report_progress = htab_mem_report_progress,
+	.report_final = htab_mem_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh b/tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
new file mode 100755
index 000000000000..b488cd7ec646
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
@@ -0,0 +1,42 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+htab_mem()
+{
+	echo -n "loop : "
+	echo -n "$*" | sed -E "s/.* loop\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+k\/s).*/\1/"
+	echo -n -e ", avg mem: "
+	echo -n "$*" | sed -E "s/.* memory usage\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+MiB).*/\1/"
+	echo -n ", peak mem: "
+	echo "$*" | sed -E "s/.* peak memory usage\s+([0-9]+\.[0-9]+MiB).*/\1/"
+}
+
+summarize_htab_mem()
+{
+	local bench="$1"
+	local summary=$(echo $2 | tail -n1)
+
+	printf "%-20s %s\n" "$bench" "$(htab_mem $summary)"
+}
+
+htab_mem_bench()
+{
+	local name
+
+	for name in no_op overwrite batch_add_batch_del add_del_on_diff_cpu
+	do
+		summarize_htab_mem "$name" "$(sudo ./bench htab-mem --use-case $name \
+			--max-entries 16384 --full 50 -d 10 \
+			--producers=8 --prod-affinity=0-7 "$@")"
+	done
+}
+
+header "preallocated"
+htab_mem_bench "--preallocated"
+
+header "normal bpf ma"
+htab_mem_bench
diff --git a/tools/testing/selftests/bpf/progs/htab_mem_bench.c b/tools/testing/selftests/bpf/progs/htab_mem_bench.c
new file mode 100644
index 000000000000..fe5c0edb262e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_mem_bench.c
@@ -0,0 +1,135 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct update_ctx {
+	unsigned int from;
+	unsigned int step;
+	unsigned int max;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, 4);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+} htab SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+unsigned char zeroed_value[4096];
+unsigned int nr_entries = 0;
+unsigned int nr_thread = 0;
+long loop_cnt = 0;
+
+static int noop_htab(unsigned int i, struct update_ctx *ctx)
+{
+	if (ctx->from >= ctx->max)
+		return 1;
+
+	ctx->from += ctx->step;
+	return 0;
+}
+
+static int write_htab(unsigned int i, struct update_ctx *ctx, unsigned int flags)
+{
+	if (ctx->from >= ctx->max)
+		return 1;
+
+	bpf_map_update_elem(&htab, &ctx->from, zeroed_value, flags);
+	ctx->from += ctx->step;
+
+	return 0;
+}
+
+static int overwrite_htab(unsigned int i, struct update_ctx *ctx)
+{
+	return write_htab(i, ctx, 0);
+}
+
+static int newwrite_htab(unsigned int i, struct update_ctx *ctx)
+{
+	return write_htab(i, ctx, BPF_NOEXIST);
+}
+
+static int del_htab(unsigned int i, struct update_ctx *ctx)
+{
+	if (ctx->from >= ctx->max)
+		return 1;
+
+	bpf_map_delete_elem(&htab, &ctx->from);
+	ctx->from += ctx->step;
+
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int no_op(void *ctx)
+{
+	struct update_ctx update;
+
+	update.from = bpf_get_smp_processor_id();
+	update.step = nr_thread;
+	update.max = nr_entries;
+	bpf_loop(update.max, noop_htab, &update, 0);
+	__sync_fetch_and_add(&loop_cnt, 1);
+
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int overwrite(void *ctx)
+{
+	struct update_ctx update;
+
+	update.from = bpf_get_smp_processor_id();
+	update.step = nr_thread;
+	update.max = nr_entries;
+	bpf_loop(update.max, overwrite_htab, &update, 0);
+	/* Increase when nr_entries / nr_thread elements are deleted and then added */
+	__sync_fetch_and_add(&loop_cnt, 1);
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int batch_add_batch_del(void *ctx)
+{
+	struct update_ctx update;
+
+	update.from = bpf_get_smp_processor_id();
+	update.step = nr_thread;
+	update.max = nr_entries;
+	bpf_loop(update.max, overwrite_htab, &update, 0);
+
+	update.from = bpf_get_smp_processor_id();
+	bpf_loop(update.max, del_htab, &update, 0);
+
+	/* Increase when nr_entries / nr_thread elements are added and then deleted */
+	__sync_fetch_and_add(&loop_cnt, 1);
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int add_del_on_diff_cpu(void *ctx)
+{
+	struct update_ctx update;
+	unsigned int from;
+
+	from = bpf_get_smp_processor_id();
+	update.from = from / 2;
+	update.step = nr_thread / 2;
+	update.max = nr_entries;
+
+	if (from & 1)
+		bpf_loop(update.max, newwrite_htab, &update, 0);
+	else
+		bpf_loop(update.max, del_htab, &update, 0);
+
+	/* Increase when nr_entries / nr_thread * 2 elements are added or deleted */
+	__sync_fetch_and_add(&loop_cnt, 1);
+	return 0;
+}