
[bpf-next] bpf: Shrink size of struct bpf_map/bpf_array.

Message ID 20240220235001.57411-1-alexei.starovoitov@gmail.com (mailing list archive)
State Accepted
Commit f867839918091be532fce29901937582e8323536
Delegated to: BPF
Series [bpf-next] bpf: Shrink size of struct bpf_map/bpf_array.

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7798 this patch: 7798
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 9 maintainers not CCed: jolsa@kernel.org john.fastabend@gmail.com yonghong.song@linux.dev martin.lau@linux.dev song@kernel.org sdf@google.com eddyz87@gmail.com kpsingh@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 2369 this patch: 2369
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8292 this patch: 8292
netdev/checkpatch warning CHECK: struct mutex definition without comment
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 6 this patch: 6
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17

Commit Message

Alexei Starovoitov Feb. 20, 2024, 11:50 p.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

Back in 2018 the commit be95a845cc44 ("bpf: avoid false sharing of map refcount with max_entries")
added ____cacheline_aligned to "struct bpf_map" to make sure that fields like
refcnt don't share a cache line with max_entries, which is used to bounds check
map access. That was done to make Spectre-style attacks harder. The main
mitigation is done via code similar to array_index_nospec(); the cache line
separation was only an additional precaution.
It increased the size of "struct bpf_map" a little, but its effect
on all other maps (like array) is significant, since "struct bpf_map" is
typically the first member in other map types.
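
As a rough sketch of that main mitigation (not the kernel's actual lookup code;
the struct and names below are made up for illustration), the index is clamped
before the data-dependent load:

#include <linux/types.h>
#include <linux/nospec.h>

/* Hypothetical, simplified array map for illustration only. */
struct toy_array {
	u32 max_entries;
	u32 elem_size;
	char *values;
};

static void *toy_array_lookup(struct toy_array *arr, u32 index)
{
	if (index >= arr->max_entries)
		return NULL;
	/* array_index_nospec() yields index when index < max_entries and 0
	 * otherwise, so the value stays in bounds even under speculation.
	 */
	index = array_index_nospec(index, arr->max_entries);
	return arr->values + (u64)arr->elem_size * index;
}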

Undo this ____cacheline_aligned tag. Instead, move the freeze_mutex field
so that refcnt and max_entries still end up in different cache lines.

The main effect is seen in sizeof(struct bpf_array), which shrinks from 320 to 248 bytes.

BEFORE:

struct bpf_map {
	const struct bpf_map_ops  * ops;                 /*     0     8 */
	...
	char                       name[16];             /*    96    16 */

	/* XXX 16 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	atomic64_t refcnt __attribute__((__aligned__(64))); /*   128     8 */
	...
	/* size: 256, cachelines: 4, members: 30 */
	/* sum members: 232, holes: 1, sum holes: 16 */
	/* padding: 8 */
	/* paddings: 1, sum paddings: 2 */
} __attribute__((__aligned__(64)));

struct bpf_array {
	struct bpf_map             map;                  /*     0   256 */
	...
	/* size: 320, cachelines: 5, members: 5 */
	/* padding: 48 */
	/* paddings: 1, sum paddings: 8 */
} __attribute__((__aligned__(64)));

AFTER:

struct bpf_map {
	/* size: 232, cachelines: 4, members: 30 */
	/* paddings: 1, sum paddings: 2 */
	/* last cacheline: 40 bytes */
};
struct bpf_array {
	/* size: 248, cachelines: 4, members: 5 */
	/* last cacheline: 56 bytes */
};
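
The layout dumps above are pahole output; assuming a kernel built with debug
info, they can be regenerated with, e.g., "pahole -C bpf_array vmlinux".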

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

Comments

Yonghong Song Feb. 21, 2024, 4:40 p.m. UTC | #1
On 2/20/24 3:50 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> Back in 2018 the commit be95a845cc44 ("bpf: avoid false sharing of map refcount with max_entries")
> added ____cacheline_aligned to "struct bpf_map" to make sure that fields like
> refcnt don't share a cache line with max_entries that is used to bounds check
> map access. That was done to make spectre style attacks harder. The main
> mitigation is done via code similar to array_index_nospec(), of course.
> This was an additional precaution.
> It increased the size of "struct bpf_map" a little, but its effect
> on all other maps (like array) is significant, since "struct bpf_map" is
> typically the first member in other map types.
>
> Undo this ____cacheline_aligned tag. Instead move freeze_mutex field around,
> so that refcnt and max_entries are still in different cache lines.
>
> The main effect is seen in sizeof(struct bpf_array) that reduces from 320 to 248 bytes.
>
> BEFORE:
>
> struct bpf_map {
> 	const struct bpf_map_ops  * ops;                 /*     0     8 */
> 	...
> 	char                       name[16];             /*    96    16 */
>
> 	/* XXX 16 bytes hole, try to pack */
>
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	atomic64_t refcnt __attribute__((__aligned__(64))); /*   128     8 */
> 	...
> 	/* size: 256, cachelines: 4, members: 30 */
> 	/* sum members: 232, holes: 1, sum holes: 16 */
> 	/* padding: 8 */
> 	/* paddings: 1, sum paddings: 2 */
> } __attribute__((__aligned__(64)));
>
> struct bpf_array {
> 	struct bpf_map             map;                  /*     0   256 */
> 	...
> 	/* size: 320, cachelines: 5, members: 5 */
> 	/* padding: 48 */
> 	/* paddings: 1, sum paddings: 8 */
> } __attribute__((__aligned__(64)));
>
> AFTER:
>
> struct bpf_map {
> 	/* size: 232, cachelines: 4, members: 30 */
> 	/* paddings: 1, sum paddings: 2 */
> 	/* last cacheline: 40 bytes */
> };
> struct bpf_array {
> 	/* size: 248, cachelines: 4, members: 5 */
> 	/* last cacheline: 56 bytes */
> };
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Yonghong Song <yonghong.song@linux.dev>
patchwork-bot+netdevbpf@kernel.org Feb. 21, 2024, 5:10 p.m. UTC | #2
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Tue, 20 Feb 2024 15:50:01 -0800 you wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Back in 2018 the commit be95a845cc44 ("bpf: avoid false sharing of map refcount with max_entries")
> added ____cacheline_aligned to "struct bpf_map" to make sure that fields like
> refcnt don't share a cache line with max_entries that is used to bounds check
> map access. That was done to make spectre style attacks harder. The main
> mitigation is done via code similar to array_index_nospec(), of course.
> This was an additional precaution.
> It increased the size of "struct bpf_map" a little, but its effect
> on all other maps (like array) is significant, since "struct bpf_map" is
> typically the first member in other map types.
> 
> [...]

Here is the summary with links:
  - [bpf-next] bpf: Shrink size of struct bpf_map/bpf_array.
    https://git.kernel.org/bpf/bpf-next/c/f86783991809

You are awesome, thank you!

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c7aa99b44dbd..814dc913a968 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -251,10 +251,7 @@  struct bpf_list_node_kern {
 } __attribute__((aligned(8)));
 
 struct bpf_map {
-	/* The first two cachelines with read-mostly members of which some
-	 * are also accessed in fast-path (e.g. ops, max_entries).
-	 */
-	const struct bpf_map_ops *ops ____cacheline_aligned;
+	const struct bpf_map_ops *ops;
 	struct bpf_map *inner_map_meta;
 #ifdef CONFIG_SECURITY
 	void *security;
@@ -276,17 +273,14 @@  struct bpf_map {
 	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	/* The 3rd and 4th cacheline with misc members to avoid false sharing
-	 * particularly with refcounting.
-	 */
-	atomic64_t refcnt ____cacheline_aligned;
+	struct mutex freeze_mutex;
+	atomic64_t refcnt;
 	atomic64_t usercnt;
 	/* rcu is used before freeing and work is only used during freeing */
 	union {
 		struct work_struct work;
 		struct rcu_head rcu;
 	};
-	struct mutex freeze_mutex;
 	atomic64_t writecnt;
 	/* 'Ownership' of program-containing map is claimed by the first program
 	 * that is going to use this map or by the first program which FD is