[bpf] bpf: Do not allocate percpu memory at init stage

Message ID	20231110061734.2958678-1-yonghong.song@linux.dev (mailing list archive)
State	Superseded
Delegated to:	BPF

Headers

From: Yonghong Song <yonghong.song@linux.dev>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com,
	Martin KaFai Lau <martin.lau@kernel.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>
Subject: [PATCH bpf] bpf: Do not allocate percpu memory at init stage
Date: Thu,  9 Nov 2023 22:17:34 -0800
Message-Id: <20231110061734.2958678-1-yonghong.song@linux.dev>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Series

[bpf] bpf: Do not allocate percpu memory at init stage

Context	Check	Description
bpf/vmtest-bpf-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2	success	Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-0	success	Logs for Lint
bpf/vmtest-bpf-VM_Test-3	success	Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8	success	Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-4	success	Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7	success	Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-6	success	Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-5	success	Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9	success	Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-14	success	Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-17	success	Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-16	success	Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18	fail	Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-15	success	Logs for set-matrix
bpf/vmtest-bpf-VM_Test-19	success	Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20	success	Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24	success	Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-21	success	Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23	fail	Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22	success	Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27	success	Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-25	success	Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-26	success	Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29	success	Logs for x86_64-llvm-16 / veristat
bpf/vmtest-bpf-VM_Test-28	success	Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-13	success	Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-12	success	Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-11	success	Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-PR	fail	PR summary
bpf/vmtest-bpf-VM_Test-10	success	Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for bpf, async
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 2677 this patch: 2677
netdev/cc_maintainers	warning	7 maintainers not CCed: jolsa@kernel.org martin.lau@linux.dev song@kernel.org haoluo@google.com john.fastabend@gmail.com sdf@google.com kpsingh@kernel.org
netdev/build_clang	success	Errors and warnings before: 1296 this patch: 1296
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 2756 this patch: 2756
netdev/checkpatch	warning	WARNING: 'upto' may be misspelled - perhaps 'up to'? WARNING: Too many leading tabs - consider code refactoring WARNING: line length of 101 exceeds 80 columns WARNING: line length of 105 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Commit Message

Yonghong Song Nov. 10, 2023, 6:17 a.m. UTC

Kirill Shutemov reported significant percpu memory increase after booting
in a 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
non-fix-size percpu mem allocation"). The percpu memory increased
from 111MB to 969MB. The numbers are from /proc/meminfo.

I tried to reproduce the issue with my local VM, which supports at most
255 cpus. With 252 cpus, the percpu memory immediately after boot is
57MB without the above commit and 231MB with it.

This is not good, since percpu memory from the bpf memory allocator is
not widely used yet. Let us change pre-allocation at init stage to
on-demand allocation when the verifier detects that a bpf program needs
percpu memory. With this change, percpu memory consumption after boot
can be reduced significantly.

  [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/

Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h   |  2 +-
 kernel/bpf/core.c     |  8 +++-----
 kernel/bpf/verifier.c | 17 +++++++++++++++--
 3 files changed, 19 insertions(+), 8 deletions(-)
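
The before/after figures quoted in the commit message can be re-checked
on any machine. The following is a minimal userspace sketch, assuming
the number comes from the "Percpu:" field of /proc/meminfo (the message
names only the file, not the field). The helper names
parse_percpu_line() and read_percpu_kb() are made up for illustration
and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one meminfo-style line. Returns the value in kB if this is the
 * "Percpu:" line, or -1 for any other line. */
static long parse_percpu_line(const char *line)
{
	long kb;

	assert(line != NULL);
	if (sscanf(line, "Percpu: %ld kB", &kb) == 1)
		return kb;
	return -1;
}

/* Scan a meminfo-style file for the Percpu line.
 * Returns the value in kB, or -1 if the file or the field is missing. */
static long read_percpu_kb(const char *path)
{
	char line[256];
	long kb = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		kb = parse_percpu_line(line);
		if (kb >= 0)
			break;
	}
	fclose(f);
	return kb;
}
```

Calling read_percpu_kb("/proc/meminfo") right after boot, once on a
kernel with commit 41a5db8d8161 and once without it, should give the
kind of comparison reported above; the absolute values depend on the
cpu count and kernel configuration.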

Comments

Kirill A. Shutemov Nov. 10, 2023, 8:41 a.m. UTC | #1

On Thu, Nov 09, 2023 at 10:17:34PM -0800, Yonghong Song wrote:
> Kirill Shutemov reported significant percpu memory increase after booting

s/memory increase/memory consumption increase/ ?

> in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
> non-fix-size percpu mem allocation"). The percpu memory is increased
> from 111MB to 969MB. The number is from /proc/meminfo.
> 
> I tried to reproduce the issue with my local VM which at most supports
> upto 255 cpus. With 252 cpus, without the above commit, the percpu memory
> immediately after boot is 57MB while with the above commit the percpu
> memory is 231MB.
> 
> This is not good since so far percpu memory from bpf memory allocator
> is not widely used yet. Let us change pre-allocation in init stage
> to on-demand allocation when verifier detects there is a need of
> percpu memory for bpf program. With this change, percpu memory
> consumption after boot can be reduced signicantly.
> 
>   [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
> 
> Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
> Cc: Kirill A. Shutemov <kirill@shutemov.name>

Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Hou Tao Nov. 10, 2023, 9:01 a.m. UTC | #2

On 11/10/2023 2:17 PM, Yonghong Song wrote:
> Kirill Shutemov reported significant percpu memory increase after booting
> in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
> non-fix-size percpu mem allocation"). The percpu memory is increased
> from 111MB to 969MB. The number is from /proc/meminfo.
>
> I tried to reproduce the issue with my local VM which at most supports
> upto 255 cpus. With 252 cpus, without the above commit, the percpu memory
> immediately after boot is 57MB while with the above commit the percpu
> memory is 231MB.
>
> This is not good since so far percpu memory from bpf memory allocator
> is not widely used yet. Let us change pre-allocation in init stage
> to on-demand allocation when verifier detects there is a need of
> percpu memory for bpf program. With this change, percpu memory
> consumption after boot can be reduced signicantly.
>
>   [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
>
> Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h   |  2 +-
>  kernel/bpf/core.c     |  8 +++-----
>  kernel/bpf/verifier.c | 17 +++++++++++++++--
>  3 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b4825d3cdb29..3df67a04d32e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -56,7 +56,7 @@ extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
>  extern struct kobject *btf_kobj;
>  extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> -extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
> +extern bool bpf_global_ma_set;
>  
>  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 08626b519ce2..cd3afe57ece3 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -64,8 +64,8 @@
>  #define OFF	insn->off
>  #define IMM	insn->imm
>  
> -struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> -bool bpf_global_ma_set, bpf_global_percpu_ma_set;
> +struct bpf_mem_alloc bpf_global_ma;
> +bool bpf_global_ma_set;
>  
>  /* No hurry in this branch
>   *
> @@ -2934,9 +2934,7 @@ static int __init bpf_global_ma_init(void)
>  
>  	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>  	bpf_global_ma_set = !ret;
> -	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> -	bpf_global_percpu_ma_set = !ret;
> -	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
> +	return ret;
>  }
>  late_initcall(bpf_global_ma_init);
>  #endif
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bd1c42eb540f..7d485c8b794f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -26,6 +26,7 @@
>  #include <linux/poison.h>
>  #include <linux/module.h>
>  #include <linux/cpumask.h>
> +#include <linux/bpf_mem_alloc.h>
>  #include <net/xdp.h>
>  
>  #include "disasm.h"
> @@ -41,6 +42,9 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>  #undef BPF_LINK_TYPE
>  };
>  
> +struct bpf_mem_alloc bpf_global_percpu_ma;
> +static bool bpf_global_percpu_ma_set;
> +
>  /* bpf_check() is a static code analyzer that walks eBPF program
>   * instruction by instruction and updates register/stack state.
>   * All paths of conditional branches are analyzed until 'bpf_exit' insn.
> @@ -12074,8 +12078,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
>  					return -ENOMEM;
>  
> -				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
> -					return -ENOMEM;
> +				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
> +					mutex_lock(&bpf_verifier_lock);

Instead of acquiring the global lock each time, can we test whether
bpf_global_percpu_ma_set is set before acquiring the global lock?
> +					if (!bpf_global_percpu_ma_set) {
> +						err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> +						if (!err)
> +							bpf_global_percpu_ma_set = true;
> +					}
> +					mutex_unlock(&bpf_verifier_lock);
> +					if (err)
> +						return err;
> +				}
>  
>  				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
>  					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");

Yonghong Song Nov. 10, 2023, 4:36 p.m. UTC | #3

On 11/10/23 1:01 AM, Hou Tao wrote:
>
> On 11/10/2023 2:17 PM, Yonghong Song wrote:
>> Kirill Shutemov reported significant percpu memory increase after booting
>> in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
>> non-fix-size percpu mem allocation"). The percpu memory is increased
>> from 111MB to 969MB. The number is from /proc/meminfo.
>>
>> I tried to reproduce the issue with my local VM which at most supports
>> upto 255 cpus. With 252 cpus, without the above commit, the percpu memory
>> immediately after boot is 57MB while with the above commit the percpu
>> memory is 231MB.
>>
>> This is not good since so far percpu memory from bpf memory allocator
>> is not widely used yet. Let us change pre-allocation in init stage
>> to on-demand allocation when verifier detects there is a need of
>> percpu memory for bpf program. With this change, percpu memory
>> consumption after boot can be reduced signicantly.
>>
>>    [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
>>
>> Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf.h   |  2 +-
>>   kernel/bpf/core.c     |  8 +++-----
>>   kernel/bpf/verifier.c | 17 +++++++++++++++--
>>   3 files changed, 19 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index b4825d3cdb29..3df67a04d32e 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -56,7 +56,7 @@ extern struct idr btf_idr;
>>   extern spinlock_t btf_idr_lock;
>>   extern struct kobject *btf_kobj;
>>   extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
>> -extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>> +extern bool bpf_global_ma_set;
>>   
>>   typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>>   typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 08626b519ce2..cd3afe57ece3 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -64,8 +64,8 @@
>>   #define OFF	insn->off
>>   #define IMM	insn->imm
>>   
>> -struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
>> -bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>> +struct bpf_mem_alloc bpf_global_ma;
>> +bool bpf_global_ma_set;
>>   
>>   /* No hurry in this branch
>>    *
>> @@ -2934,9 +2934,7 @@ static int __init bpf_global_ma_init(void)
>>   
>>   	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>>   	bpf_global_ma_set = !ret;
>> -	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
>> -	bpf_global_percpu_ma_set = !ret;
>> -	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
>> +	return ret;
>>   }
>>   late_initcall(bpf_global_ma_init);
>>   #endif
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index bd1c42eb540f..7d485c8b794f 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -26,6 +26,7 @@
>>   #include <linux/poison.h>
>>   #include <linux/module.h>
>>   #include <linux/cpumask.h>
>> +#include <linux/bpf_mem_alloc.h>
>>   #include <net/xdp.h>
>>   
>>   #include "disasm.h"
>> @@ -41,6 +42,9 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>>   #undef BPF_LINK_TYPE
>>   };
>>   
>> +struct bpf_mem_alloc bpf_global_percpu_ma;
>> +static bool bpf_global_percpu_ma_set;
>> +
>>   /* bpf_check() is a static code analyzer that walks eBPF program
>>    * instruction by instruction and updates register/stack state.
>>    * All paths of conditional branches are analyzed until 'bpf_exit' insn.
>> @@ -12074,8 +12078,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>   				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
>>   					return -ENOMEM;
>>   
>> -				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
>> -					return -ENOMEM;
>> +				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> +					mutex_lock(&bpf_verifier_lock);
> Instead of acquiring the global lock each time, can we test whether
> bpf_global_percpu_ma_set is set before acquiring the global lock?

Currently, the verifier uses bpf_verifier_lock in two places:
(1) to get btf_vmlinux:
         if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
                 mutex_lock(&bpf_verifier_lock);
                 if (!btf_vmlinux)
                         btf_vmlinux = btf_parse_vmlinux();
                 mutex_unlock(&bpf_verifier_lock);
         }
This will only lock once if btf_parse_vmlinux() is successful.

(2) for unprivileged bpf programs in bpf_check().
A big chunk of it is under bpf_verifier_lock.

I didn't use style (1) since I assumed unprivileged bpf programs
are rare and should seldom collide with percpu_obj_new_impl.

But my assumption related to (2) may be wrong, and in the future
bpf_verifier_lock could be used in other places, which may cause
more contention.

So I now agree with you and will make appropriate change. Thanks!
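
The check-before-lock pattern discussed above (style (1)) can be
sketched in userspace C. This is only an illustration under stated
assumptions, not the follow-up patch: pthread and C11 atomics stand in
for the kernel mutex and flag, and expensive_init(), ensure_percpu_ma()
and init_calls are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool percpu_ma_ready;	/* zero-initialized: false */
static int init_calls;			/* how often the expensive init ran */

/* Hypothetical stand-in for
 * bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true). */
static int expensive_init(void)
{
	init_calls++;
	return 0;	/* 0 on success, negative errno-style value on failure */
}

/* Returns 0 once the allocator is usable, or the init error. */
static int ensure_percpu_ma(void)
{
	int err = 0;

	/* Fast path: skip the lock once initialization has been published. */
	if (atomic_load_explicit(&percpu_ma_ready, memory_order_acquire))
		return 0;

	pthread_mutex_lock(&init_lock);
	/* Re-check under the lock: another thread may have won the race. */
	if (!atomic_load_explicit(&percpu_ma_ready, memory_order_relaxed)) {
		err = expensive_init();
		if (!err)
			atomic_store_explicit(&percpu_ma_ready, true,
					      memory_order_release);
	}
	pthread_mutex_unlock(&init_lock);
	return err;
}
```

After the first successful call, later callers return from the fast
path without touching the mutex, which is the contention concern raised
in the review. A failed init leaves the flag unset, so the next caller
retries under the lock.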

>> +					if (!bpf_global_percpu_ma_set) {
>> +						err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
>> +						if (!err)
>> +							bpf_global_percpu_ma_set = true;
>> +					}
>> +					mutex_unlock(&bpf_verifier_lock);
>> +					if (err)
>> +						return err;
>> +				}
>>   
>>   				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
>>   					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b4825d3cdb29..3df67a04d32e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -56,7 +56,7 @@  extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
 extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
-extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
+extern bool bpf_global_ma_set;
 
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 08626b519ce2..cd3afe57ece3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -64,8 +64,8 @@ 
 #define OFF	insn->off
 #define IMM	insn->imm
 
-struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
-bool bpf_global_ma_set, bpf_global_percpu_ma_set;
+struct bpf_mem_alloc bpf_global_ma;
+bool bpf_global_ma_set;
 
 /* No hurry in this branch
  *
@@ -2934,9 +2934,7 @@  static int __init bpf_global_ma_init(void)
 
 	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
 	bpf_global_ma_set = !ret;
-	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
-	bpf_global_percpu_ma_set = !ret;
-	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
+	return ret;
 }
 late_initcall(bpf_global_ma_init);
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bd1c42eb540f..7d485c8b794f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -26,6 +26,7 @@ 
 #include <linux/poison.h>
 #include <linux/module.h>
 #include <linux/cpumask.h>
+#include <linux/bpf_mem_alloc.h>
 #include <net/xdp.h>
 
 #include "disasm.h"
@@ -41,6 +42,9 @@  static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
 #undef BPF_LINK_TYPE
 };
 
+struct bpf_mem_alloc bpf_global_percpu_ma;
+static bool bpf_global_percpu_ma_set;
+
 /* bpf_check() is a static code analyzer that walks eBPF program
  * instruction by instruction and updates register/stack state.
  * All paths of conditional branches are analyzed until 'bpf_exit' insn.
@@ -12074,8 +12078,17 @@  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
 					return -ENOMEM;
 
-				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
-					return -ENOMEM;
+				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+					mutex_lock(&bpf_verifier_lock);
+					if (!bpf_global_percpu_ma_set) {
+						err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
+						if (!err)
+							bpf_global_percpu_ma_set = true;
+					}
+					mutex_unlock(&bpf_verifier_lock);
+					if (err)
+						return err;
+				}
 
 				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
 					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
