
[bpf-next,v1,2/2] bpftool: profile online CPUs instead of possible

Message ID 20230117044902.98938-2-tong@infragraf.org (mailing list archive)
State Changes Requested
Delegated to: BPF
Series [bpf-next,v1,1/2] libbpf: introduce new API libbpf_num_online_cpus

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 13 of 13 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 23 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix

Commit Message

Tonghao Zhang Jan. 17, 2023, 4:49 a.m. UTC
From: Tonghao Zhang <tong@infragraf.org>

The number of online CPUs may not be equal to the number of
possible CPUs. bpftool prog profile cannot create PMU events
on CPUs that are possible but not online.

$ dmidecode -s system-product-name
PowerEdge R620
$ cat /sys/devices/system/cpu/online
0-31
$ cat /sys/devices/system/cpu/possible
0-47

To fix this issue, use online CPUs instead of possible CPUs
when creating perf events and other resources.

Signed-off-by: Tonghao Zhang <tong@infragraf.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
---
 tools/bpf/bpftool/prog.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
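
For reference, the failure can be reproduced outside bpftool: perf_event_open() on a possible-but-offline CPU fails with ENODEV (the errno the discussion below relies on). A minimal standalone demo, not part of the patch:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(int argc, char **argv)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.size = sizeof(attr),
		.config = PERF_COUNT_HW_CPU_CYCLES,
	};
	int cpu = argc > 1 ? atoi(argv[1]) : 0;
	/* pid = -1 with cpu >= 0 profiles the whole CPU and needs
	 * root (or CAP_PERFMON), like bpftool prog profile itself.
	 */
	int fd = syscall(__NR_perf_event_open, &attr, -1 /*pid*/, cpu,
			 -1 /*group_fd*/, 0);

	if (fd < 0)
		printf("cpu %d: %s\n", cpu, strerror(errno));
	else
		printf("cpu %d: event created\n", cpu);
	return 0;
}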

Comments

Quentin Monnet Jan. 18, 2023, 10:41 a.m. UTC | #1
2023-01-17 12:49 UTC+0800 ~ tong@infragraf.org
> From: Tonghao Zhang <tong@infragraf.org>
> 
> The number of online cpu may be not equal to possible cpu.
> bpftool prog profile, can not create pmu event on possible
> but not online cpu.

s/not/on/ ?

> 
> $ dmidecode -s system-product-name
> PowerEdge R620
> $ cat /sys/devices/system/cpu/online
> 0-31
> $ cat /sys/devices/system/cpu/possible
> 0-47
> 
> To fix this issue, use online cpu instead of possible, to
> create perf event and other resource.
> 
> Signed-off-by: Tonghao Zhang <tong@infragraf.org>
> Cc: Quentin Monnet <quentin@isovalent.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Song Liu <song@kernel.org>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/bpf/bpftool/prog.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index cfc9fdc1e863..08b352dd799e 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -2056,6 +2056,7 @@ static int profile_parse_metrics(int argc, char **argv)
>  
>  static void profile_read_values(struct profiler_bpf *obj)
>  {
> +	__u32 possible_cpus = libbpf_num_possible_cpus();
>  	__u32 m, cpu, num_cpu = obj->rodata->num_cpu;
>  	int reading_map_fd, count_map_fd;
>  	__u64 counts[num_cpu];
> @@ -2080,7 +2081,7 @@ static void profile_read_values(struct profiler_bpf *obj)
>  		profile_total_count += counts[cpu];
>  
>  	for (m = 0; m < ARRAY_SIZE(metrics); m++) {
> -		struct bpf_perf_event_value values[num_cpu];
> +		struct bpf_perf_event_value values[possible_cpus];
>  
>  		if (!metrics[m].selected)
>  			continue;
> @@ -2321,7 +2322,7 @@ static int do_profile(int argc, char **argv)
>  	if (num_metric <= 0)
>  		goto out;
>  
> -	num_cpu = libbpf_num_possible_cpus();
> +	num_cpu = libbpf_num_online_cpus();
>  	if (num_cpu <= 0) {
>  		p_err("failed to identify number of CPUs");
>  		goto out;

Thanks, but it doesn't seem to be enough to solve the issue. How did you
test it? With your series applied locally, I'm trying the following
(Intel x86_64, CPUs: 0..7):

	# echo 0 > /sys/devices/system/cpu/cpu2/online
	# ./bpftool prog profile id 1525 duration 1 cycles instructions
	Error: failed to create event cycles on cpu 2

It seems that we're still trying to open the perf events on the offline
CPU in profile_open_perf_events(), because even though we try to use
fewer of the possible CPUs we're still referencing them in order by
their index. So it works if I only disabled the last CPUs instead (#7,
then #6, ...), but to work with _any_ CPU disabled, we would need to
retrieve the list of online CPUs.
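
To make that last point concrete, a minimal sketch of retrieving the online list: parse /sys/devices/system/cpu/online (a comma-separated list of ranges, e.g. "0-1,3-7" with cpu2 offline) into per-CPU flags that profile_open_perf_events() could consult. get_online_cpu_mask() is a hypothetical helper, not existing bpftool or libbpf code:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: mark each online CPU in 'online' (an array
 * sized for all possible CPUs) by parsing the sysfs range list.
 * Offline CPUs stay false, so callers can skip them.
 */
static int get_online_cpu_mask(bool *online, int num_possible)
{
	FILE *f = fopen("/sys/devices/system/cpu/online", "r");
	int c, lo, hi, cpu;

	if (!f)
		return -1;
	while (fscanf(f, "%d", &lo) == 1) {
		hi = lo;
		c = fgetc(f);
		if (c == '-') {
			if (fscanf(f, "%d", &hi) != 1)
				break;
			c = fgetc(f);	/* consume ',' or '\n' */
		}
		for (cpu = lo; cpu <= hi && cpu < num_possible; cpu++)
			online[cpu] = true;
		if (c != ',')
			break;
	}
	fclose(f);
	return 0;
}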
Tonghao Zhang Jan. 19, 2023, 8:22 a.m. UTC | #2
> On Jan 18, 2023, at 6:41 PM, Quentin Monnet <quentin@isovalent.com> wrote:
> 
> 2023-01-17 12:49 UTC+0800 ~ tong@infragraf.org
>> From: Tonghao Zhang <tong@infragraf.org>
>> 
>> The number of online cpu may be not equal to possible cpu.
>> bpftool prog profile, can not create pmu event on possible
>> but not online cpu.
> 
> s/not/on/ ?
> 
>> 
>> $ dmidecode -s system-product-name
>> PowerEdge R620
>> $ cat /sys/devices/system/cpu/online
>> 0-31
>> $ cat /sys/devices/system/cpu/possible
>> 0-47
>> 
>> To fix this issue, use online cpu instead of possible, to
>> create perf event and other resource.
>> 
>> Signed-off-by: Tonghao Zhang <tong@infragraf.org>
>> Cc: Quentin Monnet <quentin@isovalent.com>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Andrii Nakryiko <andrii@kernel.org>
>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> Cc: Song Liu <song@kernel.org>
>> Cc: Yonghong Song <yhs@fb.com>
>> Cc: John Fastabend <john.fastabend@gmail.com>
>> Cc: KP Singh <kpsingh@kernel.org>
>> Cc: Stanislav Fomichev <sdf@google.com>
>> Cc: Hao Luo <haoluo@google.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> ---
>> tools/bpf/bpftool/prog.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> index cfc9fdc1e863..08b352dd799e 100644
>> --- a/tools/bpf/bpftool/prog.c
>> +++ b/tools/bpf/bpftool/prog.c
>> @@ -2056,6 +2056,7 @@ static int profile_parse_metrics(int argc, char **argv)
>> 
>> static void profile_read_values(struct profiler_bpf *obj)
>> {
>> +	__u32 possible_cpus = libbpf_num_possible_cpus();
>> 	__u32 m, cpu, num_cpu = obj->rodata->num_cpu;
>> 	int reading_map_fd, count_map_fd;
>> 	__u64 counts[num_cpu];
>> @@ -2080,7 +2081,7 @@ static void profile_read_values(struct profiler_bpf *obj)
>> 		profile_total_count += counts[cpu];
>> 
>> 	for (m = 0; m < ARRAY_SIZE(metrics); m++) {
>> -		struct bpf_perf_event_value values[num_cpu];
>> +		struct bpf_perf_event_value values[possible_cpus];
>> 
>> 		if (!metrics[m].selected)
>> 			continue;
>> @@ -2321,7 +2322,7 @@ static int do_profile(int argc, char **argv)
>> 	if (num_metric <= 0)
>> 		goto out;
>> 
>> -	num_cpu = libbpf_num_possible_cpus();
>> +	num_cpu = libbpf_num_online_cpus();
>> 	if (num_cpu <= 0) {
>> 		p_err("failed to identify number of CPUs");
>> 		goto out;
> 
> Thanks, but it doesn't seem to be enough to solve the issue. How did you
> test it? With your series applied locally, I'm trying the following
> (Intel x86_64, CPUs: 0..7):
> 
> 	# echo 0 > /sys/devices/system/cpu/cpu2/online
> 	# ./bpftool prog profile id 1525 duration 1 cycles instructions
> 	Error: failed to create event cycles on cpu 2
> 
> It seems that we're still trying to open the perf events on the offline
> CPU in profile_open_perf_events(), because even though we try to use
> fewer of the possible CPUs we're still referencing them in order by
Hi
Thanks for your review and comments.
I didn't test the case where the offline CPU is not the last CPU.
> their index. So it works if I only disabled the last CPUs instead (#7,
> then #6, ...), but to work with _any_ CPU disabled, we would need to
> retrieve the list of online CPUs.
> 
Yes. Alternatively, to fix it, we can use the errno returned by perf_event_open: if errno is ENODEV, we can skip profiling on that CPU.
The patch below fixes this issue.

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 620032042576..deec7196b48c 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -55,6 +55,24 @@ void p_err(const char *fmt, ...)
 	va_end(ap);
 }

+void p_warn(const char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	if (json_output) {
+		jsonw_start_object(json_wtr);
+		jsonw_name(json_wtr, "warning");
+		jsonw_vprintf_enquote(json_wtr, fmt, ap);
+		jsonw_end_object(json_wtr);
+	} else {
+		fprintf(stderr, "Warn: ");
+		vfprintf(stderr, fmt, ap);
+		fprintf(stderr, "\n");
+	}
+	va_end(ap);
+}
+
 void p_info(const char *fmt, ...)
 {
 	va_list ap;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index a84224b6a604..e62edec9e13a 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -86,6 +86,7 @@ extern struct btf *base_btf;
 extern struct hashmap *refs_table;

 void __printf(1, 2) p_err(const char *fmt, ...);
+void __printf(1, 2) p_warn(const char *fmt, ...);
 void __printf(1, 2) p_info(const char *fmt, ...);

 bool is_prefix(const char *pfx, const char *str);
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index cfc9fdc1e863..d9363ba01ec0 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -2233,10 +2233,36 @@ static void profile_close_perf_events(struct profiler_bpf *obj)
 	profile_perf_event_cnt = 0;
 }

+static int profile_open_perf_event(int mid, int cpu, int map_fd)
+{
+	int pmu_fd;
+
+	pmu_fd = syscall(__NR_perf_event_open, &metrics[mid].attr,
+			 -1/*pid*/, cpu, -1/*group_fd*/, 0);
+	if (pmu_fd < 0) {
+		if (errno == ENODEV) {
+			p_warn("cpu %d may be offline, skip %s metric profiling.",
+				cpu, metrics[mid].name);
+			profile_perf_event_cnt++;
+			return 0;
+		}
+		return -1;
+	}
+
+	if (bpf_map_update_elem(map_fd,
+				&profile_perf_event_cnt,
+				&pmu_fd, BPF_ANY) ||
+	    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0))
+		return -1;
+
+	profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
+	return 0;
+}
+
 static int profile_open_perf_events(struct profiler_bpf *obj)
 {
 	unsigned int cpu, m;
-	int map_fd, pmu_fd;
+	int map_fd;

 	profile_perf_events = calloc(
 		sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
@@ -2255,17 +2281,11 @@ static int profile_open_perf_events(struct profiler_bpf *obj)
 		if (!metrics[m].selected)
 			continue;
 		for (cpu = 0; cpu < obj->rodata->num_cpu; cpu++) {
-			pmu_fd = syscall(__NR_perf_event_open, &metrics[m].attr,
-					 -1/*pid*/, cpu, -1/*group_fd*/, 0);
-			if (pmu_fd < 0 ||
-			    bpf_map_update_elem(map_fd, &profile_perf_event_cnt,
-						&pmu_fd, BPF_ANY) ||
-			    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
+			if (profile_open_perf_event(m, cpu, map_fd)) {
 				p_err("failed to create event %s on cpu %d",
 				      metrics[m].name, cpu);
 				return -1;
 			}
-			profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
 		}
 	}
 	return 0;


----
Best Regards, Tonghao <tong@infragraf.org>
Quentin Monnet Jan. 20, 2023, 3:35 p.m. UTC | #3
2023-01-19 16:22 UTC+0800 ~ Tonghao Zhang <tong@infragraf.org>
>> On Jan 18, 2023, at 6:41 PM, Quentin Monnet <quentin@isovalent.com> wrote:
>>
>> 2023-01-17 12:49 UTC+0800 ~ tong@infragraf.org
>>> From: Tonghao Zhang <tong@infragraf.org>
>>>
>>> The number of online cpu may be not equal to possible cpu.
>>> bpftool prog profile, can not create pmu event on possible
>>> but not online cpu.
>>
>> s/not/on/ ?
>>
>>>
>>> $ dmidecode -s system-product-name
>>> PowerEdge R620
>>> $ cat /sys/devices/system/cpu/online
>>> 0-31
>>> $ cat /sys/devices/system/cpu/possible
>>> 0-47
>>>
>>> To fix this issue, use online cpu instead of possible, to
>>> create perf event and other resource.
>>>
>>> Signed-off-by: Tonghao Zhang <tong@infragraf.org>
>>> Cc: Quentin Monnet <quentin@isovalent.com>
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> Cc: Andrii Nakryiko <andrii@kernel.org>
>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>> Cc: Song Liu <song@kernel.org>
>>> Cc: Yonghong Song <yhs@fb.com>
>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>> Cc: KP Singh <kpsingh@kernel.org>
>>> Cc: Stanislav Fomichev <sdf@google.com>
>>> Cc: Hao Luo <haoluo@google.com>
>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>> ---
>>> tools/bpf/bpftool/prog.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>>> index cfc9fdc1e863..08b352dd799e 100644
>>> --- a/tools/bpf/bpftool/prog.c
>>> +++ b/tools/bpf/bpftool/prog.c
>>> @@ -2056,6 +2056,7 @@ static int profile_parse_metrics(int argc, char **argv)
>>>
>>> static void profile_read_values(struct profiler_bpf *obj)
>>> {
>>> +	__u32 possible_cpus = libbpf_num_possible_cpus();
>>> 	__u32 m, cpu, num_cpu = obj->rodata->num_cpu;
>>> 	int reading_map_fd, count_map_fd;
>>> 	__u64 counts[num_cpu];
>>> @@ -2080,7 +2081,7 @@ static void profile_read_values(struct profiler_bpf *obj)
>>> 		profile_total_count += counts[cpu];
>>>
>>> 	for (m = 0; m < ARRAY_SIZE(metrics); m++) {
>>> -		struct bpf_perf_event_value values[num_cpu];
>>> +		struct bpf_perf_event_value values[possible_cpus];
>>>
>>> 		if (!metrics[m].selected)
>>> 			continue;
>>> @@ -2321,7 +2322,7 @@ static int do_profile(int argc, char **argv)
>>> 	if (num_metric <= 0)
>>> 		goto out;
>>>
>>> -	num_cpu = libbpf_num_possible_cpus();
>>> +	num_cpu = libbpf_num_online_cpus();
>>> 	if (num_cpu <= 0) {
>>> 		p_err("failed to identify number of CPUs");
>>> 		goto out;
>>
>> Thanks, but it doesn't seem to be enough to solve the issue. How did you
>> test it? With your series applied locally, I'm trying the following
>> (Intel x86_64, CPUs: 0..7):
>>
>> 	# echo 0 > /sys/devices/system/cpu/cpu2/online
>> 	# ./bpftool prog profile id 1525 duration 1 cycles instructions
>> 	Error: failed to create event cycles on cpu 2
>>
>> It seems that we're still trying to open the perf events on the offline
>> CPU in profile_open_perf_events(), because even though we try to use
>> fewer of the possible CPUs we're still referencing them in order by
> Hi
> Thanks for your review and comments.
> I didn't test the case where the offline CPU is not the last CPU.
>> their index. So it works if I only disabled the last CPUs instead (#7,
>> then #6, ...), but to work with _any_ CPU disabled, we would need to
>> retrieve the list of online CPUs.
>>
> Yes. Alternatively, to fix it, we can use the errno returned by perf_event_open: if errno is ENODEV, we can skip profiling on that CPU.
> The patch below fixes this issue.
> 
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index 620032042576..deec7196b48c 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -55,6 +55,24 @@ void p_err(const char *fmt, ...)
>  	va_end(ap);
>  }
> 
> +void p_warn(const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	va_start(ap, fmt);
> +	if (json_output) {
> +		jsonw_start_object(json_wtr);
> +		jsonw_name(json_wtr, "warning");
> +		jsonw_vprintf_enquote(json_wtr, fmt, ap);
> +		jsonw_end_object(json_wtr);
> +	} else {
> +		fprintf(stderr, "Warn: ");
> +		vfprintf(stderr, fmt, ap);
> +		fprintf(stderr, "\n");
> +	}
> +	va_end(ap);
> +}
> +
>  void p_info(const char *fmt, ...)
>  {
>  	va_list ap;
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index a84224b6a604..e62edec9e13a 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -86,6 +86,7 @@ extern struct btf *base_btf;
>  extern struct hashmap *refs_table;
> 
>  void __printf(1, 2) p_err(const char *fmt, ...);
> +void __printf(1, 2) p_warn(const char *fmt, ...);
>  void __printf(1, 2) p_info(const char *fmt, ...);
> 
>  bool is_prefix(const char *pfx, const char *str);
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index cfc9fdc1e863..d9363ba01ec0 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -2233,10 +2233,36 @@ static void profile_close_perf_events(struct profiler_bpf *obj)
>  	profile_perf_event_cnt = 0;
>  }
> 
> +static int profile_open_perf_event(int mid, int cpu, int map_fd)
> +{
> +	int pmu_fd;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &metrics[mid].attr,
> +			 -1/*pid*/, cpu, -1/*group_fd*/, 0);
> +	if (pmu_fd < 0) {
> +		if (errno == ENODEV) {
> +			p_warn("cpu %d may be offline, skip %s metric profiling.",
> +				cpu, metrics[mid].name);

Nit: I think it's fine to keep this at the info level (p_info()).

> +			profile_perf_event_cnt++;
> +			return 0;
> +		}
> +		return -1;
> +	}
> +
> +	if (bpf_map_update_elem(map_fd,
> +				&profile_perf_event_cnt,
> +				&pmu_fd, BPF_ANY) ||
> +	    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0))
> +		return -1;
> +
> +	profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
> +	return 0;
> +}
> +
>  static int profile_open_perf_events(struct profiler_bpf *obj)
>  {
>  	unsigned int cpu, m;
> -	int map_fd, pmu_fd;
> +	int map_fd;
> 
>  	profile_perf_events = calloc(
>  		sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
> @@ -2255,17 +2281,11 @@ static int profile_open_perf_events(struct profiler_bpf *obj)
>  		if (!metrics[m].selected)
>  			continue;
>  		for (cpu = 0; cpu < obj->rodata->num_cpu; cpu++) {
> -			pmu_fd = syscall(__NR_perf_event_open, &metrics[m].attr,
> -					 -1/*pid*/, cpu, -1/*group_fd*/, 0);
> -			if (pmu_fd < 0 ||
> -			    bpf_map_update_elem(map_fd, &profile_perf_event_cnt,
> -						&pmu_fd, BPF_ANY) ||
> -			    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
> +			if (profile_open_perf_event(m, cpu, map_fd)) {
>  				p_err("failed to create event %s on cpu %d",
>  				      metrics[m].name, cpu);
>  				return -1;
>  			}
> -			profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
>  		}
>  	}
>  	return 0;
> 
> 
> ----
> Best Regards, Tonghao <tong@infragraf.org>
> 

I haven't tested this patch, but yes, looks like it should address the
issue. Could you submit a proper v2, please?

Quentin
Tonghao Zhang Feb. 1, 2023, 3:12 a.m. UTC | #4
> On Jan 20, 2023, at 11:35 PM, Quentin Monnet <quentin@isovalent.com> wrote:
> 
> 2023-01-19 16:22 UTC+0800 ~ Tonghao Zhang <tong@infragraf.org>
>>> On Jan 18, 2023, at 6:41 PM, Quentin Monnet <quentin@isovalent.com> wrote:
>>> 
>>> 2023-01-17 12:49 UTC+0800 ~ tong@infragraf.org
>>>> From: Tonghao Zhang <tong@infragraf.org>
>>>> 
>>>> The number of online cpu may be not equal to possible cpu.
>>>> bpftool prog profile, can not create pmu event on possible
>>>> but not online cpu.
>>> 
>>> s/not/on/ ?
>>> 
>>>> 
>>>> $ dmidecode -s system-product-name
>>>> PowerEdge R620
>>>> $ cat /sys/devices/system/cpu/online
>>>> 0-31
>>>> $ cat /sys/devices/system/cpu/possible
>>>> 0-47
>>>> 
>>>> To fix this issue, use online cpu instead of possible, to
>>>> create perf event and other resource.
>>>> 
>>>> Signed-off-by: Tonghao Zhang <tong@infragraf.org>
>>>> Cc: Quentin Monnet <quentin@isovalent.com>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: Andrii Nakryiko <andrii@kernel.org>
>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>> Cc: Song Liu <song@kernel.org>
>>>> Cc: Yonghong Song <yhs@fb.com>
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: KP Singh <kpsingh@kernel.org>
>>>> Cc: Stanislav Fomichev <sdf@google.com>
>>>> Cc: Hao Luo <haoluo@google.com>
>>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>>> ---
>>>> tools/bpf/bpftool/prog.c | 5 +++--
>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>>>> index cfc9fdc1e863..08b352dd799e 100644
>>>> --- a/tools/bpf/bpftool/prog.c
>>>> +++ b/tools/bpf/bpftool/prog.c
>>>> @@ -2056,6 +2056,7 @@ static int profile_parse_metrics(int argc, char **argv)
>>>> 
>>>> static void profile_read_values(struct profiler_bpf *obj)
>>>> {
>>>> +	__u32 possible_cpus = libbpf_num_possible_cpus();
>>>> 	__u32 m, cpu, num_cpu = obj->rodata->num_cpu;
>>>> 	int reading_map_fd, count_map_fd;
>>>> 	__u64 counts[num_cpu];
>>>> @@ -2080,7 +2081,7 @@ static void profile_read_values(struct profiler_bpf *obj)
>>>> 		profile_total_count += counts[cpu];
>>>> 
>>>> 	for (m = 0; m < ARRAY_SIZE(metrics); m++) {
>>>> -		struct bpf_perf_event_value values[num_cpu];
>>>> +		struct bpf_perf_event_value values[possible_cpus];
>>>> 
>>>> 		if (!metrics[m].selected)
>>>> 			continue;
>>>> @@ -2321,7 +2322,7 @@ static int do_profile(int argc, char **argv)
>>>> 	if (num_metric <= 0)
>>>> 		goto out;
>>>> 
>>>> -	num_cpu = libbpf_num_possible_cpus();
>>>> +	num_cpu = libbpf_num_online_cpus();
>>>> 	if (num_cpu <= 0) {
>>>> 		p_err("failed to identify number of CPUs");
>>>> 		goto out;
>>> 
>>> Thanks, but it doesn't seem to be enough to solve the issue. How did you
>>> test it? With your series applied locally, I'm trying the following
>>> (Intel x86_64, CPUs: 0..7):
>>> 
>>> 	# echo 0 > /sys/devices/system/cpu/cpu2/online
>>> 	# ./bpftool prog profile id 1525 duration 1 cycles instructions
>>> 	Error: failed to create event cycles on cpu 2
>>> 
>>> It seems that we're still trying to open the perf events on the offline
>>> CPU in profile_open_perf_events(), because even though we try to use
>>> fewer of the possible CPUs we're still referencing them in order by
>> Hi
>> Thanks for your review and comments.
>> I didn't test the case where the offline CPU is not the last CPU.
>>> their index. So it works if I only disabled the last CPUs instead (#7,
>>> then #6, ...), but to work with _any_ CPU disabled, we would need to
>>> retrieve the list of online CPUs.
>>> 
>> Yes. Alternatively, to fix it, we can use the errno returned by perf_event_open: if errno is ENODEV, we can skip profiling on that CPU.
>> The patch below fixes this issue.
>> 
>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>> index 620032042576..deec7196b48c 100644
>> --- a/tools/bpf/bpftool/common.c
>> +++ b/tools/bpf/bpftool/common.c
>> @@ -55,6 +55,24 @@ void p_err(const char *fmt, ...)
>> 	va_end(ap);
>> }
>> 
>> +void p_warn(const char *fmt, ...)
>> +{
>> +	va_list ap;
>> +
>> +	va_start(ap, fmt);
>> +	if (json_output) {
>> +		jsonw_start_object(json_wtr);
>> +		jsonw_name(json_wtr, "warning");
>> +		jsonw_vprintf_enquote(json_wtr, fmt, ap);
>> +		jsonw_end_object(json_wtr);
>> +	} else {
>> +		fprintf(stderr, "Warn: ");
>> +		vfprintf(stderr, fmt, ap);
>> +		fprintf(stderr, "\n");
>> +	}
>> +	va_end(ap);
>> +}
>> +
>> void p_info(const char *fmt, ...)
>> {
>> 	va_list ap;
>> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
>> index a84224b6a604..e62edec9e13a 100644
>> --- a/tools/bpf/bpftool/main.h
>> +++ b/tools/bpf/bpftool/main.h
>> @@ -86,6 +86,7 @@ extern struct btf *base_btf;
>> extern struct hashmap *refs_table;
>> 
>> void __printf(1, 2) p_err(const char *fmt, ...);
>> +void __printf(1, 2) p_warn(const char *fmt, ...);
>> void __printf(1, 2) p_info(const char *fmt, ...);
>> 
>> bool is_prefix(const char *pfx, const char *str);
>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> index cfc9fdc1e863..d9363ba01ec0 100644
>> --- a/tools/bpf/bpftool/prog.c
>> +++ b/tools/bpf/bpftool/prog.c
>> @@ -2233,10 +2233,36 @@ static void profile_close_perf_events(struct profiler_bpf *obj)
>> 	profile_perf_event_cnt = 0;
>> }
>> 
>> +static int profile_open_perf_event(int mid, int cpu, int map_fd)
>> +{
>> +	int pmu_fd;
>> +
>> +	pmu_fd = syscall(__NR_perf_event_open, &metrics[mid].attr,
>> +			 -1/*pid*/, cpu, -1/*group_fd*/, 0);
>> +	if (pmu_fd < 0) {
>> +		if (errno == ENODEV) {
>> +			p_warn("cpu %d may be offline, skip %s metric profiling.",
>> +				cpu, metrics[mid].name);
> 
> Nit: I think it's fine to keep this at the info level (p_info()).
Ok
>> +			profile_perf_event_cnt++;
>> +			return 0;
>> +		}
>> +		return -1;
>> +	}
>> +
>> +	if (bpf_map_update_elem(map_fd,
>> +				&profile_perf_event_cnt,
>> +				&pmu_fd, BPF_ANY) ||
>> +	    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0))
>> +		return -1;
>> +
>> +	profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
>> +	return 0;
>> +}
>> +
>> static int profile_open_perf_events(struct profiler_bpf *obj)
>> {
>> 	unsigned int cpu, m;
>> -	int map_fd, pmu_fd;
>> +	int map_fd;
>> 
>> 	profile_perf_events = calloc(
>> 		sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
>> @@ -2255,17 +2281,11 @@ static int profile_open_perf_events(struct profiler_bpf *obj)
>> 		if (!metrics[m].selected)
>> 			continue;
>> 		for (cpu = 0; cpu < obj->rodata->num_cpu; cpu++) {
>> -			pmu_fd = syscall(__NR_perf_event_open, &metrics[m].attr,
>> -					 -1/*pid*/, cpu, -1/*group_fd*/, 0);
>> -			if (pmu_fd < 0 ||
>> -			    bpf_map_update_elem(map_fd, &profile_perf_event_cnt,
>> -						&pmu_fd, BPF_ANY) ||
>> -			    ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
>> +			if (profile_open_perf_event(m, cpu, map_fd)) {
>> 				p_err("failed to create event %s on cpu %d",
>> 				      metrics[m].name, cpu);
>> 				return -1;
>> 			}
>> -			profile_perf_events[profile_perf_event_cnt++] = pmu_fd;
>> 		}
>> 	}
>> 	return 0;
>> 
>> 
>> ----
>> Best Regards, Tonghao <tong@infragraf.org>
>> 
> 
> I haven't tested this patch, but yes, looks like it should address the
> issue. Could you submit a proper v2, please?
Yes, I will send v2 soon. I'm on vacation, so sorry for the late reply.
> Quentin

----
Best Regards, Tonghao <tong@infragraf.org>

Patch

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index cfc9fdc1e863..08b352dd799e 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -2056,6 +2056,7 @@ static int profile_parse_metrics(int argc, char **argv)
 
 static void profile_read_values(struct profiler_bpf *obj)
 {
+	__u32 possible_cpus = libbpf_num_possible_cpus();
 	__u32 m, cpu, num_cpu = obj->rodata->num_cpu;
 	int reading_map_fd, count_map_fd;
 	__u64 counts[num_cpu];
@@ -2080,7 +2081,7 @@ static void profile_read_values(struct profiler_bpf *obj)
 		profile_total_count += counts[cpu];
 
 	for (m = 0; m < ARRAY_SIZE(metrics); m++) {
-		struct bpf_perf_event_value values[num_cpu];
+		struct bpf_perf_event_value values[possible_cpus];
 
 		if (!metrics[m].selected)
 			continue;
@@ -2321,7 +2322,7 @@ static int do_profile(int argc, char **argv)
 	if (num_metric <= 0)
 		goto out;
 
-	num_cpu = libbpf_num_possible_cpus();
+	num_cpu = libbpf_num_online_cpus();
 	if (num_cpu <= 0) {
 		p_err("failed to identify number of CPUs");
 		goto out;
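
A note on the first hunk: user-space lookups on per-CPU BPF maps copy out one value slot per *possible* CPU, regardless of how many are online, which is why values[] is sized with libbpf_num_possible_cpus() even when only online CPUs are profiled (assuming the readings map is per-CPU, as that sizing suggests). A self-contained sketch of the pattern; sum_percpu_counters() is an illustrative helper, not bpftool code:

#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <linux/bpf.h>

/* Read one element of a per-CPU map and sum it across CPUs. The
 * kernel fills one slot per possible CPU, so the buffer must use
 * the possible count, not the online count.
 */
static int sum_percpu_counters(int map_fd, __u32 key, __u64 *sum)
{
	int cpu, ncpus = libbpf_num_possible_cpus();

	if (ncpus <= 0)
		return -1;

	struct bpf_perf_event_value values[ncpus];

	*sum = 0;
	if (bpf_map_lookup_elem(map_fd, &key, values))
		return -1;
	for (cpu = 0; cpu < ncpus; cpu++)
		*sum += values[cpu].counter;
	return 0;
}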