Message ID | 20250312083859.1019635-1-vmalik@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] selftests/bpf: Fix string read in strncmp benchmark | expand |
On 3/12/25 09:38, Viktor Malik wrote: > The strncmp benchmark uses the bpf_strncmp helper and a hand-written > loop to compare two strings. The values of the strings are filled from > userspace. One of the strings is non-const (in .bss) while the other is > const (in .rodata) since that is the requirement of bpf_strncmp. > > The problem is that in the hand-written loop, Clang optimizes the reads > from the const string to always return 0 which breaks the benchmark. > > Mark the const string as volatile to avoid that. > > The effect can be seen on the strncmp-no-helper variant. > > Before this change: > > # ./bench strncmp-no-helper > Setting up benchmark 'strncmp-no-helper'... > Benchmark 'strncmp-no-helper' started. > Iter 0 (8440.107us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 1 (73909.374us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 2 (-8140.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 3 (3094.474us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 4 (-2828.468us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 5 (2635.595us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 6 (-306.478us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s > > After this change: > > # ./bench strncmp-no-helper > Setting up benchmark 'strncmp-no-helper'... > Benchmark 'strncmp-no-helper' started. > Iter 0 (21180.011us): hits 5.320M/s ( 5.320M/prod), drops 0.000M/s, total operations 5.320M/s > Iter 1 (-692.499us): hits 5.246M/s ( 5.246M/prod), drops 0.000M/s, total operations 5.246M/s > Iter 2 (-704.751us): hits 5.332M/s ( 5.332M/prod), drops 0.000M/s, total operations 5.332M/s > Iter 3 (62057.929us): hits 5.299M/s ( 5.299M/prod), drops 0.000M/s, total operations 5.299M/s > Iter 4 (-7981.421us): hits 5.303M/s ( 5.303M/prod), drops 0.000M/s, total operations 5.303M/s > Iter 5 (3500.341us): hits 5.306M/s ( 5.306M/prod), drops 0.000M/s, total operations 5.306M/s > Iter 6 (-3851.046us): hits 5.264M/s ( 5.264M/prod), drops 0.000M/s, total operations 5.264M/s > Summary: hits 5.338 ± 0.147M/s ( 5.338M/prod), drops 0.000 ± 0.000M/s, total operations 5.338 ± 0.147M/s > Ah, forgot fixes: Fixes: 9c42652f8be3 ("selftests/bpf: Add benchmark for bpf_strncmp() helper") > Signed-off-by: Viktor Malik <vmalik@redhat.com> > --- > tools/testing/selftests/bpf/progs/strncmp_bench.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c > index 18373a7df76e..92a828a1ebea 100644 > --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c > +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c > @@ -9,7 +9,7 @@ > > /* Will be updated by benchmark before program loading */ > const volatile unsigned int cmp_str_len = 1; > -const char target[STRNCMP_STR_SZ]; > +const volatile char target[STRNCMP_STR_SZ]; > > long hits = 0; > char str[STRNCMP_STR_SZ]; > @@ -17,7 +17,7 @@ char str[STRNCMP_STR_SZ]; > char _license[] SEC("license") = "GPL"; > > static __always_inline int local_strncmp(const char *s1, unsigned int sz, > - const char *s2) > + const volatile char *s2) > { > int ret = 0; > unsigned int i; > @@ -43,7 +43,7 @@ int strncmp_no_helper(void *ctx) > SEC("tp/syscalls/sys_enter_getpgid") > int strncmp_helper(void *ctx) > { > - if (bpf_strncmp(str, cmp_str_len + 1, target) < 0) > + if (bpf_strncmp(str, cmp_str_len + 1, (const char *)target) < 0) > __sync_add_and_fetch(&hits, 1); > return 0; > }
On Wed, Mar 12, 2025 at 1:39 AM Viktor Malik <vmalik@redhat.com> wrote: > > The strncmp benchmark uses the bpf_strncmp helper and a hand-written > loop to compare two strings. The values of the strings are filled from > userspace. One of the strings is non-const (in .bss) while the other is > const (in .rodata) since that is the requirement of bpf_strncmp. > > The problem is that in the hand-written loop, Clang optimizes the reads > from the const string to always return 0 which breaks the benchmark. > > Mark the const string as volatile to avoid that. > > The effect can be seen on the strncmp-no-helper variant. > > Before this change: > > # ./bench strncmp-no-helper > Setting up benchmark 'strncmp-no-helper'... > Benchmark 'strncmp-no-helper' started. > Iter 0 (8440.107us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 1 (73909.374us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 2 (-8140.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 3 (3094.474us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 4 (-2828.468us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 5 (2635.595us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Iter 6 (-306.478us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s > Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s > > After this change: > > # ./bench strncmp-no-helper > Setting up benchmark 'strncmp-no-helper'... > Benchmark 'strncmp-no-helper' started. > Iter 0 (21180.011us): hits 5.320M/s ( 5.320M/prod), drops 0.000M/s, total operations 5.320M/s > Iter 1 (-692.499us): hits 5.246M/s ( 5.246M/prod), drops 0.000M/s, total operations 5.246M/s > Iter 2 (-704.751us): hits 5.332M/s ( 5.332M/prod), drops 0.000M/s, total operations 5.332M/s > Iter 3 (62057.929us): hits 5.299M/s ( 5.299M/prod), drops 0.000M/s, total operations 5.299M/s > Iter 4 (-7981.421us): hits 5.303M/s ( 5.303M/prod), drops 0.000M/s, total operations 5.303M/s > Iter 5 (3500.341us): hits 5.306M/s ( 5.306M/prod), drops 0.000M/s, total operations 5.306M/s > Iter 6 (-3851.046us): hits 5.264M/s ( 5.264M/prod), drops 0.000M/s, total operations 5.264M/s > Summary: hits 5.338 ± 0.147M/s ( 5.338M/prod), drops 0.000 ± 0.000M/s, total operations 5.338 ± 0.147M/s > > Signed-off-by: Viktor Malik <vmalik@redhat.com> > --- > tools/testing/selftests/bpf/progs/strncmp_bench.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c > index 18373a7df76e..92a828a1ebea 100644 > --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c > +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c > @@ -9,7 +9,7 @@ > > /* Will be updated by benchmark before program loading */ > const volatile unsigned int cmp_str_len = 1; > -const char target[STRNCMP_STR_SZ]; > +const volatile char target[STRNCMP_STR_SZ]; > > long hits = 0; > char str[STRNCMP_STR_SZ]; > @@ -17,7 +17,7 @@ char str[STRNCMP_STR_SZ]; > char _license[] SEC("license") = "GPL"; > > static __always_inline int local_strncmp(const char *s1, unsigned int sz, > - const char *s2) > + const volatile char *s2) this will be a bit unfair to local_strncmp(), as now you'll be forcing the compiler to re-read s1[i] twice, right? What if we do: diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c index 18373a7df76e..f47bf88f8d2a 100644 --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c @@ -35,7 +35,10 @@ static __always_inline int local_strncmp(const char *s1, unsigned int sz, SEC("tp/syscalls/sys_enter_getpgid") int strncmp_no_helper(void *ctx) { - if (local_strncmp(str, cmp_str_len + 1, target) < 0) + const char *target_str = target; + + barrier_var(target_str); + if (local_strncmp(str, cmp_str_len + 1, target_str) < 0) __sync_add_and_fetch(&hits, 1); return 0; } that will prevent compiler optimization as well and won't force us to do all those casts? pw-bot: cr > { > int ret = 0; > unsigned int i; > @@ -43,7 +43,7 @@ int strncmp_no_helper(void *ctx) > SEC("tp/syscalls/sys_enter_getpgid") > int strncmp_helper(void *ctx) > { > - if (bpf_strncmp(str, cmp_str_len + 1, target) < 0) > + if (bpf_strncmp(str, cmp_str_len + 1, (const char *)target) < 0) > __sync_add_and_fetch(&hits, 1); > return 0; > } > -- > 2.48.1 >
On 3/13/25 00:45, Andrii Nakryiko wrote: > On Wed, Mar 12, 2025 at 1:39 AM Viktor Malik <vmalik@redhat.com> wrote: >> >> The strncmp benchmark uses the bpf_strncmp helper and a hand-written >> loop to compare two strings. The values of the strings are filled from >> userspace. One of the strings is non-const (in .bss) while the other is >> const (in .rodata) since that is the requirement of bpf_strncmp. >> >> The problem is that in the hand-written loop, Clang optimizes the reads >> from the const string to always return 0 which breaks the benchmark. >> >> Mark the const string as volatile to avoid that. >> >> The effect can be seen on the strncmp-no-helper variant. >> >> Before this change: >> >> # ./bench strncmp-no-helper >> Setting up benchmark 'strncmp-no-helper'... >> Benchmark 'strncmp-no-helper' started. >> Iter 0 (8440.107us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 1 (73909.374us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 2 (-8140.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 3 (3094.474us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 4 (-2828.468us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 5 (2635.595us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Iter 6 (-306.478us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s >> Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s >> >> After this change: >> >> # ./bench strncmp-no-helper >> Setting up benchmark 'strncmp-no-helper'... >> Benchmark 'strncmp-no-helper' started. >> Iter 0 (21180.011us): hits 5.320M/s ( 5.320M/prod), drops 0.000M/s, total operations 5.320M/s >> Iter 1 (-692.499us): hits 5.246M/s ( 5.246M/prod), drops 0.000M/s, total operations 5.246M/s >> Iter 2 (-704.751us): hits 5.332M/s ( 5.332M/prod), drops 0.000M/s, total operations 5.332M/s >> Iter 3 (62057.929us): hits 5.299M/s ( 5.299M/prod), drops 0.000M/s, total operations 5.299M/s >> Iter 4 (-7981.421us): hits 5.303M/s ( 5.303M/prod), drops 0.000M/s, total operations 5.303M/s >> Iter 5 (3500.341us): hits 5.306M/s ( 5.306M/prod), drops 0.000M/s, total operations 5.306M/s >> Iter 6 (-3851.046us): hits 5.264M/s ( 5.264M/prod), drops 0.000M/s, total operations 5.264M/s >> Summary: hits 5.338 ± 0.147M/s ( 5.338M/prod), drops 0.000 ± 0.000M/s, total operations 5.338 ± 0.147M/s >> >> Signed-off-by: Viktor Malik <vmalik@redhat.com> >> --- >> tools/testing/selftests/bpf/progs/strncmp_bench.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c >> index 18373a7df76e..92a828a1ebea 100644 >> --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c >> +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c >> @@ -9,7 +9,7 @@ >> >> /* Will be updated by benchmark before program loading */ >> const volatile unsigned int cmp_str_len = 1; >> -const char target[STRNCMP_STR_SZ]; >> +const volatile char target[STRNCMP_STR_SZ]; >> >> long hits = 0; >> char str[STRNCMP_STR_SZ]; >> @@ -17,7 +17,7 @@ char str[STRNCMP_STR_SZ]; >> char _license[] SEC("license") = "GPL"; >> >> static __always_inline int local_strncmp(const char *s1, unsigned int sz, >> - const char *s2) >> + const volatile char *s2) > > this will be a bit unfair to local_strncmp(), as now you'll be forcing > the compiler to re-read s1[i] twice, right? What if we do: Ah, right, I didn't realize s1[i] was used twice there. > > > diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c > b/tools/testing/selftests/bpf/progs/strncmp_bench.c > index 18373a7df76e..f47bf88f8d2a 100644 > --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c > +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c > @@ -35,7 +35,10 @@ static __always_inline int local_strncmp(const char > *s1, unsigned int sz, > SEC("tp/syscalls/sys_enter_getpgid") > int strncmp_no_helper(void *ctx) > { > - if (local_strncmp(str, cmp_str_len + 1, target) < 0) > + const char *target_str = target; > + > + barrier_var(target_str); > + if (local_strncmp(str, cmp_str_len + 1, target_str) < 0) > __sync_add_and_fetch(&hits, 1); > return 0; > } > > > that will prevent compiler optimization as well and won't force us to > do all those casts? Yeah, that works, I'll send v2. Thanks! > > pw-bot: cr > > >> { >> int ret = 0; >> unsigned int i; >> @@ -43,7 +43,7 @@ int strncmp_no_helper(void *ctx) >> SEC("tp/syscalls/sys_enter_getpgid") >> int strncmp_helper(void *ctx) >> { >> - if (bpf_strncmp(str, cmp_str_len + 1, target) < 0) >> + if (bpf_strncmp(str, cmp_str_len + 1, (const char *)target) < 0) >> __sync_add_and_fetch(&hits, 1); >> return 0; >> } >> -- >> 2.48.1 >> >
diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c index 18373a7df76e..92a828a1ebea 100644 --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c @@ -9,7 +9,7 @@ /* Will be updated by benchmark before program loading */ const volatile unsigned int cmp_str_len = 1; -const char target[STRNCMP_STR_SZ]; +const volatile char target[STRNCMP_STR_SZ]; long hits = 0; char str[STRNCMP_STR_SZ]; @@ -17,7 +17,7 @@ char str[STRNCMP_STR_SZ]; char _license[] SEC("license") = "GPL"; static __always_inline int local_strncmp(const char *s1, unsigned int sz, - const char *s2) + const volatile char *s2) { int ret = 0; unsigned int i; @@ -43,7 +43,7 @@ int strncmp_no_helper(void *ctx) SEC("tp/syscalls/sys_enter_getpgid") int strncmp_helper(void *ctx) { - if (bpf_strncmp(str, cmp_str_len + 1, target) < 0) + if (bpf_strncmp(str, cmp_str_len + 1, (const char *)target) < 0) __sync_add_and_fetch(&hits, 1); return 0; }
The strncmp benchmark uses the bpf_strncmp helper and a hand-written loop to compare two strings. The values of the strings are filled from userspace. One of the strings is non-const (in .bss) while the other is const (in .rodata) since that is the requirement of bpf_strncmp. The problem is that in the hand-written loop, Clang optimizes the reads from the const string to always return 0 which breaks the benchmark. Mark the const string as volatile to avoid that. The effect can be seen on the strncmp-no-helper variant. Before this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 (8440.107us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 1 (73909.374us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 2 (-8140.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 3 (3094.474us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 4 (-2828.468us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 5 (2635.595us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 6 (-306.478us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s After this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 (21180.011us): hits 5.320M/s ( 5.320M/prod), drops 0.000M/s, total operations 5.320M/s Iter 1 (-692.499us): hits 5.246M/s ( 5.246M/prod), drops 0.000M/s, total operations 5.246M/s Iter 2 (-704.751us): hits 5.332M/s ( 5.332M/prod), drops 0.000M/s, total operations 5.332M/s Iter 3 (62057.929us): hits 5.299M/s ( 5.299M/prod), drops 0.000M/s, total operations 5.299M/s Iter 4 (-7981.421us): hits 5.303M/s ( 5.303M/prod), drops 0.000M/s, total operations 5.303M/s Iter 5 (3500.341us): hits 5.306M/s ( 5.306M/prod), drops 0.000M/s, total operations 5.306M/s Iter 6 (-3851.046us): hits 5.264M/s ( 5.264M/prod), drops 0.000M/s, total operations 5.264M/s Summary: hits 5.338 ± 0.147M/s ( 5.338M/prod), drops 0.000 ± 0.000M/s, total operations 5.338 ± 0.147M/s Signed-off-by: Viktor Malik <vmalik@redhat.com> --- tools/testing/selftests/bpf/progs/strncmp_bench.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)