Message ID | 20240918203925.150231-1-mykyta.yatsenko5@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,v2] selftests/bpf: emit top frequent code lines in veristat | expand |
On Wed, 2024-09-18 at 21:39 +0100, Mykyta Yatsenko wrote: > From: Mykyta Yatsenko <yatsenko@meta.com> > > Production BPF programs are increasing in number of instructions and states > to the point, where optimising verification process for them is necessary > to avoid running into instruction limit. Authors of those BPF programs > need to analyze verifier output, for example, collecting the most > frequent source code lines to understand which part of the program has > the biggest verification cost. > > This patch introduces `--top-src-lines` flag in veristat. > `--top-src-lines=N` makes veristat output N the most popular sorce code > lines, parsed from verification log. > > An example: > ``` > $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 pyperf600.bpf.o > Processing 'pyperf600.bpf.o'... > Top source lines (on_event): > 4697: (pyperf.h:0) > 2334: (pyperf.h:326) event->stack[i] = *symbol_id; > 2334: (pyperf.h:118) pidData->offsets.String_data); > 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, > ... > ``` > > Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> I think this is a cool feature! It's a bit of a shame that we don't collect information like this in the verifier itself, where it would be simpler to do (e.g. associate a counter with each instruction, or with each jump target). [...] > +static int print_top_src_lines(char * const buf, size_t buf_sz, const char *prog_name) > +{ > + int lines_cap = 1; > + int lines_size = 0; > + char **lines; > + char *line = NULL; > + char *state; > + struct line_cnt *freq = NULL; > + struct line_cnt *cur; > + int unique_lines; > + int err; Note: when compiling with clang 20.0.0git the following warning is reported: veristat.c:957:14: error: variable 'err' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized] 957 | for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ veristat.c:972:9: note: uninitialized use occurs here 972 | return err; | ... veristat.c:903:9: note: initialize the variable 'err' to silence this warning 903 | int err; | ^ | = 0 Also, a nitpick: declarations should be sorted in a "reverse Christmas tree" order (at-least that's what Andrii enforces :). > + int i; > + > + lines = calloc(lines_cap, sizeof(char *)); Nitpick: here and in a few places below use sizeof(*<array>), e.g.: calloc(lines_cap, sizeof(*lines)) > + if (!lines) > + return -ENOMEM; > + > + while ((line = strtok_r(line ? NULL : buf, "\n", &state))) { > + if (strncmp(line, "; ", 2)) > + continue; > + line += 2; > + > + if (lines_size == lines_cap) { > + char **tmp; > + > + lines_cap *= 2; > + tmp = realloc(lines, lines_cap * sizeof(char *)); > + if (!tmp) { > + err = -ENOMEM; > + goto cleanup; > + } > + lines = tmp; > + } > + lines[lines_size] = line; > + lines_size++; > + } > + > + if (!lines_size) > + goto cleanup; > + > + qsort(lines, lines_size, sizeof(char *), str_cmp); > + > + freq = calloc(lines_size, sizeof(struct line_cnt)); > + if (!freq) { > + err = -ENOMEM; > + goto cleanup; > + } > + > + cur = freq; > + cur->line = lines[0]; > + cur->cnt = 1; > + for (i = 1; i < lines_size; ++i) { > + if (strcmp(lines[i], cur->line)) { > + cur++; > + cur->line = lines[i]; > + cur->cnt = 0; > + } > + cur->cnt++; > + } > + unique_lines = cur - freq + 1; > + > + qsort(freq, unique_lines, sizeof(struct line_cnt), line_cnt_cmp); > + > + printf("Top source lines (%s):\n", prog_name); > + for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { > + char *src_code; > + char *src_line; > + > + src_code = strtok_r(freq[i].line, "@", &state); Does verifier guarantee presence of '@' for each source comment line? > + src_line = strtok_r(NULL, "\0", &state); > + if (src_line) The '.line' string is null-terminated, can 'src_line' ever be NULL? > + printf("%5d: (%s)\t%s\n", freq[i].cnt, src_line + 1, src_code); > + else > + printf("%5d: %s\n", freq[i].cnt, src_code); > + } > + > +cleanup: > + free(freq); > + free(lines); > + return err; > +} > + [...]
Hi Mykyta, On 2024/9/19 04:39, Mykyta Yatsenko wrote: > From: Mykyta Yatsenko <yatsenko@meta.com> > > Production BPF programs are increasing in number of instructions and states > to the point, where optimising verification process for them is necessary > to avoid running into instruction limit. Authors of those BPF programs > need to analyze verifier output, for example, collecting the most > frequent source code lines to understand which part of the program has > the biggest verification cost. > > This patch introduces `--top-src-lines` flag in veristat. > `--top-src-lines=N` makes veristat output N the most popular sorce code > lines, parsed from verification log. > > An example: > ``` > $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 pyperf600.bpf.o > Processing 'pyperf600.bpf.o'... > Top source lines (on_event): > 4697: (pyperf.h:0) > 2334: (pyperf.h:326) event->stack[i] = *symbol_id; > 2334: (pyperf.h:118) pidData->offsets.String_data); > 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, > ... > ``` I think this is useful and wonder how can I use it. In particular, is it possible to know the corresponding instruction number contributed by the source lines? Assume a prog is rejected due to instruction limit. I can optimize the prog with `--top-src-lines`, but have to check the result with another "load" to see the total instruction number (because I don't know how many instructions reduced with the optimized src lines). Am I right? or is there any better method? Thanks.
On 19/09/2024 03:15, Eduard Zingerman wrote: > On Wed, 2024-09-18 at 21:39 +0100, Mykyta Yatsenko wrote: >> From: Mykyta Yatsenko <yatsenko@meta.com> >> >> Production BPF programs are increasing in number of instructions and states >> to the point, where optimising verification process for them is necessary >> to avoid running into instruction limit. Authors of those BPF programs >> need to analyze verifier output, for example, collecting the most >> frequent source code lines to understand which part of the program has >> the biggest verification cost. >> >> This patch introduces `--top-src-lines` flag in veristat. >> `--top-src-lines=N` makes veristat output N the most popular sorce code >> lines, parsed from verification log. >> >> An example: >> ``` >> $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 pyperf600.bpf.o >> Processing 'pyperf600.bpf.o'... >> Top source lines (on_event): >> 4697: (pyperf.h:0) >> 2334: (pyperf.h:326) event->stack[i] = *symbol_id; >> 2334: (pyperf.h:118) pidData->offsets.String_data); >> 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, >> ... >> ``` >> >> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> > I think this is a cool feature! > It's a bit of a shame that we don't collect information like this in > the verifier itself, where it would be simpler to do (e.g. associate a > counter with each instruction, or with each jump target). Thanks for looking at this patch. > [...] > >> +static int print_top_src_lines(char * const buf, size_t buf_sz, const char *prog_name) >> +{ >> + int lines_cap = 1; >> + int lines_size = 0; >> + char **lines; >> + char *line = NULL; >> + char *state; >> + struct line_cnt *freq = NULL; >> + struct line_cnt *cur; >> + int unique_lines; >> + int err; > Note: > when compiling with clang 20.0.0git the following warning is reported: > > veristat.c:957:14: error: variable 'err' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized] > 957 | for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > veristat.c:972:9: note: uninitialized use occurs here > 972 | return err; > | > ... > veristat.c:903:9: note: initialize the variable 'err' to silence this warning > 903 | int err; > | ^ > | = 0 yes, I've got a mail from CI regarding this, going to fix in the next version. > Also, a nitpick: declarations should be sorted in a "reverse Christmas > tree" order (at-least that's what Andrii enforces :). > >> + int i; >> + >> + lines = calloc(lines_cap, sizeof(char *)); > Nitpick: here and in a few places below use sizeof(*<array>), e.g.: > calloc(lines_cap, sizeof(*lines)) > >> + if (!lines) >> + return -ENOMEM; >> + >> + while ((line = strtok_r(line ? NULL : buf, "\n", &state))) { >> + if (strncmp(line, "; ", 2)) >> + continue; >> + line += 2; >> + >> + if (lines_size == lines_cap) { >> + char **tmp; >> + >> + lines_cap *= 2; >> + tmp = realloc(lines, lines_cap * sizeof(char *)); >> + if (!tmp) { >> + err = -ENOMEM; >> + goto cleanup; >> + } >> + lines = tmp; >> + } >> + lines[lines_size] = line; >> + lines_size++; >> + } >> + >> + if (!lines_size) >> + goto cleanup; >> + >> + qsort(lines, lines_size, sizeof(char *), str_cmp); >> + >> + freq = calloc(lines_size, sizeof(struct line_cnt)); >> + if (!freq) { >> + err = -ENOMEM; >> + goto cleanup; >> + } >> + >> + cur = freq; >> + cur->line = lines[0]; >> + cur->cnt = 1; >> + for (i = 1; i < lines_size; ++i) { >> + if (strcmp(lines[i], cur->line)) { >> + cur++; >> + cur->line = lines[i]; >> + cur->cnt = 0; >> + } >> + cur->cnt++; >> + } >> + unique_lines = cur - freq + 1; >> + >> + qsort(freq, unique_lines, sizeof(struct line_cnt), line_cnt_cmp); >> + >> + printf("Top source lines (%s):\n", prog_name); >> + for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { >> + char *src_code; >> + char *src_line; >> + >> + src_code = strtok_r(freq[i].line, "@", &state); > Does verifier guarantee presence of '@' for each source comment line? It does not, if there is no @ character in the source line, I just print a full source line. `if (src_line)` should check for that. >> + src_line = strtok_r(NULL, "\0", &state); >> + if (src_line) > The '.line' string is null-terminated, can 'src_line' ever be NULL? > >> + printf("%5d: (%s)\t%s\n", freq[i].cnt, src_line + 1, src_code); >> + else >> + printf("%5d: %s\n", freq[i].cnt, src_code); >> + } >> + >> +cleanup: >> + free(freq); >> + free(lines); >> + return err; >> +} >> + > [...] > >
On 19/09/2024 03:51, Philo Lu wrote: > > Hi Mykyta, > > On 2024/9/19 04:39, Mykyta Yatsenko wrote: >> From: Mykyta Yatsenko <yatsenko@meta.com> >> >> Production BPF programs are increasing in number of instructions and >> states >> to the point, where optimising verification process for them is >> necessary >> to avoid running into instruction limit. Authors of those BPF programs >> need to analyze verifier output, for example, collecting the most >> frequent source code lines to understand which part of the program has >> the biggest verification cost. >> >> This patch introduces `--top-src-lines` flag in veristat. >> `--top-src-lines=N` makes veristat output N the most popular sorce code >> lines, parsed from verification log. >> >> An example: >> ``` >> $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 >> pyperf600.bpf.o >> Processing 'pyperf600.bpf.o'... >> Top source lines (on_event): >> 4697: (pyperf.h:0) >> 2334: (pyperf.h:326) event->stack[i] = *symbol_id; >> 2334: (pyperf.h:118) pidData->offsets.String_data); >> 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, >> ... >> ``` > > I think this is useful and wonder how can I use it. In particular, is > it possible to know the corresponding instruction number contributed > by the source lines? > No, as far as I know, we don't have that info, so we just use number of source lines as a proxy for number of instructions. Eduard suggested to collect instruction count per source line in verifier, maybe that actually what we should do. > Assume a prog is rejected due to instruction limit. I can optimize the > prog with `--top-src-lines`, but have to check the result with another > "load" to see the total instruction number (because I don't know how > many instructions reduced with the optimized src lines). > > Am I right? or is there any better method? Yes, you are right. > > Thanks.
On Wed, Sep 18, 2024 at 1:40 PM Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote: > > From: Mykyta Yatsenko <yatsenko@meta.com> > > Production BPF programs are increasing in number of instructions and states > to the point, where optimising verification process for them is necessary > to avoid running into instruction limit. Authors of those BPF programs > need to analyze verifier output, for example, collecting the most > frequent source code lines to understand which part of the program has > the biggest verification cost. > > This patch introduces `--top-src-lines` flag in veristat. > `--top-src-lines=N` makes veristat output N the most popular sorce code > lines, parsed from verification log. > > An example: > ``` > $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 pyperf600.bpf.o > Processing 'pyperf600.bpf.o'... > Top source lines (on_event): > 4697: (pyperf.h:0) > 2334: (pyperf.h:326) event->stack[i] = *symbol_id; > 2334: (pyperf.h:118) pidData->offsets.String_data); > 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, > ... > ``` > > Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> > --- > tools/testing/selftests/bpf/veristat.c | 132 ++++++++++++++++++++++++- > 1 file changed, 127 insertions(+), 5 deletions(-) > Looks pretty close to the final state, see some nits and some potential problems below. We should figure out the verboseness bits before applying this, though. pw-bot: cr > diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c > index 1ec5c4c47235..854fa4459b77 100644 > --- a/tools/testing/selftests/bpf/veristat.c > +++ b/tools/testing/selftests/bpf/veristat.c > @@ -143,6 +143,7 @@ static struct env { > char **filenames; > int filename_cnt; > bool verbose; > + bool print_verbose; let's not do that, if we need to pass log_level to the kernel due to top_src_lines, then that should be coded explicitly, not relying on implicit "verbose" flag. But also consider that there are two verbose levels for the kernel: LOG_LEVEL=1 and LOG_LEVEL=2. We probably should default to LOG_LEVEL=2 if --top-src-lines is specified *unless* user explicitly specified -vl1. Please play with different combinations and see if they make sense. log_level=2 is extremely verbose, so if the user specified -v (and not -vl2), they will be surprised by extra verboseness. > bool debug; > bool quiet; > bool force_checkpoints; > @@ -179,11 +180,12 @@ static struct env { > int files_skipped; > int progs_processed; > int progs_skipped; > + int top_src_lines; > } env; [...] > +static int print_top_src_lines(char * const buf, size_t buf_sz, const char *prog_name) > +{ > + int lines_cap = 1; > + int lines_size = 0; > + char **lines; > + char *line = NULL; > + char *state; > + struct line_cnt *freq = NULL; > + struct line_cnt *cur; > + int unique_lines; > + int err; > + int i; > + > + lines = calloc(lines_cap, sizeof(char *)); > + if (!lines) > + return -ENOMEM; start with lines_cap == 0, skip this calloc(), let realloc() do all the work (lines should be initialized to NULL, of course) > + > + while ((line = strtok_r(line ? NULL : buf, "\n", &state))) { > + if (strncmp(line, "; ", 2)) nit: with strncmp() and strcmp() I find it much more readable explicit == 0 or != 0, otherwise my brain instinctively understands it in exactly opposite way > + continue; > + line += 2; > + > + if (lines_size == lines_cap) { > + char **tmp; > + > + lines_cap *= 2; nit: generally speaking, it's common pattern to have some starting minimal length that's a bit bigger than 1 or 2, so e.g., you'd do lines_cap = max(16, 2 * lines_cap); > + tmp = realloc(lines, lines_cap * sizeof(char *)); > + if (!tmp) { > + err = -ENOMEM; > + goto cleanup; > + } > + lines = tmp; > + } > + lines[lines_size] = line; > + lines_size++; > + } > + > + if (!lines_size) nit: lines_size == 0 (it's not an error code and neither is it bool, so explicit zero comparison makes more sense, IMO) > + goto cleanup; > + > + qsort(lines, lines_size, sizeof(char *), str_cmp); nit: probably would be better to use sizeof(*lines) > + > + freq = calloc(lines_size, sizeof(struct line_cnt)); > + if (!freq) { > + err = -ENOMEM; > + goto cleanup; > + } > + > + cur = freq; > + cur->line = lines[0]; > + cur->cnt = 1; > + for (i = 1; i < lines_size; ++i) { > + if (strcmp(lines[i], cur->line)) { > + cur++; > + cur->line = lines[i]; > + cur->cnt = 0; > + } > + cur->cnt++; > + } > + unique_lines = cur - freq + 1; > + > + qsort(freq, unique_lines, sizeof(struct line_cnt), line_cnt_cmp); nit: sizeof(*freq), besides being shorter, it also won't need updating if we rename the type *and* it shows connection to freq[] itself quite explicitly > + > + printf("Top source lines (%s):\n", prog_name); > + for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { > + char *src_code; > + char *src_line; > + > + src_code = strtok_r(freq[i].line, "@", &state); what happens if the line doesn't have '@' in it, which will be the case on older kernels? Can you please test all this on some old kernel with a different verifier log output? It shouldn't crash. tbh, it feels like strtok_r is a bit of an overkill here. I'd just strrchr('@') (note reverse search), if it's there, extract everything to the right as file name, otherwise assume the entire string is source code line. This will handle both old and new verifier log formats > + src_line = strtok_r(NULL, "\0", &state); > + if (src_line) > + printf("%5d: (%s)\t%s\n", freq[i].cnt, src_line + 1, src_code); > + else > + printf("%5d: %s\n", freq[i].cnt, src_code); > + } > + > +cleanup: > + free(freq); > + free(lines); > + return err; > +} > + [...]
On Thu, Sep 19, 2024 at 5:08 AM Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote: > > On 19/09/2024 03:51, Philo Lu wrote: > > > > Hi Mykyta, > > > > On 2024/9/19 04:39, Mykyta Yatsenko wrote: > >> From: Mykyta Yatsenko <yatsenko@meta.com> > >> > >> Production BPF programs are increasing in number of instructions and > >> states > >> to the point, where optimising verification process for them is > >> necessary > >> to avoid running into instruction limit. Authors of those BPF programs > >> need to analyze verifier output, for example, collecting the most > >> frequent source code lines to understand which part of the program has > >> the biggest verification cost. > >> > >> This patch introduces `--top-src-lines` flag in veristat. > >> `--top-src-lines=N` makes veristat output N the most popular sorce code > >> lines, parsed from verification log. > >> > >> An example: > >> ``` > >> $ sudo ./veristat --log-size=1000000000 --top-src-lines=4 > >> pyperf600.bpf.o > >> Processing 'pyperf600.bpf.o'... > >> Top source lines (on_event): > >> 4697: (pyperf.h:0) > >> 2334: (pyperf.h:326) event->stack[i] = *symbol_id; > >> 2334: (pyperf.h:118) pidData->offsets.String_data); > >> 1176: (pyperf.h:92) bpf_probe_read_user(&frame->f_back, > >> ... > >> ``` > > > > I think this is useful and wonder how can I use it. In particular, is > > it possible to know the corresponding instruction number contributed > > by the source lines? > > > No, as far as I know, we don't have that info, so we just use number of > source lines as a proxy for number of instructions. Eduard suggested to > collect > instruction count per source line in verifier, maybe that actually what > we should do. Let's start simple and then build upon that. I'd rather finalize the minimal first version before going down to assembly instruction recording and counting. > > Assume a prog is rejected due to instruction limit. I can optimize the > > prog with `--top-src-lines`, but have to check the result with another > > "load" to see the total instruction number (because I don't know how > > many instructions reduced with the optimized src lines). > > > > Am I right? or is there any better method? > Yes, you are right. > > > > Thanks. > >
diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c index 1ec5c4c47235..854fa4459b77 100644 --- a/tools/testing/selftests/bpf/veristat.c +++ b/tools/testing/selftests/bpf/veristat.c @@ -143,6 +143,7 @@ static struct env { char **filenames; int filename_cnt; bool verbose; + bool print_verbose; bool debug; bool quiet; bool force_checkpoints; @@ -179,11 +180,12 @@ static struct env { int files_skipped; int progs_processed; int progs_skipped; + int top_src_lines; } env; static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args) { - if (!env.verbose) + if (!env.print_verbose) return 0; if (level == LIBBPF_DEBUG && !env.debug) return 0; @@ -206,6 +208,7 @@ const char argp_program_doc[] = enum { OPT_LOG_FIXED = 1000, OPT_LOG_SIZE = 1001, + OPT_TOP_SRC_LINES = 1002, }; static const struct argp_option opts[] = { @@ -228,6 +231,7 @@ static const struct argp_option opts[] = { "Force frequent BPF verifier state checkpointing (set BPF_F_TEST_STATE_FREQ program flag)" }, { "test-reg-invariants", 'r', NULL, 0, "Force BPF verifier failure on register invariant violation (BPF_F_TEST_REG_INVARIANTS program flag)" }, + { "top-src-lines", OPT_TOP_SRC_LINES, "N", 0, "Emit N most frequent source code lines" }, {}, }; @@ -249,10 +253,12 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state) break; case 'v': env.verbose = true; + env.print_verbose = true; break; case 'd': env.debug = true; env.verbose = true; + env.print_verbose = true; break; case 'q': env.quiet = true; @@ -337,6 +343,15 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state) return -ENOMEM; env.filename_cnt++; break; + case OPT_TOP_SRC_LINES: + errno = 0; + env.verbose = true; + env.top_src_lines = strtol(arg, NULL, 10); + if (errno) { + fprintf(stderr, "invalid top lines N specifier: %s\n", arg); + argp_usage(state); + } + break; default: return ARGP_ERR_UNKNOWN; } @@ -854,6 +869,109 @@ static int parse_verif_log(char * const buf, size_t buf_sz, struct verif_stats * return 0; } +struct line_cnt { + char *line; + int cnt; +}; + +static int str_cmp(const void *a, const void *b) +{ + const char **str1 = (const char **)a; + const char **str2 = (const char **)b; + + return strcmp(*str1, *str2); +} + +static int line_cnt_cmp(const void *a, const void *b) +{ + const struct line_cnt *a_cnt = (const struct line_cnt *)a; + const struct line_cnt *b_cnt = (const struct line_cnt *)b; + + return b_cnt->cnt - a_cnt->cnt; +} + +static int print_top_src_lines(char * const buf, size_t buf_sz, const char *prog_name) +{ + int lines_cap = 1; + int lines_size = 0; + char **lines; + char *line = NULL; + char *state; + struct line_cnt *freq = NULL; + struct line_cnt *cur; + int unique_lines; + int err; + int i; + + lines = calloc(lines_cap, sizeof(char *)); + if (!lines) + return -ENOMEM; + + while ((line = strtok_r(line ? NULL : buf, "\n", &state))) { + if (strncmp(line, "; ", 2)) + continue; + line += 2; + + if (lines_size == lines_cap) { + char **tmp; + + lines_cap *= 2; + tmp = realloc(lines, lines_cap * sizeof(char *)); + if (!tmp) { + err = -ENOMEM; + goto cleanup; + } + lines = tmp; + } + lines[lines_size] = line; + lines_size++; + } + + if (!lines_size) + goto cleanup; + + qsort(lines, lines_size, sizeof(char *), str_cmp); + + freq = calloc(lines_size, sizeof(struct line_cnt)); + if (!freq) { + err = -ENOMEM; + goto cleanup; + } + + cur = freq; + cur->line = lines[0]; + cur->cnt = 1; + for (i = 1; i < lines_size; ++i) { + if (strcmp(lines[i], cur->line)) { + cur++; + cur->line = lines[i]; + cur->cnt = 0; + } + cur->cnt++; + } + unique_lines = cur - freq + 1; + + qsort(freq, unique_lines, sizeof(struct line_cnt), line_cnt_cmp); + + printf("Top source lines (%s):\n", prog_name); + for (i = 0; i < min(unique_lines, env.top_src_lines); ++i) { + char *src_code; + char *src_line; + + src_code = strtok_r(freq[i].line, "@", &state); + src_line = strtok_r(NULL, "\0", &state); + if (src_line) + printf("%5d: (%s)\t%s\n", freq[i].cnt, src_line + 1, src_code); + else + printf("%5d: %s\n", freq[i].cnt, src_code); + } + +cleanup: + free(freq); + free(lines); + return err; +} + static int guess_prog_type_by_ctx_name(const char *ctx_name, enum bpf_prog_type *prog_type, enum bpf_attach_type *attach_type) @@ -1043,11 +1161,13 @@ static int process_prog(const char *filename, struct bpf_object *obj, struct bpf stats->stats[VERDICT] = err == 0; /* 1 - success, 0 - failure */ parse_verif_log(buf, buf_sz, stats); - if (env.verbose) { + if (env.print_verbose) { printf("PROCESSING %s/%s, DURATION US: %ld, VERDICT: %s, VERIFIER LOG:\n%s\n", filename, prog_name, stats->stats[DURATION], err ? "failure" : "success", buf); } + if (env.top_src_lines) + print_top_src_lines(buf, buf_sz, stats->prog_name); if (verif_log_buf != buf) free(buf); @@ -1065,13 +1185,13 @@ static int process_obj(const char *filename) int err = 0, prog_cnt = 0; if (!should_process_file_prog(base_filename, NULL)) { - if (env.verbose) + if (env.print_verbose) printf("Skipping '%s' due to filters...\n", filename); env.files_skipped++; return 0; } if (!is_bpf_obj_file(filename)) { - if (env.verbose) + if (env.print_verbose) printf("Skipping '%s' as it's not a BPF object file...\n", filename); env.files_skipped++; return 0; @@ -2116,13 +2236,15 @@ int main(int argc, char **argv) return 0; } - if (env.verbose && env.quiet) { + if (env.print_verbose && env.quiet) { fprintf(stderr, "Verbose and quiet modes are incompatible, please specify just one or neither!\n\n"); argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat"); return 1; } if (env.verbose && env.log_level == 0) env.log_level = 1; + if (env.top_src_lines && env.log_level < 2) + env.log_level = 2; if (env.output_spec.spec_cnt == 0) { if (env.out_fmt == RESFMT_CSV)