diff mbox series

[dwarves,v4,3/4] pahole: Use per-thread btf instances to avoid mutex locking.

Message ID 20220126192039.2840752-4-kuifeng@fb.com (mailing list archive)
State Not Applicable
Delegated to: BPF
Headers show
Series Parallelize BTF type info generating of pahole | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR fail merge-conflict
netdev/tree_selection success Not a local patch

Commit Message

Kui-Feng Lee Jan. 26, 2022, 7:20 p.m. UTC
Create an instance of btf for each worker thread, and add type info to
the local btf instance in the steal-function of pahole without mutex
acquiring.  Once finished with all worker threads, merge all
per-thread btf instances to the primary btf instance.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 btf_encoder.c |   5 ++
 btf_encoder.h |   2 +
 pahole.c      | 125 ++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 124 insertions(+), 8 deletions(-)

Comments

Andrii Nakryiko Jan. 26, 2022, 7:58 p.m. UTC | #1
On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Create an instance of btf for each worker thread, and add type info to
> the local btf instance in the steal-function of pahole without mutex
> acquiring.  Once finished with all worker threads, merge all
> per-thread btf instances to the primary btf instance.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---

There are still unnecessary casts and missing {} in the else branch,
but I'll let Arnaldo decide or fix it up.

Once this lands, can you please send kernel patch to use -j if pahole
support it during the kernel build? See scripts/pahole-version.sh and
scripts/link-vmlinux.sh for how pahole is set up and used in the
kernel. Thanks!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  btf_encoder.c |   5 ++
>  btf_encoder.h |   2 +
>  pahole.c      | 125 ++++++++++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 124 insertions(+), 8 deletions(-)
>

[...]
Kui-Feng Lee Jan. 26, 2022, 8:57 p.m. UTC | #2
On Wed, 2022-01-26 at 11:58 -0800, Andrii Nakryiko wrote:
> On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Create an instance of btf for each worker thread, and add type info
> > to
> > the local btf instance in the steal-function of pahole without
> > mutex
> > acquiring.  Once finished with all worker threads, merge all
> > per-thread btf instances to the primary btf instance.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> 
> There are still unnecessary casts and missing {} in the else branch,
> but I'll let Arnaldo decide or fix it up.
> 
> Once this lands, can you please send kernel patch to use -j if pahole
> support it during the kernel build? See scripts/pahole-version.sh and
> scripts/link-vmlinux.sh for how pahole is set up and used in the
> kernel. Thanks!

Sure!
Arnaldo Carvalho de Melo Jan. 27, 2022, 10:05 a.m. UTC | #3
Em Wed, Jan 26, 2022 at 11:58:27AM -0800, Andrii Nakryiko escreveu:
> On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> >
> > Create an instance of btf for each worker thread, and add type info to
> > the local btf instance in the steal-function of pahole without mutex
> > acquiring.  Once finished with all worker threads, merge all
> > per-thread btf instances to the primary btf instance.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> 
> There are still unnecessary casts and missing {} in the else branch,
> but I'll let Arnaldo decide or fix it up.

Yeah, I'll do some sanitization, thanks for all the review!

- Arnaldo
 
> Once this lands, can you please send kernel patch to use -j if pahole
> support it during the kernel build? See scripts/pahole-version.sh and
> scripts/link-vmlinux.sh for how pahole is set up and used in the
> kernel. Thanks!
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
> >  btf_encoder.c |   5 ++
> >  btf_encoder.h |   2 +
> >  pahole.c      | 125 ++++++++++++++++++++++++++++++++++++++++++++++----
> >  3 files changed, 124 insertions(+), 8 deletions(-)
> >
> 
> [...]
Arnaldo Carvalho de Melo Jan. 27, 2022, 8:12 p.m. UTC | #4
Em Wed, Jan 26, 2022 at 11:58:27AM -0800, Andrii Nakryiko escreveu:
> On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> >
> > Create an instance of btf for each worker thread, and add type info to
> > the local btf instance in the steal-function of pahole without mutex
> > acquiring.  Once finished with all worker threads, merge all
> > per-thread btf instances to the primary btf instance.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> 
> There are still unnecessary casts and missing {} in the else branch,
> but I'll let Arnaldo decide or fix it up.
> 
> Once this lands, can you please send kernel patch to use -j if pahole
> support it during the kernel build? See scripts/pahole-version.sh and
> scripts/link-vmlinux.sh for how pahole is set up and used in the
> kernel. Thanks!

I also tweaked this:

diff --git a/pahole.c b/pahole.c
index 8dcd6bf951fe1f93..1b2b19b2be45d30c 100644
--- a/pahole.c
+++ b/pahole.c
@@ -2887,13 +2887,13 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
                struct btf_encoder *encoder;

+               pthread_mutex_lock(&btf_lock);
                /*
                 * FIXME:
                 *
                 * This should be really done at main(), but since in the current codebase only at this
                 * point we'll have cu->elf setup...
                 */
-               pthread_mutex_lock(&btf_lock);
                if (!btf_encoder) {
                        /*
                         * btf_encoder is the primary encoder.
⬢[acme@toolbox pahole]$

As moving it to after that comment will only make the patch a bit
larger, changing nothing.

+++ b/pahole.c
@@ -2900,11 +2900,9 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                         * And, it is used by the thread
                         * create it.
                         */
-                       btf_encoder = btf_encoder__new(cu, detached_btf_filename,
-                                                      conf_load->base_btf,
-                                                      skip_encoding_btf_vars,
-                                                      btf_encode_force,
-                                                      btf_gen_floats, global_verbose);
+                       btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
+                                                      btf_encode_force, btf_gen_floats, global_verbose);
+
                        if (btf_encoder && thr_data) {
                                struct thread_data *thread = thr_data;

⬢[acme@toolbox pahole]$

i.e. cosmetic stuff to make the patch smaller by keeping preexisting
lines as-is.

And the missing {} Andrii noticed:

diff --git a/pahole.c b/pahole.c
index 7e2e37582f21c566..8c0a982f05c9ae3d 100644
--- a/pahole.c
+++ b/pahole.c
@@ -2937,8 +2937,9 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                                thread->btf = btf_encoder__btf(thread->encoder);
                        }
                        encoder = thread->encoder;
-               } else
+               } else {
                        encoder = btf_encoder;
+               }

                if (btf_encoder__encode_cu(encoder, cu)) {
                        fprintf(stderr, "Encountered error while encoding BTF.\n");
⬢[acme@toolbox pahole]$

I'll look at the needless casts and push it to the 'next' and tmp.master
branches so that libbpf's CI can have a chance to test it.

I also added 'perf stat' results for 1.21 (the one in this fedora 34
workstation), 1.23 (parallel DWARF loading) and with your patches + the
libbpf update as a committer testing section, please add performance
numbers in in future work.

Thanks, applied!

- Arnaldo
Arnaldo Carvalho de Melo Jan. 28, 2022, 7:50 p.m. UTC | #5
Em Thu, Jan 27, 2022 at 05:12:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Jan 26, 2022 at 11:58:27AM -0800, Andrii Nakryiko escreveu:
> > On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > > Create an instance of btf for each worker thread, and add type info to
> > > the local btf instance in the steal-function of pahole without mutex
> > > acquiring.  Once finished with all worker threads, merge all
> > > per-thread btf instances to the primary btf instance.

> > There are still unnecessary casts and missing {} in the else branch,
> > but I'll let Arnaldo decide or fix it up.

So its just one unneeded cast as thr_data here is just a 'void *':

diff --git a/pahole.c b/pahole.c
index 8c0a982f05c9ae3d..39e18804100dbfda 100644
--- a/pahole.c
+++ b/pahole.c
@@ -2924,7 +2924,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                 * avoids copying the data collected by the first thread.
                 */
                if (thr_data) {
-                       struct thread_data *thread = (struct thread_data *)thr_data;
+                       struct thread_data *thread = thr_data;

                        if (thread->encoder == NULL) {
                                thread->encoder =


This other is needed as it is a "void **":

@@ -2832,7 +2832,7 @@ static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
 static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
                                  int error)
 {
-       struct thread_data **threads = (struct thread_data **)thr_data;
+       struct thread_data **threads = thr_data;
        int i;
        int err = 0;


Removing it:

/var/home/acme/git/pahole/pahole.c: In function ‘pahole_threads_collect’:
/var/home/acme/git/pahole/pahole.c:2835:40: warning: initialization of ‘struct thread_data **’ from incompatible pointer type ‘void **’ [-Wincompatible-pointer-types]
 2835 |         struct thread_data **threads = thr_data;
      |                                        ^~~~~~~~


And I did some more profiling, now the focus should go to elfutils:

⬢[acme@toolbox pahole]$ perf report --no-children -s dso --call-graph none 2> /dev/null | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 27K of event 'cycles:u'
# Event count (approx.): 27956766207
#
# Overhead  Shared Object
# ........  ...................
#
    46.70%  libdwarves.so.1.0.0
    39.84%  libdw-0.186.so
     9.70%  libc-2.33.so
     2.14%  libpthread-2.33.so
     1.47%  [unknown]
     0.09%  ld-2.33.so
     0.06%  libelf-0.186.so
     0.00%  libcrypto.so.1.1.1l
     0.00%  libk5crypto.so.3.1
⬢[acme@toolbox pahole]$

$ perf report -g graph,0.5,2 --stdio --no-children -s dso --dso libdw-0.186.so

# To display the perf.data header info, please use --header/--header-only options.
#
# dso: libdw-0.186.so
#
# Total Lost Samples: 0
#
# Samples: 27K of event 'cycles:u'
# Event count (approx.): 27956766207
#
# Overhead  Shared Object 
# ........  ..............
#
    39.84%  libdw-0.186.so
            |          
            |--25.66%--__libdw_find_attr
            |          |          
            |          |--20.96%--__dwarf_attr_internal (inlined)
            |          |          |          
            |          |          |--10.94%--attr_numeric
            |          |          |          |          
            |          |          |          |--9.96%--die__process_class
            |          |          |          |          __die__process_tag
            |          |          |          |          die__process_unit
            |          |          |          |          die__process
            |          |          |          |          dwarf_cus__create_and_process_cu
            |          |          |          |          dwarf_cus__process_cu_thread
            |          |          |          |          start_thread
            |          |          |          |          __GI___clone (inlined)
            |          |          |          |          
            |          |          |           --0.61%--type__init
            |          |          |                     __die__process_tag
            |          |          |                     die__process_unit
            |          |          |                     die__process
            |          |          |                     dwarf_cus__create_and_process_cu
            |          |          |                     dwarf_cus__process_cu_thread
            |          |          |                     start_thread
            |          |          |                     __GI___clone (inlined)
            |          |          |          
            |          |          |--5.43%--attr_type
            |          |          |          |          
            |          |          |           --4.94%--tag__init
            |          |          |                     |          
            |          |          |                     |--2.60%--die__process_class
            |          |          |                     |          __die__process_tag
            |          |          |                     |          die__process_unit
            |          |          |                     |          die__process
            |          |          |                     |          dwarf_cus__create_and_process_cu
            |          |          |                     |          dwarf_cus__process_cu_thread
            |          |          |                     |          start_thread
            |          |          |                     |          __GI___clone (inlined)
            |          |          |                     |          
            |          |          |                     |--0.99%--__die__process_tag
            |          |          |                     |          |          
            |          |          |                     |           --0.98%--die__process_unit
            |          |          |                     |                     die__process
            |          |          |                     |                     dwarf_cus__create_and_process_cu
            |          |          |                     |                     dwarf_cus__process_cu_thread
            |          |          |                     |                     start_thread
            |          |          |                     |                     __GI___clone (inlined)
            |          |          
            |          |--4.01%--__dwarf_siblingof_internal (inlined)
            |          |          |          
            |          |          |--1.41%--die__process_class
            |          |          |          __die__process_tag
            |          |          |          die__process_unit
            |          |          |          die__process
            |          |          |          dwarf_cus__create_and_process_cu
            |          |          |          dwarf_cus__process_cu_thread
            |          |          |          start_thread
            |          |          |          __GI___clone (inlined)
            |          |          |          
            |          |          |--1.02%--die__process
            |          |          |          dwarf_cus__create_and_process_cu
            |          |          |          dwarf_cus__process_cu_thread
            |          |          |          start_thread
            |          |          |          __GI___clone (inlined)
            |          
            |--2.38%--__libdw_form_val_compute_len
            |          __libdw_find_attr
            |          |          
            |           --1.86%--__dwarf_attr_internal (inlined)
            |                     |          
            |                     |--0.94%--attr_numeric
            |                     |          |          
            |                     |           --0.83%--die__process_class
            |                     |                     __die__process_tag
            |                     |                     die__process_unit
            |                     |                     die__process
            |                     |                     dwarf_cus__create_and_process_cu
            |                     |                     dwarf_cus__process_cu_thread
            |                     |                     start_thread
            |                     |                     __GI___clone (inlined)
            |                     |          
            |                      --0.56%--attr_type
            |                                |          
            |                                 --0.53%--tag__init



#
# (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
#

This find_attr thing needs improvements, its a linear search AFAIK, some
hashtable could do wonders, I guess.

Mark, was this considered at some point?

⬢[acme@toolbox pahole]$ rpm -q elfutils-libs
elfutils-libs-0.186-1.fc34.x86_64

Andrii https://github.com/libbpf/libbpf/actions/workflows/pahole.yml is
in failure mode for 3 days, and only yesterday I pushed these changes,
seems unrelated to pahole:

Tests exit status: 1
Test Results:
             bpftool: PASS
          test_progs: FAIL (returned 1)
 test_progs-no_alu32: FAIL (returned 1)
       test_verifier: PASS
            shutdown: CLEAN
Error: Process completed with exit code 1.

Can you please check?

- Arnaldo
Andrii Nakryiko Feb. 1, 2022, 6:56 a.m. UTC | #6
On Fri, Jan 28, 2022 at 11:52 AM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Thu, Jan 27, 2022 at 05:12:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Jan 26, 2022 at 11:58:27AM -0800, Andrii Nakryiko escreveu:
> > > On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > > > Create an instance of btf for each worker thread, and add type info to
> > > > the local btf instance in the steal-function of pahole without mutex
> > > > acquiring.  Once finished with all worker threads, merge all
> > > > per-thread btf instances to the primary btf instance.
>
> > > There are still unnecessary casts and missing {} in the else branch,
> > > but I'll let Arnaldo decide or fix it up.
>
> So its just one unneeded cast as thr_data here is just a 'void *':
>
> diff --git a/pahole.c b/pahole.c
> index 8c0a982f05c9ae3d..39e18804100dbfda 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -2924,7 +2924,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
>                  * avoids copying the data collected by the first thread.
>                  */
>                 if (thr_data) {
> -                       struct thread_data *thread = (struct thread_data *)thr_data;
> +                       struct thread_data *thread = thr_data;
>
>                         if (thread->encoder == NULL) {
>                                 thread->encoder =
>
>
> This other is needed as it is a "void **":
>
> @@ -2832,7 +2832,7 @@ static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
>  static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
>                                   int error)
>  {
> -       struct thread_data **threads = (struct thread_data **)thr_data;
> +       struct thread_data **threads = thr_data;
>         int i;
>         int err = 0;
>
>
> Removing it:
>
> /var/home/acme/git/pahole/pahole.c: In function ‘pahole_threads_collect’:
> /var/home/acme/git/pahole/pahole.c:2835:40: warning: initialization of ‘struct thread_data **’ from incompatible pointer type ‘void **’ [-Wincompatible-pointer-types]
>  2835 |         struct thread_data **threads = thr_data;
>       |                                        ^~~~~~~~
>
>
> And I did some more profiling, now the focus should go to elfutils:
>
> ⬢[acme@toolbox pahole]$ perf report --no-children -s dso --call-graph none 2> /dev/null | head -20
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 27K of event 'cycles:u'
> # Event count (approx.): 27956766207
> #
> # Overhead  Shared Object
> # ........  ...................
> #
>     46.70%  libdwarves.so.1.0.0
>     39.84%  libdw-0.186.so
>      9.70%  libc-2.33.so
>      2.14%  libpthread-2.33.so
>      1.47%  [unknown]
>      0.09%  ld-2.33.so
>      0.06%  libelf-0.186.so
>      0.00%  libcrypto.so.1.1.1l
>      0.00%  libk5crypto.so.3.1
> ⬢[acme@toolbox pahole]$
>
> $ perf report -g graph,0.5,2 --stdio --no-children -s dso --dso libdw-0.186.so
>

[...]

>
> #
> # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
> #
>
> This find_attr thing needs improvements, its a linear search AFAIK, some
> hashtable could do wonders, I guess.
>
> Mark, was this considered at some point?

This would be a great improvement, yes!

But strange that you didn't see any BTF-related functions, are you
sure you profiled the entire DWARF to BTF conversion process? BTF
encoding is not dominant, but noticeable anyways (e.g., adding unique
strings to BTF is relatively expensive still).

>
> ⬢[acme@toolbox pahole]$ rpm -q elfutils-libs
> elfutils-libs-0.186-1.fc34.x86_64
>
> Andrii https://github.com/libbpf/libbpf/actions/workflows/pahole.yml is
> in failure mode for 3 days, and only yesterday I pushed these changes,
> seems unrelated to pahole:
>
> Tests exit status: 1
> Test Results:
>              bpftool: PASS
>           test_progs: FAIL (returned 1)
>  test_progs-no_alu32: FAIL (returned 1)
>        test_verifier: PASS
>             shutdown: CLEAN
> Error: Process completed with exit code 1.
>
> Can you please check?

Yes, it's not related to pahole. This is the BPF selftests issue which
I already fixed last week, but didn't get a chance to sync everything
to Github repo before leaving for a short vacation. I'll do another
sync tonight and it should be all green again.

>
> - Arnaldo
Arnaldo Carvalho de Melo Feb. 2, 2022, 12:16 a.m. UTC | #7
Em Mon, Jan 31, 2022 at 10:56:16PM -0800, Andrii Nakryiko escreveu:
> On Fri, Jan 28, 2022 at 11:52 AM Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> >
> > Em Thu, Jan 27, 2022 at 05:12:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Wed, Jan 26, 2022 at 11:58:27AM -0800, Andrii Nakryiko escreveu:
> > > > On Wed, Jan 26, 2022 at 11:21 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > > > > Create an instance of btf for each worker thread, and add type info to
> > > > > the local btf instance in the steal-function of pahole without mutex
> > > > > acquiring.  Once finished with all worker threads, merge all
> > > > > per-thread btf instances to the primary btf instance.
> >
> > > > There are still unnecessary casts and missing {} in the else branch,
> > > > but I'll let Arnaldo decide or fix it up.
> >
> > So its just one unneeded cast as thr_data here is just a 'void *':
> >
> > diff --git a/pahole.c b/pahole.c
> > index 8c0a982f05c9ae3d..39e18804100dbfda 100644
> > --- a/pahole.c
> > +++ b/pahole.c
> > @@ -2924,7 +2924,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
> >                  * avoids copying the data collected by the first thread.
> >                  */
> >                 if (thr_data) {
> > -                       struct thread_data *thread = (struct thread_data *)thr_data;
> > +                       struct thread_data *thread = thr_data;
> >
> >                         if (thread->encoder == NULL) {
> >                                 thread->encoder =
> >
> >
> > This other is needed as it is a "void **":
> >
> > @@ -2832,7 +2832,7 @@ static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
> >  static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
> >                                   int error)
> >  {
> > -       struct thread_data **threads = (struct thread_data **)thr_data;
> > +       struct thread_data **threads = thr_data;
> >         int i;
> >         int err = 0;
> >
> >
> > Removing it:
> >
> > /var/home/acme/git/pahole/pahole.c: In function ‘pahole_threads_collect’:
> > /var/home/acme/git/pahole/pahole.c:2835:40: warning: initialization of ‘struct thread_data **’ from incompatible pointer type ‘void **’ [-Wincompatible-pointer-types]
> >  2835 |         struct thread_data **threads = thr_data;
> >       |                                        ^~~~~~~~
> >
> >
> > And I did some more profiling, now the focus should go to elfutils:
> >
> > ⬢[acme@toolbox pahole]$ perf report --no-children -s dso --call-graph none 2> /dev/null | head -20
> > # To display the perf.data header info, please use --header/--header-only options.
> > #
> > #
> > # Total Lost Samples: 0
> > #
> > # Samples: 27K of event 'cycles:u'
> > # Event count (approx.): 27956766207
> > #
> > # Overhead  Shared Object
> > # ........  ...................
> > #
> >     46.70%  libdwarves.so.1.0.0
> >     39.84%  libdw-0.186.so
> >      9.70%  libc-2.33.so
> >      2.14%  libpthread-2.33.so
> >      1.47%  [unknown]
> >      0.09%  ld-2.33.so
> >      0.06%  libelf-0.186.so
> >      0.00%  libcrypto.so.1.1.1l
> >      0.00%  libk5crypto.so.3.1
> > ⬢[acme@toolbox pahole]$
> >
> > $ perf report -g graph,0.5,2 --stdio --no-children -s dso --dso libdw-0.186.so
> >
> 
> [...]
> 
> >
> > #
> > # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
> > #
> >
> > This find_attr thing needs improvements, its a linear search AFAIK, some
> > hashtable could do wonders, I guess.
> >
> > Mark, was this considered at some point?
> 
> This would be a great improvement, yes!
> 
> But strange that you didn't see any BTF-related functions, are you
> sure you profiled the entire DWARF to BTF conversion process? BTF
> encoding is not dominant, but noticeable anyways (e.g., adding unique
> strings to BTF is relatively expensive still).

It appears under the 

> >     46.70%  libdwarves.so.1.0.0

line, since we statically link libbpf. So yeah, it was unfortunate the
way I presented the profiling output, I was just trying to point to what
I think is the low hanging fruit, i.e. optimizing find_attr routines in
libdw-0.186.so.
 
> >
> > ⬢[acme@toolbox pahole]$ rpm -q elfutils-libs
> > elfutils-libs-0.186-1.fc34.x86_64
> >
> > Andrii https://github.com/libbpf/libbpf/actions/workflows/pahole.yml is
> > in failure mode for 3 days, and only yesterday I pushed these changes,
> > seems unrelated to pahole:
> >
> > Tests exit status: 1
> > Test Results:
> >              bpftool: PASS
> >           test_progs: FAIL (returned 1)
> >  test_progs-no_alu32: FAIL (returned 1)
> >        test_verifier: PASS
> >             shutdown: CLEAN
> > Error: Process completed with exit code 1.
> >
> > Can you please check?
> 
> Yes, it's not related to pahole. This is the BPF selftests issue which
> I already fixed last week, but didn't get a chance to sync everything
> to Github repo before leaving for a short vacation. I'll do another
> sync tonight and it should be all green again.

I'll check tomorrow.

Going green I'll do some more tests and work towards releasing 1.24.

- Arnaldo
diff mbox series

Patch

diff --git a/btf_encoder.c b/btf_encoder.c
index 9d015f304e92..56a76f5d7275 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1529,3 +1529,8 @@  int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu)
 out:
 	return err;
 }
+
+struct btf *btf_encoder__btf(struct btf_encoder *encoder)
+{
+	return encoder->btf;
+}
diff --git a/btf_encoder.h b/btf_encoder.h
index f133b0d7674d..0f0eee84df74 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -29,4 +29,6 @@  struct btf_encoder *btf_encoders__first(struct list_head *encoders);
 
 struct btf_encoder *btf_encoders__next(struct btf_encoder *encoder);
 
+struct btf *btf_encoder__btf(struct btf_encoder *encoder);
+
 #endif /* _BTF_ENCODER_H_ */
diff --git a/pahole.c b/pahole.c
index f3eeaaca4cdf..8dcd6bf951fe 100644
--- a/pahole.c
+++ b/pahole.c
@@ -2798,6 +2798,72 @@  out:
 
 static struct type_instance *header;
 
+struct thread_data {
+	struct btf *btf;
+	struct btf_encoder *encoder;
+};
+
+static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
+{
+	int i;
+	struct thread_data *threads = calloc(sizeof(struct thread_data), nr_threads);
+
+	for (i = 0; i < nr_threads; i++)
+		thr_data[i] = threads + i;
+
+	return 0;
+}
+
+static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
+{
+	struct thread_data *thread = thr_data;
+
+	if (thread == NULL)
+		return 0;
+
+	/*
+	 * Here we will call btf__dedup() here once we extend
+	 * btf__dedup().
+	 */
+
+	return 0;
+}
+
+static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
+				  int error)
+{
+	struct thread_data **threads = (struct thread_data **)thr_data;
+	int i;
+	int err = 0;
+
+	if (error)
+		goto out;
+
+	for (i = 0; i < nr_threads; i++) {
+		/*
+		 * Merge content of the btf instances of worker threads to the btf
+		 * instance of the primary btf_encoder.
+                */
+		if (!threads[i]->btf || threads[i]->encoder == btf_encoder)
+			continue; /* The primary btf_encoder */
+		err = btf__add_btf(btf_encoder__btf(btf_encoder), threads[i]->btf);
+		if (err < 0)
+			goto out;
+		btf_encoder__delete(threads[i]->encoder);
+		threads[i]->encoder = NULL;
+	}
+	err = 0;
+
+out:
+	for (i = 0; i < nr_threads; i++) {
+		if (threads[i]->encoder && threads[i]->encoder != btf_encoder)
+			btf_encoder__delete(threads[i]->encoder);
+	}
+	free(threads[0]);
+
+	return err;
+}
+
 static enum load_steal_kind pahole_stealer(struct cu *cu,
 					   struct conf_load *conf_load,
 					   void *thr_data)
@@ -2819,30 +2885,70 @@  static enum load_steal_kind pahole_stealer(struct cu *cu,
 
 	if (btf_encode) {
 		static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
+		struct btf_encoder *encoder;
 
-		pthread_mutex_lock(&btf_lock);
 		/*
 		 * FIXME:
 		 *
 		 * This should be really done at main(), but since in the current codebase only at this
 		 * point we'll have cu->elf setup...
 		 */
+		pthread_mutex_lock(&btf_lock);
 		if (!btf_encoder) {
-			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
-						       btf_encode_force, btf_gen_floats, global_verbose);
-			if (btf_encoder == NULL) {
-				ret = LSK__STOP_LOADING;
-				goto out_btf;
+			/*
+			 * btf_encoder is the primary encoder.
+			 * And, it is used by the thread
+			 * create it.
+			 */
+			btf_encoder = btf_encoder__new(cu, detached_btf_filename,
+						       conf_load->base_btf,
+						       skip_encoding_btf_vars,
+						       btf_encode_force,
+						       btf_gen_floats, global_verbose);
+			if (btf_encoder && thr_data) {
+				struct thread_data *thread = thr_data;
+
+				thread->encoder = btf_encoder;
+				thread->btf = btf_encoder__btf(btf_encoder);
 			}
 		}
+		pthread_mutex_unlock(&btf_lock);
+
+		if (btf_encoder == NULL) {
+			ret = LSK__STOP_LOADING;
+			goto out_btf;
+		}
 
-		if (btf_encoder__encode_cu(btf_encoder, cu)) {
+		/*
+		 * thr_data keeps per-thread data for worker threads.  Each worker thread
+		 * has an encoder.  The main thread will merge the data collected by all
+		 * these encoders to btf_encoder.  However, the first thread reaching this
+		 * function creates btf_encoder and reuses it as its local encoder.  It
+		 * avoids copying the data collected by the first thread.
+		 */
+		if (thr_data) {
+			struct thread_data *thread = (struct thread_data *)thr_data;
+
+			if (thread->encoder == NULL) {
+				thread->encoder =
+					btf_encoder__new(cu, detached_btf_filename,
+							 NULL,
+							 skip_encoding_btf_vars,
+							 btf_encode_force,
+							 btf_gen_floats,
+							 global_verbose);
+				thread->btf = btf_encoder__btf(thread->encoder);
+			}
+			encoder = thread->encoder;
+		} else
+			encoder = btf_encoder;
+
+		if (btf_encoder__encode_cu(encoder, cu)) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
 		ret = LSK__DELETE;
 out_btf:
-		pthread_mutex_unlock(&btf_lock);
 		return ret;
 	}
 #if 0
@@ -3207,6 +3313,9 @@  int main(int argc, char *argv[])
 	memset(tab, ' ', sizeof(tab) - 1);
 
 	conf_load.steal = pahole_stealer;
+	conf_load.thread_exit = pahole_thread_exit;
+	conf_load.threads_prepare = pahole_threads_prepare;
+	conf_load.threads_collect = pahole_threads_collect;
 
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {