mbox series

[v3,dwarves,0/8] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions

Message ID 1675790102-23037-1-git-send-email-alan.maguire@oracle.com (mailing list archive)
Headers show
Series dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions | expand

Message

Alan Maguire Feb. 7, 2023, 5:14 p.m. UTC
At optimization level -O2 or higher in gcc, static functions may be
optimized such that they have suffixes like .isra.0, .constprop.0 etc.
These represent 
    
- constant propagation (.constprop.0);
- interprocedural scalar replacement of aggregates, removal of
  unused parameters and replacement of parameters passed by
  reference by parameters passed by value (.isra.0)
  
See [1] for details. 
    
Currently BTF encoding does not handle such optimized functions
that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
This is safer because such suffixes can often indicate parameters have
been optimized out.  This series addresses this by matching a
function to a suffixed version ("foo" matching "foo.isra.0") while
ensuring that the function signature does not contain optimized-out
parameters.  Note that if the function is found ("foo") it will
be preferred, only falling back to "foo.isra.0" if lookup of the
function fails.  Addition to BTF is skipped if the function has
optimized-out parameters, since the expected function signature
will not match. BTF encoding does not include the "."-suffix to
be consistent with DWARF. In addition, the kernel currently does
not allow a "." suffix in a BTF function name.

A problem with this approach however is that BTF carries out the
encoding process in parallel across multiple CUs, and sometimes
a function has optimized-out parameters in one CU but not others;
we see this for NF_HOOK.constprop.0 for example.  So in order to
determine if the function has optimized-out parameters in any
CU, its addition is not carried out until we have processed all
CUs and are about to merge BTF.  At this point we know if any
such optimizations have occurred.  Patches 1-5 handle the
optimized-out parameter identification and matching "."-suffixed
functions with the original function to facilitate BTF
encoding.  This feature can be enabled via the
"--btf_gen_optimized" option.

Patch 6 addresses a related problem - it is entirely possible
for a static function of the same name to exist in different
CUs with different function signatures.  Because BTF does not
currently encode any information that would help disambiguate
which BTF function specification matches which static function
(in the case of multiple different function signatures), it is
best to eliminate such functions from BTF for now.  The same
mechanism that is used to compare static "."-suffixed functions
is re-used for the static function comparison.  A superficial
comparison of number of parameters/parameter names is done to
see if such representations are consistent, and if inconsistent
prototypes are observed, the function is flagged for exclusion
from BTF.

When these methods are combined - the additive encoding of
"."-suffixed functions and the subtractive elimination of
functions with inconsistent parameters - we see an overall
drop in the number of functions in vmlinux BTF, from
51529 to 50246.  Skipping inconsistent functions is enabled
via "--skip_encoding_btf_inconsistent_proto".

Changes since v2 [2]
- Arnaldo incorporated some of the suggestions in the v2 thread;
  these patches are based on those; the relevant changes are
  noted as committer changes.
- Patch 1 is unchanged from v2, but the rest of the patches
  have been updated:
- Patch 2 separates out the changes to the struct btf_encoder
  that better support later addition of functions.
- Patch 3 then is changed insofar as these changes are no
  longer needed for the function addition refactoring.
- Patch 4 has a small change; we need to verify that an
  encoder has actually been added to the encoders list
  prior to removal
- Patch 5 changed significantly; when attempting to measure
  performance the relatively good numbers attained when using
  delayed function addition were not reproducible.
  Further analysis revealed that the large number of lookups
  caused by the presence of the separate function tree was
  a major cause of performance degradation in the multi
  threaded case.  So instead of maintaining a separate tree,
  we use the ELF function list which we already need to look
  up to match ELF -> DWARF function descriptions to store
  the function representation.  This has 2 benefits; firstly
  as mentioned, we already look up the ELF function so no
  additional lookup is required to save the function.
  Secondly, the ELF representation is identical for each
  encoder, so we can index the same function across multiple
  encoder function arrays - this greatly speeds up the
  processing of comparing function representations across
  encoders.  There is still a performance cost in this
  approach however; more details are provided in patch 6.
  An option specific to adding functions with "." suffixes
  is added "--btf_gen_optimized"
- Patch 6 builds on patch 5 in applying the save/merge/add
  approach for all functions using the same mechanisms.
  In addition the "--skip_encoding_btf_inconsistent_proto"
  option is introduced.
- Patches 7/8 document the new options in the pahole manual
  page.
  
Changes since v1 [3]

- Eduard noted that a DW_AT_const_value attribute can signal
  an optimized-out parameter, and that the lack of a location
  attribute signals optimization; ensure we handle those cases
  also (Eduard, patch 1).
- Jiri noted we can have inconsistencies between a static
  and non-static function; apply the comparison process to
  all functions (Jiri, patch 5)
- segmentation fault was observed when handling functions with
  > 10 parameters; needed parameter comparison loop to exit
  at BTF_ENCODER_MAX_PARAMETERS (patch 5)
- Kui-Feng Lee pointed out that having a global shared function
  tree would lead to a lot of contention; here a per-encoder 
  tree is used, and once the threads are collected the trees
  are merged. Performance numbers are provided in patch 5 
  (Kui-Feng Lee, patches 4/5)

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[2] https://lore.kernel.org/bpf/1675088985-20300-1-git-send-email-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/1674567931-26458-1-git-send-email-alan.maguire@oracle.com/

Alan Maguire (8):
  dwarf_loader: Help spotting functions with optimized-out parameters
  btf_encoder: store type_id_off, unspecified type in encoder
  btf_encoder: Refactor function addition into dedicated
    btf_encoder__add_func
  btf_encoder: Rework btf_encoders__*() API to allow traversal of
    encoders
  btf_encoder: Represent "."-suffixed functions (".isra.0") in BTF
  btf_encoder: support delaying function addition to check for function
    prototype inconsistencies
  dwarves: document --btf_gen_optimized option
  dwarves: document --skip_encoding_btf_inconsistent_proto option

 btf_encoder.c      | 360 +++++++++++++++++++++++++++++++++++++--------
 btf_encoder.h      |   6 -
 dwarf_loader.c     | 130 +++++++++++++++-
 dwarves.h          |  11 +-
 man-pages/pahole.1 |  10 ++
 pahole.c           |  30 +++-
 6 files changed, 468 insertions(+), 79 deletions(-)

Comments

Jiri Olsa Feb. 8, 2023, 1:20 p.m. UTC | #1
On Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire wrote:

SNIP

> 
> Changes since v2 [2]
> - Arnaldo incorporated some of the suggestions in the v2 thread;
>   these patches are based on those; the relevant changes are
>   noted as committer changes.
> - Patch 1 is unchanged from v2, but the rest of the patches
>   have been updated:
> - Patch 2 separates out the changes to the struct btf_encoder
>   that better support later addition of functions.
> - Patch 3 then is changed insofar as these changes are no
>   longer needed for the function addition refactoring.
> - Patch 4 has a small change; we need to verify that an
>   encoder has actually been added to the encoders list
>   prior to removal
> - Patch 5 changed significantly; when attempting to measure
>   performance the relatively good numbers attained when using
>   delayed function addition were not reproducible.
>   Further analysis revealed that the large number of lookups
>   caused by the presence of the separate function tree was
>   a major cause of performance degradation in the multi
>   threaded case.  So instead of maintaining a separate tree,
>   we use the ELF function list which we already need to look
>   up to match ELF -> DWARF function descriptions to store
>   the function representation.  This has 2 benefits; firstly
>   as mentioned, we already look up the ELF function so no
>   additional lookup is required to save the function.
>   Secondly, the ELF representation is identical for each
>   encoder, so we can index the same function across multiple
>   encoder function arrays - this greatly speeds up the
>   processing of comparing function representations across
>   encoders.  There is still a performance cost in this

awesome.. great we can do it without the extra tree

I wonder we could save some cycles just by memdup-ing the encoder->functions
array for the subsequent encoders, but that's ok for another patch ;-)

thanks,
jirka
Alan Maguire Feb. 8, 2023, 3:25 p.m. UTC | #2
On 08/02/2023 13:20, Jiri Olsa wrote:
> On Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire wrote:
> 
> SNIP
> 
>>
>> Changes since v2 [2]
>> - Arnaldo incorporated some of the suggestions in the v2 thread;
>>   these patches are based on those; the relevant changes are
>>   noted as committer changes.
>> - Patch 1 is unchanged from v2, but the rest of the patches
>>   have been updated:
>> - Patch 2 separates out the changes to the struct btf_encoder
>>   that better support later addition of functions.
>> - Patch 3 then is changed insofar as these changes are no
>>   longer needed for the function addition refactoring.
>> - Patch 4 has a small change; we need to verify that an
>>   encoder has actually been added to the encoders list
>>   prior to removal
>> - Patch 5 changed significantly; when attempting to measure
>>   performance the relatively good numbers attained when using
>>   delayed function addition were not reproducible.
>>   Further analysis revealed that the large number of lookups
>>   caused by the presence of the separate function tree was
>>   a major cause of performance degradation in the multi
>>   threaded case.  So instead of maintaining a separate tree,
>>   we use the ELF function list which we already need to look
>>   up to match ELF -> DWARF function descriptions to store
>>   the function representation.  This has 2 benefits; firstly
>>   as mentioned, we already look up the ELF function so no
>>   additional lookup is required to save the function.
>>   Secondly, the ELF representation is identical for each
>>   encoder, so we can index the same function across multiple
>>   encoder function arrays - this greatly speeds up the
>>   processing of comparing function representations across
>>   encoders.  There is still a performance cost in this
> 
> awesome.. great we can do it without the extra tree
> 
> I wonder we could save some cycles just by memdup-ing the encoder->functions
> array for the subsequent encoders, but that's ok for another patch ;-)
> 

great idea; also provides extra assurance the layout of the
ELF function arrays are identical! I'd started to explore having
ELF info allocated once in main encoder thread and just duped
for other threads; should definitely save some time. thanks!

Alan

> thanks,
> jirka
>
Arnaldo Carvalho de Melo Feb. 8, 2023, 4:20 p.m. UTC | #3
Em Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire escreveu:
> At optimization level -O2 or higher in gcc, static functions may be
> optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> These represent 
>     
> - constant propagation (.constprop.0);
> - interprocedural scalar replacement of aggregates, removal of
>   unused parameters and replacement of parameters passed by
>   reference by parameters passed by value (.isra.0)

Initial test, without using the new options:

[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
      3 start_show
      3 timeout_show
      3 uuid_show
      4 m_next
      4 parse_options
      4 sk_diag_fill
      4 state_show
      4 state_store
      5 status_show
      6 type_show
[acme@pumpkin ~]$

Now I'll use --skip_encoding_btf_inconsistent_proto and --btf_gen_optimized

- Arnaldo
   
> See [1] for details. 
>     
> Currently BTF encoding does not handle such optimized functions
> that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
> This is safer because such suffixes can often indicate parameters have
> been optimized out.  This series addresses this by matching a
> function to a suffixed version ("foo" matching "foo.isra.0") while
> ensuring that the function signature does not contain optimized-out
> parameters.  Note that if the function is found ("foo") it will
> be preferred, only falling back to "foo.isra.0" if lookup of the
> function fails.  Addition to BTF is skipped if the function has
> optimized-out parameters, since the expected function signature
> will not match. BTF encoding does not include the "."-suffix to
> be consistent with DWARF. In addition, the kernel currently does
> not allow a "." suffix in a BTF function name.
> 
> A problem with this approach however is that BTF carries out the
> encoding process in parallel across multiple CUs, and sometimes
> a function has optimized-out parameters in one CU but not others;
> we see this for NF_HOOK.constprop.0 for example.  So in order to
> determine if the function has optimized-out parameters in any
> CU, its addition is not carried out until we have processed all
> CUs and are about to merge BTF.  At this point we know if any
> such optimizations have occurred.  Patches 1-5 handle the
> optimized-out parameter identification and matching "."-suffixed
> functions with the original function to facilitate BTF
> encoding.  This feature can be enabled via the
> "--btf_gen_optimized" option.
> 
> Patch 6 addresses a related problem - it is entirely possible
> for a static function of the same name to exist in different
> CUs with different function signatures.  Because BTF does not
> currently encode any information that would help disambiguate
> which BTF function specification matches which static function
> (in the case of multiple different function signatures), it is
> best to eliminate such functions from BTF for now.  The same
> mechanism that is used to compare static "."-suffixed functions
> is re-used for the static function comparison.  A superficial
> comparison of number of parameters/parameter names is done to
> see if such representations are consistent, and if inconsistent
> prototypes are observed, the function is flagged for exclusion
> from BTF.
> 
> When these methods are combined - the additive encoding of
> "."-suffixed functions and the subtractive elimination of
> functions with inconsistent parameters - we see an overall
> drop in the number of functions in vmlinux BTF, from
> 51529 to 50246.  Skipping inconsistent functions is enabled
> via "--skip_encoding_btf_inconsistent_proto".
> 
> Changes since v2 [2]
> - Arnaldo incorporated some of the suggestions in the v2 thread;
>   these patches are based on those; the relevant changes are
>   noted as committer changes.
> - Patch 1 is unchanged from v2, but the rest of the patches
>   have been updated:
> - Patch 2 separates out the changes to the struct btf_encoder
>   that better support later addition of functions.
> - Patch 3 then is changed insofar as these changes are no
>   longer needed for the function addition refactoring.
> - Patch 4 has a small change; we need to verify that an
>   encoder has actually been added to the encoders list
>   prior to removal
> - Patch 5 changed significantly; when attempting to measure
>   performance the relatively good numbers attained when using
>   delayed function addition were not reproducible.
>   Further analysis revealed that the large number of lookups
>   caused by the presence of the separate function tree was
>   a major cause of performance degradation in the multi
>   threaded case.  So instead of maintaining a separate tree,
>   we use the ELF function list which we already need to look
>   up to match ELF -> DWARF function descriptions to store
>   the function representation.  This has 2 benefits; firstly
>   as mentioned, we already look up the ELF function so no
>   additional lookup is required to save the function.
>   Secondly, the ELF representation is identical for each
>   encoder, so we can index the same function across multiple
>   encoder function arrays - this greatly speeds up the
>   processing of comparing function representations across
>   encoders.  There is still a performance cost in this
>   approach however; more details are provided in patch 6.
>   An option specific to adding functions with "." suffixes
>   is added "--btf_gen_optimized"
> - Patch 6 builds on patch 5 in applying the save/merge/add
>   approach for all functions using the same mechanisms.
>   In addition the "--skip_encoding_btf_inconsistent_proto"
>   option is introduced.
> - Patches 7/8 document the new options in the pahole manual
>   page.
>   
> Changes since v1 [3]
> 
> - Eduard noted that a DW_AT_const_value attribute can signal
>   an optimized-out parameter, and that the lack of a location
>   attribute signals optimization; ensure we handle those cases
>   also (Eduard, patch 1).
> - Jiri noted we can have inconsistencies between a static
>   and non-static function; apply the comparison process to
>   all functions (Jiri, patch 5)
> - segmentation fault was observed when handling functions with
>   > 10 parameters; needed parameter comparison loop to exit
>   at BTF_ENCODER_MAX_PARAMETERS (patch 5)
> - Kui-Feng Lee pointed out that having a global shared function
>   tree would lead to a lot of contention; here a per-encoder 
>   tree is used, and once the threads are collected the trees
>   are merged. Performance numbers are provided in patch 5 
>   (Kui-Feng Lee, patches 4/5)
> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> [2] https://lore.kernel.org/bpf/1675088985-20300-1-git-send-email-alan.maguire@oracle.com/
> [3] https://lore.kernel.org/bpf/1674567931-26458-1-git-send-email-alan.maguire@oracle.com/
> 
> Alan Maguire (8):
>   dwarf_loader: Help spotting functions with optimized-out parameters
>   btf_encoder: store type_id_off, unspecified type in encoder
>   btf_encoder: Refactor function addition into dedicated
>     btf_encoder__add_func
>   btf_encoder: Rework btf_encoders__*() API to allow traversal of
>     encoders
>   btf_encoder: Represent "."-suffixed functions (".isra.0") in BTF
>   btf_encoder: support delaying function addition to check for function
>     prototype inconsistencies
>   dwarves: document --btf_gen_optimized option
>   dwarves: document --skip_encoding_btf_inconsistent_proto option
> 
>  btf_encoder.c      | 360 +++++++++++++++++++++++++++++++++++++--------
>  btf_encoder.h      |   6 -
>  dwarf_loader.c     | 130 +++++++++++++++-
>  dwarves.h          |  11 +-
>  man-pages/pahole.1 |  10 ++
>  pahole.c           |  30 +++-
>  6 files changed, 468 insertions(+), 79 deletions(-)
> 
> -- 
> 2.31.1
>
Arnaldo Carvalho de Melo Feb. 8, 2023, 4:50 p.m. UTC | #4
Em Wed, Feb 08, 2023 at 01:20:39PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire escreveu:
> > At optimization level -O2 or higher in gcc, static functions may be
> > optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> > These represent 
> >     
> > - constant propagation (.constprop.0);
> > - interprocedural scalar replacement of aggregates, removal of
> >   unused parameters and replacement of parameters passed by
> >   reference by parameters passed by value (.isra.0)
> 
> Initial test, without using the new options:
> 
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
>       3 start_show
>       3 timeout_show
>       3 uuid_show
>       4 m_next
>       4 parse_options
>       4 sk_diag_fill
>       4 state_show
>       4 state_store
>       5 status_show
>       6 type_show
> [acme@pumpkin ~]$
> 
> Now I'll use --skip_encoding_btf_inconsistent_proto and --btf_gen_optimized

With:

⬢[acme@toolbox linux]$ git diff
diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
index 1f1f1d397c399afe..9dd185fb1ff1fc3b 100755
--- a/scripts/pahole-flags.sh
+++ b/scripts/pahole-flags.sh
@@ -21,7 +21,7 @@ if [ "${pahole_ver}" -ge "122" ]; then
 fi
 if [ "${pahole_ver}" -ge "124" ]; then
        # see PAHOLE_HAS_LANG_EXCLUDE
-       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust"
+       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust --btf_gen_optimized --skip_encoding_btf_inconsistent_proto"
 fi

 echo ${extra_paholeopt}
⬢[acme@toolbox linux]$

I get:

[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
      1 zswap_writeback_entry
      1 zswap_zpool_param_set
      1 zs_zpool_create
      1 zs_zpool_destroy
      1 zs_zpool_free
      1 zs_zpool_malloc
      1 zs_zpool_map
      1 zs_zpool_shrink
      1 zs_zpool_total_size
      1 zs_zpool_unmap
[acme@pumpkin ~]$

No functions with more than one entry:

[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep -v ' 1 '
[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep ' 1 ' | wc -l
54558
[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | wc -l
54558
[acme@pumpkin ~]$

So I'll bump the release as we did in the past when testing features
that we need to test against a release on the pahole-flags.sh script so
that we can do further tests.

- Arnaldo
 
> - Arnaldo
>    
> > See [1] for details. 
> >     
> > Currently BTF encoding does not handle such optimized functions
> > that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
> > This is safer because such suffixes can often indicate parameters have
> > been optimized out.  This series addresses this by matching a
> > function to a suffixed version ("foo" matching "foo.isra.0") while
> > ensuring that the function signature does not contain optimized-out
> > parameters.  Note that if the function is found ("foo") it will
> > be preferred, only falling back to "foo.isra.0" if lookup of the
> > function fails.  Addition to BTF is skipped if the function has
> > optimized-out parameters, since the expected function signature
> > will not match. BTF encoding does not include the "."-suffix to
> > be consistent with DWARF. In addition, the kernel currently does
> > not allow a "." suffix in a BTF function name.
> > 
> > A problem with this approach however is that BTF carries out the
> > encoding process in parallel across multiple CUs, and sometimes
> > a function has optimized-out parameters in one CU but not others;
> > we see this for NF_HOOK.constprop.0 for example.  So in order to
> > determine if the function has optimized-out parameters in any
> > CU, its addition is not carried out until we have processed all
> > CUs and are about to merge BTF.  At this point we know if any
> > such optimizations have occurred.  Patches 1-5 handle the
> > optimized-out parameter identification and matching "."-suffixed
> > functions with the original function to facilitate BTF
> > encoding.  This feature can be enabled via the
> > "--btf_gen_optimized" option.
> > 
> > Patch 6 addresses a related problem - it is entirely possible
> > for a static function of the same name to exist in different
> > CUs with different function signatures.  Because BTF does not
> > currently encode any information that would help disambiguate
> > which BTF function specification matches which static function
> > (in the case of multiple different function signatures), it is
> > best to eliminate such functions from BTF for now.  The same
> > mechanism that is used to compare static "."-suffixed functions
> > is re-used for the static function comparison.  A superficial
> > comparison of number of parameters/parameter names is done to
> > see if such representations are consistent, and if inconsistent
> > prototypes are observed, the function is flagged for exclusion
> > from BTF.
> > 
> > When these methods are combined - the additive encoding of
> > "."-suffixed functions and the subtractive elimination of
> > functions with inconsistent parameters - we see an overall
> > drop in the number of functions in vmlinux BTF, from
> > 51529 to 50246.  Skipping inconsistent functions is enabled
> > via "--skip_encoding_btf_inconsistent_proto".
> > 
> > Changes since v2 [2]
> > - Arnaldo incorporated some of the suggestions in the v2 thread;
> >   these patches are based on those; the relevant changes are
> >   noted as committer changes.
> > - Patch 1 is unchanged from v2, but the rest of the patches
> >   have been updated:
> > - Patch 2 separates out the changes to the struct btf_encoder
> >   that better support later addition of functions.
> > - Patch 3 then is changed insofar as these changes are no
> >   longer needed for the function addition refactoring.
> > - Patch 4 has a small change; we need to verify that an
> >   encoder has actually been added to the encoders list
> >   prior to removal
> > - Patch 5 changed significantly; when attempting to measure
> >   performance the relatively good numbers attained when using
> >   delayed function addition were not reproducible.
> >   Further analysis revealed that the large number of lookups
> >   caused by the presence of the separate function tree was
> >   a major cause of performance degradation in the multi
> >   threaded case.  So instead of maintaining a separate tree,
> >   we use the ELF function list which we already need to look
> >   up to match ELF -> DWARF function descriptions to store
> >   the function representation.  This has 2 benefits; firstly
> >   as mentioned, we already look up the ELF function so no
> >   additional lookup is required to save the function.
> >   Secondly, the ELF representation is identical for each
> >   encoder, so we can index the same function across multiple
> >   encoder function arrays - this greatly speeds up the
> >   processing of comparing function representations across
> >   encoders.  There is still a performance cost in this
> >   approach however; more details are provided in patch 6.
> >   An option specific to adding functions with "." suffixes
> >   is added "--btf_gen_optimized"
> > - Patch 6 builds on patch 5 in applying the save/merge/add
> >   approach for all functions using the same mechanisms.
> >   In addition the "--skip_encoding_btf_inconsistent_proto"
> >   option is introduced.
> > - Patches 7/8 document the new options in the pahole manual
> >   page.
> >   
> > Changes since v1 [3]
> > 
> > - Eduard noted that a DW_AT_const_value attribute can signal
> >   an optimized-out parameter, and that the lack of a location
> >   attribute signals optimization; ensure we handle those cases
> >   also (Eduard, patch 1).
> > - Jiri noted we can have inconsistencies between a static
> >   and non-static function; apply the comparison process to
> >   all functions (Jiri, patch 5)
> > - segmentation fault was observed when handling functions with
> >   > 10 parameters; needed parameter comparison loop to exit
> >   at BTF_ENCODER_MAX_PARAMETERS (patch 5)
> > - Kui-Feng Lee pointed out that having a global shared function
> >   tree would lead to a lot of contention; here a per-encoder 
> >   tree is used, and once the threads are collected the trees
> >   are merged. Performance numbers are provided in patch 5 
> >   (Kui-Feng Lee, patches 4/5)
> > 
> > [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> > [2] https://lore.kernel.org/bpf/1675088985-20300-1-git-send-email-alan.maguire@oracle.com/
> > [3] https://lore.kernel.org/bpf/1674567931-26458-1-git-send-email-alan.maguire@oracle.com/
> > 
> > Alan Maguire (8):
> >   dwarf_loader: Help spotting functions with optimized-out parameters
> >   btf_encoder: store type_id_off, unspecified type in encoder
> >   btf_encoder: Refactor function addition into dedicated
> >     btf_encoder__add_func
> >   btf_encoder: Rework btf_encoders__*() API to allow traversal of
> >     encoders
> >   btf_encoder: Represent "."-suffixed functions (".isra.0") in BTF
> >   btf_encoder: support delaying function addition to check for function
> >     prototype inconsistencies
> >   dwarves: document --btf_gen_optimized option
> >   dwarves: document --skip_encoding_btf_inconsistent_proto option
> > 
> >  btf_encoder.c      | 360 +++++++++++++++++++++++++++++++++++++--------
> >  btf_encoder.h      |   6 -
> >  dwarf_loader.c     | 130 +++++++++++++++-
> >  dwarves.h          |  11 +-
> >  man-pages/pahole.1 |  10 ++
> >  pahole.c           |  30 +++-
> >  6 files changed, 468 insertions(+), 79 deletions(-)
> > 
> > -- 
> > 2.31.1
> > 
> 
> -- 
> 
> - Arnaldo
Jiri Olsa Feb. 9, 2023, 9:36 a.m. UTC | #5
On Wed, Feb 08, 2023 at 01:50:27PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Feb 08, 2023 at 01:20:39PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire escreveu:
> > > At optimization level -O2 or higher in gcc, static functions may be
> > > optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> > > These represent 
> > >     
> > > - constant propagation (.constprop.0);
> > > - interprocedural scalar replacement of aggregates, removal of
> > >   unused parameters and replacement of parameters passed by
> > >   reference by parameters passed by value (.isra.0)
> > 
> > Initial test, without using the new options:
> > 
> > [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
> >       3 start_show
> >       3 timeout_show
> >       3 uuid_show
> >       4 m_next
> >       4 parse_options
> >       4 sk_diag_fill
> >       4 state_show
> >       4 state_store
> >       5 status_show
> >       6 type_show
> > [acme@pumpkin ~]$
> > 
> > Now I'll use --skip_encoding_btf_inconsistent_proto and --btf_gen_optimized
> 
> With:
> 
> ⬢[acme@toolbox linux]$ git diff
> diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
> index 1f1f1d397c399afe..9dd185fb1ff1fc3b 100755
> --- a/scripts/pahole-flags.sh
> +++ b/scripts/pahole-flags.sh
> @@ -21,7 +21,7 @@ if [ "${pahole_ver}" -ge "122" ]; then
>  fi
>  if [ "${pahole_ver}" -ge "124" ]; then
>         # see PAHOLE_HAS_LANG_EXCLUDE
> -       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust"
> +       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust --btf_gen_optimized --skip_encoding_btf_inconsistent_proto"
>  fi
> 
>  echo ${extra_paholeopt}
> ⬢[acme@toolbox linux]$
> 
> I get:
> 
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
>       1 zswap_writeback_entry
>       1 zswap_zpool_param_set
>       1 zs_zpool_create
>       1 zs_zpool_destroy
>       1 zs_zpool_free
>       1 zs_zpool_malloc
>       1 zs_zpool_map
>       1 zs_zpool_shrink
>       1 zs_zpool_total_size
>       1 zs_zpool_unmap
> [acme@pumpkin ~]$
> 
> No functions with more than one entry:
> 
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep -v ' 1 '
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep ' 1 ' | wc -l
> 54558
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | wc -l
> 54558
> [acme@pumpkin ~]$
> 
> So I'll bump the release as we did in the past when testing features
> that we need to test against a release on the pahole-flags.sh script so
> that we can do further tests.

I did similar test and ran bpf selftests built with new pahole,
all looks good

Acked/Tested-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka
Arnaldo Carvalho de Melo Feb. 9, 2023, 12:22 p.m. UTC | #6
Em Thu, Feb 09, 2023 at 10:36:51AM +0100, Jiri Olsa escreveu:
> On Wed, Feb 08, 2023 at 01:50:27PM -0300, Arnaldo Carvalho de Melo wrote:
> > No functions with more than one entry:

> > [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep -v ' 1 '
> > [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep ' 1 ' | wc -l
> > 54558
> > [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | wc -l
> > 54558
> > [acme@pumpkin ~]$

> > So I'll bump the release as we did in the past when testing features
> > that we need to test against a release on the pahole-flags.sh script so
> > that we can do further tests.

> I did similar test and ran bpf selftests built with new pahole,
> all looks good

> Acked/Tested-by: Jiri Olsa <jolsa@kernel.org>

Thanks, added to the patches in this series.

- Arnaldo
Arnaldo Carvalho de Melo Feb. 9, 2023, 1:09 p.m. UTC | #7
Em Wed, Feb 08, 2023 at 05:17:02PM +0000, Alan Maguire escreveu:
> From: Alan Maguire <alan.maguire@oracle.com>
> Date: Thu, 2 Feb 2023 12:26:20 +0000
> Subject: [PATCH bpf-next] bpf: add --skip_encoding_btf_inconsistent_proto,
>  --btf_gen_optimized to pahole flags for v1.25
> 
> v1.25 of pahole supports filtering out functions with multiple
> inconsistent function prototypes or optimized-out parameters
> from the BTF representation.  These present problems because
> there is no additional info in BTF saying which inconsistent
> prototype matches which function instance to help guide
> attachment, and functions with optimized-out parameters can
> lead to incorrect assumptions about register contents.
> 
> So for now, filter out such functions while adding BTF
> representations for functions that have "."-suffixes
> (foo.isra.0) but not optimized-out parameters.
> 
> This patch assumes changes in [1] land and pahole is bumped
> to v1.25.
> 
> [1] https://lore.kernel.org/bpf/1675790102-23037-1-git-send-email-alan.maguire@oracle.com/
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  scripts/pahole-flags.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
> index 1f1f1d3..728d551 100755
> --- a/scripts/pahole-flags.sh
> +++ b/scripts/pahole-flags.sh
> @@ -23,5 +23,8 @@ if [ "${pahole_ver}" -ge "124" ]; then
>  	# see PAHOLE_HAS_LANG_EXCLUDE
>  	extra_paholeopt="${extra_paholeopt} --lang_exclude=rust"
>  fi
> +if [ "${pahole_ver}" -ge "125" ]; then
> +	extra_paholeopt="${extra_paholeopt} --skip_encoding_btf_inconsistent_proto --btf_gen_optimized"
> +fi
>  
>  echo ${extra_paholeopt}
> -- 
> 1.8.3.1


I applied the patch and it works as advertised using what is in pahole's
'next' branch, that should go to 'master' later today.

Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo