
[RFC] kbuild: bpf: Do not run pahole with -j on 32bit userspace

Message ID 20240820085950.200358-1-jirislaby@kernel.org (mailing list archive)
State New
Series [RFC] kbuild: bpf: Do not run pahole with -j on 32bit userspace

Commit Message

Jiri Slaby Aug. 20, 2024, 8:59 a.m. UTC
From: Jiri Slaby <jslaby@suse.cz>

== WARNING ==
This is only a PoC. There are deficiencies: for example, CROSS_COMPILE and
LLVM builds are completely unhandled.

The simple version would be to just do there:
  ifeq ($(CONFIG_64BIT),y)
but it has its own deficiencies, of course.

So any ideas, inputs?
== WARNING ==

When pahole is run with -j on 32bit userspace (32bit pahole in
particular), it randomly fails with OOM:
> btf_encoder__tag_kfuncs: Failed to get ELF section(62) data: out of memory.
> btf_encoder__encode: failed to tag kfuncs!

or simply SIGSEGV (failed to allocate the btf encoder).

It depends heavily on how many threads are created.

So do not invoke pahole with -j on 32bit.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: b4f72786429c ("scripts/pahole-flags.sh: Parse DWARF and generate BTF with multithreading.")
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229450
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: shung-hsi.yu@suse.com
Cc: msuchanek@suse.com
---
 init/Kconfig            |  4 ++++
 scripts/Makefile.btf    |  2 ++
 scripts/pahole-class.sh | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+)
 create mode 100644 scripts/pahole-class.sh

Comments

Jiri Slaby Aug. 20, 2024, 9:08 a.m. UTC | #1
On 20. 08. 24, 10:59, Jiri Slaby (SUSE) wrote:
> From: Jiri Slaby <jslaby@suse.cz>
> 
> == WARNING ==
> This is only a PoC. There are deficiencies like CROSS_COMPILE or LLVM
> are completely unhandled.
> 
> The simple version is just do there:
>    ifeq ($(CONFIG_64BIT,y)
> but it has its own deficiencies, of course.
> 
> So any ideas, inputs?

Also as Shung-Hsi Yu suggests, we can cap -j to 1 in pahole proper when 
sizeof(long) == 4.

> == WARNING ==
> 
> When pahole is run with -j on 32bit userspace (32bit pahole in
> particular), it randomly fails with OOM:
>> btf_encoder__tag_kfuncs: Failed to get ELF section(62) data: out of memory.
>> btf_encoder__encode: failed to tag kfuncs!
> 
> or simply SIGSEGV (failed to allocate the btf encoder).
> 
> It very depends on how many threads are created.

I forgot to add that it depends on the kernel version too. It happens 
much more often with 6.11-rc now (vmlinux got big enough, apparently).

thanks,
Jiri Olsa Aug. 20, 2024, 2:33 p.m. UTC | #2
On Tue, Aug 20, 2024 at 10:59:50AM +0200, Jiri Slaby (SUSE) wrote:
> From: Jiri Slaby <jslaby@suse.cz>
> 
> == WARNING ==
> This is only a PoC. There are deficiencies like CROSS_COMPILE or LLVM
> are completely unhandled.
> 
> The simple version is just do there:
>   ifeq ($(CONFIG_64BIT,y)
> but it has its own deficiencies, of course.
> 
> So any ideas, inputs?
> == WARNING ==
> 
> When pahole is run with -j on 32bit userspace (32bit pahole in
> particular), it randomly fails with OOM:
> > btf_encoder__tag_kfuncs: Failed to get ELF section(62) data: out of memory.
> > btf_encoder__encode: failed to tag kfuncs!
> 
> or simply SIGSEGV (failed to allocate the btf encoder).
> 
> It very depends on how many threads are created.
> 
> So do not invoke pahole with -j on 32bit.

could you share more details about your setup?

does it need to run on pure 32bit to reproduce? I can't reproduce when
doing cross build and running 32 bit pahole on x86_64.. I do see some
errors though

  [667939] STRUCT bpf_prog_aux Error emitting BTF type
  Encountered error while encoding BTF.

thanks,
jirka

> 
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> Fixes: b4f72786429c ("scripts/pahole-flags.sh: Parse DWARF and generate BTF with multithreading.")
> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229450
> Cc: Masahiro Yamada <masahiroy@kernel.org>
> Cc: Nathan Chancellor <nathan@kernel.org>
> Cc: Nicolas Schier <nicolas@fjasle.eu>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Eduard Zingerman <eddyz87@gmail.com>
> Cc: Song Liu <song@kernel.org>
> Cc: Yonghong Song <yonghong.song@linux.dev>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Stanislav Fomichev <sdf@fomichev.me>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-kbuild@vger.kernel.org
> Cc: bpf@vger.kernel.org
> Cc: shung-hsi.yu@suse.com
> Cc: msuchanek@suse.com
> ---
>  init/Kconfig            |  4 ++++
>  scripts/Makefile.btf    |  2 ++
>  scripts/pahole-class.sh | 21 +++++++++++++++++++++
>  3 files changed, 27 insertions(+)
>  create mode 100644 scripts/pahole-class.sh
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index f36ca8a0e209..f5e80497eef0 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -113,6 +113,10 @@ config PAHOLE_VERSION
>  	int
>  	default $(shell,$(srctree)/scripts/pahole-version.sh $(PAHOLE))
>  
> +config PAHOLE_CLASS
> +	string
> +	default $(shell,$(srctree)/scripts/pahole-class.sh $(PAHOLE))
> +
>  config CONSTRUCTORS
>  	bool
>  
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index b75f09f3f424..f7de8e922bce 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -12,7 +12,9 @@ endif
>  
>  pahole-flags-$(call test-ge, $(pahole-ver), 121)	+= --btf_gen_floats
>  
> +ifeq ($(CONFIG_PAHOLE_CLASS),ELF64)
>  pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
> +endif
>  
>  pahole-flags-$(call test-ge, $(pahole-ver), 125)	+= --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
>  
> diff --git a/scripts/pahole-class.sh b/scripts/pahole-class.sh
> new file mode 100644
> index 000000000000..d15a92077f76
> --- /dev/null
> +++ b/scripts/pahole-class.sh
> @@ -0,0 +1,21 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Usage: $ ./pahole-class.sh pahole
> +#
> +# Prints pahole's ELF class, such as ELF64
> +
> +if [ ! -x "$(command -v "$@")" ]; then
> +	echo 0
> +	exit 1
> +fi
> +
> +PAHOLE="$(which "$@")"
> +CLASS="$(readelf -h "$PAHOLE" 2>/dev/null | sed -n 's/.*Class: *// p')"
> +
> +# Scripts like scripts/dummy-tools/pahole
> +if [ -n "$CLASS" ]; then
> +	echo "$CLASS"
> +else
> +	echo ELF64
> +fi
> -- 
> 2.46.0
>
Jiri Slaby Aug. 21, 2024, 5:32 a.m. UTC | #3
On 20. 08. 24, 16:33, Jiri Olsa wrote:
> On Tue, Aug 20, 2024 at 10:59:50AM +0200, Jiri Slaby (SUSE) wrote:
>> From: Jiri Slaby <jslaby@suse.cz>
>>
>> == WARNING ==
>> This is only a PoC. There are deficiencies like CROSS_COMPILE or LLVM
>> are completely unhandled.
>>
>> The simple version is just do there:
>>    ifeq ($(CONFIG_64BIT,y)
>> but it has its own deficiencies, of course.
>>
>> So any ideas, inputs?
>> == WARNING ==
>>
>> When pahole is run with -j on 32bit userspace (32bit pahole in
>> particular), it randomly fails with OOM:
>>> btf_encoder__tag_kfuncs: Failed to get ELF section(62) data: out of memory.
>>> btf_encoder__encode: failed to tag kfuncs!
>>
>> or simply SIGSEGV (failed to allocate the btf encoder).
>>
>> It very depends on how many threads are created.
>>
>> So do not invoke pahole with -j on 32bit.
> 
> could you share more details about your setup?
> 
> does it need to run on pure 32bit to reproduce?

armv7l builds are 32bit only.

> I can't reproduce when
> doing cross build and running 32 bit pahole on x86_64..

i586 is built using a 64bit kernel. It is enough to have a 32bit userspace.
As written in the linked bug:
https://bugzilla.suse.com/show_bug.cgi?id=1229450#c6

FWIW, steps to reproduce locally:
docker pull jirislaby/pahole_crash
docker run -it jirislaby/pahole_crash

The VM space of pahole is exhausted:
process map: https://bugzilla.suse.com/attachment.cgi?id=876821
strace of mmaps: https://bugzilla.suse.com/attachment.cgi?id=876822

You need to run with a large enough -j on a fast machine. Note that this 
happens on build hosts even with -j4, but they are under heavy load, so 
the amount of memory held in parallel is high.

On my box with 16 cores, it is (likely far) enough to run with -j32.

> I do see some
> errors though
> 
>    [667939] STRUCT bpf_prog_aux Error emitting BTF type
>    Encountered error while encoding BTF.

It's possible that it is one of the errors; there are different ones. As 
I wrote above, sometimes it is a crash, sometimes it is the failure I 
mentioned. But it always ends up with a failed build:
 > libbpf: failed to find '.BTF' ELF section in vmlinux
 > FAILED: load BTF from vmlinux: No data available
 > make[2]: *** [../scripts/Makefile.vmlinux:34: vmlinux] Error 255
 > make[2]: *** Deleting file 'vmlinux'
 > make[1]: *** 
[/home/abuild/rpmbuild/BUILD/kernel-vanilla-6.11~rc3.338.gc3f2d783a459/linux-6.11-rc3-338-gc3f2d783a459/Makefile:1158: 
vmlinux] Error 2
 > make: *** [../Makefile:224: __sub-make] Error 2
 > error: Bad exit status from /var/tmp/rpm-tmp.olf5Nu (%build)

thanks,
Jiri Slaby Aug. 21, 2024, 6:40 a.m. UTC | #4
On 21. 08. 24, 7:32, Jiri Slaby wrote:
> On 20. 08. 24, 16:33, Jiri Olsa wrote:
>> On Tue, Aug 20, 2024 at 10:59:50AM +0200, Jiri Slaby (SUSE) wrote:
>>> From: Jiri Slaby <jslaby@suse.cz>
>>>
>>> == WARNING ==
>>> This is only a PoC. There are deficiencies like CROSS_COMPILE or LLVM
>>> are completely unhandled.
>>>
>>> The simple version is just do there:
>>>    ifeq ($(CONFIG_64BIT,y)
>>> but it has its own deficiencies, of course.
>>>
>>> So any ideas, inputs?
>>> == WARNING ==
>>>
>>> When pahole is run with -j on 32bit userspace (32bit pahole in
>>> particular), it randomly fails with OOM:
>>>> btf_encoder__tag_kfuncs: Failed to get ELF section(62) data: out of 
>>>> memory.
>>>> btf_encoder__encode: failed to tag kfuncs!
>>>
>>> or simply SIGSEGV (failed to allocate the btf encoder).
>>>
>>> It very depends on how many threads are created.
>>>
>>> So do not invoke pahole with -j on 32bit.
>>
>> could you share more details about your setup?
>>
>> does it need to run on pure 32bit to reproduce?
> 
> armv7l builds are 32bit only.
> 
>> I can't reproduce when
>> doing cross build and running 32 bit pahole on x86_64..
> 
> i586 is built using 64bit kernel. It is enough to have 32bit userspace.
> As written in the linked bug:
> https://bugzilla.suse.com/show_bug.cgi?id=1229450#c6
> 
> FWIW, steps to reproduce locally:
> docker pull jirislaby/pahole_crash
> docker run -it jirislaby/pahole_crash
> 
> The VM space of pahole is exhausted:
> process map: https://bugzilla.suse.com/attachment.cgi?id=876821
> strace of mmaps: https://bugzilla.suse.com/attachment.cgi?id=876822
> 
> You need to run with large enough -j on a fast machine. Note that this 
> happens on build hosts even with -j4, but they are under heavy load, so 
> parallelism of held memory is high.

 From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c20:
Run on 64bit:
pahole -j32 -> 4.102 GB
pahole -j16 -> 3.895 GB
pahole -j1 -> 3.706 GB

On 32bit (the same vmlinux):
pahole -j32 -> 2.870 GB (crash)
pahole -j16 -> 2.810 GB
pahole -j1 -> 2.444 GB

Look there for full massif report.

So now I think we should disable BTF generation with 32bit pahole 
completely. Or someone debugs it and improves debug-info loading so it 
does not eat that much.

thanks,
Jiri Slaby Aug. 21, 2024, 7:29 a.m. UTC | #5
On 21. 08. 24, 8:40, Jiri Slaby wrote:
>  From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c20:
> Run on 64bit:
> pahole -j32 -> 4.102 GB
> pahole -j16 -> 3.895 GB
> pahole -j1 -> 3.706 GB
> 
> On 32bit (the same vmlinux):
> pahole -j32 -> 2.870 GB (crash)
> pahole -j16 -> 2.810 GB
> pahole -j1 -> 2.444 GB
> 
> Look there for full massif report.

 From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c21:
(In reply to Jiri Slaby from comment #20)
 > | |   |   ->24.01% (954,816,480B) 0x489B4AB: UnknownInlinedFun 
(dwarf_loader.c:959)

So given this struct class_member is the largest consumer, running 
pahole on pahole. The below results in 4.102 GB -> 3.585 GB savings.

--- a/dwarves.h
+++ b/dwarves.h
@@ -487,14 +487,14 @@ int cu__for_all_tags(struct cu *cu,
   */
  struct tag {
         struct list_head node;
+       const char       *attribute;
+       void             *priv;
         type_id_t        type;
         uint16_t         tag;
+       uint16_t         recursivity_level;
         bool             visited;
         bool             top_level;
         bool             has_btf_type_tag;
-       uint16_t         recursivity_level;
-       const char       *attribute;
-       void             *priv;
  };

  // To use with things like type->type_enum == 
perf_event_type+perf_user_event_type
@@ -1086,17 +1086,17 @@ static inline int function__inlined(const struct 
function *func)
  struct class_member {
         struct tag       tag;
         const char       *name;
+       uint64_t         const_value;
         uint32_t         bit_offset;
         uint32_t         bit_size;
         uint32_t         byte_offset;
         int              hole;
         size_t           byte_size;
+       uint32_t         alignment;
         int8_t           bitfield_offset;
         uint8_t          bitfield_size;
         uint8_t          bit_hole;
         uint8_t          bitfield_end:1;
-       uint64_t         const_value;
-       uint32_t         alignment;
         uint8_t          visited:1;
         uint8_t          is_static:1;
         uint8_t          has_bit_offset:1;
Shung-Hsi Yu Aug. 22, 2024, 3:55 a.m. UTC | #6
(Add pahole maintainer and mailing list)

Hi Arnaldo,

We're running into kernel build failure on 32-bit (both full 32-bit and
32-bit userspace on 64-bit kernel) because pahole crashed due to virtual
memory exhaustion[1]. As a workaround we currently limit pahole's
parallel job count to 1 on such system[2]:

On Tue, 20 Aug 2024 10:59:50AM +0200, Jiri Slaby wrote:
[...]
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index b75f09f3f424..f7de8e922bce 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -12,7 +12,9 @@ endif
>  
>  pahole-flags-$(call test-ge, $(pahole-ver), 121)	+= --btf_gen_floats
>  
> +ifeq ($(CONFIG_PAHOLE_CLASS),ELF64)
>  pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
> +endif
>  
>  pahole-flags-$(call test-ge, $(pahole-ver), 125)	+= --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
>  
> diff --git a/scripts/pahole-class.sh b/scripts/pahole-class.sh
> new file mode 100644
> index 000000000000..d15a92077f76
> --- /dev/null
> +++ b/scripts/pahole-class.sh
> @@ -0,0 +1,21 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Usage: $ ./pahole-class.sh pahole
> +#
> +# Prints pahole's ELF class, such as ELF64
> +
> +if [ ! -x "$(command -v "$@")" ]; then
> +	echo 0
> +	exit 1
> +fi
> +
> +PAHOLE="$(which "$@")"
> +CLASS="$(readelf -h "$PAHOLE" 2>/dev/null | sed -n 's/.*Class: *// p')"
> +
> +# Scripts like scripts/dummy-tools/pahole
> +if [ -n "$CLASS" ]; then
> +	echo "$CLASS"
> +else
> +	echo ELF64
> +fi
> -- 

This helped lower the memory usage enough that pahole no longer crashes:

On Wed, Aug 21, 2024 at 09:29:57AM GMT, Jiri Slaby wrote:
> On 21. 08. 24, 8:40, Jiri Slaby wrote:
> >  From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c20:
> > Run on 64bit:
> > pahole -j32 -> 4.102 GB
> > pahole -j16 -> 3.895 GB
> > pahole -j1 -> 3.706 GB
> > 
> > On 32bit (the same vmlinux):
> > pahole -j32 -> 2.870 GB (crash)
> > pahole -j16 -> 2.810 GB
> > pahole -j1 -> 2.444 GB

Jiri (Slaby) in the meanwhile has also proposed structure packing to
further reduce memory usage. (Note: I think the numbers below are from a
64-bit machine)

> From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c21:
> (In reply to Jiri Slaby from comment #20)
> > | |   |   ->24.01% (954,816,480B) 0x489B4AB: UnknownInlinedFun
> (dwarf_loader.c:959)
> 
> So given this struct class_member is the largest consumer, running pahole on
> pahole. The below results in 4.102 GB -> 3.585 GB savings.
> 
> --- a/dwarves.h
> +++ b/dwarves.h
> @@ -487,14 +487,14 @@ int cu__for_all_tags(struct cu *cu,
>   */
>  struct tag {
>         struct list_head node;
> +       const char       *attribute;
> +       void             *priv;
>         type_id_t        type;
>         uint16_t         tag;
> +       uint16_t         recursivity_level;
>         bool             visited;
>         bool             top_level;
>         bool             has_btf_type_tag;
> -       uint16_t         recursivity_level;
> -       const char       *attribute;
> -       void             *priv;
>  };
> 
>  // To use with things like type->type_enum ==
> perf_event_type+perf_user_event_type
> @@ -1086,17 +1086,17 @@ static inline int function__inlined(const struct
> function *func)
>  struct class_member {
>         struct tag       tag;
>         const char       *name;
> +       uint64_t         const_value;
>         uint32_t         bit_offset;
>         uint32_t         bit_size;
>         uint32_t         byte_offset;
>         int              hole;
>         size_t           byte_size;
> +       uint32_t         alignment;
>         int8_t           bitfield_offset;
>         uint8_t          bitfield_size;
>         uint8_t          bit_hole;
>         uint8_t          bitfield_end:1;
> -       uint64_t         const_value;
> -       uint32_t         alignment;
>         uint8_t          visited:1;
>         uint8_t          is_static:1;
>         uint8_t          has_bit_offset:1;
>--

What do you think?

IIUC pahole's memory usage is largely tied to the number of entries in
vmlinux/kmodule DWARF, and there probably isn't much we could do about
that.

Shung-Hsi

1: https://bugzilla.suse.com/show_bug.cgi?id=1229450
2: https://lore.kernel.org/all/20240820085950.200358-1-jirislaby@kernel.org/
Arnaldo Carvalho de Melo Aug. 22, 2024, 3:24 p.m. UTC | #7
On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> (Add pahole maintainer and mailing list)
> 
> Hi Arnaldo,
> 
> We're running into kernel build failure on 32-bit (both full 32-bit and
> 32-bit userspace on 64-bit kernel) because pahole crashed due to virtual
> memory exhaustion[1]. As a workaround we currently limit pahole's
> parallel job count to 1 on such system[2]:
> 
> On Tue, 20 Aug 2024 10:59:50AM +0200, Jiri Slaby wrote:
> [...]
> > diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> > index b75f09f3f424..f7de8e922bce 100644
> > --- a/scripts/Makefile.btf
> > +++ b/scripts/Makefile.btf
> > @@ -12,7 +12,9 @@ endif
> >  
> >  pahole-flags-$(call test-ge, $(pahole-ver), 121)	+= --btf_gen_floats
> >  
> > +ifeq ($(CONFIG_PAHOLE_CLASS),ELF64)
> >  pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
> > +endif
> >  
> >  pahole-flags-$(call test-ge, $(pahole-ver), 125)	+= --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
> >  
> > diff --git a/scripts/pahole-class.sh b/scripts/pahole-class.sh
> > new file mode 100644
> > index 000000000000..d15a92077f76
> > --- /dev/null
> > +++ b/scripts/pahole-class.sh
> > @@ -0,0 +1,21 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Usage: $ ./pahole-class.sh pahole
> > +#
> > +# Prints pahole's ELF class, such as ELF64
> > +
> > +if [ ! -x "$(command -v "$@")" ]; then
> > +	echo 0
> > +	exit 1
> > +fi
> > +
> > +PAHOLE="$(which "$@")"
> > +CLASS="$(readelf -h "$PAHOLE" 2>/dev/null | sed -n 's/.*Class: *// p')"
> > +
> > +# Scripts like scripts/dummy-tools/pahole
> > +if [ -n "$CLASS" ]; then
> > +	echo "$CLASS"
> > +else
> > +	echo ELF64
> > +fi
> > -- 
> 
> This helped lowered the memory usage enough so pahole no longer crash:
> 
> On Wed, Aug 21, 2024 at 09:29:57AM GMT, Jiri Slaby wrote:
> > On 21. 08. 24, 8:40, Jiri Slaby wrote:
> > >  From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c20:
> > > Run on 64bit:
> > > pahole -j32 -> 4.102 GB
> > > pahole -j16 -> 3.895 GB
> > > pahole -j1 -> 3.706 GB
> > > 
> > > On 32bit (the same vmlinux):
> > > pahole -j32 -> 2.870 GB (crash)
> > > pahole -j16 -> 2.810 GB
> > > pahole -j1 -> 2.444 GB
> 
> Jiri (Slaby) in the meanwhile has also proposed structure packing to
> further reduce memory usage. (Note: I think the numbers below are from a
> 64-bit machine)

That is interesting, packing pahole data structures ;-) :-)

Also a coincidence is that I did some packing on what is in the next
branch:

3ef508ad94012933 dwarf_loader: Make 'struct dwarf_tag' more compact by getting rid of 'struct dwarf_off_ref
70febc8858588348 core: Reorganize 'struct class_member' to save 8 bytes
76bcb88a67556468 core: Make tag->recursivity_level a uint8_t
b8b9e04d177d8eb7 core: Make tag->top_level a single bit flag
539acefcdd5b0f71 core: Make tag->has_btf_type_tag a single bit flag
dba2c2c1aa5dfa05 core: Make tag->visited a single bit flag
7409cfadcae0253b core: Shrink 'struct namespace' a bit by using a hole in its embedded 'tag'

Also I did more work to reduce the number of allocations:

cbecc3785266f0c5 dwarf_loader: Do just one alloc for 'struct dwarf_tag + struct tag'

With it we get:

⬢[acme@toolbox pahole]$ pahole -C class_member build/libdwarves.so.1.0.0 
struct class_member {
	struct tag                 tag;                  /*     0    32 */

	/* XXX last struct has 1 bit hole */

	const char  *              name;                 /*    32     8 */
	uint32_t                   bit_offset;           /*    40     4 */
	uint32_t                   bit_size;             /*    44     4 */
	uint32_t                   byte_offset;          /*    48     4 */
	int                        hole;                 /*    52     4 */
	size_t                     byte_size;            /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	int8_t                     bitfield_offset;      /*    64     1 */
	uint8_t                    bitfield_size;        /*    65     1 */
	uint8_t                    bit_hole;             /*    66     1 */
	uint8_t                    bitfield_end:1;       /*    67: 0  1 */
	uint8_t                    visited:1;            /*    67: 1  1 */
	uint8_t                    is_static:1;          /*    67: 2  1 */
	uint8_t                    has_bit_offset:1;     /*    67: 3  1 */
	uint8_t                    accessibility:2;      /*    67: 4  1 */
	uint8_t                    virtuality:2;         /*    67: 6  1 */
	uint32_t                   alignment;            /*    68     4 */
	uint64_t                   const_value;          /*    72     8 */

	/* size: 80, cachelines: 2, members: 18 */
	/* member types with bit holes: 1, total: 1 */
	/* last cacheline: 16 bytes */
};

⬢[acme@toolbox pahole]$

And also the dwarf_tag, that is allocated for each tag coming from DWARF
got smaller:

⬢[acme@toolbox pahole]$ pahole -C dwarf_tag build/libdwarves.so.1.0.0 
struct dwarf_tag {
	struct hlist_node          hash_node;            /*     0    16 */
	Dwarf_Off                  type;                 /*    16     8 */
	Dwarf_Off                  id;                   /*    24     8 */
	union {
		Dwarf_Off          abstract_origin;      /*    32     8 */
		Dwarf_Off          containing_type;      /*    32     8 */
	};                                               /*    32     8 */
	Dwarf_Off                  specification;        /*    40     8 */
	struct {
		_Bool              type:1;               /*    48: 0  1 */
		_Bool              abstract_origin:1;    /*    48: 1  1 */
		_Bool              containing_type:1;    /*    48: 2  1 */
		_Bool              specification:1;      /*    48: 3  1 */
	} from_types_section;                            /*    48     1 */

	/* XXX last struct has 4 bits of padding */
	/* XXX 1 byte hole, try to pack */

	uint16_t                   decl_line;            /*    50     2 */
	uint32_t                   small_id;             /*    52     4 */
	const char  *              decl_file;            /*    56     8 */

	/* size: 64, cachelines: 1, members: 9 */
	/* sum members: 63, holes: 1, sum holes: 1 */
	/* member types with bit paddings: 1, total: 4 bits */
};

⬢[acme@toolbox pahole]$

I stumbled on this limitation as well when trying to build the kernel on
a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
created a swapfile and it managed to proceed, a bit slowly, but worked
as well.

Please let me know if what is in the 'next' branch of:

https://git.kernel.org/pub/scm/devel/pahole/pahole.git

Works for you, that will be extra motivation to move it to the master
branch and cut 1.28.

Thanks!

- Arnaldo
 
> > From https://bugzilla.suse.com/show_bug.cgi?id=1229450#c21:
> > (In reply to Jiri Slaby from comment #20)
> > > | |   |   ->24.01% (954,816,480B) 0x489B4AB: UnknownInlinedFun
> > (dwarf_loader.c:959)
> > 
> > So given this struct class_member is the largest consumer, running pahole on
> > pahole. The below results in 4.102 GB -> 3.585 GB savings.
> > 
> > --- a/dwarves.h
> > +++ b/dwarves.h
> > @@ -487,14 +487,14 @@ int cu__for_all_tags(struct cu *cu,
> >   */
> >  struct tag {
> >         struct list_head node;
> > +       const char       *attribute;
> > +       void             *priv;
> >         type_id_t        type;
> >         uint16_t         tag;
> > +       uint16_t         recursivity_level;
> >         bool             visited;
> >         bool             top_level;
> >         bool             has_btf_type_tag;
> > -       uint16_t         recursivity_level;
> > -       const char       *attribute;
> > -       void             *priv;
> >  };
> > 
> >  // To use with things like type->type_enum ==
> > perf_event_type+perf_user_event_type
> > @@ -1086,17 +1086,17 @@ static inline int function__inlined(const struct
> > function *func)
> >  struct class_member {
> >         struct tag       tag;
> >         const char       *name;
> > +       uint64_t         const_value;
> >         uint32_t         bit_offset;
> >         uint32_t         bit_size;
> >         uint32_t         byte_offset;
> >         int              hole;
> >         size_t           byte_size;
> > +       uint32_t         alignment;
> >         int8_t           bitfield_offset;
> >         uint8_t          bitfield_size;
> >         uint8_t          bit_hole;
> >         uint8_t          bitfield_end:1;
> > -       uint64_t         const_value;
> > -       uint32_t         alignment;
> >         uint8_t          visited:1;
> >         uint8_t          is_static:1;
> >         uint8_t          has_bit_offset:1;
> >--
> 
> What do you think?
> 
> IIUC pahole's memory usage is largely tied to the number of entries in
> vmlinux/kmodule DWARF, and there probably isn't much we could do about
> that.
> 
> Shung-Hsi
> 
> 1: https://bugzilla.suse.com/show_bug.cgi?id=1229450
> 2: https://lore.kernel.org/all/20240820085950.200358-1-jirislaby@kernel.org/
Jiri Slaby Aug. 26, 2024, 8:57 a.m. UTC | #8
On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> I stumbled on this limitation as well when trying to build the kernel on
> a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> created a swapfile and it managed to proceed, a bit slowly, but worked
> as well.

Here, it hits the VM space limit (3 G).

> Please let me know if what is in the 'next' branch of:
> 
> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> 
> Works for you, that will be extra motivation to move it to the master
> branch and cut 1.28.

on 64bit (-j1):
* master: 3.706 GB
(* master + my changes: 3.559 GB)
* next: 3.157 GB 


on 32bit:
  * master-j1: 2.445 GB
  * master-j16: 2.608 GB
  * master-j32: 2.811 GB
  * next-j1: 2.256 GB
  * next-j16: 2.401 GB
  * next-j32: 2.613 GB

It's definitely better. So I think it could work now if the thread count 
was limited to 1 on 32bit, as building with -j10 or -j20 randomly fails 
on random machines (32bit processes only, of course), unlike -j1.

thanks,
Sedat Dilek Aug. 26, 2024, 10:18 a.m. UTC | #9
On Thu, Aug 22, 2024 at 5:24 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:

> Please let me know if what is in the 'next' branch of:
>
> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>
> Works for you, that will be extra motivation to move it to the master
> branch and cut 1.28.

For pahole version 1.28 - Please, Go Go Go.

-Sedat-

pahole 1.27 segfaults when generating BTF for modules built with LTO #2032
https://github.com/ClangBuiltLinux/linux/issues/2032
Arnaldo Carvalho de Melo Aug. 26, 2024, 5:03 p.m. UTC | #10
On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > I stumbled on this limitation as well when trying to build the kernel on
> > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > created a swapfile and it managed to proceed, a bit slowly, but worked
> > as well.
> 
> Here, it hits the VM space limit (3 G).

right, in my case it was on a 64-bit system, so just not enough memory,
not address space.
 
> > Please let me know if what is in the 'next' branch of:

> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git

> > Works for you, that will be extra motivation to move it to the master
> > branch and cut 1.28.

> on 64bit (-j1):
> * master: 3.706 GB
> (* master + my changes: 3.559 GB)
> * next: 3.157 GB
 
> on 32bit:
>  * master-j1: 2.445 GB
>  * master-j16: 2.608 GB
>  * master-j32: 2.811 GB
>  * next-j1: 2.256 GB
>  * next-j16: 2.401 GB
>  * next-j32: 2.613 GB
> 
> It's definitely better. So I think it could work now, if the thread count
> was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> random machines (32bit processes only of course). Unlike -j1.

Cool, I just merged a patch from Alan Maguire that should help with the
parallel case, would be able to test it? It is in the 'next' branch:

⬢[acme@toolbox pahole]$ git log --oneline -5
f37212d1611673a2 (HEAD -> master) pahole: Teduce memory usage by smarter deleting of CUs

Excerpt of the above:

    This leads to deleting ~90 CUs during parallel vmlinux BTF generation
    versus deleting just 1 prior to this change.

c7ec9200caa7d485 btf_encoder: Add "distilled_base" BTF feature to split BTF generation
bc4e6a9adfc72758 pahole: Sync with libbpf-1.5
5e3ed3ec2947c69f pahole: Do --lang_exclude CU filtering earlier
c46455bb0379fa38 dwarf_loader: Allow filtering CUs early in loading
⬢[acme@toolbox pahole]$

- Arnaldo
Sedat Dilek Aug. 26, 2024, 6:42 p.m. UTC | #11
On Mon, Aug 26, 2024 at 7:03 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> > On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > > I stumbled on this limitation as well when trying to build the kernel on
> > > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > > created a swapfile and it managed to proceed, a bit slowly, but worked
> > > as well.
> >
> > Here, it hits the VM space limit (3 G).
>
> right, in my case it was on a 64-bit system, so just not enough memory,
> not address space.
>
> > > Please let me know if what is in the 'next' branch of:
>
> > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>
> > > Works for you, that will be extra motivation to move it to the master
> > > branch and cut 1.28.
>
> > on 64bit (-j1):
> > * master: 3.706 GB
> > (* master + my changes: 3.559 GB)
> > * next: 3.157 GB
>
> > on 32bit:
> >  * master-j1: 2.445 GB
> >  * master-j16: 2.608 GB
> >  * master-j32: 2.811 GB
> >  * next-j1: 2.256 GB
> >  * next-j16: 2.401 GB
> >  * next-j32: 2.613 GB
> >
> > It's definitely better. So I think it could work now, if the thread count
> > was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> > random machines (32bit processes only of course). Unlike -j1.
>
> Cool, I just merged a patch from Alan Maguire that should help with the
> parallel case, would be able to test it? It is in the 'next' branch:
>
> ⬢[acme@toolbox pahole]$ git log --oneline -5
> f37212d1611673a2 (HEAD -> master) pahole: Teduce memory usage by smarter deleting of CUs
>

*R*edzce? memory usage ...

-Sedat-

> Excerpt of the above:
>
>     This leads to deleting ~90 CUs during parallel vmlinux BTF generation
>     versus deleting just 1 prior to this change.
>
> c7ec9200caa7d485 btf_encoder: Add "distilled_base" BTF feature to split BTF generation
> bc4e6a9adfc72758 pahole: Sync with libbpf-1.5
> 5e3ed3ec2947c69f pahole: Do --lang_exclude CU filtering earlier
> c46455bb0379fa38 dwarf_loader: Allow filtering CUs early in loading
> ⬢[acme@toolbox pahole]$
>
> - Arnaldo
>
Phil Auld Aug. 26, 2024, 6:48 p.m. UTC | #12
On Mon, Aug 26, 2024 at 08:42:10PM +0200 Sedat Dilek wrote:
> On Mon, Aug 26, 2024 at 7:03 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> > > On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > > > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > > > I stumbled on this limitation as well when trying to build the kernel on
> > > > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > > > created a swapfile and it managed to proceed, a bit slowly, but worked
> > > > as well.
> > >
> > > Here, it hits the VM space limit (3 G).
> >
> > right, in my case it was on a 64-bit system, so just not enough memory,
> > not address space.
> >
> > > > Please let me know if what is in the 'next' branch of:
> >
> > > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> >
> > > > Works for you, that will be extra motivation to move it to the master
> > > > branch and cut 1.28.
> >
> > > on 64bit (-j1):
> > > * master: 3.706 GB
> > > (* master + my changes: 3.559 GB)
> > > * next: 3.157 GB
> >
> > > on 32bit:
> > >  * master-j1: 2.445 GB
> > >  * master-j16: 2.608 GB
> > >  * master-j32: 2.811 GB
> > >  * next-j1: 2.256 GB
> > >  * next-j16: 2.401 GB
> > >  * next-j32: 2.613 GB
> > >
> > > It's definitely better. So I think it could work now, if the thread count
> > > was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> > > random machines (32bit processes only of course). Unlike -j1.
> >
> > Cool, I just merged a patch from Alan Maguire that should help with the
> > parallel case, would be able to test it? It is in the 'next' branch:
> >
> > ⬢[acme@toolbox pahole]$ git log --oneline -5
> > f37212d1611673a2 (HEAD -> master) pahole: Teduce memory usage by smarter deleting of CUs
> >
> 
> *R*edzce? memory usage ...
>

If you meant that further typo it's golden, and if not the irony is rich :)

Either way this is my favorite email of the day!


Cheers,
Phil


> -Sedat-
> 
> > Excerpt of the above:
> >
> >     This leads to deleting ~90 CUs during parallel vmlinux BTF generation
> >     versus deleting just 1 prior to this change.
> >
> > c7ec9200caa7d485 btf_encoder: Add "distilled_base" BTF feature to split BTF generation
> > bc4e6a9adfc72758 pahole: Sync with libbpf-1.5
> > 5e3ed3ec2947c69f pahole: Do --lang_exclude CU filtering earlier
> > c46455bb0379fa38 dwarf_loader: Allow filtering CUs early in loading
> > ⬢[acme@toolbox pahole]$
> >
> > - Arnaldo
> >
> 

--
Arnaldo Carvalho de Melo Aug. 26, 2024, 8:02 p.m. UTC | #13
On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > I stumbled on this limitation as well when trying to build the kernel on
> > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > created a swapfile and it managed to proceed, a bit slowly, but worked
> > as well.
> 
> Here, it hits the VM space limit (3 G).
> 
> > Please let me know if what is in the 'next' branch of:
> > 
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> > 
> > Works for you, that will be extra motivation to move it to the master
> > branch and cut 1.28.
> 
> on 64bit (-j1):
> * master: 3.706 GB
> (* master + my changes: 3.559 GB)
> * next: 3.157 GB
> 
> 
> on 32bit:
>  * master-j1: 2.445 GB
>  * master-j16: 2.608 GB
>  * master-j32: 2.811 GB
>  * next-j1: 2.256 GB
>  * next-j16: 2.401 GB
>  * next-j32: 2.613 GB
> 
> It's definitely better. So I think it could work now, if the thread count
> was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> random machines (32bit processes only of course). Unlike -j1.

Great! I'm now processing a patch by Alan Maguire that should help with
parallel loading, I'll add it to the next branch, together with his work
on distilled BTF.

If you could test it, it would be great to see how much it helps with
the serial case and if it allows for at least some parallel processing
on 32-bit architectures.

Thanks for all your effort on this, appreciated.

- Arnaldo
Arnaldo Carvalho de Melo Aug. 26, 2024, 8:04 p.m. UTC | #14
On Mon, Aug 26, 2024 at 02:48:18PM -0400, Phil Auld wrote:
> On Mon, Aug 26, 2024 at 08:42:10PM +0200 Sedat Dilek wrote:
> > On Mon, Aug 26, 2024 at 7:03 PM Arnaldo Carvalho de Melo
> > <acme@kernel.org> wrote:
> > >
> > > On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> > > > On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > > > > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > > > > I stumbled on this limitation as well when trying to build the kernel on
> > > > > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > > > > created a swapfile and it managed to proceed, a bit slowly, but worked
> > > > > as well.
> > > >
> > > > Here, it hits the VM space limit (3 G).
> > >
> > > right, in my case it was on a 64-bit system, so just not enough memory,
> > > not address space.
> > >
> > > > > Please let me know if what is in the 'next' branch of:
> > >
> > > > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> > >
> > > > > Works for you, that will be extra motivation to move it to the master
> > > > > branch and cut 1.28.
> > >
> > > > on 64bit (-j1):
> > > > * master: 3.706 GB
> > > > (* master + my changes: 3.559 GB)
> > > > * next: 3.157 GB
> > >
> > > > on 32bit:
> > > >  * master-j1: 2.445 GB
> > > >  * master-j16: 2.608 GB
> > > >  * master-j32: 2.811 GB
> > > >  * next-j1: 2.256 GB
> > > >  * next-j16: 2.401 GB
> > > >  * next-j32: 2.613 GB
> > > >
> > > > It's definitely better. So I think it could work now, if the thread count
> > > > was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> > > > random machines (32bit processes only of course). Unlike -j1.
> > >
> > > Cool, I just merged a patch from Alan Maguire that should help with the
> > > parallel case, would be able to test it? It is in the 'next' branch:
> > >
> > > ⬢[acme@toolbox pahole]$ git log --oneline -5
> > > f37212d1611673a2 (HEAD -> master) pahole: Teduce memory usage by smarter deleting of CUs
> > >
> > 
> > *R*edzce? memory usage ...
> >
> 
> If you meant that further typo it's golden, and if not the irony is rich :)
> 
> Either way this is my favorite email of the day!

Hahaha, I went to uppercase what comes after the colon and introduced
that typo ;-)

Faxing it....

- Arnaldo
Andrii Nakryiko Aug. 26, 2024, 10:07 p.m. UTC | #15
On Mon, Aug 26, 2024 at 1:04 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Mon, Aug 26, 2024 at 02:48:18PM -0400, Phil Auld wrote:
> > On Mon, Aug 26, 2024 at 08:42:10PM +0200 Sedat Dilek wrote:
> > > On Mon, Aug 26, 2024 at 7:03 PM Arnaldo Carvalho de Melo
> > > <acme@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
> > > > > On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
> > > > > > On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
> > > > > > I stumbled on this limitation as well when trying to build the kernel on
> > > > > > a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
> > > > > > created a swapfile and it managed to proceed, a bit slowly, but worked
> > > > > > as well.
> > > > >
> > > > > Here, it hits the VM space limit (3 G).
> > > >
> > > > right, in my case it was on a 64-bit system, so just not enough memory,
> > > > not address space.
> > > >
> > > > > > Please let me know if what is in the 'next' branch of:
> > > >
> > > > > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> > > >
> > > > > > Works for you, that will be extra motivation to move it to the master
> > > > > > branch and cut 1.28.
> > > >
> > > > > on 64bit (-j1):
> > > > > * master: 3.706 GB
> > > > > (* master + my changes: 3.559 GB)
> > > > > * next: 3.157 GB
> > > >
> > > > > on 32bit:
> > > > >  * master-j1: 2.445 GB
> > > > >  * master-j16: 2.608 GB
> > > > >  * master-j32: 2.811 GB
> > > > >  * next-j1: 2.256 GB
> > > > >  * next-j16: 2.401 GB
> > > > >  * next-j32: 2.613 GB
> > > > >
> > > > > It's definitely better. So I think it could work now, if the thread count
> > > > > was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
> > > > > random machines (32bit processes only of course). Unlike -j1.
> > > >
> > > > Cool, I just merged a patch from Alan Maguire that should help with the
> > > > parallel case, would be able to test it? It is in the 'next' branch:
> > > >
> > > > ⬢[acme@toolbox pahole]$ git log --oneline -5
> > > > f37212d1611673a2 (HEAD -> master) pahole: Teduce memory usage by smarter deleting of CUs
> > > >
> > >
> > > *R*edzce? memory usage ...
> > >
> >
> > If you meant that further typo it's golden, and if not the irony is rich :)
> >
> > Either way this is my favorite email of the day!
>
> Hahaha, I went to uppercase what comes after the colon and introduced
> that typo ;-)
>
> Faxing it....

typo-fest continues ;)

but it's great to see that a new pahole release is coming (cc'ing
Usama); we are eagerly waiting for one of the bug fixes that will go
into 1.28 (one of the first commits after 1.27 was cut). Note, libbpf
CI for pahole-staging is failing, but that seems to be a one-off test
failure and has nothing to do with pahole being broken.

>
> - Arnaldo
>
Jiri Slaby Aug. 27, 2024, 8:37 a.m. UTC | #16
On 26. 08. 24, 19:03, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
>> On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
>>> On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
>>> I stumbled on this limitation as well when trying to build the kernel on
>>> a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
>>> created a swapfile and it managed to proceed, a bit slowly, but worked
>>> as well.
>>
>> Here, it hits the VM space limit (3 G).
> 
> right, in my case it was on a 64-bit system, so just not enough memory,
> not address space.
>   
>>> Please let me know if what is in the 'next' branch of:
> 
>>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> 
>>> Works for you, that will be extra motivation to move it to the master
>>> branch and cut 1.28.
> 
>> on 64bit (-j1):
>> * master: 3.706 GB
>> (* master + my changes: 3.559 GB)
>> * next: 3.157 GB
>   
>> on 32bit:
>>   * master-j1: 2.445 GB
>>   * master-j16: 2.608 GB
>>   * master-j32: 2.811 GB
>>   * next-j1: 2.256 GB
>>   * next-j16: 2.401 GB
>>   * next-j32: 2.613 GB
>>
>> It's definitely better. So I think it could work now, if the thread count
>> was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
>> random machines (32bit processes only of course). Unlike -j1.
> 
> Cool, I just merged a patch from Alan Maguire that should help with the
> parallel case, would be able to test it? It is in the 'next' branch:

Not helping much.

On my box (as all previous runs):
next-j1 2.242
next-j16 2.808
next-j32 2.646

On a build host:
next-j1: 2.242
next-j16: 2.824
next-j20: 2.902 (crash)
next-j32: 2.902 (crash)
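[Editorial aside: the thread does not say how these peak-memory figures were collected. One common way to measure a child process's peak RSS on Linux is via the ru_maxrss field that wait4() reports; the helper name below is hypothetical, a sketch only.]

```python
import os

def peak_rss_kib(argv):
    """Run argv as a child process and return its peak resident set size.

    On Linux, ru_maxrss from os.wait4() is reported in KiB.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the measured command.
        os.execvp(argv[0], argv)
    _pid, _status, rusage = os.wait4(pid, 0)
    return rusage.ru_maxrss

if __name__ == "__main__":
    # A pahole run would be measured the same way, e.g.
    # peak_rss_kib(["pahole", "-J", "-j16", "vmlinux"]).
    print(peak_rss_kib(["true"]))
```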
Jiri Slaby Sept. 4, 2024, 6:06 a.m. UTC | #17
On 27. 08. 24, 10:37, Jiri Slaby wrote:
> On 26. 08. 24, 19:03, Arnaldo Carvalho de Melo wrote:
>> On Mon, Aug 26, 2024 at 10:57:22AM +0200, Jiri Slaby wrote:
>>> On 22. 08. 24, 17:24, Arnaldo Carvalho de Melo wrote:
>>>> On Thu, Aug 22, 2024 at 11:55:05AM +0800, Shung-Hsi Yu wrote:
>>>> I stumbled on this limitation as well when trying to build the 
>>>> kernel on
>>>> a Libre Computer rk3399-pc board with only 4GiB of RAM, there I just
>>>> created a swapfile and it managed to proceed, a bit slowly, but worked
>>>> as well.
>>>
>>> Here, it hits the VM space limit (3 G).
>>
>> right, in my case it was on a 64-bit system, so just not enough memory,
>> not address space.
>>>> Please let me know if what is in the 'next' branch of:
>>
>>>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>>
>>>> Works for you, that will be extra motivation to move it to the master
>>>> branch and cut 1.28.
>>
>>> on 64bit (-j1):
>>> * master: 3.706 GB
>>> (* master + my changes: 3.559 GB)
>>> * next: 3.157 GB
>>> on 32bit:
>>>   * master-j1: 2.445 GB
>>>   * master-j16: 2.608 GB
>>>   * master-j32: 2.811 GB
>>>   * next-j1: 2.256 GB
>>>   * next-j16: 2.401 GB
>>>   * next-j32: 2.613 GB
>>>
>>> It's definitely better. So I think it could work now, if the thread 
>>> count
>>> was limited to 1 on 32bit. As building with -j10, -j20 randomly fails on
>>> random machines (32bit processes only of course). Unlike -j1.
>>
>> Cool, I just merged a patch from Alan Maguire that should help with the
>> parallel case, would be able to test it? It is in the 'next' branch:
> 
> Not much helping.
> 
> On my box (as all previous runs):
> next-j1 2.242
> next-j16 2.808
> next-j32 2.646
> 
> On a build host:
> next-j1: 2.242
> next-j16: 2.824
> next-j20: 2.902 (crash)
> next-j32: 2.902 (crash)

So I disabled BTF on all 32-bit targets (arm+i586) in openSUSE Tumbleweed
for now. Better a kernel without BTF than no kernel at all 8-).

If you have ideas to force -j1 on 32bit, I can reenable/retest...

thanks,
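[Editorial aside: the simpler variant mentioned in the cover letter — keying on the kernel's own bitness instead of the pahole binary's ELF class — would look roughly like this in scripts/Makefile.btf. Sketch only; as the cover letter notes, it checks the target, not the build host's userspace, so it has its own deficiencies.]

```make
# Sketch only: pass -j to pahole (>= 1.22) only for 64-bit kernel builds,
# since 32-bit pahole can exhaust its 3 GB address space when threading.
ifeq ($(CONFIG_64BIT),y)
pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
endif
```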
Sedat Dilek Sept. 25, 2024, 8:17 a.m. UTC | #18
On Mon, Aug 26, 2024 at 12:18 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 5:24 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
>
> > Please let me know if what is in the 'next' branch of:
> >
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> >
> > Works for you, that will be extra motivation to move it to the master
> > branch and cut 1.28.
>
> For pahole version 1.28 - Please, Go Go Go.
>
> -Sedat-
>
> pahole 1.27 segfaults when generating BTF for modules built with LTO #2032
> https://github.com/ClangBuiltLinux/linux/issues/2032

Hi Arnaldo,

Any news for pahole version 1.28 release?

Thanks.

Best regards,
-Sedat-
diff mbox series

Patch

diff --git a/init/Kconfig b/init/Kconfig
index f36ca8a0e209..f5e80497eef0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -113,6 +113,10 @@  config PAHOLE_VERSION
 	int
 	default $(shell,$(srctree)/scripts/pahole-version.sh $(PAHOLE))
 
+config PAHOLE_CLASS
+	string
+	default $(shell,$(srctree)/scripts/pahole-class.sh $(PAHOLE))
+
 config CONSTRUCTORS
 	bool
 
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index b75f09f3f424..f7de8e922bce 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -12,7 +12,9 @@  endif
 
 pahole-flags-$(call test-ge, $(pahole-ver), 121)	+= --btf_gen_floats
 
+ifeq ($(CONFIG_PAHOLE_CLASS),ELF64)
 pahole-flags-$(call test-ge, $(pahole-ver), 122)	+= -j
+endif
 
 pahole-flags-$(call test-ge, $(pahole-ver), 125)	+= --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
 
diff --git a/scripts/pahole-class.sh b/scripts/pahole-class.sh
new file mode 100644
index 000000000000..d15a92077f76
--- /dev/null
+++ b/scripts/pahole-class.sh
@@ -0,0 +1,21 @@ 
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Usage: $ ./pahole-class.sh pahole
+#
+# Prints pahole's ELF class, such as ELF64
+
+if [ ! -x "$(command -v "$@")" ]; then
+	echo 0
+	exit 1
+fi
+
+PAHOLE="$(which "$@")"
+CLASS="$(readelf -h "$PAHOLE" 2>/dev/null | sed -n 's/.*Class: *// p')"
+
+# Scripts like scripts/dummy-tools/pahole
+if [ -n "$CLASS" ]; then
+	echo "$CLASS"
+else
+	echo ELF64
+fi
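[Editorial illustration, not part of the patch: the readelf | sed pipeline above reads the ELF class string from pahole's header. The same answer lives in byte 4 (e_ident[EI_CLASS]) of the ELF identification, which also explains the fallback branch — a wrapper script like scripts/dummy-tools/pahole has no ELF header at all. A minimal Python sketch:]

```python
def elf_class(path):
    """Return 'ELF64' or 'ELF32' for an ELF binary, or None otherwise.

    None covers non-ELF files such as wrapper scripts (e.g.
    scripts/dummy-tools/pahole), mirroring the empty-$CLASS case above.
    """
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: "ELF32", 2: "ELF64"}.get(ident[4])
```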