diff mbox series

[PATCH/RFT] Re: ANNOUNCE: pahole v1.27 (reproducible builds, BTF kfuncs)

Message ID ZnCWRMfRDMHqSxBb@x1 (mailing list archive)
State Not Applicable

Checks

Context                 Check    Description
netdev/tree_selection   success  Not a local patch

Commit Message

Arnaldo Carvalho de Melo June 17, 2024, 8:02 p.m. UTC
On Mon, Jun 17, 2024 at 04:39:40PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Jun 13, 2024 at 02:40:19PM -0700, Nathan Chancellor wrote:
> > On Tue, Jun 11, 2024 at 06:26:53PM -0300, Arnaldo Carvalho de Melo wrote:
> > > 	The v1.27 release of pahole and its friends is out, supporting
> > > parallel reproducible builds and encoding kernel kfuncs in BTF, allowing
> > > tools such as bpftrace to enumerate the available kfuncs and obtain their
> > > function signatures and return types.
> > 
> > After commit f632e75 ("dwarf_loader: Add the cu to the cus list early,
> > remove on LSK_DELETE"), I (and others[1]) notice a crash when running
> > pahole on modules built with Clang when CONFIG_LTO_CLANG is enabled:
> > 
> >   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
> > 
> >   $ scripts/config -d LTO_NONE -e LTO_CLANG_THIN
> > 
> >   $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 olddefconfig vmlinux crypto/cast_common.ko
> >   make[3]: *** [scripts/Makefile.modfinal:59: crypto/cast_common.ko] Error 139
> > 
> > I've isolated this to the following commands using the files available
> > at [2] (these were built with LLVM 18 but I could reproduce it with LLVM
> > 17 and LLVM 19, so it appears to impact a number of versions):
> > 
> >   $ tar -tf clang-lto-pahole-1.27-crash.tar.zst
> >   clang-lto-pahole-1.27-crash/
> >   clang-lto-pahole-1.27-crash/cast_common.mod.o
> >   clang-lto-pahole-1.27-crash/module.lds
> >   clang-lto-pahole-1.27-crash/cast_common.o
> >   clang-lto-pahole-1.27-crash/cast_common.ko.bak
> >   clang-lto-pahole-1.27-crash/vmlinux
> >   clang-lto-pahole-1.27-crash/cast_common.ko
> > 
> >   $ tar -axf clang-lto-pahole-1.27-crash.tar.zst
> > 
> >   $ cd clang-lto-pahole-1.27-crash
> > 
> >   $ LLVM_OBJCOPY="llvm-objcopy" pahole-1.26 -J -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func --lang_exclude=rust --btf_base vmlinux cast_common.ko
> > 
> >   $ cp cast_common.ko{.bak,}
> > 
> >   $ LLVM_OBJCOPY="llvm-objcopy" pahole-1.27 -J -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func --lang_exclude=rust --btf_base vmlinux cast_common.ko
> >   fish: Job 1, '...' terminated by signal SIGSEGV (Address boundary error)
> > 
> > If there is any more information I can provide or patches I can test, I
> > am more than happy to do so.
> 
> I reproduced the problem by just running 'pahole cast_common.ko', so
> this isn't even related to the BTF parts, it's about the DWARF loader.
> I'm on it, thanks for the detailed report and for providing the files.

One-liner plus explanation at the end; with it applied, both the DWARF
and BTF loaders work:

⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -F btf -C task_struct cast_common.ko | tail
	/* XXX last struct has 1 hole, 1 bit hole */

	/* size: 11584, cachelines: 181, members: 265 */
	/* sum members: 11483, holes: 22, sum holes: 85 */
	/* sum bitfield members: 82 bits, bit holes: 2, sum bit holes: 46 bits */
	/* member types with holes: 4, total: 6, bit holes: 2, total: 2 */
	/* paddings: 6, sum paddings: 49 */
	/* forced alignments: 1, forced holes: 1, sum forced holes: 8 */
};

⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -F dwarf -C task_struct cast_common.ko | tail
	/* XXX last struct has 1 hole, 1 bit hole */

	/* size: 11584, cachelines: 181, members: 265 */
	/* sum members: 11483, holes: 22, sum holes: 85 */
	/* sum bitfield members: 82 bits, bit holes: 2, sum bit holes: 46 bits */
	/* member types with holes: 4, total: 6, bit holes: 2, total: 2 */
	/* paddings: 6, sum paddings: 49 */
	/* forced alignments: 8, forced holes: 1, sum forced holes: 8 */
};

⬢[acme@toolbox clang-lto-pahole-1.27-crash]$

The BTF encoder (which uses the DWARF loader) works as well:

⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -j --btf_encode cast_common.ko
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ readelf -SW cast_common.ko | grep BTF
  [30] .BTF              PROGBITS        0000000000000000 0223c9 01172c 00      0   0  1
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -F btf cast_common.ko | wc -l
4302
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ set -o vi
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -F btf cast_common.ko | head
struct elf32_note {
	Elf32_Word                 n_namesz;             /*     0     4 */
	Elf32_Word                 n_descsz;             /*     4     4 */
	Elf32_Word                 n_type;               /*     8     4 */

	/* size: 12, cachelines: 1, members: 3 */
	/* last cacheline: 12 bytes */
};
struct module {
	enum module_state          state;                /*     0     4 */
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$ pahole -F btf cast_common.ko | tail
	const char  *              mod_name;             /*    24     8 */
	const char  * *            class_names;          /*    32     8 */
	const int                  length;               /*    40     4 */
	const int                  base;                 /*    44     4 */
	enum class_map_type        map_type;             /*    48     4 */

	/* size: 56, cachelines: 1, members: 7 */
	/* padding: 4 */
	/* last cacheline: 56 bytes */
};
⬢[acme@toolbox clang-lto-pahole-1.27-crash]$

Can you try with the one-liner below? We remove the cu from the cus list
unconditionally, and since we allocate it with zalloc/calloc in
cu__new() and missed initializing that list_head (cu->node), we ended up
hitting list_del with a zeroed 'struct list_head' :-\

I'll try and get this cast_common.ko checked into a test repo for pahole
so that this gets regression tested.

Please test this patch so that we see if this is the only problem and
your kernel build with clang completes successfully.

Thanks,

- Arnaldo

Comments

Nathan Chancellor June 17, 2024, 9:08 p.m. UTC | #1
On Mon, Jun 17, 2024 at 05:02:12PM -0300, Arnaldo Carvalho de Melo wrote:
> Can you try with the one liner below? We remove it from the cus list
> unconditionally, and since we alloc space with zalloc/calloc in
> cu__new() and missed initializing that list_head (cu->node) we ended up
> hitting list_del with a zeroed 'struct list_head' :-\
> 
> I'll try and get this cast_common.ko checked into a test repo for pahole
> so that this gets regression tested.
> 
> Please test this patch so that we see if this is the only problem and
> your kernel build with clang completes successfully.

Thanks, I rebuilt pahole with the following diff and both my build and
the other configuration I tested for this regression successfully
complete.

Tested-by: Nathan Chancellor <nathan@kernel.org>

> diff --git a/dwarves.c b/dwarves.c
> index 1ec259f50dbd3778..823a01524a12bb37 100644
> --- a/dwarves.c
> +++ b/dwarves.c
> @@ -739,6 +739,7 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
>  		cu->dfops	= NULL;
>  		INIT_LIST_HEAD(&cu->tags);
>  		INIT_LIST_HEAD(&cu->tool_list);
> +		INIT_LIST_HEAD(&cu->node);
>  
>  		cu->addr_size = addr_size;
>  		cu->extra_dbg_info = 0;

Arnaldo Carvalho de Melo June 18, 2024, 1:51 p.m. UTC | #2
On Mon, Jun 17, 2024 at 02:08:10PM -0700, Nathan Chancellor wrote:
> On Mon, Jun 17, 2024 at 05:02:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > Can you try with the one liner below? We remove it from the cus list
> > unconditionally, and since we alloc space with zalloc/calloc in
> > cu__new() and missed initializing that list_head (cu->node) we ended up
> > hitting list_del with a zeroed 'struct list_head' :-\
> > 
> > I'll try and get this cast_common.ko checked into a test repo for pahole
> > so that this gets regression tested.
> > 
> > Please test this patch so that we see if this is the only problem and
> > your kernel build with clang completes successfully.
> 
> Thanks, I rebuilt pahole with the following diff and both my build and
> the other configuration I tested for this regression successfully
> complete.
> 
> Tested-by: Nathan Chancellor <nathan@kernel.org>

Great, I just added this:

From 6a2b27c0f512619b0e7a769a18a0fb05bb3789a5 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue, 18 Jun 2024 10:37:30 -0300
Subject: [PATCH 1/1] core: Initialize cu->node with INIT_LIST_HEAD()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In cu__new() zalloc() is used defensively, and that helped catch this
problem where we assume that a cu is in the cus list of cu instances,
but that is not the case when we use cus__merge_and_process_cu(), for
instance when loading files created by clang with LTO, as reported by
Peter Jung and narrowed down by Nathan Chancellor.

If we use INIT_LIST_HEAD() in cu__new() to initialize cu->node, which is
what we do with other lists and nodes there, then the unconditional
removal using list_del_init() will be a no-op and removing something not
on the cus list of cu instances will not cause problems, it will just
leave the cus->nr_entries field inconsistent.

So let's just have this fix in first, keeping Nathan's Tested-by, and then
do the slightly more involved fix of either adding that cu to the cus list
or checking at removal time if it is there.

  Program received signal SIGSEGV, Segmentation fault.
  0x00007ffff7f1e13e in __list_del (prev=0x0, next=0x0) at /home/acme/git/pahole/list.h:106
  106		next->prev = prev;
  (gdb) bt
  #0  0x00007ffff7f1e13e in __list_del (prev=0x0, next=0x0) at /home/acme/git/pahole/list.h:106
  #1  0x00007ffff7f1e176 in list_del_init (entry=0x417980) at /home/acme/git/pahole/list.h:165
  #2  0x00007ffff7f1f8f9 in __cus__remove (cus=0x4142a0, cu=0x417980) at /home/acme/git/pahole/dwarves.c:527
  #3  0x00007ffff7f1f92b in cus__remove (cus=0x4142a0, cu=0x417980) at /home/acme/git/pahole/dwarves.c:533
  #4  0x00007ffff7f3d01c in cus__finalize (cus=0x4142a0, cu=0x417980, conf=0x4133c0 <conf_load>, thr_data=0x0)
      at /home/acme/git/pahole/dwarf_loader.c:3040
  #5  0x00007ffff7f3e05c in cus__merge_and_process_cu (cus=0x4142a0, conf=0x4133c0 <conf_load>, mod=0x415cf0, dw=0x416110, elf=0x414380,
      filename=0x7fffffffe3f7 "cast_common.ko", build_id=0x416680 "\265D\371U\213\373u|\037\250\242\032\271\365⒜]y\023", build_id_len=20,
      type_dcu=0x0) at /home/acme/git/pahole/dwarf_loader.c:3482
  #6  0x00007ffff7f3e218 in cus__load_module (cus=0x4142a0, conf=0x4133c0 <conf_load>, mod=0x415cf0, dw=0x416110, elf=0x414380,
      filename=0x7fffffffe3f7 "cast_common.ko") at /home/acme/git/pahole/dwarf_loader.c:3521
  #7  0x00007ffff7f3e396 in cus__process_dwflmod (dwflmod=0x415cf0, userdata=0x415d00, name=0x415ea0 "cast_common.ko", base=65536,
      arg=0x7fffffffde40) at /home/acme/git/pahole/dwarf_loader.c:3581
  #8  0x00007ffff7eb4609 in dwfl_getmodules (dwfl=0x414300, callback=0x7ffff7f3e2ec <cus__process_dwflmod>, arg=0x7fffffffde40, offset=0)
      at ../libdwfl/dwfl_getmodules.c:86
  #9  0x00007ffff7f3e4c5 in cus__process_file (cus=0x4142a0, conf=0x4133c0 <conf_load>, fd=3, filename=0x7fffffffe3f7 "cast_common.ko")
      at /home/acme/git/pahole/dwarf_loader.c:3647
  #10 0x00007ffff7f3e5cd in dwarf__load_file (cus=0x4142a0, conf=0x4133c0 <conf_load>, filename=0x7fffffffe3f7 "cast_common.ko")
      at /home/acme/git/pahole/dwarf_loader.c:3684
  #11 0x00007ffff7f232df in cus__load_file (cus=0x4142a0, conf=0x4133c0 <conf_load>, filename=0x7fffffffe3f7 "cast_common.ko")
      at /home/acme/git/pahole/dwarves.c:2134
  #12 0x00007ffff7f23e8b in cus__load_files (cus=0x4142a0, conf=0x4133c0 <conf_load>, filenames=0x7fffffffe0f0)
      at /home/acme/git/pahole/dwarves.c:2637
  #13 0x000000000040aec0 in main (argc=2, argv=0x7fffffffe0e8) at /home/acme/git/pahole/pahole.c:3805
  (gdb) fr 1
  #1  0x00007ffff7f1e176 in list_del_init (entry=0x417980) at /home/acme/git/pahole/list.h:165
  165		__list_del(entry->prev, entry->next);
  (gdb) p entry
  $1 = (struct list_head *) 0x417980
  (gdb) p entry->next
  $2 = (struct list_head *) 0x0
  (gdb) p entry->prev
  $3 = (struct list_head *) 0x0

Closes: https://github.com/acmel/dwarves/issues/53
Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/pahole/-/issues/1
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/all/20240617210810.GA1877676@thelio-3990X
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dwarves.c b/dwarves.c
index 1ec259f50dbd3778..823a01524a12bb37 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -739,6 +739,7 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
 		cu->dfops	= NULL;
 		INIT_LIST_HEAD(&cu->tags);
 		INIT_LIST_HEAD(&cu->tool_list);
+		INIT_LIST_HEAD(&cu->node);
 
 		cu->addr_size = addr_size;
 		cu->extra_dbg_info = 0;

Nathan Chancellor July 10, 2024, 7:31 p.m. UTC | #3
Hi Arnaldo,

On Tue, Jun 18, 2024 at 10:51:44AM -0300, Arnaldo Carvalho de Melo wrote:
> From 6a2b27c0f512619b0e7a769a18a0fb05bb3789a5 Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Tue, 18 Jun 2024 10:37:30 -0300
> Subject: [PATCH 1/1] core: Initialize cu->node with INIT_LIST_HEAD()

Could a new release be cut for this issue? Several people have been bitten
by this (including CI systems) and two distributions have talked about
backporting this change on top of 1.27 to resolve it:

https://gitlab.archlinux.org/archlinux/packaging/packages/pahole/-/issues/1
https://src.fedoraproject.org/rpms/dwarves/pull-request/4

Cheers,
Nathan

Patch

diff --git a/dwarves.c b/dwarves.c
index 1ec259f50dbd3778..823a01524a12bb37 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -739,6 +739,7 @@  struct cu *cu__new(const char *name, uint8_t addr_size,
 		cu->dfops	= NULL;
 		INIT_LIST_HEAD(&cu->tags);
 		INIT_LIST_HEAD(&cu->tool_list);
+		INIT_LIST_HEAD(&cu->node);
 
 		cu->addr_size = addr_size;
 		cu->extra_dbg_info = 0;