[v3,0/8] perf: Support multiple system call tables in the build

Message ID 20250219185657.280286-1-irogers@google.com (mailing list archive)

Message

Ian Rogers Feb. 19, 2025, 6:56 p.m. UTC
This work builds on the clean up of system call tables and removal of
libaudit by Charlie Jenkins <charlie@rivosinc.com>.

The system call table in perf trace is used to map system call numbers
to names and vice versa. Prior to these changes, a single table
matching the perf binary's build was present. The table would be
incorrect if tracing, say, a 32-bit binary from a 64-bit version of
perf: the names and numbers wouldn't match.

Change the build so that a single system call file is built and the
potentially multiple tables are identifiable from the ELF machine type
of the process being examined. To determine the ELF machine type, the
executable's header is read from /proc/pid/exe, falling back to the
perf binary's own type when unknown.
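
As a rough illustration of the lookup described above, the following is a minimal sketch and not the code added by the series. The helper names read_e_machine() and pid_e_machine() are hypothetical. It assumes only the standard ELF layout, in which e_machine is a 16-bit field at byte offset 18 of the header, stored in the byte order given by e_ident[EI_DATA].

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the series): return the e_machine field of
 * the ELF file at `path`, or EM_NONE if it cannot be read or is not ELF.
 */
static uint16_t read_e_machine(const char *path)
{
	unsigned char hdr[20]; /* e_ident (16) + e_type (2) + e_machine (2) */
	uint16_t e_machine = EM_NONE;
	FILE *fp = fopen(path, "rb");

	if (!fp)
		return EM_NONE;

	if (fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr) &&
	    memcmp(hdr, ELFMAG, SELFMAG) == 0) {
		/* e_machine sits at offset 18 in both ELF32 and ELF64. */
		if (hdr[EI_DATA] == ELFDATA2MSB)
			e_machine = (uint16_t)((hdr[18] << 8) | hdr[19]);
		else
			e_machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
	}
	fclose(fp);
	return e_machine;
}

/*
 * Hypothetical wrapper: look at /proc/<pid>/exe and fall back to a
 * caller-supplied value (e.g. the tool's own machine type) when unknown.
 */
static uint16_t pid_e_machine(int pid, uint16_t fallback)
{
	char path[64];
	uint16_t e_machine;

	snprintf(path, sizeof(path), "/proc/%d/exe", pid);
	e_machine = read_e_machine(path);
	return e_machine != EM_NONE ? e_machine : fallback;
}
```

Reading only the first 20 bytes keeps the sketch independent of whether the file is ELF32 or ELF64, since the two header layouts only diverge after e_machine.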

Remove some runtime types used by the system call tables and make
equivalents generated at build time.

v3: Add Charlie's reviewed-by tags. Incorporate feedback from Arnd
    Bergmann <arnd@arndb.de> on additional optional column and MIPS
    system call numbering. Rebase past Namhyung's global system call
    statistics and add comments that they don't yet support an
    e_machine other than EM_HOST.

v2: Change the 1 element cache for the last table as suggested by
    Howard Chu, add Howard's reviewed-by tags.
    Add a comment and apology to Charlie for not doing better in
    guiding:
    https://lore.kernel.org/all/20250114-perf_syscall_arch_runtime-v1-1-5b304e408e11@rivosinc.com/
    After discussion on v1, he agreed this patch series would be
    the better direction.

Ian Rogers (8):
  perf syscalltble: Remove syscall_table.h
  perf trace: Reorganize syscalls
  perf syscalltbl: Remove struct syscalltbl
  perf thread: Add support for reading the e_machine type for a thread
  perf trace beauty: Add syscalltbl.sh generating all system call tables
  perf syscalltbl: Use lookup table containing multiple architectures
  perf build: Remove Makefile.syscalls
  perf syscalltbl: Mask off ABI type for MIPS system calls

 tools/perf/Makefile.perf                      |  10 +-
 tools/perf/arch/alpha/entry/syscalls/Kbuild   |   2 -
 .../alpha/entry/syscalls/Makefile.syscalls    |   5 -
 tools/perf/arch/alpha/include/syscall_table.h |   2 -
 tools/perf/arch/arc/entry/syscalls/Kbuild     |   2 -
 .../arch/arc/entry/syscalls/Makefile.syscalls |   3 -
 tools/perf/arch/arc/include/syscall_table.h   |   2 -
 tools/perf/arch/arm/entry/syscalls/Kbuild     |   4 -
 .../arch/arm/entry/syscalls/Makefile.syscalls |   2 -
 tools/perf/arch/arm/include/syscall_table.h   |   2 -
 tools/perf/arch/arm64/entry/syscalls/Kbuild   |   3 -
 .../arm64/entry/syscalls/Makefile.syscalls    |   6 -
 tools/perf/arch/arm64/include/syscall_table.h |   8 -
 tools/perf/arch/csky/entry/syscalls/Kbuild    |   2 -
 .../csky/entry/syscalls/Makefile.syscalls     |   3 -
 tools/perf/arch/csky/include/syscall_table.h  |   2 -
 .../perf/arch/loongarch/entry/syscalls/Kbuild |   2 -
 .../entry/syscalls/Makefile.syscalls          |   3 -
 .../arch/loongarch/include/syscall_table.h    |   2 -
 tools/perf/arch/mips/entry/syscalls/Kbuild    |   2 -
 .../mips/entry/syscalls/Makefile.syscalls     |   5 -
 tools/perf/arch/mips/include/syscall_table.h  |   2 -
 tools/perf/arch/parisc/entry/syscalls/Kbuild  |   3 -
 .../parisc/entry/syscalls/Makefile.syscalls   |   6 -
 .../perf/arch/parisc/include/syscall_table.h  |   8 -
 tools/perf/arch/powerpc/entry/syscalls/Kbuild |   3 -
 .../powerpc/entry/syscalls/Makefile.syscalls  |   6 -
 .../perf/arch/powerpc/include/syscall_table.h |   8 -
 tools/perf/arch/riscv/entry/syscalls/Kbuild   |   2 -
 .../riscv/entry/syscalls/Makefile.syscalls    |   4 -
 tools/perf/arch/riscv/include/syscall_table.h |   8 -
 tools/perf/arch/s390/entry/syscalls/Kbuild    |   2 -
 .../s390/entry/syscalls/Makefile.syscalls     |   5 -
 tools/perf/arch/s390/include/syscall_table.h  |   2 -
 tools/perf/arch/sh/entry/syscalls/Kbuild      |   2 -
 .../arch/sh/entry/syscalls/Makefile.syscalls  |   4 -
 tools/perf/arch/sh/include/syscall_table.h    |   2 -
 tools/perf/arch/sparc/entry/syscalls/Kbuild   |   3 -
 .../sparc/entry/syscalls/Makefile.syscalls    |   5 -
 tools/perf/arch/sparc/include/syscall_table.h |   8 -
 tools/perf/arch/x86/entry/syscalls/Kbuild     |   3 -
 .../arch/x86/entry/syscalls/Makefile.syscalls |   6 -
 tools/perf/arch/x86/include/syscall_table.h   |   8 -
 tools/perf/arch/xtensa/entry/syscalls/Kbuild  |   2 -
 .../xtensa/entry/syscalls/Makefile.syscalls   |   4 -
 .../perf/arch/xtensa/include/syscall_table.h  |   2 -
 tools/perf/builtin-trace.c                    | 290 +++++++++++-------
 tools/perf/scripts/Makefile.syscalls          |  61 ----
 tools/perf/scripts/syscalltbl.sh              |  86 ------
 tools/perf/trace/beauty/syscalltbl.sh         | 274 +++++++++++++++++
 tools/perf/util/syscalltbl.c                  | 148 ++++-----
 tools/perf/util/syscalltbl.h                  |  22 +-
 tools/perf/util/thread.c                      |  50 +++
 tools/perf/util/thread.h                      |  14 +-
 54 files changed, 616 insertions(+), 509 deletions(-)
 delete mode 100644 tools/perf/arch/alpha/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/alpha/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/alpha/include/syscall_table.h
 delete mode 100644 tools/perf/arch/arc/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/arc/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/arc/include/syscall_table.h
 delete mode 100644 tools/perf/arch/arm/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/arm/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/arm/include/syscall_table.h
 delete mode 100644 tools/perf/arch/arm64/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/arm64/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/arm64/include/syscall_table.h
 delete mode 100644 tools/perf/arch/csky/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/csky/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/csky/include/syscall_table.h
 delete mode 100644 tools/perf/arch/loongarch/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/loongarch/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/loongarch/include/syscall_table.h
 delete mode 100644 tools/perf/arch/mips/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/mips/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/mips/include/syscall_table.h
 delete mode 100644 tools/perf/arch/parisc/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/parisc/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/parisc/include/syscall_table.h
 delete mode 100644 tools/perf/arch/powerpc/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/powerpc/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/powerpc/include/syscall_table.h
 delete mode 100644 tools/perf/arch/riscv/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/riscv/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/riscv/include/syscall_table.h
 delete mode 100644 tools/perf/arch/s390/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/s390/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/s390/include/syscall_table.h
 delete mode 100644 tools/perf/arch/sh/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/sh/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/sh/include/syscall_table.h
 delete mode 100644 tools/perf/arch/sparc/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/sparc/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/sparc/include/syscall_table.h
 delete mode 100644 tools/perf/arch/x86/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/x86/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/x86/include/syscall_table.h
 delete mode 100644 tools/perf/arch/xtensa/entry/syscalls/Kbuild
 delete mode 100644 tools/perf/arch/xtensa/entry/syscalls/Makefile.syscalls
 delete mode 100644 tools/perf/arch/xtensa/include/syscall_table.h
 delete mode 100644 tools/perf/scripts/Makefile.syscalls
 delete mode 100755 tools/perf/scripts/syscalltbl.sh
 create mode 100755 tools/perf/trace/beauty/syscalltbl.sh

Comments

Namhyung Kim Feb. 25, 2025, 3:05 a.m. UTC | #1
On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> This work builds on the clean up of system call tables and removal of
> libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> 
> The system call table in perf trace is used to map system call numbers
> to names and vice versa. Prior to these changes, a single table
> matching the perf binary's build was present. The table would be
> incorrect if tracing, say, a 32-bit binary from a 64-bit version of
> perf: the names and numbers wouldn't match.
>
> Change the build so that a single system call file is built and the
> potentially multiple tables are identifiable from the ELF machine type
> of the process being examined. To determine the ELF machine type, the
> executable's header is read from /proc/pid/exe, falling back to the
> perf binary's own type when unknown.
> 
> Remove some runtime types used by the system call tables and make
> equivalents generated at build time.

So I tested this with a test program.

  $ cat a.c
  #include <stdio.h>
  int main(void)
  {
  	char buf[4096];
  	FILE *fp = fopen("a.c", "r");
  	size_t len;
  
  	len = fread(buf, sizeof(buf), 1, fp);
  	fwrite(buf, 1, len, stdout);
  	fflush(stdout);
  	fclose(fp);
  	return 0;
  }
  
  $ gcc -o a64.out a.c
  $ gcc -o a32.out -m32 a.c
  
  $ ./perf version
  perf version 6.14.rc1.ge002a64f6188
  
  $ git show
  commit e002a64f61882626992dd6513c0db3711c06fea7 (HEAD -> perf-check)
  Author: Ian Rogers <irogers@google.com>
  Date:   Wed Feb 19 10:56:57 2025 -0800
  
      perf syscalltbl: Mask off ABI type for MIPS system calls
      
      Arnd Bergmann described that MIPS system calls don't necessarily start
      from 0 as an ABI prefix is applied:
      https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/
      When decoding the "id" (aka system call number) for MIPS ignore values
      greater-than 1000.
      
      Signed-off-by: Ian Rogers <irogers@google.com>
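
The masking described in that commit message can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the function name syscall_table_index() is hypothetical, and the modulo-1000 step relies on the MIPS convention that o32 system calls start at 4000, n64 at 5000 and n32 at 6000.

```c
#include <elf.h>

/*
 * Hypothetical helper: strip the MIPS ABI prefix from a system call number
 * so that it can index a table that starts at 0. Other machine types are
 * passed through unchanged.
 */
static int syscall_table_index(unsigned short e_machine, int id)
{
	if (e_machine == EM_MIPS && id > 1000)
		return id % 1000;
	return id;
}
```

With this, the o32 id 4003 and the n64 id 5003 both map to index 3 of their respective tables.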

It works well with the 64-bit binary.

  $ sudo ./perf trace ./a64.out |& tail
       0.266 ( 0.007 ms): a64.out/858681 munmap(addr: 0x7f392723a000, len: 109058)                             = 0
       0.286 ( 0.002 ms): a64.out/858681 getrandom(ubuf: 0x7f3927232178, len: 8, flags: NONBLOCK)              = 8
       0.289 ( 0.001 ms): a64.out/858681 brk()                                                                 = 0x56419ecf7000
       0.291 ( 0.002 ms): a64.out/858681 brk(brk: 0x56419ed18000)                                              = 0x56419ed18000
       0.299 ( 0.009 ms): a64.out/858681 openat(dfd: CWD, filename: "a.c")                                     = 3
       0.312 ( 0.001 ms): a64.out/858681 fstat(fd: 3, statbuf: 0x7ffdfadf1eb0)                                 = 0
       0.315 ( 0.002 ms): a64.out/858681 read(fd: 3, buf: 0x7ffdfadf2030, count: 4096)                         = 211
       0.318 ( 0.009 ms): a64.out/858681 read(fd: 3, buf: 0x56419ecf7480, count: 4096)                         = 0
       0.330 ( 0.001 ms): a64.out/858681 close(fd: 3)                                                          = 0
       0.338 (         ): a64.out/858681 exit_group()                                                          = ?

But 32-bit is still broken: it wrongly uses the 64-bit syscall table.

  $ file a32.out
  a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
  BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped

  $ sudo ./perf trace ./a32.out |& tail
       0.296 ( 0.001 ms): a32.out/858699 getxattr(pathname: "", name: "������", value: 0xf7f6ce14, size: 1)  = 0
       0.305 ( 0.007 ms): a32.out/858699 fchmod(fd: -134774784, mode: IFLNK|ISUID|ISVTX|IWOTH|0x10000)         = 0
       0.333 ( 0.001 ms): a32.out/858699 recvfrom(size: 4160146964, flags: RST|0x20000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1481879552
       0.335 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, ubuf: 0xf7f71278, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720
       0.355 ( 0.002 ms): a32.out/858699 recvfrom(fd: 1482018816, ubuf: 0x5855d000, size: 4160146964, flags: RST|NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482018816
       0.362 ( 0.010 ms): a32.out/858699 preadv(fd: 4294967196, vec: (struct iovec){.iov_base = (void *)0x1b01000000632e62,.iov_len = (__kernel_size_t)1125899909479171,}, pos_h: 4160146964) = 3
       0.385 ( 0.002 ms): a32.out/858699 close(fd: 3)                                                          = 211
       0.388 ( 0.001 ms): a32.out/858699 close(fd: 3)                                                          = 0
       0.393 ( 0.002 ms): a32.out/858699 lstat(filename: "")                                                   = 0
       0.396 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720

The last 5 should be openat, read, read, close and brk(?).
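
The mismatch can be reproduced with a small lookup keyed by ELF machine type. The sketch below is illustrative only: the struct and function names are hypothetical, and the tables are four-entry excerpts of the x86 32-bit and 64-bit system call numbering, just enough to cover the ids behind the last five lines of the trace above.

```c
#include <elf.h>
#include <stddef.h>

struct syscall_entry {
	int id;
	const char *name;
};

/* Excerpt of the i386 numbering. */
static const struct syscall_entry i386_excerpt[] = {
	{ 3, "read" }, { 6, "close" }, { 45, "brk" }, { 295, "openat" },
};

/* Excerpt of the x86_64 numbering for the same ids. */
static const struct syscall_entry x86_64_excerpt[] = {
	{ 3, "close" }, { 6, "lstat" }, { 45, "recvfrom" }, { 295, "preadv" },
};

/* Hypothetical lookup: pick the table by e_machine, then search by id. */
static const char *syscall_name(unsigned short e_machine, int id)
{
	const struct syscall_entry *tbl;
	size_t i, nr;

	switch (e_machine) {
	case EM_386:
		tbl = i386_excerpt;
		nr = sizeof(i386_excerpt) / sizeof(i386_excerpt[0]);
		break;
	case EM_X86_64:
		tbl = x86_64_excerpt;
		nr = sizeof(x86_64_excerpt) / sizeof(x86_64_excerpt[0]);
		break;
	default:
		return NULL; /* machine type not covered by this excerpt */
	}

	for (i = 0; i < nr; i++) {
		if (tbl[i].id == id)
			return tbl[i].name;
	}
	return NULL;
}
```

Decoding ids 295, 3, 3, 6 and 45 with the x86_64 excerpt gives preadv, close, close, lstat and recvfrom, as in the broken trace. The i386 excerpt gives openat, read, read, close and brk, which also fits the return values shown (3 for the new fd, 211 for the bytes read).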

Thanks,
Namhyung

Namhyung Kim Feb. 25, 2025, 3:20 a.m. UTC | #2
On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> This work builds on the clean up of system call tables and removal of
> libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> 
> The system call table in perf trace is used to map system call numbers
> to names and vice versa. Prior to these changes, a single table
> matching the perf binary's build was present. The table would be
> incorrect if tracing, say, a 32-bit binary from a 64-bit version of
> perf: the names and numbers wouldn't match.
>
> Change the build so that a single system call file is built and the
> potentially multiple tables are identifiable from the ELF machine type
> of the process being examined. To determine the ELF machine type, the
> executable's header is read from /proc/pid/exe, falling back to the
> perf binary's own type when unknown.

Hmm.. then this is limited to live mode, and it could detect the wrong
machine type when reading old data, right?

Also, IIUC, falling back to the perf binary means it cannot use a
cross-machine table.  For example, it cannot process data from ARM64 on
x86, no?  It seems it should use perf_env.arch.

One more concern is BPF.  The BPF should know about the ABI of the
current process so that it can augment the syscall arguments correctly.
Currently it only checks the syscall number but it can be different on
32-bit and 64-bit.

Thanks,
Namhyung


Ian Rogers Feb. 25, 2025, 4:22 a.m. UTC | #3
On Mon, Feb 24, 2025 at 7:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > This work builds on the clean up of system call tables and removal of
> > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> >
> > The system call table in perf trace is used to map system call numbers
> > to names and vice versa. Prior to these changes, a single table
> > matching the perf binary's build was present. The table would be
> > incorrect if tracing, say, a 32-bit binary from a 64-bit version of
> > perf: the names and numbers wouldn't match.
> >
> > Change the build so that a single system call file is built and the
> > potentially multiple tables are identifiable from the ELF machine type
> > of the process being examined. To determine the ELF machine type, the
> > executable's header is read from /proc/pid/exe, falling back to the
> > perf binary's own type when unknown.
>
> Hmm.. then this is limited to live mode, and it could detect the wrong
> machine type when reading old data, right?
>
> Also, IIUC, falling back to the perf binary means it cannot use a
> cross-machine table.  For example, it cannot process data from ARM64 on
> x86, no?  It seems it should use perf_env.arch.

The perf env arch is kind of horrid. On x86 it has the value x86 and
then there is an extra 64bit flag; who knows how x32 should be
encoded, though we barely support x32 as-is. I'd rather we added a new
feature for the e_machine/e_flags of the executable and worked with
those, but that is awkward in system-wide mode. I didn't want to drag
that into this patch series anyway as there is already enough here.

> One more concern is BPF.  The BPF should know about the ABI of the
> current process so that it can augment the syscall arguments correctly.
> Currently it only checks the syscall number but it can be different on
> 32-bit and 64-bit.

That's right. This change is trying to clean up
tools/perf/util/syscalltbl.c and the perf trace usage. I didn't go as
far as making BPF programs pair the system call number with e_machine
and e_flags. There is enough here already, and the behavior after
these patches matches the behavior before: the system call ABI is
assumed to match that of the perf binary.

Thanks,
Ian

> Thanks,
> Namhyung
>
Ian Rogers Feb. 25, 2025, 4:37 a.m. UTC | #4
On Mon, Feb 24, 2025 at 7:05 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > This work builds on the clean up of system call tables and removal of
> > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> >
> > The system call table in perf trace is used to map system call numbers
> > to names and vice versa. Prior to these changes, a single table
> > matching the perf binary's build was present. The table would be
> > incorrect if tracing say a 32-bit binary from a 64-bit version of
> > perf, the names and numbers wouldn't match.
> >
> > Change the build so that a single system call file is built and the
> > potentially multiple tables are identifiable from the ELF machine type
> > of the process being examined. To determine the ELF machine type, the
> > executable's header is read from /proc/pid/exe with fallbacks to using
> > the perf's binary type when unknown.
> >
> > Remove some runtime types used by the system call tables and make
> > equivalents generated at build time.
>
> So I tested this with a test program.
>
>   $ cat a.c
>   #include <stdio.h>
>   int main(void)
>   {
>         char buf[4096];
>         FILE *fp = fopen("a.c", "r");
>         size_t len;
>
>         len = fread(buf, sizeof(buf), 1, fp);
>         fwrite(buf, 1, len, stdout);
>         fflush(stdout);
>         fclose(fp);
>         return 0;
>   }
>
>   $ gcc -o a64.out a.c
>   $ gcc -o a32.out -m32 a.c
>
>   $ ./perf version
>   perf version 6.14.rc1.ge002a64f6188
>
>   $ git show
>   commit e002a64f61882626992dd6513c0db3711c06fea7 (HEAD -> perf-check)
>   Author: Ian Rogers <irogers@google.com>
>   Date:   Wed Feb 19 10:56:57 2025 -0800
>
>       perf syscalltbl: Mask off ABI type for MIPS system calls
>
>       Arnd Bergmann described that MIPS system calls don't necessarily start
>       from 0 as an ABI prefix is applied:
>       https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/
>       When decoding the "id" (aka system call number) for MIPS ignore values
>       greater-than 1000.
>
>       Signed-off-by: Ian Rogers <irogers@google.com>
>
> It works well with 64bit.
>
>   $ sudo ./perf trace ./a64.out |& tail
>        0.266 ( 0.007 ms): a64.out/858681 munmap(addr: 0x7f392723a000, len: 109058)                             = 0
>        0.286 ( 0.002 ms): a64.out/858681 getrandom(ubuf: 0x7f3927232178, len: 8, flags: NONBLOCK)              = 8
>        0.289 ( 0.001 ms): a64.out/858681 brk()                                                                 = 0x56419ecf7000
>        0.291 ( 0.002 ms): a64.out/858681 brk(brk: 0x56419ed18000)                                              = 0x56419ed18000
>        0.299 ( 0.009 ms): a64.out/858681 openat(dfd: CWD, filename: "a.c")                                     = 3
>        0.312 ( 0.001 ms): a64.out/858681 fstat(fd: 3, statbuf: 0x7ffdfadf1eb0)                                 = 0
>        0.315 ( 0.002 ms): a64.out/858681 read(fd: 3, buf: 0x7ffdfadf2030, count: 4096)                         = 211
>        0.318 ( 0.009 ms): a64.out/858681 read(fd: 3, buf: 0x56419ecf7480, count: 4096)                         = 0
>        0.330 ( 0.001 ms): a64.out/858681 close(fd: 3)                                                          = 0
>        0.338 (         ): a64.out/858681 exit_group()                                                          = ?
>
> But 32bit is still broken and use 64bit syscall table wrongly.
>
>   $ file a32.out
>   a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
>   BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped
>
>   $ sudo ./perf trace ./a32.out |& tail
>        0.296 ( 0.001 ms): a32.out/858699 getxattr(pathname: "", name: "������", value: 0xf7f6ce14, size: 1)  = 0
>        0.305 ( 0.007 ms): a32.out/858699 fchmod(fd: -134774784, mode: IFLNK|ISUID|ISVTX|IWOTH|0x10000)         = 0
>        0.333 ( 0.001 ms): a32.out/858699 recvfrom(size: 4160146964, flags: RST|0x20000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1481879552
>        0.335 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, ubuf: 0xf7f71278, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720
>        0.355 ( 0.002 ms): a32.out/858699 recvfrom(fd: 1482018816, ubuf: 0x5855d000, size: 4160146964, flags: RST|NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482018816
>        0.362 ( 0.010 ms): a32.out/858699 preadv(fd: 4294967196, vec: (struct iovec){.iov_base = (void *)0x1b01000000632e62,.iov_len = (__kernel_size_t)1125899909479171,}, pos_h: 4160146964) = 3
>        0.385 ( 0.002 ms): a32.out/858699 close(fd: 3)                                                          = 211
>        0.388 ( 0.001 ms): a32.out/858699 close(fd: 3)                                                          = 0
>        0.393 ( 0.002 ms): a32.out/858699 lstat(filename: "")                                                   = 0
>        0.396 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720
>
> The last 5 should be openat, read, read, close and brk(?).

That's strange as nearly the same test works for me:
```
$ git show
commit 7920020237af8138f7be1a21be9a2918a71ddc5e (HEAD -> ptn-syscalltbl)
Author: Ian Rogers <irogers@google.com>
Date:   Fri Jan 31 21:34:07 2025 -0800

   perf syscalltbl: Mask off ABI type for MIPS system calls

   Arnd Bergmann described that MIPS system calls don't necessarily start
   from 0 as an ABI prefix is applied:
   https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/
   When decoding the "id" (aka system call number) for MIPS ignore values
   greater-than 1000.

   Signed-off-by: Ian Rogers <irogers@google.com>
..
$ file a.out
a.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=3fcd28f85a27a3108941661a91dbc675c06868f9, for GNU/Linux 3.2.0, not stripped
$ sudo /tmp/perf/perf trace ./a.out
...
         ? (         ): a.out/218604  ... [continued]: execve())                                      = 0
     0.067 ( 0.003 ms): a.out/218604 brk()                                                             = 0x5749e000
     0.154 ( 0.007 ms): a.out/218604 access(filename: 0xf7fc7f28, mode: R)                             = -1 ENOENT (No such file or directory)
     0.168 ( 0.023 ms): a.out/218604 openat(dfd: CWD, filename: 0xf7fc44c3, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
     0.193 ( 0.006 ms): a.out/218604 statx(dfd: 3</proc/218604/status>, filename: 0xf7fc510a, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MODE|NLINK|UID|GID|ATIME|MTIME|CTIME|INO|SIZE|BLOCKS, buffer: 0xffaa6b88) = 0
     0.212 ( 0.002 ms): a.out/218604 close(fd: 3</proc/218604/status>)                                 = 0
     0.233 ( 0.019 ms): a.out/218604 openat(dfd: CWD, filename: 0xf7f973e0, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
     0.255 ( 0.004 ms): a.out/218604 read(fd: 3</proc/218604/status>, buf: 0xffaa6df0, count: 512)     = 512
     0.262 ( 0.003 ms): a.out/218604 statx(dfd: 3</proc/218604/status>, filename: 0xf7fc510a, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MODE|NLINK|UID|GID|ATIME|MTIME|CTIME|INO|SIZE|BLOCKS, buffer: 0xffaa6b38) = 0
     0.347 ( 0.002 ms): a.out/218604 close(fd: 3</proc/218604/status>)                                 = 0
     0.372 ( 0.002 ms): a.out/218604 set_tid_address(tidptr: 0xf7f98528)                               = 218604 (a.out)
     0.376 ( 0.002 ms): a.out/218604 set_robust_list(head: 0xf7f9852c, len: 12)                        =
     0.381 ( 0.002 ms): a.out/218604 rseq(rseq: 0xf7f98960, rseq_len: 32, sig: 1392848979)             =
     0.469 ( 0.010 ms): a.out/218604 mprotect(start: 0xf7f6e000, len: 8192, prot: READ)                = 0
     0.489 ( 0.007 ms): a.out/218604 mprotect(start: 0x5661a000, len: 4096, prot: READ)                = 0
     0.503 ( 0.007 ms): a.out/218604 mprotect(start: 0xf7fd0000, len: 8192, prot: READ)                = 0
     0.550 ( 0.015 ms): a.out/218604 munmap(addr: 0xf7f7b000, len: 111198)                             = 0
     0.589 ( 0.035 ms): a.out/218604 openat(dfd: CWD, filename: 0x56619008)                            = 3
     0.627 ( 0.024 ms): a.out/218604 read(fd: 3</proc/218604/status>, buf: 0xffaa68fc, count: 4096)    = 1437
     0.654 ( 0.090 ms): a.out/218604 write(fd: 1</dev/pts/3>, buf: , count: 1437)                      = 1437
     0.766 (1000.164 ms): a.out/218604 clock_nanosleep(rqtp: 0xffaa6824, rmtp: 0xffaa681c)             = 0
  1000.942 (         ): a.out/218604 exit_group()
$ file /tmp/perf/perf
/tmp/perf/perf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=60b07f65d2559a7193b2d1d36cfa00054dfbd076, for GNU/Linux 3.2.0, with debug_info, not stripped
```
Perhaps your a.out binary was built as an x32 one?
Looking under the covers with gdb:
```
$ sudo gdb --args /tmp/perf/perf trace ./a.out
GNU gdb (Debian 15.1-1) 15.1
...
Reading symbols from /tmp/perf/perf...
(gdb) b syscalltbl__name
Breakpoint 1 at 0x23a51b: file util/syscalltbl.c, line 47.
(gdb) r
...
[Detaching after vfork from child process 218826]

Breakpoint 1, syscalltbl__name (e_machine=3, id=11) at util/syscalltbl.c:47
47              const struct syscalltbl *table = find_table(e_machine);
```
So the e_machine is 3 which corresponds to EM_386.
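
For reference, e_machine is a 16-bit field at byte offset 18 of the ELF
header, in both the 32-bit and 64-bit layouts. A minimal sketch of
reading it from a file follows; the path could be something like
/proc/<pid>/exe. The helper name read_e_machine is hypothetical and this
is not the function the series adds to tools/perf/util/thread.c.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read e_machine from the ELF header of 'path'.
 * Returns 0 on success and stores the value in *e_machine, or -1 if the
 * file cannot be read or does not start with the ELF magic.
 */
int read_e_machine(const char *path, uint16_t *e_machine)
{
	/* e_ident (16 bytes) + e_type (2 bytes) + e_machine (2 bytes). */
	unsigned char hdr[EI_NIDENT + 4];
	FILE *fp = fopen(path, "rb");
	size_t n;

	if (!fp)
		return -1;
	n = fread(hdr, 1, sizeof(hdr), fp);
	fclose(fp);

	if (n != sizeof(hdr) || memcmp(hdr, ELFMAG, SELFMAG) != 0)
		return -1;

	/* EI_DATA gives the byte order used by the multi-byte header fields. */
	if (hdr[EI_DATA] == ELFDATA2MSB)
		*e_machine = (uint16_t)((hdr[18] << 8) | hdr[19]);
	else
		*e_machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
	return 0;
}
```

On a binary built with -m32 on x86 this would be expected to yield 3
(EM_386), and 62 (EM_X86_64) for a native 64-bit build.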

I've not fixed every use of syscalltbl but I believe this one is working.

Thanks,
Ian

Namhyung Kim Feb. 25, 2025, 5:40 a.m. UTC | #5
On Mon, Feb 24, 2025 at 08:37:01PM -0800, Ian Rogers wrote:
> On Mon, Feb 24, 2025 at 7:05 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > > This work builds on the clean up of system call tables and removal of
> > > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> > >
> > > The system call table in perf trace is used to map system call numbers
> > > to names and vice versa. Prior to these changes, a single table
> > > matching the perf binary's build was present. The table would be
> > > incorrect if tracing say a 32-bit binary from a 64-bit version of
> > > perf, the names and numbers wouldn't match.
> > >
> > > Change the build so that a single system call file is built and the
> > > potentially multiple tables are identifiable from the ELF machine type
> > > of the process being examined. To determine the ELF machine type, the
> > > executable's header is read from /proc/pid/exe with fallbacks to using
> > > the perf's binary type when unknown.
> > >
> > > Remove some runtime types used by the system call tables and make
> > > equivalents generated at build time.
> >
> > So I tested this with a test program.
> >
> >   $ cat a.c
> >   #include <stdio.h>
> >   int main(void)
> >   {
> >         char buf[4096];
> >         FILE *fp = fopen("a.c", "r");
> >         size_t len;
> >
> >         len = fread(buf, sizeof(buf), 1, fp);
> >         fwrite(buf, 1, len, stdout);
> >         fflush(stdout);
> >         fclose(fp);
> >         return 0;
> >   }
> >
> >   $ gcc -o a64.out a.c
> >   $ gcc -o a32.out -m32 a.c
> >
> >   $ ./perf version
> >   perf version 6.14.rc1.ge002a64f6188
> >
> >   $ git show
> >   commit e002a64f61882626992dd6513c0db3711c06fea7 (HEAD -> perf-check)
> >   Author: Ian Rogers <irogers@google.com>
> >   Date:   Wed Feb 19 10:56:57 2025 -0800
> >
> >       perf syscalltbl: Mask off ABI type for MIPS system calls
> >
> >       Arnd Bergmann described that MIPS system calls don't necessarily start
> >       from 0 as an ABI prefix is applied:
> >       https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/
> >       When decoding the "id" (aka system call number) for MIPS ignore values
> >       greater-than 1000.
> >
> >       Signed-off-by: Ian Rogers <irogers@google.com>
> >
> > It works well with 64bit.
> >
> >   $ sudo ./perf trace ./a64.out |& tail
> >        0.266 ( 0.007 ms): a64.out/858681 munmap(addr: 0x7f392723a000, len: 109058)                             = 0
> >        0.286 ( 0.002 ms): a64.out/858681 getrandom(ubuf: 0x7f3927232178, len: 8, flags: NONBLOCK)              = 8
> >        0.289 ( 0.001 ms): a64.out/858681 brk()                                                                 = 0x56419ecf7000
> >        0.291 ( 0.002 ms): a64.out/858681 brk(brk: 0x56419ed18000)                                              = 0x56419ed18000
> >        0.299 ( 0.009 ms): a64.out/858681 openat(dfd: CWD, filename: "a.c")                                     = 3
> >        0.312 ( 0.001 ms): a64.out/858681 fstat(fd: 3, statbuf: 0x7ffdfadf1eb0)                                 = 0
> >        0.315 ( 0.002 ms): a64.out/858681 read(fd: 3, buf: 0x7ffdfadf2030, count: 4096)                         = 211
> >        0.318 ( 0.009 ms): a64.out/858681 read(fd: 3, buf: 0x56419ecf7480, count: 4096)                         = 0
> >        0.330 ( 0.001 ms): a64.out/858681 close(fd: 3)                                                          = 0
> >        0.338 (         ): a64.out/858681 exit_group()                                                          = ?
> >
> > But 32bit is still broken and use 64bit syscall table wrongly.
> >
> >   $ file a32.out
> >   a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
> >   BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped
> >
> >   $ sudo ./perf trace ./a32.out |& tail
> >        0.296 ( 0.001 ms): a32.out/858699 getxattr(pathname: "", name: "������", value: 0xf7f6ce14, size: 1)  = 0
> >        0.305 ( 0.007 ms): a32.out/858699 fchmod(fd: -134774784, mode: IFLNK|ISUID|ISVTX|IWOTH|0x10000)         = 0
> >        0.333 ( 0.001 ms): a32.out/858699 recvfrom(size: 4160146964, flags: RST|0x20000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1481879552
> >        0.335 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, ubuf: 0xf7f71278, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720
> >        0.355 ( 0.002 ms): a32.out/858699 recvfrom(fd: 1482018816, ubuf: 0x5855d000, size: 4160146964, flags: RST|NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482018816
> >        0.362 ( 0.010 ms): a32.out/858699 preadv(fd: 4294967196, vec: (struct iovec){.iov_base = (void *)0x1b01000000632e62,.iov_len = (__kernel_size_t)1125899909479171,}, pos_h: 4160146964) = 3
> >        0.385 ( 0.002 ms): a32.out/858699 close(fd: 3)                                                          = 211
> >        0.388 ( 0.001 ms): a32.out/858699 close(fd: 3)                                                          = 0
> >        0.393 ( 0.002 ms): a32.out/858699 lstat(filename: "")                                                   = 0
> >        0.396 ( 0.004 ms): a32.out/858699 recvfrom(fd: 1482014720, size: 4160146964, flags: NOSIGNAL|MORE|WAITFORONE|BATCH|SPLICE_PAGES|CMSG_CLOEXEC|0x10500000, addr: 0xf7f6ce14, addr_len: 0xf7f71278) = 1482014720
> >
> > The last 5 should be openat, read, read, close and brk(?).
> 
> That's strange as nearly the same test works for me:
> ```
> $ git show
> commit 7920020237af8138f7be1a21be9a2918a71ddc5e (HEAD -> ptn-syscalltbl)
> Author: Ian Rogers <irogers@google.com>
> Date:   Fri Jan 31 21:34:07 2025 -0800
> 
>    perf syscalltbl: Mask off ABI type for MIPS system calls
> 
>    Arnd Bergmann described that MIPS system calls don't necessarily start
>    from 0 as an ABI prefix is applied:
>    https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/
>    When decoding the "id" (aka system call number) for MIPS ignore values
>    greater-than 1000.
> 
>    Signed-off-by: Ian Rogers <irogers@google.com>
> ..
> $ file a.out
> a.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=3fcd28f85a27a3108941661a91dbc675c06868f9, for GNU/Linux 3.2.0, not stripped
> $ sudo /tmp/perf/perf trace ./a.out
> ...
>          ? (         ): a.out/218604  ... [continued]: execve())                                     = 0
>      0.067 ( 0.003 ms): a.out/218604 brk()                                     = 0x5749e000
>      0.154 ( 0.007 ms): a.out/218604 access(filename: 0xf7fc7f28, mode: R)                                 = -1 ENOENT (No such file or directory)
>      0.168 ( 0.023 ms): a.out/218604 openat(dfd: CWD, filename: 0xf7fc44c3, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
>      0.193 ( 0.006 ms): a.out/218604 statx(dfd: 3</proc/218604/status>, filename: 0xf7fc510a, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MODE|NLINK|UID|GID|ATIME|MTIME|CTIME|INO|SIZE|BLOCKS, buffer: 0xffaa6b88) = 0
>      0.212 ( 0.002 ms): a.out/218604 close(fd: 3</proc/218604/status>)                                     = 0
>      0.233 ( 0.019 ms): a.out/218604 openat(dfd: CWD, filename: 0xf7f973e0, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
>      0.255 ( 0.004 ms): a.out/218604 read(fd: 3</proc/218604/status>, buf: 0xffaa6df0, count: 512)         = 512
>      0.262 ( 0.003 ms): a.out/218604 statx(dfd: 3</proc/218604/status>, filename: 0xf7fc510a, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MODE|NLINK|UID|GID|ATIME|MTIME|CTIME|INO|SIZE|BLOCKS, buffer: 0xffaa6b38) = 0
>      0.347 ( 0.002 ms): a.out/218604 close(fd: 3</proc/218604/status>)                                     = 0
>      0.372 ( 0.002 ms): a.out/218604 set_tid_address(tidptr: 0xf7f98528)                                   = 218604 (a.out)
>      0.376 ( 0.002 ms): a.out/218604 set_robust_list(head: 0xf7f9852c, len: 12)                            =
>      0.381 ( 0.002 ms): a.out/218604 rseq(rseq: 0xf7f98960, rseq_len: 32, sig: 1392848979)                 =
>      0.469 ( 0.010 ms): a.out/218604 mprotect(start: 0xf7f6e000, len: 8192, prot: READ)                    = 0
>      0.489 ( 0.007 ms): a.out/218604 mprotect(start: 0x5661a000, len: 4096, prot: READ)                    = 0
>      0.503 ( 0.007 ms): a.out/218604 mprotect(start: 0xf7fd0000, len: 8192, prot: READ)                    = 0
>      0.550 ( 0.015 ms): a.out/218604 munmap(addr: 0xf7f7b000, len: 111198)                                 = 0
>      0.589 ( 0.035 ms): a.out/218604 openat(dfd: CWD, filename: 0x56619008)                                = 3
>      0.627 ( 0.024 ms): a.out/218604 read(fd: 3</proc/218604/status>, buf: 0xffaa68fc, count: 4096)        = 1437
>      0.654 ( 0.090 ms): a.out/218604 write(fd: 1</dev/pts/3>, buf: , count: 1437)                          = 1437
>      0.766 (1000.164 ms): a.out/218604 clock_nanosleep(rqtp: 0xffaa6824, rmtp: 0xffaa681c)                   = 0
>   1000.942 (         ): a.out/218604 exit_group()
> $ file /tmp/perf/perf
> /tmp/perf/perf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=60b07f65d2559a7193b2d1d36cfa00054dfbd076, for GNU/Linux 3.2.0, with debug_info, not stripped
> ```
> Perhaps your a.out binary was built as an x32 one?
> Looking under the covers with gdb:
> ```
> $ sudo gdb --args /tmp/perf/perf trace ./a.out
> GNU gdb (Debian 15.1-1) 15.1
> ...
> Reading symbols from /tmp/perf/perf...
> (gdb) b syscalltbl__name
> Breakpoint 1 at 0x23a51b: file util/syscalltbl.c, line 47.
> (gdb) r
> ...
> [Detaching after vfork from child process 218826]
> 
> Breakpoint 1, syscalltbl__name (e_machine=3, id=11) at util/syscalltbl.c:47
> 47              const struct syscalltbl *table = find_table(e_machine);
> ```
> So the e_machine is 3 which corresponds to EM_386.
> 
> I've not fixed every use of syscalltbl but I believe this one is working.

Strange.  I'm seeing 62 (x86_64).

  $ sudo gdb -q --args ./perf trace ./a32.out
  Reading symbols from ./perf...
  (gdb) b syscalltbl__name
  Breakpoint 1 at 0x27998b: file util/syscalltbl.c, line 46.
  (gdb) r
  Starting program: /home/namhyung/tmp/perf trace ./a32.out
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  [Detaching after fork from child process 886888]
  
  Breakpoint 1, syscalltbl__name (e_machine=62, id=156) at util/syscalltbl.c:46
  46	{
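
For illustration only (this is not the perf code): the quoted a32.out trace
is what a lookup keyed on the wrong e_machine would produce.  On i386, ids
3, 6, 45 and 295 are read, close, brk and openat; in the x86_64 table the
same ids are close, lstat, recvfrom and preadv, which matches the names
printed in that trace.  A minimal sketch, with hypothetical `demo_` names and
a hand-picked four-entry excerpt of each table:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ELF e_machine values (same numbers as EM_386 and EM_X86_64 in <elf.h>). */
enum { DEMO_EM_386 = 3, DEMO_EM_X86_64 = 62 };

struct demo_syscall {
	int id;
	const char *name;
};

/* Hand-picked excerpts; the real tables are generated at build time. */
static const struct demo_syscall demo_i386[] = {
	{ 3, "read" }, { 6, "close" }, { 45, "brk" }, { 295, "openat" },
};

static const struct demo_syscall demo_x86_64[] = {
	{ 3, "close" }, { 6, "lstat" }, { 45, "recvfrom" }, { 295, "preadv" },
};

/* Select the excerpt for e_machine; returns NULL for an unknown machine. */
static const struct demo_syscall *demo_find_table(int e_machine, size_t *nr)
{
	switch (e_machine) {
	case DEMO_EM_386:
		*nr = sizeof(demo_i386) / sizeof(demo_i386[0]);
		return demo_i386;
	case DEMO_EM_X86_64:
		*nr = sizeof(demo_x86_64) / sizeof(demo_x86_64[0]);
		return demo_x86_64;
	default:
		*nr = 0;
		return NULL;
	}
}

/* Number to name; NULL when the id is not in the excerpt. */
const char *demo_syscall_name(int e_machine, int id)
{
	size_t nr, i;
	const struct demo_syscall *table = demo_find_table(e_machine, &nr);

	for (i = 0; i < nr; i++) {
		if (table[i].id == id)
			return table[i].name;
	}
	return NULL;
}

/* Name to number; -1 when the name is not in the excerpt. */
int demo_syscall_id(int e_machine, const char *name)
{
	size_t nr, i;
	const struct demo_syscall *table = demo_find_table(e_machine, &nr);

	assert(name != NULL);
	for (i = 0; i < nr; i++) {
		if (strcmp(table[i].name, name) == 0)
			return table[i].id;
	}
	return -1;
}
```

With these excerpts, demo_syscall_name(62, 295) gives "preadv" while
demo_syscall_name(3, 295) gives "openat", so the e_machine passed in decides
whether the trace is readable.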

But the binary is i386.

  $ file a32.out
  a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
  BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped
  
  $ readelf -h a32.out
  ELF Header:
    Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
    Class:                             ELF32
    Data:                              2's complement, little endian
    Version:                           1 (current)
    OS/ABI:                            UNIX - System V
    ABI Version:                       0
    Type:                              DYN (Position-Independent Executable file)
    Machine:                           Intel 80386
    Version:                           0x1
    Entry point address:               0x10a0
    Start of program headers:          52 (bytes into file)
    Start of section headers:          13932 (bytes into file)
    Flags:                             0x0
    Size of this header:               52 (bytes)
    Size of program headers:           32 (bytes)
    Number of program headers:         11
    Size of section headers:           40 (bytes)
    Number of section headers:         30
    Section header string table index: 29

  $ hexdump -C -n 32 a32.out
  00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
  00000010  03 00 03 00 01 00 00 00  a0 10 00 00 34 00 00 00  |............4...|
  00000020  ----- -----
              ^     ^
              |     |
           ET_DYN   |
                  EM_386
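
The same two fields can be pulled out of the header bytes in code.  A
minimal sketch, assuming only the ELF header layout (e_type at offset 16,
e_machine at offset 18, byte order in e_ident[EI_DATA]); the `demo_` names
are made up for illustration and are not perf functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets within the ELF header; identical for ELF32 and ELF64. */
enum {
	DEMO_EI_DATA = 5,        /* e_ident[] index of the byte-order flag */
	DEMO_E_TYPE_OFF = 16,    /* e_type, 2 bytes */
	DEMO_E_MACHINE_OFF = 18, /* e_machine, 2 bytes */
	DEMO_MIN_HDR = 20        /* bytes needed to reach the end of e_machine */
};

/* Read a 16-bit field; e_ident[EI_DATA] is 1 for little, 2 for big endian. */
static uint16_t demo_read_u16(const unsigned char *hdr, size_t off)
{
	if (hdr[DEMO_EI_DATA] == 2)
		return (uint16_t)((hdr[off] << 8) | hdr[off + 1]);
	return (uint16_t)(hdr[off] | (hdr[off + 1] << 8));
}

/* Nonzero when the buffer is long enough and starts with "\177ELF". */
static int demo_is_elf(const unsigned char *hdr, size_t len)
{
	return len >= DEMO_MIN_HDR && hdr[0] == 0x7f && hdr[1] == 'E' &&
	       hdr[2] == 'L' && hdr[3] == 'F';
}

/* Return e_type, or -1 if the buffer does not look like an ELF header. */
int demo_elf_type(const unsigned char *hdr, size_t len)
{
	assert(hdr != NULL);
	return demo_is_elf(hdr, len) ? demo_read_u16(hdr, DEMO_E_TYPE_OFF) : -1;
}

/* Return e_machine, or -1 if the buffer does not look like an ELF header. */
int demo_elf_machine(const unsigned char *hdr, size_t len)
{
	assert(hdr != NULL);
	return demo_is_elf(hdr, len) ? demo_read_u16(hdr, DEMO_E_MACHINE_OFF) : -1;
}
```

Fed the 32 bytes from the hexdump, both helpers return 3, i.e. ET_DYN and
EM_386.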

Thanks,
Namhyung
Namhyung Kim Feb. 26, 2025, 2:47 a.m. UTC | #6
On Mon, Feb 24, 2025 at 09:40:55PM -0800, Namhyung Kim wrote:
> Strange.  I'm seeing 62 (x86_64).
> 
>   $ sudo gdb -q --args ./perf trace ./a32.out
>   Reading symbols from ./perf...
>   (gdb) b syscalltbl__name
>   Breakpoint 1 at 0x27998b: file util/syscalltbl.c, line 46.
>   (gdb) r
>   Starting program: /home/namhyung/tmp/perf trace ./a32.out
>   [Thread debugging using libthread_db enabled]
>   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>   [Detaching after fork from child process 886888]
>   
>   Breakpoint 1, syscalltbl__name (e_machine=62, id=156) at util/syscalltbl.c:46
>   46	{
> 
> But the binary is i386.
> 
>   $ file a32.out
>   a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
>   BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped
>   
>   $ readelf -h a32.out
>   ELF Header:
>     Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
>     Class:                             ELF32
>     Data:                              2's complement, little endian
>     Version:                           1 (current)
>     OS/ABI:                            UNIX - System V
>     ABI Version:                       0
>     Type:                              DYN (Position-Independent Executable file)
>     Machine:                           Intel 80386
>     Version:                           0x1
>     Entry point address:               0x10a0
>     Start of program headers:          52 (bytes into file)
>     Start of section headers:          13932 (bytes into file)
>     Flags:                             0x0
>     Size of this header:               52 (bytes)
>     Size of program headers:           32 (bytes)
>     Number of program headers:         11
>     Size of section headers:           40 (bytes)
>     Number of section headers:         30
>     Section header string table index: 29
> 
>   $ hexdump -C -n 32 a32.out
>   00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
>   00000010  03 00 03 00 01 00 00 00  a0 10 00 00 34 00 00 00  |............4...|
>   00000020  ----- -----
>               ^     ^
>               |     |
>            ET_DYN   |
>                   EM_386
> 

I found that it fails to open /proc/PID/exe for some reason.  The open fails
with ENOENT even though I've confirmed that the /proc/PID directory exists.
Strange...
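
That would explain the 62: per the cover letter, when the executable's
header cannot be read from /proc/pid/exe, the lookup falls back to perf's
own binary type, which is EM_X86_64 (62) for a 64-bit x86 perf.  A rough
sketch of that behaviour, not the actual perf implementation; the function
names and the explicit `fallback` parameter are assumptions made for
illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Return e_machine read from the ELF header of the file at path, or
 * fallback when the file cannot be opened (e.g. ENOENT) or is not ELF.
 */
int demo_exe_e_machine(const char *path, int fallback)
{
	unsigned char hdr[20];
	size_t n;
	FILE *fp;

	assert(path != NULL);
	fp = fopen(path, "rb");
	if (!fp)
		return fallback;

	n = fread(hdr, 1, sizeof(hdr), fp);
	fclose(fp);
	if (n != sizeof(hdr) || hdr[0] != 0x7f || hdr[1] != 'E' ||
	    hdr[2] != 'L' || hdr[3] != 'F')
		return fallback;

	/* e_machine is 2 bytes at offset 18; e_ident[5] == 2 means big endian. */
	if (hdr[5] == 2)
		return (hdr[18] << 8) | hdr[19];
	return hdr[18] | (hdr[19] << 8);
}

/* Same lookup for a process id, going through /proc/<pid>/exe. */
int demo_pid_e_machine(int pid, int fallback)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%d/exe", pid);
	return demo_exe_e_machine(path, fallback);
}
```

In this sketch any open failure silently yields the fallback value, so a
32-bit tracee would end up being decoded with the 64-bit table.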

Thanks,
Namhyung
Namhyung Kim Feb. 26, 2025, 11:47 p.m. UTC | #7
On Tue, Feb 25, 2025 at 06:47:58PM -0800, Namhyung Kim wrote:
> On Mon, Feb 24, 2025 at 09:40:55PM -0800, Namhyung Kim wrote:
> > Strange.  I'm seeing 62 (x86_64).
> > 
> >   $ sudo gdb -q --args ./perf trace ./a32.out
> >   Reading symbols from ./perf...
> >   (gdb) b syscalltbl__name
> >   Breakpoint 1 at 0x27998b: file util/syscalltbl.c, line 46.
> >   (gdb) r
> >   Starting program: /home/namhyung/tmp/perf trace ./a32.out
> >   [Thread debugging using libthread_db enabled]
> >   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> >   [Detaching after fork from child process 886888]
> >   
> >   Breakpoint 1, syscalltbl__name (e_machine=62, id=156) at util/syscalltbl.c:46
> >   46	{
> > 
> > But the binary is i386.
> > 
> >   $ file a32.out
> >   a32.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2,
> >   BuildID[sha1]=6eea873c939012e6c715e8f030261642bf61cb4e, for GNU/Linux 3.2.0, not stripped
> >   
> >   $ readelf -h a32.out
> >   ELF Header:
> >     Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
> >     Class:                             ELF32
> >     Data:                              2's complement, little endian
> >     Version:                           1 (current)
> >     OS/ABI:                            UNIX - System V
> >     ABI Version:                       0
> >     Type:                              DYN (Position-Independent Executable file)
> >     Machine:                           Intel 80386
> >     Version:                           0x1
> >     Entry point address:               0x10a0
> >     Start of program headers:          52 (bytes into file)
> >     Start of section headers:          13932 (bytes into file)
> >     Flags:                             0x0
> >     Size of this header:               52 (bytes)
> >     Size of program headers:           32 (bytes)
> >     Number of program headers:         11
> >     Size of section headers:           40 (bytes)
> >     Number of section headers:         30
> >     Section header string table index: 29
> > 
> >   $ hexdump -C -n 32 a32.out
> >   00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
> >   00000010  03 00 03 00 01 00 00 00  a0 10 00 00 34 00 00 00  |............4...|
> >   00000020  ----- -----
> >               ^     ^
> > 	      |     |
> > 	   ET_DYN   |
> > 	          EM_386
> > 
> 
> I found it failed to open /proc/PID/exe for some reason.  It failed with
> ENOENT but I've confirmed there's /proc/PID directory.  Strange...

It sometimes succeeded and showed the correct syscall names. :(

I don't know what the problem is on my machine.  But I think this is a
pre-existing problem and this patch improves it.
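
For reference, the lookup being discussed can be sketched as follows. This is a minimal illustration, not the code from the series: the helper names `elf_hdr_e_machine()` and `pid_e_machine()` are hypothetical, and the caller passes the fallback value in rather than using perf's EM_HOST. It decodes e_machine from the same header bytes shown in the hexdump above (offset 18, two bytes, byte order taken from EI_DATA) and falls back when /proc/PID/exe cannot be opened, as in the ENOENT case.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: decode e_machine from the first bytes of an ELF file.
 * Returns EM_NONE when the buffer is too short or lacks the ELF magic.
 */
static uint16_t elf_hdr_e_machine(const unsigned char *hdr, size_t len)
{
	if (len < 20 || memcmp(hdr, ELFMAG, SELFMAG) != 0)
		return EM_NONE;
	/* e_machine is at byte offset 18 in both ELF32 and ELF64 headers. */
	if (hdr[EI_DATA] == ELFDATA2MSB)
		return (uint16_t)((hdr[18] << 8) | hdr[19]);
	return (uint16_t)(hdr[18] | (hdr[19] << 8));
}

/*
 * Hypothetical helper: read e_machine for a live process from /proc/PID/exe,
 * returning the caller-supplied fallback if the file cannot be opened or
 * is not an ELF file.
 */
static uint16_t pid_e_machine(pid_t pid, uint16_t fallback)
{
	char path[64];
	unsigned char hdr[20];
	uint16_t e_machine = EM_NONE;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/exe", (int)pid);
	f = fopen(path, "rb");
	if (f) {
		size_t n = fread(hdr, 1, sizeof(hdr), f);

		fclose(f);
		e_machine = elf_hdr_e_machine(hdr, n);
	}
	return e_machine != EM_NONE ? e_machine : fallback;
}
```

With the a32.out header bytes above, the first helper yields 3 (EM_386); a failed open silently gives the fallback, which would match seeing 62 on a 64-bit perf.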

Thanks,
Namhyung
Namhyung Kim Feb. 27, 2025, midnight UTC | #8
On Mon, Feb 24, 2025 at 08:22:50PM -0800, Ian Rogers wrote:
> On Mon, Feb 24, 2025 at 7:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > > This work builds on the clean up of system call tables and removal of
> > > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> > >
> > > The system call table in perf trace is used to map system call numbers
> > > to names and vice versa. Prior to these changes, a single table
> > > matching the perf binary's build was present. The table would be
> > > incorrect if tracing say a 32-bit binary from a 64-bit version of
> > > perf, the names and numbers wouldn't match.
> > >
> > > Change the build so that a single system call file is built and the
> > > potentially multiple tables are identifiable from the ELF machine type
> > > of the process being examined. To determine the ELF machine type, the
> > > executable's header is read from /proc/pid/exe with fallbacks to using
> > > the perf's binary type when unknown.
> >
> > Hmm.. then this is limited to live mode and potentially detect wrong
> > machine type if it reads an old data, right?
> >
> > Also IIUC fallback to the perf binary means it cannot use cross-machine
> > table.  For example, it cannot process data from ARM64 on x86, no?  It
> > seems it should use perf_env.arch.
> 
> The perf env arch is kind of horrid. On x86 it has the value x86 and
> then there is an extra 64bit flag, who knows how x32 should be encoded
> - but we barely support x32 as-is. I'd rather we added a new feature
> for the e_machine/e_flags of the executable and worked with those, but
> it is kind of weird with doing system wide mode. I didn't want to drag
> that into this patch series anyway as there is already enough here.

Right, I don't know how to handle x32 properly.  Maybe we can just
ignore it for now.

But anyway looking at /proc/PID for recorded data doesn't seem correct.
Can you please add a flag to do that only from trace__run() and just use
EM_HOST for trace__replay()?

Later, we may need to add a misc flag or so to PERF_RECORD_FORK (and
PERF_RECORD_COMM with MISC_COMM_EXEC) to indicate non-standard ABI for a
new thread.  But it's not clear how to make it arch-independent.

> 
> > One more concern is BPF.  The BPF should know about the ABI of the
> > current process so that it can augment the syscall arguments correctly.
> > Currently it only checks the syscall number but it can be different on
> > 32-bit and 64-bit.
> 
> That's right. This change is trying to clean up
> tools/perf/util/syscalltbl.c and the perf trace usage. I didn't go as
> far as making BPF programs pair system call number with e_machine and
> e_flags, there is enough here and the behavior after these patches
> matches the behavior before - that is to assume the system call ABI
> matches that of the perf binary.

Right, the next step would be adding a BPF kfunc to identify the current
ABI.
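
To illustrate why the number alone is not enough, here is a minimal sketch of a lookup keyed on both e_machine and the syscall number. The struct and function names are hypothetical and the two-entry table is only an excerpt for illustration, not perf's generated tables: number 11 is execve on i386 but munmap on x86_64.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type: one row per (ELF machine, syscall number). */
struct sketch_entry {
	uint16_t e_machine;
	int id;
	const char *name;
};

/* Illustrative excerpt only: the same id maps to different names. */
static const struct sketch_entry sketch_table[] = {
	{ EM_386,    11, "execve" },
	{ EM_X86_64, 11, "munmap" },
};

/* Returns the name for (e_machine, id), or NULL if not in the excerpt. */
static const char *sketch_syscall_name(uint16_t e_machine, int id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		if (sketch_table[i].e_machine == e_machine &&
		    sketch_table[i].id == id)
			return sketch_table[i].name;
	}
	return NULL;
}
```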

Thanks,
Namhyung
Ian Rogers Feb. 27, 2025, 5:24 a.m. UTC | #9
On Wed, Feb 26, 2025 at 4:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 24, 2025 at 08:22:50PM -0800, Ian Rogers wrote:
> > On Mon, Feb 24, 2025 at 7:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > > > This work builds on the clean up of system call tables and removal of
> > > > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> > > >
> > > > The system call table in perf trace is used to map system call numbers
> > > > to names and vice versa. Prior to these changes, a single table
> > > > matching the perf binary's build was present. The table would be
> > > > incorrect if tracing say a 32-bit binary from a 64-bit version of
> > > > perf, the names and numbers wouldn't match.
> > > >
> > > > Change the build so that a single system call file is built and the
> > > > potentially multiple tables are identifiable from the ELF machine type
> > > > of the process being examined. To determine the ELF machine type, the
> > > > executable's header is read from /proc/pid/exe with fallbacks to using
> > > > the perf's binary type when unknown.
> > >
> > > Hmm.. then this is limited to live mode and potentially detect wrong
> > > machine type if it reads an old data, right?
> > >
> > > Also IIUC fallback to the perf binary means it cannot use cross-machine
> > > table.  For example, it cannot process data from ARM64 on x86, no?  It
> > > seems it should use perf_env.arch.
> >
> > The perf env arch is kind of horrid. On x86 it has the value x86 and
> > then there is an extra 64bit flag, who knows how x32 should be encoded
> > - but we barely support x32 as-is. I'd rather we added a new feature
> > for the e_machine/e_flags of the executable and worked with those, but
> > it is kind of weird with doing system wide mode. I didn't want to drag
> > that into this patch series anyway as there is already enough here.
>
> Right, I don't know how to handle x32 properly.  Maybe we can just
> ignore it for now.
>
> But anyway looking at /proc/PID for recorded data doesn't seem correct.
> Can you please add a flag to do that only from trace__run() and just use
> EM_HOST for trace__replay()?

So I was hoping at some later point the e_machine on the thread could
be populated from the data file - hence the accessor being on thread
and not part of the trace code. We could add a global flag to thread
to disable the reading from /proc but we do similar reading in
machine.c for /proc/version, /proc/kallsyms, /proc/modules, etc. I
think the chances that a pid is recycled and the process has a different
e_machine are remote enough that it is similar in nature. Adding the
flag means we need to go and fix up all uses, we only need to set the
flag in builtin-trace.c currently, but we've been historically bad at
setting these globals and bugs creep in. I also don't think
record/replay is working well and I didn't want the syscalltbl cleanup
to turn into a perf trace record/replay fixing exercise.

Thanks,
Ian

> Later, we may need to add a misc flag or so to PERF_RECORD_FORK (and
> PERF_RECORD_COMM with MISC_COMM_EXEC) to indicate non-standard ABI for a
> new thread.  But it's not clear how to make it arch-independent.
>
> >
> > > One more concern is BPF.  The BPF should know about the ABI of the
> > > current process so that it can augment the syscall arguments correctly.
> > > Currently it only checks the syscall number but it can be different on
> > > 32-bit and 64-bit.
> >
> > That's right. This change is trying to clean up
> > tools/perf/util/syscalltbl.c and the perf trace usage. I didn't go as
> > far as making BPF programs pair system call number with e_machine and
> > e_flags, there is enough here and the behavior after these patches
> > matches the behavior before - that is to assume the system call ABI
> > matches that of the perf binary.
>
> Right, the next step would be adding a BPF kfunc to identify the current
> ABI.
>
> Thanks,
> Namhyung
>
Namhyung Kim Feb. 27, 2025, 7:24 a.m. UTC | #10
On Wed, Feb 26, 2025 at 09:24:15PM -0800, Ian Rogers wrote:
> On Wed, Feb 26, 2025 at 4:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Feb 24, 2025 at 08:22:50PM -0800, Ian Rogers wrote:
> > > On Mon, Feb 24, 2025 at 7:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 10:56:49AM -0800, Ian Rogers wrote:
> > > > > This work builds on the clean up of system call tables and removal of
> > > > > libaudit by Charlie Jenkins <charlie@rivosinc.com>.
> > > > >
> > > > > The system call table in perf trace is used to map system call numbers
> > > > > to names and vice versa. Prior to these changes, a single table
> > > > > matching the perf binary's build was present. The table would be
> > > > > incorrect if tracing say a 32-bit binary from a 64-bit version of
> > > > > perf, the names and numbers wouldn't match.
> > > > >
> > > > > Change the build so that a single system call file is built and the
> > > > > potentially multiple tables are identifiable from the ELF machine type
> > > > > of the process being examined. To determine the ELF machine type, the
> > > > > executable's header is read from /proc/pid/exe with fallbacks to using
> > > > > the perf's binary type when unknown.
> > > >
> > > > Hmm.. then this is limited to live mode and potentially detect wrong
> > > > machine type if it reads an old data, right?
> > > >
> > > > Also IIUC fallback to the perf binary means it cannot use cross-machine
> > > > table.  For example, it cannot process data from ARM64 on x86, no?  It
> > > > seems it should use perf_env.arch.
> > >
> > > The perf env arch is kind of horrid. On x86 it has the value x86 and
> > > then there is an extra 64bit flag, who knows how x32 should be encoded
> > > - but we barely support x32 as-is. I'd rather we added a new feature
> > > for the e_machine/e_flags of the executable and worked with those, but
> > > it is kind of weird with doing system wide mode. I didn't want to drag
> > > that into this patch series anyway as there is already enough here.
> >
> > Right, I don't know how to handle x32 properly.  Maybe we can just
> > ignore it for now.
> >
> > But anyway looking at /proc/PID for recorded data doesn't seem correct.
> > Can you please add a flag to do that only from trace__run() and just use
> > EM_HOST for trace__replay()?
> 
> So I was hoping at some later point the e_machine on the thread could
> be populated from the data file - hence the accessor being on thread
> and not part of the trace code.

Fair enough.


> We could add a global flag to thread
> to disable the reading from /proc but we do similar reading in
> machine.c for /proc/version, /proc/kallsyms, /proc/modules, etc.

You can add a flag to struct trace and only care about the perf trace
use case - whether to call thread__get_e_machine() or not.
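
Something along these lines, as a rough sketch only. The struct, the `live` field, and the function name are all hypothetical, `SKETCH_EM_HOST` stands in for perf's EM_HOST (assumed x86_64 here), and the per-thread lookup is passed as a callback so the sketch does not depend on the real thread__get_e_machine() signature.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for perf's EM_HOST; x86_64 is assumed for this sketch. */
#define SKETCH_EM_HOST EM_X86_64

/* Hypothetical subset of struct trace with the proposed flag. */
struct trace_sketch {
	bool live;	/* true when set up by the live path, false for replay */
};

/* Callback standing in for the per-thread e_machine lookup. */
typedef uint16_t (*e_machine_lookup_t)(int tid);

/* Example lookup used below: pretends every thread is an i386 process. */
static uint16_t stub_lookup_i386(int tid)
{
	(void)tid;
	return EM_386;
}

/*
 * Only consult the per-thread lookup (which may read /proc) in live mode;
 * recorded data always gets the host's machine type.
 */
static uint16_t trace_sketch__e_machine(const struct trace_sketch *trace,
					int tid, e_machine_lookup_t lookup)
{
	if (!trace->live)
		return SKETCH_EM_HOST;
	return lookup(tid);
}
```

The flag would be set in trace__run() and left clear in trace__replay(), so nothing outside builtin-trace.c needs to change.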

In general, reading /proc from perf record is fine.  But doing that from
perf report or similar is not good.  You don't need to fix them, if any,
with this change.  But let's not introduce more bugs.


> I think the chances that a pid is recycled and the process has a different
> e_machine are remote enough that it is similar in nature. Adding the
> flag means we need to go and fix up all uses, we only need to set the
> flag in builtin-trace.c currently, but we've been historically bad at
> setting these globals and bugs creep in. I also don't think
> record/replay is working well and I didn't want the syscalltbl cleanup
> to turn into a perf trace record/replay fixing exercise.

Yep, please see above.  Anyway I think record/replay on the same machine
is working well.

Thanks,
Namhyung

> 
> > Later, we may need to add a misc flag or so to PERF_RECORD_FORK (and
> > PERF_RECORD_COMM with MISC_COMM_EXEC) to indicate non-standard ABI for a
> > new thread.  But it's not clear how to make it arch-independent.
> >
> > >
> > > > One more concern is BPF.  The BPF should know about the ABI of the
> > > > current process so that it can augment the syscall arguments correctly.
> > > > Currently it only checks the syscall number but it can be different on
> > > > 32-bit and 64-bit.
> > >
> > > That's right. This change is trying to clean up
> > > tools/perf/util/syscalltbl.c and the perf trace usage. I didn't go as
> > > far as making BPF programs pair system call number with e_machine and
> > > e_flags, there is enough here and the behavior after these patches
> > > matches the behavior before - that is to assume the system call ABI
> > > matches that of the perf binary.
> >
> > Right, the next step would be adding a BPF kfunc to identify the current
> > ABI.
> >
> > Thanks,
> > Namhyung
> >