diff mbox series

kbuild: remove duplicate dependencies from .mod files

Message ID 20200210131925.145463-1-samitolvanen@google.com (mailing list archive)
State New, archived
Headers show
Series kbuild: remove duplicate dependencies from .mod files | expand

Commit Message

Sami Tolvanen Feb. 10, 2020, 1:19 p.m. UTC
With CONFIG_TRIM_UNUSED_SYMS, if a module has enough dependencies to
exceed the default xargs command line size limit, the output is split
into multiple lines, which can result in used symbols getting trimmed.

This change removes duplicate dependencies, which will reduce the
probability of this happening and makes .mod files smaller and easier
to read.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 scripts/Makefile.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9

Comments

Masahiro Yamada Feb. 12, 2020, 1:22 p.m. UTC | #1
Hi Sami,

On Mon, Feb 10, 2020 at 10:19 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> With CONFIG_TRIM_UNUSED_SYMS, if a module has enough dependencies to
> exceed the default xargs command line size limit, the output is split
> into multiple lines, which can result in used symbols getting trimmed.
>
> This change removes duplicate dependencies, which will reduce the
> probability of this happening and makes .mod files smaller and easier
> to read.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  scripts/Makefile.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a1730d42e5f3..a083bcec19d3 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -257,7 +257,7 @@ endef
>
>  # List module undefined symbols (or empty line if not enabled)
>  ifdef CONFIG_TRIM_UNUSED_KSYMS
> -cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | xargs echo
> +cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | sort -u | xargs echo



In which case are undefined symbols duplicated?

Do you have a .config to reproduce it?



>  else
>  cmd_undef_syms = echo
>  endif
>
> base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9
> --
> 2.25.0.341.g760bfbb309-goog
>
Sami Tolvanen Feb. 12, 2020, 5:13 p.m. UTC | #2
On Wed, Feb 12, 2020 at 5:23 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> In which case are undefined symbols duplicated?

When a module consists of multiple compilation units, which depend on
the same external symbols. In Android, we ran into this when adding
hardening features that all depend on an external error handler
function with a rather long name. When CONFIG_TRIM_UNUSED_SYMS was
later enabled, we ran into this:

$ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
xargs echo | wc
      2    9136  168660

xargs defaults to 128kiB limit for command line size, so the output
was split into two lines, which means some of the dependencies were
dropped and we ran into modpost errors. One method of fixing this is
to increase the limit:

$ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
xargs -s 262144 echo | wc
      1    9136  168660

But it seems removing duplicates is a better solution as the length of
the dependency list is reduced significantly:

$ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
sort -u | xargs echo | wc
      1    2716   50461

> Do you have a .config to reproduce it?

I can currently reproduce this on an Android kernel that has
Control-Flow Integrity (CFI) enabled. While this feature is not
upstreamed yet, there's nothing that would prevent us from hitting the
command line limit with sufficiently large modules otherwise as well.

Sami
Masahiro Yamada Feb. 16, 2020, 4:27 a.m. UTC | #3
Hi Sami,

On Thu, Feb 13, 2020 at 2:13 AM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Wed, Feb 12, 2020 at 5:23 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > In which case are undefined symbols duplicated?
>
> When a module consists of multiple compilation units, which depend on
> the same external symbols. In Android, we ran into this when adding
> hardening features that all depend on an external error handler
> function with a rather long name. When CONFIG_TRIM_UNUSED_SYMS was
> later enabled, we ran into this:
>
> $ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
> xargs echo | wc
>       2    9136  168660
>
> xargs defaults to 128kiB limit for command line size, so the output
> was split into two lines, which means some of the dependencies were
> dropped and we ran into modpost errors. One method of fixing this is
> to increase the limit:
>
> $ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
> xargs -s 262144 echo | wc
>       1    9136  168660
>
> But it seems removing duplicates is a better solution as the length of
> the dependency list is reduced significantly:
>
> $ llvm-nm drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' |
> sort -u | xargs echo | wc
>       1    2716   50461


At least, I am unable to reproduce this in upstream.

This is my result for x86 allmodconfig builds.

masahiro@grover:~/workspace/linux-kbuild$ nm
drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' | wc
    572     572   11478
masahiro@grover:~/workspace/linux-kbuild$ nm
drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' | sort -u |
wc
    572     572   11478


I see no difference with/without 'sort -u'.


I also tried llvm-nm instead of GNU nm,
but the result is the same.


masahiro@grover:~/workspace/linux-kbuild$ llvm-nm
drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' | wc
    572     572   11478
masahiro@grover:~/workspace/linux-kbuild$ llvm-nm
drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' | sort -u |
wc
    572     572   11478




> > Do you have a .config to reproduce it?
>
> I can currently reproduce this on an Android kernel that has
> Control-Flow Integrity (CFI) enabled. While this feature is not
> upstreamed yet, there's nothing that would prevent us from hitting the
> command line limit with sufficiently large modules otherwise as well.


Does ACK do this differently?

I think it would be strange
if $(NM) duplicated undefined symbols.
Sami Tolvanen Feb. 18, 2020, 7:21 p.m. UTC | #4
Hi Masahiro,

On Sat, Feb 15, 2020 at 8:28 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
> masahiro@grover:~/workspace/linux-kbuild$ llvm-nm
> drivers/gpu/drm/nouveau/nouveau.o | sed -n 's/^  *U //p' | sort -u |
> wc
>     572     572   11478

Thank you for testing this!

> Does ACK do this differently?

Yes, the difference is that we use LTO and after looking into this a
bit more, it turns out that when the individual files in the thin
archive are LLVM bitcode, llvm-nm behaves differently and prints out
the symbols for each file separately.

$ llvm-nm drivers/gpu/drm/nouveau/nouveau.o
...
nvif/client.o:
---------------- W __cfi_check
---------------- W __cfi_check_fail
                 U __cfi_slowpath_diag
                 U __ubsan_handle_cfi_check_fail
---------------- T nvif_client_fini
---------------- T nvif_client_init
---------------- T nvif_client_ioctl
---------------- T nvif_client_resume
---------------- T nvif_client_suspend
                 U nvif_object_fini
                 U nvif_object_init
                 U strncpy

nvif/device.o:
---------------- W __cfi_check
---------------- W __cfi_check_fail
                 U __ubsan_handle_cfi_check_fail
                 U kfree
---------------- T nvif_device_fini
---------------- T nvif_device_init
---------------- T nvif_device_time
                 U nvif_object_fini
                 U nvif_object_init
                 U nvif_object_mthd
                 U nvif_user_fini
...

While this output format still works for us, it does generate a lot of
duplicates. Anyway, I think we can come back to this when LTO is
closer to getting upstreamed.

Sami
diff mbox series

Patch

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index a1730d42e5f3..a083bcec19d3 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -257,7 +257,7 @@  endef
 
 # List module undefined symbols (or empty line if not enabled)
 ifdef CONFIG_TRIM_UNUSED_KSYMS
-cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | xargs echo
+cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | sort -u | xargs echo
 else
 cmd_undef_syms = echo
 endif