
[v2,0/6] Generate address range data for built-in modules

Message ID 20240511224035.27775-1-kris.van.hees@oracle.com

Message

Kris Van Hees May 11, 2024, 10:40 p.m. UTC
Especially for tracing applications, it is convenient to be able to
refer to a symbol using a <module name, symbol name> pair and to be able
to translate an address into a <module name, symbol name> pair.  But
that does not work if the module is built into the kernel because the
object files that comprise the built-in module implementation are simply
linked into the kernel image along with all other kernel object files.

This is especially visible when providing tracing scripts for support
purposes, where the developer of the script targets a particular kernel
version, but has no control over whether the target system has a
particular module as a loadable module or a built-in module.  When tracing
symbols within a module, referring to them by <module name, symbol name>
pairs is convenient and also aids symbol lookup.  But that naming does
not work when the module is built into the kernel on the target system,
because the module name information is lost.

Earlier work addressing this loss of information for built-in modules
involved adding module name information to the kallsyms data, but that
required more invasive code in the kernel proper.  That work was never
merged into the kernel tree.

All that is really needed is knowing whether a given address belongs to
a particular module (or to multiple modules if they share an object
file); in other words, whether that address falls within an address
range that is associated with one or more modules.
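A minimal Python sketch of that lookup follows. It assumes a hypothetical in-memory form of the data: for one section, a sorted list of non-overlapping offset ranges, each carrying one or more module names (more than one when modules share an object file). The actual modules.builtin.ranges layout is not described here, so `Range`, `modules_at` and the sample offsets are illustrative only.

```python
from bisect import bisect_right
from typing import NamedTuple


class Range(NamedTuple):
    start: int                # offset from the section base, inclusive
    end: int                  # offset from the section base, exclusive
    modules: tuple[str, ...]  # several names if modules share an object file


def modules_at(ranges: list[Range], offset: int) -> tuple[str, ...]:
    """Return the module name(s) whose range contains offset, or () if none.

    Assumes ranges is sorted by start and that no two ranges overlap.
    """
    starts = [r.start for r in ranges]
    i = bisect_right(starts, offset) - 1  # last range starting at or before offset
    if i >= 0 and offset < ranges[i].end:
        return ranges[i].modules
    return ()


if __name__ == "__main__":
    # Hypothetical ranges for a single section.
    text = [Range(0x1000, 0x2000, ("mod_a",)), Range(0x3000, 0x3400, ("mod_b", "mod_c"))]
    print(modules_at(text, 0x1800))  # ('mod_a',)
    print(modules_at(text, 0x3100))  # ('mod_b', 'mod_c')
    print(modules_at(text, 0x2800))  # ()
```

Rebuilding `starts` on every call keeps the sketch short; a real consumer would compute it once per section.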

This patch series is based on Luis Chamberlain's patch to generate
modules.builtin.objs, associating built-in modules with their object
files.  Using this data, vmlinux.o.map and vmlinux.map can be parsed in
a single pass to generate a modules.builtin.ranges file with offset range
information (relative to the base address of the associated section) for
built-in modules.  The file gets installed along with the other
modules.builtin.* files.
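As an illustration of the parsing step (not the series' AWK script), the Python sketch below turns one input-section line of a GNU ld map file into an offset range relative to its section. It assumes the section's base address has already been read elsewhere and is passed in as `section_base`. It also ignores entries that ld wraps onto two lines when a section name is long, as well as archive members written as `lib.a(obj.o)`. The names `parse_map_line` and `Placement` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# One input-section line of a GNU ld map file, for example:
#   .text          0x00000000007d4fe0     0x46c8 drivers/i2c/i2c-core-base.o
_MAP_LINE = re.compile(
    r"^\s*(?P<section>\.\S+)\s+0x(?P<addr>[0-9a-fA-F]+)"
    r"\s+0x(?P<size>[0-9a-fA-F]+)\s+(?P<obj>\S+\.o)\s*$"
)


class Placement(NamedTuple):
    section: str
    start: int  # offset from the section base, inclusive
    end: int    # offset from the section base, exclusive
    obj: str


def parse_map_line(line: str, section_base: int) -> Optional[Placement]:
    """Return the object's placement relative to section_base, or None if the
    line is not a single-line input-section entry for a .o file."""
    m = _MAP_LINE.match(line)
    if m is None:
        return None
    start = int(m["addr"], 16) - section_base
    return Placement(m["section"], start, start + int(m["size"], 16), m["obj"])
```

Lines that do not describe an object placement, such as wildcard patterns like `*(.text .text.*)`, simply yield `None` and can be skipped.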

The impact on the kernel build is minimal because everything is done
using a single-pass AWK script.  The generated data is small as well:
depending on the exact kernel configuration, it is usually in the range
of 500-700 lines, with a file size of 20-40KB.

Changes since v1:
 - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
 - Moved the config option to the tracers section
 - 2nd arg to generate_builtin_ranges.awk should be vmlinux.map

Kris Van Hees (5):
  trace: add CONFIG_BUILTIN_MODULE_RANGES option
  kbuild: generate a linker map for vmlinux.o
  module: script to generate offset ranges for builtin modules
  kbuild: generate modules.builtin.ranges when linking the kernel
  module: add install target for modules.builtin.ranges

Luis Chamberlain (1):
  kbuild: add modules.builtin.objs

 .gitignore                          |   2 +-
 Documentation/dontdiff              |   2 +-
 Documentation/kbuild/kbuild.rst     |   5 ++
 Makefile                            |   8 +-
 include/linux/module.h              |   4 +-
 kernel/trace/Kconfig                |  17 ++++
 scripts/Makefile.lib                |   5 +-
 scripts/Makefile.modinst            |  11 ++-
 scripts/Makefile.vmlinux            |  17 ++++
 scripts/Makefile.vmlinux_o          |  18 ++++-
 scripts/generate_builtin_ranges.awk | 149 ++++++++++++++++++++++++++++++++++++
 11 files changed, 228 insertions(+), 10 deletions(-)
 create mode 100755 scripts/generate_builtin_ranges.awk


base-commit: dd5a440a31fae6e459c0d6271dddd62825505361

Comments

Masahiro Yamada May 13, 2024, 4:43 a.m. UTC | #1
On Sun, May 12, 2024 at 7:42 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> [...]
>
> This patch series is based on Luis Chamberlain's patch to generate
> modules.builtin.objs, associating built-in modules with their object
> files.  Using this data, vmlinux.o.map and vmlinux.map can be parsed in
> a single pass to generate a modules.builtin.ranges file with offset range
> information (relative to the base address of the associated section) for
> built-in modules.  The file gets installed along with the other
> modules.builtin.* files.



I still do not want to see modules.builtin.objs.


During the vmlinux.o.map parse, every time an object path
is encountered, you can open the corresponding .cmd file.



Let's say you have the following in vmlinux.o.map:

.text          0x00000000007d4fe0     0x46c8 drivers/i2c/i2c-core-base.o



You can check drivers/i2c/.i2c-core-base.o.cmd


$ cat drivers/i2c/.i2c-core-base.o.cmd | tr ' ' '\n' | grep KBUILD_MODFILE
-DKBUILD_MODFILE='"drivers/i2c/i2c-core"'


Now you know this object is part of drivers/i2c/i2c-core
(that is, its modname is "i2c-core")
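The same lookup can be done from a script rather than with `tr` and `grep`. The minimal Python sketch below makes two assumptions. First, the `.cmd` file is named by prefixing the object's basename with a dot and appending `.cmd`, as in the example above. Second, the define appears in the quoted `-DKBUILD_MODFILE='"..."'` form shown. The helper names and the `objtree` parameter are hypothetical.

```python
import os
import re
from typing import Optional

# Matches e.g. -DKBUILD_MODFILE='"drivers/i2c/i2c-core"' in a .cmd file.
_KBUILD_DEFINE = re.compile(r"""-DKBUILD_(MODFILE|MODNAME)='"([^"]*)"'""")


def cmd_path(obj: str) -> str:
    """drivers/i2c/i2c-core-base.o -> drivers/i2c/.i2c-core-base.o.cmd"""
    directory, base = os.path.split(obj)
    return os.path.join(directory, "." + base + ".cmd")


def kbuild_defines(cmd_text: str) -> dict[str, str]:
    """Return {"MODFILE": ..., "MODNAME": ...} for whichever defines are present."""
    return dict(_KBUILD_DEFINE.findall(cmd_text))


def modfile_of(obj: str, objtree: str = ".") -> Optional[str]:
    """Read the object's .cmd file, if any, and return its KBUILD_MODFILE value."""
    try:
        with open(os.path.join(objtree, cmd_path(obj)), encoding="utf-8", errors="replace") as f:
            return kbuild_defines(f.read()).get("MODFILE")
    except FileNotFoundError:
        return None
```

Returning `None` for a missing `.cmd` file lets the caller treat such objects as not belonging to any module.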




Next, you will get the following:

 .text          0x00000000007dc550     0x13c4 drivers/i2c/i2c-core-acpi.o


$ cat drivers/i2c/.i2c-core-acpi.o.cmd | tr ' ' '\n' | grep KBUILD_MODFILE
-DKBUILD_MODFILE='"drivers/i2c/i2c-core"'


This one is also a part of drivers/i2c/i2c-core


You will get the address range of "i2c-core" without changing Makefiles.
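Once each object resolves to a module, consecutive map entries can be folded into per-module ranges. The minimal Python sketch below works under these assumptions:

- Entries arrive in map order as (start, end, object) offsets.
- `module_of` is any callable that returns a modfile or `None`.
- Gaps between consecutive objects of the same module (for example, alignment padding) are absorbed into the range.
- An object from another module, or from no module, closes the current range.

Objects shared by several modules are not handled, and the function name is hypothetical.

```python
from typing import Callable, Iterable, Optional


def merge_module_ranges(
    entries: Iterable[tuple[int, int, str]],
    module_of: Callable[[str], Optional[str]],
) -> list[tuple[int, int, str]]:
    """Fold consecutive (start, end, object) entries of one module into a range."""
    ranges: list[tuple[int, int, str]] = []
    run_start = run_end = 0
    run_mod: Optional[str] = None
    for start, end, obj in entries:
        mod = module_of(obj)
        if mod is not None and mod == run_mod:
            run_end = end  # same module as the previous object: extend the run
            continue
        if run_mod is not None:
            ranges.append((run_start, run_end, run_mod))  # close the previous run
        run_start, run_end, run_mod = start, end, mod    # mod may be None: no open run
    if run_mod is not None:
        ranges.append((run_start, run_end, run_mod))
    return ranges


if __name__ == "__main__":
    # The two vmlinux.o.map lines quoted above, treated as consecutive entries
    # (start, start + size).
    entries = [
        (0x7D4FE0, 0x7D96A8, "drivers/i2c/i2c-core-base.o"),
        (0x7DC550, 0x7DD914, "drivers/i2c/i2c-core-acpi.o"),
    ]
    mods = {"drivers/i2c/i2c-core-base.o": "drivers/i2c/i2c-core",
            "drivers/i2c/i2c-core-acpi.o": "drivers/i2c/i2c-core"}
    for start, end, mod in merge_module_ranges(entries, mods.get):
        print(f"{start:#x}-{end:#x} {mod}")  # prints 0x7d4fe0-0x7dd914 drivers/i2c/i2c-core
```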

You still need to modify scripts/Makefile.vmlinux(_o)
but you can implement everything else in your script,
although I did not fully understand the gawk script.


Now, you can use Python if you like:

  https://lore.kernel.org/lkml/20240512-python-version-v2-1-382870a1fa1d@linaro.org/

Presumably, Python code will be more readable for many people.


GNU awk is not documented in Documentation/process/changes.rst.
If you insist on using gawk, you need to add it to that document.





Having said that, I often wish I could filter traced functions
by an object path instead of a modname, because modname
filtering is only useful for tristate code.
For example, filter by "path:drivers/i2c/" or "path:drivers/i2c/i2c-core*"
rather than "mod:i2c-core".

An <object path, symbol name> reference would be useful for always-builtin code.
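The Python sketch below shows how such a "path:" pattern might be matched against object paths. The `path:` syntax is only the proposal made here, not an existing tracing interface. The matching rules are one possible interpretation: a trailing `/` selects a directory subtree, and anything else is a shell-style glob via `fnmatch.fnmatchcase`, whose `*` also matches `/`. The function names and the symbol-to-path mapping are hypothetical.

```python
from fnmatch import fnmatchcase


def path_matches(pattern: str, obj_path: str) -> bool:
    """Match a hypothetical "path:" filter pattern against an object path."""
    if pattern.endswith("/"):
        return obj_path.startswith(pattern)  # directory subtree
    return fnmatchcase(obj_path, pattern)    # shell-style glob; '*' also matches '/'


def filter_symbols(symbols: dict[str, str], filt: str) -> list[str]:
    """Select symbol names whose object path matches filt ("path:<pattern>").

    symbols maps symbol name -> object path, i.e. <object path, symbol name> pairs.
    """
    prefix = "path:"
    if not filt.startswith(prefix):
        raise ValueError(f"not a path filter: {filt!r}")
    pattern = filt[len(prefix):]
    return sorted(name for name, path in symbols.items() if path_matches(pattern, path))
```

Because the key is the object path rather than a module name, the same filter works whether the code is tristate or always built in.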




Kris Van Hees May 15, 2024, 4:50 p.m. UTC | #2
On Mon, May 13, 2024 at 01:43:15PM +0900, Masahiro Yamada wrote:
> On Sun, May 12, 2024 at 7:42 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > [...]
> 
> 
> 
> I still do not want to see modules.builtin.objs.
> 
> 
> During the vmlinux.o.map parse, every time an object path
> is encountered, you can open the corresponding .cmd file.
> 
> 
> 
> Let's say, you have the following in vmlinux.o.map:
> 
> .text          0x00000000007d4fe0     0x46c8 drivers/i2c/i2c-core-base.o
> 
> 
> 
> You can check drivers/i2c/.i2c-core-base.o.cmd
> 
> 
> $ cat drivers/i2c/.i2c-core-base.o.cmd | tr ' ' '\n' | grep KBUILD_MODFILE
> -DKBUILD_MODFILE='"drivers/i2c/i2c-core"'
> 
> 
> Now you know this object is part of drivers/i2c/i2c-core
> (that is, its modname is "i2c-core")
> 
> 
> 
> 
> Next, you will get the following:
> 
>  .text          0x00000000007dc550     0x13c4 drivers/i2c/i2c-core-acpi.o
> 
> 
> $ cat drivers/i2c/.i2c-core-acpi.o.cmd | tr ' ' '\n' | grep KBUILD_MODFILE
> -DKBUILD_MODFILE='"drivers/i2c/i2c-core"'
> 
> 
> This one is also a part of drivers/i2c/i2c-core
>
> 
> You will get the address range of "i2c-core" without changing Makefiles.

Thank you for this suggestion.  I have this approach now implemented, making
use of both KBUILD_MODFILE and KBUILD_MODNAME (both are needed to conclusively
determine that an object belongs to a module).

However, this is not catching objects that are compiled from assembler source,
because modfile_flags and modname_flags are not added to the assembler flags,
and thus KBUILD_MODFILE and KBUILD_MODNAME are not present in the .cmd file
for those objects.

It would seem that it is harmless to add those flags to assembler flags, so
would that be an acceptable solution?  It definitely would provide consistency
with non-asm objects.  And we already pass modfile and modname flags to the
non-asm builds for objects that most certainly do not belong in modules anyway,
e.g.

$ cat arch/x86/boot/.cmdline.o.cmd | tr ' ' '\n' | grep -- -DKBUILD_MOD
-DKBUILD_MODFILE='"arch/x86/boot/cmdline"'
-DKBUILD_MODNAME='"cmdline"'
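Because the defines are present even for objects like this one, their presence alone does not prove module membership. One possible cross-check, sketched below in Python, accepts a KBUILD_MODFILE value only if it names a module listed in modules.builtin. This is an illustrative assumption, not necessarily the check described above, which uses both KBUILD_MODFILE and KBUILD_MODNAME. It also assumes modules.builtin lists one entry per line in the form `kernel/<modfile>.ko`. The helper names are hypothetical.

```python
from typing import Optional


def builtin_modfiles(modules_builtin_text: str) -> set[str]:
    """Turn modules.builtin lines such as "kernel/drivers/i2c/i2c-core.ko"
    into modfile paths such as "drivers/i2c/i2c-core"."""
    prefix, suffix = "kernel/", ".ko"
    modfiles = set()
    for line in modules_builtin_text.splitlines():
        line = line.strip()
        if line.startswith(prefix) and line.endswith(suffix):
            modfiles.add(line[len(prefix):-len(suffix)])
    return modfiles


def belongs_to_builtin_module(modfile: Optional[str], builtin: set[str]) -> bool:
    """True only if the object's KBUILD_MODFILE names a listed built-in module."""
    return modfile is not None and modfile in builtin


if __name__ == "__main__":
    builtin = builtin_modfiles("kernel/drivers/i2c/i2c-core.ko\n")
    print(belongs_to_builtin_module("drivers/i2c/i2c-core", builtin))   # True
    print(belongs_to_builtin_module("arch/x86/boot/cmdline", builtin))  # False
```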

Masahiro Yamada May 16, 2024, 2:56 a.m. UTC | #3
On Thu, May 16, 2024 at 1:50 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Mon, May 13, 2024 at 01:43:15PM +0900, Masahiro Yamada wrote:
> > [...]
>
> Thank you for this suggestion.  I have this approach now implemented, making
> use of both KBUILD_MODFILE and KBUILD_MODNAME (both are needed to conclusively
> determine that an object belongs to a module).
>
> However, this is not catching objects that are compiled from assembler source,
> because modfile_flags and modname_flags are not added to the assembler flags,
> and thus KBUILD_MODFILE and KBUILD_MODNAME are not present in the .cmd file
> for those objects.
>
> It would seem that it is harmless to add those flags to assembler flags, so
> would that be an acceptable solution?  It definitely would provide consistency
> with non-asm objects.  And we already pass modfile and modname flags to the
> non-asm builds for objects that most certainly do not belong in modules anyway,
> e.g.
>
> $ cat arch/x86/boot/.cmdline.o.cmd | tr ' ' '\n' | grep -- -DKBUILD_MOD
> -DKBUILD_MODFILE='"arch/x86/boot/cmdline"'
> -DKBUILD_MODNAME='"cmdline"'



I am fine with passing these to *.S files,
as the -D is a preprocessor option.




--
Best Regards
Masahiro Yamada