
[v2,0/2] tracing: Introduce relative stacktrace

Message ID: 173839458022.2009498.14495253908367838065.stgit@devnote2

Message

Masami Hiramatsu (Google) Feb. 1, 2025, 7:23 a.m. UTC
Hi,

Here is the 2nd version of the series adding a relative stacktrace to tracing.
The previous version is here:

https://lore.kernel.org/all/173807861687.1525539.15082309716909038251.stgit@mhiramat.roam.corp.google.com/

In this version, I changed the approach to identify a module by only the
first 32 bits of its build_id instead of by a live hash/ID. Also, for a
module address, it saves the offset from that module's .text section
instead of the offset from _stext. (Core kernel text addresses still use
the offset from _stext.)
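To make the encoding concrete, here is a minimal user-space C sketch of the idea, not code from the patch. It assumes each module is described by its .text range plus the first 32 bits of its build ID, packs those 4 bytes big-endian, and reserves build_id32 == 0 to mean "core kernel, offset from _stext". All names (struct mod_text, struct rel_entry, build_id_prefix(), encode_rel()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: runtime .text range of one loaded module, tagged with the
 * first 32 bits of its build ID. */
struct mod_text {
	uint32_t build_id32;
	uintptr_t text_start;
	size_t text_size;
};

/* Hypothetical encoded stack entry. build_id32 == 0 is assumed to mean
 * "core kernel", with offset measured from _stext. */
struct rel_entry {
	uint32_t build_id32;
	uint32_t offset;	/* assumes each text section is < 4 GiB */
};

/* Pack the first four build ID bytes big-endian (the byte order is an
 * assumption of this sketch). */
static uint32_t build_id_prefix(const unsigned char *id)
{
	return (uint32_t)id[0] << 24 | (uint32_t)id[1] << 16 |
	       (uint32_t)id[2] << 8 | (uint32_t)id[3];
}

/* Encode addr as an offset from _stext (core kernel) or from the .text of
 * the module containing it. Returns 0 on success, -1 if not found. */
static int encode_rel(uintptr_t addr, uintptr_t stext, uintptr_t etext,
		      const struct mod_text *mods, size_t nr_mods,
		      struct rel_entry *out)
{
	assert(out != NULL);
	if (addr >= stext && addr < etext) {
		out->build_id32 = 0;
		out->offset = (uint32_t)(addr - stext);
		return 0;
	}
	for (size_t i = 0; i < nr_mods; i++) {
		const struct mod_text *m = &mods[i];

		if (addr >= m->text_start && addr - m->text_start < m->text_size) {
			out->build_id32 = m->build_id32;
			out->offset = (uint32_t)(addr - m->text_start);
			return 0;
		}
	}
	return -1;	/* neither core kernel text nor module text */
}
```

Only the build ID prefix and an offset are recorded, so the randomized load addresses themselves never reach the buffer.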

This brings the following benefits:
 - No need to save live module allocation information somewhere in
   reserved memory.
 - The module is easy to find offline.
 - We can ensure that only offsets from a base are recorded, with no
   KASLR information.

Moreover, by encoding/decoding the module build_id, we can show the
module name along with the symbols in the stack trace.
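On the decoding side, a tool only needs a table mapping build ID prefixes to module names (for example, built offline from the shipped .ko files) to print the module name next to each offset. The following C sketch assumes a hypothetical entry layout of (build_id32, offset) with 0 meaning the core kernel; lookup_name() and format_rel() are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical offline table: build ID prefix -> module name. */
struct build_id_name {
	uint32_t build_id32;
	const char *name;
};

/* Return the module name for a build ID prefix; 0 is assumed to mean the
 * core kernel. Returns NULL when the prefix is unknown. */
static const char *lookup_name(uint32_t build_id32,
			       const struct build_id_name *tbl, size_t n)
{
	if (build_id32 == 0)
		return "vmlinux";
	for (size_t i = 0; i < n; i++)
		if (tbl[i].build_id32 == build_id32)
			return tbl[i].name;
	return NULL;
}

/* Format one decoded entry as "name+0xOFF" into buf. */
static int format_rel(uint32_t build_id32, uint32_t offset,
		      const struct build_id_name *tbl, size_t n,
		      char *buf, size_t len)
{
	const char *name = lookup_name(build_id32, tbl, n);

	assert(buf != NULL && len > 0);
	if (!name)
		return snprintf(buf, len, "unknown(%08x)+0x%x",
				(unsigned int)build_id32, (unsigned int)offset);
	return snprintf(buf, len, "%s+0x%x", name, (unsigned int)offset);
}
```

Turning the offset into a function name such as do_simple_thread_func would additionally need the matching module's symbol table, which this sketch leaves out.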

Thus, this relative stacktrace is a better option for the persistent
ring buffer in security-restricted environments (e.g. where users have
no kallsyms access).

 # echo 1 > options/relative-stacktrace 
 # modprobe trace_events_sample
 # echo stacktrace > events/sample-trace/foo_bar/trigger 
 # cat trace 
    event-sample-1622    [004] ...1.   397.542659: <stack trace>
 => event_triggers_post_call
 => trace_event_raw_event_foo_bar [trace_events_sample]
 => do_simple_thread_func [trace_events_sample]
 => simple_thread [trace_events_sample]
 => kthread
 => ret_from_fork
 => ret_from_fork_asm

Thank you,
---

Masami Hiramatsu (Google) (2):
      modules: Add __module_build_id() to find module by build_id
      tracing: Add relative-stacktrace option


 include/linux/module.h       |    8 +++++
 include/linux/trace.h        |    5 +++
 kernel/module/Kconfig        |    3 ++
 kernel/module/kallsyms.c     |    4 +--
 kernel/module/main.c         |   29 ++++++++++++++++++++
 kernel/trace/Kconfig         |    1 +
 kernel/trace/trace.c         |   50 ++++++++++++++++++++++++++++++----
 kernel/trace/trace.h         |    3 ++
 kernel/trace/trace_entries.h |   18 ++++++++++++
 kernel/trace/trace_output.c  |   62 ++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug            |    1 +
 11 files changed, 175 insertions(+), 9 deletions(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Comments

Steven Rostedt Feb. 3, 2025, 3:32 p.m. UTC | #1
On Sat,  1 Feb 2025 16:23:00 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> Hi,
> 
> Here is the 2nd version of the series adding a relative stacktrace to tracing.
> The previous version is here:
>
> https://lore.kernel.org/all/173807861687.1525539.15082309716909038251.stgit@mhiramat.roam.corp.google.com/
>
> In this version, I changed the approach to identify a module by only the
> first 32 bits of its build_id instead of by a live hash/ID. Also, for a
> module address, it saves the offset from that module's .text section
> instead of the offset from _stext. (Core kernel text addresses still use
> the offset from _stext.)
>
> This brings the following benefits:
>  - No need to save live module allocation information somewhere in
>    reserved memory.
>  - The module is easy to find offline.
>  - We can ensure that only offsets from a base are recorded, with no
>    KASLR information.
>
> Moreover, by encoding/decoding the module build_id, we can show the
> module name along with the symbols in the stack trace.
>
> Thus, this relative stacktrace is a better option for the persistent
> ring buffer in security-restricted environments (e.g. where users have
> no kallsyms access).
> 
>  # echo 1 > options/relative-stacktrace 
>  # modprobe trace_events_sample
>  # echo stacktrace > events/sample-trace/foo_bar/trigger 
>  # cat trace 
>     event-sample-1622    [004] ...1.   397.542659: <stack trace>
>  => event_triggers_post_call
>  => trace_event_raw_event_foo_bar [trace_events_sample]
>  => do_simple_thread_func [trace_events_sample]
>  => simple_thread [trace_events_sample]
>  => kthread
>  => ret_from_fork
>  => ret_from_fork_asm  
>

I thought we decided that we didn't need the relative stack trace? That all
we need to do is to expose the offset from the last boot, and a list of
modules that were loaded and their addresses, and then we can easily
decipher the stack traces into human readable format?
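As a rough illustration of this alternative, assume the persistent buffer keeps raw addresses and that the previous boot's kernel text base plus a table of loaded modules (name, load address, size) are exported. User space could then map each address back to an offset as in the C sketch below. The structures and the symbolize() helper are hypothetical and are not the interface of the module-table work discussed later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one module loaded during the previous boot. */
struct prev_mod {
	const char *name;
	uint64_t base;	/* module text load address in the previous boot */
	uint64_t size;
};

/* Hypothetical metadata exported for the previous boot. */
struct prev_boot {
	uint64_t text_base;	/* assumed to be the previous boot's _stext */
	uint64_t text_size;
	const struct prev_mod *mods;
	size_t nr_mods;
};

/* Rewrite a raw address from the previous boot as "vmlinux+0xOFF" or
 * "module+0xOFF"; fall back to the raw address if nothing matches. */
static int symbolize(const struct prev_boot *pb, uint64_t addr,
		     char *buf, size_t len)
{
	assert(pb != NULL && buf != NULL);
	if (addr >= pb->text_base && addr - pb->text_base < pb->text_size)
		return snprintf(buf, len, "vmlinux+0x%llx",
				(unsigned long long)(addr - pb->text_base));
	for (size_t i = 0; i < pb->nr_mods; i++) {
		const struct prev_mod *m = &pb->mods[i];

		if (addr >= m->base && addr - m->base < m->size)
			return snprintf(buf, len, "%s+0x%llx", m->name,
					(unsigned long long)(addr - m->base));
	}
	return snprintf(buf, len, "0x%llx", (unsigned long long)addr);
}
```

Mapping the resulting offsets to symbol names still requires the matching vmlinux or module files.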

-- Steve
Masami Hiramatsu (Google) Feb. 5, 2025, 12:25 p.m. UTC | #2
On Mon, 3 Feb 2025 10:32:34 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I thought we decided that we didn't need the relative stack trace? That all
> we need to do is to expose the offset from the last boot, and a list of
> modules that were loaded and their addresses, and then we can easily
> decipher the stack traces into human readable format?

Hmm, if it is only for the last boot, that is OK. So the base-address
metadata would be exposed when the user mmaps the buffer before it is
used for tracing, but not after the buffer has been used for the current
trace, because that would leak the current boot's base address? (Or can
we expose that too?)

My point is that exposing the table for the previous boot is safe, but
it may not be allowed for live tracing. That is my concern.

Anyway, let me try storing the module table.

Thank you,

Masami Hiramatsu (Google) Feb. 5, 2025, 1:28 p.m. UTC | #3
On Wed, 5 Feb 2025 21:25:43 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Hmm, if it is only for the last boot, that is OK. So the base-address
> metadata would be exposed when the user mmaps the buffer before it is
> used for tracing, but not after the buffer has been used for the current
> trace, because that would leak the current boot's base address? (Or can
> we expose that too?)
>
> My point is that exposing the table for the previous boot is safe, but
> it may not be allowed for live tracing. That is my concern.

Ah, never mind. Anyway, when we record a stack trace from a specific
trace event, it already exposes symbol addresses, from which the base
address is easily estimated.

So, in a completely different context, one possible use case for this
relative stacktrace (and for relative pointers more broadly) is to avoid
exposing any kernel text address to users, including addresses in trace
events (maybe we could introduce something like a `POINTER()` macro for
TRACE_EVENT()). But that is another story.

Thanks,
Steven Rostedt Feb. 5, 2025, 2:53 p.m. UTC | #4
On Wed, 5 Feb 2025 21:25:43 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Anyway, let me try storing the module table.

I have the module table code almost done. At least the recording of the
modules into persistent memory. Exposing and using it is not started yet. I
can send what I have and you can take it over if you want.

-- Steve
Steven Rostedt Feb. 5, 2025, 10:52 p.m. UTC | #5
On Wed, 5 Feb 2025 09:53:22 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 5 Feb 2025 21:25:43 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > Anyway, let me try storing the module table.  
> 
> I have the module table code almost done. At least the recording of the
> modules into persistent memory. Exposing and using it is not started yet. I
> can send what I have and you can take it over if you want.

I finished what I was working on. Can you start with that? I can push this
up to the ring-buffer/core branch. Although it's not fully tested.

-- Steve
Masami Hiramatsu (Google) Feb. 6, 2025, 12:28 a.m. UTC | #6
On Wed, 5 Feb 2025 17:52:34 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 5 Feb 2025 09:53:22 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 5 Feb 2025 21:25:43 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > 
> > > Anyway, let me try storing the module table.  
> > 
> > I have the module table code almost done. At least the recording of the
> > modules into persistent memory. Exposing and using it is not started yet. I
> > can send what I have and you can take it over if you want.
> 
> I finished what I was working on. Can you start with that? I can push this
> up to the ring-buffer/core branch. Although it's not fully tested.

Oops, I also worked on that. Anyway, let me check it.

Thanks!
