mbox series

[0/6] Tracing register accesses with pstore and dynamic debug

Message ID cover.1536430404.git.saiprakash.ranjan@codeaurora.org (mailing list archive)
Headers show
Series Tracing register accesses with pstore and dynamic debug | expand

Message

Sai Prakash Ranjan Sept. 8, 2018, 8:27 p.m. UTC
Hi,

This patch series adds Event tracing support to pstore and is continuation
to the RFC patch introduced to add a new tracing facility for register
accesses called Register Trace Buffer(RTB). Since we decided to not introduce
a separate framework to trace register accesses and use existing framework
like tracepoints, I have moved from RFC. Details of the RFC in link below:

Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/

MSR tracing example given by Steven was helpful in using tracepoints for
register accesses instead of using separate trace. But just having these
IO traces would not help much unless we could have them in some persistent
ram buffer for debugging unclocked access or some kind of bus hang or an
unexpected reset caused by some buggy driver which happens a lot during
initial development stages. By analyzing the last few entries of this buffer,
we could identify the register access which is causing the issue.

But again, adding pstore support for just one such event would be unacceptable.
Instead, add pstore support for all events since event tracing is widely used
across the kernel to debug variety of issues and adding this support would be
really useful for such purposes.

In addition to this, provide dynamic debug support to filter out unwanted logs
and limit trace to only specific files or directories since there can be
aweful lot of register trace events and we will be interested only in specific
drivers or subsystems which we will be working on. So introduce a new flag "e"
to filter these event tracing to specified input. Currently only register
access trace for arm64 will have this support but this could be expanded later
to other events also if required.

First example below shows the use of tracing events with pstore support. Here
we trace sched events:

  # trace_event=sched tp_pstore in command line
  # reboot -f
  # mount -t pstore pstore /sys/fs/pstore/
  # tail /sys/fs/pstore/event-ramoops-0
  sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=rcu_preempt next_pid=10 next_prio=120
  sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=R+ ==> next_comm=swapper/1 next_pid=0 next_prio=120
  sched_waking: comm=rcu_sched pid=11 prio=120 target_cpu=002
  sched_wakeup: comm=rcu_sched pid=11 prio=120 target_cpu=002
  sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=rcu_sched next_pid=11 next_prio=120
  sched_switch: prev_comm=rcu_sched prev_pid=11 prev_prio=120 prev_state=R+ ==> next_comm=swapper/2 next_pid=0 next_prio=120
  sched_waking: comm=reboot pid=1867 prio=120 target_cpu=000
  sched_wakeup: comm=reboot pid=1867 prio=120 target_cpu=000
  sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=reboot next_pid=1867 next_prio=120

We can get more details for debugging as we can see from the above pstore
output.

Below is the second example for identifying the root cause of bus hang in
qcom mdp tested on db410c. This hang was intentionally introduced just to
show the usecase of tracing with pstore. The module used can be found here:
 https://github.com/saiprakash-ranjan/Bus-Hang
This does an unclocked access and will reset db410c and later logs can be
viewed through pstore. Here we manually specify the path to trace in kernel
command line.

Note: For testing purpose, I just copied bus_hang.c to drivers/soc/qcom and built it.

1) Set command line with dyndbg, trace_event and tp_pstore parameter as below:

   # dyndbg="file drivers/soc/qcom/* +e" trace_event=io tp_pstore

2) Bus hang by reading below debugfs entry with bus_hang module.

   # cat /sys/kernel/debug/hang/bus_hang

3) After restart, we can find the cause in last entry i.e. (bus_hang_mdp+0xa4/0xb8)

   # cat /sys/fs/pstore/event-ramoops-0
   io_write: type=writel cpu=0 ts:1423426774 data=0xffff00000d5065a4 caller=qcom_smsm_probe+0x52c/0x678
   io_write: type=writel cpu=0 ts:1423649847 data=0xffff00000d506608 caller=qcom_smsm_probe+0x52c/0x678
   io_read: type=readl cpu=1 ts:53095994171 data=0xffff00000a51d040 caller=bus_hang_mdp+0xa4/0xb8

4) Offending register access found as below:

   # (gdb)
   # (gdb) list *(bus_hang_mdp+0xa4)
   # 0xffff00000867cdc8 is in bus_hang_mdp (drivers/soc/qcom/bus_hang.c:10).
   # 5       static int bus_hang_mdp(void *data, u64 *val)
   # 6       {
   # 7               void *p = ioremap(0x01a01000, SZ_4K);
   # 8               unsigned int a;
   # 9
   # 10              a = __raw_readl((void *)((unsigned long)p + 0x40));  <----
   # 11
   # 12              *val = a;
   # 13
   # 14              return 0;
   # (gdb)

Patchwise oneline description is given below:

Patch 1 adds event-size property to dt-bindings of ramoops.

Patch 2 adds generic event tracing support to pstore.

Patch 3 adds tp_pstore cmdline to have tracepoints go to pstore.

Patch 4 adds tracepoint for register accesses, i.e. for (read/write{b,w,l,q}).

Patch 5 adds new header for instrumentation of io operations.

Patch 6 adds new flag for dynamic event tracing.

Sai Prakash Ranjan (6):
  dt-bindings: ramoops: Add event-size property
  pstore: Add event tracing support
  tracing: Add tp_pstore cmdline to have tracepoints go to pstore
  arm64/io: Add tracepoint for register accesses
  arm64/io: Add header for instrumentation of io operations
  dynamic_debug: Add flag for dynamic event tracing

 .../admin-guide/kernel-parameters.txt         | 21 ++++++
 .../bindings/reserved-memory/ramoops.txt      |  7 +-
 arch/arm64/include/asm/io.h                   | 25 +++----
 arch/arm64/kernel/io.c                        | 22 ++++++
 fs/pstore/Kconfig                             |  2 +-
 fs/pstore/ftrace.c                            | 55 +++++++++++++++
 fs/pstore/inode.c                             |  4 ++
 fs/pstore/ram.c                               | 44 +++++++++++-
 include/asm-generic/io-instrumented.h         | 69 ++++++++++++++++++
 include/asm-generic/io-trace.h                | 70 +++++++++++++++++++
 include/linux/dynamic_debug.h                 |  1 +
 include/linux/ftrace.h                        |  6 +-
 include/linux/pstore.h                        |  2 +
 include/linux/pstore_ram.h                    |  1 +
 kernel/sysctl.c                               |  7 ++
 kernel/trace/Kconfig                          | 22 +++++-
 kernel/trace/trace.c                          | 51 ++++++++++++++
 kernel/trace/trace.h                          |  7 ++
 lib/dynamic_debug.c                           |  1 +
 19 files changed, 393 insertions(+), 24 deletions(-)
 create mode 100644 include/asm-generic/io-instrumented.h
 create mode 100644 include/asm-generic/io-trace.h

Comments

Will Deacon Sept. 11, 2018, 3:11 p.m. UTC | #1
Hello,

On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
> This patch series adds Event tracing support to pstore and is continuation
> to the RFC patch introduced to add a new tracing facility for register
> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
> a separate framework to trace register accesses and use existing framework
> like tracepoints, I have moved from RFC. Details of the RFC in link below:
> 
> Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/

The arm64 backend bits look fine to be, but they're mixed up with core
changes that I haven't looked at closely.

Will
Sai Prakash Ranjan Sept. 11, 2018, 4:11 p.m. UTC | #2
On 9/11/2018 8:41 PM, Will Deacon wrote:
> Hello,
> 
> On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
>> This patch series adds Event tracing support to pstore and is continuation
>> to the RFC patch introduced to add a new tracing facility for register
>> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
>> a separate framework to trace register accesses and use existing framework
>> like tracepoints, I have moved from RFC. Details of the RFC in link below:
>>
>> Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
> 
> The arm64 backend bits look fine to be, but they're mixed up with core
> changes that I haven't looked at closely.
> 
Hi Will,

Thanks for the review. Let me know if you have any suggestions or points 
regarding this patch, I will be happy to incorporate it.
Joel Fernandes Oct. 20, 2018, 5:25 a.m. UTC | #3
On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
> Hi,
> 
> This patch series adds Event tracing support to pstore and is continuation
> to the RFC patch introduced to add a new tracing facility for register
> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
> a separate framework to trace register accesses and use existing framework
> like tracepoints, I have moved from RFC. Details of the RFC in link below:
> 
> Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
> 
> MSR tracing example given by Steven was helpful in using tracepoints for
> register accesses instead of using separate trace. But just having these
> IO traces would not help much unless we could have them in some persistent
> ram buffer for debugging unclocked access or some kind of bus hang or an
> unexpected reset caused by some buggy driver which happens a lot during
> initial development stages. By analyzing the last few entries of this buffer,
> we could identify the register access which is causing the issue.

Hi Sai,

I wanted to see if I could make some time to get your patches working. We are
hitting usecases that need something like this as well. Basically devices
hanging and then the ramdump does not tell us much, so in this case pstore
events can be really helpful. This usecase came up last year as well.

Anyway while I was going through your patches, I cleaned up some pstore code
as well and I have 3 more patches on top of yours for this clean up. I prefer
we submit the patches together and sync our work together so that there is
least conflict.

Here's my latest tree:
https://github.com/joelagnel/linux-kernel/commits/pstore-events
(note that I have only build tested the patches since I just wrote them and
its quite late in the night here ;-))

thanks,

- Joel
Sai Prakash Ranjan Oct. 20, 2018, 6:32 a.m. UTC | #4
On 10/20/2018 10:55 AM, Joel Fernandes wrote:
> On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
>> Hi,
>>
>> This patch series adds Event tracing support to pstore and is continuation
>> to the RFC patch introduced to add a new tracing facility for register
>> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
>> a separate framework to trace register accesses and use existing framework
>> like tracepoints, I have moved from RFC. Details of the RFC in link below:
>>
>> Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
>>
>> MSR tracing example given by Steven was helpful in using tracepoints for
>> register accesses instead of using separate trace. But just having these
>> IO traces would not help much unless we could have them in some persistent
>> ram buffer for debugging unclocked access or some kind of bus hang or an
>> unexpected reset caused by some buggy driver which happens a lot during
>> initial development stages. By analyzing the last few entries of this buffer,
>> we could identify the register access which is causing the issue.
> 
> Hi Sai,
> 
> I wanted to see if I could make some time to get your patches working. We are
> hitting usecases that need something like this as well. Basically devices
> hanging and then the ramdump does not tell us much, so in this case pstore
> events can be really helpful. This usecase came up last year as well.
> 
> Anyway while I was going through your patches, I cleaned up some pstore code
> as well and I have 3 more patches on top of yours for this clean up. I prefer
> we submit the patches together and sync our work together so that there is
> least conflict.
> 
> Here's my latest tree:
> https://github.com/joelagnel/linux-kernel/commits/pstore-events
> (note that I have only build tested the patches since I just wrote them and
> its quite late in the night here ;-))
> 

Hi Joel,

Thanks for looking into this. Sure, I will be happy to sync up with you 
on this. I can test your additional patches on top of my pstore patches. 
BTW, I'm still stuck at copying binary record into pstore and then 
extract it during read time. Seems like I'm missing something.

Thanks,
Sai
Joel Fernandes Oct. 20, 2018, 4:27 p.m. UTC | #5
On Sat, Oct 20, 2018 at 12:02:37PM +0530, Sai Prakash Ranjan wrote:
> On 10/20/2018 10:55 AM, Joel Fernandes wrote:
> > On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
> > > Hi,
> > > 
> > > This patch series adds Event tracing support to pstore and is continuation
> > > to the RFC patch introduced to add a new tracing facility for register
> > > accesses called Register Trace Buffer(RTB). Since we decided to not introduce
> > > a separate framework to trace register accesses and use existing framework
> > > like tracepoints, I have moved from RFC. Details of the RFC in link below:
> > > 
> > > Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
> > > 
> > > MSR tracing example given by Steven was helpful in using tracepoints for
> > > register accesses instead of using separate trace. But just having these
> > > IO traces would not help much unless we could have them in some persistent
> > > ram buffer for debugging unclocked access or some kind of bus hang or an
> > > unexpected reset caused by some buggy driver which happens a lot during
> > > initial development stages. By analyzing the last few entries of this buffer,
> > > we could identify the register access which is causing the issue.
> > 
> > Hi Sai,
> > 
> > I wanted to see if I could make some time to get your patches working. We are
> > hitting usecases that need something like this as well. Basically devices
> > hanging and then the ramdump does not tell us much, so in this case pstore
> > events can be really helpful. This usecase came up last year as well.
> > 
> > Anyway while I was going through your patches, I cleaned up some pstore code
> > as well and I have 3 more patches on top of yours for this clean up. I prefer
> > we submit the patches together and sync our work together so that there is
> > least conflict.
> > 
> > Here's my latest tree:
> > https://github.com/joelagnel/linux-kernel/commits/pstore-events
> > (note that I have only build tested the patches since I just wrote them and
> > its quite late in the night here ;-))
> > 
> 
> Hi Joel,
> 
> Thanks for looking into this. Sure, I will be happy to sync up with you on

Thanks. And added a fourth patch in the tree too.

> this. I can test your additional patches on top of my pstore patches. BTW,
> I'm still stuck at copying binary record into pstore and then extract it
> during read time. Seems like I'm missing something.

Sure, push your latest somewhere and let me know. I'll try to get you unstuck.

thanks,

 - Joel
Sai Prakash Ranjan Oct. 21, 2018, 3:46 a.m. UTC | #6
On 10/20/2018 9:57 PM, Joel Fernandes wrote:
> On Sat, Oct 20, 2018 at 12:02:37PM +0530, Sai Prakash Ranjan wrote:
>> On 10/20/2018 10:55 AM, Joel Fernandes wrote:
>>> On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
>>>> Hi,
>>>>
>>>> This patch series adds Event tracing support to pstore and is continuation
>>>> to the RFC patch introduced to add a new tracing facility for register
>>>> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
>>>> a separate framework to trace register accesses and use existing framework
>>>> like tracepoints, I have moved from RFC. Details of the RFC in link below:
>>>>
>>>> Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
>>>>
>>>> MSR tracing example given by Steven was helpful in using tracepoints for
>>>> register accesses instead of using separate trace. But just having these
>>>> IO traces would not help much unless we could have them in some persistent
>>>> ram buffer for debugging unclocked access or some kind of bus hang or an
>>>> unexpected reset caused by some buggy driver which happens a lot during
>>>> initial development stages. By analyzing the last few entries of this buffer,
>>>> we could identify the register access which is causing the issue.
>>>
>>> Hi Sai,
>>>
>>> I wanted to see if I could make some time to get your patches working. We are
>>> hitting usecases that need something like this as well. Basically devices
>>> hanging and then the ramdump does not tell us much, so in this case pstore
>>> events can be really helpful. This usecase came up last year as well.
>>>
>>> Anyway while I was going through your patches, I cleaned up some pstore code
>>> as well and I have 3 more patches on top of yours for this clean up. I prefer
>>> we submit the patches together and sync our work together so that there is
>>> least conflict.
>>>
>>> Here's my latest tree:
>>> https://github.com/joelagnel/linux-kernel/commits/pstore-events
>>> (note that I have only build tested the patches since I just wrote them and
>>> its quite late in the night here ;-))
>>>
>>
>> Hi Joel,
>>
>> Thanks for looking into this. Sure, I will be happy to sync up with you on
> 
> Thanks. And added a fourth patch in the tree too.
> 
>> this. I can test your additional patches on top of my pstore patches. BTW,
>> I'm still stuck at copying binary record into pstore and then extract it
>> during read time. Seems like I'm missing something.
> 
> Sure, push your latest somewhere and let me know. I'll try to get you unstuck.
> 

Thanks Joel, I will push my changes and let you know in some time.
Sai Prakash Ranjan Oct. 21, 2018, 4:59 a.m. UTC | #7
On 10/21/2018 9:16 AM, Sai Prakash Ranjan wrote:
> On 10/20/2018 9:57 PM, Joel Fernandes wrote:
>> On Sat, Oct 20, 2018 at 12:02:37PM +0530, Sai Prakash Ranjan wrote:
>>> On 10/20/2018 10:55 AM, Joel Fernandes wrote:
>>>> On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
>>>>> Hi,
>>>>>
>>>>> This patch series adds Event tracing support to pstore and is 
>>>>> continuation
>>>>> to the RFC patch introduced to add a new tracing facility for register
>>>>> accesses called Register Trace Buffer(RTB). Since we decided to not 
>>>>> introduce
>>>>> a separate framework to trace register accesses and use existing 
>>>>> framework
>>>>> like tracepoints, I have moved from RFC. Details of the RFC in link 
>>>>> below:
>>>>>
>>>>> Link: 
>>>>> https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/ 
>>>>>
>>>>>
>>>>> MSR tracing example given by Steven was helpful in using 
>>>>> tracepoints for
>>>>> register accesses instead of using separate trace. But just having 
>>>>> these
>>>>> IO traces would not help much unless we could have them in some 
>>>>> persistent
>>>>> ram buffer for debugging unclocked access or some kind of bus hang 
>>>>> or an
>>>>> unexpected reset caused by some buggy driver which happens a lot 
>>>>> during
>>>>> initial development stages. By analyzing the last few entries of 
>>>>> this buffer,
>>>>> we could identify the register access which is causing the issue.
>>>>
>>>> Hi Sai,
>>>>
>>>> I wanted to see if I could make some time to get your patches 
>>>> working. We are
>>>> hitting usecases that need something like this as well. Basically 
>>>> devices
>>>> hanging and then the ramdump does not tell us much, so in this case 
>>>> pstore
>>>> events can be really helpful. This usecase came up last year as well.
>>>>
>>>> Anyway while I was going through your patches, I cleaned up some 
>>>> pstore code
>>>> as well and I have 3 more patches on top of yours for this clean up. 
>>>> I prefer
>>>> we submit the patches together and sync our work together so that 
>>>> there is
>>>> least conflict.
>>>>
>>>> Here's my latest tree:
>>>> https://github.com/joelagnel/linux-kernel/commits/pstore-events
>>>> (note that I have only build tested the patches since I just wrote 
>>>> them and
>>>> its quite late in the night here ;-))
>>>>
>>>
>>> Hi Joel,
>>>
>>> Thanks for looking into this. Sure, I will be happy to sync up with 
>>> you on
>>
>> Thanks. And added a fourth patch in the tree too.
>>
>>> this. I can test your additional patches on top of my pstore patches. 
>>> BTW,
>>> I'm still stuck at copying binary record into pstore and then extract it
>>> during read time. Seems like I'm missing something.
>>
>> Sure, push your latest somewhere and let me know. I'll try to get you 
>> unstuck.
>>
> 
> Thanks Joel, I will push my changes and let you know in some time.
> 

Hi Joel,

Here's my tree:
https://github.com/saiprakash-ranjan/linux/commits/pstore-events

I have one patch extra on top of your patches. Nothing much on binary 
record storage in this patch, only removed kmalloc in pstore event call 
to avoid loop.

Thanks,
Sai
Joel Fernandes Oct. 21, 2018, 5:09 a.m. UTC | #8
On Sun, Oct 21, 2018 at 09:16:59AM +0530, Sai Prakash Ranjan wrote:
> On 10/20/2018 9:57 PM, Joel Fernandes wrote:
> > On Sat, Oct 20, 2018 at 12:02:37PM +0530, Sai Prakash Ranjan wrote:
> > > On 10/20/2018 10:55 AM, Joel Fernandes wrote:
> > > > On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
> > > > > Hi,
> > > > > 
> > > > > This patch series adds Event tracing support to pstore and is continuation
> > > > > to the RFC patch introduced to add a new tracing facility for register
> > > > > accesses called Register Trace Buffer(RTB). Since we decided to not introduce
> > > > > a separate framework to trace register accesses and use existing framework
> > > > > like tracepoints, I have moved from RFC. Details of the RFC in link below:
> > > > > 
> > > > > Link: https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ranjan@codeaurora.org/
> > > > > 
> > > > > MSR tracing example given by Steven was helpful in using tracepoints for
> > > > > register accesses instead of using separate trace. But just having these
> > > > > IO traces would not help much unless we could have them in some persistent
> > > > > ram buffer for debugging unclocked access or some kind of bus hang or an
> > > > > unexpected reset caused by some buggy driver which happens a lot during
> > > > > initial development stages. By analyzing the last few entries of this buffer,
> > > > > we could identify the register access which is causing the issue.
> > > > 
> > > > Hi Sai,
> > > > 
> > > > I wanted to see if I could make some time to get your patches working. We are
> > > > hitting usecases that need something like this as well. Basically devices
> > > > hanging and then the ramdump does not tell us much, so in this case pstore
> > > > events can be really helpful. This usecase came up last year as well.
> > > > 
> > > > Anyway while I was going through your patches, I cleaned up some pstore code
> > > > as well and I have 3 more patches on top of yours for this clean up. I prefer
> > > > we submit the patches together and sync our work together so that there is
> > > > least conflict.
> > > > 
> > > > Here's my latest tree:
> > > > https://github.com/joelagnel/linux-kernel/commits/pstore-events
> > > > (note that I have only build tested the patches since I just wrote them and
> > > > its quite late in the night here ;-))
> > > > 
> > > 
> > > Hi Joel,
> > > 
> > > Thanks for looking into this. Sure, I will be happy to sync up with you on
> > 
> > Thanks. And added a fourth patch in the tree too.


While at it, I was thinking about the problem we are trying to solve in
another way. If ftrace itself can use pages from the persistent ram store,
instead of the kernel's buddy allocator, then the ftrace ring buffer itself
could persist across a system reboot.

The clear advantage of that for Sai's pstore events work is, not having to
duplicate a lot of the ring buffer code and stuff into pstore (for the pstore
events for example, I wanted time stamps as well and ftrace's ring buffer has
some nice time management code to deal with time deltas). We already have
other ring buffers maintained in other parts of the kernel for tracing right?
So I'm a bit averse to duplicating that into pstore as well for tracing. The
other advantage of persisting the ftrace ring buffer would also mean data
from other tracers such as function-graph tracer and irqsoff tracers would
also persist and then we can also probably get rid of ftrace-in-pstore stuff
which is kind of incomplete anyway since it does not have time stamps for
recorded functions.

Steven and Kees: what do you think, is persisting ftrace ring buffer across
reboots a worthwhile idea? Any thoughts on how feasible something like that
could be, code wise? Off the top, I think the ring buffer state that ftrace
needs other than the trace data itself will also have to be persisted.

thanks,

 - Joel