diff mbox series

[v2,13/16] coresight: tmc-etr: Allow events to use the same ETR buffer

Message ID 20190325215632.17013-14-mathieu.poirier@linaro.org (mailing list archive)
State New, archived
Headers show
Series coresight: Add support for CPU-wide trace scenarios | expand

Commit Message

Mathieu Poirier March 25, 2019, 9:56 p.m. UTC
This patch uses the pid of the process being traced to aggregate traces
coming from different processors in the same sink, something that is
required when collecting traces in CPU-wide mode when the CoreSight HW
enacts a N:1 source/sink topology.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
 1 file changed, 65 insertions(+), 6 deletions(-)

Comments

Suzuki K Poulose March 26, 2019, 4:18 p.m. UTC | #1
On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
> This patch uses the pid of the process being traced to aggregate traces
> coming from different processors in the same sink, something that is
> required when collecting traces in CPU-wide mode when the CoreSight HW
> enacts a N:1 source/sink topology.
> 
> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> ---
>   .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
>   1 file changed, 65 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 7254fafdf1c2..cbabf88bd51d 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -8,6 +8,7 @@
>   #include <linux/coresight.h>
>   #include <linux/dma-mapping.h>
>   #include <linux/iommu.h>
> +#include <linux/idr.h>
>   #include <linux/slab.h>
>   #include <linux/types.h>
>   #include <linux/vmalloc.h>
> @@ -41,6 +42,9 @@ struct etr_perf_buffer {
>   	void			**pages;
>   };
>   
> +static DEFINE_IDR(session_idr);
> +static DEFINE_MUTEX(session_idr_lock);

Please correct me if I am wrong here. What we now do with this series is

- One event per CPU and thus one ETR perf buf per CPU.
- One *ETR buf* per PID, from the IDR hash list.

However, if we have 1:1 situation, we will have :

N (say 2 ETRs), but one *ETR buf* as they all share the same PID, 
implying multiple multiple ETRs will try to use the same etr buf,
which could corrupt the trace data ? Instead,  what we need in that
situation is :

One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
So should the IDR be specific to an ETR ?

Or do we not support a session with multiple ETRs involved (1:1) ?

Cheers
Suzuki
Mathieu Poirier March 26, 2019, 5:55 p.m. UTC | #2
On Tue, Mar 26, 2019 at 04:18:35PM +0000, Suzuki K Poulose wrote:
> On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
> > This patch uses the pid of the process being traced to aggregate traces
> > coming from different processors in the same sink, something that is
> > required when collecting traces in CPU-wide mode when the CoreSight HW
> > enacts a N:1 source/sink topology.
> > 
> > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > ---
> >   .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
> >   1 file changed, 65 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > index 7254fafdf1c2..cbabf88bd51d 100644
> > --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > @@ -8,6 +8,7 @@
> >   #include <linux/coresight.h>
> >   #include <linux/dma-mapping.h>
> >   #include <linux/iommu.h>
> > +#include <linux/idr.h>
> >   #include <linux/slab.h>
> >   #include <linux/types.h>
> >   #include <linux/vmalloc.h>
> > @@ -41,6 +42,9 @@ struct etr_perf_buffer {
> >   	void			**pages;
> >   };
> > +static DEFINE_IDR(session_idr);
> > +static DEFINE_MUTEX(session_idr_lock);
> 
> Please correct me if I am wrong here. What we now do with this series is
> 
> - One event per CPU and thus one ETR perf buf per CPU.
> - One *ETR buf* per PID, from the IDR hash list.
> 
> However, if we have 1:1 situation, we will have :
> 
> N (say 2 ETRs), but one *ETR buf* as they all share the same PID, implying
> multiple multiple ETRs will try to use the same etr buf,
> which could corrupt the trace data ? Instead,  what we need in that
> situation is :
> 
> One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
> So should the IDR be specific to an ETR ?
> 
> Or do we not support a session with multiple ETRs involved (1:1) ?

At this time 1:1 topologies are not supported and a fair amount of work will go
in doing so.  When I started working on this serie my thought was that because
of said amount of work there is no point thinking about 1:1, and that we can
deal with it when we get there.

But taking a step back and having dealt with the harder (concurrency) problems
inherent to CPU-wide scenarios, it would be trivial to make the code 1:1 ready
and it will be one less thing to worry about down the road.

That being said and answering your question above, I think we need and IDR per
ETR (in the drvdata) to avoid contention issues on systems with several ETRs.  

Thanks for bringing this back to my attention.
Mathieu

> 
> Cheers
> Suzuki
Suzuki K Poulose March 27, 2019, 11:32 a.m. UTC | #3
On 03/26/2019 05:55 PM, Mathieu Poirier wrote:
> On Tue, Mar 26, 2019 at 04:18:35PM +0000, Suzuki K Poulose wrote:
>> On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
>>> This patch uses the pid of the process being traced to aggregate traces
>>> coming from different processors in the same sink, something that is
>>> required when collecting traces in CPU-wide mode when the CoreSight HW
>>> enacts a N:1 source/sink topology.
>>>
>>> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> ---
>>>    .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
>>>    1 file changed, 65 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>> index 7254fafdf1c2..cbabf88bd51d 100644
>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>> @@ -8,6 +8,7 @@
>>>    #include <linux/coresight.h>
>>>    #include <linux/dma-mapping.h>
>>>    #include <linux/iommu.h>
>>> +#include <linux/idr.h>
>>>    #include <linux/slab.h>
>>>    #include <linux/types.h>
>>>    #include <linux/vmalloc.h>
>>> @@ -41,6 +42,9 @@ struct etr_perf_buffer {
>>>    	void			**pages;
>>>    };
>>> +static DEFINE_IDR(session_idr);
>>> +static DEFINE_MUTEX(session_idr_lock);
>>
>> Please correct me if I am wrong here. What we now do with this series is
>>
>> - One event per CPU and thus one ETR perf buf per CPU.
>> - One *ETR buf* per PID, from the IDR hash list.
>>
>> However, if we have 1:1 situation, we will have :
>>
>> N (say 2 ETRs), but one *ETR buf* as they all share the same PID, implying
>> multiple multiple ETRs will try to use the same etr buf,
>> which could corrupt the trace data ? Instead,  what we need in that
>> situation is :
>>
>> One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
>> So should the IDR be specific to an ETR ?
>>
>> Or do we not support a session with multiple ETRs involved (1:1) ?
> 
> At this time 1:1 topologies are not supported and a fair amount of work will go
> in doing so.  When I started working on this serie my thought was that because
> of said amount of work there is no point thinking about 1:1, and that we can
> deal with it when we get there.
> 
> But taking a step back and having dealt with the harder (concurrency) problems
> inherent to CPU-wide scenarios, it would be trivial to make the code 1:1 ready
> and it will be one less thing to worry about down the road.
> 
> That being said and answering your question above, I think we need and IDR per
> ETR (in the drvdata) to avoid contention issues on systems with several ETRs.
> 
> Thanks for bringing this back to my attention.

Cool. Thanks for explaining the rationale. So, when we do that, I think
we may be able to have one "etr_perf_buffer" per ETR and thus we may be
able to move the refcount back to the etr_perf_buffer, just like we
moved the PID and index etr_perf_buffer instead of the etr_buf ?

Cheers
Suzuki
Mathieu Poirier March 27, 2019, 5:01 p.m. UTC | #4
On Wed, 27 Mar 2019 at 05:30, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> On 03/26/2019 05:55 PM, Mathieu Poirier wrote:
> > On Tue, Mar 26, 2019 at 04:18:35PM +0000, Suzuki K Poulose wrote:
> >> On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
> >>> This patch uses the pid of the process being traced to aggregate traces
> >>> coming from different processors in the same sink, something that is
> >>> required when collecting traces in CPU-wide mode when the CoreSight HW
> >>> enacts a N:1 source/sink topology.
> >>>
> >>> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> >>> ---
> >>>    .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
> >>>    1 file changed, 65 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>> index 7254fafdf1c2..cbabf88bd51d 100644
> >>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>> @@ -8,6 +8,7 @@
> >>>    #include <linux/coresight.h>
> >>>    #include <linux/dma-mapping.h>
> >>>    #include <linux/iommu.h>
> >>> +#include <linux/idr.h>
> >>>    #include <linux/slab.h>
> >>>    #include <linux/types.h>
> >>>    #include <linux/vmalloc.h>
> >>> @@ -41,6 +42,9 @@ struct etr_perf_buffer {
> >>>     void                    **pages;
> >>>    };
> >>> +static DEFINE_IDR(session_idr);
> >>> +static DEFINE_MUTEX(session_idr_lock);
> >>
> >> Please correct me if I am wrong here. What we now do with this series is
> >>
> >> - One event per CPU and thus one ETR perf buf per CPU.
> >> - One *ETR buf* per PID, from the IDR hash list.
> >>
> >> However, if we have 1:1 situation, we will have :
> >>
> >> N (say 2 ETRs), but one *ETR buf* as they all share the same PID, implying
> >> multiple multiple ETRs will try to use the same etr buf,
> >> which could corrupt the trace data ? Instead,  what we need in that
> >> situation is :
> >>
> >> One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
> >> So should the IDR be specific to an ETR ?
> >>
> >> Or do we not support a session with multiple ETRs involved (1:1) ?
> >
> > At this time 1:1 topologies are not supported and a fair amount of work will go
> > in doing so.  When I started working on this serie my thought was that because
> > of said amount of work there is no point thinking about 1:1, and that we can
> > deal with it when we get there.
> >
> > But taking a step back and having dealt with the harder (concurrency) problems
> > inherent to CPU-wide scenarios, it would be trivial to make the code 1:1 ready
> > and it will be one less thing to worry about down the road.
> >
> > That being said and answering your question above, I think we need and IDR per
> > ETR (in the drvdata) to avoid contention issues on systems with several ETRs.
> >
> > Thanks for bringing this back to my attention.
>
> Cool. Thanks for explaining the rationale. So, when we do that, I think
> we may be able to have one "etr_perf_buffer" per ETR and thus we may be
> able to move the refcount back to the etr_perf_buffer, just like we
> moved the PID and index etr_perf_buffer instead of the etr_buf ?

An etr_perf_buffer is associated with an event and holds the AUX ring
buffer that was created for that event.  In CPU-wide N:1 mode multiple
events (one per CPU), each with its own AUX ring buffer, share a sink
and as such we can't have a single etr_perf_buffer per ETR.

Thanks,
Mathieu

>
> Cheers
> Suzuki
>
Leo Yan March 30, 2019, 3:43 p.m. UTC | #5
Hi Mathieu,

On Mon, Mar 25, 2019 at 03:56:29PM -0600, Mathieu Poirier wrote:
> This patch uses the pid of the process being traced to aggregate traces
> coming from different processors in the same sink, something that is
> required when collecting traces in CPU-wide mode when the CoreSight HW
> enacts a N:1 source/sink topology.

I tried to use kprobe to verify the flow, so I observe something is
not expected (Maybe I misunderstand it), please check the detailed
info as below.

- The testing steps are as below:

  # perf record -e cs_etm/@f6404000.etr/ -a -- sleep 10 &
  # ls
  # ls
  # ls

  So I expect the CoreSight to do global tracing for 10 seconds, and
  then simply execute 'ls' command for 3 times and these three
  processes will be traced into perf data.

- I used kprobe event to add dynamic points for function tracing
  tmc_alloc_etr_buffer() and tmc_etr_get_etr_buf():

  echo 'p:my_probe tmc_alloc_etr_buffer' > /sys/kernel/debug/tracing/kprobe_events
  echo 'p:my_probe2 0xffff000010ba63c4 pid=%x1:u32' >> kprobe_events

  When start the 'perf record' command it will create etr_perf and
  etr_buf for every CPU, but afterwards for the three 'ls' processes,
  I cannot see any any ftrace log for them.  So finally I capture the
  trace log as below, it only creates buffer for every CPU for one
  etr_perf and one etr_buf but has no any buffer is created for 'ls'
  processes.

#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            perf-2502  [000] d...  2003.550571: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.550595: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.556306: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.556329: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.557694: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.557708: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.559079: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.559095: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.560567: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.560581: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.561954: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.561965: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.563338: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.563352: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.564782: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.564796: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502


  Question: When every time execute 'ls' program, should we expect the
  tmc-etr driver to create etr_buf for every process?

- If I use the command 'perf report --sort pid --stdio' to output result,
  I also can only see one process and doesn't have any samples for new
  created 'ls' processes:

  # Samples: 168  of event 'branches'
  # Event count (approx.): 168
  #
  # Children      Self      Pid:Command
  # ........  ........  ...............
  #
     100.00%   100.00%     2502:perf

Thanks,
Leo Yan
Mathieu Poirier April 1, 2019, 7:29 a.m. UTC | #6
Good day Leo,

On Sat, 30 Mar 2019 at 09:43, Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Mathieu,
>
> On Mon, Mar 25, 2019 at 03:56:29PM -0600, Mathieu Poirier wrote:
> > This patch uses the pid of the process being traced to aggregate traces
> > coming from different processors in the same sink, something that is
> > required when collecting traces in CPU-wide mode when the CoreSight HW
> > enacts a N:1 source/sink topology.
>
> I tried to use kprobe to verify the flow, so I observe something is
> not expected (Maybe I misunderstand it), please check the detailed
> info as below.
>
> - The testing steps are as below:
>
>   # perf record -e cs_etm/@f6404000.etr/ -a -- sleep 10 &
>   # ls
>   # ls
>   # ls
>
>   So I expect the CoreSight to do global tracing for 10 seconds, and
>   then simply execute 'ls' command for 3 times and these three
>   processes will be traced into perf data.

Right.

>
> - I used kprobe event to add dynamic points for function tracing
>   tmc_alloc_etr_buffer() and tmc_etr_get_etr_buf():
>
>   echo 'p:my_probe tmc_alloc_etr_buffer' > /sys/kernel/debug/tracing/kprobe_events
>   echo 'p:my_probe2 0xffff000010ba63c4 pid=%x1:u32' >> kprobe_events
>
>   When start the 'perf record' command it will create etr_perf and
>   etr_buf for every CPU, but afterwards for the three 'ls' processes,
>   I cannot see any any ftrace log for them.  So finally I capture the
>   trace log as below, it only creates buffer for every CPU for one
>   etr_perf and one etr_buf but has no any buffer is created for 'ls'
>   processes.
>

That is the correct behavior.  More specifically it creates an
etr_perf for every event but the etr_buf is shared between those
events because there is only one ETR device on your system.

> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>             perf-2502  [000] d...  2003.550571: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.550595: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.556306: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.556329: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.557694: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.557708: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.559079: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.559095: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.560567: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.560581: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.561954: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.561965: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.563338: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.563352: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>             perf-2502  [000] d...  2003.564782: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
>             perf-2502  [000] d...  2003.564796: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
>
>
>   Question: When every time execute 'ls' program, should we expect the
>   tmc-etr driver to create etr_buf for every process?

Buffers are created at the beginning of the trace session for every
CPU (as you reported) but after that everything that happens on those
CPU is using the same buffers.  As such you won't see buffers created
for every program you execute.

>
> - If I use the command 'perf report --sort pid --stdio' to output result,
>   I also can only see one process and doesn't have any samples for new
>   created 'ls' processes:
>
>   # Samples: 168  of event 'branches'
>   # Event count (approx.): 168
>   #
>   # Children      Self      Pid:Command
>   # ........  ........  ...............
>   #
>      100.00%   100.00%     2502:perf

When working with an N:1 source/sink topology and doing a CPU-wide
session, the first event to use the sink will switch it on and the
last one will collect trace data.  With a 10 second trace session it
is very likely traces associated with the 'ls' processes have been
overwritten.  This is a problem inherent to the HW topology and there
is nothing we can currently do to alleviate it.

Let me know if you need more information.
Mathieu

>
> Thanks,
> Leo Yan
Suzuki K Poulose April 1, 2019, 1:01 p.m. UTC | #7
Hi Mathieu,

On 27/03/2019 17:01, Mathieu Poirier wrote:
> On Wed, 27 Mar 2019 at 05:30, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>> On 03/26/2019 05:55 PM, Mathieu Poirier wrote:
>>> On Tue, Mar 26, 2019 at 04:18:35PM +0000, Suzuki K Poulose wrote:
>>>> On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
>>>>> This patch uses the pid of the process being traced to aggregate traces
>>>>> coming from different processors in the same sink, something that is
>>>>> required when collecting traces in CPU-wide mode when the CoreSight HW
>>>>> enacts a N:1 source/sink topology.
>>>>>
>>>>> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>>> ---
>>>>>     .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
>>>>>     1 file changed, 65 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>>> index 7254fafdf1c2..cbabf88bd51d 100644
>>>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>>> @@ -8,6 +8,7 @@
>>>>>     #include <linux/coresight.h>
>>>>>     #include <linux/dma-mapping.h>
>>>>>     #include <linux/iommu.h>
>>>>> +#include <linux/idr.h>
>>>>>     #include <linux/slab.h>
>>>>>     #include <linux/types.h>
>>>>>     #include <linux/vmalloc.h>
>>>>> @@ -41,6 +42,9 @@ struct etr_perf_buffer {
>>>>>      void                    **pages;
>>>>>     };
>>>>> +static DEFINE_IDR(session_idr);
>>>>> +static DEFINE_MUTEX(session_idr_lock);
>>>>
>>>> Please correct me if I am wrong here. What we now do with this series is
>>>>
>>>> - One event per CPU and thus one ETR perf buf per CPU.
>>>> - One *ETR buf* per PID, from the IDR hash list.
>>>>
>>>> However, if we have 1:1 situation, we will have :
>>>>
>>>> N (say 2 ETRs), but one *ETR buf* as they all share the same PID, implying
>>>> multiple multiple ETRs will try to use the same etr buf,
>>>> which could corrupt the trace data ? Instead,  what we need in that
>>>> situation is :
>>>>
>>>> One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
>>>> So should the IDR be specific to an ETR ?
>>>>
>>>> Or do we not support a session with multiple ETRs involved (1:1) ?
>>>
>>> At this time 1:1 topologies are not supported and a fair amount of work will go
>>> in doing so.  When I started working on this serie my thought was that because
>>> of said amount of work there is no point thinking about 1:1, and that we can
>>> deal with it when we get there.
>>>
>>> But taking a step back and having dealt with the harder (concurrency) problems
>>> inherent to CPU-wide scenarios, it would be trivial to make the code 1:1 ready
>>> and it will be one less thing to worry about down the road.
>>>
>>> That being said and answering your question above, I think we need and IDR per
>>> ETR (in the drvdata) to avoid contention issues on systems with several ETRs.
>>>
>>> Thanks for bringing this back to my attention.
>>
>> Cool. Thanks for explaining the rationale. So, when we do that, I think
>> we may be able to have one "etr_perf_buffer" per ETR and thus we may be
>> able to move the refcount back to the etr_perf_buffer, just like we
>> moved the PID and index etr_perf_buffer instead of the etr_buf ?
> 
> An etr_perf_buffer is associated with an event and holds the AUX ring
> buffer that was created for that event.  In CPU-wide N:1 mode multiple
> events (one per CPU), each with its own AUX ring buffer, share a sink
> and as such we can't have a single etr_perf_buffer per ETR.

Ok. Thanks for clarifying this. So we have one AUX buffer per event, but
they all could end up in the same ETR HW buffer and may be copied to only
one of the AUX buffer, which happens to really disable the ETR. And thus
we need to have a sufficiently large AUX buffer in place to allow we don't
overflow all the time with trace from multiple events. Makes sense for the
N:1 topology. Also the decoding phase must extract the trace to the
corresponding event based on the TRACEDI of the packets.

Thats the best we could do at the moment.
If there was a way to have one single large AUX buffer for a set of events,
which is easily accessible from a given event in the set, rather than "N" large
buffers which could potentially impact memory consumption.

It would be good to have this clearly documented somewhere in the implementation
to keep us in track.

Cheers
Suzuki

> 
> Thanks,
> Mathieu
> 
>>
>> Cheers
>> Suzuki
>>
Mathieu Poirier April 3, 2019, 2:13 a.m. UTC | #8
On Mon, 1 Apr 2019 at 07:02, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> Hi Mathieu,
>
> On 27/03/2019 17:01, Mathieu Poirier wrote:
> > On Wed, 27 Mar 2019 at 05:30, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> >>
> >> On 03/26/2019 05:55 PM, Mathieu Poirier wrote:
> >>> On Tue, Mar 26, 2019 at 04:18:35PM +0000, Suzuki K Poulose wrote:
> >>>> On 03/25/2019 09:56 PM, Mathieu Poirier wrote:
> >>>>> This patch uses the pid of the process being traced to aggregate traces
> >>>>> coming from different processors in the same sink, something that is
> >>>>> required when collecting traces in CPU-wide mode when the CoreSight HW
> >>>>> enacts a N:1 source/sink topology.
> >>>>>
> >>>>> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> >>>>> ---
> >>>>>     .../hwtracing/coresight/coresight-tmc-etr.c   | 71 +++++++++++++++++--
> >>>>>     1 file changed, 65 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>>>> index 7254fafdf1c2..cbabf88bd51d 100644
> >>>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>>>> @@ -8,6 +8,7 @@
> >>>>>     #include <linux/coresight.h>
> >>>>>     #include <linux/dma-mapping.h>
> >>>>>     #include <linux/iommu.h>
> >>>>> +#include <linux/idr.h>
> >>>>>     #include <linux/slab.h>
> >>>>>     #include <linux/types.h>
> >>>>>     #include <linux/vmalloc.h>
> >>>>> @@ -41,6 +42,9 @@ struct etr_perf_buffer {
> >>>>>      void                    **pages;
> >>>>>     };
> >>>>> +static DEFINE_IDR(session_idr);
> >>>>> +static DEFINE_MUTEX(session_idr_lock);
> >>>>
> >>>> Please correct me if I am wrong here. What we now do with this series is
> >>>>
> >>>> - One event per CPU and thus one ETR perf buf per CPU.
> >>>> - One *ETR buf* per PID, from the IDR hash list.
> >>>>
> >>>> However, if we have 1:1 situation, we will have :
> >>>>
> >>>> N (say 2 ETRs), but one *ETR buf* as they all share the same PID, implying
> >>>> multiple multiple ETRs will try to use the same etr buf,
> >>>> which could corrupt the trace data ? Instead,  what we need in that
> >>>> situation is :
> >>>>
> >>>> One ETR buf perf PID+ETR device. i.e, etr_bufs must be per ETR.
> >>>> So should the IDR be specific to an ETR ?
> >>>>
> >>>> Or do we not support a session with multiple ETRs involved (1:1) ?
> >>>
> >>> At this time 1:1 topologies are not supported and a fair amount of work will go
> >>> in doing so.  When I started working on this serie my thought was that because
> >>> of said amount of work there is no point thinking about 1:1, and that we can
> >>> deal with it when we get there.
> >>>
> >>> But taking a step back and having dealt with the harder (concurrency) problems
> >>> inherent to CPU-wide scenarios, it would be trivial to make the code 1:1 ready
> >>> and it will be one less thing to worry about down the road.
> >>>
> >>> That being said and answering your question above, I think we need and IDR per
> >>> ETR (in the drvdata) to avoid contention issues on systems with several ETRs.
> >>>
> >>> Thanks for bringing this back to my attention.
> >>
> >> Cool. Thanks for explaining the rationale. So, when we do that, I think
> >> we may be able to have one "etr_perf_buffer" per ETR and thus we may be
> >> able to move the refcount back to the etr_perf_buffer, just like we
> >> moved the PID and index etr_perf_buffer instead of the etr_buf ?
> >
> > An etr_perf_buffer is associated with an event and holds the AUX ring
> > buffer that was created for that event.  In CPU-wide N:1 mode multiple
> > events (one per CPU), each with its own AUX ring buffer, share a sink
> > and as such we can't have a single etr_perf_buffer per ETR.
>
> Ok. Thanks for clarifying this. So we have one AUX buffer per event, but
> they all could end up in the same ETR HW buffer and may be copied to only
> one of the AUX buffer, which happens to really disable the ETR.

Exactly.

> And thus
> we need to have a sufficiently large AUX buffer in place to allow we don't
> overflow all the time with trace from multiple events. Makes sense for the
> N:1 topology.

Right - fortunately ring buffer size can be modified from the perf
tools command line.

> Also the decoding phase must extract the trace to the
> corresponding event based on the TRACEDI of the packets.

Yes, a good portion of the user space changes is related to that.

>
> Thats the best we could do at the moment.
> If there was a way to have one single large AUX buffer for a set of events,
> which is easily accessible from a given event in the set, rather than "N" large
> buffers which could potentially impact memory consumption.

That was my original idea but it would have required significant
changes to the kernel perf framework.

>
> It would be good to have this clearly documented somewhere in the implementation
> to keep us in track.

Ok, I'll put together a good description of the design choices in the
code.  There is already a significant amount but nothing that really
summarises this conversation.

>
> Cheers
> Suzuki
>
> >
> > Thanks,
> > Mathieu
> >
> >>
> >> Cheers
> >> Suzuki
> >>
diff mbox series

Patch

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7254fafdf1c2..cbabf88bd51d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -8,6 +8,7 @@ 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
+#include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/vmalloc.h>
@@ -41,6 +42,9 @@  struct etr_perf_buffer {
 	void			**pages;
 };
 
+static DEFINE_IDR(session_idr);
+static DEFINE_MUTEX(session_idr_lock);
+
 /* Convert the perf index to an offset within the ETR buffer */
 #define PERF_IDX2OFF(idx, buf)	((idx) % ((buf)->nr_pages << PAGE_SHIFT))
 
@@ -1198,16 +1202,48 @@  alloc_etr_buf(struct tmc_drvdata *drvdata, int node,
 }
 
 static struct etr_buf *
-tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node,
+tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, pid_t pid, int node,
 		    int nr_pages, void **pages)
 {
+	int ret;
 	struct etr_buf *etr_buf;
 
+retry:
+	/* See if a buffer has been allocated for this session */
+	mutex_lock(&session_idr_lock);
+	etr_buf = idr_find(&session_idr, pid);
+	if (etr_buf) {
+		refcount_inc(&etr_buf->refcount);
+		mutex_unlock(&session_idr_lock);
+		return etr_buf;
+	}
+
+	/* If we made it here no buffer has been allocated, do so now. */
+	mutex_unlock(&session_idr_lock);
+
 	etr_buf = alloc_etr_buf(drvdata, node, nr_pages, pages);
 	if (IS_ERR(etr_buf))
 		return etr_buf;
 
 	refcount_set(&etr_buf->refcount, 1);
+
+	/* Now that we have a buffer, add it to the IDR. */
+	mutex_lock(&session_idr_lock);
+	ret = idr_alloc(&session_idr, etr_buf, pid, pid + 1, GFP_KERNEL);
+	mutex_unlock(&session_idr_lock);
+
+	/* Another event whith this session ID has allocated this buffer. */
+	if (ret == -ENOSPC) {
+		tmc_free_etr_buf(etr_buf);
+		goto retry;
+	}
+
+	/* The IDR can't allocate room for a new session, abandon ship. */
+	if (ret == -ENOMEM) {
+		tmc_free_etr_buf(etr_buf);
+		return ERR_PTR(ret);
+	}
+
 	return etr_buf;
 }
 
@@ -1219,7 +1255,7 @@  tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node,
  * reaches a minimum limit (1M), beyond which we give up.
  */
 static struct etr_perf_buffer *
-tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node,
+tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, pid_t pid, int node,
 		       int nr_pages, void **pages, bool snapshot)
 {
 	struct etr_buf *etr_buf;
@@ -1229,7 +1265,7 @@  tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node,
 	if (!etr_perf)
 		return ERR_PTR(-ENOMEM);
 
-	etr_buf = tmc_etr_get_etr_buf(drvdata, node, nr_pages, pages);
+	etr_buf = tmc_etr_get_etr_buf(drvdata, pid, node, nr_pages, pages);
 	if (!IS_ERR(etr_buf))
 		goto done;
 
@@ -1254,7 +1290,7 @@  static void *tmc_alloc_etr_buffer(struct coresight_device *csdev,
 	if (cpu == -1)
 		cpu = smp_processor_id();
 
-	etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu),
+	etr_perf = tmc_etr_setup_perf_buf(drvdata, pid, cpu_to_node(cpu),
 					  nr_pages, pages, snapshot);
 	if (IS_ERR(etr_perf)) {
 		dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
@@ -1272,9 +1308,32 @@  static void *tmc_alloc_etr_buffer(struct coresight_device *csdev,
 static void tmc_free_etr_buffer(void *config)
 {
 	struct etr_perf_buffer *etr_perf = config;
+	struct etr_buf *buf, *etr_buf = etr_perf->etr_buf;
+
+	if (!etr_buf)
+		goto free_etr_perf_buffer;
+
+	mutex_lock(&session_idr_lock);
+	/* If we are not the last one to use the buffer, don't touch it. */
+	if (!refcount_dec_and_test(&etr_buf->refcount)) {
+		mutex_unlock(&session_idr_lock);
+		goto free_etr_perf_buffer;
+	}
+
+	/* We are the last one, remove from the IDR and free the buffer. */
+	buf = idr_remove(&session_idr, etr_perf->pid);
+	mutex_unlock(&session_idr_lock);
+
+	/*
+	 * Something went very wrong if the buffer associated with this ID
+	 * is not the same in the IDR.  Leak to avoid use after free.
+	 */
+	if (WARN_ON(buf != etr_buf))
+		goto free_etr_perf_buffer;
+
+	tmc_free_etr_buf(etr_perf->etr_buf);
 
-	if (etr_perf->etr_buf)
-		tmc_free_etr_buf(etr_perf->etr_buf);
+free_etr_perf_buffer:
 	kfree(etr_perf);
 }