diff mbox series

coresight: etm4x: Handle unreachable sink in perf mode

Message ID 20200818192931.168584-1-suzuki.poulose@arm.com (mailing list archive)
State New, archived
Headers show
Series coresight: etm4x: Handle unreachable sink in perf mode | expand

Commit Message

Suzuki K Poulose Aug. 18, 2020, 7:29 p.m. UTC
If the specified/hinted sink is not reachable from a subset of the CPUs,
we could end up unable to trace the event on those CPUs. This
is the best effort we could do until we support 1:1 configurations.
Fail gracefully in such cases avoiding a WARN_ON, which can be easily
triggered by the user on certain platforms, like :

[10919.513250] ------------[ cut here ]------------
[10919.517861] WARNING: CPU: 2 PID: 24021 at
drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
...

[10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
[10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[10919.575865] pc : etm_event_start+0xf8/0x100
[10919.580034] lr : etm_event_start+0x80/0x100
[10919.584202] sp : fffffe001932f940
[10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
[10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
[10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
[10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
[10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
[10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
[10919.619281] x17: 0000000000000000 x16: 0000000000000000
[10919.624577] x15: 0000000000000000 x14: 00000000000009f8
[10919.629874] x13: 00000000000009f8 x12: 0000000000000018
[10919.635170] x11: 0000000000000000 x10: 0000000000000000
[10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
[10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
[10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
[10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
[10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
[10919.666949] Call trace:
[10919.669382]  etm_event_start+0xf8/0x100
[10919.673203]  etm_event_add+0x40/0x60
[10919.676765]  event_sched_in.isra.134+0xcc/0x210
[10919.681281]  merge_sched_in+0xb0/0x2a8
[10919.685017]  visit_groups_merge.constprop.140+0x15c/0x4b8
[10919.690400]  ctx_sched_in+0x15c/0x170
[10919.694048]  perf_event_sched_in+0x6c/0xa0
[10919.698130]  ctx_resched+0x60/0xa0
[10919.701517]  perf_event_exec+0x288/0x2f0
[10919.705425]  begin_new_exec+0x4c8/0xf58
[10919.709247]  load_elf_binary+0x66c/0xf30
[10919.713155]  exec_binprm+0x15c/0x450
[10919.716716]  __do_execve_file+0x508/0x748
[10919.720711]  __arm64_sys_execve+0x40/0x50
[10919.724707]  do_el0_svc+0xf4/0x1b8
[10919.728095]  el0_sync_handler+0xf8/0x124
[10919.732003]  el0_sync+0x140/0x180

Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Jeremy Linton Aug. 18, 2020, 9:17 p.m. UTC | #1
Hi,

On 8/18/20 2:29 PM, Suzuki K Poulose wrote:
> If the specified/hinted sink is not reachable from a subset of the CPUs,
> we could end up unable to trace the event on those CPUs. This
> is the best effort we could do until we support 1:1 configurations.
> Fail gracefully in such cases avoiding a WARN_ON, which can be easily
> triggered by the user on certain platforms, like :

I've tested it on 5.8 with "coresight: fix offset by one error in 
counting ports" on the N1SDP and i'm getting traces, and not seeing any 
further WARN's.

Thanks for looking into this!

Tested-by: Jeremy Linton <jeremy.linton@arm.com>

> 
> [10919.513250] ------------[ cut here ]------------
> [10919.517861] WARNING: CPU: 2 PID: 24021 at
> drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
> ...
> 
> [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
> [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
> [10919.575865] pc : etm_event_start+0xf8/0x100
> [10919.580034] lr : etm_event_start+0x80/0x100
> [10919.584202] sp : fffffe001932f940
> [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
> [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
> [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
> [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
> [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
> [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
> [10919.619281] x17: 0000000000000000 x16: 0000000000000000
> [10919.624577] x15: 0000000000000000 x14: 00000000000009f8
> [10919.629874] x13: 00000000000009f8 x12: 0000000000000018
> [10919.635170] x11: 0000000000000000 x10: 0000000000000000
> [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
> [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
> [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
> [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
> [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
> [10919.666949] Call trace:
> [10919.669382]  etm_event_start+0xf8/0x100
> [10919.673203]  etm_event_add+0x40/0x60
> [10919.676765]  event_sched_in.isra.134+0xcc/0x210
> [10919.681281]  merge_sched_in+0xb0/0x2a8
> [10919.685017]  visit_groups_merge.constprop.140+0x15c/0x4b8
> [10919.690400]  ctx_sched_in+0x15c/0x170
> [10919.694048]  perf_event_sched_in+0x6c/0xa0
> [10919.698130]  ctx_resched+0x60/0xa0
> [10919.701517]  perf_event_exec+0x288/0x2f0
> [10919.705425]  begin_new_exec+0x4c8/0xf58
> [10919.709247]  load_elf_binary+0x66c/0xf30
> [10919.713155]  exec_binprm+0x15c/0x450
> [10919.716716]  __do_execve_file+0x508/0x748
> [10919.720711]  __arm64_sys_execve+0x40/0x50
> [10919.724707]  do_el0_svc+0xf4/0x1b8
> [10919.728095]  el0_sync_handler+0xf8/0x124
> [10919.732003]  el0_sync+0x140/0x180
> 
> Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
> Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 1a3169e69bb1..9d61a71da96f 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -321,6 +321,16 @@ static void etm_event_start(struct perf_event *event, int flags)
>   	if (!event_data)
>   		goto fail;
>   
> +	/*
> +	 * Check if this ETM is allowed to trace, as decided
> +	 * at etm_setup_aux(). This could be due to an unreachable
> +	 * sink from this ETM. We can't do much in this case if
> +	 * the sink was specified or hinted to the driver. For
> +	 * now, simply don't record anything on this ETM.
> +	 */
> +	if (!cpumask_test_cpu(cpu, &event_data->mask))
> +		goto fail_end_stop;
> +
>   	path = etm_event_cpu_path(event_data, cpu);
>   	/* We need a sink, no need to continue without one */
>   	sink = coresight_get_sink(path);
>
Mathieu Poirier Aug. 19, 2020, 7:22 p.m. UTC | #2
Hi Suzuki,

On Tue, Aug 18, 2020 at 08:29:31PM +0100, Suzuki K Poulose wrote:
> If the specified/hinted sink is not reachable from a subset of the CPUs,
> we could end up unable to trace the event on those CPUs. This
> is the best effort we could do until we support 1:1 configurations.
> Fail gracefully in such cases avoiding a WARN_ON, which can be easily
> triggered by the user on certain platforms, like :
> 
> [10919.513250] ------------[ cut here ]------------
> [10919.517861] WARNING: CPU: 2 PID: 24021 at
> drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
> ...
> 
> [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
> [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
> [10919.575865] pc : etm_event_start+0xf8/0x100
> [10919.580034] lr : etm_event_start+0x80/0x100
> [10919.584202] sp : fffffe001932f940
> [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
> [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
> [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
> [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
> [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
> [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
> [10919.619281] x17: 0000000000000000 x16: 0000000000000000
> [10919.624577] x15: 0000000000000000 x14: 00000000000009f8
> [10919.629874] x13: 00000000000009f8 x12: 0000000000000018
> [10919.635170] x11: 0000000000000000 x10: 0000000000000000
> [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
> [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
> [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
> [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
> [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
> [10919.666949] Call trace:
> [10919.669382]  etm_event_start+0xf8/0x100
> [10919.673203]  etm_event_add+0x40/0x60
> [10919.676765]  event_sched_in.isra.134+0xcc/0x210
> [10919.681281]  merge_sched_in+0xb0/0x2a8
> [10919.685017]  visit_groups_merge.constprop.140+0x15c/0x4b8
> [10919.690400]  ctx_sched_in+0x15c/0x170
> [10919.694048]  perf_event_sched_in+0x6c/0xa0
> [10919.698130]  ctx_resched+0x60/0xa0
> [10919.701517]  perf_event_exec+0x288/0x2f0
> [10919.705425]  begin_new_exec+0x4c8/0xf58
> [10919.709247]  load_elf_binary+0x66c/0xf30
> [10919.713155]  exec_binprm+0x15c/0x450
> [10919.716716]  __do_execve_file+0x508/0x748
> [10919.720711]  __arm64_sys_execve+0x40/0x50
> [10919.724707]  do_el0_svc+0xf4/0x1b8
> [10919.728095]  el0_sync_handler+0xf8/0x124
> [10919.732003]  el0_sync+0x140/0x180
> 
> Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
> Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 1a3169e69bb1..9d61a71da96f 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -321,6 +321,16 @@ static void etm_event_start(struct perf_event *event, int flags)
>  	if (!event_data)
>  		goto fail;
>  
> +	/*
> +	 * Check if this ETM is allowed to trace, as decided
> +	 * at etm_setup_aux(). This could be due to an unreachable
> +	 * sink from this ETM. We can't do much in this case if
> +	 * the sink was specified or hinted to the driver. For
> +	 * now, simply don't record anything on this ETM.
> +	 */

Can you provide more details on the scenario and the topology of the system?
Without either it is hard to wrap my head around the problem to address.
Having that information in the changelog would go a long way.

I'm sure this is a per-thread scenario because there is more than one CPU per
event.  I'm also suspecting this is on a system where there is one sink per CPU
cluster, and that is not supported.

If I am right on both account I am questionning the "Fixes".  On a system with
N:1 topology the code introduced by f9d81a657bb8 will work in the event a CPU is
hotplugged in.  The code introduced in this patch is simply to prevent a
warn_on() trace from being generated on systems that aren't supported.  It should
have a "stable" tag. 

Thanks,
Mathieu

> +	if (!cpumask_test_cpu(cpu, &event_data->mask))
> +		goto fail_end_stop;
> +
>  	path = etm_event_cpu_path(event_data, cpu);
>  	/* We need a sink, no need to continue without one */
>  	sink = coresight_get_sink(path);
> -- 
> 2.24.1
>
Suzuki K Poulose Sept. 1, 2020, 10:28 a.m. UTC | #3
On 08/19/2020 08:22 PM, Mathieu Poirier wrote:
> Hi Suzuki,
> 
> On Tue, Aug 18, 2020 at 08:29:31PM +0100, Suzuki K Poulose wrote:
>> If the specified/hinted sink is not reachable from a subset of the CPUs,
>> we could end up unable to trace the event on those CPUs. This
>> is the best effort we could do until we support 1:1 configurations.
>> Fail gracefully in such cases avoiding a WARN_ON, which can be easily
>> triggered by the user on certain platforms, like :
>>
>> [10919.513250] ------------[ cut here ]------------
>> [10919.517861] WARNING: CPU: 2 PID: 24021 at
>> drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
>> ...
>>
>> [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
>> [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
>> [10919.575865] pc : etm_event_start+0xf8/0x100
>> [10919.580034] lr : etm_event_start+0x80/0x100
>> [10919.584202] sp : fffffe001932f940
>> [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
>> [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
>> [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
>> [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
>> [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
>> [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
>> [10919.619281] x17: 0000000000000000 x16: 0000000000000000
>> [10919.624577] x15: 0000000000000000 x14: 00000000000009f8
>> [10919.629874] x13: 00000000000009f8 x12: 0000000000000018
>> [10919.635170] x11: 0000000000000000 x10: 0000000000000000
>> [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
>> [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
>> [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
>> [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
>> [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
>> [10919.666949] Call trace:
>> [10919.669382]  etm_event_start+0xf8/0x100
>> [10919.673203]  etm_event_add+0x40/0x60
>> [10919.676765]  event_sched_in.isra.134+0xcc/0x210
>> [10919.681281]  merge_sched_in+0xb0/0x2a8
>> [10919.685017]  visit_groups_merge.constprop.140+0x15c/0x4b8
>> [10919.690400]  ctx_sched_in+0x15c/0x170
>> [10919.694048]  perf_event_sched_in+0x6c/0xa0
>> [10919.698130]  ctx_resched+0x60/0xa0
>> [10919.701517]  perf_event_exec+0x288/0x2f0
>> [10919.705425]  begin_new_exec+0x4c8/0xf58
>> [10919.709247]  load_elf_binary+0x66c/0xf30
>> [10919.713155]  exec_binprm+0x15c/0x450
>> [10919.716716]  __do_execve_file+0x508/0x748
>> [10919.720711]  __arm64_sys_execve+0x40/0x50
>> [10919.724707]  do_el0_svc+0xf4/0x1b8
>> [10919.728095]  el0_sync_handler+0xf8/0x124
>> [10919.732003]  el0_sync+0x140/0x180
>>
>> Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
>> Reported-by: Jeremy Linton <jeremy.linton@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> index 1a3169e69bb1..9d61a71da96f 100644
>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> @@ -321,6 +321,16 @@ static void etm_event_start(struct perf_event *event, int flags)
>>   	if (!event_data)
>>   		goto fail;
>>   
>> +	/*
>> +	 * Check if this ETM is allowed to trace, as decided
>> +	 * at etm_setup_aux(). This could be due to an unreachable
>> +	 * sink from this ETM. We can't do much in this case if
>> +	 * the sink was specified or hinted to the driver. For
>> +	 * now, simply don't record anything on this ETM.
>> +	 */
> 
> Can you provide more details on the scenario and the topology of the system?
> Without either it is hard to wrap my head around the problem to address.
> Having that information in the changelog would go a long way.

Sure. This was detected on N1SDP with the following topology, with the
command :


$ perf record -e cs_etm/@tmc_etf0/ --per-thread dd if=/dev/zero 
of=BIGFILE bs=1M count=100


CPU0
       \
         Funnel0 ---- ETF0 --
       /                      \
CPU1
                                Funnel2
CPU2
       \                      /
         Funnel1 ---- ETF1 --
       /
CPU3


Basically, a pair of CPUS (0&1, 2&3 respectively) are connected to a 
static funnel followed by a TMC-ETF, before connecting to a main
funnel which merges the ETMs and the STM on the system.

In such a case, if the user selects tmc_etf0 as the sink for a perf
session this could trigger a warning when starting the event on ETM2
as we haven't been able to create a path. Also the CPU2 is cleared in
the event_data->cpumask.

I will add the above to the commit log.

For now we don't really support multiple sinks for a perf session. This
will need to be addressed for the per-CPU buffer scenario. But, we
should fix the current logic until we get there, to avoid triggering
the warnings, which can be done quite easily on these systems, which
are not really per-CPU buffers.

> 
> I'm sure this is a per-thread scenario because there is more than one CPU per

Yes, it is a per-thread scenario.

> event.  I'm also suspecting this is on a system where there is one sink per CPU
> cluster, and that is not supported.

No, that is not exactly the case, from the topology above. But not N:1 
either. More of N:M and this is possible on systems with per cluster ETFs.

> 
> If I am right on both account I am questionning the "Fixes".  On a system with
> N:1 topology the code introduced by f9d81a657bb8 will work in the event a CPU is
> hotplugged in.  The code introduced in this patch is simply to prevent a

Correct. But, without the above commit, we would have failed while
creating a path to the sink, because if a CPU was offline then we don't
care about a path from that ETM. So this warning is essentially
triggered only after the above commit and thus the Fixes tag.

> warn_on() trace from being generated on systems that aren't supported.  It should
> have a "stable" tag.

Cheers
Suzuki
Mathieu Poirier Sept. 1, 2020, 5:18 p.m. UTC | #4
Good morning,

On Tue, Sep 01, 2020 at 11:28:55AM +0100, Suzuki K Poulose wrote:
> On 08/19/2020 08:22 PM, Mathieu Poirier wrote:
> > Hi Suzuki,
> > 
> > On Tue, Aug 18, 2020 at 08:29:31PM +0100, Suzuki K Poulose wrote:
> > > If the specified/hinted sink is not reachable from a subset of the CPUs,
> > > we could end up unable to trace the event on those CPUs. This
> > > is the best effort we could do until we support 1:1 configurations.
> > > Fail gracefully in such cases avoiding a WARN_ON, which can be easily
> > > triggered by the user on certain platforms, like :
> > > 
> > > [10919.513250] ------------[ cut here ]------------
> > > [10919.517861] WARNING: CPU: 2 PID: 24021 at
> > > drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
> > > ...
> > > 
> > > [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
> > > [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
> > > [10919.575865] pc : etm_event_start+0xf8/0x100
> > > [10919.580034] lr : etm_event_start+0x80/0x100
> > > [10919.584202] sp : fffffe001932f940
> > > [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
> > > [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
> > > [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
> > > [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
> > > [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
> > > [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
> > > [10919.619281] x17: 0000000000000000 x16: 0000000000000000
> > > [10919.624577] x15: 0000000000000000 x14: 00000000000009f8
> > > [10919.629874] x13: 00000000000009f8 x12: 0000000000000018
> > > [10919.635170] x11: 0000000000000000 x10: 0000000000000000
> > > [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
> > > [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
> > > [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
> > > [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
> > > [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
> > > [10919.666949] Call trace:
> > > [10919.669382]  etm_event_start+0xf8/0x100
> > > [10919.673203]  etm_event_add+0x40/0x60
> > > [10919.676765]  event_sched_in.isra.134+0xcc/0x210
> > > [10919.681281]  merge_sched_in+0xb0/0x2a8
> > > [10919.685017]  visit_groups_merge.constprop.140+0x15c/0x4b8
> > > [10919.690400]  ctx_sched_in+0x15c/0x170
> > > [10919.694048]  perf_event_sched_in+0x6c/0xa0
> > > [10919.698130]  ctx_resched+0x60/0xa0
> > > [10919.701517]  perf_event_exec+0x288/0x2f0
> > > [10919.705425]  begin_new_exec+0x4c8/0xf58
> > > [10919.709247]  load_elf_binary+0x66c/0xf30
> > > [10919.713155]  exec_binprm+0x15c/0x450
> > > [10919.716716]  __do_execve_file+0x508/0x748
> > > [10919.720711]  __arm64_sys_execve+0x40/0x50
> > > [10919.724707]  do_el0_svc+0xf4/0x1b8
> > > [10919.728095]  el0_sync_handler+0xf8/0x124
> > > [10919.732003]  el0_sync+0x140/0x180
> > > 
> > > Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
> > > Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > Cc: Mike Leach <mike.leach@linaro.org>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > >   drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
> > >   1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > index 1a3169e69bb1..9d61a71da96f 100644
> > > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > @@ -321,6 +321,16 @@ static void etm_event_start(struct perf_event *event, int flags)
> > >   	if (!event_data)
> > >   		goto fail;
> > > +	/*
> > > +	 * Check if this ETM is allowed to trace, as decided
> > > +	 * at etm_setup_aux(). This could be due to an unreachable
> > > +	 * sink from this ETM. We can't do much in this case if
> > > +	 * the sink was specified or hinted to the driver. For
> > > +	 * now, simply don't record anything on this ETM.
> > > +	 */
> > 
> > Can you provide more details on the scenario and the topology of the system?
> > Without either it is hard to wrap my head around the problem to address.
> > Having that information in the changelog would go a long way.
> 
> Sure. This was detected on N1SDP with the following topology, with the
> command :
> 
> 
> $ perf record -e cs_etm/@tmc_etf0/ --per-thread dd if=/dev/zero of=BIGFILE
> bs=1M count=100
> 
> 
> CPU0
>       \
>         Funnel0 ---- ETF0 --
>       /                      \
> CPU1
>                                Funnel2
> CPU2
>       \                      /
>         Funnel1 ---- ETF1 --
>       /
> CPU3
> 
> 
> Basically, a pair of CPUS (0&1, 2&3 respectively) are connected to a static
> funnel followed by a TMC-ETF, before connecting to a main
> funnel which merges the ETMs and the STM on the system.
> 
> In such a case, if the user selects tmc_etf0 as the sink for a perf
> session this could trigger a warning when starting the event on ETM2
> as we haven't been able to create a path. Also the CPU2 is cleared in
> the event_data->cpumask.

Ok, that's the kind of topology I imagined you were dealing with.

> 
> I will add the above to the commit log.

Yes please.

> 
> For now we don't really support multiple sinks for a perf session. This
> will need to be addressed for the per-CPU buffer scenario. But, we
> should fix the current logic until we get there, to avoid triggering
> the warnings, which can be done quite easily on these systems, which
> are not really per-CPU buffers.

I agree that it should fail gracefully.  

> 
> > 
> > I'm sure this is a per-thread scenario because there is more than one CPU per
> 
> Yes, it is a per-thread scenario.
> 
> > event.  I'm also suspecting this is on a system where there is one sink per CPU
> > cluster, and that is not supported.
> 
> No, that is not exactly the case, from the topology above. But not N:1
> either. More of N:M and this is possible on systems with per cluster ETFs.
> 
> > 
> > If I am right on both account I am questionning the "Fixes".  On a system with
> > N:1 topology the code introduced by f9d81a657bb8 will work in the event a CPU is
> > hotplugged in.  The code introduced in this patch is simply to prevent a
> 
> Correct. But, without the above commit, we would have failed while
> creating a path to the sink, because if a CPU was offline then we don't
> care about a path from that ETM. So this warning is essentially
> triggered only after the above commit and thus the Fixes tag.

That is an equally valid argument.

> 
> > warn_on() trace from being generated on systems that aren't supported.  It should
> > have a "stable" tag.
> 
> Cheers
> Suzuki
diff mbox series

Patch

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a3169e69bb1..9d61a71da96f 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -321,6 +321,16 @@  static void etm_event_start(struct perf_event *event, int flags)
 	if (!event_data)
 		goto fail;
 
+	/*
+	 * Check if this ETM is allowed to trace, as decided
+	 * at etm_setup_aux(). This could be due to an unreachable
+	 * sink from this ETM. We can't do much in this case if
+	 * the sink was specified or hinted to the driver. For
+	 * now, simply don't record anything on this ETM.
+	 */
+	if (!cpumask_test_cpu(cpu, &event_data->mask))
+		goto fail_end_stop;
+
 	path = etm_event_cpu_path(event_data, cpu);
 	/* We need a sink, no need to continue without one */
 	sink = coresight_get_sink(path);