diff mbox series

[RFC] perf tools: cs-etm: fix endless record after being terminated

Message ID 20200102074144.10407-1-liwei391@huawei.com (mailing list archive)
State New, archived
Headers show
Series [RFC] perf tools: cs-etm: fix endless record after being terminated | expand

Commit Message

Wei Li Jan. 2, 2020, 7:41 a.m. UTC
In __cmd_record(), when receiving SIGINT(ctrl + c), a done flag will
be set and the event list will be disabled by evlist__disable() once.

While in auxtrace_record.read_finish(), the related events will be
enabled again, if they are continuous, the recording seems to be endless.

If the cs_etm event is disabled, we don't enable it again here.

Note: This patch is NOT tested since i don't have such a machine with
coresight feature, but the code seems buggy same as arm-spe and intel-pt.

Signed-off-by: Wei Li <liwei391@huawei.com>
---
 tools/perf/arch/arm/util/cs-etm.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Leo Yan Jan. 3, 2020, 8:24 a.m. UTC | #1
Hi Wei,

On Thu, Jan 02, 2020 at 03:41:44PM +0800, Wei Li wrote:
> In __cmd_record(), when receiving SIGINT(ctrl + c), a done flag will
> be set and the event list will be disabled by evlist__disable() once.
> 
> While in auxtrace_record.read_finish(), the related events will be
> enabled again, if they are continuous, the recording seems to be endless.
> 
> If the cs_etm event is disabled, we don't enable it again here.
> 
> Note: This patch is NOT tested since i don't have such a machine with
> coresight feature, but the code seems buggy same as arm-spe and intel-pt.
> 
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  tools/perf/arch/arm/util/cs-etm.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index ede040cf82ad..1893a0e3b1e1 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -865,9 +865,13 @@ static int cs_etm_read_finish(struct auxtrace_record *itr, int idx)
>  	struct evsel *evsel;
>  
>  	evlist__for_each_entry(ptr->evlist, evsel) {
> -		if (evsel->core.attr.type == ptr->cs_etm_pmu->type)
> -			return perf_evlist__enable_event_idx(ptr->evlist,
> -							     evsel, idx);
> +		if (evsel->core.attr.type == ptr->cs_etm_pmu->type) {
> +			if (evsel->disabled)
> +				return 0;
> +			else
> +				return perf_evlist__enable_event_idx(
> +						ptr->evlist, evsel, idx);
> +		}
>  	}

I took some time to test on Arm CoreSight, the perf program can be
terminated by Ctrl+c with SIGINT signal on the mainline kernel.

And after capturing ftrace data with below log:

5242      migration/2-19    [002] d..3  4648.383155: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=2 dest_cpu=0
5243      migration/2-19    [002] d..2  4648.383167: sched_switch: prev_comm=migration/2 prev_pid=19 prev_prio=0 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
5244           <idle>-0     [000] d..2  4648.383167: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
5245             perf-1692  [000] d..2  4648.383193: sched_stat_runtime: comm=perf pid=1692 runtime=35420 [ns] vruntime=1636633943 [ns]
5246             perf-1692  [000] d..3  4648.383200: sched_waking: comm=migration/0 pid=11 prio=0 target_cpu=000
5247             perf-1692  [000] dN.4  4648.383203: sched_wakeup: comm=migration/0 pid=11 prio=0 target_cpu=000
5248             perf-1692  [000] dN.2  4648.383205: sched_stat_runtime: comm=perf pid=1692 runtime=9340 [ns] vruntime=1636643283 [ns]
5249             perf-1692  [000] d..2  4648.383208: sched_switch: prev_comm=perf prev_pid=1692 prev_prio=120 prev_state=R+ ==> next_comm=migration/0 next_pid=11 next_prio=0
5250      migration/0-11    [000] d..3  4648.383215: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=0 dest_cpu=1
5251       algorithm1-721   [001] dN.2  4648.383225: sched_stat_runtime: comm=algorithm1 pid=721 runtime=2906000 [ns] vruntime=3501282256244 [ns]
5252       algorithm1-721   [001] d..2  4648.383229: sched_switch: prev_comm=algorithm1 prev_pid=721 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
5253      migration/0-11    [000] d..2  4648.383235: sched_switch: prev_comm=migration/0 prev_pid=11 prev_prio=0 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
5254       algorithm1-721   [001] d..4  4648.383241: <stack trace>
5255  => kprobe_breakpoint_handler
5256  => call_break_hook
5257  => brk_handler
5258  => do_debug_exception
5259  => el1_sync_handler
5260  => el1_sync
5261  => etm_event_stop
5262  => event_sched_out.isra.106
5263  => group_sched_out.part.108
5264  => ctx_sched_out
5265  => task_ctx_sched_out
5266  => __perf_event_task_sched_out
5267  => __schedule
5268  => schedule
5269  => do_notify_resume
5270  => work_pending

We can see after send SIGINT signal, the process 'perf' will be
migrated from CPU2 to CPU0 (line 5242) and it will preempt process
'algorithm1' (line 5252); after the process 'algorithm1' is scheduled
out, the function etm_event_stop() will be invoked to stop tracing.

If we connect with the code in cs_etm_read_finish(), it tries to call
ioctl PERF_EVENT_IOC_ENABLE, but because the process 'algorithm1' is
scheduled out, so the perf event should not be enabled afterwards.

I may miss something at here ... Could you confirm what's the type of
attached process?  normal process or RT process?

Thanks,
Leo

P.s. I tested IntelPT with 5.2-rc3 kernel, it also can be terminated
properly.

>  	return -EINVAL;
> -- 
> 2.17.1
>
Wei Li Jan. 6, 2020, 3 a.m. UTC | #2
Hi Leo,

Thanks for your test and sorry for missing the reproducing info.
The attachment is my test procedure, i can reproduce this issue
with it on kernel 5.5-rc4 definitely.

I have tested these patches on kernel 5.5-rc4, with intel-pt on Xeon
Gold 6140 (72 cores) and arm-spe on HiSilicon Hi1620 (128 cores).
But i can not test CoreSight temporarily, could you please test it
with the test procedure again?

P.s. Running the test procedure as a normal process is enough.

Thanks,
Wei

On 2020/1/3 16:24, Leo Yan wrote:

> 
> I took some time to test on Arm CoreSight, the perf program can be
> terminated by Ctrl+c with SIGINT signal on the mainline kernel.
> 
> And after capturing ftrace data with below log:
> 
> 5242      migration/2-19    [002] d..3  4648.383155: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=2 dest_cpu=0
> 5243      migration/2-19    [002] d..2  4648.383167: sched_switch: prev_comm=migration/2 prev_pid=19 prev_prio=0 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
> 5244           <idle>-0     [000] d..2  4648.383167: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
> 5245             perf-1692  [000] d..2  4648.383193: sched_stat_runtime: comm=perf pid=1692 runtime=35420 [ns] vruntime=1636633943 [ns]
> 5246             perf-1692  [000] d..3  4648.383200: sched_waking: comm=migration/0 pid=11 prio=0 target_cpu=000
> 5247             perf-1692  [000] dN.4  4648.383203: sched_wakeup: comm=migration/0 pid=11 prio=0 target_cpu=000
> 5248             perf-1692  [000] dN.2  4648.383205: sched_stat_runtime: comm=perf pid=1692 runtime=9340 [ns] vruntime=1636643283 [ns]
> 5249             perf-1692  [000] d..2  4648.383208: sched_switch: prev_comm=perf prev_pid=1692 prev_prio=120 prev_state=R+ ==> next_comm=migration/0 next_pid=11 next_prio=0
> 5250      migration/0-11    [000] d..3  4648.383215: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=0 dest_cpu=1
> 5251       algorithm1-721   [001] dN.2  4648.383225: sched_stat_runtime: comm=algorithm1 pid=721 runtime=2906000 [ns] vruntime=3501282256244 [ns]
> 5252       algorithm1-721   [001] d..2  4648.383229: sched_switch: prev_comm=algorithm1 prev_pid=721 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
> 5253      migration/0-11    [000] d..2  4648.383235: sched_switch: prev_comm=migration/0 prev_pid=11 prev_prio=0 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
> 5254       algorithm1-721   [001] d..4  4648.383241: <stack trace>
> 5255  => kprobe_breakpoint_handler
> 5256  => call_break_hook
> 5257  => brk_handler
> 5258  => do_debug_exception
> 5259  => el1_sync_handler
> 5260  => el1_sync
> 5261  => etm_event_stop
> 5262  => event_sched_out.isra.106
> 5263  => group_sched_out.part.108
> 5264  => ctx_sched_out
> 5265  => task_ctx_sched_out
> 5266  => __perf_event_task_sched_out
> 5267  => __schedule
> 5268  => schedule
> 5269  => do_notify_resume
> 5270  => work_pending
> 
> We can see after send SIGINT signal, the process 'perf' will be
> migrated from CPU2 to CPU0 (line 5242) and it will preempt process
> 'algorithm1' (line 5252); after the process 'algorithm1' is scheduled
> out, the function etm_event_stop() will be invoked to stop tracing.
> 
> If we connect with the code in cs_etm_read_finish(), it tries to call
> ioctl PERF_EVENT_IOC_ENABLE, but because the process 'algorithm1' is
> scheduled out, so the perf event should not be enabled afterwards.
> 
> I may miss something at here ... Could you confirm what's the type of
> attached process?  normal process or RT process?
> 
> Thanks,
> Leo
> 
> P.s. I tested IntelPT with 5.2-rc3 kernel, it also can be terminated
> properly.
> 
>>  	return -EINVAL;
>> -- 
>> 2.17.1
>>
> 
> .
>
#define _GNU_SOURCE

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysinfo.h>
#include <unistd.h>
#include <sched.h>
#include <ctype.h>
#include <string.h>
#include <pthread.h>

int num = 0;
int test[65535];

int mess_rw(int data)
{
    int i;

    while (1) {
        for (i = 0; i < (sizeof(test) / sizeof(test[0])); i++) {
            if (test[i] != data)
                test[i] = data;
        }
    }
}

void *test_thread(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (!sched_setaffinity(0, sizeof(mask), &mask))
        printf("thread %d: running on cpu %d\n", cpu, cpu);
    else
        printf("thread %d: fail to set CPU affinity\n", cpu);

    mess_rw(cpu);

    return NULL;
}

int main(int argc, char *argv[])
{
    num = sysconf(_SC_NPROCESSORS_CONF);
    pthread_t thread[num];
    int id[num];
    int i;

    printf("PID %d on system with %d processor(s)\n", getpid(), num);

    for (i = 0; i < num; i++) {
        id[i] = i;
        pthread_create(&thread[i], NULL, test_thread, (void *)&id[i]);
    }

    for (i = 0; i < num; i++) {
        pthread_join(thread[i], NULL);
    }

    return 0;
}
Leo Yan Jan. 6, 2020, 2:37 p.m. UTC | #3
Hi Wei,

On Mon, Jan 06, 2020 at 11:00:19AM +0800, liwei (GF) wrote:
> Hi Leo,
> 
> Thanks for your test and sorry for missing the reproducing info.
> The attachment is my test procedure, i can reproduce this issue
> with it on kernel 5.5-rc4 definitely.
> 
> I have tested these patches on kernel 5.5-rc4, with intel-pt on Xeon
> Gold 6140 (72 cores) and arm-spe on HiSilicon Hi1620 (128 cores).
> But i can not test CoreSight temporarily, could you please test it
> with the test procedure again?

Thanks for the test code.

I tested it with perf/cs-etm, but cannot reproduce the issue.  But after
capturing trace data with trace-cmd, I think this indeed is an
potential issue; the reason for Arm cs-etm cannot produce this issue,
one possibility is because cs-etm doesn't trigger interrupt; if the
interrupt is triggered, perf tool might fail to stop hardware events
with below flow:

  Perf thread:                           Profiled thread(s)

  __cmd_record()
    evlist__disable()
      set_affinity()
        migration to prfiled thread's CPU
          evsel__disable_cpu()
            ioctl(PERF_EVENT_IOC_DISABLE)

                                      Thread is scheduled out
                                        ctx_sched_out()
                                          event_sched_out()
                                            etm4_disable()

  record__mmap_read_all()
    record__auxtrace_mmap_read()
      auxtrace_mmap__read()
        itr->read_finish()
          cs_etm_read_finish()
            ioctl(PERF_EVENT_IOC_ENABLE)

                                     Thread is scheduled in
                                       __perf_event_task_sched_in()
                                         perf_event_sched_in()
                                           event_sched_in()
                                             etm4_enable()

  the condition (hits == rec->samples) is
  never true, infinite loop in
  __cmd_record()


So a minor suggestion for your patch:

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index ede040cf82ad..29c5eefa6e41 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -864,6 +864,9 @@ static int cs_etm_read_finish(struct auxtrace_record *itr, int idx)
                        container_of(itr, struct cs_etm_recording, itr);
        struct evsel *evsel;
 
+       if (!ptr->evlist->enabled)
+               return 0;
+
        evlist__for_each_entry(ptr->evlist, evsel) {
                if (evsel->core.attr.type == ptr->cs_etm_pmu->type)
                        return perf_evlist__enable_event_idx(ptr->evlist,

I'd like to wait a bit if others can find more general method to fix
this issue.

Thanks,
Leo

> P.s. Running the test procedure as a normal process is enough.
> 
> Thanks,
> Wei
> 
> On 2020/1/3 16:24, Leo Yan wrote:
> 
> > 
> > I took some time to test on Arm CoreSight, the perf program can be
> > terminated by Ctrl+c with SIGINT signal on the mainline kernel.
> > 
> > And after capturing ftrace data with below log:
> > 
> > 5242      migration/2-19    [002] d..3  4648.383155: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=2 dest_cpu=0
> > 5243      migration/2-19    [002] d..2  4648.383167: sched_switch: prev_comm=migration/2 prev_pid=19 prev_prio=0 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
> > 5244           <idle>-0     [000] d..2  4648.383167: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
> > 5245             perf-1692  [000] d..2  4648.383193: sched_stat_runtime: comm=perf pid=1692 runtime=35420 [ns] vruntime=1636633943 [ns]
> > 5246             perf-1692  [000] d..3  4648.383200: sched_waking: comm=migration/0 pid=11 prio=0 target_cpu=000
> > 5247             perf-1692  [000] dN.4  4648.383203: sched_wakeup: comm=migration/0 pid=11 prio=0 target_cpu=000
> > 5248             perf-1692  [000] dN.2  4648.383205: sched_stat_runtime: comm=perf pid=1692 runtime=9340 [ns] vruntime=1636643283 [ns]
> > 5249             perf-1692  [000] d..2  4648.383208: sched_switch: prev_comm=perf prev_pid=1692 prev_prio=120 prev_state=R+ ==> next_comm=migration/0 next_pid=11 next_prio=0
> > 5250      migration/0-11    [000] d..3  4648.383215: sched_migrate_task: comm=perf pid=1692 prio=120 orig_cpu=0 dest_cpu=1
> > 5251       algorithm1-721   [001] dN.2  4648.383225: sched_stat_runtime: comm=algorithm1 pid=721 runtime=2906000 [ns] vruntime=3501282256244 [ns]
> > 5252       algorithm1-721   [001] d..2  4648.383229: sched_switch: prev_comm=algorithm1 prev_pid=721 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=1692 next_prio=120
> > 5253      migration/0-11    [000] d..2  4648.383235: sched_switch: prev_comm=migration/0 prev_pid=11 prev_prio=0 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
> > 5254       algorithm1-721   [001] d..4  4648.383241: <stack trace>
> > 5255  => kprobe_breakpoint_handler
> > 5256  => call_break_hook
> > 5257  => brk_handler
> > 5258  => do_debug_exception
> > 5259  => el1_sync_handler
> > 5260  => el1_sync
> > 5261  => etm_event_stop
> > 5262  => event_sched_out.isra.106
> > 5263  => group_sched_out.part.108
> > 5264  => ctx_sched_out
> > 5265  => task_ctx_sched_out
> > 5266  => __perf_event_task_sched_out
> > 5267  => __schedule
> > 5268  => schedule
> > 5269  => do_notify_resume
> > 5270  => work_pending
> > 
> > We can see after send SIGINT signal, the process 'perf' will be
> > migrated from CPU2 to CPU0 (line 5242) and it will preempt process
> > 'algorithm1' (line 5252); after the process 'algorithm1' is scheduled
> > out, the function etm_event_stop() will be invoked to stop tracing.
> > 
> > If we connect with the code in cs_etm_read_finish(), it tries to call
> > ioctl PERF_EVENT_IOC_ENABLE, but because the process 'algorithm1' is
> > scheduled out, so the perf event should not be enabled afterwards.
> > 
> > I may miss something at here ... Could you confirm what's the type of
> > attached process?  normal process or RT process?
> > 
> > Thanks,
> > Leo
> > 
> > P.s. I tested IntelPT with 5.2-rc3 kernel, it also can be terminated
> > properly.
> > 
> >>  	return -EINVAL;
> >> -- 
> >> 2.17.1
> >>
> > 
> > .
> > 
> 

> #define _GNU_SOURCE
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/sysinfo.h>
> #include <unistd.h>
> #include <sched.h>
> #include <ctype.h>
> #include <string.h>
> #include <pthread.h>
> 
> int num = 0;
> int test[65535];
> 
> int mess_rw(int data)
> {
>     int i;
> 
>     while (1) {
>         for (i = 0; i < (sizeof(test) / sizeof(test[0])); i++) {
>             if (test[i] != data)
>                 test[i] = data;
>         }
>     }
> }
> 
> void *test_thread(void *arg)
> {
>     int cpu = *(int *)arg;
>     cpu_set_t mask;
> 
>     CPU_ZERO(&mask);
>     CPU_SET(cpu, &mask);
>     if (!sched_setaffinity(0, sizeof(mask), &mask))
>         printf("thread %d: running on cpu %d\n", cpu, cpu);
>     else
>         printf("thread %d: fail to set CPU affinity\n", cpu);
> 
>     mess_rw(cpu);
> 
>     return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>     num = sysconf(_SC_NPROCESSORS_CONF);
>     pthread_t thread[num];
>     int id[num];
>     int i;
> 
>     printf("PID %d on system with %d processor(s)\n", getpid(), num);
> 
>     for (i = 0; i < num; i++) {
>         id[i] = i;
>         pthread_create(&thread[i], NULL, test_thread, (void *)&id[i]);
>     }
> 
>     for (i = 0; i < num; i++) {
>         pthread_join(thread[i], NULL);
>     }
> 
>     return 0;
> }
>
diff mbox series

Patch

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index ede040cf82ad..1893a0e3b1e1 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -865,9 +865,13 @@  static int cs_etm_read_finish(struct auxtrace_record *itr, int idx)
 	struct evsel *evsel;
 
 	evlist__for_each_entry(ptr->evlist, evsel) {
-		if (evsel->core.attr.type == ptr->cs_etm_pmu->type)
-			return perf_evlist__enable_event_idx(ptr->evlist,
-							     evsel, idx);
+		if (evsel->core.attr.type == ptr->cs_etm_pmu->type) {
+			if (evsel->disabled)
+				return 0;
+			else
+				return perf_evlist__enable_event_idx(
+						ptr->evlist, evsel, idx);
+		}
 	}
 
 	return -EINVAL;