[v5,2/9] perf cs-etm: Reflect branch prior to exception

Message ID 20200220052701.7754-3-leo.yan@linaro.org (mailing list archive)
State New, archived
Series perf cs-etm: Support thread stack and callchain

Commit Message

Leo Yan Feb. 20, 2020, 5:26 a.m. UTC
When a branch instruction is about to be executed and its target
address is not mapped into the virtual address space, the branch
triggers an exception with a data abort.  In this case the CoreSight
decoding flow cannot reflect the complete branch flow prior to the
exception, which makes the user space addresses before and after the
exception handling inconsistent.

The issue is easiest to explain with an example:

  Packet 0: range packet
            start_addr=0xffffad8018a4     end_addr=0xffffad8018ec
  Packet 1: exception packet
            start_addr=0xffffad8018a4     end_addr=0xffffad801910
  Packet 2: range packet
            start_addr=0xffff800010081c00 end_addr=0xffff800010081c18

Three packets arrive in sequence.  From packet 0 to packet 1, the CPU
tries to branch from 0xffffad8018ec-4 to 0xffffad801910; accessing the
address 0xffffad801910 causes a data abort, so the branch is not taken.
Instead, an exception is triggered and execution jumps to
0xffff800010081c00 in packet 2.

When handling this sequence, there is no range packet for the branch
between 0xffffad8018ec-4 and 0xffffad801910, so the perf tool cannot
generate a branch sample for it.  This makes the addresses before and
after exception handling confusing: the exception return address is
0xffffad801910, which does not sequentially follow the address
0xffffad8018ec-4 that was executing before the exception was taken.

  0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken ...
  ... exception return back -> 0xffffad801910

To fix this issue, we first need conditions that identify an exception
triggered by a branch.  The following conditions are used to make the
decision:

  - The exception is a trap, which is checked by comparing the
    exception packet's sample flags against the trap flags;
  - The exception packet's end address differs from its previous
    range packet's end address, which implies that a branch triggered
    the exception and that the branch target address is held in the
    exception packet's end address.
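
The two conditions above can be read as a single predicate.  What follows
is only a minimal sketch under stated assumptions: the struct, the enum
and the SKETCH_* flag values are hypothetical stand-ins for perf's own
packet type and PERF_IP_FLAG_* constants, reduced to the fields the check
needs.  It also folds in the existing requirement that the previous
packet is a range packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for this sketch; perf provides PERF_IP_FLAG_*. */
#define SKETCH_FLAG_BRANCH	(1u << 0)
#define SKETCH_FLAG_CALL	(1u << 1)
#define SKETCH_FLAG_INTERRUPT	(1u << 6)

enum sketch_sample_type { SKETCH_RANGE, SKETCH_EXCEPTION };

/* Hypothetical, reduced view of a decoded packet. */
struct sketch_packet {
	enum sketch_sample_type sample_type;
	uint32_t flags;
	uint64_t start_addr;
	uint64_t end_addr;
};

/*
 * Return true when the exception packet 'pkt' looks like a trap raised
 * by the last branch of the previous range packet 'prev'.
 */
static bool sketch_branch_caused_trap(const struct sketch_packet *prev,
				      const struct sketch_packet *pkt)
{
	const uint32_t trap_flags = SKETCH_FLAG_BRANCH | SKETCH_FLAG_CALL |
				    SKETCH_FLAG_INTERRUPT;

	return prev->sample_type == SKETCH_RANGE &&
	       pkt->sample_type == SKETCH_EXCEPTION &&
	       pkt->flags == trap_flags &&		/* condition 1: trap */
	       pkt->end_addr != prev->end_addr;		/* condition 2 */
}
```

With the example packets, packet 0 ends at 0xffffad8018ec and packet 1
ends at 0xffffad801910, so the predicate holds as long as packet 1
carries the trap flags.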

This patch changes the exception packet into a 'fake' range packet,
which allows an extra branch sample to be generated for the branch
instruction prior to the exception (between 0xffffad8018ec-4 and
0xffffad801910).  The resulting samples are:

  0xffffad8018ec-4 -> 0xffffad801910: branch
  0xffffad801910 -> 0xffff800010081c00: exception is taken ...
  ... exception return back -> 0xffffad801910
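
As a rough illustration of where the first sample's addresses come from,
the sketch below rewrites an exception packet into a zero-length range
packet and derives the branch source and target.  The demo_* names are
hypothetical, and the fixed 4-byte instruction size is an assumption
that matches the "-4" used in this example.

```c
#include <stdint.h>

/* Assumption for this sketch: fixed-size 4-byte instructions. */
#define DEMO_INSTR_SIZE 4u

enum demo_sample_type { DEMO_RANGE, DEMO_EXCEPTION };

/* Hypothetical, reduced view of a decoded packet. */
struct demo_packet {
	enum demo_sample_type sample_type;
	uint64_t start_addr;
	uint64_t end_addr;
	uint32_t instr_count;
	int last_instr_taken_branch;
};

struct demo_branch {
	uint64_t from;
	uint64_t to;
};

/* Turn the exception packet into a zero-length 'fake' range packet. */
static void demo_make_fake_range(struct demo_packet *pkt)
{
	pkt->sample_type = DEMO_RANGE;
	pkt->start_addr = pkt->end_addr;	/* branch target */
	pkt->instr_count = 0;			/* no instructions executed */
	pkt->last_instr_taken_branch = 1;
}

/* Branch sample from the last instruction of 'prev' to the start of 'pkt'. */
static struct demo_branch demo_branch_sample(const struct demo_packet *prev,
					     const struct demo_packet *pkt)
{
	struct demo_branch b = {
		.from = prev->end_addr - DEMO_INSTR_SIZE,
		.to = pkt->start_addr,
	};

	return b;
}
```

Applied to packets 0 and 1 from the example, this yields a branch from
0xffffad8018e8 to 0xffffad801910.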

Note that this 'fake' range packet adds an extra record to the last
branch array and changes the thread stack pushing and popping (if
supported later).  But since the 'fake' range packet's instruction
count is set to zero, it does not change the instruction samples.

Before:

  # perf script -F,+flags

             main  3258          1          branches:   int                      ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
             main  3258          1          branches:   jcc                  ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
             [...]
             main  3258          1          branches:   jcc                  ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
             main  3258          1          branches:   iret                 ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

After:

  # perf script -F,+flags

             main  3258          1          branches:   jmp                      ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
             main  3258          1          branches:   int                      ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
             main  3258          1          branches:   jcc                  ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
             [...]
             main  3258          1          branches:   jcc                  ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
             main  3258          1          branches:   iret                 ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

Suggested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  1 +
 tools/perf/util/cs-etm.c                      | 66 ++++++++++++++++++-
 2 files changed, 65 insertions(+), 2 deletions(-)
Patch

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd92a99eb89d..f1f66d883391 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -482,6 +482,7 @@  cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue,
 
 	packet = &queue->packet_buffer[queue->tail];
 	packet->exception_number = elem->exception_number;
+	packet->end_addr = elem->en_addr;
 
 	return ret;
 }
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 48932a7a933f..7cf30b5e0e20 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1477,8 +1477,11 @@  static int cs_etm__sample(struct cs_etm_queue *etmq,
 	return 0;
 }
 
-static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
+static int cs_etm__exception(struct cs_etm_queue *etmq,
+			     struct cs_etm_traceid_queue *tidq)
 {
+	u32 flags;
+
 	/*
 	 * Usually the exception packet follows a range packet, if it's not the
 	 * case, directly bail out.
@@ -1486,6 +1489,65 @@  static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 	if (tidq->prev_packet->sample_type != CS_ETM_RANGE)
 		return 0;
 
+	/*
+	 * If the exception is a trap and its end_addr is not same with its
+	 * previous range packet's end_addr, this implies the exception is
+	 * triggered by a branch and the exception packet's end_addr is the
+	 * branch target address from the previous range packet.
+	 *
+	 * Below is an example with three packets:
+	 *   Packet 0: range packet
+	 *             start_addr=0xffffad8018a4     end_addr=0xffffad8018ec
+	 *   Packet 1: exception packet
+	 *             start_addr=0xffffad8018a4     end_addr=0xffffad801910
+	 *   Packet 2: range packet
+	 *             start_addr=0xffff800010081c00 end_addr=0xffff800010081c18
+	 *
+	 * CPU tries to branch from 0xffffad8018ec-4 (packet 0) to
+	 * 0xffffad801910 (packet 1), accessing the address 0xffffad801910
+	 * causes data abort, so the branch is not taken and an exception is
+	 * triggered and jump to 0xffff800010081c00 (packet 2).
+	 *
+	 * For this case, it misses a range packet for the branch between
+	 * 0xffffad8018ec-4 and 0xffffad801910, so perf tool cannot generate
+	 * branch sample and introduces confusion for exception return parsing:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 *
+	 * To fix this issue, the exception packet is changed to a 'fake'
+	 * range packet.  This can allow to generate a branch sample between
+	 * 0xffffad8018ec-4 and 0xffffad801910.  Finally get below samples:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffffad801910: branch
+	 *   0xffffad801910 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 */
+
+	/* Use flags to check if the exception is trap */
+	flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		PERF_IP_FLAG_INTERRUPT;
+
+	if (tidq->packet->sample_type == CS_ETM_EXCEPTION &&
+	    tidq->packet->flags == flags &&
+	    tidq->packet->end_addr != tidq->prev_packet->end_addr) {
+		/*
+		 * Change the exception packet to a range packet, so can reflect
+		 * branch from prev_packet::end_addr-4 to packet::start_addr;
+		 *
+		 * This branch is not taken yet, so set its instruction count
+		 * to zero.  Set 'last_instr_taken_branch' to true, so allow
+		 * it to generate samples with its sequential range packet.
+		 */
+		tidq->packet->sample_type = CS_ETM_RANGE;
+		tidq->packet->start_addr = tidq->packet->end_addr;
+		tidq->packet->instr_count = 0;
+		tidq->packet->last_instr_taken_branch = true;
+
+		/* Generate sample with the previous range packet */
+		return cs_etm__sample(etmq, tidq);
+	}
+
 	/*
 	 * When the exception packet is inserted, whether the last instruction
 	 * in previous range packet is taken branch or not, we need to force
@@ -2045,7 +2107,7 @@  static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 * make sure the previous instruction
 			 * range packet to be handled properly.
 			 */
-			cs_etm__exception(tidq);
+			cs_etm__exception(etmq, tidq);
 			break;
 		case CS_ETM_DISCONTINUITY:
 			/*