[v2,1/4] perf cs-etm: Continuously record last branches
diff mbox series

Message ID 20191101020750.29063-2-leo.yan@linaro.org
State New
Headers show
Series
  • perf cs-etm: Fix synthesizing instruction samples
Related show

Commit Message

Leo Yan Nov. 1, 2019, 2:07 a.m. UTC
Every time synthesize instruction sample, the last branches recording
will be reset.  This would be fine if the instruction period is big
enough, for example if we use the option '--itrace=i100000', the last
branch array is reset for every instruction sample (10000 instructions
per period); before generate the next instruction sample, there has the
enough packets coming to fill last branch array.  On the other hand,
if set a very small period, the packets will be significantly reduced
between two continuous instruction samples, thus if the last branch
array is reset for the previous instruction sample, it's almost empty
for the next instruction sample.

To allow the last branches to work for any instruction periods, this
patch avoids to reset the last branches for every instruction sample
and only reset it when flush the trace data.  The last branches will
be reset only for two cases, one is for trace starting, another case
is for discontinuous trace; thus it can continuously record last
branches.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Robert Walker Nov. 1, 2019, 3:30 p.m. UTC | #1
On 01/11/2019 02:07, Leo Yan wrote:
> Every time synthesize instruction sample, the last branches recording
> will be reset.  This would be fine if the instruction period is big
> enough, for example if we use the option '--itrace=i100000', the last
> branch array is reset for every instruction sample (10000 instructions
> per period); before generate the next instruction sample, there has the
> enough packets coming to fill last branch array.  On the other hand,
> if set a very small period, the packets will be significantly reduced
> between two continuous instruction samples, thus if the last branch
> array is reset for the previous instruction sample, it's almost empty
> for the next instruction sample.
>
> To allow the last branches to work for any instruction periods, this
> patch avoids to reset the last branches for every instruction sample
> and only reset it when flush the trace data.  The last branches will
> be reset only for two cases, one is for trace starting, another case
> is for discontinuous trace; thus it can continuously record last
> branches.

Is this the right thing to do?  This would cause profiling tools to 
count the same branch several times if it appears in multiple 
instruction samples, which could result in a biased profile.

The current implementation matches the behavior of intel_pt where the 
branch buffer is reset after each sample, so  the instruction sample 
only includes branches since the previous sample.

However x86 lbr (perf record -b) does appear to repeat the same full 
branch stack on several samples until a new stack is captured.

I'm not sure what the right or wrong answer is here.  For AutoFDO, we're 
likely to use a much bigger period (>10000 instructions) so won't be 
affected, but other tools might be.

Regards


Rob


> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>   tools/perf/util/cs-etm.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index f5f855fff412..8be6d010ae84 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1153,9 +1153,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>   			"CS ETM Trace: failed to deliver instruction event, error %d\n",
>   			ret);
>   
> -	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> -
>   	return ret;
>   }
>   
> @@ -1486,6 +1483,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>   		tidq->prev_packet = tmp;
>   	}
>   
> +	/* Reset last branches after flush the trace */
> +	if (etm->synth_opts.last_branch)
> +		cs_etm__reset_last_branch_rb(tidq);
> +
>   	return err;
>   }
>
Leo Yan Nov. 2, 2019, 6:11 a.m. UTC | #2
Hi Rob,

On Fri, Nov 01, 2019 at 03:30:19PM +0000, Robert Walker wrote:
> On 01/11/2019 02:07, Leo Yan wrote:
> > Every time synthesize instruction sample, the last branches recording
> > will be reset.  This would be fine if the instruction period is big
> > enough, for example if we use the option '--itrace=i100000', the last
> > branch array is reset for every instruction sample (10000 instructions
> > per period); before generate the next instruction sample, there has the
> > enough packets coming to fill last branch array.  On the other hand,
> > if set a very small period, the packets will be significantly reduced
> > between two continuous instruction samples, thus if the last branch
> > array is reset for the previous instruction sample, it's almost empty
> > for the next instruction sample.
> > 
> > To allow the last branches to work for any instruction periods, this
> > patch avoids to reset the last branches for every instruction sample
> > and only reset it when flush the trace data.  The last branches will
> > be reset only for two cases, one is for trace starting, another case
> > is for discontinuous trace; thus it can continuously record last
> > branches.
> 
> Is this the right thing to do?

Thanks for reviewing and bringing up the questions.  To be honest, my
concern was mainly related with AudoFDO but I don't aware other
potential issues.  So any concern is welcome, in case I miss anything;
hope we can get conclusion with some dicussion.  Please see more
detailed explanation in below.

> This would cause profiling tools to count
> the same branch several times if it appears in multiple instruction samples,
> which could result in a biased profile.

Let's clarify for this.  Firstly, here the 'branch' doesn't refer to
'branch' sample, it means the last branch recording for instruction
samples.  So basically, neither instruction sample nor branch sample
will be changed with this patch.

This patch tries to fix the issue as below:

Before this patch:

  ffff800010083580 <el0_sync>:
  ffff800010083580:  stp     x0, x1, [sp]         -> synthesize instruction sample(n),
                                                     record the last branch,
                                                     reset the last branch.
  ffff800010083584:  stp     x2, x3, [sp,#16]
  ffff800010083588:  stp     x4, x5, [sp,#32]     -> synthesize instruction sample(n+1),
                                                     the last branch is empty which is
                                                     reset by the instructiom sample(n).
  ffff80001008358c:  stp     x6, x7, [sp,#48]
  ffff800010083590:  stp     x8, x9, [sp,#64]     -> synthesize instruction sample(n+2),
                                                     the last branch is empty which is
                                                     reset by the instructiom sample(n).
  [...]


After this patch:

  ffff800010083580 <el0_sync>:
  ffff800010083580:  stp     x0, x1, [sp]         -> synthesize instruction sample(n),
                                                     record the last branch.
  ffff800010083584:  stp     x2, x3, [sp,#16]
  ffff800010083588:  stp     x4, x5, [sp,#32]     -> synthesize instruction sample(n+1),
                                                     record the last branch.
  ffff80001008358c:  stp     x6, x7, [sp,#48]
  ffff800010083590:  stp     x8, x9, [sp,#64]     -> synthesize instruction sample(n+2),
                                                     record the last branch.
  [...]


So from my understanding, the last branch recording works as the
affiliate info for instruction samples and it allows us (or tools) to
know what's the execution flow for the instruction samples.  Seems to
me, it doesn't change value for instruction sample, but we can have
correct info of the last branch recording for every instruction samples.

> The current implementation matches the behavior of intel_pt where the branch
> buffer is reset after each sample, so  the instruction sample only includes
> branches since the previous sample.

Exactly.

@Adrian, it would be nice if you could confirm intel_pt should apply
the samiliar fixing or not?

> However x86 lbr (perf record -b) does appear to repeat the same full branch
> stack on several samples until a new stack is captured.
> 
> I'm not sure what the right or wrong answer is here.  For AutoFDO, we're
> likely to use a much bigger period (>10000 instructions) so won't be
> affected, but other tools might be.

Agree, if AutoFDO uses big period (e.g. --itrace=i10000), this patch
will not change anything.  With big period, it has enough packets to
generate branch recording between two instruction samples.

Could you elaborate what's 'other tools'?  If it's open sourced tool,
I can try to test with this patch set.

Thanks,
Leo Yan

Patch
diff mbox series

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f5f855fff412..8be6d010ae84 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1153,9 +1153,6 @@  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 			"CS ETM Trace: failed to deliver instruction event, error %d\n",
 			ret);
 
-	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(tidq);
-
 	return ret;
 }
 
@@ -1486,6 +1483,10 @@  static int cs_etm__flush(struct cs_etm_queue *etmq,
 		tidq->prev_packet = tmp;
 	}
 
+	/* Reset last branches after flush the trace */
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(tidq);
+
 	return err;
 }