diff mbox series

[v4,5/5] perf cs-etm: Synchronize instruction sample with the thread stack

Message ID 20200203020716.31832-6-leo.yan@linaro.org (mailing list archive)
State New, archived
Headers show
Series perf cs-etm: Support thread stack and callchain | expand

Commit Message

Leo Yan Feb. 3, 2020, 2:07 a.m. UTC
The synthesized flow use 'tidq->packet' for instruction samples; on the
other hand, 'tidp->prev_packet' is used to generate the thread stack and
the branch samples, this results in the instruction samples using one
packet ahead than thread stack and branch samples ('tidp->prev_packet'
vs 'tidq->packet').

This leads to an instruction's callchain error as shows in below
example:

  main  1579        100      instructions:
  	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
  	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
  	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
  	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
  	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
  	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

In the callchain log, for the two continuous lines the up line contains
one child function info and the followed line contains the caller
function info, and so forth.  So the first two lines are:

  perf_event_update_userpage+0x4c  => the sampled instruction
  perf_event_update_userpage+0x48  => the parent function's calling

The child function and parent function both are the same function
perf_event_update_userpage(), but this isn't a recursive function, thus
the sequence for perf_event_update_userpage() calling itself shouldn't
never happen.  This callchain error is caused by the instruction sample
using an ahead packet than the thread stack, the thread stack is deferred
to process the new packet and misses to pop stack if it is just a return
packet.

To fix this issue, we can simply change to use 'tidq->prev_packet' to
generate the instruction samples, this allows the thread stack to push
and pop synchronously with instruction sample.  Finally, the callchain
can be displayed correctly as below:

  main  1579        100      instructions:
	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Mike Leach Feb. 6, 2020, 3:01 p.m. UTC | #1
Hi Leo,

On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
>
> The synthesized flow use 'tidq->packet' for instruction samples; on the
> other hand, 'tidp->prev_packet' is used to generate the thread stack and
> the branch samples, this results in the instruction samples using one
> packet ahead than thread stack and branch samples ('tidp->prev_packet'
> vs 'tidq->packet').
>
> This leads to an instruction's callchain error as shows in below
> example:
>
>   main  1579        100      instructions:
>         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
>         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
>         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
>
> In the callchain log, for the two continuous lines the up line contains
> one child function info and the followed line contains the caller
> function info, and so forth.  So the first two lines are:
>
>   perf_event_update_userpage+0x4c  => the sampled instruction
>   perf_event_update_userpage+0x48  => the parent function's calling
>
> The child function and parent function both are the same function
> perf_event_update_userpage(), but this isn't a recursive function, thus
> the sequence for perf_event_update_userpage() calling itself shouldn't
> never happen.  This callchain error is caused by the instruction sample
> using an ahead packet than the thread stack, the thread stack is deferred
> to process the new packet and misses to pop stack if it is just a return
> packet.
>
> To fix this issue, we can simply change to use 'tidq->prev_packet' to
> generate the instruction samples, this allows the thread stack to push
> and pop synchronously with instruction sample.  Finally, the callchain
> can be displayed correctly as below:
>
>   main  1579        100      instructions:
>         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
>         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8f805657658d..410e40ce19f2 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>         struct cs_etm_packet *tmp;
>         int ret;
>         u8 trace_chan_id = tidq->trace_chan_id;
> -       u64 instrs_executed = tidq->packet->instr_count;
> +       u64 instrs_executed = tidq->prev_packet->instr_count;
>
>         tidq->period_instructions += instrs_executed;
>
> @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                          * instruction)
>                          */
>                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -                                                 tidq->packet, offset - 1);
> +                                                 tidq->prev_packet,
> +                                                 offset - 1);
>                         ret = cs_etm__synth_instruction_sample(
>                                 etmq, tidq, addr,
>                                 etm->instructions_sample_period);
> @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                          * instruction)
>                          */
>                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -                                                 tidq->packet, offset - 1);
> +                                                 tidq->prev_packet,
> +                                                 offset - 1);
>                         ret = cs_etm__synth_instruction_sample(
>                                 etmq, tidq, addr,
>                                 etm->instructions_sample_period);
> --
> 2.17.1
>
I am really not convinced that this is a correct solution.

Consider a set of trace range packet inputs:
current: 0x3000-0x3050
prev:  0x2000-0x2100
prev-1: 0x1020-0x1080

Before your modification.....
cs_etm__sample()  processes the current packet....

On entry, the branch stack will contain:0x1080=>0x2000;

We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;

This is then copied by cs_etm__copy_last_branch_rb()

We find the instruction sample address in the range 0x3000 to 0x3050,
e.g. 0x3010.
cs_etm__synth_instruction_sample() will then generate a sample with values

sample.ip = 0x3010
sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;

to be passed to the perf session / injected as required.
This sample has the correct branch context for the sampled address -
i.e. how the code arrived @0x3010

After the modification.....
The branch stack will be the same, but the sample address will be from
the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
cs_etm__synth_instruction_sample() of
sample.ip = 0x2008
sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;

This really does not make much sense  - the branch stack no longer
relates to the sample.ip.

Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
using the _current_ packet and sample.ip. This is a clear mismatch.

I don't know what is causing the apparent error in the callchain, but
given that the previous features added in this set, work without this
alteration, I feel there must be another solution.

Regards

Mike
Leo Yan Feb. 13, 2020, 9:08 a.m. UTC | #2
Hi Mike,

On Thu, Feb 06, 2020 at 03:01:52PM +0000, Mike Leach wrote:
> Hi Leo,
> 
> On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > the branch samples, this results in the instruction samples using one
> > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > vs 'tidq->packet').
> >
> > This leads to an instruction's callchain error as shows in below
> > example:
> >
> >   main  1579        100      instructions:
> >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> >
> > In the callchain log, for the two continuous lines the up line contains
> > one child function info and the followed line contains the caller
> > function info, and so forth.  So the first two lines are:
> >
> >   perf_event_update_userpage+0x4c  => the sampled instruction
> >   perf_event_update_userpage+0x48  => the parent function's calling
> >
> > The child function and parent function both are the same function
> > perf_event_update_userpage(), but this isn't a recursive function, thus
> > the sequence for perf_event_update_userpage() calling itself shouldn't
> > never happen.  This callchain error is caused by the instruction sample
> > using an ahead packet than the thread stack, the thread stack is deferred
> > to process the new packet and misses to pop stack if it is just a return
> > packet.
> >
> > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > generate the instruction samples, this allows the thread stack to push
> > and pop synchronously with instruction sample.  Finally, the callchain
> > can be displayed correctly as below:
> >
> >   main  1579        100      instructions:
> >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 8f805657658d..410e40ce19f2 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >         struct cs_etm_packet *tmp;
> >         int ret;
> >         u8 trace_chan_id = tidq->trace_chan_id;
> > -       u64 instrs_executed = tidq->packet->instr_count;
> > +       u64 instrs_executed = tidq->prev_packet->instr_count;
> >
> >         tidq->period_instructions += instrs_executed;
> >
> > @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                          * instruction)
> >                          */
> >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -                                                 tidq->packet, offset - 1);
> > +                                                 tidq->prev_packet,
> > +                                                 offset - 1);
> >                         ret = cs_etm__synth_instruction_sample(
> >                                 etmq, tidq, addr,
> >                                 etm->instructions_sample_period);
> > @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                          * instruction)
> >                          */
> >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -                                                 tidq->packet, offset - 1);
> > +                                                 tidq->prev_packet,
> > +                                                 offset - 1);
> >                         ret = cs_etm__synth_instruction_sample(
> >                                 etmq, tidq, addr,
> >                                 etm->instructions_sample_period);
> > --
> > 2.17.1
> >
> I am really not convinced that this is a correct solution.
> 
> Consider a set of trace range packet inputs:
> current: 0x3000-0x3050
> prev:  0x2000-0x2100
> prev-1: 0x1020-0x1080
> 
> Before your modification.....
> cs_etm__sample()  processes the current packet....
> 
> On entry, the branch stack will contain:0x1080=>0x2000;
> 
> We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;
> 
> This is then copied by cs_etm__copy_last_branch_rb()
> 
> We find the instruction sample address in the range 0x3000 to 0x3050,
> e.g. 0x3010.
> cs_etm__synth_instruction_sample() will then generate a sample with values
> 
> sample.ip = 0x3010
> sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> 
> to be passed to the perf session / injected as required.
> This sample has the correct branch context for the sampled address -
> i.e. how the code arrived @0x3010
> 
> After the modification.....
> The branch stack will be the same, but the sample address will be from
> the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
> cs_etm__synth_instruction_sample() of
> sample.ip = 0x2008
> sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> 
> This really does not make much sense  - the branch stack no longer
> relates to the sample.ip.
> 
> Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
> using the _current_ packet and sample.ip. This is a clear mismatch.
> 
> I don't know what is causing the apparent error in the callchain, but
> given that the previous features added in this set, work without this
> alteration, I feel there must be another solution.

Good catch!  Thanks a lot for very detailed analysis.

I root caused this issue is relevant with the sequence between two
functions thread_stack__event() and thread_stack__sample().

In this series, thread_stack__sample() is prior to thread_stack__event(),
thus the thread stack event cannot be handled before thread stack
generation.

If move the function thread_stack__event() up and place it before
instruction sample synthesizing; thread_stack__event() can be invoked
prior to thread_stack__sample(), then I can see the thread stack can
be popped properly and the issue can be fixed.  Simply to say, patch
0002 should change the code as below:

         /*
          * Record a branch when the last instruction in
          * PREV_PACKET is a branch.
          */
         if (etm->synth_opts.last_branch &&                                 
             tidq->prev_packet->sample_type == CS_ETM_RANGE &&              
             tidq->prev_packet->last_instr_taken_branch)                    
                 cs_etm__update_last_branch_rb(etmq, tidq);                 
                                                                            
         /*                                                                 
          * The stack event must be processed prior to synthesizing         
          * instruction sample; this can ensure the instruction samples     
          * to generate correct thread stack.                               
          */                                                                
         if (tidq->prev_packet->last_instr_taken_branch)                    
                 cs_etm__add_stack_event(etmq, tidq);                       
                                                                            
         if (etm->sample_instructions &&                                    
             tidq->period_instructions >= etm->instructions_sample_period) {

                cs_etm__synth_instruction_sample();
                    `-> thread_stack__sample();

         }

Does this make sense for you?

Thanks,
Leo Yan
Mike Leach Feb. 13, 2020, 3 p.m. UTC | #3
Hi Leo,

On Thu, 13 Feb 2020 at 09:08, Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Mike,
>
> On Thu, Feb 06, 2020 at 03:01:52PM +0000, Mike Leach wrote:
> > Hi Leo,
> >
> > On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
> > >
> > > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > > the branch samples, this results in the instruction samples using one
> > > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > > vs 'tidq->packet').
> > >
> > > This leads to an instruction's callchain error as shows in below
> > > example:
> > >
> > >   main  1579        100      instructions:
> > >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > >         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> > >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > >
> > > In the callchain log, for the two continuous lines the up line contains
> > > one child function info and the followed line contains the caller
> > > function info, and so forth.  So the first two lines are:
> > >
> > >   perf_event_update_userpage+0x4c  => the sampled instruction
> > >   perf_event_update_userpage+0x48  => the parent function's calling
> > >
> > > The child function and parent function both are the same function
> > > perf_event_update_userpage(), but this isn't a recursive function, thus
> > > the sequence for perf_event_update_userpage() calling itself shouldn't
> > > never happen.  This callchain error is caused by the instruction sample
> > > using an ahead packet than the thread stack, the thread stack is deferred
> > > to process the new packet and misses to pop stack if it is just a return
> > > packet.
> > >
> > > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > > generate the instruction samples, this allows the thread stack to push
> > > and pop synchronously with instruction sample.  Finally, the callchain
> > > can be displayed correctly as below:
> > >
> > >   main  1579        100      instructions:
> > >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  tools/perf/util/cs-etm.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > index 8f805657658d..410e40ce19f2 100644
> > > --- a/tools/perf/util/cs-etm.c
> > > +++ b/tools/perf/util/cs-etm.c
> > > @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >         struct cs_etm_packet *tmp;
> > >         int ret;
> > >         u8 trace_chan_id = tidq->trace_chan_id;
> > > -       u64 instrs_executed = tidq->packet->instr_count;
> > > +       u64 instrs_executed = tidq->prev_packet->instr_count;
> > >
> > >         tidq->period_instructions += instrs_executed;
> > >
> > > @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >                          * instruction)
> > >                          */
> > >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > > -                                                 tidq->packet, offset - 1);
> > > +                                                 tidq->prev_packet,
> > > +                                                 offset - 1);
> > >                         ret = cs_etm__synth_instruction_sample(
> > >                                 etmq, tidq, addr,
> > >                                 etm->instructions_sample_period);
> > > @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >                          * instruction)
> > >                          */
> > >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > > -                                                 tidq->packet, offset - 1);
> > > +                                                 tidq->prev_packet,
> > > +                                                 offset - 1);
> > >                         ret = cs_etm__synth_instruction_sample(
> > >                                 etmq, tidq, addr,
> > >                                 etm->instructions_sample_period);
> > > --
> > > 2.17.1
> > >
> > I am really not convinced that this is a correct solution.
> >
> > Consider a set of trace range packet inputs:
> > current: 0x3000-0x3050
> > prev:  0x2000-0x2100
> > prev-1: 0x1020-0x1080
> >
> > Before your modification.....
> > cs_etm__sample()  processes the current packet....
> >
> > On entry, the branch stack will contain:0x1080=>0x2000;
> >
> > We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > This is then copied by cs_etm__copy_last_branch_rb()
> >
> > We find the instruction sample address in the range 0x3000 to 0x3050,
> > e.g. 0x3010.
> > cs_etm__synth_instruction_sample() will then generate a sample with values
> >
> > sample.ip = 0x3010
> > sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > to be passed to the perf session / injected as required.
> > This sample has the correct branch context for the sampled address -
> > i.e. how the code arrived @0x3010
> >
> > After the modification.....
> > The branch stack will be the same, but the sample address will be from
> > the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
> > cs_etm__synth_instruction_sample() of
> > sample.ip = 0x2008
> > sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > This really does not make much sense  - the branch stack no longer
> > relates to the sample.ip.
> >
> > Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
> > using the _current_ packet and sample.ip. This is a clear mismatch.
> >
> > I don't know what is causing the apparent error in the callchain, but
> > given that the previous features added in this set, work without this
> > alteration, I feel there must be another solution.
>
> Good catch!  Thanks a lot for very detailed analysis.
>
> I root caused this issue is relevant with the sequence between two
> functions thread_stack__event() and thread_stack__sample().
>
> In this series, thread_stack__sample() is prior to thread_stack__event(),
> thus the thread stack event cannot be handled before thread stack
> generation.
>
> If move the function thread_stack__event() up and place it before
> instruction sample synthesizing; thread_stack__event() can be invoked
> prior to thread_stack__sample(), then I can see the thread stack can
> be popped properly and the issue can be fixed.  Simply to say, patch
> 0002 should change the code as below:
>
>          /*
>           * Record a branch when the last instruction in
>           * PREV_PACKET is a branch.
>           */
>          if (etm->synth_opts.last_branch &&
>              tidq->prev_packet->sample_type == CS_ETM_RANGE &&
>              tidq->prev_packet->last_instr_taken_branch)
>                  cs_etm__update_last_branch_rb(etmq, tidq);
>
>          /*
>           * The stack event must be processed prior to synthesizing
>           * instruction sample; this can ensure the instruction samples
>           * to generate correct thread stack.
>           */
>          if (tidq->prev_packet->last_instr_taken_branch)
>                  cs_etm__add_stack_event(etmq, tidq);
>
>          if (etm->sample_instructions &&
>              tidq->period_instructions >= etm->instructions_sample_period) {
>
>                 cs_etm__synth_instruction_sample();
>                     `-> thread_stack__sample();
>
>          }
>
> Does this make sense for you?
>
This looks good.

Regards

Mike



> Thanks,
> Leo Yan
diff mbox series

Patch

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8f805657658d..410e40ce19f2 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1414,7 +1414,7 @@  static int cs_etm__sample(struct cs_etm_queue *etmq,
 	struct cs_etm_packet *tmp;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
-	u64 instrs_executed = tidq->packet->instr_count;
+	u64 instrs_executed = tidq->prev_packet->instr_count;
 
 	tidq->period_instructions += instrs_executed;
 
@@ -1505,7 +1505,8 @@  static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * instruction)
 			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
@@ -1525,7 +1526,8 @@  static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * instruction)
 			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);