diff mbox series

[1/8] perf cs-etm: Don't flush when packet_queue fills up

Message ID 20240905105043.160225-2-james.clark@linaro.org (mailing list archive)
State New, archived
Headers show
Series perf: cs-etm: Coresight decode and disassembly improvements | expand

Commit Message

James Clark Sept. 5, 2024, 10:50 a.m. UTC
cs_etm__flush(), like cs_etm__sample() is an operation that generates a
sample and then swaps the current with the previous packet. Calling
flush after processing the queues results in two swaps which corrupts
the next sample. Therefore it wasn't appropriate to call flush here so
remove it.

Flushing is still done on a discontinuity to explicitly clear the last
branch buffer, but when the packet_queue fills up before reaching a
timestamp, that's not a discontinuity and the call to
cs_etm__process_traceid_queue() already generated samples and drained
the buffers correctly.

This is visible by looking for a branch that has the same target as the
previous branch and the following source is before the address of the
last target, which is impossible as execution would have had to have
gone backwards:

  ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
   (packet_queue fills here before a timestamp, resulting in a flush and
    branch target ffff80008011cadc is duplicated.)
  ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff80008011cadc update_sg_lb_stats+0x94
  ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34

After removing the flush the correct branch target is used for the
second sample, and ffff8000801117c4 is no longer before the previous
address:

  ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
  ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff8000801117a0 cpu_util+0x0
  ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34

Fixes: 21fe8dc1191a ("perf cs-etm: Add support for CPU-wide trace scenarios")
Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/util/cs-etm.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Leo Yan Sept. 10, 2024, 8:28 p.m. UTC | #1
On 9/5/2024 11:50 AM, James Clark wrote:

[...]

> cs_etm__flush(), like cs_etm__sample() is an operation that generates a
> sample and then swaps the current with the previous packet. Calling
> flush after processing the queues results in two swaps which corrupts
> the next sample. Therefore it wasn't appropriate to call flush here so
> remove it.

In the cs_etm__sample(), if the period is not overflow, it is not necessarily
to generate instruction samples and copy back stack entries. This is why we
want to call cs_etm__flush() to make sure the last packet can be recorded
properly for instruction sample with back stacks.

We also need to take account into the case for the end of the session - in
this case we need to generate samples for the last packet for complete info.

I am wandering should we remove the cs_etm__packet_swap() from cs_etm__sample()?

Thanks,
Leo

> Flushing is still done on a discontinuity to explicitly clear the last
> branch buffer, but when the packet_queue fills up before reaching a
> timestamp, that's not a discontinuity and the call to
> cs_etm__process_traceid_queue() already generated samples and drained
> the buffers correctly.
> 
> This is visible by looking for a branch that has the same target as the
> previous branch and the following source is before the address of the
> last target, which is impossible as execution would have had to have
> gone backwards:
> 
>   ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
>    (packet_queue fills here before a timestamp, resulting in a flush and
>     branch target ffff80008011cadc is duplicated.)
>   ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff80008011cadc update_sg_lb_stats+0x94
>   ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
> 
> After removing the flush the correct branch target is used for the
> second sample, and ffff8000801117c4 is no longer before the previous
> address:
> 
>   ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
>   ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff8000801117a0 cpu_util+0x0
>   ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
> 
> Fixes: 21fe8dc1191a ("perf cs-etm: Add support for CPU-wide trace scenarios")
> Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 90f32f327b9b..602e791ff5ba 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2490,12 +2490,6 @@ static void cs_etm__clear_all_traceid_queues(struct cs_etm_queue *etmq)
> 
>                 /* Ignore return value */
>                 cs_etm__process_traceid_queue(etmq, tidq);
> -
> -               /*
> -                * Generate an instruction sample with the remaining
> -                * branchstack entries.
> -                */
> -               cs_etm__flush(etmq, tidq);
>         }
>  }
> 
> --
> 2.34.1
>
Leo Yan Sept. 11, 2024, 8:14 a.m. UTC | #2
On 9/10/2024 9:28 PM, Leo Yan wrote:
> On 9/5/2024 11:50 AM, James Clark wrote:
> 
> [...]
> 
>> cs_etm__flush(), like cs_etm__sample() is an operation that generates a
>> sample and then swaps the current with the previous packet. Calling
>> flush after processing the queues results in two swaps which corrupts
>> the next sample. Therefore it wasn't appropriate to call flush here so
>> remove it.
> 
> In the cs_etm__sample(), if the period is not overflow, it is not necessarily
> to generate instruction samples and copy back stack entries. This is why we
> want to call cs_etm__flush() to make sure the last packet can be recorded
> properly for instruction sample with back stacks.
> 
> We also need to take account into the case for the end of the session - in
> this case we need to generate samples for the last packet for complete info.
> 
> I am wandering should we remove the cs_etm__packet_swap() from cs_etm__sample()?

Sorry for typo. I meant to remove the cs_etm__packet_swap() from cs_etm__flush().

Thanks,
Leo
James Clark Sept. 12, 2024, 3:22 p.m. UTC | #3
On 11/09/2024 09:14, Leo Yan wrote:
> 
> 
> On 9/10/2024 9:28 PM, Leo Yan wrote:
>> On 9/5/2024 11:50 AM, James Clark wrote:
>>
>> [...]
>>
>>> cs_etm__flush(), like cs_etm__sample() is an operation that generates a
>>> sample and then swaps the current with the previous packet. Calling
>>> flush after processing the queues results in two swaps which corrupts
>>> the next sample. Therefore it wasn't appropriate to call flush here so
>>> remove it.
>>
>> In the cs_etm__sample(), if the period is not overflow, it is not necessarily
>> to generate instruction samples and copy back stack entries. This is why we
>> want to call cs_etm__flush() to make sure the last packet can be recorded
>> properly for instruction sample with back stacks.
>>
>> We also need to take account into the case for the end of the session - in
>> this case we need to generate samples for the last packet for complete info.
>>
>> I am wandering should we remove the cs_etm__packet_swap() from cs_etm__sample()?
> 
> Sorry for typo. I meant to remove the cs_etm__packet_swap() from cs_etm__flush().
> 
> Thanks,
> Leo

Turns out there was already cs_etm__end_block() for the end of the 
session, but it was only called for the timeless modes. I added it for 
timestamped mode too in V2.

I also kept the existing flush() function for discontinuities. I changed 
my mind that the differences to cs_etm__sample() weren't relevant.

So I think we still need to keep the swap in flush() because otherwise 
the next sample won't start from the right place.
diff mbox series

Patch

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 90f32f327b9b..602e791ff5ba 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2490,12 +2490,6 @@  static void cs_etm__clear_all_traceid_queues(struct cs_etm_queue *etmq)
 
 		/* Ignore return value */
 		cs_etm__process_traceid_queue(etmq, tidq);
-
-		/*
-		 * Generate an instruction sample with the remaining
-		 * branchstack entries.
-		 */
-		cs_etm__flush(etmq, tidq);
 	}
 }