mbox series

[v4,0/7] Event parsing fixes

Message ID 20240822132506.1468090-1-james.clark@linaro.org (mailing list archive)
Headers show
Series Event parsing fixes | expand

Message

James Clark Aug. 22, 2024, 1:24 p.m. UTC
I rebased this one and made some other fixes so that I could test it,
so I thought I'd repost it here in case it's helpful. I also added a
new test.

But for the testing it all looks ok.

There is one small difference where it now shows "stalled-cycles-..."
as <not supported> events, when before it just didn't show them at all when
they weren't supported:

  $ perf stat -- true

  Performance counter stats for 'true':

              0.66 msec task-clock                       #    0.384 CPUs utilized             
                 0      context-switches                 #    0.000 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
                52      page-faults                      #   78.999 K/sec                     
     <not counted>      cpu_atom/instructions/                                                  (0.00%)
           978,399      cpu_core/instructions/           #    1.02  insn per cycle            
     <not counted>      cpu_atom/cycles/                                                        (0.00%)
           959,722      cpu_core/cycles/                 #    1.458 GHz                       
   <not supported>      cpu_atom/stalled-cycles-frontend/                                      
   <not supported>      cpu_core/stalled-cycles-frontend/                                      

I don't think that's a big deal though and could probably be fixed up
later if we really want to.

Tested on Raptor Lake, Juno, N1, Ampere (with the DSU cycles PMU) and
I also faked an Apple M on Juno. 

Changes since v3:

  * Rebase onto perf-tools-next 6236ebe07
  * Fix Intel TPEBS counting mode test
  * Fix arm-spe build
  * Add support for DT devices in stat test
  * Add a new test for hybrid perf stat default arguments

Ian Rogers (5):
  perf evsel: Add alternate_hw_config and use in evsel__match
  perf stat: Uniquify event name improvements
  perf stat: Remove evlist__add_default_attrs use strings
  perf evsel x86: Make evsel__has_perf_metrics work for legacy events
  perf evsel: Remove pmu_name

James Clark (2):
  perf test: Make stat test work on DT devices
  perf test: Add a test for default perf stat command

 tools/perf/arch/arm64/util/arm-spe.c          |   4 +-
 tools/perf/arch/x86/util/evlist.c             |  74 +----
 tools/perf/arch/x86/util/evsel.c              |  35 ++-
 tools/perf/builtin-diff.c                     |   6 +-
 tools/perf/builtin-stat.c                     | 291 +++++++-----------
 tools/perf/tests/parse-events.c               |   2 +-
 tools/perf/tests/shell/stat.sh                |  33 +-
 .../perf/tests/shell/test_stat_intel_tpebs.sh |  11 +-
 tools/perf/util/evlist.c                      |  46 +--
 tools/perf/util/evlist.h                      |  12 -
 tools/perf/util/evsel.c                       |  28 +-
 tools/perf/util/evsel.h                       |  22 +-
 tools/perf/util/metricgroup.c                 |   4 +-
 tools/perf/util/parse-events.c                |  58 ++--
 tools/perf/util/parse-events.h                |   8 +-
 tools/perf/util/parse-events.y                |   2 +-
 tools/perf/util/pmu.c                         |   6 +-
 tools/perf/util/pmu.h                         |   2 +-
 tools/perf/util/stat-display.c                | 101 ++++--
 tools/perf/util/stat-shadow.c                 |  14 +-
 tools/perf/util/stat.c                        |   2 +-
 21 files changed, 348 insertions(+), 413 deletions(-)

Comments

Liang, Kan Aug. 22, 2024, 2:32 p.m. UTC | #1
On 2024-08-22 9:24 a.m., James Clark wrote:
> I rebased this one and made some other fixes so that I could test it,
> so I thought I'd repost it here in case it's helpful. I also added a
> new test.
> 
> But for the testing it all looks ok.
> 
> There is one small difference where it now shows "stalled-cycles-..."
> as <not supported> events, when before it just didn't show them at all when
> they weren't supported:
> 
>   $ perf stat -- true
> 
>   Performance counter stats for 'true':
> 
>               0.66 msec task-clock                       #    0.384 CPUs utilized             
>                  0      context-switches                 #    0.000 /sec                      
>                  0      cpu-migrations                   #    0.000 /sec                      
>                 52      page-faults                      #   78.999 K/sec                     
>      <not counted>      cpu_atom/instructions/                                                  (0.00%)
>            978,399      cpu_core/instructions/           #    1.02  insn per cycle            
>      <not counted>      cpu_atom/cycles/                                                        (0.00%)
>            959,722      cpu_core/cycles/                 #    1.458 GHz                       
>    <not supported>      cpu_atom/stalled-cycles-frontend/                                      
>    <not supported>      cpu_core/stalled-cycles-frontend/                                      
>

Intel didn't support the events for a very long time. It would impact
many existing generations and all future generations.
The current method is to hide the non-exist events. The TopdownL1 is an
example. If it doesn't exist in the json file, perf stat will not
display it.
I don't think it's a good idea to disclose non-exist events in the perf
stat default.

The <not supported> doesn't help here, since there could be many reasons
that the perf tool fails to open a counter. It just provides a
misleading message for an event that never existed.

Thanks,
Kan
> I don't think that's a big deal though and could probably be fixed up
> later if we really want to.
> 
> Tested on Raptor Lake, Juno, N1, Ampere (with the DSU cycles PMU) and
> I also faked an Apple M on Juno. 
> 
> Changes since v3:
> 
>   * Rebase onto perf-tools-next 6236ebe07
>   * Fix Intel TPEBS counting mode test
>   * Fix arm-spe build
>   * Add support for DT devices in stat test
>   * Add a new test for hybrid perf stat default arguments
> 
> Ian Rogers (5):
>   perf evsel: Add alternate_hw_config and use in evsel__match
>   perf stat: Uniquify event name improvements
>   perf stat: Remove evlist__add_default_attrs use strings
>   perf evsel x86: Make evsel__has_perf_metrics work for legacy events
>   perf evsel: Remove pmu_name
> 
> James Clark (2):
>   perf test: Make stat test work on DT devices
>   perf test: Add a test for default perf stat command
> 
>  tools/perf/arch/arm64/util/arm-spe.c          |   4 +-
>  tools/perf/arch/x86/util/evlist.c             |  74 +----
>  tools/perf/arch/x86/util/evsel.c              |  35 ++-
>  tools/perf/builtin-diff.c                     |   6 +-
>  tools/perf/builtin-stat.c                     | 291 +++++++-----------
>  tools/perf/tests/parse-events.c               |   2 +-
>  tools/perf/tests/shell/stat.sh                |  33 +-
>  .../perf/tests/shell/test_stat_intel_tpebs.sh |  11 +-
>  tools/perf/util/evlist.c                      |  46 +--
>  tools/perf/util/evlist.h                      |  12 -
>  tools/perf/util/evsel.c                       |  28 +-
>  tools/perf/util/evsel.h                       |  22 +-
>  tools/perf/util/metricgroup.c                 |   4 +-
>  tools/perf/util/parse-events.c                |  58 ++--
>  tools/perf/util/parse-events.h                |   8 +-
>  tools/perf/util/parse-events.y                |   2 +-
>  tools/perf/util/pmu.c                         |   6 +-
>  tools/perf/util/pmu.h                         |   2 +-
>  tools/perf/util/stat-display.c                | 101 ++++--
>  tools/perf/util/stat-shadow.c                 |  14 +-
>  tools/perf/util/stat.c                        |   2 +-
>  21 files changed, 348 insertions(+), 413 deletions(-)
>
Ian Rogers Aug. 22, 2024, 3:10 p.m. UTC | #2
On Thu, Aug 22, 2024 at 7:32 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-08-22 9:24 a.m., James Clark wrote:
> > I rebased this one and made some other fixes so that I could test it,
> > so I thought I'd repost it here in case it's helpful. I also added a
> > new test.
> >
> > But for the testing it all looks ok.
> >
> > There is one small difference where it now shows "stalled-cycles-..."
> > as <not supported> events, when before it just didn't show them at all when
> > they weren't supported:
> >
> >   $ perf stat -- true
> >
> >   Performance counter stats for 'true':
> >
> >               0.66 msec task-clock                       #    0.384 CPUs utilized
> >                  0      context-switches                 #    0.000 /sec
> >                  0      cpu-migrations                   #    0.000 /sec
> >                 52      page-faults                      #   78.999 K/sec
> >      <not counted>      cpu_atom/instructions/                                                  (0.00%)
> >            978,399      cpu_core/instructions/           #    1.02  insn per cycle
> >      <not counted>      cpu_atom/cycles/                                                        (0.00%)
> >            959,722      cpu_core/cycles/                 #    1.458 GHz
> >    <not supported>      cpu_atom/stalled-cycles-frontend/
> >    <not supported>      cpu_core/stalled-cycles-frontend/
> >
>
> Intel didn't support the events for a very long time. It would impact
> many existing generations and all future generations.
> The current method is to hide the non-exist events. The TopdownL1 is an
> example. If it doesn't exist in the json file, perf stat will not
> display it.
> I don't think it's a good idea to disclose non-exist events in the perf
> stat default.
>
> The <not supported> doesn't help here, since there could be many reasons
> that the perf tool fails to open a counter. It just provides a
> misleading message for an event that never existed.

The list of "default" events, not metrics, similarly has "<not
supported>" in many configurations with "-dd" or "-ddd" on AMD. I'm
not sure the set of default events, at different detail levels, is
necessarily the best. The default events can also be a source of
multiplexing, for example, showing branch miss rate alongside topdown
metrics. Anyway, for the "<not supported>" we should probably be able
to tweak should_skip_zero_counter that is in stat-display.c and tag
these default events as "skippable".

Thanks,
Ian

> Thanks,
> Kan
> > I don't think that's a big deal though and could probably be fixed up
> > later if we really want to.
> >
> > Tested on Raptor Lake, Juno, N1, Ampere (with the DSU cycles PMU) and
> > I also faked an Apple M on Juno.
> >
> > Changes since v3:
> >
> >   * Rebase onto perf-tools-next 6236ebe07
> >   * Fix Intel TPEBS counting mode test
> >   * Fix arm-spe build
> >   * Add support for DT devices in stat test
> >   * Add a new test for hybrid perf stat default arguments
> >
> > Ian Rogers (5):
> >   perf evsel: Add alternate_hw_config and use in evsel__match
> >   perf stat: Uniquify event name improvements
> >   perf stat: Remove evlist__add_default_attrs use strings
> >   perf evsel x86: Make evsel__has_perf_metrics work for legacy events
> >   perf evsel: Remove pmu_name
> >
> > James Clark (2):
> >   perf test: Make stat test work on DT devices
> >   perf test: Add a test for default perf stat command
> >
> >  tools/perf/arch/arm64/util/arm-spe.c          |   4 +-
> >  tools/perf/arch/x86/util/evlist.c             |  74 +----
> >  tools/perf/arch/x86/util/evsel.c              |  35 ++-
> >  tools/perf/builtin-diff.c                     |   6 +-
> >  tools/perf/builtin-stat.c                     | 291 +++++++-----------
> >  tools/perf/tests/parse-events.c               |   2 +-
> >  tools/perf/tests/shell/stat.sh                |  33 +-
> >  .../perf/tests/shell/test_stat_intel_tpebs.sh |  11 +-
> >  tools/perf/util/evlist.c                      |  46 +--
> >  tools/perf/util/evlist.h                      |  12 -
> >  tools/perf/util/evsel.c                       |  28 +-
> >  tools/perf/util/evsel.h                       |  22 +-
> >  tools/perf/util/metricgroup.c                 |   4 +-
> >  tools/perf/util/parse-events.c                |  58 ++--
> >  tools/perf/util/parse-events.h                |   8 +-
> >  tools/perf/util/parse-events.y                |   2 +-
> >  tools/perf/util/pmu.c                         |   6 +-
> >  tools/perf/util/pmu.h                         |   2 +-
> >  tools/perf/util/stat-display.c                | 101 ++++--
> >  tools/perf/util/stat-shadow.c                 |  14 +-
> >  tools/perf/util/stat.c                        |   2 +-
> >  21 files changed, 348 insertions(+), 413 deletions(-)
> >
Liang, Kan Aug. 22, 2024, 3:18 p.m. UTC | #3
On 2024-08-22 11:10 a.m., Ian Rogers wrote:
> On Thu, Aug 22, 2024 at 7:32 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-08-22 9:24 a.m., James Clark wrote:
>>> I rebased this one and made some other fixes so that I could test it,
>>> so I thought I'd repost it here in case it's helpful. I also added a
>>> new test.
>>>
>>> But for the testing it all looks ok.
>>>
>>> There is one small difference where it now shows "stalled-cycles-..."
>>> as <not supported> events, when before it just didn't show them at all when
>>> they weren't supported:
>>>
>>>   $ perf stat -- true
>>>
>>>   Performance counter stats for 'true':
>>>
>>>               0.66 msec task-clock                       #    0.384 CPUs utilized
>>>                  0      context-switches                 #    0.000 /sec
>>>                  0      cpu-migrations                   #    0.000 /sec
>>>                 52      page-faults                      #   78.999 K/sec
>>>      <not counted>      cpu_atom/instructions/                                                  (0.00%)
>>>            978,399      cpu_core/instructions/           #    1.02  insn per cycle
>>>      <not counted>      cpu_atom/cycles/                                                        (0.00%)
>>>            959,722      cpu_core/cycles/                 #    1.458 GHz
>>>    <not supported>      cpu_atom/stalled-cycles-frontend/
>>>    <not supported>      cpu_core/stalled-cycles-frontend/
>>>
>>
>> Intel didn't support the events for a very long time. It would impact
>> many existing generations and all future generations.
>> The current method is to hide the non-exist events. The TopdownL1 is an
>> example. If it doesn't exist in the json file, perf stat will not
>> display it.
>> I don't think it's a good idea to disclose non-exist events in the perf
>> stat default.
>>
>> The <not supported> doesn't help here, since there could be many reasons
>> that the perf tool fails to open a counter. It just provides a
>> misleading message for an event that never existed.
> 
> The list of "default" events, not metrics, similarly has "<not
> supported>" in many configurations with "-dd" or "-ddd" on AMD. I'm
> not sure the set of default events, at different detail levels, is
> necessarily the best. The default events can also be a source of
> multiplexing, for example, showing branch miss rate alongside topdown
> metrics. Anyway, for the "<not supported>" we should probably be able
> to tweak should_skip_zero_counter that is in stat-display.c and tag
> these default events as "skippable".

The "skippable" should be fine as long as it's completely hidden.

BTW: The stalled-cycles-backend should be similar to the
stalled-cycles-frontend, but it isn't shown in the example. Is the
stalled-cycles-backend event missed?

Thanks,
Kan
> 
> Thanks,
> Ian
> 
>> Thanks,
>> Kan
>>> I don't think that's a big deal though and could probably be fixed up
>>> later if we really want to.
>>>
>>> Tested on Raptor Lake, Juno, N1, Ampere (with the DSU cycles PMU) and
>>> I also faked an Apple M on Juno.
>>>
>>> Changes since v3:
>>>
>>>   * Rebase onto perf-tools-next 6236ebe07
>>>   * Fix Intel TPEBS counting mode test
>>>   * Fix arm-spe build
>>>   * Add support for DT devices in stat test
>>>   * Add a new test for hybrid perf stat default arguments
>>>
>>> Ian Rogers (5):
>>>   perf evsel: Add alternate_hw_config and use in evsel__match
>>>   perf stat: Uniquify event name improvements
>>>   perf stat: Remove evlist__add_default_attrs use strings
>>>   perf evsel x86: Make evsel__has_perf_metrics work for legacy events
>>>   perf evsel: Remove pmu_name
>>>
>>> James Clark (2):
>>>   perf test: Make stat test work on DT devices
>>>   perf test: Add a test for default perf stat command
>>>
>>>  tools/perf/arch/arm64/util/arm-spe.c          |   4 +-
>>>  tools/perf/arch/x86/util/evlist.c             |  74 +----
>>>  tools/perf/arch/x86/util/evsel.c              |  35 ++-
>>>  tools/perf/builtin-diff.c                     |   6 +-
>>>  tools/perf/builtin-stat.c                     | 291 +++++++-----------
>>>  tools/perf/tests/parse-events.c               |   2 +-
>>>  tools/perf/tests/shell/stat.sh                |  33 +-
>>>  .../perf/tests/shell/test_stat_intel_tpebs.sh |  11 +-
>>>  tools/perf/util/evlist.c                      |  46 +--
>>>  tools/perf/util/evlist.h                      |  12 -
>>>  tools/perf/util/evsel.c                       |  28 +-
>>>  tools/perf/util/evsel.h                       |  22 +-
>>>  tools/perf/util/metricgroup.c                 |   4 +-
>>>  tools/perf/util/parse-events.c                |  58 ++--
>>>  tools/perf/util/parse-events.h                |   8 +-
>>>  tools/perf/util/parse-events.y                |   2 +-
>>>  tools/perf/util/pmu.c                         |   6 +-
>>>  tools/perf/util/pmu.h                         |   2 +-
>>>  tools/perf/util/stat-display.c                | 101 ++++--
>>>  tools/perf/util/stat-shadow.c                 |  14 +-
>>>  tools/perf/util/stat.c                        |   2 +-
>>>  21 files changed, 348 insertions(+), 413 deletions(-)
>>>
>
James Clark Aug. 27, 2024, 9:13 a.m. UTC | #4
On 22/08/2024 4:18 pm, Liang, Kan wrote:
> 
> 
> On 2024-08-22 11:10 a.m., Ian Rogers wrote:
>> On Thu, Aug 22, 2024 at 7:32 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>
>>>
>>>
>>> On 2024-08-22 9:24 a.m., James Clark wrote:
>>>> I rebased this one and made some other fixes so that I could test it,
>>>> so I thought I'd repost it here in case it's helpful. I also added a
>>>> new test.
>>>>
>>>> But for the testing it all looks ok.
>>>>
>>>> There is one small difference where it now shows "stalled-cycles-..."
>>>> as <not supported> events, when before it just didn't show them at all when
>>>> they weren't supported:
>>>>
>>>>    $ perf stat -- true
>>>>
>>>>    Performance counter stats for 'true':
>>>>
>>>>                0.66 msec task-clock                       #    0.384 CPUs utilized
>>>>                   0      context-switches                 #    0.000 /sec
>>>>                   0      cpu-migrations                   #    0.000 /sec
>>>>                  52      page-faults                      #   78.999 K/sec
>>>>       <not counted>      cpu_atom/instructions/                                                  (0.00%)
>>>>             978,399      cpu_core/instructions/           #    1.02  insn per cycle
>>>>       <not counted>      cpu_atom/cycles/                                                        (0.00%)
>>>>             959,722      cpu_core/cycles/                 #    1.458 GHz
>>>>     <not supported>      cpu_atom/stalled-cycles-frontend/
>>>>     <not supported>      cpu_core/stalled-cycles-frontend/
>>>>
>>>
>>> Intel didn't support the events for a very long time. It would impact
>>> many existing generations and all future generations.
>>> The current method is to hide the non-exist events. The TopdownL1 is an
>>> example. If it doesn't exist in the json file, perf stat will not
>>> display it.
>>> I don't think it's a good idea to disclose non-exist events in the perf
>>> stat default.
>>>
>>> The <not supported> doesn't help here, since there could be many reasons
>>> that the perf tool fails to open a counter. It just provides a
>>> misleading message for an event that never existed.
>>
>> The list of "default" events, not metrics, similarly has "<not
>> supported>" in many configurations with "-dd" or "-ddd" on AMD. I'm
>> not sure the set of default events, at different detail levels, is
>> necessarily the best. The default events can also be a source of
>> multiplexing, for example, showing branch miss rate alongside topdown
>> metrics. Anyway, for the "<not supported>" we should probably be able
>> to tweak should_skip_zero_counter that is in stat-display.c and tag
>> these default events as "skippable".
> 
> The "skippable" should be fine as long as it's completely hidden.
> 
> BTW: The stalled-cycles-backend should be similar to the
> stalled-cycles-frontend, but it isn't shown in the example. Is the
> stalled-cycles-backend event missed?
> 
> Thanks,
> Kan

Sorry I should have made it clearer that I truncated the output just to 
focus on the <not supported> part. The full output is below and it does 
include stalled-cycles-backend.

I'll have a look at trying to hide the ones that don't exist, I think it 
will look cleaner. But at the same time what it says isn't incorrect, 
and it's not like we hide the lines from cores where the process didn't 
run, so it doesn't look out of place with the <not counted> ones.



   $ perf stat -- true

   Performance counter stats for 'true':

                0.42 msec task-clock                #    0.439 CPUs 
utilized
                   0      context-switches          #    0.000 /sec 

                   0      cpu-migrations            #    0.000 /sec 

                  53      page-faults               #  125.592 K/sec 

             978,160      cpu_atom/instructions/    #    0.91  insn per 
cycle
       <not counted>      cpu_core/instructions/                 (0.00%)
           1,070,525      cpu_atom/cycles/          #    2.537 GHz 

       <not counted>      cpu_core/cycles/                       (0.00%)
     <not supported>      cpu_atom/stalled-cycles-frontend/
     <not supported>      cpu_core/stalled-cycles-frontend/
     <not supported>      cpu_atom/stalled-cycles-backend/
     <not supported>      cpu_core/stalled-cycles-backend/
             175,814      cpu_atom/branches/        #  416.620 M/sec 

       <not counted>      cpu_core/branches/                     (0.00%)
               6,851      cpu_atom/branch-misses/   #    3.90% of all 
branches
       <not counted>      cpu_core/branch-misses/                (0.00%)
               TopdownL1 (cpu_atom)     #     17.4 % 
tma_bad_speculation
                                        #     21.8 %  tma_retiring 

               TopdownL1 (cpu_atom)     #     27.5 %  tma_backend_bound 

                                        #     33.3 %  tma_frontend_bound 


         0.000960792 seconds time elapsed

         0.000000000 seconds user
         0.000471000 seconds sys
Andi Kleen Aug. 28, 2024, 5:16 a.m. UTC | #5
> There is one small difference where it now shows "stalled-cycles-..."
> as <not supported> events, when before it just didn't show them at all when
> they weren't supported:
> 
>   $ perf stat -- true
> 
>   Performance counter stats for 'true':
> 
>               0.66 msec task-clock                       #    0.384 CPUs utilized             
>                  0      context-switches                 #    0.000 /sec                      
>                  0      cpu-migrations                   #    0.000 /sec                      
>                 52      page-faults                      #   78.999 K/sec                     
>      <not counted>      cpu_atom/instructions/                                                  (0.00%)
>            978,399      cpu_core/instructions/           #    1.02  insn per cycle            
>      <not counted>      cpu_atom/cycles/                                                        (0.00%)
>            959,722      cpu_core/cycles/                 #    1.458 GHz                       
>    <not supported>      cpu_atom/stalled-cycles-frontend/                                      
>    <not supported>      cpu_core/stalled-cycles-frontend/                                      
> 
> I don't think that's a big deal though and could probably be fixed up
> later if we really want to.

I would prefer if that was fixed up. That's quite ugly for the default
view on x86/Intel.

-Andi