[0/7] perf stat: Make default perf stat command work on Arm big.LITTLE

Message ID 20240813132323.98728-1-james.clark@linaro.org (mailing list archive)
Series perf stat: Make default perf stat command work on Arm big.LITTLE

Message

James Clark Aug. 13, 2024, 1:23 p.m. UTC
The important patches are 3 and 5, the rest are tidyups and tests.

I don't think there is any interaction with the other open issues
about the uncore DSU cycles event or JSON/legacy hw event priorities
because only hw events on core PMUs are used for the default
stat command. And also just sharing the existing x86 code works so
no big changes are required.

For patch 3 the weak arch-specific symbol has to continue to be used
rather than picking the implementation based on
perf_pmus__supports_extended_type() like in patch 5. This is because
that function ends up calling evsel__hw_name() itself, which results
in recursion. But at least one weak arch_* construct has been removed,
so it's better than nothing.
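
As a rough illustration of the weak-symbol pattern mentioned above (a
minimal sketch, not the code from this series): arch_hw_name() and
hw_name() are hypothetical names, and the sketch assumes a GCC or Clang
toolchain that supports __attribute__((weak)). The generic file provides
a weak default, and an arch file may provide a strong definition that
the linker prefers. Because the choice is made at link time, the caller
never has to query PMU capabilities at runtime, which is how this
approach sidesteps the recursion described above.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical generic fallback. Marked weak so that an arch-specific
 * object file can supply a strong definition with the same name, which
 * the linker will pick instead of this one.
 */
__attribute__((weak)) int arch_hw_name(int config, char *buf, size_t size)
{
	/* Illustrative formatting only; real event naming is more involved. */
	return snprintf(buf, size, "hw-event-%d", config);
}

/*
 * Hypothetical generic caller. It is unaware of which definition was
 * linked in, so no runtime capability check is needed here.
 */
int hw_name(int config, char *buf, size_t size)
{
	return arch_hw_name(config, buf, size);
}
```

With no strong definition linked in, hw_name() falls through to the weak
default. An arch build would add its own non-weak arch_hw_name() in a
separate file, in the same way the diffstat below adds
tools/perf/arch/arm64/util/evsel.c.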


James Clark (7):
  perf stat: Initialize instead of overwriting clock event
  perf stat: Remove unused default_null_attrs
  perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
  perf evsel: Remove duplicated __evsel__hw_name() code
  perf evlist: Use hybrid default attrs whenever extended type is
    supported
  perf test: Make stat test work on DT devices
  perf test: Add a test for default perf stat command

 tools/perf/arch/arm64/util/Build   |  1 +
 tools/perf/arch/arm64/util/evsel.c |  7 ++++
 tools/perf/arch/x86/util/evlist.c  | 65 ------------------------------
 tools/perf/arch/x86/util/evsel.c   | 17 +-------
 tools/perf/builtin-stat.c          | 12 ++----
 tools/perf/tests/shell/stat.sh     | 33 ++++++++++++---
 tools/perf/util/evlist.c           | 65 ++++++++++++++++++++++++++----
 tools/perf/util/evlist.h           |  6 +--
 tools/perf/util/evsel.c            | 19 +++++++++
 tools/perf/util/evsel.h            |  2 +-
 10 files changed, 119 insertions(+), 108 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/evsel.c

Comments

Ian Rogers Aug. 13, 2024, 2:35 p.m. UTC | #1
On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> The important patches are 3 and 5, the rest are tidyups and tests.
>
> I don't think there is any interaction with the other open issues
> about the uncore DSU cycles event or JSON/legacy hw event priorities
> because only hw events on core PMUs are used for the default
> stat command. And also just sharing the existing x86 code works so
> no big changes are required.
>
> For patch 3 the weak arch specific symbol has to continue to be used
> rather than picking the implementation based on
> perf_pmus__supports_extended_type() like in patch 5. This is because
> that function ends up calling evsel__hw_name() itself which results
> in recursion. But at least one weak arch_* construct has been removed,
> so it's better than nothing.

Let's not do things this way. The use of strings is architecture
neutral, means we don't need to create new arch functions on things
like RISC-V, it encapsulates the complexity of things like topdown
events, Apple ARM M CPUs not supporting legacy events, etc.
Duplicating the existing x86 logic, when that was something trying to
be removed, is not the way to go. That logic was a holdover from the
hybrid tech debt we've been working to remove with a generic approach.

Thanks,
Ian

James Clark Aug. 13, 2024, 2:45 p.m. UTC | #2
On 13/08/2024 3:35 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> The important patches are 3 and 5, the rest are tidyups and tests.
>>
>> I don't think there is any interaction with the other open issues
>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>> because only hw events on core PMUs are used for the default
>> stat command. And also just sharing the existing x86 code works so
>> no big changes are required.
>>
>> For patch 3 the weak arch specific symbol has to continue to be used
>> rather than picking the implementation based on
>> perf_pmus__supports_extended_type() like in patch 5. This is because
>> that function ends up calling evsel__hw_name() itself which results
>> in recursion. But at least one weak arch_* construct has been removed,
>> so it's better than nothing.
> 
> Let's not do things this way. The use of strings is architecture
> neutral, means we don't need to create new arch functions on things
> like RISC-V, it encapsulates the complexity of things like topdown

If the new arch function is an issue, could that be worked around by
calling perf_pmus__supports_extended_type() in patch 3 as well? It just
needs a small change to not recurse.

> events, Apple ARM M CPUs not supporting legacy events, etc.

If Apple M doesn't support the HW events, does _any_ default perf stat
command (hybrid or not) work? I'm not really trying to fix that here,
just make whatever already works work on big.LITTLE.

> Duplicating the existing x86 logic, when that was something trying to
> be removed, is not the way to go. That logic was a holdover from the
> hybrid tech debt we've been working to remove with a generic approach.
> 
> Thanks,
> Ian
> 

I think all of that may make sense, but in this case I haven't actually 
duplicated anything, rather shared the existing code to also be used on 
Arm.

This means we can have the default perf stat working on Arm from today, 
and if any other changes get made it will continue to work as I've also 
added a test for it.

James Clark Aug. 13, 2024, 3:10 p.m. UTC | #3
On 13/08/2024 3:45 pm, James Clark wrote:
> 
> 
> On 13/08/2024 3:35 pm, Ian Rogers wrote:
>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>>
>>> The important patches are 3 and 5, the rest are tidyups and tests.
>>>
>>> I don't think there is any interaction with the other open issues
>>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>>> because only hw events on core PMUs are used for the default
>>> stat command. And also just sharing the existing x86 code works so
>>> no big changes are required.
>>>
>>> For patch 3 the weak arch specific symbol has to continue to be used
>>> rather than picking the implementation based on
>>> perf_pmus__supports_extended_type() like in patch 5. This is because
>>> that function ends up calling evsel__hw_name() itself which results
>>> in recursion. But at least one weak arch_* construct has been removed,
>>> so it's better than nothing.
>>
>> Let's not do things this way. The use of strings is architecture
>> neutral, means we don't need to create new arch functions on things
>> like RISC-V, it encapsulates the complexity of things like topdown
> 
> If the new arch function is an issue, could that be worked around by
> calling perf_pmus__supports_extended_type() in patch 3 as well? It just
> needs a small change to not recurse.
> 
>> events, Apple ARM M CPUs not supporting legacy events, etc.
> 
> If Apple M doesn't support the HW events does _any_ default Perf stat 
> command (hybrid or not) work? I'm not really trying to fix that here, 
> just make whatever already works work on big.LITTLE.
> 
>> Duplicating the existing x86 logic, when that was something trying to
>> be removed, is not the way to go. That logic was a holdover from the
>> hybrid tech debt we've been working to remove with a generic approach.
>>
>> Thanks,
>> Ian
>>
> 
> I think all of that may make sense, but in this case I haven't actually 
> duplicated anything, rather shared the existing code to also be used on 
> Arm.
> 
> This means we can have the default perf stat working on Arm from today, 
> and if any other changes get made it will continue to work as I've also 
> added a test for it.
> 

I would also like to note that (not including the new test) this 
patchset _removes_ more code than it adds, so to say it duplicates code 
is a bit unfair.

Of course this touches some similar areas to your other change, but that 
doesn't mean I think we shouldn't continue with that one. I'm still 
happy to review and test or contribute to that one if you like. I think 
it just helped me to do it in this order and get this thing in a working 
state first before the next bigger step.
