diff mbox series

[1/7] perf stat: Initialize instead of overwriting clock event

Message ID 20240813132323.98728-2-james.clark@linaro.org (mailing list archive)
State New, archived
Headers show
Series perf stat: Make default perf stat command work on Arm big.LITTLE | expand

Commit Message

James Clark Aug. 13, 2024, 1:23 p.m. UTC
This overwrite relies on the clock event remaining at index 0 and is
quite a way down from where the array is initialized, making it easy to
miss. Just initialize it with the correct clock event to begin with.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/builtin-stat.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Ian Rogers Aug. 13, 2024, 2:28 p.m. UTC | #1
On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> This overwrite relies on the clock event remaining at index 0 and is
> quite a way down from where the array is initialized, making it easy to
> miss. Just initialize it with the correct clock event to begin with.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  tools/perf/builtin-stat.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 1f92445f7480..a65f58f8783f 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>  {
>         struct perf_event_attr default_attrs0[] = {
>
> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> +                                               PERF_COUNT_SW_CPU_CLOCK :
> +                                               PERF_COUNT_SW_TASK_CLOCK        },

Hand crafting perf_event_attr when we have an event name to
perf_event_atttr parser doesn't make sense. Doing things this way
means we need to duplicate logic between event parsing and these
default configurations. The default configurations are also using
legacy events which of course are broken on Apple ARM M? (albeit for
hardware events, here it is software). Event and metric parsing has to
worry about things like grouping topdown events. All-in-all let's have
one way to do things, event parsing, otherwise this code is going to
end up reinventing all the workarounds the event parsing has to have.
Lots of struct perf_event_attr also contribute to binary size.

If you are worried about a cycles event being opened on arm_dsu PMUs,
there is this patch:
https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/

Thanks,
Ian
James Clark Aug. 13, 2024, 2:38 p.m. UTC | #2
On 13/08/2024 3:28 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> This overwrite relies on the clock event remaining at index 0 and is
>> quite a way down from where the array is initialized, making it easy to
>> miss. Just initialize it with the correct clock event to begin with.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   tools/perf/builtin-stat.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 1f92445f7480..a65f58f8783f 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>>   {
>>          struct perf_event_attr default_attrs0[] = {
>>
>> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
>> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>> +                                               PERF_COUNT_SW_CPU_CLOCK :
>> +                                               PERF_COUNT_SW_TASK_CLOCK        },
> 
> Hand crafting perf_event_attr when we have an event name to
> perf_event_atttr parser doesn't make sense. Doing things this way
> means we need to duplicate logic between event parsing and these
> default configurations. The default configurations are also using
> legacy events which of course are broken on Apple ARM M? (albeit for
> hardware events, here it is software). Event and metric parsing has to
> worry about things like grouping topdown events. All-in-all let's have
> one way to do things, event parsing, otherwise this code is going to
> end up reinventing all the workarounds the event parsing has to have.
> Lots of struct perf_event_attr also contribute to binary size.
> 
> If you are worried about a cycles event being opened on arm_dsu PMUs,
> there is this patch:
> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> 
> Thanks,
> Ian

Hi Ian,

Is this comment related to this patch specifically or is it more of a 
general comment?

This patch doesn't really make any actual changes other than move one 
line of code from one place to another.

James
Ian Rogers Aug. 13, 2024, 2:43 p.m. UTC | #3
On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 13/08/2024 3:28 pm, Ian Rogers wrote:
> > On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> This overwrite relies on the clock event remaining at index 0 and is
> >> quite a way down from where the array is initialized, making it easy to
> >> miss. Just initialize it with the correct clock event to begin with.
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >>   tools/perf/builtin-stat.c | 7 +++----
> >>   1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >> index 1f92445f7480..a65f58f8783f 100644
> >> --- a/tools/perf/builtin-stat.c
> >> +++ b/tools/perf/builtin-stat.c
> >> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
> >>   {
> >>          struct perf_event_attr default_attrs0[] = {
> >>
> >> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
> >> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> >> +                                               PERF_COUNT_SW_CPU_CLOCK :
> >> +                                               PERF_COUNT_SW_TASK_CLOCK        },
> >
> > Hand crafting perf_event_attr when we have an event name to
> > perf_event_atttr parser doesn't make sense. Doing things this way
> > means we need to duplicate logic between event parsing and these
> > default configurations. The default configurations are also using
> > legacy events which of course are broken on Apple ARM M? (albeit for
> > hardware events, here it is software). Event and metric parsing has to
> > worry about things like grouping topdown events. All-in-all let's have
> > one way to do things, event parsing, otherwise this code is going to
> > end up reinventing all the workarounds the event parsing has to have.
> > Lots of struct perf_event_attr also contribute to binary size.
> >
> > If you are worried about a cycles event being opened on arm_dsu PMUs,
> > there is this patch:
> > https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Is this comment related to this patch specifically or is it more of a
> general comment?
>
> This patch doesn't really make any actual changes other than move one
> line of code from one place to another.

James, this code is removed here:
https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/

Thanks,
Ian
James Clark Aug. 13, 2024, 2:56 p.m. UTC | #4
On 13/08/2024 3:43 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 13/08/2024 3:28 pm, Ian Rogers wrote:
>>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> This overwrite relies on the clock event remaining at index 0 and is
>>>> quite a way down from where the array is initialized, making it easy to
>>>> miss. Just initialize it with the correct clock event to begin with.
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>>    tools/perf/builtin-stat.c | 7 +++----
>>>>    1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index 1f92445f7480..a65f58f8783f 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>>>>    {
>>>>           struct perf_event_attr default_attrs0[] = {
>>>>
>>>> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
>>>> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>>>> +                                               PERF_COUNT_SW_CPU_CLOCK :
>>>> +                                               PERF_COUNT_SW_TASK_CLOCK        },
>>>
>>> Hand crafting perf_event_attr when we have an event name to
>>> perf_event_atttr parser doesn't make sense. Doing things this way
>>> means we need to duplicate logic between event parsing and these
>>> default configurations. The default configurations are also using
>>> legacy events which of course are broken on Apple ARM M? (albeit for
>>> hardware events, here it is software). Event and metric parsing has to
>>> worry about things like grouping topdown events. All-in-all let's have
>>> one way to do things, event parsing, otherwise this code is going to
>>> end up reinventing all the workarounds the event parsing has to have.
>>> Lots of struct perf_event_attr also contribute to binary size.
>>>
>>> If you are worried about a cycles event being opened on arm_dsu PMUs,
>>> there is this patch:
>>> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>
>>> Thanks,
>>> Ian
>>
>> Hi Ian,
>>
>> Is this comment related to this patch specifically or is it more of a
>> general comment?
>>
>> This patch doesn't really make any actual changes other than move one
>> line of code from one place to another.
> 
> James, this code is removed here:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> 
> Thanks,
> Ian

Oh I see yeah. We can still work on that change, merging this one 
doesn't necessarily have to block that one, it just makes this one a bit 
redundant when the other one gets done.

If I remember correctly in one of the last related discussions we 
thought that opening the cycles event as a sampling event should be a 
softer warning?

Actually it seems like the DSU cycles event is a non-issue specifically 
for perf stat because it will open successfully anyway? It was just in 
perf record where it was the issue?
diff mbox series

Patch

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1f92445f7480..a65f58f8783f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1817,7 +1817,9 @@  static int add_default_attributes(void)
 {
 	struct perf_event_attr default_attrs0[] = {
 
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
+  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
+						PERF_COUNT_SW_CPU_CLOCK :
+						PERF_COUNT_SW_TASK_CLOCK	},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
@@ -2030,9 +2032,6 @@  static int add_default_attributes(void)
 
 	if (!evsel_list->core.nr_entries) {
 		/* No events so add defaults. */
-		if (target__has_cpu(&target))
-			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
-
 		if (evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
 			return -1;
 		if (perf_pmus__have_event("cpu", "stalled-cycles-frontend")) {