Message ID | 20240813132323.98728-2-james.clark@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf stat: Make default perf stat command work on Arm big.LITTLE | expand |
On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote: > > This overwrite relies on the clock event remaining at index 0 and is > quite a way down from where the array is initialized, making it easy to > miss. Just initialize it with the correct clock event to begin with. > > Signed-off-by: James Clark <james.clark@linaro.org> > --- > tools/perf/builtin-stat.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > index 1f92445f7480..a65f58f8783f 100644 > --- a/tools/perf/builtin-stat.c > +++ b/tools/perf/builtin-stat.c > @@ -1817,7 +1817,9 @@ static int add_default_attributes(void) > { > struct perf_event_attr default_attrs0[] = { > > - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, > + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ? > + PERF_COUNT_SW_CPU_CLOCK : > + PERF_COUNT_SW_TASK_CLOCK }, Hand crafting perf_event_attr when we have an event name to perf_event_atttr parser doesn't make sense. Doing things this way means we need to duplicate logic between event parsing and these default configurations. The default configurations are also using legacy events which of course are broken on Apple ARM M? (albeit for hardware events, here it is software). Event and metric parsing has to worry about things like grouping topdown events. All-in-all let's have one way to do things, event parsing, otherwise this code is going to end up reinventing all the workarounds the event parsing has to have. Lots of struct perf_event_attr also contribute to binary size. If you are worried about a cycles event being opened on arm_dsu PMUs, there is this patch: https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/ Thanks, Ian
On 13/08/2024 3:28 pm, Ian Rogers wrote: > On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote: >> >> This overwrite relies on the clock event remaining at index 0 and is >> quite a way down from where the array is initialized, making it easy to >> miss. Just initialize it with the correct clock event to begin with. >> >> Signed-off-by: James Clark <james.clark@linaro.org> >> --- >> tools/perf/builtin-stat.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c >> index 1f92445f7480..a65f58f8783f 100644 >> --- a/tools/perf/builtin-stat.c >> +++ b/tools/perf/builtin-stat.c >> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void) >> { >> struct perf_event_attr default_attrs0[] = { >> >> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, >> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ? >> + PERF_COUNT_SW_CPU_CLOCK : >> + PERF_COUNT_SW_TASK_CLOCK }, > > Hand crafting perf_event_attr when we have an event name to > perf_event_atttr parser doesn't make sense. Doing things this way > means we need to duplicate logic between event parsing and these > default configurations. The default configurations are also using > legacy events which of course are broken on Apple ARM M? (albeit for > hardware events, here it is software). Event and metric parsing has to > worry about things like grouping topdown events. All-in-all let's have > one way to do things, event parsing, otherwise this code is going to > end up reinventing all the workarounds the event parsing has to have. > Lots of struct perf_event_attr also contribute to binary size. > > If you are worried about a cycles event being opened on arm_dsu PMUs, > there is this patch: > https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/ > > Thanks, > Ian Hi Ian, Is this comment related to this patch specifically or is it more of a general comment? This patch doesn't really make any actual changes other than move one line of code from one place to another. James
On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote: > > > > On 13/08/2024 3:28 pm, Ian Rogers wrote: > > On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote: > >> > >> This overwrite relies on the clock event remaining at index 0 and is > >> quite a way down from where the array is initialized, making it easy to > >> miss. Just initialize it with the correct clock event to begin with. > >> > >> Signed-off-by: James Clark <james.clark@linaro.org> > >> --- > >> tools/perf/builtin-stat.c | 7 +++---- > >> 1 file changed, 3 insertions(+), 4 deletions(-) > >> > >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > >> index 1f92445f7480..a65f58f8783f 100644 > >> --- a/tools/perf/builtin-stat.c > >> +++ b/tools/perf/builtin-stat.c > >> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void) > >> { > >> struct perf_event_attr default_attrs0[] = { > >> > >> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, > >> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ? > >> + PERF_COUNT_SW_CPU_CLOCK : > >> + PERF_COUNT_SW_TASK_CLOCK }, > > > > Hand crafting perf_event_attr when we have an event name to > > perf_event_atttr parser doesn't make sense. Doing things this way > > means we need to duplicate logic between event parsing and these > > default configurations. The default configurations are also using > > legacy events which of course are broken on Apple ARM M? (albeit for > > hardware events, here it is software). Event and metric parsing has to > > worry about things like grouping topdown events. All-in-all let's have > > one way to do things, event parsing, otherwise this code is going to > > end up reinventing all the workarounds the event parsing has to have. > > Lots of struct perf_event_attr also contribute to binary size. > > > > If you are worried about a cycles event being opened on arm_dsu PMUs, > > there is this patch: > > https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/ > > > > Thanks, > > Ian > > Hi Ian, > > Is this comment related to this patch specifically or is it more of a > general comment? > > This patch doesn't really make any actual changes other than move one > line of code from one place to another. James, this code is removed here: https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/ Thanks, Ian
On 13/08/2024 3:43 pm, Ian Rogers wrote: > On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote: >> >> >> >> On 13/08/2024 3:28 pm, Ian Rogers wrote: >>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote: >>>> >>>> This overwrite relies on the clock event remaining at index 0 and is >>>> quite a way down from where the array is initialized, making it easy to >>>> miss. Just initialize it with the correct clock event to begin with. >>>> >>>> Signed-off-by: James Clark <james.clark@linaro.org> >>>> --- >>>> tools/perf/builtin-stat.c | 7 +++---- >>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c >>>> index 1f92445f7480..a65f58f8783f 100644 >>>> --- a/tools/perf/builtin-stat.c >>>> +++ b/tools/perf/builtin-stat.c >>>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void) >>>> { >>>> struct perf_event_attr default_attrs0[] = { >>>> >>>> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, >>>> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ? >>>> + PERF_COUNT_SW_CPU_CLOCK : >>>> + PERF_COUNT_SW_TASK_CLOCK }, >>> >>> Hand crafting perf_event_attr when we have an event name to >>> perf_event_atttr parser doesn't make sense. Doing things this way >>> means we need to duplicate logic between event parsing and these >>> default configurations. The default configurations are also using >>> legacy events which of course are broken on Apple ARM M? (albeit for >>> hardware events, here it is software). Event and metric parsing has to >>> worry about things like grouping topdown events. All-in-all let's have >>> one way to do things, event parsing, otherwise this code is going to >>> end up reinventing all the workarounds the event parsing has to have. >>> Lots of struct perf_event_attr also contribute to binary size. >>> >>> If you are worried about a cycles event being opened on arm_dsu PMUs, >>> there is this patch: >>> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/ >>> >>> Thanks, >>> Ian >> >> Hi Ian, >> >> Is this comment related to this patch specifically or is it more of a >> general comment? >> >> This patch doesn't really make any actual changes other than move one >> line of code from one place to another. > > James, this code is removed here: > https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/ > > Thanks, > Ian Oh I see yeah. We can still work on that change, merging this one doesn't necessarily have to block that one, it just makes this one a bit redundant when the other one gets done. If I remember correctly in one of the last related discussions we thought that opening the cycles event as a sampling event should be a softer warning? Actually it seems like the DSU cycles event is a non-issue specifically for perf stat because it will open successfully anyway? It was just in perf record where it was the issue?
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 1f92445f7480..a65f58f8783f 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1817,7 +1817,9 @@ static int add_default_attributes(void) { struct perf_event_attr default_attrs0[] = { - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ? + PERF_COUNT_SW_CPU_CLOCK : + PERF_COUNT_SW_TASK_CLOCK }, { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES }, { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS }, { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS }, @@ -2030,9 +2032,6 @@ static int add_default_attributes(void) if (!evsel_list->core.nr_entries) { /* No events so add defaults. */ - if (target__has_cpu(&target)) - default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK; - if (evlist__add_default_attrs(evsel_list, default_attrs0) < 0) return -1; if (perf_pmus__have_event("cpu", "stalled-cycles-frontend")) {
This overwrite relies on the clock event remaining at index 0 and is quite a way down from where the array is initialized, making it easy to miss. Just initialize it with the correct clock event to begin with. Signed-off-by: James Clark <james.clark@linaro.org> --- tools/perf/builtin-stat.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)