Message ID | 20231113112507.917107-4-james.clark@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: perf: Add support for event counting threshold |
On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
>
> Add documentation for the new Perf event open parameters and
> the threshold_max capability file.
>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
[...]
> +How-to
> +------
> +
> +The threshold, threshold_compare and threshold_count values can be
> +provided per event:
> +
> +.. code-block:: sh
> +
> +   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
> +        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/

Can you please explain this a bit more?

I guess the first event counts stall_slot PMU if the event if it's
greater than or equal to 2. And as threshold_count is not set,
it'd count the stall_slot as is. E.g. it counts 3 when it sees 3.

OTOH, dtlb_walk will count 1 if it sees an event less than 10.
Is my understanding correct?

> +
> +And the following comparison values are supported:
> +
> +.. code-block::
> +
> +   0: Not-equal
> +   1: Equals
> +   2: Greater-than-or-equal
> +   3: Less-than

So the above values are for threashold_compare, right?
It'd be nice if it's more explicit.

Similarly, it'd be helpful to have a description for the
threshold and threshold_count fields.

Thanks,
Namhyung

[...]
On 20/11/2023 21:31, Namhyung Kim wrote:
> On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
[...]
>> +.. code-block:: sh
>> +
>> +   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
>> +        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
>
> Can you please explain this a bit more?
>
> I guess the first event counts stall_slot PMU if the event if it's
> greater than or equal to 2. And as threshold_count is not set,
> it'd count the stall_slot as is. E.g. it counts 3 when it sees 3.
>
> OTOH, dtlb_walk will count 1 if it sees an event less than 10.
> Is my understanding correct?

That is correct. The behavior is described in the paragraph above.
But I agree that it would be really helpful if we explained with the
example above.

>
>> +
>> +And the following comparison values are supported:
>> +
>> +.. code-block::
>> +
>> +   0: Not-equal
>> +   1: Equals
>> +   2: Greater-than-or-equal
>> +   3: Less-than
>
> So the above values are for threashold_compare, right?
> It'd be nice if it's more explicit.
>
> Similarly, it'd be helpful to have a description for the
> threshold and threshold_count fields.

Agreed.

Suzuki

[...]
On 11/21/23 03:01, Namhyung Kim wrote:
> On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
[...]
>> +   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
>> +        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
> Can you please explain this a bit more?
>
> I guess the first event counts stall_slot PMU if the event if it's
> greater than or equal to 2. And as threshold_count is not set,
> it'd count the stall_slot as is. E.g. it counts 3 when it sees 3.

Hence without 'threshold_count' being set, the other two config requests
will not have an effect, is that correct ?

>
> OTOH, dtlb_walk will count 1 if it sees an event less than 10.
> Is my understanding correct?

'Equals' and 'Greater-than-or-equal' makes sense and are intuitive. Just
wondering what will happen for 'Not-equal' and 'Less-than' - when would
the counter count in such cases ?

0: Not-equal
1: Equals
2: Greater-than-or-equal
3: Less-than
On 21/11/2023 10:33, Suzuki K Poulose wrote:
> On 20/11/2023 21:31, Namhyung Kim wrote:
>> On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
[...]
>>> +   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
>>> +        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
>>
>> Can you please explain this a bit more?
>>
>> I guess the first event counts stall_slot PMU if the event if it's
>> greater than or equal to 2. And as threshold_count is not set,
>> it'd count the stall_slot as is. E.g. it counts 3 when it sees 3.
>>
>> OTOH, dtlb_walk will count 1 if it sees an event less than 10.
>> Is my understanding correct?
>
> That is correct. The behavior is described in the paragraph above.
> But I agree that it would be really helpful if we explained with the
> example above.
>

Yeah I can add a description of how the example behaves.

>>
>>> +
>>> +And the following comparison values are supported:
>>> +
>>> +.. code-block::
>>> +
>>> +   0: Not-equal
>>> +   1: Equals
>>> +   2: Greater-than-or-equal
>>> +   3: Less-than
>>
>> So the above values are for threashold_compare, right?
>> It'd be nice if it's more explicit.

Yep I agree, I can label this with threshold_compare.

>>
>> Similarly, it'd be helpful to have a description for the
>> threshold and threshold_count fields.
>
> Agreed.
>
> Suzuki
>

Yeah I'll add explicit descriptions for each field. Thanks for the review.

[...]
On 23/11/2023 05:50, Anshuman Khandual wrote:
>
>
> On 11/21/23 03:01, Namhyung Kim wrote:
>> On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
[...]
>>> +   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
>>> +        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
>> Can you please explain this a bit more?
>>
>> I guess the first event counts stall_slot PMU if the event if it's
>> greater than or equal to 2. And as threshold_count is not set,
>> it'd count the stall_slot as is. E.g. it counts 3 when it sees 3.
>
> Hence without 'threshold_count' being set, the other two config requests
> will not have an effect, is that correct ?

Yeah I can mention this. It's implied because 0 is the default value of
config fields, and 0 is a valid value for compare and count field, so
threshold=0 has to be the way to disable it. But I can mention it
explicitly.

>
>>
>> OTOH, dtlb_walk will count 1 if it sees an event less than 10.
>> Is my understanding correct?
>
> 'Equals' and 'Greater-than-or-equal' makes sense and are intuitive. Just
> wondering what will happen for 'Not-equal' and 'Less-than' - when would
> the counter count in such cases ?
>
> 0: Not-equal
> 1: Equals
> 2: Greater-than-or-equal
> 3: Less-than
>

They would count when the event is not equal to or less than the
threshold value on any cycle. Probably going into more detail would
start to reproduce what's in the reference manual. All the pseudocode
is in there which describes how it works.

As for use cases, I'm not really sure. It probably wasn't any effort
to add into the hardware with a single not gate, and something could
have been missed if it wasn't added. You might be able to do things
like count the inverse of something without having to open another
event to subtract from to find what the inverse would be.
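The following minimal shell sketch makes the four threshold_compare modes and the effect of threshold_count concrete. It models the counting rule as the thread describes it. On each cycle, the number of events the PMU would have seen is compared against the threshold. If the comparison passes, the counter adds that number, or adds 1 when threshold_count is set. A threshold of 0 disables thresholding. The function name `th_count` and its argument order are hypothetical and exist only for illustration. This is not the Arm ARM pseudocode, which remains the authoritative definition.

```shell
#!/bin/sh
# Hypothetical model of FEAT_PMUv3_TH counting, for illustration only.
# Usage: th_count THRESHOLD COMPARE COUNT V1 [V2 ...]
#   THRESHOLD  threshold value; 0 disables thresholding
#   COMPARE    0=Not-equal, 1=Equals, 2=Greater-than-or-equal, 3=Less-than
#   COUNT      1 adds 1 per passing cycle (threshold_count), 0 adds V
#   V1 V2 ...  events the PMU would have counted on each cycle
th_count() {
    threshold=$1 compare=$2 count=$3
    shift 3
    total=0
    for v in "$@"; do
        if [ "$threshold" -eq 0 ]; then
            # Thresholding disabled: count events as usual.
            total=$((total + v))
            continue
        fi
        case $compare in
            0) pass=$((v != threshold)) ;;
            1) pass=$((v == threshold)) ;;
            2) pass=$((v >= threshold)) ;;
            3) pass=$((v < threshold)) ;;
            *) echo "unknown compare value: $compare" >&2; return 1 ;;
        esac
        if [ "$pass" -eq 1 ]; then
            if [ "$count" -eq 1 ]; then
                total=$((total + 1))    # threshold_count: one per cycle
            else
                total=$((total + v))    # add the cycle's event count
            fi
        fi
    done
    echo "$total"
}

# stall_slot/threshold=2,threshold_compare=2/ over cycles seeing 0,1,2,3,5
th_count 2 2 0 0 1 2 3 5     # prints 10 (2 + 3 + 5)
# dtlb_walk/threshold=10,threshold_compare=3,threshold_count/ over 0,1,12,3
th_count 10 3 1 0 1 12 3     # prints 3 (three cycles below 10)
```

This model has one consequence that should be checked against the Arm ARM. A cycle with zero events satisfies Less-than, and also satisfies Not-equal for a non-zero threshold. Without threshold_count such a cycle adds 0, but with threshold_count it adds 1.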
On 23/11/2023 15:45, James Clark wrote:
>
>
> On 23/11/2023 05:50, Anshuman Khandual wrote:
>>
>>
>> On 11/21/23 03:01, Namhyung Kim wrote:
>>> On Mon, Nov 13, 2023 at 3:26 AM James Clark <james.clark@arm.com> wrote:
[...]
>>
>> Hence without 'threshold_count' being set, the other two config requests
>> will not have an effect, is that correct ?
>
> Yeah I can mention this. It's implied because 0 is the default value of
> config fields, and 0 is a valid value for compare and count field, so
> threshold=0 has to be the way to disable it. But I can mention it
> explicitly.
>

To avoid any confusion, I thought you meant threshold here instead of
threshold_count. But I replied in more detail about the same issue on
patch 2.
diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst
index 1f87b57c2332..36b8111a710d 100644
--- a/Documentation/arch/arm64/perf.rst
+++ b/Documentation/arch/arm64/perf.rst
@@ -164,3 +164,59 @@ and should be used to mask the upper bits as needed.
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
 .. _tools/lib/perf/tests/test-evsel.c:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
+
+Event Counting Threshold
+==========================================
+
+Overview
+--------
+
+FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
+events whose count meets a specified threshold condition. For example, if
+threshold_compare is set to 2 ('Greater-than-or-equal') and the
+threshold is set to 2, then the PMU counter will only increment
+when an event would previously have incremented the PMU counter by 2 or
+more on a single processor cycle.
+
+To increment by 1 after passing the threshold condition instead of by the
+number of events on that cycle, add the 'threshold_count' option to the
+command line.
+
+How-to
+------
+
+The threshold, threshold_compare and threshold_count values can be
+provided per event:
+
+.. code-block:: sh
+
+   perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
+        -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
+
+The following values are supported for threshold_compare:
+
+.. code-block::
+
+   0: Not-equal
+   1: Equals
+   2: Greater-than-or-equal
+   3: Less-than
+
+The maximum supported threshold value can be read from the caps of each
+PMU, for example:
+
+.. code-block:: sh
+
+   cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
+
+   0x000000ff
+
+If a value higher than this is given, then it will be silently clamped
+to the maximum. The highest possible maximum is 4095, as the config
+field for threshold is limited to 12 bits, and the Perf tool will refuse
+to parse higher values.
+
+If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
+0, and both threshold and threshold_compare will be silently ignored.
+threshold_max will also read as 0 on aarch32 guests, even if the host
+is running on hardware with the feature.
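The clamping rules in the patch can be condensed into a small shell helper that predicts which threshold actually takes effect. `effective_threshold` is a hypothetical name used only for this sketch. It takes threshold_max as read from sysfs, such as the 0x000000ff (255) output in the patch. It only mirrors the documented behaviour:

- Values above 4095 do not fit the 12-bit field and are refused.
- Values above threshold_max are clamped to threshold_max.
- A threshold_max of 0 means the feature is unavailable and the threshold is ignored.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): predict the threshold that
# takes effect for a requested value, per the documented rules.
# Usage: effective_threshold REQUESTED THRESHOLD_MAX
effective_threshold() {
    requested=$1
    max=$(($2))    # arithmetic expansion also accepts hex, e.g. 0x000000ff
    if [ "$requested" -gt 4095 ]; then
        # The threshold config field is 12 bits wide, so the Perf tool
        # refuses to parse anything larger.
        echo "threshold $requested does not fit in 12 bits" >&2
        return 1
    fi
    if [ "$max" -eq 0 ]; then
        # No FEAT_PMUv3_TH (or an aarch32 guest): threshold is ignored.
        echo 0
    elif [ "$requested" -gt "$max" ]; then
        # Larger values are silently clamped to threshold_max.
        echo "$max"
    else
        echo "$requested"
    fi
}

# Example path taken from the patch; the PMU directory name may differ
# on a given system.
caps=/sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
if [ -r "$caps" ]; then
    effective_threshold 300 "$(cat "$caps")"
fi
```

For instance, `effective_threshold 300 0x000000ff` prints 255.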
Add documentation for the new Perf event open parameters and
the threshold_max capability file.

Signed-off-by: James Clark <james.clark@arm.com>
---
 Documentation/arch/arm64/perf.rst | 56 +++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)