[v12,8/8] arm64: docs: document perf event attributes

Message ID 20190328103731.27264-9-andrew.murray@arm.com (mailing list archive)
State New, archived
Series arm64: Support perf event modifiers :G and :H

Commit Message

Andrew Murray March 28, 2019, 10:37 a.m. UTC
The interaction between the exclude_{host,guest} flags,
exclude_{user,kernel,hv} flags and presence of VHE can result in
different exception levels being filtered by the ARMv8 PMU. As this
can be confusing let's document how they work on arm64.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 Documentation/arm64/perf.txt | 74 ++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 Documentation/arm64/perf.txt

Comments

Will Deacon April 4, 2019, 4:21 p.m. UTC | #1
On Thu, Mar 28, 2019 at 10:37:31AM +0000, Andrew Murray wrote:
> The interaction between the exclude_{host,guest} flags,
> exclude_{user,kernel,hv} flags and presence of VHE can result in
> different exception levels being filtered by the ARMv8 PMU. As this
> can be confusing let's document how they work on arm64.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  Documentation/arm64/perf.txt | 74 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>  create mode 100644 Documentation/arm64/perf.txt
> 
> diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
> new file mode 100644
> index 000000000000..604446c1f720
> --- /dev/null
> +++ b/Documentation/arm64/perf.txt
> @@ -0,0 +1,74 @@
> +Perf Event Attributes
> +=====================
> +
> +Author: Andrew Murray <andrew.murray@arm.com>
> +Date: 2019-03-06
> +
> +exclude_user
> +------------
> +
> +This attribute excludes userspace.
> +
> +Userspace always runs at EL0 and thus this attribute will exclude EL0.
> +
> +
> +exclude_kernel
> +--------------
> +
> +This attribute excludes the kernel.
> +
> +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> +at EL1.
> +
> +This attribute will exclude EL1 and additionally EL2 on a VHE system.

I find this last sentence a bit confusing, because it can be read to imply
that if you don't set exclude_kernel and you're in a guest on a VHE system,
then you can profile EL2.

> +exclude_hv
> +----------
> +
> +This attribute excludes the hypervisor, we ignore this flag on a VHE system
> +as we consider the host kernel to be the hypervisor.

Similar comment as the above: I don't think this makes sense when you look
at things from the guest perspective.

> +On a non-VHE system we consider the hypervisor to be any code that runs at
> +EL2 which is predominantly used for guest/host transitions.
> +
> +This attribute will exclude EL2 on a non-VHE system.
> +
> +
> +exclude_host / exclude_guest
> +----------------------------
> +
> +This attribute excludes the KVM host.

But there are two attributes...

Will
Andrew Murray April 4, 2019, 7:33 p.m. UTC | #2
On Thu, Apr 04, 2019 at 05:21:28PM +0100, Will Deacon wrote:
> On Thu, Mar 28, 2019 at 10:37:31AM +0000, Andrew Murray wrote:
> > The interaction between the exclude_{host,guest} flags,
> > exclude_{user,kernel,hv} flags and presence of VHE can result in
> > different exception levels being filtered by the ARMv8 PMU. As this
> > can be confusing let's document how they work on arm64.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  Documentation/arm64/perf.txt | 74 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 74 insertions(+)
> >  create mode 100644 Documentation/arm64/perf.txt
> > 
> > diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
> > new file mode 100644
> > index 000000000000..604446c1f720
> > --- /dev/null
> > +++ b/Documentation/arm64/perf.txt
> > @@ -0,0 +1,74 @@
> > +Perf Event Attributes
> > +=====================
> > +
> > +Author: Andrew Murray <andrew.murray@arm.com>
> > +Date: 2019-03-06
> > +
> > +exclude_user
> > +------------
> > +
> > +This attribute excludes userspace.
> > +
> > +Userspace always runs at EL0 and thus this attribute will exclude EL0.
> > +
> > +
> > +exclude_kernel
> > +--------------
> > +
> > +This attribute excludes the kernel.
> > +
> > +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> > +at EL1.
> > +
> > +This attribute will exclude EL1 and additionally EL2 on a VHE system.
> 
> I find this last sentence a bit confusing, because it can be read to imply
> that if you don't set exclude_kernel and you're in a guest on a VHE system,
> then you can profile EL2.

Yes this could be misleading.

However from the perspective of the guest, when exclude_kernel is not set we
do indeed allow the guest to program its PMU with ARMV8_PMU_INCLUDE_EL2 - and
thus the statement above is correct in terms of what the kernel believes it is
doing.

I think these statements are less confusing if we treat the exception levels
as those 'detected' by the running context (e.g. consider the impact of nested
virt here) - and if we ignore what the hypervisor (KVM) does outside (e.g.
stopping counting upon switching between guest/host, translating PMU filters in
kvm_pmu_set_counter_event_type, etc.). This then makes this document useful
for those wishing to change this logic (which is the intent) rather than those
trying to understand how we filter for EL levels as seen bare-metal.

With regards to the example you gave (exclude_kernel, EL2) - yes we want the
kernel to believe it can count EL2 - because one day we may want to update
KVM to allow the guest to count its hypervisor overhead (e.g. host kernel
time associated with the guest).

I could write some preface that describes this outlook. Alternatively I could
just spell out what happens on a guest, e.g.

"For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.

For the guest this attribute will exclude EL1."

Though I'm less comfortable with this, as the last statement "For the guest this
attribute will exclude EL1." describes the product of both
kvm_pmu_set_counter_event_type and armv8pmu_set_event_filter which is confusing
to work out and also makes an assumption that we don't have nested virt (true
for now at least) and also reasons about bare-metal EL levels which probably
aren't that useful for someone changing this logic or understanding what the
flags do for their performance analysis.

Do you have a preference for how this is improved?

> 
> > +exclude_hv
> > +----------
> > +
> > +This attribute excludes the hypervisor, we ignore this flag on a VHE system
> > +as we consider the host kernel to be the hypervisor.
> 
> Similar comment as the above: I don't think this makes sense when you look
> at things from the guest perspective.
> 
> > +On a non-VHE system we consider the hypervisor to be any code that runs at
> > +EL2 which is predominantly used for guest/host transitions.
> > +
> > +This attribute will exclude EL2 on a non-VHE system.
> > +
> > +
> > +exclude_host / exclude_guest
> > +----------------------------
> > +
> > +This attribute excludes the KVM host.
> 
> But there are two attributes...

Oh, I'll have to update this.

Thanks for the review,

Andrew Murray

> 
> Will
Will Deacon April 5, 2019, 12:43 p.m. UTC | #3
On Thu, Apr 04, 2019 at 08:33:51PM +0100, Andrew Murray wrote:
> On Thu, Apr 04, 2019 at 05:21:28PM +0100, Will Deacon wrote:
> > On Thu, Mar 28, 2019 at 10:37:31AM +0000, Andrew Murray wrote:
> > > +exclude_kernel
> > > +--------------
> > > +
> > > +This attribute excludes the kernel.
> > > +
> > > +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> > > +at EL1.
> > > +
> > > +This attribute will exclude EL1 and additionally EL2 on a VHE system.
> > 
> > I find this last sentence a bit confusing, because it can be read to imply
> > that if you don't set exclude_kernel and you're in a guest on a VHE system,
> > then you can profile EL2.
> 
> Yes this could be misleading.
> 
> However from the perspective of the guest, when exclude_kernel is not set we
> do indeed allow the guest to program its PMU with ARMV8_PMU_INCLUDE_EL2 - and
> thus the statement above is correct in terms of what the kernel believes it is
> doing.
> 
> I think these statements are less confusing if we treat the exception levels
> as those 'detected' by the running context (e.g. consider the impact of nested
> virt here) - and if we ignore what the hypervisor (KVM) does outside (e.g.
> stopping counting upon switching between guest/host, translating PMU filters in
> kvm_pmu_set_counter_event_type, etc.). This then makes this document useful
> for those wishing to change this logic (which is the intent) rather than those
> trying to understand how we filter for EL levels as seen bare-metal.
> 
> With regards to the example you gave (exclude_kernel, EL2) - yes we want the
> kernel to believe it can count EL2 - because one day we may want to update
> KVM to allow the guest to count its hypervisor overhead (e.g. host kernel
> time associated with the guest).

If we were to support this in the future, then exclude_hv will suddenly
start meaning something in a guest, so this could be considered to be an ABI
break.

> I could write some preface that describes this outlook. Alternatively I could
> just spell out what happens on a guest, e.g.
> 
> "For the host this attribute will exclude EL1 and additionally EL2 on a VHE
> system.
> 
> For the guest this attribute will exclude EL1."
> 
> Though I'm less comfortable with this, as the last statement "For the guest this
> attribute will exclude EL1." describes the product of both
> kvm_pmu_set_counter_event_type and armv8pmu_set_event_filter which is confusing
> to work out and also makes an assumption that we don't have nested virt (true
> for now at least) and also reasons about bare-metal EL levels which probably
> aren't that useful for someone changing this logic or understanding what the
> flags do for their performance analysis.
> 
> Do you have a preference for how this is improved?

I think you should be explicit about what is counted. If we don't count EL2
when profiling in a guest (regardless of the exclude_* flags), then we
should say that. By not documenting this we don't actually buy ourselves
room to change things in future, we just have an emergent behaviour which
isn't covered by our docs.

Will
Andrew Murray April 9, 2019, 11 a.m. UTC | #4
On Fri, Apr 05, 2019 at 01:43:08PM +0100, Will Deacon wrote:
> On Thu, Apr 04, 2019 at 08:33:51PM +0100, Andrew Murray wrote:
> > On Thu, Apr 04, 2019 at 05:21:28PM +0100, Will Deacon wrote:
> > > On Thu, Mar 28, 2019 at 10:37:31AM +0000, Andrew Murray wrote:
> > > > +exclude_kernel
> > > > +--------------
> > > > +
> > > > +This attribute excludes the kernel.
> > > > +
> > > > +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> > > > +at EL1.
> > > > +
> > > > +This attribute will exclude EL1 and additionally EL2 on a VHE system.
> > > 
> > > I find this last sentence a bit confusing, because it can be read to imply
> > > that if you don't set exclude_kernel and you're in a guest on a VHE system,
> > > then you can profile EL2.
> > 
> > Yes this could be misleading.
> > 
> > However from the perspective of the guest, when exclude_kernel is not set we
> > do indeed allow the guest to program its PMU with ARMV8_PMU_INCLUDE_EL2 - and
> > thus the statement above is correct in terms of what the kernel believes it is
> > doing.
> > 
> > I think these statements are less confusing if we treat the exception levels
> > as those 'detected' by the running context (e.g. consider the impact of nested
> > virt here) - and if we ignore what the hypervisor (KVM) does outside (e.g.
> > stopping counting upon switching between guest/host, translating PMU filters in
> > kvm_pmu_set_counter_event_type, etc.). This then makes this document useful
> > for those wishing to change this logic (which is the intent) rather than those
> > trying to understand how we filter for EL levels as seen bare-metal.
> > 
> > With regards to the example you gave (exclude_kernel, EL2) - yes we want the
> > kernel to believe it can count EL2 - because one day we may want to update
> > KVM to allow the guest to count its hypervisor overhead (e.g. host kernel
> > time associated with the guest).
> 
> If we were to support this in the future, then exclude_hv will suddenly
> start meaning something in a guest, so this could be considered to be an ABI
> break.
> 
> > I could write some preface that describes this outlook. Alternatively I could
> > just spell out what happens on a guest, e.g.
> > 
> > "For the host this attribute will exclude EL1 and additionally EL2 on a VHE
> > system.
> > 
> > For the guest this attribute will exclude EL1."
> > 
> > Though I'm less comfortable with this, as the last statement "For the guest this
> > attribute will exclude EL1." describes the product of both
> > kvm_pmu_set_counter_event_type and armv8pmu_set_event_filter which is confusing
> > to work out and also makes an assumption that we don't have nested virt (true
> > for now at least) and also reasons about bare-metal EL levels which probably
> > aren't that useful for someone changing this logic or understanding what the
> > flags do for their performance analysis.
> > 
> > Do you have a preference for how this is improved?
> 
> I think you should be explicit about what is counted. If we don't count EL2
> when profiling in a guest (regardless of the exclude_* flags), then we
> should say that. By not documenting this we don't actually buy ourselves
> room to change things in future, we just have an emergent behaviour which
> isn't covered by our docs.

OK no problem, I'll update this.

Andrew Murray

> 
> Will
Patch

diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
new file mode 100644
index 000000000000..604446c1f720
--- /dev/null
+++ b/Documentation/arm64/perf.txt
@@ -0,0 +1,74 @@ 
+Perf Event Attributes
+=====================
+
+Author: Andrew Murray <andrew.murray@arm.com>
+Date: 2019-03-06
+
+exclude_user
+------------
+
+This attribute excludes userspace.
+
+Userspace always runs at EL0 and thus this attribute will exclude EL0.
+
+
+exclude_kernel
+--------------
+
+This attribute excludes the kernel.
+
+The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
+at EL1.
+
+This attribute will exclude EL1 and additionally EL2 on a VHE system.
+
+
+exclude_hv
+----------
+
+This attribute excludes the hypervisor, we ignore this flag on a VHE system
+as we consider the host kernel to be the hypervisor.
+
+On a non-VHE system we consider the hypervisor to be any code that runs at
+EL2 which is predominantly used for guest/host transitions.
+
+This attribute will exclude EL2 on a non-VHE system.
+
+
+exclude_host / exclude_guest
+----------------------------
+
+This attribute excludes the KVM host.
+
+The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
+kernel or non-VHE hypervisor).
+
+The KVM guest may run at EL0 (userspace) and EL1 (kernel).
+
+Due to the overlapping exception levels between host and guests we cannot
+exclusively rely on the PMU's hardware exception filtering - therefore we
+must enable/disable counting on the entry and exit to the guest. This is
+performed differently on VHE and non-VHE systems.
+
+For non-VHE systems we exclude EL2 for exclude_host - upon entering and
+exiting the guest we disable/enable the event as appropriate based on the
+exclude_host and exclude_guest attributes.
+
+For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2
+for exclude_host. Upon entering and exiting the guest we modify the event
+to include/exclude EL0 as appropriate based on the exclude_host and
+exclude_guest attributes.
+
+
+Accuracy
+--------
+
+On non-VHE systems we enable/disable counters on the entry/exit of
+host/guest transition at EL2 - however there is a period of time between
+enabling/disabling the counters and entering/exiting the guest. We are
+able to eliminate counters counting host events on the boundaries of guest
+entry/exit when counting guest events by filtering out EL2 for
+exclude_host. However when using !exclude_hv there is a small blackout
+window at the guest entry/exit where host events are not captured.
+
+On VHE systems there are no blackout windows.
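
As an illustration of the host-side rules described in the perf.txt text
above, here is a minimal sketch in C. It is not the kernel's implementation.
The EL0/EL1/EL2 bit values, struct excl_flags and excluded_els() are
hypothetical names invented for this example. It models only the static
exception-level filter as worded in the patch, not the enabling/disabling
performed on guest entry and exit, and not any additional filtering that KVM
applies to a guest's PMU (the point raised in the review comments).

```c
#include <stdbool.h>

/* Hypothetical bit flags naming the exception levels; not kernel constants. */
enum { EL0 = 1 << 0, EL1 = 1 << 1, EL2 = 1 << 2 };

/* Hypothetical mirror of the exclude_* bits of struct perf_event_attr. */
struct excl_flags {
	bool user;
	bool kernel;
	bool hv;
	bool host;
	bool guest;
};

/*
 * Return the set of exception levels that the documentation text says
 * are excluded for an event opened on the host.
 */
unsigned int excluded_els(struct excl_flags f, bool vhe)
{
	unsigned int mask = 0;

	/* exclude_user: userspace always runs at EL0. */
	if (f.user)
		mask |= EL0;

	/* exclude_kernel: EL1, and additionally EL2 on a VHE system. */
	if (f.kernel)
		mask |= vhe ? (EL1 | EL2) : EL1;

	/* exclude_hv: EL2 on a non-VHE system, ignored on a VHE system. */
	if (f.hv && !vhe)
		mask |= EL2;

	if (vhe) {
		/* VHE: exclude_guest drops EL1, exclude_host drops EL0 and EL2. */
		if (f.guest)
			mask |= EL1;
		if (f.host)
			mask |= EL0 | EL2;
	} else {
		/* non-VHE: exclude_host drops EL2; the rest is handled at guest entry/exit. */
		if (f.host)
			mask |= EL2;
	}

	return mask;
}
```

For example, excluded_els((struct excl_flags){ .kernel = true }, true) yields
EL1 | EL2, while the same flags with vhe set to false yield only EL1.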