diff mbox series

[v1,2/5] KVM: arm/arm64: Adjust entry/exit and trap related tracepoints

Message ID 1560330526-15468-3-git-send-email-yuzenghui@huawei.com (mailing list archive)
State New, archived
Headers show
Series perf kvm: Add stat support on arm64 | expand

Commit Message

Zenghui Yu June 12, 2019, 9:08 a.m. UTC
Currently, we use trace_kvm_exit() to report exception type (e.g.,
"IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together.
But hardware only saves the exit class to ESR_ELx on synchronous
exceptions, not on asynchronous exceptions. When the guest exits
due to external interrupts, we will get tracing output like:

	"kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"

Obviously, "HSR_EC" here is meaningless.

This patch splits "exit" and "trap" events by adding two tracepoints
explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM
exit events, and trace_kvm_trap_exit() report VM trap events.

These tracepoints are adjusted also in preparation for supporting
'perf kvm stat' on arm64.

Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
---
 arch/arm64/kvm/handle_exit.c |  3 +++
 arch/arm64/kvm/trace.h       | 35 +++++++++++++++++++++++++++++++++++
 virt/kvm/arm/arm.c           |  4 ++--
 virt/kvm/arm/trace.h         | 21 +++++++++++----------
 4 files changed, 51 insertions(+), 12 deletions(-)

Comments

James Morse June 12, 2019, 12:49 p.m. UTC | #1
Hi,

On 12/06/2019 10:08, Zenghui Yu wrote:
> Currently, we use trace_kvm_exit() to report exception type (e.g.,
> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together.

(They both caused an exit!)


> But hardware only saves the exit class to ESR_ELx on synchronous

EC is the 'Exception Class'. Exit is KVM/Linux's terminology.


> exceptions, not on asynchronous exceptions. When the guest exits
> due to external interrupts, we will get tracing output like:
> 
> 	"kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"
> 
> Obviously, "HSR_EC" here is meaningless.

I assume we do it this way so there is only one guest-exit tracepoint that catches all exits.
I don't think its a problem if user-space has to know the EC isn't set for asynchronous
exceptions, this is a property of the architecture and anything using these trace-points
is already arch specific.


> This patch splits "exit" and "trap" events by adding two tracepoints
> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM
> exit events, and trace_kvm_trap_exit() report VM trap events.
> 
> These tracepoints are adjusted also in preparation for supporting
> 'perf kvm stat' on arm64.

Because the existing tracepoints are ABI, I don't think we can change them.

We can add new ones if there is something that a user reasonably needs to trace, and can't
be done any other way.

What can't 'perf kvm stat' do with the existing trace points?


> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 516aead..af3c732 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -264,7 +264,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		exit_handle_fn exit_handler;
>  
>  		exit_handler = kvm_get_exit_handler(vcpu);
> +		trace_kvm_trap_enter(vcpu->vcpu_id,
> +				     kvm_vcpu_trap_get_class(vcpu));
>  		handled = exit_handler(vcpu, run);
> +		trace_kvm_trap_exit(vcpu->vcpu_id);
>  	}

Why are there two? Are you using this to benchmark the exit_handler()?

As we can't remove the EC from the exit event, I don't think this tells us anything new.


> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 90cedeb..9f63fd9 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -758,7 +758,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		/**************************************************************
>  		 * Enter the guest
>  		 */
> -		trace_kvm_entry(*vcpu_pc(vcpu));
> +		trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu));

Why do you need the PC? It was exported on exit.
(its mostly junk for user-space anyway, you can't infer anything from it)


Thanks,

James
Zenghui Yu June 13, 2019, 11:28 a.m. UTC | #2
Hi James,

On 2019/6/12 20:49, James Morse wrote:
> Hi,
> 
> On 12/06/2019 10:08, Zenghui Yu wrote:
>> Currently, we use trace_kvm_exit() to report exception type (e.g.,
>> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together.
> 
> (They both caused an exit!)
> 
> 
>> But hardware only saves the exit class to ESR_ELx on synchronous
> 
> EC is the 'Exception Class'. Exit is KVM/Linux's terminology.
Yes, a stupid mistake ;-)

>> exceptions, not on asynchronous exceptions. When the guest exits
>> due to external interrupts, we will get tracing output like:
>>
>> 	"kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"
>>
>> Obviously, "HSR_EC" here is meaningless.
> 
> I assume we do it this way so there is only one guest-exit tracepoint that catches all exits.
> I don't think its a problem if user-space has to know the EC isn't set for asynchronous
> exceptions, this is a property of the architecture and anything using these trace-points
> is already arch specific.
Actually, *no* problem in current implementation, and I'm OK to still
keep the EC in trace_kvm_exit().  What I really want to do is adding the
EC in trace_trap_enter (the new tracepoint), will explain it later.

>> This patch splits "exit" and "trap" events by adding two tracepoints
>> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM
>> exit events, and trace_kvm_trap_exit() report VM trap events.
>>
>> These tracepoints are adjusted also in preparation for supporting
>> 'perf kvm stat' on arm64.
> 
> Because the existing tracepoints are ABI, I don't think we can change them.
> 
> We can add new ones if there is something that a user reasonably needs to trace, and can't
> be done any other way.
> 
> What can't 'perf kvm stat' do with the existing trace points?
(A good question! I should have made it clear in the commit message,
  forgive me.)

First, how does 'perf kvm stat' interact with tracepoints?

We have three handlers for a specific event (e.g., "VM-EXIT") --
"is_begin_event", "is_end_event", "decode_key". The first two handlers
make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry")
to check when the VM-EXIT events started/ended, thus the time difference
stats, event start/end time etc. can be calculated.
"is_begin_event" handler gets a *key* from the "ret" field (exit_code)
of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the
*key* to find out the reason for the VM-EXIT event. Of course we should
maintain the mapping between exit_code and exit_reason in userspace.
These are all what *patch #4* had done, #4 is a simple patch to review!
Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but
currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without
something like "vcpu_id".

perf people must have a much deeper understanding of this.


OK, next comes the more important question - what should/can we do to
the tracepoints in preparation of 'perf kvm stat' on arm64?

 From the article you've provided, it's clear that we can't remove the EC
from trace_kvm_exit(). But can we add something like "vcpu_id" into
(at least) trace_kvm_entry(), just like what this patch has done?
If not, which means we have to keep the existing tracepoints totally
unchanged, then 'perf kvm stat' will have no way to record/report per
vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this
capability, if I understand it correctly).

As for TRAP events, should we consider adding two new tracepoints --
"kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap
handling process? We should also record the EC in "kvm_trap_enter", 
which will be used as *key* in TRAP event's "is_begin_event" handler.
Patch #5 tells us the whole story, it's simple too.

What do you suggest?

>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 516aead..af3c732 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -264,7 +264,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>   		exit_handle_fn exit_handler;
>>   
>>   		exit_handler = kvm_get_exit_handler(vcpu);
>> +		trace_kvm_trap_enter(vcpu->vcpu_id,
>> +				     kvm_vcpu_trap_get_class(vcpu));
>>   		handled = exit_handler(vcpu, run);
>> +		trace_kvm_trap_exit(vcpu->vcpu_id);
>>   	}
> 
> Why are there two? Are you using this to benchmark the exit_handler()?
Almostly yes. Let perf know when the TRAP handling event start/end,
and ...

> As we can't remove the EC from the exit event, I don't think this tells us anything new.
As explained above, this EC is for 'perf kvm stat'.

>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 90cedeb..9f63fd9 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -758,7 +758,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>   		/**************************************************************
>>   		 * Enter the guest
>>   		 */
>> -		trace_kvm_entry(*vcpu_pc(vcpu));
>> +		trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu));
> 
> Why do you need the PC? It was exported on exit.
> (its mostly junk for user-space anyway, you can't infer anything from it)
(I mainly wanted to add the "vcpu->vcpu_id" here.)
It seems that we can't just remove the PC, which will cause ABI change?


Thanks for your reviewing!

zenghui


.
James Morse June 17, 2019, 11:19 a.m. UTC | #3
Hi Zenghui,

On 13/06/2019 12:28, Zenghui Yu wrote:
> On 2019/6/12 20:49, James Morse wrote:
>> On 12/06/2019 10:08, Zenghui Yu wrote:
>>> Currently, we use trace_kvm_exit() to report exception type (e.g.,
>>> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together.
>>> But hardware only saves the exit class to ESR_ELx on synchronous
>>> exceptions, not on asynchronous exceptions. When the guest exits
>>> due to external interrupts, we will get tracing output like:
>>>
>>>     "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"
>>>
>>> Obviously, "HSR_EC" here is meaningless.

>> I assume we do it this way so there is only one guest-exit tracepoint that catches all
>> exits.
>> I don't think its a problem if user-space has to know the EC isn't set for asynchronous
>> exceptions, this is a property of the architecture and anything using these trace-points
>> is already arch specific.

> Actually, *no* problem in current implementation, and I'm OK to still
> keep the EC in trace_kvm_exit().  What I really want to do is adding the
> EC in trace_trap_enter (the new tracepoint), will explain it later.


>>> This patch splits "exit" and "trap" events by adding two tracepoints
>>> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM
>>> exit events, and trace_kvm_trap_exit() report VM trap events.
>>>
>>> These tracepoints are adjusted also in preparation for supporting
>>> 'perf kvm stat' on arm64.
>>
>> Because the existing tracepoints are ABI, I don't think we can change them.
>>
>> We can add new ones if there is something that a user reasonably needs to trace, and can't
>> be done any other way.
>>
>> What can't 'perf kvm stat' do with the existing trace points?

> First, how does 'perf kvm stat' interact with tracepoints?

Start at the beginning, good idea. (I've never used this thing!)


> We have three handlers for a specific event (e.g., "VM-EXIT") --
> "is_begin_event", "is_end_event", "decode_key". The first two handlers
> make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry")
> to check when the VM-EXIT events started/ended, thus the time difference
> stats, event start/end time etc. can be calculated.

> "is_begin_event" handler gets a *key* from the "ret" field (exit_code)
> of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the
> *key* to find out the reason for the VM-EXIT event. Of course we should
> maintain the mapping between exit_code and exit_reason in userspace.

Interpreting 'ret' is going to get tricky if we change those values on a whim. Its
internal to the KVM arch code.


> These are all what *patch #4* had done, #4 is a simple patch to review!

> Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but
> currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without
> something like "vcpu_id".

Heh, so from the trace-point data, you can't know which on is vcpu-0 and which is vcpu-1.


> OK, next comes the more important question - what should/can we do to
> the tracepoints in preparation of 'perf kvm stat' on arm64?
> 
> From the article you've provided, it's clear that we can't remove the EC
> from trace_kvm_exit(). But can we add something like "vcpu_id" into
> (at least) trace_kvm_entry(), just like what this patch has done?

Adding something is still likely to break a badly written user-space that is trying to
parse the trace information. A regex picking out the last argument will now get a
different value.


> If not, which means we have to keep the existing tracepoints totally
> unchanged, then 'perf kvm stat' will have no way to record/report per
> vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this
> capability, if I understand it correctly).

Well, you get the events, but you don't know which vCPU is which. You can map this back to
the pid of the host thread assuming user-space isn't moving vcpu between host threads.

If we're really stuck: Adding tracepoints to KVM-core's vcpu get/put, that export the
vcpu_id would let you map pid->vcpu_id, which you can then use for the batch of enter/exit
events that come before a final vcpu put.
grepping "vpu_id" shows perf has a mapping for which arch-specific argument in enter/exit
is the vcpu-id. Done with this core-code mapping, you could drop that code...

But I'd be a little nervous adding a new trace-point to work around an ABI problem, as we
may have just moved the ABI problem! (What does a user of a vcpu_put tracepoint really need?)


> As for TRAP events, should we consider adding two new tracepoints --
> "kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap
> handling process? We should also record the EC in "kvm_trap_enter", which will be used as
> *key* in TRAP event's "is_begin_event" handler.

The EC can't change between trace_kvm_exit() and handle_exit(), so you already have this.

What are the 'trap' trace points needed for? You get the timing and 'exception class' from
the guest enter/exit tracepoints. What about handle_exit() can't you work out from this?


> Patch #5 tells us the whole story, it's simple too.

(I only skimmed the perf patches, I'll go back now that I know a little more about what
you're doing)


> What do you suggest?

We can explore the vcpu_load()/vcpu_put() trace idea, (it may not work for some other
reason). I'd like to understand what the 'trap' tracepoints are needed for.


Thanks,

James
Zenghui Yu June 21, 2019, 1:25 p.m. UTC | #4
Hi James,

sorry for the late reply.

On 2019/6/17 19:19, James Morse wrote:
> Hi Zenghui,
> 
> On 13/06/2019 12:28, Zenghui Yu wrote:
>> On 2019/6/12 20:49, James Morse wrote:
>>> On 12/06/2019 10:08, Zenghui Yu wrote:
>>>> Currently, we use trace_kvm_exit() to report exception type (e.g.,
>>>> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together.
>>>> But hardware only saves the exit class to ESR_ELx on synchronous
>>>> exceptions, not on asynchronous exceptions. When the guest exits
>>>> due to external interrupts, we will get tracing output like:
>>>>
>>>>      "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"
>>>>
>>>> Obviously, "HSR_EC" here is meaningless.
> 
>>> I assume we do it this way so there is only one guest-exit tracepoint that catches all
>>> exits.
>>> I don't think its a problem if user-space has to know the EC isn't set for asynchronous
>>> exceptions, this is a property of the architecture and anything using these trace-points
>>> is already arch specific.
> 
>> Actually, *no* problem in current implementation, and I'm OK to still
>> keep the EC in trace_kvm_exit().  What I really want to do is adding the
>> EC in trace_trap_enter (the new tracepoint), will explain it later.
> 
> 
>>>> This patch splits "exit" and "trap" events by adding two tracepoints
>>>> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM
>>>> exit events, and trace_kvm_trap_exit() report VM trap events.
>>>>
>>>> These tracepoints are adjusted also in preparation for supporting
>>>> 'perf kvm stat' on arm64.
>>>
>>> Because the existing tracepoints are ABI, I don't think we can change them.
>>>
>>> We can add new ones if there is something that a user reasonably needs to trace, and can't
>>> be done any other way.
>>>
>>> What can't 'perf kvm stat' do with the existing trace points?
> 
>> First, how does 'perf kvm stat' interact with tracepoints?
> 
> Start at the beginning, good idea. (I've never used this thing!)
> 
> 
>> We have three handlers for a specific event (e.g., "VM-EXIT") --
>> "is_begin_event", "is_end_event", "decode_key". The first two handlers
>> make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry")
>> to check when the VM-EXIT events started/ended, thus the time difference
>> stats, event start/end time etc. can be calculated.
> 
>> "is_begin_event" handler gets a *key* from the "ret" field (exit_code)
>> of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the
>> *key* to find out the reason for the VM-EXIT event. Of course we should
>> maintain the mapping between exit_code and exit_reason in userspace.
> 
> Interpreting 'ret' is going to get tricky if we change those values on a whim. Its
> internal to the KVM arch code.

Yes. We have to keep the definition of 'ret' and the mapping
(kvm_arm_exception_type) consistent between user side and kernel side.
Specifically,
kernel side: arch/arm64/include/asm/kvm_asm.h
user side:   tools/perf/arch/arm64/util/aarch64_guest_exits.h (patch #4)

Do we have a better approach?

>> These are all what *patch #4* had done, #4 is a simple patch to review!
> 
>> Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but
>> currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without
>> something like "vcpu_id".
> 
> Heh, so from the trace-point data, you can't know which on is vcpu-0 and which is vcpu-1.

Yes!

>> OK, next comes the more important question - what should/can we do to
>> the tracepoints in preparation of 'perf kvm stat' on arm64?
>>
>>  From the article you've provided, it's clear that we can't remove the EC
>> from trace_kvm_exit(). But can we add something like "vcpu_id" into
>> (at least) trace_kvm_entry(), just like what this patch has done?
> 
> Adding something is still likely to break a badly written user-space that is trying to
> parse the trace information. A regex picking out the last argument will now get a
> different value.
> 
> 
>> If not, which means we have to keep the existing tracepoints totally
>> unchanged, then 'perf kvm stat' will have no way to record/report per
>> vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this
>> capability, if I understand it correctly).
> 
> Well, you get the events, but you don't know which vCPU is which. You can map this back to
> the pid of the host thread assuming user-space isn't moving vcpu between host threads.
> 
> If we're really stuck: Adding tracepoints to KVM-core's vcpu get/put, that export the
> vcpu_id would let you map pid->vcpu_id, which you can then use for the batch of enter/exit
> events that come before a final vcpu put.
> grepping "vpu_id" shows perf has a mapping for which arch-specific argument in enter/exit
> is the vcpu-id. Done with this core-code mapping, you could drop that code...

Yes. In the current 'perf kvm' implementation, we make use of the
"vcpu id" field (specified by vcpu_id_str, differ between arch) of
"kvm:kvm_entry" tracepoint payload, to achieve per vcpu record/report.
(Details in per_vcpu_record() in tools/perf/builtin-kvm.c)

Adding tracepoints in vcpu_load()/vcpu_put() might be a good idea, we
can get "vcpu id" information without breaking existing tracepoint ABI.
And I think other arch will benefit from this way, for those who haven't
supported 'perf kvm stat' yet and don't have "vcpu id" info in the
"kvm:kvm_entry" tracepoint.

But there is at least one problem (as I can see)... KVM will not always
do a vcpu_load()/vcpu_put(), while guest always enters/exits.
In the worst case, if KVM can always handle guest's exits, perf will not
have a chance to catch vcpu_load()/vcpu_put(). And still, we fail to get
per vcpu statistical analysis.

I've written a simple patch to prove this, attached at the end of this
mail. I run 'perf kvm stat record/report' against a 4-VCPU guest, but
only vcpu-3 will be reported, as only vcpu-3 is doing some MMIO access
and we have to return to user space (and vcpu_load() will be invoked).

I'm not sure if I (likely) missed some important points you've provided.

> But I'd be a little nervous adding a new trace-point to work around an ABI problem, as we
> may have just moved the ABI problem! (What does a user of a vcpu_put tracepoint really need?)
> 
> 
>> As for TRAP events, should we consider adding two new tracepoints --
>> "kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap
>> handling process? We should also record the EC in "kvm_trap_enter", which will be used as
>> *key* in TRAP event's "is_begin_event" handler.
> 
> The EC can't change between trace_kvm_exit() and handle_exit(), so you already have this.
> 
> What are the 'trap' trace points needed for? You get the timing and 'exception class' from
> the guest enter/exit tracepoints. What about handle_exit() can't you work out from this?

In short, these two trap tracepoints are for patch #5.
Some users (me) may want to know, how many traps a guest has in a given
period of time, how many times a particular type of trap (e.g., Dabort)
has occurred, and how long each trap has been processed. That's what
patch #5 does, you can get these info through 'perf kvm stat report
--event=trap'.

If we use guest enter/exit tracepoints to support TRAP events, there
will be two problems:
   1) the timing becomes inaccurate (longer than the real timing)
   2) the reported 'EC' becomes "Unknown" in the case when guest exits
      due to IRQ, as we set 'esr_ec' to 0 on asynchronous exceptions,
      in trace_kvm_exit.

>> Patch #5 tells us the whole story, it's simple too.
> 
> (I only skimmed the perf patches, I'll go back now that I know a little more about what
> you're doing)

Much appreciated :)

>> What do you suggest?
> 
> We can explore the vcpu_load()/vcpu_put() trace idea, (it may not work for some other
> reason). I'd like to understand what the 'trap' tracepoints are needed for.

Thanks for your suggestion!


Btw, I have collected some useful links, attached here, to share with
someone who're interested in this series.

[1] the initial idea for 'perf kvm stat':
https://events.static.linuxfound.org/images/stories/pdf/lcjp2012_guangrong.pdf
[2] a discussion on improving kvm_exit tracepoint:
https://patchwork.kernel.org/patch/7097311/


Thanks,
zenghui



---8<---

Two tracepoints have already been added in vcpu_load/put().

---
  tools/perf/arch/arm64/util/kvm-stat.c | 2 ++
  tools/perf/builtin-kvm.c              | 7 ++++++-
  2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/util/kvm-stat.c 
b/tools/perf/arch/arm64/util/kvm-stat.c
index 41b28de..a91fe7d 100644
--- a/tools/perf/arch/arm64/util/kvm-stat.c
+++ b/tools/perf/arch/arm64/util/kvm-stat.c
@@ -103,6 +103,8 @@ static void trap_event_decode_key(struct 
perf_kvm_stat *kvm __maybe_unused,
  	"kvm:kvm_exit",
  	"kvm:kvm_trap_enter",
  	"kvm:kvm_trap_exit",
+	"kvm:vcpu_load",
+	"kvm:vcpu_put",
  	NULL,
  };

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index dbb6f73..f011fcf 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -81,6 +81,11 @@ bool exit_event_begin(struct perf_evsel *evsel,
  	return false;
  }

+static bool kvm_vcpu_load_event(struct perf_evsel *evsel)
+{
+	return !strcmp(evsel->name, "kvm:vcpu_load");
+}
+
  bool kvm_entry_event(struct perf_evsel *evsel)
  {
  	return !strcmp(evsel->name, kvm_entry_trace);
@@ -400,7 +405,7 @@ struct vcpu_event_record *per_vcpu_record(struct 
thread *thread,
  					  struct perf_sample *sample)
  {
  	/* Only kvm_entry records vcpu id. */
-	if (!thread__priv(thread) && kvm_entry_event(evsel)) {
+	if (!thread__priv(thread) && kvm_vcpu_load_event(evsel)) {
  		struct vcpu_event_record *vcpu_record;

  		vcpu_record = zalloc(sizeof(*vcpu_record));
diff mbox series

Patch

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 516aead..af3c732 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -264,7 +264,10 @@  static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		exit_handle_fn exit_handler;
 
 		exit_handler = kvm_get_exit_handler(vcpu);
+		trace_kvm_trap_enter(vcpu->vcpu_id,
+				     kvm_vcpu_trap_get_class(vcpu));
 		handled = exit_handler(vcpu, run);
+		trace_kvm_trap_exit(vcpu->vcpu_id);
 	}
 
 	return handled;
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index eab91ad..6014c73 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -8,6 +8,41 @@ 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
 
+TRACE_EVENT(kvm_trap_enter,
+	TP_PROTO(unsigned int vcpu_id, unsigned int esr_ec),
+	TP_ARGS(vcpu_id, esr_ec),
+
+	TP_STRUCT__entry(
+		__field(unsigned int,	vcpu_id)
+		__field(unsigned int,	esr_ec)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+		__entry->esr_ec		= esr_ec;
+	),
+
+	TP_printk("VCPU %u: HSR_EC=0x%04x (%s)",
+		  __entry->vcpu_id,
+		  __entry->esr_ec,
+		  __print_symbolic(__entry->esr_ec, kvm_arm_exception_class))
+);
+
+TRACE_EVENT(kvm_trap_exit,
+	TP_PROTO(unsigned int vcpu_id),
+	TP_ARGS(vcpu_id),
+
+	TP_STRUCT__entry(
+		__field(unsigned int,	vcpu_id)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+	),
+
+	TP_printk("VCPU %u", __entry->vcpu_id)
+);
+
 TRACE_EVENT(kvm_wfx_arm64,
 	TP_PROTO(unsigned long vcpu_pc, bool is_wfe),
 	TP_ARGS(vcpu_pc, is_wfe),
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 90cedeb..9f63fd9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -758,7 +758,7 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/**************************************************************
 		 * Enter the guest
 		 */
-		trace_kvm_entry(*vcpu_pc(vcpu));
+		trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu));
 		guest_enter_irqoff();
 
 		if (has_vhe()) {
@@ -822,7 +822,7 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * guest time.
 		 */
 		guest_exit();
-		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		trace_kvm_exit(vcpu->vcpu_id, ret, *vcpu_pc(vcpu));
 
 		/* Exit types that need handling before we can be preempted */
 		handle_exit_early(vcpu, run, ret);
diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h
index 8b7dff2..edd9dae 100644
--- a/virt/kvm/arm/trace.h
+++ b/virt/kvm/arm/trace.h
@@ -12,40 +12,41 @@ 
  * Tracepoints for entry/exit to guest
  */
 TRACE_EVENT(kvm_entry,
-	TP_PROTO(unsigned long vcpu_pc),
-	TP_ARGS(vcpu_pc),
+	TP_PROTO(unsigned int vcpu_id, unsigned long vcpu_pc),
+	TP_ARGS(vcpu_id, vcpu_pc),
 
 	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
 		__field(	unsigned long,	vcpu_pc		)
 	),
 
 	TP_fast_assign(
+		__entry->vcpu_id		= vcpu_id;
 		__entry->vcpu_pc		= vcpu_pc;
 	),
 
-	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+	TP_printk("VCPU %u: PC=0x%08lx", __entry->vcpu_id, __entry->vcpu_pc)
 );
 
 TRACE_EVENT(kvm_exit,
-	TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc),
-	TP_ARGS(ret, esr_ec, vcpu_pc),
+	TP_PROTO(unsigned int vcpu_id, int ret, unsigned long vcpu_pc),
+	TP_ARGS(vcpu_id, ret, vcpu_pc),
 
 	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
 		__field(	int,		ret		)
-		__field(	unsigned int,	esr_ec		)
 		__field(	unsigned long,	vcpu_pc		)
 	),
 
 	TP_fast_assign(
+		__entry->vcpu_id		= vcpu_id;
 		__entry->ret			= ARM_EXCEPTION_CODE(ret);
-		__entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ? esr_ec : 0;
 		__entry->vcpu_pc		= vcpu_pc;
 	),
 
-	TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx",
+	TP_printk("VCPU %u: exit_type=%s, PC=0x%08lx",
+		  __entry->vcpu_id,
 		  __print_symbolic(__entry->ret, kvm_arm_exception_type),
-		  __entry->esr_ec,
-		  __print_symbolic(__entry->esr_ec, kvm_arm_exception_class),
 		  __entry->vcpu_pc)
 );