Message ID | 1560330526-15468-3-git-send-email-yuzenghui@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf kvm: Add stat support on arm64 | expand |
Hi, On 12/06/2019 10:08, Zenghui Yu wrote: > Currently, we use trace_kvm_exit() to report exception type (e.g., > "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. (They both caused an exit!) > But hardware only saves the exit class to ESR_ELx on synchronous EC is the 'Exception Class'. Exit is KVM/Linux's terminology. > exceptions, not on asynchronous exceptions. When the guest exits > due to external interrupts, we will get tracing output like: > > "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30" > > Obviously, "HSR_EC" here is meaningless. I assume we do it this way so there is only one guest-exit tracepoint that catches all exits. I don't think its a problem if user-space has to know the EC isn't set for asynchronous exceptions, this is a property of the architecture and anything using these trace-points is already arch specific. > This patch splits "exit" and "trap" events by adding two tracepoints > explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM > exit events, and trace_kvm_trap_exit() report VM trap events. > > These tracepoints are adjusted also in preparation for supporting > 'perf kvm stat' on arm64. Because the existing tracepoints are ABI, I don't think we can change them. We can add new ones if there is something that a user reasonably needs to trace, and can't be done any other way. What can't 'perf kvm stat' do with the existing trace points? > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c > index 516aead..af3c732 100644 > --- a/arch/arm64/kvm/handle_exit.c > +++ b/arch/arm64/kvm/handle_exit.c > @@ -264,7 +264,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run) > exit_handle_fn exit_handler; > > exit_handler = kvm_get_exit_handler(vcpu); > + trace_kvm_trap_enter(vcpu->vcpu_id, > + kvm_vcpu_trap_get_class(vcpu)); > handled = exit_handler(vcpu, run); > + trace_kvm_trap_exit(vcpu->vcpu_id); > } Why are there two? Are you using this to benchmark the exit_handler()? As we can't remove the EC from the exit event, I don't think this tells us anything new. > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 90cedeb..9f63fd9 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -758,7 +758,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > /************************************************************** > * Enter the guest > */ > - trace_kvm_entry(*vcpu_pc(vcpu)); > + trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu)); Why do you need the PC? It was exported on exit. (its mostly junk for user-space anyway, you can't infer anything from it) Thanks, James
Hi James, On 2019/6/12 20:49, James Morse wrote: > Hi, > > On 12/06/2019 10:08, Zenghui Yu wrote: >> Currently, we use trace_kvm_exit() to report exception type (e.g., >> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. > > (They both caused an exit!) > > >> But hardware only saves the exit class to ESR_ELx on synchronous > > EC is the 'Exception Class'. Exit is KVM/Linux's terminology. Yes, a stupid mistake ;-) >> exceptions, not on asynchronous exceptions. When the guest exits >> due to external interrupts, we will get tracing output like: >> >> "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30" >> >> Obviously, "HSR_EC" here is meaningless. > > I assume we do it this way so there is only one guest-exit tracepoint that catches all exits. > I don't think its a problem if user-space has to know the EC isn't set for asynchronous > exceptions, this is a property of the architecture and anything using these trace-points > is already arch specific. Actually, *no* problem in current implementation, and I'm OK to still keep the EC in trace_kvm_exit(). What I really want to do is adding the EC in trace_trap_enter (the new tracepoint), will explain it later. >> This patch splits "exit" and "trap" events by adding two tracepoints >> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM >> exit events, and trace_kvm_trap_exit() report VM trap events. >> >> These tracepoints are adjusted also in preparation for supporting >> 'perf kvm stat' on arm64. > > Because the existing tracepoints are ABI, I don't think we can change them. > > We can add new ones if there is something that a user reasonably needs to trace, and can't > be done any other way. > > What can't 'perf kvm stat' do with the existing trace points? (A good question! I should have made it clear in the commit message, forgive me.) First, how does 'perf kvm stat' interact with tracepoints? We have three handlers for a specific event (e.g., "VM-EXIT") -- "is_begin_event", "is_end_event", "decode_key". The first two handlers make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry") to check when the VM-EXIT events started/ended, thus the time difference stats, event start/end time etc. can be calculated. "is_begin_event" handler gets a *key* from the "ret" field (exit_code) of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the *key* to find out the reason for the VM-EXIT event. Of course we should maintain the mapping between exit_code and exit_reason in userspace. These are all what *patch #4* had done, #4 is a simple patch to review! Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without something like "vcpu_id". perf people must have a much deeper understanding of this. OK, next comes the more important question - what should/can we do to the tracepoints in preparation of 'perf kvm stat' on arm64? From the article you've provided, it's clear that we can't remove the EC from trace_kvm_exit(). But can we add something like "vcpu_id" into (at least) trace_kvm_entry(), just like what this patch has done? If not, which means we have to keep the existing tracepoints totally unchanged, then 'perf kvm stat' will have no way to record/report per vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this capability, if I understand it correctly). As for TRAP events, should we consider adding two new tracepoints -- "kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap handling process? We should also record the EC in "kvm_trap_enter", which will be used as *key* in TRAP event's "is_begin_event" handler. Patch #5 tells us the whole story, it's simple too. What do you suggest? >> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c >> index 516aead..af3c732 100644 >> --- a/arch/arm64/kvm/handle_exit.c >> +++ b/arch/arm64/kvm/handle_exit.c >> @@ -264,7 +264,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run) >> exit_handle_fn exit_handler; >> >> exit_handler = kvm_get_exit_handler(vcpu); >> + trace_kvm_trap_enter(vcpu->vcpu_id, >> + kvm_vcpu_trap_get_class(vcpu)); >> handled = exit_handler(vcpu, run); >> + trace_kvm_trap_exit(vcpu->vcpu_id); >> } > > Why are there two? Are you using this to benchmark the exit_handler()? Almostly yes. Let perf know when the TRAP handling event start/end, and ... > As we can't remove the EC from the exit event, I don't think this tells us anything new. As explained above, this EC is for 'perf kvm stat'. >> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >> index 90cedeb..9f63fd9 100644 >> --- a/virt/kvm/arm/arm.c >> +++ b/virt/kvm/arm/arm.c >> @@ -758,7 +758,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >> /************************************************************** >> * Enter the guest >> */ >> - trace_kvm_entry(*vcpu_pc(vcpu)); >> + trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu)); > > Why do you need the PC? It was exported on exit. > (its mostly junk for user-space anyway, you can't infer anything from it) (I mainly wanted to add the "vcpu->vcpu_id" here.) It seems that we can't just remove the PC, which will cause ABI change? Thanks for your reviewing! zenghui .
Hi Zenghui, On 13/06/2019 12:28, Zenghui Yu wrote: > On 2019/6/12 20:49, James Morse wrote: >> On 12/06/2019 10:08, Zenghui Yu wrote: >>> Currently, we use trace_kvm_exit() to report exception type (e.g., >>> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. >>> But hardware only saves the exit class to ESR_ELx on synchronous >>> exceptions, not on asynchronous exceptions. When the guest exits >>> due to external interrupts, we will get tracing output like: >>> >>> "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30" >>> >>> Obviously, "HSR_EC" here is meaningless. >> I assume we do it this way so there is only one guest-exit tracepoint that catches all >> exits. >> I don't think its a problem if user-space has to know the EC isn't set for asynchronous >> exceptions, this is a property of the architecture and anything using these trace-points >> is already arch specific. > Actually, *no* problem in current implementation, and I'm OK to still > keep the EC in trace_kvm_exit(). What I really want to do is adding the > EC in trace_trap_enter (the new tracepoint), will explain it later. >>> This patch splits "exit" and "trap" events by adding two tracepoints >>> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM >>> exit events, and trace_kvm_trap_exit() report VM trap events. >>> >>> These tracepoints are adjusted also in preparation for supporting >>> 'perf kvm stat' on arm64. >> >> Because the existing tracepoints are ABI, I don't think we can change them. >> >> We can add new ones if there is something that a user reasonably needs to trace, and can't >> be done any other way. >> >> What can't 'perf kvm stat' do with the existing trace points? > First, how does 'perf kvm stat' interact with tracepoints? Start at the beginning, good idea. (I've never used this thing!) > We have three handlers for a specific event (e.g., "VM-EXIT") -- > "is_begin_event", "is_end_event", "decode_key". The first two handlers > make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry") > to check when the VM-EXIT events started/ended, thus the time difference > stats, event start/end time etc. can be calculated. > "is_begin_event" handler gets a *key* from the "ret" field (exit_code) > of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the > *key* to find out the reason for the VM-EXIT event. Of course we should > maintain the mapping between exit_code and exit_reason in userspace. Interpreting 'ret' is going to get tricky if we change those values on a whim. Its internal to the KVM arch code. > These are all what *patch #4* had done, #4 is a simple patch to review! > Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but > currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without > something like "vcpu_id". Heh, so from the trace-point data, you can't know which on is vcpu-0 and which is vcpu-1. > OK, next comes the more important question - what should/can we do to > the tracepoints in preparation of 'perf kvm stat' on arm64? > > From the article you've provided, it's clear that we can't remove the EC > from trace_kvm_exit(). But can we add something like "vcpu_id" into > (at least) trace_kvm_entry(), just like what this patch has done? Adding something is still likely to break a badly written user-space that is trying to parse the trace information. A regex picking out the last argument will now get a different value. > If not, which means we have to keep the existing tracepoints totally > unchanged, then 'perf kvm stat' will have no way to record/report per > vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this > capability, if I understand it correctly). Well, you get the events, but you don't know which vCPU is which. You can map this back to the pid of the host thread assuming user-space isn't moving vcpu between host threads. If we're really stuck: Adding tracepoints to KVM-core's vcpu get/put, that export the vcpu_id would let you map pid->vcpu_id, which you can then use for the batch of enter/exit events that come before a final vcpu put. grepping "vpu_id" shows perf has a mapping for which arch-specific argument in enter/exit is the vcpu-id. Done with this core-code mapping, you could drop that code... But I'd be a little nervous adding a new trace-point to work around an ABI problem, as we may have just moved the ABI problem! (What does a user of a vcpu_put tracepoint really need?) > As for TRAP events, should we consider adding two new tracepoints -- > "kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap > handling process? We should also record the EC in "kvm_trap_enter", which will be used as > *key* in TRAP event's "is_begin_event" handler. The EC can't change between trace_kvm_exit() and handle_exit(), so you already have this. What are the 'trap' trace points needed for? You get the timing and 'exception class' from the guest enter/exit tracepoints. What about handle_exit() can't you work out from this? > Patch #5 tells us the whole story, it's simple too. (I only skimmed the perf patches, I'll go back now that I know a little more about what you're doing) > What do you suggest? We can explore the vcpu_load()/vcpu_put() trace idea, (it may not work for some other reason). I'd like to understand what the 'trap' tracepoints are needed for. Thanks, James
Hi James, sorry for the late reply. On 2019/6/17 19:19, James Morse wrote: > Hi Zenghui, > > On 13/06/2019 12:28, Zenghui Yu wrote: >> On 2019/6/12 20:49, James Morse wrote: >>> On 12/06/2019 10:08, Zenghui Yu wrote: >>>> Currently, we use trace_kvm_exit() to report exception type (e.g., >>>> "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. >>>> But hardware only saves the exit class to ESR_ELx on synchronous >>>> exceptions, not on asynchronous exceptions. When the guest exits >>>> due to external interrupts, we will get tracing output like: >>>> >>>> "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30" >>>> >>>> Obviously, "HSR_EC" here is meaningless. > >>> I assume we do it this way so there is only one guest-exit tracepoint that catches all >>> exits. >>> I don't think its a problem if user-space has to know the EC isn't set for asynchronous >>> exceptions, this is a property of the architecture and anything using these trace-points >>> is already arch specific. > >> Actually, *no* problem in current implementation, and I'm OK to still >> keep the EC in trace_kvm_exit(). What I really want to do is adding the >> EC in trace_trap_enter (the new tracepoint), will explain it later. > > >>>> This patch splits "exit" and "trap" events by adding two tracepoints >>>> explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM >>>> exit events, and trace_kvm_trap_exit() report VM trap events. >>>> >>>> These tracepoints are adjusted also in preparation for supporting >>>> 'perf kvm stat' on arm64. >>> >>> Because the existing tracepoints are ABI, I don't think we can change them. >>> >>> We can add new ones if there is something that a user reasonably needs to trace, and can't >>> be done any other way. >>> >>> What can't 'perf kvm stat' do with the existing trace points? > >> First, how does 'perf kvm stat' interact with tracepoints? > > Start at the beginning, good idea. (I've never used this thing!) > > >> We have three handlers for a specific event (e.g., "VM-EXIT") -- >> "is_begin_event", "is_end_event", "decode_key". The first two handlers >> make use of two existing tracepoints ("kvm:kvm_exit" & "kvm:kvm_entry") >> to check when the VM-EXIT events started/ended, thus the time difference >> stats, event start/end time etc. can be calculated. > >> "is_begin_event" handler gets a *key* from the "ret" field (exit_code) >> of "kvm:kvm_exit" payload, and "decode_key" handler makes use of the >> *key* to find out the reason for the VM-EXIT event. Of course we should >> maintain the mapping between exit_code and exit_reason in userspace. > > Interpreting 'ret' is going to get tricky if we change those values on a whim. Its > internal to the KVM arch code. Yes. We have to keep the definition of 'ret' and the mapping (kvm_arm_exception_type) consistent between user side and kernel side. Specifically, kernel side: arch/arm64/include/asm/kvm_asm.h user side: tools/perf/arch/arm64/util/aarch64_guest_exits.h (patch #4) Do we have a better approach? >> These are all what *patch #4* had done, #4 is a simple patch to review! > >> Oh, we can also set "vcpu_id_str" to achieve per vcpu event record, but >> currently, we only have the "vcpu_pc" field in "kvm:kvm_entry", without >> something like "vcpu_id". > > Heh, so from the trace-point data, you can't know which on is vcpu-0 and which is vcpu-1. Yes! >> OK, next comes the more important question - what should/can we do to >> the tracepoints in preparation of 'perf kvm stat' on arm64? >> >> From the article you've provided, it's clear that we can't remove the EC >> from trace_kvm_exit(). But can we add something like "vcpu_id" into >> (at least) trace_kvm_entry(), just like what this patch has done? > > Adding something is still likely to break a badly written user-space that is trying to > parse the trace information. A regex picking out the last argument will now get a > different value. > > >> If not, which means we have to keep the existing tracepoints totally >> unchanged, then 'perf kvm stat' will have no way to record/report per >> vcpu VM-EXIT events (other arch like X86, powerpc, s390 etc. have this >> capability, if I understand it correctly). > > Well, you get the events, but you don't know which vCPU is which. You can map this back to > the pid of the host thread assuming user-space isn't moving vcpu between host threads. > > If we're really stuck: Adding tracepoints to KVM-core's vcpu get/put, that export the > vcpu_id would let you map pid->vcpu_id, which you can then use for the batch of enter/exit > events that come before a final vcpu put. > grepping "vpu_id" shows perf has a mapping for which arch-specific argument in enter/exit > is the vcpu-id. Done with this core-code mapping, you could drop that code... Yes. In the current 'perf kvm' implementation, we make use of the "vcpu id" field (specified by vcpu_id_str, differ between arch) of "kvm:kvm_entry" tracepoint payload, to achieve per vcpu record/report. (Details in per_vcpu_record() in tools/perf/builtin-kvm.c) Adding tracepoints in vcpu_load()/vcpu_put() might be a good idea, we can get "vcpu id" information without breaking existing tracepoint ABI. And I think other arch will benefit from this way, for those who haven't supported 'perf kvm stat' yet and don't have "vcpu id" info in the "kvm:kvm_entry" tracepoint. But there is at least one problem (as I can see)... KVM will not always do a vcpu_load()/vcpu_put(), while guest always enters/exits. In the worst case, if KVM can always handle guest's exits, perf will not have a chance to catch vcpu_load()/vcpu_put(). And still, we fail to get per vcpu statistical analysis. I've written a simple patch to prove this, attached at the end of this mail. I run 'perf kvm stat record/report' against a 4-VCPU guest, but only vcpu-3 will be reported, as only vcpu-3 is doing some MMIO access and we have to return to user space (and vcpu_load() will be invoked). I'm not sure if I (likely) missed some important points you've provided. > But I'd be a little nervous adding a new trace-point to work around an ABI problem, as we > may have just moved the ABI problem! (What does a user of a vcpu_put tracepoint really need?) > > >> As for TRAP events, should we consider adding two new tracepoints -- >> "kvm_trap_enter" and "kvm_trap_exit", to keep tracking of the trap >> handling process? We should also record the EC in "kvm_trap_enter", which will be used as >> *key* in TRAP event's "is_begin_event" handler. > > The EC can't change between trace_kvm_exit() and handle_exit(), so you already have this. > > What are the 'trap' trace points needed for? You get the timing and 'exception class' from > the guest enter/exit tracepoints. What about handle_exit() can't you work out from this? In short, these two trap tracepoints are for patch #5. Some users (me) may want to know, how many traps a guest has in a given period of time, how many times a particular type of trap (e.g., Dabort) has occurred, and how long each trap has been processed. That's what patch #5 does, you can get these info through 'perf kvm stat report --event=trap'. If we use guest enter/exit tracepoints to support TRAP events, there will be two problems: 1) the timing becomes inaccurate (longer than the real timing) 2) the reported 'EC' becomes "Unknown" in the case when guest exits due to IRQ, as we set 'esr_ec' to 0 on asynchronous exceptions, in trace_kvm_exit. >> Patch #5 tells us the whole story, it's simple too. > > (I only skimmed the perf patches, I'll go back now that I know a little more about what > you're doing) Much appreciated :) >> What do you suggest? > > We can explore the vcpu_load()/vcpu_put() trace idea, (it may not work for some other > reason). I'd like to understand what the 'trap' tracepoints are needed for. Thanks for your suggestion! Btw, I have collected some useful links, attached here, to share with someone who're interested in this series. [1] the initial idea for 'perf kvm stat': https://events.static.linuxfound.org/images/stories/pdf/lcjp2012_guangrong.pdf [2] a discussion on improving kvm_exit tracepoint: https://patchwork.kernel.org/patch/7097311/ Thanks, zenghui ---8<--- Two tracepoints have already been added in vcpu_load/put(). --- tools/perf/arch/arm64/util/kvm-stat.c | 2 ++ tools/perf/builtin-kvm.c | 7 ++++++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c index 41b28de..a91fe7d 100644 --- a/tools/perf/arch/arm64/util/kvm-stat.c +++ b/tools/perf/arch/arm64/util/kvm-stat.c @@ -103,6 +103,8 @@ static void trap_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused, "kvm:kvm_exit", "kvm:kvm_trap_enter", "kvm:kvm_trap_exit", + "kvm:vcpu_load", + "kvm:vcpu_put", NULL, }; diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index dbb6f73..f011fcf 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -81,6 +81,11 @@ bool exit_event_begin(struct perf_evsel *evsel, return false; } +static bool kvm_vcpu_load_event(struct perf_evsel *evsel) +{ + return !strcmp(evsel->name, "kvm:vcpu_load"); +} + bool kvm_entry_event(struct perf_evsel *evsel) { return !strcmp(evsel->name, kvm_entry_trace); @@ -400,7 +405,7 @@ struct vcpu_event_record *per_vcpu_record(struct thread *thread, struct perf_sample *sample) { /* Only kvm_entry records vcpu id. */ - if (!thread__priv(thread) && kvm_entry_event(evsel)) { + if (!thread__priv(thread) && kvm_vcpu_load_event(evsel)) { struct vcpu_event_record *vcpu_record; vcpu_record = zalloc(sizeof(*vcpu_record));
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index 516aead..af3c732 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -264,7 +264,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run) exit_handle_fn exit_handler; exit_handler = kvm_get_exit_handler(vcpu); + trace_kvm_trap_enter(vcpu->vcpu_id, + kvm_vcpu_trap_get_class(vcpu)); handled = exit_handler(vcpu, run); + trace_kvm_trap_exit(vcpu->vcpu_id); } return handled; diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h index eab91ad..6014c73 100644 --- a/arch/arm64/kvm/trace.h +++ b/arch/arm64/kvm/trace.h @@ -8,6 +8,41 @@ #undef TRACE_SYSTEM #define TRACE_SYSTEM kvm +TRACE_EVENT(kvm_trap_enter, + TP_PROTO(unsigned int vcpu_id, unsigned int esr_ec), + TP_ARGS(vcpu_id, esr_ec), + + TP_STRUCT__entry( + __field(unsigned int, vcpu_id) + __field(unsigned int, esr_ec) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + __entry->esr_ec = esr_ec; + ), + + TP_printk("VCPU %u: HSR_EC=0x%04x (%s)", + __entry->vcpu_id, + __entry->esr_ec, + __print_symbolic(__entry->esr_ec, kvm_arm_exception_class)) +); + +TRACE_EVENT(kvm_trap_exit, + TP_PROTO(unsigned int vcpu_id), + TP_ARGS(vcpu_id), + + TP_STRUCT__entry( + __field(unsigned int, vcpu_id) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + ), + + TP_printk("VCPU %u", __entry->vcpu_id) +); + TRACE_EVENT(kvm_wfx_arm64, TP_PROTO(unsigned long vcpu_pc, bool is_wfe), TP_ARGS(vcpu_pc, is_wfe), diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 90cedeb..9f63fd9 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -758,7 +758,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) /************************************************************** * Enter the guest */ - trace_kvm_entry(*vcpu_pc(vcpu)); + trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu)); guest_enter_irqoff(); if (has_vhe()) { @@ -822,7 +822,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) * guest time. */ guest_exit(); - trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu)); + trace_kvm_exit(vcpu->vcpu_id, ret, *vcpu_pc(vcpu)); /* Exit types that need handling before we can be preempted */ handle_exit_early(vcpu, run, ret); diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h index 8b7dff2..edd9dae 100644 --- a/virt/kvm/arm/trace.h +++ b/virt/kvm/arm/trace.h @@ -12,40 +12,41 @@ * Tracepoints for entry/exit to guest */ TRACE_EVENT(kvm_entry, - TP_PROTO(unsigned long vcpu_pc), - TP_ARGS(vcpu_pc), + TP_PROTO(unsigned int vcpu_id, unsigned long vcpu_pc), + TP_ARGS(vcpu_id, vcpu_pc), TP_STRUCT__entry( + __field( unsigned int, vcpu_id ) __field( unsigned long, vcpu_pc ) ), TP_fast_assign( + __entry->vcpu_id = vcpu_id; __entry->vcpu_pc = vcpu_pc; ), - TP_printk("PC: 0x%08lx", __entry->vcpu_pc) + TP_printk("VCPU %u: PC=0x%08lx", __entry->vcpu_id, __entry->vcpu_pc) ); TRACE_EVENT(kvm_exit, - TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc), - TP_ARGS(ret, esr_ec, vcpu_pc), + TP_PROTO(unsigned int vcpu_id, int ret, unsigned long vcpu_pc), + TP_ARGS(vcpu_id, ret, vcpu_pc), TP_STRUCT__entry( + __field( unsigned int, vcpu_id ) __field( int, ret ) - __field( unsigned int, esr_ec ) __field( unsigned long, vcpu_pc ) ), TP_fast_assign( + __entry->vcpu_id = vcpu_id; __entry->ret = ARM_EXCEPTION_CODE(ret); - __entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ? esr_ec : 0; __entry->vcpu_pc = vcpu_pc; ), - TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx", + TP_printk("VCPU %u: exit_type=%s, PC=0x%08lx", + __entry->vcpu_id, __print_symbolic(__entry->ret, kvm_arm_exception_type), - __entry->esr_ec, - __print_symbolic(__entry->esr_ec, kvm_arm_exception_class), __entry->vcpu_pc) );
Currently, we use trace_kvm_exit() to report exception type (e.g., "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. But hardware only saves the exit class to ESR_ELx on synchronous exceptions, not on asynchronous exceptions. When the guest exits due to external interrupts, we will get tracing output like: "kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30" Obviously, "HSR_EC" here is meaningless. This patch splits "exit" and "trap" events by adding two tracepoints explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM exit events, and trace_kvm_trap_exit() report VM trap events. These tracepoints are adjusted also in preparation for supporting 'perf kvm stat' on arm64. Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> --- arch/arm64/kvm/handle_exit.c | 3 +++ arch/arm64/kvm/trace.h | 35 +++++++++++++++++++++++++++++++++++ virt/kvm/arm/arm.c | 4 ++-- virt/kvm/arm/trace.h | 21 +++++++++++---------- 4 files changed, 51 insertions(+), 12 deletions(-)