KVM: x86: add full vm-exit reason debug entries

Message ID 1555939499-30854-1-git-send-email-pizhenwei@bytedance.com (mailing list archive)
State New, archived
Series: KVM: x86: add full vm-exit reason debug entries

Commit Message

zhenwei pi April 22, 2019, 1:24 p.m. UTC
We hit several typical cases of performance drops due to vm-exits:
case 1: jemalloc calls madvise(addr, length, MADV_DONTNEED) in the guest;
the resulting IPIs cause a lot of exits.
case 2: atop collects IPC via perf hardware events in the guest; vPMU and
rdpmc exits increase a lot.
case 3: host memory compaction invalidates TDP mappings, and tdp_page_fault
causes a huge performance loss.
case 4: web services (written in Go) call futex and see higher latency than
in a host OS environment.

Add more vm-exit reason debug entries; they are helpful for recognizing the
reason for a performance drop. In this patch:
1, add more vm-exit reasons.
2, add wrmsr details.
3, add CR details.
4, add hypercall details.
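
As an illustration of how the new entries would be consumed, here is a
minimal userspace sketch (not part of the patch) that polls a few of the
counters through KVM's debugfs files; the stat names follow the
debugfs_entries added in x86.c, and the usual /sys/kernel/debug/kvm path
is assumed:

  # poll_kvm_exits.py - print per-second deltas of selected exit counters.
  # Requires root and a kernel with this patch applied.
  import time

  STATS = ["wrmsr_exits", "cr_exits", "ept_violation_exits"]

  def read_stat(name):
      # Each debugfs_entries item is exported as a single integer file.
      with open("/sys/kernel/debug/kvm/" + name) as f:
          return int(f.read())

  prev = {s: read_stat(s) for s in STATS}
  while True:
      time.sleep(1)
      cur = {s: read_stat(s) for s in STATS}
      for s in STATS:
          print("%-20s %10d exits/s" % (s, cur[s] - prev[s]))
      prev = cur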

Currently we can also achieve the same result with bpf.
Sample code (written by Fei Li <lifei.shirley@bytedance.com>):
  from bcc import BPF

  b = BPF(text="""

  struct kvm_msr_exit_info {
      u32 pid;
      u32 tgid;
      u32 msr_exit_ct;
  };
  BPF_HASH(kvm_msr_exit, unsigned int, struct kvm_msr_exit_info, 1024);

  TRACEPOINT_PROBE(kvm, kvm_msr) {
      u32 ct = args->ecx;  /* MSR index; matches the u32 hash key type */
      if (ct >= 0xffffffff) {
          return -1;
      }

      u32 tgid = bpf_get_current_pid_tgid() >> 32;  /* upper 32 bits: tgid */
      u32 pid = bpf_get_current_pid_tgid();         /* lower 32 bits: pid */

      struct kvm_msr_exit_info *exit_info;
      struct kvm_msr_exit_info init_exit_info = {};
      exit_info = kvm_msr_exit.lookup(&ct);
      if (exit_info != NULL) {
          exit_info->pid = pid;
          exit_info->tgid = tgid;
          exit_info->msr_exit_ct++;
      } else {
          init_exit_info.pid = pid;
          init_exit_info.tgid = tgid;
          init_exit_info.msr_exit_ct = 1;
          kvm_msr_exit.update(&ct, &init_exit_info);
      }
      return 0;
  }
  """)

Run a wrmsr(MSR_IA32_TSCDEADLINE, val) benchmark in the guest
(CPU: Intel Gold 5118):
case 1, no bpf on host: ~1127 cycles/wrmsr.
case 2, sample bpf on host with JIT: ~1223 cycles/wrmsr.      -->  8.5% overhead
case 3, sample bpf on host without JIT: ~1312 cycles/wrmsr.   --> 16.4% overhead

So, the debug entries are more efficient than the bpf approach.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 arch/x86/include/asm/kvm_host.h | 77 +++++++++++++++++++++++++++++++++++------
 arch/x86/kvm/cpuid.c            |  1 +
 arch/x86/kvm/lapic.c            | 17 +++++++++
 arch/x86/kvm/pmu.c              |  1 +
 arch/x86/kvm/vmx/vmx.c          | 26 ++++++++++++++
 arch/x86/kvm/x86.c              | 60 ++++++++++++++++++++++++++++++++
 6 files changed, 172 insertions(+), 10 deletions(-)

Comments

Sean Christopherson April 22, 2019, 8:54 p.m. UTC | #1
On Mon, Apr 22, 2019 at 09:24:59PM +0800, zhenwei pi wrote:
> Hit several typical cases of performance drop due to vm-exit:
> case 1, jemalloc calls madvise(void *addr, size_t length, MADV_DONTNEED) in
> guest, IPI causes a lot of exits.
> case 2, atop collects IPC by perf hardware events in guest, vpmu & rdpmc
> exits increase a lot.
> case 3, host memory compaction invalidates TDP and tdp_page_fault can cause
> huge loss of performance.
> case 4, web services(written by golang) call futex and have higher latency
> than host os environment.
> 
> Add more vm-exit reason debug entries, they are helpful to recognize
> performance drop reason. In this patch:
> 1, add more vm-exit reasons.
> 2, add wrmsr details.
> 3, add CR details.
> 4, add hypercall details.
> 
> Currently we can also implement the same result by bpf.
> Sample code (written by Fei Li<lifei.shirley@bytedance.com>):
>   b = BPF(text="""
> 
>   struct kvm_msr_exit_info {
>       u32 pid;
>       u32 tgid;
>       u32 msr_exit_ct;
>   };
>   BPF_HASH(kvm_msr_exit, unsigned int, struct kvm_msr_exit_info, 1024);
> 
>   TRACEPOINT_PROBE(kvm, kvm_msr) {
>       int ct = args->ecx;
>       if (ct >= 0xffffffff) {
>           return -1;
>       }
> 
>       u32 pid = bpf_get_current_pid_tgid() >> 32;
>       u32 tgid = bpf_get_current_pid_tgid();
> 
>       struct kvm_msr_exit_info *exit_info;
>       struct kvm_msr_exit_info init_exit_info = {};
>       exit_info = kvm_msr_exit.lookup(&ct);
>       if (exit_info != NULL) {
>           exit_info->pid = pid;
>           exit_info->tgid = tgid;
>           exit_info->msr_exit_ct++;
>       } else {
>           init_exit_info.pid = pid;
>           init_exit_info.tgid = tgid;
>           init_exit_info.msr_exit_ct = 1;
>           kvm_msr_exit.update(&ct, &init_exit_info);
>       }
>       return 0;
>   }
>   """)
> 
> Run wrmsr(MSR_IA32_TSCDEADLINE, val) benchmark in guest
> (CPU Intel Gold 5118):
> case 1, no bpf on host. ~1127 cycles/wrmsr.
> case 2, sample bpf on host with JIT. ~1223 cycles/wrmsr.      -->  8.5%
> case 3, sample bpf on host without JIT. ~1312 cycles/wrmsr.   --> 16.4%
> 
> So, debug entries are more efficient than the bpf method.

How much does host performance matter?  E.g. does high overhead interfere
with debugging, is this something you want to have running at all times, etc...

> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 77 +++++++++++++++++++++++++++++++++++------
>  arch/x86/kvm/cpuid.c            |  1 +
>  arch/x86/kvm/lapic.c            | 17 +++++++++
>  arch/x86/kvm/pmu.c              |  1 +
>  arch/x86/kvm/vmx/vmx.c          | 26 ++++++++++++++
>  arch/x86/kvm/x86.c              | 60 ++++++++++++++++++++++++++++++++
>  6 files changed, 172 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a9d03af34030..e7d70ed046b2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -945,21 +945,11 @@ struct kvm_vcpu_stat {
>  	u64 pf_guest;
>  	u64 tlb_flush;
>  	u64 invlpg;
> -
> -	u64 exits;
> -	u64 io_exits;
> -	u64 mmio_exits;
> -	u64 signal_exits;
> -	u64 irq_window_exits;
> -	u64 nmi_window_exits;
>  	u64 l1d_flush;
> -	u64 halt_exits;
>  	u64 halt_successful_poll;
>  	u64 halt_attempted_poll;
>  	u64 halt_poll_invalid;
>  	u64 halt_wakeup;
> -	u64 request_irq_exits;
> -	u64 irq_exits;
>  	u64 host_state_reload;
>  	u64 fpu_reload;
>  	u64 insn_emulation;
> @@ -968,6 +958,73 @@ struct kvm_vcpu_stat {
>  	u64 irq_injections;
>  	u64 nmi_injections;
>  	u64 req_event;
> +
> +	/* vm-exit reasons */
> +	u64 exits;
> +	u64 io_exits;
> +	u64 mmio_exits;
> +	u64 signal_exits;
> +	u64 irq_window_exits;
> +	u64 nmi_window_exits;
> +	u64 halt_exits;
> +	u64 request_irq_exits;
> +	u64 irq_exits;
> +	u64 exception_nmi_exits;
> +	u64 cr_exits;
> +	u64 dr_exits;
> +	u64 cpuid_exits;
> +	u64 rdpmc_exits;
> +	u64 update_ppr_exits;
> +	u64 rdmsr_exits;
> +	u64 wrmsr_exits;
> +	u64 apic_access_exits;
> +	u64 apic_write_exits;
> +	u64 apic_eoi_exits;
> +	u64 wbinvd_exits;
> +	u64 xsetbv_exits;
> +	u64 task_switch_exits;
> +	u64 ept_violation_exits;
> +	u64 pause_exits;
> +	u64 mwait_exits;
> +	u64 monitor_trap_exits;
> +	u64 monitor_exits;
> +	u64 pml_full_exits;
> +	u64 preemption_timer_exits;
> +	/* wrmsr & apic vm-exit reasons */
> +	u64 wrmsr_set_apic_base;
> +	u64 wrmsr_set_wall_clock;
> +	u64 wrmsr_set_system_time;
> +	u64 wrmsr_set_pmu;
> +	u64 lapic_set_tscdeadline;
> +	u64 lapic_set_tpr;
> +	u64 lapic_set_eoi;
> +	u64 lapic_set_ldr;
> +	u64 lapic_set_dfr;
> +	u64 lapic_set_spiv;
> +	u64 lapic_set_icr;
> +	u64 lapic_set_icr2;
> +	u64 lapic_set_lvt;
> +	u64 lapic_set_lvtt;
> +	u64 lapic_set_tmict;
> +	u64 lapic_set_tdcr;
> +	u64 lapic_set_esr;
> +	u64 lapic_set_self_ipi;
> +	/* cr vm-exit reasons */
> +	u64 cr_movetocr0;
> +	u64 cr_movetocr3;
> +	u64 cr_movetocr4;
> +	u64 cr_movetocr8;
> +	u64 cr_movefromcr3;
> +	u64 cr_movefromcr8;
> +	u64 cr_clts;
> +	u64 cr_lmsw;
> +	/* hypercall vm-exit reasons */
> +	u64 hypercall_vapic_poll_irq;
> +	u64 hypercall_kick_cpu;
> +#ifdef CONFIG_X86_64
> +	u64 hypercall_clock_pairing;
> +#endif
> +	u64 hypercall_send_ipi;
>  };

There are quite a few issues with this approach:

  - That's a lot of memory that may never be consumed.

  - struct kvm_vcpu_stat is vendor agnostic, but the exits tracked are
    very much VMX centric.  And for many of the exits that do overlap with
    SVM, only the VMX paths are updated by your patch.

  - The granularity of what is tracked is very haphazard, e.g. CLTS and
    LMSW get their own entries, but NMIs and all exceptions get lumped
    into a single stat.

And so on and so forth.

Just spitballing, rather than trying to extend the existing stats
implementation, what if we go in the opposite direction and pare it down
as much as possible in favor of trace events?  I.e. improve KVM's trace
events so that userspace can use them to aggregate data for all vCPUs in
a VM, filter out specific MSRs, etc..., and add post-processing scripts
for common operations so that users don't need to reinvent the wheel
every time they want to collect certain information.

Adding a tracepoint for things like fpu_reload or host_state_reload
is probably overkill, but just about every other x86 stat entry either
has an existing corresponding tracepoint or could provide much more
useful info if a tracepoint were added, e.g. to track which IRQ is
interrupting the guest or what I/O ports are causing exits.

The post-processing scripts might even be able to eliminate some of the
manual analysis, e.g. by collating stats based on guest rip to build a
picture of what piece of guest code is triggering what exits.
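
For reference, a rough sketch of that kind of aggregation on top of the
existing kvm:kvm_exit tracepoint, again with bcc (the exit_reason field
comes from the tracepoint format; the script is only illustrative, not a
proposed implementation):

  from bcc import BPF
  import time

  prog = """
  BPF_HASH(exit_reasons, u32, u64);

  TRACEPOINT_PROBE(kvm, kvm_exit) {
      u32 reason = args->exit_reason;
      /* Could also filter on bpf_get_current_pid_tgid() to track one VM. */
      exit_reasons.increment(reason);
      return 0;
  }
  """

  b = BPF(text=prog)
  try:
      while True:
          time.sleep(1)
  except KeyboardInterrupt:
      pass
  for reason, count in sorted(b["exit_reasons"].items(),
                              key=lambda kv: kv[1].value, reverse=True):
      print("exit reason %3u: %u" % (reason.value, count.value))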
zhenwei pi April 23, 2019, 6:14 a.m. UTC | #2
On 04/23/2019 04:54 AM, Sean Christopherson wrote:

> On Mon, Apr 22, 2019 at 09:24:59PM +0800, zhenwei pi wrote:
>> Hit several typical cases of performance drop due to vm-exit:
>> case 1, jemalloc calls madvise(void *addr, size_t length, MADV_DONTNEED) in
>> guest, IPI causes a lot of exits.
>> case 2, atop collects IPC by perf hardware events in guest, vpmu & rdpmc
>> exits increase a lot.
>> case 3, host memory compaction invalidates TDP and tdp_page_fault can cause
>> huge loss of performance.
>> case 4, web services(written by golang) call futex and have higher latency
>> than host os environment.
>>
>> Add more vm-exit reason debug entries, they are helpful to recognize
>> performance drop reason. In this patch:
>> 1, add more vm-exit reasons.
>> 2, add wrmsr details.
>> 3, add CR details.
>> 4, add hypercall details.
>>
>> Currently we can also implement the same result by bpf.
>> Sample code (written by Fei Li<lifei.shirley@bytedance.com>):
>>    b = BPF(text="""
>>
>>    struct kvm_msr_exit_info {
>>        u32 pid;
>>        u32 tgid;
>>        u32 msr_exit_ct;
>>    };
>>    BPF_HASH(kvm_msr_exit, unsigned int, struct kvm_msr_exit_info, 1024);
>>
>>    TRACEPOINT_PROBE(kvm, kvm_msr) {
>>        int ct = args->ecx;
>>        if (ct >= 0xffffffff) {
>>            return -1;
>>        }
>>
>>        u32 pid = bpf_get_current_pid_tgid() >> 32;
>>        u32 tgid = bpf_get_current_pid_tgid();
>>
>>        struct kvm_msr_exit_info *exit_info;
>>        struct kvm_msr_exit_info init_exit_info = {};
>>        exit_info = kvm_msr_exit.lookup(&ct);
>>        if (exit_info != NULL) {
>>            exit_info->pid = pid;
>>            exit_info->tgid = tgid;
>>            exit_info->msr_exit_ct++;
>>        } else {
>>            init_exit_info.pid = pid;
>>            init_exit_info.tgid = tgid;
>>            init_exit_info.msr_exit_ct = 1;
>>            kvm_msr_exit.update(&ct, &init_exit_info);
>>        }
>>        return 0;
>>    }
>>    """)
>>
>> Run wrmsr(MSR_IA32_TSCDEADLINE, val) benchmark in guest
>> (CPU Intel Gold 5118):
>> case 1, no bpf on host. ~1127 cycles/wrmsr.
>> case 2, sample bpf on host with JIT. ~1223 cycles/wrmsr.      -->  8.5%
>> case 3, sample bpf on host without JIT. ~1312 cycles/wrmsr.   --> 16.4%
>>
>> So, debug entries are more efficient than the bpf method.
> How much does host performance matter?  E.g. does high overhead interfere
> with debug, is this something you want to have running at all times, etc...
>
Yes, I need to run the counters at all times.
Over 3K large (40C/128G) VMs are managed by k8s, running different web
services in docker (e.g. with a 4C/8G quota). The web services can be
scheduled automatically, so I need to monitor VM performance and find
performance drop cases as soon as possible.

>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h | 77 +++++++++++++++++++++++++++++++++++------
>>   arch/x86/kvm/cpuid.c            |  1 +
>>   arch/x86/kvm/lapic.c            | 17 +++++++++
>>   arch/x86/kvm/pmu.c              |  1 +
>>   arch/x86/kvm/vmx/vmx.c          | 26 ++++++++++++++
>>   arch/x86/kvm/x86.c              | 60 ++++++++++++++++++++++++++++++++
>>   6 files changed, 172 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index a9d03af34030..e7d70ed046b2 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -945,21 +945,11 @@ struct kvm_vcpu_stat {
>>   	u64 pf_guest;
>>   	u64 tlb_flush;
>>   	u64 invlpg;
>> -
>> -	u64 exits;
>> -	u64 io_exits;
>> -	u64 mmio_exits;
>> -	u64 signal_exits;
>> -	u64 irq_window_exits;
>> -	u64 nmi_window_exits;
>>   	u64 l1d_flush;
>> -	u64 halt_exits;
>>   	u64 halt_successful_poll;
>>   	u64 halt_attempted_poll;
>>   	u64 halt_poll_invalid;
>>   	u64 halt_wakeup;
>> -	u64 request_irq_exits;
>> -	u64 irq_exits;
>>   	u64 host_state_reload;
>>   	u64 fpu_reload;
>>   	u64 insn_emulation;
>> @@ -968,6 +958,73 @@ struct kvm_vcpu_stat {
>>   	u64 irq_injections;
>>   	u64 nmi_injections;
>>   	u64 req_event;
>> +
>> +	/* vm-exit reasons */
>> +	u64 exits;
>> +	u64 io_exits;
>> +	u64 mmio_exits;
>> +	u64 signal_exits;
>> +	u64 irq_window_exits;
>> +	u64 nmi_window_exits;
>> +	u64 halt_exits;
>> +	u64 request_irq_exits;
>> +	u64 irq_exits;
>> +	u64 exception_nmi_exits;
>> +	u64 cr_exits;
>> +	u64 dr_exits;
>> +	u64 cpuid_exits;
>> +	u64 rdpmc_exits;
>> +	u64 update_ppr_exits;
>> +	u64 rdmsr_exits;
>> +	u64 wrmsr_exits;
>> +	u64 apic_access_exits;
>> +	u64 apic_write_exits;
>> +	u64 apic_eoi_exits;
>> +	u64 wbinvd_exits;
>> +	u64 xsetbv_exits;
>> +	u64 task_switch_exits;
>> +	u64 ept_violation_exits;
>> +	u64 pause_exits;
>> +	u64 mwait_exits;
>> +	u64 monitor_trap_exits;
>> +	u64 monitor_exits;
>> +	u64 pml_full_exits;
>> +	u64 preemption_timer_exits;
>> +	/* wrmsr & apic vm-exit reasons */
>> +	u64 wrmsr_set_apic_base;
>> +	u64 wrmsr_set_wall_clock;
>> +	u64 wrmsr_set_system_time;
>> +	u64 wrmsr_set_pmu;
>> +	u64 lapic_set_tscdeadline;
>> +	u64 lapic_set_tpr;
>> +	u64 lapic_set_eoi;
>> +	u64 lapic_set_ldr;
>> +	u64 lapic_set_dfr;
>> +	u64 lapic_set_spiv;
>> +	u64 lapic_set_icr;
>> +	u64 lapic_set_icr2;
>> +	u64 lapic_set_lvt;
>> +	u64 lapic_set_lvtt;
>> +	u64 lapic_set_tmict;
>> +	u64 lapic_set_tdcr;
>> +	u64 lapic_set_esr;
>> +	u64 lapic_set_self_ipi;
>> +	/* cr vm-exit reasons */
>> +	u64 cr_movetocr0;
>> +	u64 cr_movetocr3;
>> +	u64 cr_movetocr4;
>> +	u64 cr_movetocr8;
>> +	u64 cr_movefromcr3;
>> +	u64 cr_movefromcr8;
>> +	u64 cr_clts;
>> +	u64 cr_lmsw;
>> +	/* hypercall vm-exit reasons */
>> +	u64 hypercall_vapic_poll_irq;
>> +	u64 hypercall_kick_cpu;
>> +#ifdef CONFIG_X86_64
>> +	u64 hypercall_clock_pairing;
>> +#endif
>> +	u64 hypercall_send_ipi;
>>   };
> There are quite a few issues with this approach:
>
>    - That's a lot of memory that may never be consumed.
>
>    - struct kvm_vcpu_stat is vendor agnostic, but the exits tracked are
>      very much VMX centric.  And for many of the exits that do overlap with
>      SVM, only the VMX paths are updated by your patch.
>
>    - The granularity of what is tracked is very haphazard, e.g. CLTS and
>      LMSW get their own entries, but NMIs and all exceptions get lumped
>      into a single stat.
>
> And so on and so forth.
>
> Just spitballing, rather than trying to extend the existing stats
> implementation, what if we go in the opposite direction and pare it down
> as much as possible in favor of trace events?  I.e. improve KVM's trace
> events so that userspace can use them to aggregate data for all vCPUS in
> a VM, filter out specific MSRs, etc..., and add post-processing scripts
> for common operations so that users don't need to reinvent the wheel
> every time they want to collect certain information.
>
> Adding a tracepoint for things like fpu_reload or host_state_reload
> is probably overkill, but just about every other x86 stat entry either
> has an existing corresponding tracepoint or could provide much more
> useful info if a tracepoint were added, e.g. to track which IRQ is
> interrupting the guest or what I/O ports are causing exits.
>
> The post-processing scripts might even be able to eliminate some of
> manual analysis, e.g. by collating stats based on guest rip to build a
> picture of what piece of guest code is triggerring what exits.
>
Thanks for your message. I'll do more testing as you suggested.
Fam April 25, 2019, 7:01 a.m. UTC | #3
> On Apr 23, 2019, at 04:54, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Mon, Apr 22, 2019 at 09:24:59PM +0800, zhenwei pi wrote:
>> Hit several typical cases of performance drop due to vm-exit:
>> case 1, jemalloc calls madvise(void *addr, size_t length, MADV_DONTNEED) in
>> guest, IPI causes a lot of exits.
>> case 2, atop collects IPC by perf hardware events in guest, vpmu & rdpmc
>> exits increase a lot.
>> case 3, host memory compaction invalidates TDP and tdp_page_fault can cause
>> huge loss of performance.
>> case 4, web services(written by golang) call futex and have higher latency
>> than host os environment.
>> 
>> Add more vm-exit reason debug entries, they are helpful to recognize
>> performance drop reason. In this patch:
>> 1, add more vm-exit reasons.
>> 2, add wrmsr details.
>> 3, add CR details.
>> 4, add hypercall details.
>> 
>> Currently we can also implement the same result by bpf.
>> Sample code (written by Fei Li<lifei.shirley@bytedance.com>):
>>  b = BPF(text="""
>> 
>>  struct kvm_msr_exit_info {
>>      u32 pid;
>>      u32 tgid;
>>      u32 msr_exit_ct;
>>  };
>>  BPF_HASH(kvm_msr_exit, unsigned int, struct kvm_msr_exit_info, 1024);
>> 
>>  TRACEPOINT_PROBE(kvm, kvm_msr) {
>>      int ct = args->ecx;
>>      if (ct >= 0xffffffff) {
>>          return -1;
>>      }
>> 
>>      u32 pid = bpf_get_current_pid_tgid() >> 32;
>>      u32 tgid = bpf_get_current_pid_tgid();
>> 
>>      struct kvm_msr_exit_info *exit_info;
>>      struct kvm_msr_exit_info init_exit_info = {};
>>      exit_info = kvm_msr_exit.lookup(&ct);
>>      if (exit_info != NULL) {
>>          exit_info->pid = pid;
>>          exit_info->tgid = tgid;
>>          exit_info->msr_exit_ct++;
>>      } else {
>>          init_exit_info.pid = pid;
>>          init_exit_info.tgid = tgid;
>>          init_exit_info.msr_exit_ct = 1;
>>          kvm_msr_exit.update(&ct, &init_exit_info);
>>      }
>>      return 0;
>>  }
>>  """)
>> 
>> Run wrmsr(MSR_IA32_TSCDEADLINE, val) benchmark in guest
>> (CPU Intel Gold 5118):
>> case 1, no bpf on host. ~1127 cycles/wrmsr.
>> case 2, sample bpf on host with JIT. ~1223 cycles/wrmsr.      -->  8.5%
>> case 3, sample bpf on host without JIT. ~1312 cycles/wrmsr.   --> 16.4%
>> 
>> So, debug entries are more efficient than the bpf method.
> 
> How much does host performance matter?  E.g. does high overhead interfere
> with debug, is this something you want to have running at all times, etc…

The intention is to have long-running statistics for monitoring etc. In general,
eBPF, ftrace etc. are fine for not-so-hot points, but this one turned out to be
very difficult to do efficiently without such a code change.

Fam

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a9d03af34030..e7d70ed046b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -945,21 +945,11 @@  struct kvm_vcpu_stat {
 	u64 pf_guest;
 	u64 tlb_flush;
 	u64 invlpg;
-
-	u64 exits;
-	u64 io_exits;
-	u64 mmio_exits;
-	u64 signal_exits;
-	u64 irq_window_exits;
-	u64 nmi_window_exits;
 	u64 l1d_flush;
-	u64 halt_exits;
 	u64 halt_successful_poll;
 	u64 halt_attempted_poll;
 	u64 halt_poll_invalid;
 	u64 halt_wakeup;
-	u64 request_irq_exits;
-	u64 irq_exits;
 	u64 host_state_reload;
 	u64 fpu_reload;
 	u64 insn_emulation;
@@ -968,6 +958,73 @@  struct kvm_vcpu_stat {
 	u64 irq_injections;
 	u64 nmi_injections;
 	u64 req_event;
+
+	/* vm-exit reasons */
+	u64 exits;
+	u64 io_exits;
+	u64 mmio_exits;
+	u64 signal_exits;
+	u64 irq_window_exits;
+	u64 nmi_window_exits;
+	u64 halt_exits;
+	u64 request_irq_exits;
+	u64 irq_exits;
+	u64 exception_nmi_exits;
+	u64 cr_exits;
+	u64 dr_exits;
+	u64 cpuid_exits;
+	u64 rdpmc_exits;
+	u64 update_ppr_exits;
+	u64 rdmsr_exits;
+	u64 wrmsr_exits;
+	u64 apic_access_exits;
+	u64 apic_write_exits;
+	u64 apic_eoi_exits;
+	u64 wbinvd_exits;
+	u64 xsetbv_exits;
+	u64 task_switch_exits;
+	u64 ept_violation_exits;
+	u64 pause_exits;
+	u64 mwait_exits;
+	u64 monitor_trap_exits;
+	u64 monitor_exits;
+	u64 pml_full_exits;
+	u64 preemption_timer_exits;
+	/* wrmsr & apic vm-exit reasons */
+	u64 wrmsr_set_apic_base;
+	u64 wrmsr_set_wall_clock;
+	u64 wrmsr_set_system_time;
+	u64 wrmsr_set_pmu;
+	u64 lapic_set_tscdeadline;
+	u64 lapic_set_tpr;
+	u64 lapic_set_eoi;
+	u64 lapic_set_ldr;
+	u64 lapic_set_dfr;
+	u64 lapic_set_spiv;
+	u64 lapic_set_icr;
+	u64 lapic_set_icr2;
+	u64 lapic_set_lvt;
+	u64 lapic_set_lvtt;
+	u64 lapic_set_tmict;
+	u64 lapic_set_tdcr;
+	u64 lapic_set_esr;
+	u64 lapic_set_self_ipi;
+	/* cr vm-exit reasons */
+	u64 cr_movetocr0;
+	u64 cr_movetocr3;
+	u64 cr_movetocr4;
+	u64 cr_movetocr8;
+	u64 cr_movefromcr3;
+	u64 cr_movefromcr8;
+	u64 cr_clts;
+	u64 cr_lmsw;
+	/* hypercall vm-exit reasons */
+	u64 hypercall_vapic_poll_irq;
+	u64 hypercall_kick_cpu;
+#ifdef CONFIG_X86_64
+	u64 hypercall_clock_pairing;
+#endif
+	u64 hypercall_send_ipi;
 };
 
 struct x86_instruction_info;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd3951638ae4..c02846b0a74f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -959,6 +959,7 @@  int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
 	u32 eax, ebx, ecx, edx;
 
+	++vcpu->stat.cpuid_exits;
 	if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
 		return 1;
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9bf70cf84564..e43aabd3057a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -707,6 +707,7 @@  static void apic_update_ppr(struct kvm_lapic *apic)
 
 void kvm_apic_update_ppr(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.update_ppr_exits;
 	apic_update_ppr(vcpu->arch.apic);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_ppr);
@@ -1199,6 +1200,7 @@  void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
+	++vcpu->stat.apic_eoi_exits;
 	trace_kvm_eoi(apic, vector);
 
 	kvm_ioapic_send_eoi(apic, vector);
@@ -1820,15 +1822,18 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_TASKPRI:
+		++apic->vcpu->stat.lapic_set_tpr;
 		report_tpr_access(apic, true);
 		apic_set_tpr(apic, val & 0xff);
 		break;
 
 	case APIC_EOI:
+		++apic->vcpu->stat.lapic_set_eoi;
 		apic_set_eoi(apic);
 		break;
 
 	case APIC_LDR:
+		++apic->vcpu->stat.lapic_set_ldr;
 		if (!apic_x2apic_mode(apic))
 			kvm_apic_set_ldr(apic, val & APIC_LDR_MASK);
 		else
@@ -1836,6 +1841,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_DFR:
+		++apic->vcpu->stat.lapic_set_dfr;
 		if (!apic_x2apic_mode(apic)) {
 			kvm_lapic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
 			recalculate_apic_map(apic->vcpu->kvm);
@@ -1845,6 +1851,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SPIV: {
 		u32 mask = 0x3ff;
+		++apic->vcpu->stat.lapic_set_spiv;
 		if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
 			mask |= APIC_SPIV_DIRECTED_EOI;
 		apic_set_spiv(apic, val & mask);
@@ -1865,12 +1872,14 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 	}
 	case APIC_ICR:
+		++apic->vcpu->stat.lapic_set_icr;
 		/* No delay here, so we always clear the pending bit */
 		kvm_lapic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
 		apic_send_ipi(apic);
 		break;
 
 	case APIC_ICR2:
+		++apic->vcpu->stat.lapic_set_icr2;
 		if (!apic_x2apic_mode(apic))
 			val &= 0xff000000;
 		kvm_lapic_set_reg(apic, APIC_ICR2, val);
@@ -1883,6 +1892,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	case APIC_LVTPC:
 	case APIC_LVT1:
 	case APIC_LVTERR:
+		++apic->vcpu->stat.lapic_set_lvt;
 		/* TODO: Check vector */
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
@@ -1893,6 +1903,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_LVTT:
+		++apic->vcpu->stat.lapic_set_lvtt;
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
 		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
@@ -1901,6 +1912,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_TMICT:
+		++apic->vcpu->stat.lapic_set_tmict;
 		if (apic_lvtt_tscdeadline(apic))
 			break;
 
@@ -1912,6 +1924,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	case APIC_TDCR: {
 		uint32_t old_divisor = apic->divide_count;
 
+		++apic->vcpu->stat.lapic_set_tdcr;
 		if (val & 4)
 			apic_debug("KVM_WRITE:TDCR %x\n", val);
 		kvm_lapic_set_reg(apic, APIC_TDCR, val);
@@ -1925,6 +1938,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 	}
 	case APIC_ESR:
+		++apic->vcpu->stat.lapic_set_esr;
 		if (apic_x2apic_mode(apic) && val != 0) {
 			apic_debug("KVM_WRITE:ESR not zero %x\n", val);
 			ret = 1;
@@ -1932,6 +1946,7 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_SELF_IPI:
+		++apic->vcpu->stat.lapic_set_self_ipi;
 		if (apic_x2apic_mode(apic)) {
 			kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
 		} else
@@ -1999,6 +2014,7 @@  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
 	u32 val = 0;
 
+	++vcpu->stat.apic_write_exits;
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
@@ -2050,6 +2066,7 @@  void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
+	++vcpu->stat.lapic_set_tscdeadline;
 	if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) ||
 			apic_lvtt_period(apic))
 		return;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index e39741997893..e731e788aed0 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -321,6 +321,7 @@  int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+	++vcpu->stat.wrmsr_set_pmu;
 	return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b4e7d645275a..7c5aecd73827 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4451,6 +4451,7 @@  static int handle_exception(struct kvm_vcpu *vcpu)
 	u32 vect_info;
 	enum emulation_result er;
 
+	++vcpu->stat.exception_nmi_exits;
 	vect_info = vmx->idt_vectoring_info;
 	intr_info = vmx->exit_intr_info;
 
@@ -4657,6 +4658,7 @@  static int handle_cr(struct kvm_vcpu *vcpu)
 	int err;
 	int ret;
 
+	++vcpu->stat.cr_exits;
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	cr = exit_qualification & 15;
 	reg = (exit_qualification >> 8) & 15;
@@ -4666,18 +4668,22 @@  static int handle_cr(struct kvm_vcpu *vcpu)
 		trace_kvm_cr_write(cr, val);
 		switch (cr) {
 		case 0:
+			++vcpu->stat.cr_movetocr0;
 			err = handle_set_cr0(vcpu, val);
 			return kvm_complete_insn_gp(vcpu, err);
 		case 3:
+			++vcpu->stat.cr_movetocr3;
 			WARN_ON_ONCE(enable_unrestricted_guest);
 			err = kvm_set_cr3(vcpu, val);
 			return kvm_complete_insn_gp(vcpu, err);
 		case 4:
+			++vcpu->stat.cr_movetocr4;
 			err = handle_set_cr4(vcpu, val);
 			return kvm_complete_insn_gp(vcpu, err);
 		case 8: {
 				u8 cr8_prev = kvm_get_cr8(vcpu);
 				u8 cr8 = (u8)val;
+				++vcpu->stat.cr_movetocr8;
 				err = kvm_set_cr8(vcpu, cr8);
 				ret = kvm_complete_insn_gp(vcpu, err);
 				if (lapic_in_kernel(vcpu))
@@ -4695,6 +4701,7 @@  static int handle_cr(struct kvm_vcpu *vcpu)
 		}
 		break;
 	case 2: /* clts */
+		++vcpu->stat.cr_clts;
 		WARN_ONCE(1, "Guest should always own CR0.TS");
 		vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
 		trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
@@ -4702,12 +4709,14 @@  static int handle_cr(struct kvm_vcpu *vcpu)
 	case 1: /*mov from cr*/
 		switch (cr) {
 		case 3:
+			++vcpu->stat.cr_movefromcr3;
 			WARN_ON_ONCE(enable_unrestricted_guest);
 			val = kvm_read_cr3(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
 			return kvm_skip_emulated_instruction(vcpu);
 		case 8:
+			++vcpu->stat.cr_movefromcr8;
 			val = kvm_get_cr8(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
@@ -4715,6 +4724,7 @@  static int handle_cr(struct kvm_vcpu *vcpu)
 		}
 		break;
 	case 3: /* lmsw */
+		++vcpu->stat.cr_lmsw;
 		val = (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f;
 		trace_kvm_cr_write(0, (kvm_read_cr0(vcpu) & ~0xful) | val);
 		kvm_lmsw(vcpu, val);
@@ -4734,6 +4744,7 @@  static int handle_dr(struct kvm_vcpu *vcpu)
 	unsigned long exit_qualification;
 	int dr, dr7, reg;
 
+	++vcpu->stat.dr_exits;
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	dr = exit_qualification & DEBUG_REG_ACCESS_NUM;
 
@@ -4830,6 +4841,8 @@  static int handle_rdmsr(struct kvm_vcpu *vcpu)
 	u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
 	struct msr_data msr_info;
 
+	++vcpu->stat.rdmsr_exits;
+
 	msr_info.index = ecx;
 	msr_info.host_initiated = false;
 	if (vmx_get_msr(vcpu, &msr_info)) {
@@ -4853,6 +4866,8 @@  static int handle_wrmsr(struct kvm_vcpu *vcpu)
 	u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
 		| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
 
+	++vcpu->stat.wrmsr_exits;
+
 	msr.data = data;
 	msr.index = ecx;
 	msr.host_initiated = false;
@@ -4910,6 +4925,7 @@  static int handle_rdpmc(struct kvm_vcpu *vcpu)
 {
 	int err;
 
+	++vcpu->stat.rdpmc_exits;
 	err = kvm_rdpmc(vcpu);
 	return kvm_complete_insn_gp(vcpu, err);
 }
@@ -4924,6 +4940,7 @@  static int handle_xsetbv(struct kvm_vcpu *vcpu)
 	u64 new_bv = kvm_read_edx_eax(vcpu);
 	u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX);
 
+	++vcpu->stat.xsetbv_exits;
 	if (kvm_set_xcr(vcpu, index, new_bv) == 0)
 		return kvm_skip_emulated_instruction(vcpu);
 	return 1;
@@ -4945,6 +4962,7 @@  static int handle_xrstors(struct kvm_vcpu *vcpu)
 
 static int handle_apic_access(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.apic_access_exits;
 	if (likely(fasteoi)) {
 		unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 		int access_type, offset;
@@ -4998,6 +5016,7 @@  static int handle_task_switch(struct kvm_vcpu *vcpu)
 	idt_index = (vmx->idt_vectoring_info & VECTORING_INFO_VECTOR_MASK);
 	type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
 
+	++vcpu->stat.task_switch_exits;
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
 	reason = (u32)exit_qualification >> 30;
@@ -5056,6 +5075,7 @@  static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	gpa_t gpa;
 	u64 error_code;
 
+	++vcpu->stat.ept_violation_exits;
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
 	/*
@@ -5268,6 +5288,7 @@  static void vmx_enable_tdp(void)
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.pause_exits;
 	if (!kvm_pause_in_guest(vcpu->kvm))
 		grow_ple_window(vcpu);
 
@@ -5288,6 +5309,7 @@  static int handle_nop(struct kvm_vcpu *vcpu)
 
 static int handle_mwait(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.mwait_exits;
 	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
 	return handle_nop(vcpu);
 }
@@ -5300,11 +5322,13 @@  static int handle_invalid_op(struct kvm_vcpu *vcpu)
 
 static int handle_monitor_trap(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.monitor_trap_exits;
 	return 1;
 }
 
 static int handle_monitor(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.monitor_exits;
 	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
 	return handle_nop(vcpu);
 }
@@ -5412,6 +5436,7 @@  static int handle_pml_full(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
 
+	++vcpu->stat.pml_full_exits;
 	trace_kvm_pml_full(vcpu->vcpu_id);
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -5435,6 +5460,7 @@  static int handle_pml_full(struct kvm_vcpu *vcpu)
 
 static int handle_preemption_timer(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.preemption_timer_exits;
 	if (!to_vmx(vcpu)->req_immediate_exit)
 		kvm_lapic_expired_hv_timer(vcpu);
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a0d1fc80ac5a..938080dafee6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -209,6 +209,59 @@  struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "largepages", VM_STAT(lpages) },
 	{ "max_mmu_page_hash_collisions",
 		VM_STAT(max_mmu_page_hash_collisions) },
+	{ "exception_nmi_exits", VCPU_STAT(exception_nmi_exits) },
+	{ "cr_exits", VCPU_STAT(cr_exits) },
+	{ "dr_exits", VCPU_STAT(dr_exits) },
+	{ "cpuid_exits", VCPU_STAT(cpuid_exits) },
+	{ "rdpmc_exits", VCPU_STAT(rdpmc_exits) },
+	{ "update_ppr_exits", VCPU_STAT(update_ppr_exits) },
+	{ "rdmsr_exits", VCPU_STAT(rdmsr_exits) },
+	{ "wrmsr_exits", VCPU_STAT(wrmsr_exits) },
+	{ "apic_access_exits", VCPU_STAT(apic_access_exits) },
+	{ "apic_write_exits", VCPU_STAT(apic_write_exits) },
+	{ "apic_eoi_exits", VCPU_STAT(apic_eoi_exits) },
+	{ "wbinvd_exits", VCPU_STAT(wbinvd_exits) },
+	{ "xsetbv_exits", VCPU_STAT(xsetbv_exits) },
+	{ "task_switch_exits", VCPU_STAT(task_switch_exits) },
+	{ "ept_violation_exits", VCPU_STAT(ept_violation_exits) },
+	{ "pause_exits", VCPU_STAT(pause_exits) },
+	{ "mwait_exits", VCPU_STAT(mwait_exits) },
+	{ "monitor_trap_exits", VCPU_STAT(monitor_trap_exits) },
+	{ "monitor_exits", VCPU_STAT(monitor_exits) },
+	{ "pml_full_exits", VCPU_STAT(pml_full_exits) },
+	{ "preemption_timer_exits", VCPU_STAT(preemption_timer_exits) },
+	{ "wrmsr_set_apic_base", VCPU_STAT(wrmsr_set_apic_base) },
+	{ "wrmsr_set_wall_clock", VCPU_STAT(wrmsr_set_wall_clock) },
+	{ "wrmsr_set_system_time", VCPU_STAT(wrmsr_set_system_time) },
+	{ "wrmsr_set_pmu", VCPU_STAT(wrmsr_set_pmu) },
+	{ "lapic_set_tscdeadline", VCPU_STAT(lapic_set_tscdeadline) },
+	{ "lapic_set_tpr", VCPU_STAT(lapic_set_tpr) },
+	{ "lapic_set_eoi", VCPU_STAT(lapic_set_eoi) },
+	{ "lapic_set_ldr", VCPU_STAT(lapic_set_ldr) },
+	{ "lapic_set_dfr", VCPU_STAT(lapic_set_dfr) },
+	{ "lapic_set_spiv", VCPU_STAT(lapic_set_spiv) },
+	{ "lapic_set_icr", VCPU_STAT(lapic_set_icr) },
+	{ "lapic_set_icr2", VCPU_STAT(lapic_set_icr2) },
+	{ "lapic_set_lvt", VCPU_STAT(lapic_set_lvt) },
+	{ "lapic_set_lvtt", VCPU_STAT(lapic_set_lvtt) },
+	{ "lapic_set_tmict", VCPU_STAT(lapic_set_tmict) },
+	{ "lapic_set_tdcr", VCPU_STAT(lapic_set_tdcr) },
+	{ "lapic_set_esr", VCPU_STAT(lapic_set_esr) },
+	{ "lapic_set_self_ipi", VCPU_STAT(lapic_set_self_ipi) },
+	{ "cr_movetocr0", VCPU_STAT(cr_movetocr0) },
+	{ "cr_movetocr3", VCPU_STAT(cr_movetocr3) },
+	{ "cr_movetocr4", VCPU_STAT(cr_movetocr4) },
+	{ "cr_movetocr8", VCPU_STAT(cr_movetocr8) },
+	{ "cr_movefromcr3", VCPU_STAT(cr_movefromcr3) },
+	{ "cr_movefromcr8", VCPU_STAT(cr_movefromcr8) },
+	{ "cr_clts", VCPU_STAT(cr_clts) },
+	{ "cr_lmsw", VCPU_STAT(cr_lmsw) },
+	{ "hypercall_vapic_poll_irq", VCPU_STAT(hypercall_vapic_poll_irq) },
+	{ "hypercall_kick_cpu", VCPU_STAT(hypercall_kick_cpu) },
+#ifdef CONFIG_X86_64
+	{ "hypercall_clock_pairing", VCPU_STAT(hypercall_clock_pairing) },
+#endif
+	{ "hypercall_send_ipi", VCPU_STAT(hypercall_send_ipi) },
 	{ NULL }
 };
 
@@ -2519,6 +2572,7 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_KVM_WALL_CLOCK_NEW:
 	case MSR_KVM_WALL_CLOCK:
+		++vcpu->stat.wrmsr_set_wall_clock;
 		vcpu->kvm->arch.wall_clock = data;
 		kvm_write_wall_clock(vcpu->kvm, data);
 		break;
@@ -2526,6 +2580,7 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_SYSTEM_TIME: {
 		struct kvm_arch *ka = &vcpu->kvm->arch;
 
+		++vcpu->stat.wrmsr_set_system_time;
 		kvmclock_reset(vcpu);
 
 		if (vcpu->vcpu_id == 0 && !msr_info->host_initiated) {
@@ -5683,6 +5738,7 @@  static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.wbinvd_exits;
 	kvm_emulate_wbinvd_noskip(vcpu);
 	return kvm_skip_emulated_instruction(vcpu);
 }
@@ -7127,18 +7183,22 @@  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 
 	switch (nr) {
 	case KVM_HC_VAPIC_POLL_IRQ:
+		++vcpu->stat.hypercall_vapic_poll_irq;
 		ret = 0;
 		break;
 	case KVM_HC_KICK_CPU:
+		++vcpu->stat.hypercall_kick_cpu;
 		kvm_pv_kick_cpu_op(vcpu->kvm, a0, a1);
 		ret = 0;
 		break;
 #ifdef CONFIG_X86_64
 	case KVM_HC_CLOCK_PAIRING:
+		++vcpu->stat.hypercall_clock_pairing;
 		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
 		break;
 #endif
 	case KVM_HC_SEND_IPI:
+		++vcpu->stat.hypercall_send_ipi;
 		ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2, a3, op_64_bit);
 		break;
 	default: