Message ID | 20211130074221.93635-6-likexu@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86/pmu: Count two basic events for emulated instructions |
On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> When KVM retires a guest instruction through emulation, increment any
> vPMCs that are configured to monitor "instructions retired," and
> update the sample period of those counters so that they will overflow
> at the right time.
>
> Signed-off-by: Eric Hankland <ehankland@google.com>
> [jmattson:
>   - Split the code to increment "branch instructions retired" into a
>     separate commit.
>   - Added 'static' to kvm_pmu_incr_counter() definition.
>   - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
>     PERF_EVENT_STATE_ACTIVE.
> ]
> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> [likexu:
>   - Drop checks for pmc->perf_event or event state or event type
>   - Increase a counter once its umask bits and the first 8 select bits are matched
>   - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
>   - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
>   - Add counter enable and CPL check for kvm_pmu_trigger_event();
> ]
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>
> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
> +{
> +       struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +       struct kvm_pmc *pmc;
> +       int i;
> +
> +       for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
> +               pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
> +
> +               if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
> +                       continue;
> +
> +               /* Ignore checks for edge detect, pin control, invert and CMASK bits */

I don't understand how we can ignore these checks. Doesn't that
violate the architectural specification?

> +               if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
> +                       kvm_pmu_incr_counter(pmc);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
> +
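For reference, the modifier bits under discussion are the architectural PERFEVTSEL fields: edge detect (bit 18), pin control (bit 19), invert (bit 23) and the counter mask (bits 24-31). The stand-alone sketch below (the EVTSEL_* constants are defined locally for illustration, not taken from the kernel headers) shows the kind of check that would refuse to software-count an event whose eventsel sets any of them, since each of those bits changes how real hardware would count:

/* Illustrative only: bit positions follow the architectural
 * IA32_PERFEVTSELx layout; constants are local to this snippet. */
#include <stdbool.h>
#include <stdint.h>

#define EVTSEL_EDGE        (1ULL << 18)    /* edge detect */
#define EVTSEL_PIN_CONTROL (1ULL << 19)    /* pin control */
#define EVTSEL_INV         (1ULL << 23)    /* invert counter-mask comparison */
#define EVTSEL_CMASK       (0xffULL << 24) /* counter mask (CMASK) */

/* True if the eventsel counts events "plainly", i.e. none of the
 * modifier bits that alter the counting semantics are set. */
static bool eventsel_has_no_modifiers(uint64_t eventsel)
{
        return !(eventsel & (EVTSEL_EDGE | EVTSEL_PIN_CONTROL |
                             EVTSEL_INV | EVTSEL_CMASK));
}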
On 9/12/2021 12:33 pm, Jim Mattson wrote:
> On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> When KVM retires a guest instruction through emulation, increment any
>> vPMCs that are configured to monitor "instructions retired," and
>> update the sample period of those counters so that they will overflow
>> at the right time.
>>
>> Signed-off-by: Eric Hankland <ehankland@google.com>
>> [jmattson:
>>   - Split the code to increment "branch instructions retired" into a
>>     separate commit.
>>   - Added 'static' to kvm_pmu_incr_counter() definition.
>>   - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
>>     PERF_EVENT_STATE_ACTIVE.
>> ]
>> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
>> Signed-off-by: Jim Mattson <jmattson@google.com>
>> [likexu:
>>   - Drop checks for pmc->perf_event or event state or event type
>>   - Increase a counter once its umask bits and the first 8 select bits are matched
>>   - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
>>   - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
>>   - Add counter enable and CPL check for kvm_pmu_trigger_event();
>> ]
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>
>> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
>> +{
>> +       struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +       struct kvm_pmc *pmc;
>> +       int i;
>> +
>> +       for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
>> +               pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
>> +
>> +               if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
>> +                       continue;
>> +
>> +               /* Ignore checks for edge detect, pin control, invert and CMASK bits */
>
> I don't understand how we can ignore these checks. Doesn't that
> violate the architectural specification?

OK, let's take a conservative approach in the V3.

>
>> +               if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
>> +                       kvm_pmu_incr_counter(pmc);
>> +       }
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
>> +
>
>>> +               /* Ignore checks for edge detect, pin control, invert and
>>> CMASK bits */
>>
>> I don't understand how we can ignore these checks. Doesn't that
>> violate the architectural specification?
>
> OK, let's take a conservative approach in the V3.
>

Hi Jim, does this version look good to you?

---

From 4ad42d98ce26d324fa2f72c38fe2c42fe04f2d6d Mon Sep 17 00:00:00 2001
From: Like Xu <likexu@tencent.com>
Date: Tue, 30 Nov 2021 15:42:20 +0800
Subject: [PATCH 5/6] KVM: x86: Update vPMCs when retiring instructions

From: Like Xu <likexu@tencent.com>

When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Added 'static' to kvm_pmu_incr_counter() definition.
  - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
    PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
[likexu:
  - Drop checks for pmc->perf_event or event state or event type
  - Increase a counter only if its umask bits and the first 8 select bits are matched
  - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
  - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
  - Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c | 73 ++++++++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/pmu.h |  1 +
 arch/x86/kvm/x86.c |  3 ++
 3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a20207ee4014..db510dae3241 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -22,6 +22,14 @@
 /* This is enough to filter the vast majority of currently defined events. */
 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
 
+#define PMC_EVENTSEL_ARCH_MASK \
+        (ARCH_PERFMON_EVENTSEL_EVENT | \
+         ARCH_PERFMON_EVENTSEL_UMASK | \
+         ARCH_PERFMON_EVENTSEL_USR | \
+         ARCH_PERFMON_EVENTSEL_OS | \
+         ARCH_PERFMON_EVENTSEL_INT | \
+         ARCH_PERFMON_EVENTSEL_ENABLE)
+
 /* NOTE:
  * - Each perf counter is defined as "struct kvm_pmc";
  * - There are two types of perf counters: general purpose (gp) and fixed.
@@ -203,11 +211,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (!allow_event)
 		return;
 
-	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
-			  ARCH_PERFMON_EVENTSEL_INV |
-			  ARCH_PERFMON_EVENTSEL_CMASK |
-			  HSW_IN_TX |
-			  HSW_IN_TX_CHECKPOINTED))) {
+	if (!(eventsel & ~PMC_EVENTSEL_ARCH_MASK)) {
 		config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
 		if (config != PERF_COUNT_HW_MAX)
 			type = PERF_TYPE_HARDWARE;
@@ -482,6 +486,65 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 	kvm_pmu_reset(vcpu);
 }
 
+static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u64 prev_count;
+
+	prev_count = pmc->counter;
+	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+
+	reprogram_counter(pmu, pmc->idx);
+	if (pmc->counter < prev_count)
+		__kvm_perf_overflow(pmc, false);
+}
+
+static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
+					     unsigned int perf_hw_id)
+{
+	if (pmc->eventsel & ~PMC_EVENTSEL_ARCH_MASK)
+		return false;
+
+	return kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc) == perf_hw_id;
+}
+
+static inline bool cpl_is_matched(struct kvm_pmc *pmc)
+{
+	bool select_os, select_user;
+	u64 config = pmc->current_config;
+
+	if (pmc_is_gp(pmc)) {
+		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
+		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
+	} else {
+		select_os = config & 0x1;
+		select_user = config & 0x2;
+	}
+
+	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
+}
+
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	int i;
+
+	if (!pmu->version)
+		return;
+
+	for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
+			continue;
+
+		if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
+			kvm_pmu_incr_counter(pmc);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c91d9725aafd..7a7b8d5b775e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -157,6 +157,7 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1aaf37e1bd0f..68b65e243eb3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7980,6 +7980,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	if (unlikely(!r))
 		return 0;
 
+	kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
+
 	/*
 	 * rflags is the old, "raw" value of the flags.  The new value has
 	 * not been saved yet.
@@ -8242,6 +8244,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
 			kvm_rip_write(vcpu, ctxt->eip);
 			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 				r = kvm_vcpu_do_singlestep(vcpu);
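From the guest's side, the path added above only matters for a counter that is already programmed to count instructions retired. Below is a rough guest-side sketch, assuming an Intel-style vPMU of version 2 or later with at least one general-purpose counter; the MSR indices and the event 0xC0 / umask 0x00 encoding are architectural, while the helper names and surrounding code are purely illustrative:

/* Guest-side sketch: run at CPL0 inside the guest (e.g. from a small
 * test kernel module), since RDMSR/WRMSR are privileged. */
#include <stdint.h>

#define MSR_IA32_PMC0             0x0c1
#define MSR_IA32_PERFEVTSEL0      0x186
#define MSR_IA32_PERF_GLOBAL_CTRL 0x38f

#define EVTSEL_USR       (1ULL << 16)
#define EVTSEL_OS        (1ULL << 17)
#define EVTSEL_EN        (1ULL << 22)
#define INST_RETIRED_ANY 0xc0          /* event 0xC0, umask 0x00 */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
        asm volatile("wrmsr" :: "c"(msr), "a"((uint32_t)val),
                     "d"((uint32_t)(val >> 32)));
}

static inline uint64_t rdmsr(uint32_t msr)
{
        uint32_t lo, hi;

        asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
}

static uint64_t count_retired(void (*workload)(void))
{
        /* Program GP counter 0 for "instructions retired" at any CPL. */
        wrmsr(MSR_IA32_PMC0, 0);
        wrmsr(MSR_IA32_PERFEVTSEL0,
              INST_RETIRED_ANY | EVTSEL_USR | EVTSEL_OS | EVTSEL_EN);
        wrmsr(MSR_IA32_PERF_GLOBAL_CTRL, 1);    /* enable GP counter 0 */

        workload();

        wrmsr(MSR_IA32_PERF_GLOBAL_CTRL, 0);    /* stop counting */
        return rdmsr(MSR_IA32_PMC0);
}

Running a workload that always exits to KVM (CPUID in a loop, for instance, which is retired via kvm_skip_emulated_instruction()) is a simple way to observe that emulated instructions are now reflected in the value read back from IA32_PMC0.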
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a20207ee4014..8abdadb7e22a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -482,6 +482,66 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 	kvm_pmu_reset(vcpu);
 }
 
+static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u64 prev_count;
+
+	prev_count = pmc->counter;
+	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+
+	reprogram_counter(pmu, pmc->idx);
+	if (pmc->counter < prev_count)
+		__kvm_perf_overflow(pmc, false);
+}
+
+static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
+					     unsigned int perf_hw_id)
+{
+	u64 old_eventsel = pmc->eventsel;
+	unsigned int config;
+
+	pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
+	config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
+	pmc->eventsel = old_eventsel;
+	return config == perf_hw_id;
+}
+
+static inline bool cpl_is_matched(struct kvm_pmc *pmc)
+{
+	bool select_os, select_user;
+	u64 config = pmc->current_config;
+
+	if (pmc_is_gp(pmc)) {
+		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
+		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
+	} else {
+		select_os = config & 0x1;
+		select_user = config & 0x2;
+	}
+
+	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
+}
+
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	int i;
+
+	for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
+			continue;
+
+		/* Ignore checks for edge detect, pin control, invert and CMASK bits */
+		if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
+			kvm_pmu_incr_counter(pmc);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c91d9725aafd..7a7b8d5b775e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -157,6 +157,7 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a05a26471f19..83371be00771 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7978,6 +7978,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	if (unlikely(!r))
 		return 0;
 
+	kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
+
 	/*
 	 * rflags is the old, "raw" value of the flags.  The new value has
 	 * not been saved yet.
@@ -8240,6 +8242,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
 			kvm_rip_write(vcpu, ctxt->eip);
 			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 				r = kvm_vcpu_do_singlestep(vcpu);
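One detail shared by both revisions is the fixed-counter branch of cpl_is_matched(): for fixed counters the OS/USR selection does not come from an eventsel but from the per-counter 4-bit field in IA32_FIXED_CTR_CTRL, which is why the code tests bits 0x1 and 0x2 of pmc->current_config. A small stand-alone illustration of that field layout (constants and helper names are local to the snippet):

#include <stdbool.h>
#include <stdint.h>

/* IA32_FIXED_CTR_CTRL packs one 4-bit control field per fixed counter:
 * bit 0 counts at CPL0 (OS), bit 1 counts at CPL>0 (USR),
 * bit 2 is any-thread mode, bit 3 enables the overflow PMI. */
static uint8_t fixed_ctrl_field(uint64_t fixed_ctr_ctrl, int idx)
{
        return (fixed_ctr_ctrl >> (idx * 4)) & 0xf;
}

/* Mirrors the fixed-counter branch of cpl_is_matched(): should an
 * event observed at the given CPL be attributed to fixed counter idx? */
static bool fixed_counter_counts_at_cpl(uint64_t fixed_ctr_ctrl, int idx,
                                        unsigned int cpl)
{
        uint8_t ctrl = fixed_ctrl_field(fixed_ctr_ctrl, idx);

        return (cpl == 0) ? (ctrl & 0x1) : (ctrl & 0x2);
}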