Message ID | 20221111102645.82001-7-likexu@tencent.com (mailing list archive)
---|---
State | New, archived
Series | KVM: x86: Add AMD Guest PerfMonV2 PMU support
On Fri, Nov 11, 2022, Like Xu wrote:
> @@ -162,20 +179,42 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +	struct kvm_cpuid_entry2 *entry;
> +	union cpuid_0x80000022_ebx ebx;
>
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> -		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
> +	pmu->version = 1;
> +	if (kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2) &&

Why check kvm_cpu_cap support?  I.e. what will go wrong if userspace enumerates
PMU v2 to the guest without proper hardware/KVM support?

If this is _necessary_ to protect the host kernel, then we should probably have
a helper to query PMU features, e.g.

	static __always_inline bool guest_pmu_has(struct kvm_vcpu *vcpu,
						  unsigned int x86_feature)
	{
		return kvm_cpu_cap_has(x86_feature) &&
		       guest_cpuid_has(vcpu, x86_feature);
	}

> +	    guest_cpuid_has(vcpu, X86_FEATURE_AMD_PMU_V2)) {
> +		pmu->version = 2;
> +		entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
> +		ebx.full = entry->ebx;
> +		pmu->nr_arch_gp_counters = min3((unsigned int)ebx.split.num_core_pmc,
> +						(unsigned int)kvm_pmu_cap.num_counters_gp,
> +						(unsigned int)KVM_AMD_PMC_MAX_GENERIC);

Blech.  This really shouldn't be necessary; KVM should tweak kvm_pmu_cap.num_counters_gp
as needed during initialization to ensure num_counters_gp doesn't exceed KVM's
internal limits.

Posted a patch[*], please take a look.  As mentioned in that thread, I'll somewhat
speculatively apply that series sooner than later so that you can use it as a base
for this series (assuming the patch isn't busted).

[*] https://lore.kernel.org/all/20230124234905.3774678-2-seanjc@google.com

> +	}
> +
> +	/* Commitment to minimal PMCs, regardless of CPUID.80000022 */

Please expand this comment.  I'm still not entirely sure I've interpreted it
correctly, and I'm not sure that I agree with the code.

> +	if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE) &&

AFAICT, checking kvm_cpu_cap_has() is an unrelated change.  Either it's a bug fix
and belongs in a separate patch, or it's unnecessary and should be dropped.

> +	    guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> +		pmu->nr_arch_gp_counters = max_t(unsigned int,
> +						 pmu->nr_arch_gp_counters,
> +						 AMD64_NUM_COUNTERS_CORE);
>  	else
> -		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
> +		pmu->nr_arch_gp_counters = max_t(unsigned int,
> +						 pmu->nr_arch_gp_counters,
> +						 AMD64_NUM_COUNTERS);

Using max() doesn't look right.  E.g. if KVM ends up running on some odd setup
where ebx.split.num_core_pmc/kvm_pmu_cap.num_counters_gp is less than
AMD64_NUM_COUNTERS_CORE or AMD64_NUM_COUNTERS.

Or more likely, if userspace says "only expose N counters to this guest".

Shouldn't this be something like?

	if (guest_cpuid_has(vcpu, X86_FEATURE_AMD_PMU_V2))
		pmu->nr_arch_gp_counters = min(ebx.split.num_core_pmc,
					       kvm_pmu_cap.num_counters_gp);
	else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
	else
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
On 25/1/2023 8:10 am, Sean Christopherson wrote:
> On Fri, Nov 11, 2022, Like Xu wrote:
>> @@ -162,20 +179,42 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +	struct kvm_cpuid_entry2 *entry;
>> +	union cpuid_0x80000022_ebx ebx;
>>
>> -	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
>> -		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
>> +	pmu->version = 1;
>> +	if (kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2) &&
>
> Why check kvm_cpu_cap support?  I.e. what will go wrong if userspace enumerates
> PMU v2 to the guest without proper hardware/KVM support?
>
> If this is _necessary_ to protect the host kernel, then we should probably have
> a helper to query PMU features, e.g.
>
> 	static __always_inline bool guest_pmu_has(struct kvm_vcpu *vcpu,
> 						  unsigned int x86_feature)
> 	{
> 		return kvm_cpu_cap_has(x86_feature) &&
> 		       guest_cpuid_has(vcpu, x86_feature);
> 	}
>
>> +	    guest_cpuid_has(vcpu, X86_FEATURE_AMD_PMU_V2)) {
>> +		pmu->version = 2;
>> +		entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
>> +		ebx.full = entry->ebx;
>> +		pmu->nr_arch_gp_counters = min3((unsigned int)ebx.split.num_core_pmc,
>> +						(unsigned int)kvm_pmu_cap.num_counters_gp,
>> +						(unsigned int)KVM_AMD_PMC_MAX_GENERIC);
>
> Blech.  This really shouldn't be necessary; KVM should tweak kvm_pmu_cap.num_counters_gp
> as needed during initialization to ensure num_counters_gp doesn't exceed KVM's
> internal limits.
>
> Posted a patch[*], please take a look.  As mentioned in that thread, I'll somewhat
> speculatively apply that series sooner than later so that you can use it as a base
> for this series (assuming the patch isn't busted).
>
> [*] https://lore.kernel.org/all/20230124234905.3774678-2-seanjc@google.com
>
>> +	}
>> +
>> +	/* Commitment to minimal PMCs, regardless of CPUID.80000022 */
>
> Please expand this comment.  I'm still not entirely sure I've interpreted it
> correctly, and I'm not sure that I agree with the code.

In the first version [1], I used almost the same if-elif-else sequence, but the
concerns from Jim [2] changed my mind:

"Nonetheless, for compatibility with old software, Fn8000_0022_EBX can never
report less than four counters (or six, if Fn8000_0001_ECX[PerfCtrExtCore] is
set)."

Both in amd_pmu_refresh() and in __do_cpuid_func(), KVM implements this using
an override approach: first apply the AMD_PMU_V2 semantics, then enforce a
minimum number of counters based on whether or not the guest has PERFCTR_CORE.
The proposed if-elif-else does not fulfill this need.

[1] 20220905123946.95223-4-likexu@tencent.com/
[2] CALMp9eQObuiJGV=YrAU9Fw+KoXfJtZMJ-KUs-qCOVd+R9zGBpw@mail.gmail.com

>> +	if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE) &&
>
> AFAICT, checking kvm_cpu_cap_has() is an unrelated change.  Either it's a bug fix
> and belongs in a separate patch, or it's unnecessary and should be dropped.
>
>> +	    guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
>> +		pmu->nr_arch_gp_counters = max_t(unsigned int,
>> +						 pmu->nr_arch_gp_counters,
>> +						 AMD64_NUM_COUNTERS_CORE);
>>  	else
>> -		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
>> +		pmu->nr_arch_gp_counters = max_t(unsigned int,
>> +						 pmu->nr_arch_gp_counters,
>> +						 AMD64_NUM_COUNTERS);
>
> Using max() doesn't look right.  E.g. if KVM ends up running on some odd setup
> where ebx.split.num_core_pmc/kvm_pmu_cap.num_counters_gp is less than
> AMD64_NUM_COUNTERS_CORE or AMD64_NUM_COUNTERS.
>
> Or more likely, if userspace says "only expose N counters to this guest".
>
> Shouldn't this be something like?
>
> 	if (guest_cpuid_has(vcpu, X86_FEATURE_AMD_PMU_V2))
> 		pmu->nr_arch_gp_counters = min(ebx.split.num_core_pmc,
> 					       kvm_pmu_cap.num_counters_gp);
> 	else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
> 	else
> 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
On Mon, Feb 06, 2023, Like Xu wrote:
> On 25/1/2023 8:10 am, Sean Christopherson wrote:
> > > +	}
> > > +
> > > +	/* Commitment to minimal PMCs, regardless of CPUID.80000022 */
> >
> > Please expand this comment.  I'm still not entirely sure I've interpreted it
> > correctly, and I'm not sure that I agree with the code.
>
> In the first version [1], I used almost the same if-elif-else sequence, but
> the concerns from Jim [2] changed my mind:
>
> "Nonetheless, for compatibility with old software, Fn8000_0022_EBX can never
> report less than four counters (or six, if Fn8000_0001_ECX[PerfCtrExtCore] is
> set)."
>
> Both in amd_pmu_refresh() and in __do_cpuid_func(), KVM implements this using
> an override approach: first apply the AMD_PMU_V2 semantics, then enforce a
> minimum number of counters based on whether or not the guest has PERFCTR_CORE.
> The proposed if-elif-else does not fulfill this need.

Jim's comments were in the context of __do_cpuid_func(), i.e. KVM_GET_SUPPORTED_CPUID.
As far as guest CPUID is concerned, that's userspace's responsibility to get
correct.

And for KVM_GET_SUPPORTED_CPUID, overriding kvm_pmu_cap.num_counters_gp is not
the correct approach.  KVM should sanity check the number of counters enumerated
by perf and explicitly disable vPMU support if the min isn't met.  E.g. if KVM
needs 6 counters and perf says there are 4, then something is wrong and
enumerating 6 to the guest is only going to cause more problems.

> [1] 20220905123946.95223-4-likexu@tencent.com/
> [2] CALMp9eQObuiJGV=YrAU9Fw+KoXfJtZMJ-KUs-qCOVd+R9zGBpw@mail.gmail.com
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 81114a376c4e..d02990fcd46f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -512,6 +512,7 @@ struct kvm_pmc {
 #define MSR_ARCH_PERFMON_EVENTSEL_MAX	(MSR_ARCH_PERFMON_EVENTSEL0 + KVM_INTEL_PMC_MAX_GENERIC - 1)
 #define KVM_PMC_MAX_FIXED	3
 #define KVM_AMD_PMC_MAX_GENERIC	6
+#define MSR_F15H_PERF_MSR_MAX	(MSR_F15H_PERF_CTR0 + 2 * (KVM_AMD_PMC_MAX_GENERIC - 1))
 
 struct kvm_pmu {
 	unsigned nr_arch_gp_counters;
 	unsigned nr_arch_fixed_counters;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a3726af5416d..c70ff57ee44c 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -471,12 +471,15 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr) {
 	case MSR_CORE_PERF_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 		msr_info->data = pmu->global_status;
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_CTRL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 		msr_info->data = pmu->global_ctrl;
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
 		msr_info->data = 0;
 		return 0;
 	default:
@@ -495,12 +498,14 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr) {
 	case MSR_CORE_PERF_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 		if (!msr_info->host_initiated)
 			return 1; /* RO MSR */
 		pmu->global_status = data;
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_CTRL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 		if (!kvm_valid_perf_global_ctrl(pmu, data))
 			return 1;
@@ -511,6 +516,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
 		if (data & pmu->global_ovf_ctrl_mask)
 			return 1;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 4e7d7e6cccec..e58f39f8f10b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -92,12 +92,6 @@ static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	return amd_pmc_idx_to_pmc(vcpu_to_pmu(vcpu), idx & ~(3u << 30));
 }
 
-static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
-{
-	/* All MSRs refer to exactly one PMC, so msr_idx_to_pmc is enough. */
-	return false;
-}
-
 static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -109,6 +103,29 @@ static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 	return pmc;
 }
 
+static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	switch (msr) {
+	case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
+		return pmu->version > 0;
+	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+		return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		return pmu->version > 1;
+	default:
+		if (msr > MSR_F15H_PERF_CTR5 &&
+		    msr < MSR_F15H_PERF_CTL0 + 2 * pmu->nr_arch_gp_counters)
+			return pmu->version > 1;
+		break;
+	}
+
+	return amd_msr_idx_to_pmc(vcpu, msr);
+}
+
 static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -162,20 +179,42 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_cpuid_entry2 *entry;
+	union cpuid_0x80000022_ebx ebx;
 
-	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
-		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
+	pmu->version = 1;
+	if (kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_AMD_PMU_V2)) {
+		pmu->version = 2;
+		entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
+		ebx.full = entry->ebx;
+		pmu->nr_arch_gp_counters = min3((unsigned int)ebx.split.num_core_pmc,
+						(unsigned int)kvm_pmu_cap.num_counters_gp,
+						(unsigned int)KVM_AMD_PMC_MAX_GENERIC);
+	}
+
+	/* Commitment to minimal PMCs, regardless of CPUID.80000022 */
+	if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
+		pmu->nr_arch_gp_counters = max_t(unsigned int,
+						 pmu->nr_arch_gp_counters,
+						 AMD64_NUM_COUNTERS_CORE);
 	else
-		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
+		pmu->nr_arch_gp_counters = max_t(unsigned int,
+						 pmu->nr_arch_gp_counters,
+						 AMD64_NUM_COUNTERS);
+
+	if (pmu->version > 1) {
+		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask;
+	}
 
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
-	pmu->version = 1;
 	/* not applicable to AMD; but clean them to prevent any fall out */
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
-	pmu->global_status = 0;
 	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
@@ -186,6 +225,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
 
 	BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC > AMD64_NUM_COUNTERS_CORE);
 	BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC > INTEL_PMC_MAX_GENERIC);
+	BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC < 1);
 
 	for (i = 0; i < KVM_AMD_PMC_MAX_GENERIC ; i++) {
 		pmu->gp_counters[i].type = KVM_PMC_GP;
@@ -206,6 +246,8 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc_stop_counter(pmc);
 		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
+
+	pmu->global_ctrl = pmu->global_status = 0;
 }
 
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e46e458c5b08..99bc47f1a40e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1458,6 +1458,10 @@ static const u32 msrs_to_save_all[] = {
 	MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
 	MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
 
+	MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 };
 
@@ -3859,7 +3863,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_PEBS_ENABLE:
 	case MSR_IA32_DS_AREA:
 	case MSR_PEBS_DATA_CFG:
-	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_MSR_MAX:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
 		/*
@@ -3962,7 +3969,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_PEBS_ENABLE:
 	case MSR_IA32_DS_AREA:
 	case MSR_PEBS_DATA_CFG:
-	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_MSR_MAX:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info);
 		/*