Message ID | 20250302220112.17653-5-dongli.zhang@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup | expand |
On 3/3/2025 6:00 AM, Dongli Zhang wrote: > Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured, > there is no way to fully disable KVM AMD PMU virtualization. Neither > "-cpu host,-pmu" nor "-cpu EPYC" achieves this. This looks like a KVM bug. Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with current KVM, I'm fine with it. I have one nit below, otherwise Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > As a result, the following message still appears in the VM dmesg: > > [ 0.263615] Performance Events: AMD PMU driver. > > However, the expected output should be: > > [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only. > [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled > > This occurs because AMD does not use any CPUID bit to indicate PMU > availability. > > To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE > when "-pmu" is configured. > > Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> > --- > Changed since v1: > - Switch back to the initial implementation with "-pmu". > https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com > - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on > Intel platform because current "pmu" property works as expected." > > target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index f41e190fb8..5c8a852dbd 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -176,6 +176,8 @@ static int has_triple_fault_event; > > static bool has_msr_mcg_ext_ctl; > > +static int has_pmu_cap; > + > static struct kvm_cpuid2 *cpuid_cache; > static struct kvm_cpuid2 *hv_cpuid_cache; > static struct kvm_msr_list *kvm_feature_msrs; > @@ -2053,6 +2055,33 @@ full: > > int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp) > { > + static bool first = true; > + int ret; > + > + if (first) { > + first = false; > + > + /* > + * Since Linux v5.18, KVM provides a VM-level capability to easily > + * disable PMUs; however, QEMU has been providing PMU property per > + * CPU since v1.6. In order to accommodate both, have to configure > + * the VM-level capability here. > + * > + * KVM_PMU_CAP_DISABLE doesn't change the PMU > + * behavior on Intel platform because current "pmu" property works > + * as expected. > + */ > + if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) { One nit, it's safer to use (has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu Maybe we can rename has_pmu_cap to pmu_cap as well. > + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0, > + KVM_PMU_CAP_DISABLE); > + if (ret < 0) { > + error_setg_errno(errp, -ret, > + "Failed to set KVM_PMU_CAP_DISABLE"); > + return ret; > + } > + } > + } > + > return 0; > } > > @@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > } > > + has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY); > + > return 0; > } >
On Tue, Mar 04, 2025, Xiaoyao Li wrote: > On 3/3/2025 6:00 AM, Dongli Zhang wrote: > > Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured, > > there is no way to fully disable KVM AMD PMU virtualization. Neither > > "-cpu host,-pmu" nor "-cpu EPYC" achieves this. > > This looks like a KVM bug. Heh, the patches you sent do fix _a_ KVM bug, but this is something else entirely. In practice, the KVM bug only affects what KVM_GET_SUPPORTED_CPUID returns when enable_pmu=false, and in that case, it's only a reporting issue, i.e. KVM will still block usage of the PMU. As Dongli pointed out, older AMD CPUs don't actually enumerate a PMU in CPUID, and so the kernel assumes that not-too-old CPUs have a PMU: /* Performance-monitoring supported from K7 and later: */ if (boot_cpu_data.x86 < 6) return -ENODEV; The "expected" output: Performance Events: PMU not available due to virtualization, using software events only. is a long-standing workaround in the kernel to deal with lack of enumeration. On top of explicit enumeration, init_hw_perf_events() => check_hw_exists() probes hardware to see if it actually works. If an MSR is unexpectedly unavailable, as is the case when running as a guest, the kernel prints a message and disables PMU usage. E.g. the above message is specific to running as a guest: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { pr_cont("PMU not available due to virtualization, using software events only.\n"); From the KVM side, because there's no CPUID enumeration, there's no way for KVM to know that userspace wants to completely disable PMU virtualization from CPUID alone. Whereas with Intel CPUs, KVM infers that the PMU should be disabled by lack of a non-zero PMU version, e.g. if CPUID.0xA is omitted. > Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with > current KVM, I'm fine with it. Yeah, this is the only way other than disabling KVM's PMU virtualization via module param (enable_pmu).
On 3/5/2025 9:22 AM, Sean Christopherson wrote: > On Tue, Mar 04, 2025, Xiaoyao Li wrote: >> On 3/3/2025 6:00 AM, Dongli Zhang wrote: >>> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured, >>> there is no way to fully disable KVM AMD PMU virtualization. Neither >>> "-cpu host,-pmu" nor "-cpu EPYC" achieves this. >> >> This looks like a KVM bug. > > Heh, the patches you sent do fix _a_ KVM bug, but this is something else entirely. Aha, that fix was just found by code inspection. It was not supposed to be related with this. > In practice, the KVM bug only affects what KVM_GET_SUPPORTED_CPUID returns when > enable_pmu=false, and in that case, it's only a reporting issue, i.e. KVM will > still block usage of the PMU. > > As Dongli pointed out, older AMD CPUs don't actually enumerate a PMU in CPUID, > and so the kernel assumes that not-too-old CPUs have a PMU: > > /* Performance-monitoring supported from K7 and later: */ > if (boot_cpu_data.x86 < 6) > return -ENODEV; > > The "expected" output: > > Performance Events: PMU not available due to virtualization, using software events only. > > is a long-standing workaround in the kernel to deal with lack of enumeration. On > top of explicit enumeration, init_hw_perf_events() => check_hw_exists() probes > hardware to see if it actually works. If an MSR is unexpectedly unavailable, as > is the case when running as a guest, the kernel prints a message and disables PMU > usage. E.g. the above message is specific to running as a guest: > > if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { > pr_cont("PMU not available due to virtualization, using software events only.\n"); > > From the KVM side, because there's no CPUID enumeration, there's no way for KVM > to know that userspace wants to completely disable PMU virtualization from CPUID > alone. Whereas with Intel CPUs, KVM infers that the PMU should be disabled by > lack of a non-zero PMU version, e.g. if CPUID.0xA is omitted. I see now. Thanks to you and Dongli! >> Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with >> current KVM, I'm fine with it. > > Yeah, this is the only way other than disabling KVM's PMU virtualization via > module param (enable_pmu).
> > + if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) { > > One nit, it's safer to use > > (has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu > > Maybe we can rename has_pmu_cap to pmu_cap as well. Yes, I agree. Regards, Zhao
On Sun, Mar 02, 2025 at 02:00:12PM -0800, Dongli Zhang wrote: > Date: Sun, 2 Mar 2025 14:00:12 -0800 > From: Dongli Zhang <dongli.zhang@oracle.com> > Subject: [PATCH v2 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if > "-pmu" is configured > X-Mailer: git-send-email 2.43.5 > > Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured, > there is no way to fully disable KVM AMD PMU virtualization. Neither > "-cpu host,-pmu" nor "-cpu EPYC" achieves this. > > As a result, the following message still appears in the VM dmesg: > > [ 0.263615] Performance Events: AMD PMU driver. > > However, the expected output should be: > > [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only. > [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled > > This occurs because AMD does not use any CPUID bit to indicate PMU > availability. > > To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE > when "-pmu" is configured. > > Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> > --- > Changed since v1: > - Switch back to the initial implementation with "-pmu". > https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com > - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on > Intel platform because current "pmu" property works as expected." > > target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) Overall LGTM. And with Xiaoyao's comment fixed :-) Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Thanks, Zhao
Hi Xiaoyao and Zhao, On 3/5/25 6:41 AM, Zhao Liu wrote: >>> + if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) { >> >> One nit, it's safer to use >> >> (has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu >> >> Maybe we can rename has_pmu_cap to pmu_cap as well. > > Yes, I agree. > Thanks both of you very much! I also need to modify PATCH 08/10 where has_pmu_cap is used. [PATCH v2 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index f41e190fb8..5c8a852dbd 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -176,6 +176,8 @@ static int has_triple_fault_event; static bool has_msr_mcg_ext_ctl; +static int has_pmu_cap; + static struct kvm_cpuid2 *cpuid_cache; static struct kvm_cpuid2 *hv_cpuid_cache; static struct kvm_msr_list *kvm_feature_msrs; @@ -2053,6 +2055,33 @@ full: int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp) { + static bool first = true; + int ret; + + if (first) { + first = false; + + /* + * Since Linux v5.18, KVM provides a VM-level capability to easily + * disable PMUs; however, QEMU has been providing PMU property per + * CPU since v1.6. In order to accommodate both, have to configure + * the VM-level capability here. + * + * KVM_PMU_CAP_DISABLE doesn't change the PMU + * behavior on Intel platform because current "pmu" property works + * as expected. + */ + if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) { + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0, + KVM_PMU_CAP_DISABLE); + if (ret < 0) { + error_setg_errno(errp, -ret, + "Failed to set KVM_PMU_CAP_DISABLE"); + return ret; + } + } + } + return 0; } @@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } + has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY); + return 0; }
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured, there is no way to fully disable KVM AMD PMU virtualization. Neither "-cpu host,-pmu" nor "-cpu EPYC" achieves this. As a result, the following message still appears in the VM dmesg: [ 0.263615] Performance Events: AMD PMU driver. However, the expected output should be: [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only. [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled This occurs because AMD does not use any CPUID bit to indicate PMU availability. To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE when "-pmu" is configured. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> --- Changed since v1: - Switch back to the initial implementation with "-pmu". https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on Intel platform because current "pmu" property works as expected." target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)