diff mbox series

[v2,04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured

Message ID 20250302220112.17653-5-dongli.zhang@oracle.com (mailing list archive)
State New, archived
Headers show
Series target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup | expand

Commit Message

Dongli Zhang March 2, 2025, 10 p.m. UTC
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.

As a result, the following message still appears in the VM dmesg:

[    0.263615] Performance Events: AMD PMU driver.

However, the expected output should be:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This occurs because AMD does not use any CPUID bit to indicate PMU
availability.

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  - Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
  - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
    Intel platform because current "pmu" property works as expected."

 target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Xiaoyao Li March 4, 2025, 7:59 a.m. UTC | #1
On 3/3/2025 6:00 AM, Dongli Zhang wrote:
> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> there is no way to fully disable KVM AMD PMU virtualization. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.

This looks like a KVM bug.

Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with 
current KVM, I'm fine with it.

I have one nit below, otherwise

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> As a result, the following message still appears in the VM dmesg:
> 
> [    0.263615] Performance Events: AMD PMU driver.
> 
> However, the expected output should be:
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> This occurs because AMD does not use any CPUID bit to indicate PMU
> availability.
> 
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>    - Switch back to the initial implementation with "-pmu".
> https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
>    - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
>      Intel platform because current "pmu" property works as expected."
> 
>   target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index f41e190fb8..5c8a852dbd 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -176,6 +176,8 @@ static int has_triple_fault_event;
>   
>   static bool has_msr_mcg_ext_ctl;
>   
> +static int has_pmu_cap;
> +
>   static struct kvm_cpuid2 *cpuid_cache;
>   static struct kvm_cpuid2 *hv_cpuid_cache;
>   static struct kvm_msr_list *kvm_feature_msrs;
> @@ -2053,6 +2055,33 @@ full:
>   
>   int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>   {
> +    static bool first = true;
> +    int ret;
> +
> +    if (first) {
> +        first = false;
> +
> +        /*
> +         * Since Linux v5.18, KVM provides a VM-level capability to easily
> +         * disable PMUs; however, QEMU has been providing PMU property per
> +         * CPU since v1.6. In order to accommodate both, have to configure
> +         * the VM-level capability here.
> +         *
> +         * KVM_PMU_CAP_DISABLE doesn't change the PMU
> +         * behavior on Intel platform because current "pmu" property works
> +         * as expected.
> +         */
> +        if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) {

One nit, it's safer to use

	(has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu

Maybe we can rename has_pmu_cap to pmu_cap as well.

> +            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> +                                    KVM_PMU_CAP_DISABLE);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret,
> +                                 "Failed to set KVM_PMU_CAP_DISABLE");
> +                return ret;
> +            }
> +        }
> +    }
> +
>       return 0;
>   }
>   
> @@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>           }
>       }
>   
> +    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
> +
>       return 0;
>   }
>
Sean Christopherson March 5, 2025, 1:22 a.m. UTC | #2
On Tue, Mar 04, 2025, Xiaoyao Li wrote:
> On 3/3/2025 6:00 AM, Dongli Zhang wrote:
> > Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> > there is no way to fully disable KVM AMD PMU virtualization. Neither
> > "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
> 
> This looks like a KVM bug.

Heh, the patches you sent do fix _a_ KVM bug, but this is something else entirely.

In practice, the KVM bug only affects what KVM_GET_SUPPORTED_CPUID returns when
enable_pmu=false, and in that case, it's only a reporting issue, i.e. KVM will
still block usage of the PMU.

As Dongli pointed out, older AMD CPUs don't actually enumerate a PMU in CPUID,
and so the kernel assumes that not-too-old CPUs have a PMU:

	/* Performance-monitoring supported from K7 and later: */
	if (boot_cpu_data.x86 < 6)
		return -ENODEV;

The "expected" output:

   Performance Events: PMU not available due to virtualization, using software events only.

is a long-standing workaround in the kernel to deal with lack of enumeration.  On
top of explicit enumeration, init_hw_perf_events() => check_hw_exists() probes
hardware to see if it actually works.  If an MSR is unexpectedly unavailable, as
is the case when running as a guest, the kernel prints a message and disables PMU
usage.  E.g. the above message is specific to running as a guest:

	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
		pr_cont("PMU not available due to virtualization, using software events only.\n");

From the KVM side, because there's no CPUID enumeration, there's no way for KVM
to know that userspace wants to completely disable PMU virtualization from CPUID
alone.  Whereas with Intel CPUs, KVM infers that the PMU should be disabled by
lack of a non-zero PMU version, e.g. if CPUID.0xA is omitted.

> Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with
> current KVM, I'm fine with it.

Yeah, this is the only way other than disabling KVM's PMU virtualization via
module param (enable_pmu).
Xiaoyao Li March 5, 2025, 1:35 a.m. UTC | #3
On 3/5/2025 9:22 AM, Sean Christopherson wrote:
> On Tue, Mar 04, 2025, Xiaoyao Li wrote:
>> On 3/3/2025 6:00 AM, Dongli Zhang wrote:
>>> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
>>> there is no way to fully disable KVM AMD PMU virtualization. Neither
>>> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
>>
>> This looks like a KVM bug.
> 
> Heh, the patches you sent do fix _a_ KVM bug, but this is something else entirely.

Aha, that fix was just found by code inspection. It was not supposed to 
be related with this.

> In practice, the KVM bug only affects what KVM_GET_SUPPORTED_CPUID returns when
> enable_pmu=false, and in that case, it's only a reporting issue, i.e. KVM will
> still block usage of the PMU.
> 
> As Dongli pointed out, older AMD CPUs don't actually enumerate a PMU in CPUID,
> and so the kernel assumes that not-too-old CPUs have a PMU:
> 
> 	/* Performance-monitoring supported from K7 and later: */
> 	if (boot_cpu_data.x86 < 6)
> 		return -ENODEV;
> 
> The "expected" output:
> 
>     Performance Events: PMU not available due to virtualization, using software events only.
> 
> is a long-standing workaround in the kernel to deal with lack of enumeration.  On
> top of explicit enumeration, init_hw_perf_events() => check_hw_exists() probes
> hardware to see if it actually works.  If an MSR is unexpectedly unavailable, as
> is the case when running as a guest, the kernel prints a message and disables PMU
> usage.  E.g. the above message is specific to running as a guest:
> 
> 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> 		pr_cont("PMU not available due to virtualization, using software events only.\n");
> 
>  From the KVM side, because there's no CPUID enumeration, there's no way for KVM
> to know that userspace wants to completely disable PMU virtualization from CPUID
> alone.  Whereas with Intel CPUs, KVM infers that the PMU should be disabled by
> lack of a non-zero PMU version, e.g. if CPUID.0xA is omitted.

I see now.

Thanks to you and Dongli!

>> Anyway, since QEMU can achieve its goal with KVM_PMU_CAP_DISABLE with
>> current KVM, I'm fine with it.
> 
> Yeah, this is the only way other than disabling KVM's PMU virtualization via
> module param (enable_pmu).
Zhao Liu March 5, 2025, 2:41 p.m. UTC | #4
> > +        if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) {
> 
> One nit, it's safer to use
> 
> 	(has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu
> 
> Maybe we can rename has_pmu_cap to pmu_cap as well.

Yes, I agree.

Regards,
Zhao
Zhao Liu March 5, 2025, 2:44 p.m. UTC | #5
On Sun, Mar 02, 2025 at 02:00:12PM -0800, Dongli Zhang wrote:
> Date: Sun,  2 Mar 2025 14:00:12 -0800
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v2 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if
>  "-pmu" is configured
> X-Mailer: git-send-email 2.43.5
> 
> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> there is no way to fully disable KVM AMD PMU virtualization. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
> 
> As a result, the following message still appears in the VM dmesg:
> 
> [    0.263615] Performance Events: AMD PMU driver.
> 
> However, the expected output should be:
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> This occurs because AMD does not use any CPUID bit to indicate PMU
> availability.
> 
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>   - Switch back to the initial implementation with "-pmu".
> https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
>   - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
>     Intel platform because current "pmu" property works as expected."
> 
>  target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)

Overall LGTM. And with Xiaoyao's comment fixed :-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>

Thanks,
Zhao
Dongli Zhang March 5, 2025, 8:13 p.m. UTC | #6
Hi Xiaoyao and Zhao,

On 3/5/25 6:41 AM, Zhao Liu wrote:
>>> +        if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) {
>>
>> One nit, it's safer to use
>>
>> 	(has_pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu
>>
>> Maybe we can rename has_pmu_cap to pmu_cap as well.
> 
> Yes, I agree.
> 

Thanks both of you very much!

I also need to modify PATCH 08/10 where has_pmu_cap is used.

[PATCH v2 08/10] target/i386/kvm: reset AMD PMU registers during VM reset

Dongli Zhang
diff mbox series

Patch

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f41e190fb8..5c8a852dbd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -176,6 +176,8 @@  static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int has_pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -2053,6 +2055,33 @@  full:
 
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 {
+    static bool first = true;
+    int ret;
+
+    if (first) {
+        first = false;
+
+        /*
+         * Since Linux v5.18, KVM provides a VM-level capability to easily
+         * disable PMUs; however, QEMU has been providing PMU property per
+         * CPU since v1.6. In order to accommodate both, have to configure
+         * the VM-level capability here.
+         *
+         * KVM_PMU_CAP_DISABLE doesn't change the PMU
+         * behavior on Intel platform because current "pmu" property works
+         * as expected.
+         */
+        if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) {
+            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                    KVM_PMU_CAP_DISABLE);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret,
+                                 "Failed to set KVM_PMU_CAP_DISABLE");
+                return ret;
+            }
+        }
+    }
+
     return 0;
 }
 
@@ -3351,6 +3380,8 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
     return 0;
 }