diff mbox series

[2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE

Message ID 20241104094119.4131-3-dongli.zhang@oracle.com (mailing list archive)
State New
Headers show
Series target/i386/kvm/pmu: Enhancement, Bugfix and Cleanup | expand

Commit Message

Dongli Zhang Nov. 4, 2024, 9:40 a.m. UTC
The AMD PMU virtualization is not disabled when configuring
"-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
"-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
virtualization in such an environment.

As a result, VM logs typically show:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

whereas the expected logs should be:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This discrepancy occurs because AMD PMU does not use CPUID to determine
whether PMU virtualization is supported.

To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
acceleration. This property sets KVM_PMU_CAP_DISABLE if
KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
x86 systems.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Another previous solution to re-use '-cpu host,-pmu':
https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/

 accel/kvm/kvm-all.c        |  1 +
 include/sysemu/kvm_int.h   |  1 +
 qemu-options.hx            |  9 ++++++-
 target/i386/cpu.c          |  2 +-
 target/i386/kvm/kvm.c      | 52 ++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  2 ++
 6 files changed, 65 insertions(+), 2 deletions(-)

Comments

Zhao Liu Nov. 7, 2024, 7:52 a.m. UTC | #1
(+Dapang & Zide)

Hi Dongli,

On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote:
> Date: Mon,  4 Nov 2024 01:40:17 -0800
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set
>  KVM_PMU_CAP_DISABLE
> X-Mailer: git-send-email 2.43.5
> 
> The AMD PMU virtualization is not disabled when configuring
> "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
> virtualization in such an environment.
> 
> As a result, VM logs typically show:
> 
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> 
> whereas the expected logs should be:
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> This discrepancy occurs because AMD PMU does not use CPUID to determine
> whether PMU virtualization is supported.

Intel platform doesn't have this issue since Linux kernel fails to check
the CPU family & model when "-cpu *,-pmu" option clears PMU version.

The difference between Intel and AMD platforms, however, is that it seems
Intel hardly ever reaches the “...due virtualization” message, but
instead reports an error because it recognizes a mismatched family/model.

This may be a drawback of the PMU driver's print message, but the result
is the same, it prevents the PMU driver from enabling.

So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU
behavior on Intel platform because current "pmu" property works as
expected.

> To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
> acceleration. This property sets KVM_PMU_CAP_DISABLE if
> KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
> supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
> x86 systems.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Another previous solution to re-use '-cpu host,-pmu':
> https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/

IMO, I prefer the previous version. This VM-level KVM property is
difficult to integrate with the existing CPU properties. Pls refer later
comments for reasons.

>  accel/kvm/kvm-all.c        |  1 +
>  include/sysemu/kvm_int.h   |  1 +
>  qemu-options.hx            |  9 ++++++-
>  target/i386/cpu.c          |  2 +-
>  target/i386/kvm/kvm.c      | 52 ++++++++++++++++++++++++++++++++++++++
>  target/i386/kvm/kvm_i386.h |  2 ++
>  6 files changed, 65 insertions(+), 2 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 801cff16a5..8b5ba45cf7 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj)
>      s->xen_evtchn_max_pirq = 256;
>      s->device = NULL;
>      s->msr_energy.enable = false;
> +    s->pmu_cap_disabled = false;
>  }

The CPU property "pmu" also defaults to "false"...but:

 * max CPU would override this and try to enable PMU by default in
   max_x86_cpu_initfn().

 * Other named CPU models keep the default setting to avoid affecting
   the migration.

The pmu_cap_disabled and “pmu” property look unbound and unassociated,
so this can cause the conflict when they are not synchronized. For
example,

-cpu host -accel kvm,pmu-cap-disabled=on

The above options will fail to launch a VM (on Intel platform).

Ideally, the “pmu” property and pmu-cap-disabled should be bound to each
other and be consistent. But it's not easy because:
 - There is no proper way to have pmu_cap_disabled set different default
   values (e.g., "false" for max CPU and "true" for named CPU models)
   based on different CPU models.
 - And, no proper place to check the consistency of pmu_cap_disabled and
   enable_pmu.

Therefore, I prefer your previous approach, to reuse current CPU "pmu"
property.

Further, considering that this is currently the only case that needs to
to set the VM level's capability in the CPU context, there is no need to
introduce a new kvm interface (in your previous patch), which can instead
be set in kvm_cpu_realizefn(), like:


diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 99d1941cf51c..05e9c9a1a0cf 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -42,6 +42,8 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
 {
     X86CPU *cpu = X86_CPU(cs);
     CPUX86State *env = &cpu->env;
+    KVMState *s = kvm_state;
+    static bool first = true;
     bool ret;

     /*
@@ -63,6 +65,29 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
      *   check/update ucode_rev, phys_bits, guest_phys_bits, mwait
      *   cpu_common_realizefn() (via xcc->parent_realize)
      */
+
+    if (first) {
+        first = false;
+
+        /*
+         * Since Linux v5.18, KVM provides a VM-level capability to easily
+         * disable PMUs; however, QEMU has been providing PMU property per
+         * CPU since v1.6. In order to accommodate both, have to configure
+         * the VM-level capability here.
+         */
+        if (!cpu->enable_pmu &&
+            kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY)) {
+            int r = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
+                                      KVM_PMU_CAP_DISABLE);
+
+            if (r < 0) {
+                error_setg(errp, "kvm: Failed to disable pmu cap: %s",
+                           strerror(-r));
+                return false;
+            }
+        }
+    }
+
     if (cpu->max_features) {
         if (enable_cpu_pm) {
             if (kvm_has_waitpkg()) {
---

In addition, if PMU is disabled, why not mask the perf related bits in
8000_0001_ECX? :)

Regards,
Zhao
Dongli Zhang Nov. 7, 2024, 11:44 p.m. UTC | #2
Hi Zhao,


On 11/6/24 11:52 PM, Zhao Liu wrote:
> (+Dapang & Zide)
> 
> Hi Dongli,
> 
> On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote:
>> Date: Mon,  4 Nov 2024 01:40:17 -0800
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set
>>  KVM_PMU_CAP_DISABLE
>> X-Mailer: git-send-email 2.43.5
>>
>> The AMD PMU virtualization is not disabled when configuring
>> "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
>> "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
>> virtualization in such an environment.
>>
>> As a result, VM logs typically show:
>>
>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> whereas the expected logs should be:
>>
>> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> This discrepancy occurs because AMD PMU does not use CPUID to determine
>> whether PMU virtualization is supported.
> 
> Intel platform doesn't have this issue since Linux kernel fails to check
> the CPU family & model when "-cpu *,-pmu" option clears PMU version.
> 
> The difference between Intel and AMD platforms, however, is that it seems
> Intel hardly ever reaches the “...due virtualization” message, but
> instead reports an error because it recognizes a mismatched family/model.
> 
> This may be a drawback of the PMU driver's print message, but the result
> is the same, it prevents the PMU driver from enabling.
> 
> So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU
> behavior on Intel platform because current "pmu" property works as
> expected.

Sure. I will mention this in v2.

> 
>> To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
>> acceleration. This property sets KVM_PMU_CAP_DISABLE if
>> KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
>> supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
>> x86 systems.
>>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>> Another previous solution to re-use '-cpu host,-pmu':
>> https://urldefense.com/v3/__https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!Nm8Db-mwBoMIwKkRqzC9kgNi5uZ7SCIf43zUBn92Ar_NEbLXq-ZkrDDvpvDQ4cnS2i4VyKAp6CRVE12bRkMF$ 
> 
> IMO, I prefer the previous version. This VM-level KVM property is
> difficult to integrate with the existing CPU properties. Pls refer later
> comments for reasons.
> 
>>  accel/kvm/kvm-all.c        |  1 +
>>  include/sysemu/kvm_int.h   |  1 +
>>  qemu-options.hx            |  9 ++++++-
>>  target/i386/cpu.c          |  2 +-
>>  target/i386/kvm/kvm.c      | 52 ++++++++++++++++++++++++++++++++++++++
>>  target/i386/kvm/kvm_i386.h |  2 ++
>>  6 files changed, 65 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 801cff16a5..8b5ba45cf7 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj)
>>      s->xen_evtchn_max_pirq = 256;
>>      s->device = NULL;
>>      s->msr_energy.enable = false;
>> +    s->pmu_cap_disabled = false;
>>  }
> 
> The CPU property "pmu" also defaults to "false"...but:
> 
>  * max CPU would override this and try to enable PMU by default in
>    max_x86_cpu_initfn().
> 
>  * Other named CPU models keep the default setting to avoid affecting
>    the migration.
> 
> The pmu_cap_disabled and “pmu” property look unbound and unassociated,
> so this can cause the conflict when they are not synchronized. For
> example,
> 
> -cpu host -accel kvm,pmu-cap-disabled=on
> 
> The above options will fail to launch a VM (on Intel platform).
> 
> Ideally, the “pmu” property and pmu-cap-disabled should be bound to each
> other and be consistent. But it's not easy because:
>  - There is no proper way to have pmu_cap_disabled set different default
>    values (e.g., "false" for max CPU and "true" for named CPU models)
>    based on different CPU models.
>  - And, no proper place to check the consistency of pmu_cap_disabled and
>    enable_pmu.
> 
> Therefore, I prefer your previous approach, to reuse current CPU "pmu"
> property.

Thank you very much for the suggestion and reasons.

I am going to follow your suggestion to switch back to the previous solution in v2.

> 
> Further, considering that this is currently the only case that needs to
> to set the VM level's capability in the CPU context, there is no need to
> introduce a new kvm interface (in your previous patch), which can instead
> be set in kvm_cpu_realizefn(), like:
> 
> 
> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> index 99d1941cf51c..05e9c9a1a0cf 100644
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -42,6 +42,8 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>  {
>      X86CPU *cpu = X86_CPU(cs);
>      CPUX86State *env = &cpu->env;
> +    KVMState *s = kvm_state;
> +    static bool first = true;
>      bool ret;
> 
>      /*
> @@ -63,6 +65,29 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>       *   check/update ucode_rev, phys_bits, guest_phys_bits, mwait
>       *   cpu_common_realizefn() (via xcc->parent_realize)
>       */
> +
> +    if (first) {
> +        first = false;
> +
> +        /*
> +         * Since Linux v5.18, KVM provides a VM-level capability to easily
> +         * disable PMUs; however, QEMU has been providing PMU property per
> +         * CPU since v1.6. In order to accommodate both, have to configure
> +         * the VM-level capability here.
> +         */
> +        if (!cpu->enable_pmu &&
> +            kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY)) {
> +            int r = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
> +                                      KVM_PMU_CAP_DISABLE);
> +
> +            if (r < 0) {
> +                error_setg(errp, "kvm: Failed to disable pmu cap: %s",
> +                           strerror(-r));
> +                return false;
> +            }
> +        }
> +    }
> +
>      if (cpu->max_features) {
>          if (enable_cpu_pm) {
>              if (kvm_has_waitpkg()) {
> ---

Sure. I will limit the change within only x86 + KVM.

> 
> In addition, if PMU is disabled, why not mask the perf related bits in
> 8000_0001_ECX? :)
> 

My fault. I have masked only 0x80000022, and I forgot 0x80000001 for AMD.

Thank you very much for the reminder.


I will wait for a day or maybe the weekend. I am going to switch to the previous
solution in v2 if there isn't any further objection with a more valid reason.

Thank you very much for the feedback!

Dongli Zhang
Zhao Liu Nov. 8, 2024, 2:32 a.m. UTC | #3
> I will wait for a day or maybe the weekend. I am going to switch to the previous
> solution in v2 if there isn't any further objection with a more valid reason.
> 
> Thank you very much for the feedback!
> 

Welcome. It's now v9.2 soft frozen. I'm also continuing to review your
remaining patches.

Regards,
Zhao
Sandipan Das Nov. 8, 2024, 12:52 p.m. UTC | #4
On 11/8/2024 5:14 AM, dongli.zhang@oracle.com wrote:
> Hi Zhao,
> 
> 
> On 11/6/24 11:52 PM, Zhao Liu wrote:
>> (+Dapang & Zide)
>>
>> Hi Dongli,
>>
>> On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote:
>>> Date: Mon,  4 Nov 2024 01:40:17 -0800
>>> From: Dongli Zhang <dongli.zhang@oracle.com>
>>> Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set
>>>  KVM_PMU_CAP_DISABLE
>>> X-Mailer: git-send-email 2.43.5
>>>
>>> The AMD PMU virtualization is not disabled when configuring
>>> "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
>>> "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
>>> virtualization in such an environment.
>>>
>>> As a result, VM logs typically show:
>>>
>>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>>
>>> whereas the expected logs should be:
>>>
>>> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
>>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>>
>>> This discrepancy occurs because AMD PMU does not use CPUID to determine
>>> whether PMU virtualization is supported.
>>
>> Intel platform doesn't have this issue since Linux kernel fails to check
>> the CPU family & model when "-cpu *,-pmu" option clears PMU version.
>>
>> The difference between Intel and AMD platforms, however, is that it seems
>> Intel hardly ever reaches the “...due virtualization” message, but
>> instead reports an error because it recognizes a mismatched family/model.
>>
>> This may be a drawback of the PMU driver's print message, but the result
>> is the same, it prevents the PMU driver from enabling.
>>
>> So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU
>> behavior on Intel platform because current "pmu" property works as
>> expected.
> 
> Sure. I will mention this in v2.
> 
>>
>>> To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
>>> acceleration. This property sets KVM_PMU_CAP_DISABLE if
>>> KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
>>> supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
>>> x86 systems.
>>>
>>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>>> ---
>>> Another previous solution to re-use '-cpu host,-pmu':
>>> https://urldefense.com/v3/__https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!Nm8Db-mwBoMIwKkRqzC9kgNi5uZ7SCIf43zUBn92Ar_NEbLXq-ZkrDDvpvDQ4cnS2i4VyKAp6CRVE12bRkMF$ 
>>
>> IMO, I prefer the previous version. This VM-level KVM property is
>> difficult to integrate with the existing CPU properties. Pls refer later
>> comments for reasons.
>>
>>>  accel/kvm/kvm-all.c        |  1 +
>>>  include/sysemu/kvm_int.h   |  1 +
>>>  qemu-options.hx            |  9 ++++++-
>>>  target/i386/cpu.c          |  2 +-
>>>  target/i386/kvm/kvm.c      | 52 ++++++++++++++++++++++++++++++++++++++
>>>  target/i386/kvm/kvm_i386.h |  2 ++
>>>  6 files changed, 65 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 801cff16a5..8b5ba45cf7 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj)
>>>      s->xen_evtchn_max_pirq = 256;
>>>      s->device = NULL;
>>>      s->msr_energy.enable = false;
>>> +    s->pmu_cap_disabled = false;
>>>  }
>>
>> The CPU property "pmu" also defaults to "false"...but:
>>
>>  * max CPU would override this and try to enable PMU by default in
>>    max_x86_cpu_initfn().
>>
>>  * Other named CPU models keep the default setting to avoid affecting
>>    the migration.
>>
>> The pmu_cap_disabled and “pmu” property look unbound and unassociated,
>> so this can cause the conflict when they are not synchronized. For
>> example,
>>
>> -cpu host -accel kvm,pmu-cap-disabled=on
>>
>> The above options will fail to launch a VM (on Intel platform).
>>
>> Ideally, the “pmu” property and pmu-cap-disabled should be bound to each
>> other and be consistent. But it's not easy because:
>>  - There is no proper way to have pmu_cap_disabled set different default
>>    values (e.g., "false" for max CPU and "true" for named CPU models)
>>    based on different CPU models.
>>  - And, no proper place to check the consistency of pmu_cap_disabled and
>>    enable_pmu.
>>
>> Therefore, I prefer your previous approach, to reuse current CPU "pmu"
>> property.
> 
> Thank you very much for the suggestion and reasons.
> 
> I am going to follow your suggestion to switch back to the previous solution in v2.
> 
>>
>> Further, considering that this is currently the only case that needs to
>> to set the VM level's capability in the CPU context, there is no need to
>> introduce a new kvm interface (in your previous patch), which can instead
>> be set in kvm_cpu_realizefn(), like:
>>
>>
>> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
>> index 99d1941cf51c..05e9c9a1a0cf 100644
>> --- a/target/i386/kvm/kvm-cpu.c
>> +++ b/target/i386/kvm/kvm-cpu.c
>> @@ -42,6 +42,8 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>  {
>>      X86CPU *cpu = X86_CPU(cs);
>>      CPUX86State *env = &cpu->env;
>> +    KVMState *s = kvm_state;
>> +    static bool first = true;
>>      bool ret;
>>
>>      /*
>> @@ -63,6 +65,29 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>       *   check/update ucode_rev, phys_bits, guest_phys_bits, mwait
>>       *   cpu_common_realizefn() (via xcc->parent_realize)
>>       */
>> +
>> +    if (first) {
>> +        first = false;
>> +
>> +        /*
>> +         * Since Linux v5.18, KVM provides a VM-level capability to easily
>> +         * disable PMUs; however, QEMU has been providing PMU property per
>> +         * CPU since v1.6. In order to accommodate both, have to configure
>> +         * the VM-level capability here.
>> +         */
>> +        if (!cpu->enable_pmu &&
>> +            kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY)) {
>> +            int r = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
>> +                                      KVM_PMU_CAP_DISABLE);
>> +
>> +            if (r < 0) {
>> +                error_setg(errp, "kvm: Failed to disable pmu cap: %s",
>> +                           strerror(-r));
>> +                return false;
>> +            }
>> +        }
>> +    }
>> +
>>      if (cpu->max_features) {
>>          if (enable_cpu_pm) {
>>              if (kvm_has_waitpkg()) {
>> ---
> 
> Sure. I will limit the change within only x86 + KVM.
> 
>>
>> In addition, if PMU is disabled, why not mask the perf related bits in
>> 8000_0001_ECX? :)
>>
> 
> My fault. I have masked only 0x80000022, and I forgot 0x80000001 for AMD.
> 
> Thank you very much for the reminder.
> 
> 

I think this may not show a functional impact because the guest kernel
would anyway not register the core PMU because of being unable to access
the PMC MSRs as a result of KVM_PMU_CAP_DISABLE but its the right thing
to do because the guest OS may not always be Linux-based :)
Zhao Liu Nov. 13, 2024, 5:15 p.m. UTC | #5
> > Further, considering that this is currently the only case that needs to
> > to set the VM level's capability in the CPU context, there is no need to
> > introduce a new kvm interface (in your previous patch), which can instead
> > be set in kvm_cpu_realizefn(), like:
> >

Now your case is not the only user of kvm_arch_pre_create_vcpu(), and
TDX also needs this [*]. So, this is the support for bringing back your
previous solution (preferably with comments, as I suggested earlier,
explaining why it's necessary to handle VM-level cap in the CPU
context). :-)

[*]: https://lore.kernel.org/qemu-devel/20241105062408.3533704-8-xiaoyao.li@intel.com/
Dongli Zhang Nov. 14, 2024, 12:13 a.m. UTC | #6
On 11/13/24 9:15 AM, Zhao Liu wrote:
>>> Further, considering that this is currently the only case that needs to
>>> to set the VM level's capability in the CPU context, there is no need to
>>> introduce a new kvm interface (in your previous patch), which can instead
>>> be set in kvm_cpu_realizefn(), like:
>>>
> 
> Now your case is not the only user of kvm_arch_pre_create_vcpu(), and
> TDX also needs this [*]. So, this is the support for bringing back your
> previous solution (preferably with comments, as I suggested earlier,
> explaining why it's necessary to handle VM-level cap in the CPU
> context). :-)
> 
> [*]: https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20241105062408.3533704-8-xiaoyao.li@intel.com/__;!!ACWV5N9M2RV99hQ!PGJQ9Kv-1mo_4YX1fb4N9YvZZqHInebtCNcSniRi3qJNPIfG5zZnDp1b1gVmO-DdpYxMvO1hlH9owAHV5UMT$ 
> 

Thank you very much for the reminder!

Dongli Zhang
Mi, Dapeng Nov. 21, 2024, 10:06 a.m. UTC | #7
On 11/8/2024 7:44 AM, dongli.zhang@oracle.com wrote:
> Hi Zhao,
>
>
> On 11/6/24 11:52 PM, Zhao Liu wrote:
>> (+Dapang & Zide)
>>
>> Hi Dongli,
>>
>> On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote:
>>> Date: Mon,  4 Nov 2024 01:40:17 -0800
>>> From: Dongli Zhang <dongli.zhang@oracle.com>
>>> Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set
>>>  KVM_PMU_CAP_DISABLE
>>> X-Mailer: git-send-email 2.43.5
>>>
>>> The AMD PMU virtualization is not disabled when configuring
>>> "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
>>> "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
>>> virtualization in such an environment.
>>>
>>> As a result, VM logs typically show:
>>>
>>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>>
>>> whereas the expected logs should be:
>>>
>>> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
>>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>>
>>> This discrepancy occurs because AMD PMU does not use CPUID to determine
>>> whether PMU virtualization is supported.
>> Intel platform doesn't have this issue since Linux kernel fails to check
>> the CPU family & model when "-cpu *,-pmu" option clears PMU version.
>>
>> The difference between Intel and AMD platforms, however, is that it seems
>> Intel hardly ever reaches the “...due virtualization” message, but
>> instead reports an error because it recognizes a mismatched family/model.
>>
>> This may be a drawback of the PMU driver's print message, but the result
>> is the same, it prevents the PMU driver from enabling.
>>
>> So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU
>> behavior on Intel platform because current "pmu" property works as
>> expected.
> Sure. I will mention this in v2.
>
>>> To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
>>> acceleration. This property sets KVM_PMU_CAP_DISABLE if
>>> KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
>>> supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
>>> x86 systems.
>>>
>>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>>> ---
>>> Another previous solution to re-use '-cpu host,-pmu':
>>> https://urldefense.com/v3/__https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!Nm8Db-mwBoMIwKkRqzC9kgNi5uZ7SCIf43zUBn92Ar_NEbLXq-ZkrDDvpvDQ4cnS2i4VyKAp6CRVE12bRkMF$ 
>> IMO, I prefer the previous version. This VM-level KVM property is
>> difficult to integrate with the existing CPU properties. Pls refer later
>> comments for reasons.
>>
>>>  accel/kvm/kvm-all.c        |  1 +
>>>  include/sysemu/kvm_int.h   |  1 +
>>>  qemu-options.hx            |  9 ++++++-
>>>  target/i386/cpu.c          |  2 +-
>>>  target/i386/kvm/kvm.c      | 52 ++++++++++++++++++++++++++++++++++++++
>>>  target/i386/kvm/kvm_i386.h |  2 ++
>>>  6 files changed, 65 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 801cff16a5..8b5ba45cf7 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj)
>>>      s->xen_evtchn_max_pirq = 256;
>>>      s->device = NULL;
>>>      s->msr_energy.enable = false;
>>> +    s->pmu_cap_disabled = false;
>>>  }
>> The CPU property "pmu" also defaults to "false"...but:
>>
>>  * max CPU would override this and try to enable PMU by default in
>>    max_x86_cpu_initfn().
>>
>>  * Other named CPU models keep the default setting to avoid affecting
>>    the migration.
>>
>> The pmu_cap_disabled and “pmu” property look unbound and unassociated,
>> so this can cause the conflict when they are not synchronized. For
>> example,
>>
>> -cpu host -accel kvm,pmu-cap-disabled=on
>>
>> The above options will fail to launch a VM (on Intel platform).
>>
>> Ideally, the “pmu” property and pmu-cap-disabled should be bound to each
>> other and be consistent. But it's not easy because:
>>  - There is no proper way to have pmu_cap_disabled set different default
>>    values (e.g., "false" for max CPU and "true" for named CPU models)
>>    based on different CPU models.
>>  - And, no proper place to check the consistency of pmu_cap_disabled and
>>    enable_pmu.
>>
>> Therefore, I prefer your previous approach, to reuse current CPU "pmu"
>> property.
> Thank you very much for the suggestion and reasons.
>
> I am going to follow your suggestion to switch back to the previous solution in v2.

+1.

 I also prefer to leverage current exist "+/-pmu" option instead of adding
a new option. More options, more complexity. When they are not
inconsistent, which has higher priority? all these are issues.

Although KVM_CAP_PMU_CAPABILITY is a VM-level PMU capability, but all CPUs
in a same VM should always share same PMU configuration (Don't consider
hybrid platforms which have many issues need to be handled specifically).


>
>> Further, considering that this is currently the only case that needs to
>> to set the VM level's capability in the CPU context, there is no need to
>> introduce a new kvm interface (in your previous patch), which can instead
>> be set in kvm_cpu_realizefn(), like:
>>
>>
>> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
>> index 99d1941cf51c..05e9c9a1a0cf 100644
>> --- a/target/i386/kvm/kvm-cpu.c
>> +++ b/target/i386/kvm/kvm-cpu.c
>> @@ -42,6 +42,8 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>  {
>>      X86CPU *cpu = X86_CPU(cs);
>>      CPUX86State *env = &cpu->env;
>> +    KVMState *s = kvm_state;
>> +    static bool first = true;
>>      bool ret;
>>
>>      /*
>> @@ -63,6 +65,29 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>       *   check/update ucode_rev, phys_bits, guest_phys_bits, mwait
>>       *   cpu_common_realizefn() (via xcc->parent_realize)
>>       */
>> +
>> +    if (first) {
>> +        first = false;
>> +
>> +        /*
>> +         * Since Linux v5.18, KVM provides a VM-level capability to easily
>> +         * disable PMUs; however, QEMU has been providing PMU property per
>> +         * CPU since v1.6. In order to accommodate both, have to configure
>> +         * the VM-level capability here.
>> +         */
>> +        if (!cpu->enable_pmu &&
>> +            kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY)) {
>> +            int r = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
>> +                                      KVM_PMU_CAP_DISABLE);
>> +
>> +            if (r < 0) {
>> +                error_setg(errp, "kvm: Failed to disable pmu cap: %s",
>> +                           strerror(-r));
>> +                return false;
>> +            }
>> +        }

It seems KVM_CAP_PMU_CAPABILITY is called to only disable PMU here. From
point view of logic completeness,  KVM_CAP_PMU_CAPABILITY should be called
to enabled PMU as well when user wants to enable PMU.

I know currently we only need to disable PMU, but we may need to enable PMU
via KVM_CAP_PMU_CAPABILITY soon.

We are working on the new KVM mediated vPMU framework, Sean suggest to
leverage KVM_CAP_PMU_CAPABILITY to enable mediated vPMU dynamically
(https://lore.kernel.org/all/Zz4uhmuPcZl9vJVr@google.com/). So It would be
better if the enable logic can be added here as well.

Thanks.


>> +    }
>> +
>>      if (cpu->max_features) {
>>          if (enable_cpu_pm) {
>>              if (kvm_has_waitpkg()) {
>> ---
> Sure. I will limit the change within only x86 + KVM.
>
>> In addition, if PMU is disabled, why not mask the perf related bits in
>> 8000_0001_ECX? :)
>>
> My fault. I have masked only 0x80000022, and I forgot 0x80000001 for AMD.
>
> Thank you very much for the reminder.
>
>
> I will wait for a day or maybe the weekend. I am going to switch to the previous
> solution in v2 if there isn't any further objection with a more valid reason.
>
> Thank you very much for the feedback!
>
> Dongli Zhang
>
>
diff mbox series

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 801cff16a5..8b5ba45cf7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3933,6 +3933,7 @@  static void kvm_accel_instance_init(Object *obj)
     s->xen_evtchn_max_pirq = 256;
     s->device = NULL;
     s->msr_energy.enable = false;
+    s->pmu_cap_disabled = false;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index a1e72763da..cdee1a481c 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -166,6 +166,7 @@  struct KVMState
     uint16_t xen_gnttab_max_frames;
     uint16_t xen_evtchn_max_pirq;
     char *device;
+    bool pmu_cap_disabled;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/qemu-options.hx b/qemu-options.hx
index dacc9790a4..9684424835 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -191,7 +191,8 @@  DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "                eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
     "                notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
     "                thread=single|multi (enable multi-threaded TCG)\n"
-    "                device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL)
+    "                device=path (KVM device path, default /dev/kvm)\n"
+    "                pmu-cap-disabled=true|false (disable KVM_CAP_PMU_CAPABILITY, x86 only, default false)\n", QEMU_ARCH_ALL)
 SRST
 ``-accel name[,prop=value[,...]]``
     This is used to enable an accelerator. Depending on the target
@@ -277,6 +278,12 @@  SRST
         option can be used to pass the KVM device to use via a file descriptor
         by setting the value to ``/dev/fdset/NN``.
 
+    ``pmu-cap-disabled=true|false``
+        When using the KVM accelerator, it controls the disabling of
+        KVM_CAP_PMU_CAPABILITY via KVM_PMU_CAP_DISABLE. When this capability is
+        disabled, PMU virtualization is turned off at the KVM module level.
+        Note that this functionality is supported for x86 hosts only.
+
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4490a7a8d6..c97ed74a47 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7102,7 +7102,7 @@  void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     case 0x80000022:
         *eax = *ebx = *ecx = *edx = 0;
         /* AMD Extended Performance Monitoring and Debug */
-        if (kvm_enabled() && cpu->enable_pmu &&
+        if (kvm_enabled() && cpu->enable_pmu && !kvm_pmu_cap_disabled() &&
             (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) &&
             (env->features[FEAT_8000_0022_EAX] & CPUID_8000_0022_EAX_PERFMON_V2)) {
             *eax |= CPUID_8000_0022_EAX_PERFMON_V2;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..ed73b1e7e0 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -166,6 +166,7 @@  static bool has_msr_vmx_procbased_ctls2;
 static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_hwcr;
+static bool has_pmu_cap;
 
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -3196,6 +3197,11 @@  static void kvm_vm_enable_energy_msrs(KVMState *s)
     return;
 }
 
+bool kvm_pmu_cap_disabled(void)
+{
+    return kvm_state->pmu_cap_disabled;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret;
@@ -3335,6 +3341,23 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
+    if (s->pmu_cap_disabled) {
+        if (has_pmu_cap) {
+            ret = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
+                                    KVM_PMU_CAP_DISABLE);
+            if (ret < 0) {
+                s->pmu_cap_disabled = false;
+                error_report("kvm: Failed to disable pmu cap: %s",
+                             strerror(-ret));
+            }
+        } else {
+            s->pmu_cap_disabled = false;
+            error_report("kvm: KVM_CAP_PMU_CAPABILITY is not supported");
+        }
+    }
+
     return 0;
 }
 
@@ -6525,6 +6548,29 @@  static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v,
     s->xen_evtchn_max_pirq = value;
 }
 
+static void kvm_set_pmu_cap_disabled(Object *obj, Visitor *v,
+                                     const char *name, void *opaque,
+                                     Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    bool pmu_cap_disabled;
+    Error *error = NULL;
+
+    if (s->fd != -1) {
+        error_setg(errp, "Cannot set properties after the accelerator has "
+                         "been initialized");
+        return;
+    }
+
+    visit_type_bool(v, name, &pmu_cap_disabled, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+
+    s->pmu_cap_disabled = pmu_cap_disabled;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
     object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ -6564,6 +6610,12 @@  void kvm_arch_accel_class_init(ObjectClass *oc)
                               NULL, NULL);
     object_class_property_set_description(oc, "xen-evtchn-max-pirq",
                                           "Maximum number of Xen PIRQs");
+
+    object_class_property_add(oc, "pmu-cap-disabled", "bool",
+                              NULL, kvm_set_pmu_cap_disabled,
+                              NULL, NULL);
+    object_class_property_set_description(oc, "pmu-cap-disabled",
+                                          "Disable KVM_CAP_PMU_CAPABILITY");
 }
 
 void kvm_set_max_apic_id(uint32_t max_apic_id)
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 9de9c0d303..03522de9d1 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -49,6 +49,8 @@  uint64_t kvm_arch_get_supported_msr_feature(KVMState *s, uint32_t index);
 void kvm_set_max_apic_id(uint32_t max_apic_id);
 void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
 
+bool kvm_pmu_cap_disabled(void);
+
 #ifdef CONFIG_KVM
 
 bool kvm_is_vm_type_supported(int type);