diff mbox series

[V10,05/18] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled

Message ID 20210806133802.3528-6-lingshan.zhu@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS | expand

Commit Message

Zhu, Lingshan Aug. 6, 2021, 1:37 p.m. UTC
From: Like Xu <like.xu@linux.intel.com>

On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
detect whether the processor supports performance monitoring facility.

It depends on the PMU is enabled for the guest, and a software write
operation to this available bit will be ignored. The proposal to ignore
the toggle in KVM is the way to go and that behavior matches bare metal.

Cc: Yao Yuan <yuan.yao@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/vmx/pmu_intel.c | 1 +
 arch/x86/kvm/x86.c           | 1 +
 2 files changed, 2 insertions(+)

Comments

Liuxiangdong Nov. 7, 2021, 10:14 a.m. UTC | #1
Hi, like and lingshan.

As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for the 
guest, so a software
write openration to this bit will be ignored.

But, in this patch, all the openration that writes msr_ia32_misc_enable 
in guest could make this bit become 0.

Suppose:
When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr may 
be 0x80 first.
And next, guest writes msr_ia32_misc_enable value 0x1.
What we want could be 0x81, but unfortunately, it will be 0x1 because of
"data &= ~MSR_IA32_MISC_ENABLE_EMON;"
And even if guest writes msr_ia32_misc_enable value 0x81, it will be 0x1 
also.


What we want is write operation will not change this bit. So, how about 
this?

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, 
struct msr_data *msr_info)
          }
          break;
      case MSR_IA32_MISC_ENABLE:
+        data &= ~MSR_IA32_MISC_ENABLE_EMON;
+        data |= (vcpu->arch.ia32_misc_enable_msr & 
MSR_IA32_MISC_ENABLE_EMON);
          if (!kvm_check_has_quirk(vcpu->kvm, 
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
              ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
MSR_IA32_MISC_ENABLE_MWAIT)) {
              if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))


Or is there anything in your design intention I don't understand?

Thanks!

Xiangdong Liu


On 2021/8/6 21:37, Zhu Lingshan wrote:
> From: Like Xu <like.xu@linux.intel.com>
>
> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> detect whether the processor supports performance monitoring facility.
>
> It depends on the PMU is enabled for the guest, and a software write
> operation to this available bit will be ignored. The proposal to ignore
> the toggle in KVM is the way to go and that behavior matches bare metal.
>
> Cc: Yao Yuan <yuan.yao@intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>   arch/x86/kvm/x86.c           | 1 +
>   2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 9efc1a6b8693..d9dbebe03cae 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>   	if (!pmu->version)
>   		return;
>   
> +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>   	perf_get_x86_pmu_capability(&x86_pmu);
>   
>   	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index efd11702465c..f6b6984e26ef 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		}
>   		break;
>   	case MSR_IA32_MISC_ENABLE:
> +		data &= ~MSR_IA32_MISC_ENABLE_EMON;
>   		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>   		    ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>   			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
Like Xu Nov. 8, 2021, 3:06 a.m. UTC | #2
On 7/11/2021 6:14 pm, Liuxiangdong wrote:
> Hi, like and lingshan.
> 
> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for the guest, 
> so a software
> write openration to this bit will be ignored.
> 
> But, in this patch, all the openration that writes msr_ia32_misc_enable in guest 
> could make this bit become 0.
> 
> Suppose:
> When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr may be 0x80 
> first.
> And next, guest writes msr_ia32_misc_enable value 0x1.
> What we want could be 0x81, but unfortunately, it will be 0x1 because of
> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
> And even if guest writes msr_ia32_misc_enable value 0x81, it will be 0x1 also.
> 

Yes and thank you. The fix has been committed on my private tree for a long time.

> 
> What we want is write operation will not change this bit. So, how about this?
> 
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
>           }
>           break;
>       case MSR_IA32_MISC_ENABLE:
> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
> +        data |= (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_EMON);
>           if (!kvm_check_has_quirk(vcpu->kvm, 
> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
> MSR_IA32_MISC_ENABLE_MWAIT)) {
>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> 
> 

How about this for the final state considering PEBS enabling:

	case MSR_IA32_MISC_ENABLE: {
		u64 old_val = vcpu->arch.ia32_misc_enable_msr;
		u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
			MSR_IA32_MISC_ENABLE_EMON;

		/* RO bits */
		if (!msr_info->host_initiated &&
		    ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
			return 1;

		/*
		 * For a dummy user space, the order of setting vPMU capabilities and
		 * initialising MSR_IA32_MISC_ENABLE is not strictly guaranteed, so to
		 * avoid inconsistent functionality we keep the vPMU bits unchanged here.
		 */
		data &= ~pmu_mask;
		data |= old_val & pmu_mask;
		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
		    ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
				return 1;
			vcpu->arch.ia32_misc_enable_msr = data;
			kvm_update_cpuid_runtime(vcpu);
		} else {
			vcpu->arch.ia32_misc_enable_msr = data;
		}
		break;
	}

> Or is there anything in your design intention I don't understand?
> 
> Thanks!
> 
> Xiangdong Liu
> 
> 
> On 2021/8/6 21:37, Zhu Lingshan wrote:
>> From: Like Xu <like.xu@linux.intel.com>
>>
>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>> detect whether the processor supports performance monitoring facility.
>>
>> It depends on the PMU is enabled for the guest, and a software write
>> operation to this available bit will be ignored. The proposal to ignore
>> the toggle in KVM is the way to go and that behavior matches bare metal.
>>
>> Cc: Yao Yuan <yuan.yao@intel.com>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> ---
>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>   arch/x86/kvm/x86.c           | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 9efc1a6b8693..d9dbebe03cae 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>       if (!pmu->version)
>>           return;
>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>       perf_get_x86_pmu_capability(&x86_pmu);
>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index efd11702465c..f6b6984e26ef 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
>> msr_data *msr_info)
>>           }
>>           break;
>>       case MSR_IA32_MISC_ENABLE:
>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>
Liuxiangdong Nov. 8, 2021, 4:07 a.m. UTC | #3
On 2021/11/8 11:06, Like Xu wrote:
> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>> Hi, like and lingshan.
>>
>> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for 
>> the guest, so a software
>> write openration to this bit will be ignored.
>>
>> But, in this patch, all the openration that writes 
>> msr_ia32_misc_enable in guest could make this bit become 0.
>>
>> Suppose:
>> When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr 
>> may be 0x80 first.
>> And next, guest writes msr_ia32_misc_enable value 0x1.
>> What we want could be 0x81, but unfortunately, it will be 0x1 because of
>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>> And even if guest writes msr_ia32_misc_enable value 0x81, it will be 
>> 0x1 also.
>>
>
> Yes and thank you. The fix has been committed on my private tree for a 
> long time.
>
>>
>> What we want is write operation will not change this bit. So, how 
>> about this?
>>
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, 
>> struct msr_data *msr_info)
>>           }
>>           break;
>>       case MSR_IA32_MISC_ENABLE:
>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>> +        data |= (vcpu->arch.ia32_misc_enable_msr & 
>> MSR_IA32_MISC_ENABLE_EMON);
>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>
>>
>
> How about this for the final state considering PEBS enabling:
>
>     case MSR_IA32_MISC_ENABLE: {
>         u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>             MSR_IA32_MISC_ENABLE_EMON;
>
         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
             MSR_IA32_MISC_ENABLE_EMON;

Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?

>         /* RO bits */
>         if (!msr_info->host_initiated &&
>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>             return 1;
>
>         /*
>          * For a dummy user space, the order of setting vPMU 
> capabilities and
>          * initialising MSR_IA32_MISC_ENABLE is not strictly 
> guaranteed, so to
>          * avoid inconsistent functionality we keep the vPMU bits 
> unchanged here.
>          */
Yes. It's a little clearer with comments.
>         data &= ~pmu_mask;
>         data |= old_val & pmu_mask;
>         if (!kvm_check_has_quirk(vcpu->kvm, 
> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>             if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>                 return 1;
>             vcpu->arch.ia32_misc_enable_msr = data;
>             kvm_update_cpuid_runtime(vcpu);
>         } else {
>             vcpu->arch.ia32_misc_enable_msr = data;
>         }
>         break;
>     }
>
>> Or is there anything in your design intention I don't understand?
>>
>> Thanks!
>>
>> Xiangdong Liu
>>
>>
>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>> From: Like Xu <like.xu@linux.intel.com>
>>>
>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>>> detect whether the processor supports performance monitoring facility.
>>>
>>> It depends on the PMU is enabled for the guest, and a software write
>>> operation to this available bit will be ignored. The proposal to ignore
>>> the toggle in KVM is the way to go and that behavior matches bare 
>>> metal.
>>>
>>> Cc: Yao Yuan <yuan.yao@intel.com>
>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> ---
>>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>   arch/x86/kvm/x86.c           | 1 +
>>>   2 files changed, 2 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c 
>>> b/arch/x86/kvm/vmx/pmu_intel.c
>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu 
>>> *vcpu)
>>>       if (!pmu->version)
>>>           return;
>>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>       perf_get_x86_pmu_capability(&x86_pmu);
>>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index efd11702465c..f6b6984e26ef 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, 
>>> struct msr_data *msr_info)
>>>           }
>>>           break;
>>>       case MSR_IA32_MISC_ENABLE:
>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>
Like Xu Nov. 8, 2021, 4:11 a.m. UTC | #4
On 8/11/2021 12:07 pm, Liuxiangdong wrote:
> 
> 
> On 2021/11/8 11:06, Like Xu wrote:
>> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>>> Hi, like and lingshan.
>>>
>>> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for the 
>>> guest, so a software
>>> write openration to this bit will be ignored.
>>>
>>> But, in this patch, all the openration that writes msr_ia32_misc_enable in 
>>> guest could make this bit become 0.
>>>
>>> Suppose:
>>> When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr may be 
>>> 0x80 first.
>>> And next, guest writes msr_ia32_misc_enable value 0x1.
>>> What we want could be 0x81, but unfortunately, it will be 0x1 because of
>>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>>> And even if guest writes msr_ia32_misc_enable value 0x81, it will be 0x1 also.
>>>
>>
>> Yes and thank you. The fix has been committed on my private tree for a long time.
>>
>>>
>>> What we want is write operation will not change this bit. So, how about this?
>>>
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
>>> msr_data *msr_info)
>>>           }
>>>           break;
>>>       case MSR_IA32_MISC_ENABLE:
>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>> +        data |= (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_EMON);
>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>
>>>
>>
>> How about this for the final state considering PEBS enabling:
>>
>>     case MSR_IA32_MISC_ENABLE: {
>>         u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>>         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>             MSR_IA32_MISC_ENABLE_EMON;
>>
>          u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>              MSR_IA32_MISC_ENABLE_EMON;
> 
> Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?

Oops,
  		
	u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
			MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;

I'll send the fix after sync with Lingshan.

> 
>>         /* RO bits */
>>         if (!msr_info->host_initiated &&
>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>             return 1;
>>
>>         /*
>>          * For a dummy user space, the order of setting vPMU capabilities and
>>          * initialising MSR_IA32_MISC_ENABLE is not strictly guaranteed, so to
>>          * avoid inconsistent functionality we keep the vPMU bits unchanged here.
>>          */
> Yes. It's a little clearer with comments.

Thanks for your feedback! Enjoy the feature.

>>         data &= ~pmu_mask;
>>         data |= old_val & pmu_mask;
>>         if (!kvm_check_has_quirk(vcpu->kvm, 
>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>>             if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>                 return 1;
>>             vcpu->arch.ia32_misc_enable_msr = data;
>>             kvm_update_cpuid_runtime(vcpu);
>>         } else {
>>             vcpu->arch.ia32_misc_enable_msr = data;
>>         }
>>         break;
>>     }
>>
>>> Or is there anything in your design intention I don't understand?
>>>
>>> Thanks!
>>>
>>> Xiangdong Liu
>>>
>>>
>>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>>> From: Like Xu <like.xu@linux.intel.com>
>>>>
>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>>>> detect whether the processor supports performance monitoring facility.
>>>>
>>>> It depends on the PMU is enabled for the guest, and a software write
>>>> operation to this available bit will be ignored. The proposal to ignore
>>>> the toggle in KVM is the way to go and that behavior matches bare metal.
>>>>
>>>> Cc: Yao Yuan <yuan.yao@intel.com>
>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>> ---
>>>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>>   arch/x86/kvm/x86.c           | 1 +
>>>>   2 files changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>       if (!pmu->version)
>>>>           return;
>>>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>>       perf_get_x86_pmu_capability(&x86_pmu);
>>>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index efd11702465c..f6b6984e26ef 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
>>>> msr_data *msr_info)
>>>>           }
>>>>           break;
>>>>       case MSR_IA32_MISC_ENABLE:
>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>
>
Liuxiangdong Nov. 8, 2021, 8:27 a.m. UTC | #5
On 2021/11/8 12:11, Like Xu wrote:
> On 8/11/2021 12:07 pm, Liuxiangdong wrote:
>>
>>
>> On 2021/11/8 11:06, Like Xu wrote:
>>> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>>>> Hi, like and lingshan.
>>>>
>>>> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for 
>>>> the guest, so a software
>>>> write openration to this bit will be ignored.
>>>>
>>>> But, in this patch, all the openration that writes 
>>>> msr_ia32_misc_enable in guest could make this bit become 0.
>>>>
>>>> Suppose:
>>>> When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr 
>>>> may be 0x80 first.
>>>> And next, guest writes msr_ia32_misc_enable value 0x1.
>>>> What we want could be 0x81, but unfortunately, it will be 0x1 
>>>> because of
>>>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>>>> And even if guest writes msr_ia32_misc_enable value 0x81, it will 
>>>> be 0x1 also.
>>>>
>>>
>>> Yes and thank you. The fix has been committed on my private tree for 
>>> a long time.
>>>
>>>>
>>>> What we want is write operation will not change this bit. So, how 
>>>> about this?
>>>>
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, 
>>>> struct msr_data *msr_info)
>>>>           }
>>>>           break;
>>>>       case MSR_IA32_MISC_ENABLE:
>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>> +        data |= (vcpu->arch.ia32_misc_enable_msr & 
>>>> MSR_IA32_MISC_ENABLE_EMON);
>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>
>>>>
>>>
>>> How about this for the final state considering PEBS enabling:
>>>
>>>     case MSR_IA32_MISC_ENABLE: {
>>>         u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>>>         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>             MSR_IA32_MISC_ENABLE_EMON;
>>>
>>          u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>              MSR_IA32_MISC_ENABLE_EMON;
>>
>> Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?
>
> Oops,
>
>     u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>             MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
>

Yes. bit[12] is also read-only, so we can keep this bit unchanged also.

And, because write operation will not change this bit by "pmu_mask", do 
we still need this if statement?

         /* RO bits */
         if (!msr_info->host_initiated &&
             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
             return 1;

"(old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL" means some 
operation tries to change this bit,
so we cannot allow it.
But, if there is no this judgement, "pmu_mask" will still make this 
bit[12] no change.

The only difference is that we can not change other bit (except bit 12 
and bit 7) once "old_val[12] != data[12]" if there exists this statement
and we can change other bit if there is no judgement.

For both MSR_IA32_MISC_ENABLE_EMON and MSR_IA32_MISC_ENABLE_EMON are 
read-only, maybe we can keep
their behavioral consistency. Either both judge, or neither.

Do you think so?


> I'll send the fix after sync with Lingshan.
>
>>
>>>         /* RO bits */
>>>         if (!msr_info->host_initiated &&
>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>>             return 1;
>>>
>>>         /*
>>>          * For a dummy user space, the order of setting vPMU 
>>> capabilities and
>>>          * initialising MSR_IA32_MISC_ENABLE is not strictly 
>>> guaranteed, so to
>>>          * avoid inconsistent functionality we keep the vPMU bits 
>>> unchanged here.
>>>          */
>> Yes. It's a little clearer with comments.
>
> Thanks for your feedback! Enjoy the feature.
>
>>>         data &= ~pmu_mask;
>>>         data |= old_val & pmu_mask;
>>>         if (!kvm_check_has_quirk(vcpu->kvm, 
>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>             if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>                 return 1;
>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>             kvm_update_cpuid_runtime(vcpu);
>>>         } else {
>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>         }
>>>         break;
>>>     }
>>>
>>>> Or is there anything in your design intention I don't understand?
>>>>
>>>> Thanks!
>>>>
>>>> Xiangdong Liu
>>>>
>>>>
>>>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>>>> From: Like Xu <like.xu@linux.intel.com>
>>>>>
>>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] 
>>>>> bit to
>>>>> detect whether the processor supports performance monitoring 
>>>>> facility.
>>>>>
>>>>> It depends on the PMU is enabled for the guest, and a software write
>>>>> operation to this available bit will be ignored. The proposal to 
>>>>> ignore
>>>>> the toggle in KVM is the way to go and that behavior matches bare 
>>>>> metal.
>>>>>
>>>>> Cc: Yao Yuan <yuan.yao@intel.com>
>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>>> ---
>>>>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>>>   arch/x86/kvm/x86.c           | 1 +
>>>>>   2 files changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c 
>>>>> b/arch/x86/kvm/vmx/pmu_intel.c
>>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu 
>>>>> *vcpu)
>>>>>       if (!pmu->version)
>>>>>           return;
>>>>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>>>       perf_get_x86_pmu_capability(&x86_pmu);
>>>>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index efd11702465c..f6b6984e26ef 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu 
>>>>> *vcpu, struct msr_data *msr_info)
>>>>>           }
>>>>>           break;
>>>>>       case MSR_IA32_MISC_ENABLE:
>>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>
>>
Like Xu Nov. 8, 2021, 8:44 a.m. UTC | #6
On 8/11/2021 4:27 pm, Liuxiangdong wrote:
> 
> 
> On 2021/11/8 12:11, Like Xu wrote:
>> On 8/11/2021 12:07 pm, Liuxiangdong wrote:
>>>
>>>
>>> On 2021/11/8 11:06, Like Xu wrote:
>>>> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>>>>> Hi, like and lingshan.
>>>>>
>>>>> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for the 
>>>>> guest, so a software
>>>>> write openration to this bit will be ignored.
>>>>>
>>>>> But, in this patch, all the openration that writes msr_ia32_misc_enable in 
>>>>> guest could make this bit become 0.
>>>>>
>>>>> Suppose:
>>>>> When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr may be 
>>>>> 0x80 first.
>>>>> And next, guest writes msr_ia32_misc_enable value 0x1.
>>>>> What we want could be 0x81, but unfortunately, it will be 0x1 because of
>>>>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>>>>> And even if guest writes msr_ia32_misc_enable value 0x81, it will be 0x1 also.
>>>>>
>>>>
>>>> Yes and thank you. The fix has been committed on my private tree for a long 
>>>> time.
>>>>
>>>>>
>>>>> What we want is write operation will not change this bit. So, how about this?
>>>>>
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
>>>>> msr_data *msr_info)
>>>>>           }
>>>>>           break;
>>>>>       case MSR_IA32_MISC_ENABLE:
>>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>> +        data |= (vcpu->arch.ia32_misc_enable_msr & 
>>>>> MSR_IA32_MISC_ENABLE_EMON);
>>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>
>>>>>
>>>>
>>>> How about this for the final state considering PEBS enabling:
>>>>
>>>>     case MSR_IA32_MISC_ENABLE: {
>>>>         u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>>>>         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>>             MSR_IA32_MISC_ENABLE_EMON;
>>>>
>>>          u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>              MSR_IA32_MISC_ENABLE_EMON;
>>>
>>> Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?
>>
>> Oops,
>>
>>     u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>             MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
>>
> 
> Yes. bit[12] is also read-only, so we can keep this bit unchanged also.
> 
> And, because write operation will not change this bit by "pmu_mask", do we still 
> need this if statement?
> 
>          /* RO bits */
>          if (!msr_info->host_initiated &&
>              ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>              return 1;
> 
> "(old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL" means some operation 
> tries to change this bit,
> so we cannot allow it.
> But, if there is no this judgement, "pmu_mask" will still make this bit[12] no 
> change.
> 
> The only difference is that we can not change other bit (except bit 12 and bit 
> 7) once "old_val[12] != data[12]" if there exists this statement
> and we can change other bit if there is no judgement.
> 
> For both MSR_IA32_MISC_ENABLE_EMON and MSR_IA32_MISC_ENABLE_EMON are read-only, 
> maybe we can keep
> their behavioral consistency. Either both judge, or neither.

One more difference per Intel SDM, I assume:

For Bit 7, Performance Monitoring Available (R)
	(R)  means that attempts to change this bit will be silent;
For Bit 12, Processor Event Based Sampling (PEBS) Unavailable (RO),
	(RO) means that attempts to change this bit will be #GP;

> 
> Do you think so?
> 
> 
>> I'll send the fix after sync with Lingshan.
>>
>>>
>>>>         /* RO bits */
>>>>         if (!msr_info->host_initiated &&
>>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>>>             return 1;
>>>>
>>>>         /*
>>>>          * For a dummy user space, the order of setting vPMU capabilities and
>>>>          * initialising MSR_IA32_MISC_ENABLE is not strictly guaranteed, so to
>>>>          * avoid inconsistent functionality we keep the vPMU bits unchanged 
>>>> here.
>>>>          */
>>> Yes. It's a little clearer with comments.
>>
>> Thanks for your feedback! Enjoy the feature.
>>
>>>>         data &= ~pmu_mask;
>>>>         data |= old_val & pmu_mask;
>>>>         if (!kvm_check_has_quirk(vcpu->kvm, 
>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>             if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>                 return 1;
>>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>>             kvm_update_cpuid_runtime(vcpu);
>>>>         } else {
>>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>>         }
>>>>         break;
>>>>     }
>>>>
>>>>> Or is there anything in your design intention I don't understand?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Xiangdong Liu
>>>>>
>>>>>
>>>>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>>>>> From: Like Xu <like.xu@linux.intel.com>
>>>>>>
>>>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>>>>>> detect whether the processor supports performance monitoring facility.
>>>>>>
>>>>>> It depends on the PMU is enabled for the guest, and a software write
>>>>>> operation to this available bit will be ignored. The proposal to ignore
>>>>>> the toggle in KVM is the way to go and that behavior matches bare metal.
>>>>>>
>>>>>> Cc: Yao Yuan <yuan.yao@intel.com>
>>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>>>> ---
>>>>>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>>>>   arch/x86/kvm/x86.c           | 1 +
>>>>>>   2 files changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>>>       if (!pmu->version)
>>>>>>           return;
>>>>>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>>>>       perf_get_x86_pmu_capability(&x86_pmu);
>>>>>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index efd11702465c..f6b6984e26ef 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
>>>>>> msr_data *msr_info)
>>>>>>           }
>>>>>>           break;
>>>>>>       case MSR_IA32_MISC_ENABLE:
>>>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>
>>>
>
Liuxiangdong Nov. 8, 2021, 10:06 a.m. UTC | #7
On 2021/11/8 16:44, Like Xu wrote:
> On 8/11/2021 4:27 pm, Liuxiangdong wrote:
>>
>>
>> On 2021/11/8 12:11, Like Xu wrote:
>>> On 8/11/2021 12:07 pm, Liuxiangdong wrote:
>>>>
>>>>
>>>> On 2021/11/8 11:06, Like Xu wrote:
>>>>> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>>>>>> Hi, like and lingshan.
>>>>>>
>>>>>> As said,  IA32_MISC_ENABLE[7] bit depends on the PMU is enabled 
>>>>>> for the guest, so a software
>>>>>> write openration to this bit will be ignored.
>>>>>>
>>>>>> But, in this patch, all the openration that writes 
>>>>>> msr_ia32_misc_enable in guest could make this bit become 0.
>>>>>>
>>>>>> Suppose:
>>>>>> When we start vm with "enable_pmu", 
>>>>>> vcpu->arch.ia32_misc_enable_msr may be 0x80 first.
>>>>>> And next, guest writes msr_ia32_misc_enable value 0x1.
>>>>>> What we want could be 0x81, but unfortunately, it will be 0x1 
>>>>>> because of
>>>>>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>>>>>> And even if guest writes msr_ia32_misc_enable value 0x81, it will 
>>>>>> be 0x1 also.
>>>>>>
>>>>>
>>>>> Yes and thank you. The fix has been committed on my private tree 
>>>>> for a long time.
>>>>>
>>>>>>
>>>>>> What we want is write operation will not change this bit. So, how 
>>>>>> about this?
>>>>>>
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu 
>>>>>> *vcpu, struct msr_data *msr_info)
>>>>>>           }
>>>>>>           break;
>>>>>>       case MSR_IA32_MISC_ENABLE:
>>>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>> +        data |= (vcpu->arch.ia32_misc_enable_msr & 
>>>>>> MSR_IA32_MISC_ENABLE_EMON);
>>>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>>
>>>>>>
>>>>>
>>>>> How about this for the final state considering PEBS enabling:
>>>>>
>>>>>     case MSR_IA32_MISC_ENABLE: {
>>>>>         u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>>>>>         u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>>>             MSR_IA32_MISC_ENABLE_EMON;
>>>>>
>>>>          u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>>              MSR_IA32_MISC_ENABLE_EMON;
>>>>
>>>> Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?
>>>
>>> Oops,
>>>
>>>     u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>             MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
>>>
>>
>> Yes. bit[12] is also read-only, so we can keep this bit unchanged also.
>>
>> And, because write operation will not change this bit by "pmu_mask", 
>> do we still need this if statement?
>>
>>          /* RO bits */
>>          if (!msr_info->host_initiated &&
>>              ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>              return 1;
>>
>> "(old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL" means some 
>> operation tries to change this bit,
>> so we cannot allow it.
>> But, if there is no this judgement, "pmu_mask" will still make this 
>> bit[12] no change.
>>
>> The only difference is that we can not change other bit (except bit 
>> 12 and bit 7) once "old_val[12] != data[12]" if there exists this 
>> statement
>> and we can change other bit if there is no judgement.
>>
>> For both MSR_IA32_MISC_ENABLE_EMON and MSR_IA32_MISC_ENABLE_EMON are 
>> read-only, maybe we can keep
>> their behavioral consistency. Either both judge, or neither.
>
> One more difference per Intel SDM, I assume:
>
> For Bit 7, Performance Monitoring Available (R)
>     (R)  means that attempts to change this bit will be silent;
> For Bit 12, Processor Event Based Sampling (PEBS) Unavailable (RO),
>     (RO) means that attempts to change this bit will be #GP;
>

Yes, I found it in SDM. You're right.  Thanks for your explanation!

>>
>> Do you think so?
>>
>>
>>> I'll send the fix after sync with Lingshan.
>>>
>>>>
>>>>>         /* RO bits */
>>>>>         if (!msr_info->host_initiated &&
>>>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>>>>             return 1;
>>>>>
>>>>>         /*
>>>>>          * For a dummy user space, the order of setting vPMU 
>>>>> capabilities and
>>>>>          * initialising MSR_IA32_MISC_ENABLE is not strictly 
>>>>> guaranteed, so to
>>>>>          * avoid inconsistent functionality we keep the vPMU bits 
>>>>> unchanged here.
>>>>>          */
>>>> Yes. It's a little clearer with comments.
>>>
>>> Thanks for your feedback! Enjoy the feature.
>>>
>>>>>         data &= ~pmu_mask;
>>>>>         data |= old_val & pmu_mask;
>>>>>         if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>             ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>             if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>                 return 1;
>>>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>>>             kvm_update_cpuid_runtime(vcpu);
>>>>>         } else {
>>>>>             vcpu->arch.ia32_misc_enable_msr = data;
>>>>>         }
>>>>>         break;
>>>>>     }
>>>>>
>>>>>> Or is there anything in your design intention I don't understand?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Xiangdong Liu
>>>>>>
>>>>>>
>>>>>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>>>>>> From: Like Xu <like.xu@linux.intel.com>
>>>>>>>
>>>>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] 
>>>>>>> bit to
>>>>>>> detect whether the processor supports performance monitoring 
>>>>>>> facility.
>>>>>>>
>>>>>>> It depends on the PMU is enabled for the guest, and a software 
>>>>>>> write
>>>>>>> operation to this available bit will be ignored. The proposal to 
>>>>>>> ignore
>>>>>>> the toggle in KVM is the way to go and that behavior matches 
>>>>>>> bare metal.
>>>>>>>
>>>>>>> Cc: Yao Yuan <yuan.yao@intel.com>
>>>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>>>> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>>>>> ---
>>>>>>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>>>>>   arch/x86/kvm/x86.c           | 1 +
>>>>>>>   2 files changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c 
>>>>>>> b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct 
>>>>>>> kvm_vcpu *vcpu)
>>>>>>>       if (!pmu->version)
>>>>>>>           return;
>>>>>>> +    vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>>>>>       perf_get_x86_pmu_capability(&x86_pmu);
>>>>>>>       pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>> index efd11702465c..f6b6984e26ef 100644
>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu 
>>>>>>> *vcpu, struct msr_data *msr_info)
>>>>>>>           }
>>>>>>>           break;
>>>>>>>       case MSR_IA32_MISC_ENABLE:
>>>>>>> +        data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>>>           if (!kvm_check_has_quirk(vcpu->kvm, 
>>>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>>>               ((vcpu->arch.ia32_misc_enable_msr ^ data) & 
>>>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>>>               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>>
>>>>
>>
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9efc1a6b8693..d9dbebe03cae 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -488,6 +488,7 @@  static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	if (!pmu->version)
 		return;
 
+	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
 	perf_get_x86_pmu_capability(&x86_pmu);
 
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efd11702465c..f6b6984e26ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3321,6 +3321,7 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_MISC_ENABLE:
+		data &= ~MSR_IA32_MISC_ENABLE_EMON;
 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
 		    ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
 			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))