kvm: vmx: Limit guest PMCs to those supported on the host

Message ID 20190930233854.158117-1-jmattson@google.com
State New, archived
Series kvm: vmx: Limit guest PMCs to those supported on the host

Commit Message

Jim Mattson Sept. 30, 2019, 11:38 p.m. UTC
KVM can only virtualize as many PMCs as the host supports.

Limit the number of generic counters and fixed counters to the number
of corresponding counters supported on the host, rather than to
INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.

Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
contiguous MSR indices reserved by Intel for event selectors. Since
the existing code relies on a contiguous range of MSR indices for
event selectors, it can't possibly work for more than 18 general
purpose counters.

Fixes: f5132b01386b5a ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
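
For reference, a minimal sketch of the contiguous-index scheme the
commit message refers to. The MSR constants below match the kernel's
msr-index.h definitions; the helper functions themselves are
illustrative only, not KVM code.

/*
 * Intel reserves MSRs 0x186..0x197 for architectural event selectors:
 * 0x197 - 0x186 + 1 = 18 contiguous indices.
 */
#define MSR_P6_EVNTSEL0       0x00000186   /* IA32_PERFEVTSEL0 */
#define MSR_P6_EVNTSEL_LAST   0x00000197   /* last reserved selector */

/* KVM-style mapping from a GP counter index to its event selector. */
static inline u32 gp_eventsel_msr(unsigned int idx)
{
	return MSR_P6_EVNTSEL0 + idx;
}

/*
 * Any index >= 18 falls outside the reserved range, which is why a
 * limit of INTEL_PMC_MAX_GENERIC (32) cannot work with this scheme.
 */
static inline bool gp_index_is_addressable(unsigned int idx)
{
	return gp_eventsel_msr(idx) <= MSR_P6_EVNTSEL_LAST;
}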

Comments

Vitaly Kuznetsov Oct. 1, 2019, 11:32 a.m. UTC | #1
Jim Mattson <jmattson@google.com> writes:

> KVM can only virtualize as many PMCs as the host supports.
>
> Limit the number of generic counters and fixed counters to the number
> of corresponding counters supported on the host, rather than to
> INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
>
> Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
> contiguous MSR indices reserved by Intel for event selectors. Since
> the existing code relies on a contiguous range of MSR indices for
> event selectors, it can't possibly work for more than 18 general
> purpose counters.

Should we also trim msrs_to_save[] by removing impossible entries
(18-31) then?

>
> Fixes: f5132b01386b5a ("KVM: Expose a version 2 architectural PMU to a guests")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Marc Orr <marcorr@google.com>
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 4dea0e0e7e392..3e9c059099e94 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -262,6 +262,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +	struct x86_pmu_capability x86_pmu;
>  	struct kvm_cpuid_entry2 *entry;
>  	union cpuid10_eax eax;
>  	union cpuid10_edx edx;
> @@ -283,8 +284,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  	if (!pmu->version)
>  		return;
>  
> +	perf_get_x86_pmu_capability(&x86_pmu);
> +
>  	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> -					INTEL_PMC_MAX_GENERIC);
> +					 x86_pmu.num_counters_gp);

This is a theoretical fix which is orthogonal to the issue with
state_test I reported on Friday, right? Because in my case
'eax.split.num_counters' is already 8.

>  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
>  	pmu->available_event_types = ~entry->ebx &
>  					((1ull << eax.split.mask_length) - 1);
> @@ -294,7 +297,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  	} else {
>  		pmu->nr_arch_fixed_counters =
>  			min_t(int, edx.split.num_counters_fixed,
> -				INTEL_PMC_MAX_FIXED);
> +			      x86_pmu.num_counters_fixed);
>  		pmu->counter_bitmask[KVM_PMC_FIXED] =
>  			((u64)1 << edx.split.bit_width_fixed) - 1;
>  	}
Paolo Bonzini Oct. 1, 2019, 1:28 p.m. UTC | #2
On 01/10/19 13:32, Vitaly Kuznetsov wrote:
> Jim Mattson <jmattson@google.com> writes:
> 
>> KVM can only virtualize as many PMCs as the host supports.
>>
>> Limit the number of generic counters and fixed counters to the number
>> of corresponding counters supported on the host, rather than to
>> INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
>>
>> Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
>> contiguous MSR indices reserved by Intel for event selectors. Since
>> the existing code relies on a contiguous range of MSR indices for
>> event selectors, it can't possibly work for more than 18 general
>> purpose counters.
> 
> Should we also trim msrs_to_save[] by removing impossible entries
> (18-31) then?

Yes, I'll send a patch in a second.

Paolo
Jim Mattson Oct. 1, 2019, 2:07 p.m. UTC | #3
On Tue, Oct 1, 2019 at 6:29 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 01/10/19 13:32, Vitaly Kuznetsov wrote:
> > Jim Mattson <jmattson@google.com> writes:
> >
> >> KVM can only virtualize as many PMCs as the host supports.
> >>
> >> Limit the number of generic counters and fixed counters to the number
> >> of corresponding counters supported on the host, rather than to
> >> INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
> >>
> >> Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
> >> contiguous MSR indices reserved by Intel for event selectors. Since
> >> the existing code relies on a contiguous range of MSR indices for
> >> event selectors, it can't possibly work for more than 18 general
> >> purpose counters.
> >
> > Should we also trim msrs_to_save[] by removing impossible entries
> > (18-31) then?
>
> Yes, I'll send a patch in a second.
>
> Paolo

I thought you were going to revert that msrs_to_save patch. I've been
working on a replacement.
Paolo Bonzini Oct. 1, 2019, 2:24 p.m. UTC | #4
On 01/10/19 16:07, Jim Mattson wrote:
> On Tue, Oct 1, 2019 at 6:29 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 01/10/19 13:32, Vitaly Kuznetsov wrote:
>>> Jim Mattson <jmattson@google.com> writes:
>>>
>>>> KVM can only virtualize as many PMCs as the host supports.
>>>>
>>>> Limit the number of generic counters and fixed counters to the number
>>>> of corresponding counters supported on the host, rather than to
>>>> INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
>>>>
>>>> Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
>>>> contiguous MSR indices reserved by Intel for event selectors. Since
>>>> the existing code relies on a contiguous range of MSR indices for
>>>> event selectors, it can't possibly work for more than 18 general
>>>> purpose counters.
>>>
>>> Should we also trim msrs_to_save[] by removing impossible entries
>>> (18-31) then?
>>
>> Yes, I'll send a patch in a second.
> 
> I thought you were going to revert that msrs_to_save patch. I've been
> working on a replacement.

We can take a little more time to think about it and discuss it.

For example, trimming is enough for the basic usage of passing
KVM_GET_SUPPORTED_CPUID output to KVM_SET_CPUID2 and then retrieving all
MSRs in the list.  If that is also okay for Google's userspace, we might
actually leave everything that way and retroactively decide that you
need to filter the MSRs, but only if you pass your own CPUID.

Paolo
Jim Mattson Oct. 1, 2019, 2:30 p.m. UTC | #5
On Tue, Oct 1, 2019 at 7:24 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 01/10/19 16:07, Jim Mattson wrote:
> > On Tue, Oct 1, 2019 at 6:29 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 01/10/19 13:32, Vitaly Kuznetsov wrote:
> >>> Jim Mattson <jmattson@google.com> writes:
> >>>
> >>>> KVM can only virtualize as many PMCs as the host supports.
> >>>>
> >>>> Limit the number of generic counters and fixed counters to the number
> >>>> of corresponding counters supported on the host, rather than to
> >>>> INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively.
> >>>>
> >>>> Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18
> >>>> contiguous MSR indices reserved by Intel for event selectors. Since
> >>>> the existing code relies on a contiguous range of MSR indices for
> >>>> event selectors, it can't possibly work for more than 18 general
> >>>> purpose counters.
> >>>
> >>> Should we also trim msrs_to_save[] by removing impossible entries
> >>> (18-31) then?
> >>
> >> Yes, I'll send a patch in a second.
> >
> > I thought you were going to revert that msrs_to_save patch. I've been
> > working on a replacement.
>
> We can take a little more time to think about it and discuss it.
>
> For example, trimming is enough for the basic usage of passing
> KVM_GET_SUPPORTED_CPUID output to KVM_SET_CPUID2 and then retrieving all
> MSRs in the list.  If that is also okay for Google's userspace, we might
> actually leave everything that way and retroactively decide that you
> need to filter the MSRs, but only if you pass your own CPUID.
>
> Paolo

If just trimming the static list, remember to trim to even fewer than
18, since Intel has used one of the reserved MSRs following the event
selectors for something else. I was going to follow Sean's suggestion
and specifically enumerate all of the PMU MSRs based on CPUID 0AH.
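
What CPUID-0AH-based enumeration might look like, as a sketch: cpuid(),
the cpuid10 unions and the MSR constants are real kernel symbols, while
enumerate_pmu_msrs() and its add_msr callback are hypothetical names
for whatever mechanism appends to the exported list.

static void enumerate_pmu_msrs(void (*add_msr)(u32 msr))
{
	union cpuid10_eax eax;
	union cpuid10_edx edx;
	unsigned int ebx, ecx, i;

	cpuid(0xa, &eax.full, &ebx, &ecx, &edx.full);
	if (!eax.split.version_id)
		return;

	/* One counter and one event selector per GP counter. */
	for (i = 0; i < eax.split.num_counters; i++) {
		add_msr(MSR_ARCH_PERFMON_PERFCTR0 + i);
		add_msr(MSR_ARCH_PERFMON_EVENTSEL0 + i);
	}

	/* Fixed counters share a single control MSR. */
	for (i = 0; i < edx.split.num_counters_fixed; i++)
		add_msr(MSR_CORE_PERF_FIXED_CTR0 + i);
	if (edx.split.num_counters_fixed)
		add_msr(MSR_CORE_PERF_FIXED_CTR_CTRL);
}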
Patch

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 4dea0e0e7e392..3e9c059099e94 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -262,6 +262,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct x86_pmu_capability x86_pmu;
 	struct kvm_cpuid_entry2 *entry;
 	union cpuid10_eax eax;
 	union cpuid10_edx edx;
@@ -283,8 +284,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	if (!pmu->version)
 		return;
 
+	perf_get_x86_pmu_capability(&x86_pmu);
+
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
-					INTEL_PMC_MAX_GENERIC);
+					 x86_pmu.num_counters_gp);
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
 	pmu->available_event_types = ~entry->ebx &
 					((1ull << eax.split.mask_length) - 1);
@@ -294,7 +297,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	} else {
 		pmu->nr_arch_fixed_counters =
 			min_t(int, edx.split.num_counters_fixed,
-				INTEL_PMC_MAX_FIXED);
+			      x86_pmu.num_counters_fixed);
 		pmu->counter_bitmask[KVM_PMC_FIXED] =
 			((u64)1 << edx.split.bit_width_fixed) - 1;
 	}
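
For completeness, the host-side values that now bound the guest can be
inspected through the same interface the patch uses. This dump helper
is a sketch: struct x86_pmu_capability and perf_get_x86_pmu_capability()
are the real kernel API, the function wrapped around them is not.

/* Sketch: print the host capabilities that now cap the guest PMU. */
static void dump_host_pmu_caps(void)
{
	struct x86_pmu_capability cap;

	perf_get_x86_pmu_capability(&cap);
	pr_info("host PMU v%d: %d GP counters (%d-bit), %d fixed (%d-bit)\n",
		cap.version, cap.num_counters_gp, cap.bit_width_gp,
		cap.num_counters_fixed, cap.bit_width_fixed);
}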