[1/3] Provide VM capability to disable PMU virtualization for individual VMs

Message ID 20220119182818.3641304-1-daviddunn@google.com (mailing list archive)
State New, archived

Commit Message

David Dunn Jan. 19, 2022, 6:28 p.m. UTC
When PMU virtualization is enabled via the module parameter, usermode
can disable PMU virtualization on individual VMs using this new
capability.

This provides a uniform way to disable PMU virtualization on x86.  Since
AMD doesn't have a CPUID bit for PMU support, disabling PMU
virtualization requires some other state to indicate whether the PMU
related MSRs are ignored.

Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
based on module parameters, usermode will need to adjust CPUID when
disabling PMU virtualization on individual VMs.  On Intel CPUs, the
change to PMU enablement will not alter existing vCPU state until
SET_CPUID2 is invoked.

Signed-off-by: David Dunn <daviddunn@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  2 +-
 arch/x86/kvm/x86.c              | 11 +++++++++++
 include/uapi/linux/kvm.h        |  1 +
 tools/include/uapi/linux/kvm.h  |  1 +
 6 files changed, 16 insertions(+), 2 deletions(-)
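
The CPUID adjustment the commit message asks of usermode could look roughly
like the sketch below. It is only an illustration: struct cpuid_entry is a
local stand-in for the uapi CPUID entry type, and hide_arch_pmu() is a
hypothetical helper name, not something this series adds. It zeroes leaf 0xA
(the leaf intel_pmu_refresh() looks up) so the guest sees architectural PMU
version 0 before the table is passed to SET_CPUID2.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a CPUID table entry (assumption, not the uapi struct). */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

#define CPUID_LEAF_ARCH_PERFMON 0xa

/*
 * Zero every leaf 0xA entry so the reported PMU version (EAX[7:0]) is 0.
 * Returns the number of entries that were cleared.
 */
static size_t hide_arch_pmu(struct cpuid_entry *entries, size_t n)
{
	size_t i, changed = 0;

	for (i = 0; i < n; i++) {
		if (entries[i].function != CPUID_LEAF_ARCH_PERFMON)
			continue;
		entries[i].eax = 0;
		entries[i].ebx = 0;
		entries[i].ecx = 0;
		entries[i].edx = 0;
		changed++;
	}
	return changed;
}
```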

Comments

Sean Christopherson Jan. 20, 2022, 1:15 a.m. UTC | #1
On Wed, Jan 19, 2022, David Dunn wrote:
> When PMU virtualization is enabled via the module parameter, usermode
> can disable PMU virtualization on individual VMs using this new
> capability.
> 
> This provides a uniform way to disable PMU virtualization on x86.  Since
> AMD doesn't have a CPUID bit for PMU support, disabling PMU
> virtualization requires some other state to indicate whether the PMU
> related MSRs are ignored.
> 
> Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
> based on module parameters, usermode will need to adjust CPUID when
> disabling PMU virtualization on individual VMs.  On Intel CPUs, the
> change to PMU enablement will not alter existing until SET_CPUID2 is
> invoked.
> 
> Signed-off-by: David Dunn <daviddunn@google.com>
> ---

I'm not necessarily opposed to this capability, but can't userspace get the same
result by using MSR filtering to inject #GP on the PMU MSRs?
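
As a rough model of that alternative (not the KVM uapi layout), userspace
would maintain a list of PMU MSR ranges whose accesses are denied. The struct,
the function name, and the range lengths below are assumptions made for
illustration; only the base MSR indices are the commonly documented ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deny range; a real filter would use the KVM uapi structs. */
struct msr_deny_range {
	uint32_t base;
	uint32_t nmsrs;
};

/* Illustrative PMU ranges; the counts are assumptions, not a complete list. */
static const struct msr_deny_range pmu_deny[] = {
	{ 0x000000c1, 8 },	/* IA32_PMC0 and following counters */
	{ 0x00000186, 8 },	/* IA32_PERFEVTSEL0 and following selectors */
	{ 0xc0010000, 8 },	/* AMD K7 EVNTSEL0-3 and PERFCTR0-3 */
};

/* True if a guest access to @msr falls inside a denied range. */
static bool msr_access_denied(uint32_t msr)
{
	size_t i;

	for (i = 0; i < sizeof(pmu_deny) / sizeof(pmu_deny[0]); i++) {
		/* Unsigned wrap makes msr < base compare as out of range. */
		if (msr - pmu_deny[i].base < pmu_deny[i].nmsrs)
			return true;
	}
	return false;
}
```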

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 55518b7d3b96..9b640c5bb4f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4326,6 +4326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (r < sizeof(struct kvm_xsave))
>  			r = sizeof(struct kvm_xsave);
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = enable_pmu;
> +		break;
>  	}
>  	default:
>  		break;
> @@ -5937,6 +5940,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.exit_on_emulation_error = cap->args[0];
>  		r = 0;
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = -EINVAL;
> +		if (!enable_pmu || cap->args[0] & ~1)

Probably worth adding a #define in uapi/.../kvm.h for bit 0.
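
A minimal sketch of what a named bit might look like, written as a
self-contained model of the handler rather than kernel code. The macro name
PMU_CAP_ENABLE_BIT, struct vm_model, and enable_pmu_cap() are all hypothetical;
the validation mirrors the check in the hunk above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 0; the patch under review uses a bare 1. */
#define PMU_CAP_ENABLE_BIT (UINT64_C(1) << 0)

/* Stand-in for the per-VM state (kvm->arch.enable_pmu in the patch). */
struct vm_model {
	bool enable_pmu;
};

/*
 * Reject the request if the module-wide PMU is off or if any bit other
 * than the named one is set; otherwise record the per-VM setting.
 */
static int enable_pmu_cap(struct vm_model *vm, bool module_enable_pmu,
			  uint64_t arg)
{
	if (!module_enable_pmu || (arg & ~PMU_CAP_ENABLE_BIT))
		return -EINVAL;

	vm->enable_pmu = (arg & PMU_CAP_ENABLE_BIT) != 0;
	return 0;
}
```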

> +			break;
> +		kvm->arch.enable_pmu = cap->args[0];
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -11562,6 +11572,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>  
>  	kvm->arch.guest_can_read_msr_platform_info = true;
> +	kvm->arch.enable_pmu = true;

Rather than default to "true", just capture the global "enable_pmu" and then all
the sites that check "enable_pmu" in VM context can check _only_ kvm->arch.enable_pmu.
enable_pmu is readonly, so there's no danger of it being toggled after the VM is
created.
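
In other words, something along these lines, shown as a standalone model with
hypothetical names (struct vm_model, init_vm(), vm_pmu_enabled()). The module
parameter is passed in explicitly so the sketch compiles outside the kernel.

```c
#include <stdbool.h>

/* Stand-in for the per-VM state (kvm->arch.enable_pmu in the patch). */
struct vm_model {
	bool enable_pmu;
};

/* At VM creation, snapshot the read-only module parameter. */
static void init_vm(struct vm_model *vm, bool module_enable_pmu)
{
	vm->enable_pmu = module_enable_pmu;
}

/* VM-context call sites then consult only the per-VM copy. */
static bool vm_pmu_enabled(const struct vm_model *vm)
{
	return vm->enable_pmu;
}
```

With this shape, a check such as "!enable_pmu || !vcpu->kvm->arch.enable_pmu"
would reduce to the per-VM test alone.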

>  #if IS_ENABLED(CONFIG_HYPERV)
>  	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9563d294f181..37cbcdffe773 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_VM_GPA_BITS 207
>  #define KVM_CAP_XSAVE2 208
> +#define KVM_CAP_ENABLE_PMU 209
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index f066637ee206..e71712c71ab1 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_MTE 205
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_XSAVE2 207
> +#define KVM_CAP_ENABLE_PMU 209
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.34.1.703.g22d0c6ccf7-goog
>
David Dunn Jan. 20, 2022, 3 a.m. UTC | #2
Thanks Sean.

On Wed, Jan 19, 2022 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:

> I'm not necessarily opposed to this capability, but can't userspace get the same
> result by using MSR filtering to inject #GP on the PMU MSRs?

Yes.  It is possible for each userspace to inject #GP on Intel and ignore on
AMD.  But I think it is less error prone to handle it once in KVM in the same
way we handle the module parameter.  No extra complexity in KVM but it reduces
the complexity in clients.

> Probably worth adding a #define in uapi/.../kvm.h for bit 0.

> Rather than default to "true", just capture the global "enable_pmu" and then all
> the sites that check "enable_pmu" in VM context can check _only_ kvm->arch.enable_pmu.
> enable_pmu is readonly, so there's no danger of it being toggled after the VM is
> created.

Thanks for the feedback.  I'll incorporate both of these in v2.

Dave Dunn
Like Xu Jan. 20, 2022, 3:02 a.m. UTC | #3
Hi David,

Thanks for coming to address this.

Please modify the patch subject(s) to follow the convention.

On 20/1/2022 2:28 am, David Dunn wrote:
> When PMU virtualization is enabled via the module parameter, usermode
> can disable PMU virtualization on individual VMs using this new
> capability.

Will userspace fail or be notified when enable_pmu says no?

> 
> This provides a uniform way to disable PMU virtualization on x86.  Since
> AMD doesn't have a CPUID bit for PMU support, disabling PMU

Not entirely absent; AMD does have some, such as PERFCTR_CORE.

> virtualization requires some other state to indicate whether the PMU
> related MSRs are ignored.

Not just ignored, but made to disappear altogether.

> 
> Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
> based on module parameters, usermode will need to adjust CPUID when
> disabling PMU virtualization on individual VMs.  On Intel CPUs, the
> change to PMU enablement will not alter existing until SET_CPUID2 is
> invoked.

Please clarify. Do we have a requirement for the order in which the
SET_CPUID2 and ioctl_enable_cap interfaces are called?
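
The ordering dependence can be modelled as below. This is a simplified sketch
with hypothetical names, and it assumes the refresh path resets the vCPU's PMU
version to 0 before it checks the per-VM flag; it is not the actual
intel_pmu_refresh() code.

```c
#include <stdbool.h>

/* Stand-in for the per-VM state (kvm->arch.enable_pmu in the patch). */
struct vm_model {
	bool enable_pmu;
};

/* Stand-in for the per-vCPU PMU state derived from CPUID. */
struct vcpu_model {
	const struct vm_model *vm;
	unsigned int pmu_version;
};

/*
 * Models the PMU refresh that runs when userspace sets CPUID: the
 * per-VM flag is sampled here and nowhere else in this sketch.
 */
static void set_cpuid(struct vcpu_model *vcpu, unsigned int cpuid_pmu_version)
{
	vcpu->pmu_version = 0;
	if (!vcpu->vm->enable_pmu)
		return;
	vcpu->pmu_version = cpuid_pmu_version;
}
```

In this model, clearing the VM flag after set_cpuid() leaves pmu_version
unchanged until set_cpuid() runs again, which is the behaviour the commit
message describes for Intel.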

> 
> Signed-off-by: David Dunn <daviddunn@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  1 +
>   arch/x86/kvm/svm/pmu.c          |  2 +-
>   arch/x86/kvm/vmx/pmu_intel.c    |  2 +-
>   arch/x86/kvm/x86.c              | 11 +++++++++++
>   include/uapi/linux/kvm.h        |  1 +
>   tools/include/uapi/linux/kvm.h  |  1 +
>   6 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 682ad02a4e58..5cdcd4a7671b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1232,6 +1232,7 @@ struct kvm_arch {
>   	hpa_t	hv_root_tdp;
>   	spinlock_t hv_root_tdp_lock;
>   #endif
> +	bool enable_pmu;

The name makes it difficult to distinguish the scope of access to the variable.
Try storing it via "pmu->version == 0".

>   };
>   
>   struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 5aa45f13b16d..605bcfb55625 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -101,7 +101,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>   {
>   	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
>   
> -	if (!enable_pmu)
> +	if (!enable_pmu || !vcpu->kvm->arch.enable_pmu)
>   		return NULL;
>   
>   	switch (msr) {
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 466d18fc0c5d..4c3885765027 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -487,7 +487,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>   	pmu->reserved_bits = 0xffffffff00200000ull;
>   
>   	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
> -	if (!entry || !enable_pmu)
> +	if (!entry || !vcpu->kvm->arch.enable_pmu || !enable_pmu)
>   		return;
>   	eax.full = entry->eax;
>   	edx.full = entry->edx;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 55518b7d3b96..9b640c5bb4f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4326,6 +4326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   		if (r < sizeof(struct kvm_xsave))
>   			r = sizeof(struct kvm_xsave);
>   		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = enable_pmu;
> +		break;
>   	}
>   	default:
>   		break;
> @@ -5937,6 +5940,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		kvm->arch.exit_on_emulation_error = cap->args[0];
>   		r = 0;
>   		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = -EINVAL;
> +		if (!enable_pmu || cap->args[0] & ~1)
> +			break;
> +		kvm->arch.enable_pmu = cap->args[0];
> +		r = 0;
> +		break;
>   	default:
>   		r = -EINVAL;
>   		break;
> @@ -11562,6 +11572,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>   
>   	kvm->arch.guest_can_read_msr_platform_info = true;
> +	kvm->arch.enable_pmu = true;
>   
>   #if IS_ENABLED(CONFIG_HYPERV)
>   	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9563d294f181..37cbcdffe773 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>   #define KVM_CAP_VM_GPA_BITS 207
>   #define KVM_CAP_XSAVE2 208
> +#define KVM_CAP_ENABLE_PMU 209

Rename it to KVM_CAP_PMU_CAPABILITY and use bit 0 for *DISABLE_PMU*.

>   
>   #ifdef KVM_CAP_IRQ_ROUTING
>   
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index f066637ee206..e71712c71ab1 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_ARM_MTE 205
>   #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>   #define KVM_CAP_XSAVE2 207
> +#define KVM_CAP_ENABLE_PMU 209
>   
>   #ifdef KVM_CAP_IRQ_ROUTING
>

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 682ad02a4e58..5cdcd4a7671b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1232,6 +1232,7 @@  struct kvm_arch {
 	hpa_t	hv_root_tdp;
 	spinlock_t hv_root_tdp_lock;
 #endif
+	bool enable_pmu;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 5aa45f13b16d..605bcfb55625 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -101,7 +101,7 @@  static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 {
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
 
-	if (!enable_pmu)
+	if (!enable_pmu || !vcpu->kvm->arch.enable_pmu)
 		return NULL;
 
 	switch (msr) {
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 466d18fc0c5d..4c3885765027 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -487,7 +487,7 @@  static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->reserved_bits = 0xffffffff00200000ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
-	if (!entry || !enable_pmu)
+	if (!entry || !vcpu->kvm->arch.enable_pmu || !enable_pmu)
 		return;
 	eax.full = entry->eax;
 	edx.full = entry->edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 55518b7d3b96..9b640c5bb4f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4326,6 +4326,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		if (r < sizeof(struct kvm_xsave))
 			r = sizeof(struct kvm_xsave);
 		break;
+	case KVM_CAP_ENABLE_PMU:
+		r = enable_pmu;
+		break;
 	}
 	default:
 		break;
@@ -5937,6 +5940,13 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exit_on_emulation_error = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_ENABLE_PMU:
+		r = -EINVAL;
+		if (!enable_pmu || cap->args[0] & ~1)
+			break;
+		kvm->arch.enable_pmu = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -11562,6 +11572,7 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	kvm->arch.guest_can_read_msr_platform_info = true;
+	kvm->arch.enable_pmu = true;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9563d294f181..37cbcdffe773 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1133,6 +1133,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
+#define KVM_CAP_ENABLE_PMU 209
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index f066637ee206..e71712c71ab1 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1132,6 +1132,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_XSAVE2 207
+#define KVM_CAP_ENABLE_PMU 209
 
 #ifdef KVM_CAP_IRQ_ROUTING