[v2,4/7] x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID

Message ID 20181210172159.410-5-vkuznets@redhat.com (mailing list archive)
State New, archived
Series x86/kvm/hyper-v: Implement KVM_GET_SUPPORTED_HV_CPUID

Commit Message

Vitaly Kuznetsov Dec. 10, 2018, 5:21 p.m. UTC
With every new Hyper-V enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works, it is fairly
inconvenient: the majority of the enlightenments we implement have
corresponding CPUID feature bit(s), and userspace has to know these anyway
to be able to expose the feature to the guest.

Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.

Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we intend to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1:
- Document KVM_GET_SUPPORTED_HV_CPUID and KVM_CAP_HYPERV_CPUID.
- Fix error handling in kvm_vcpu_ioctl_get_hv_cpuid()
---
 Documentation/virtual/kvm/api.txt |  57 ++++++++++++++
 arch/x86/kvm/hyperv.c             | 121 ++++++++++++++++++++++++++++++
 arch/x86/kvm/hyperv.h             |   2 +
 arch/x86/kvm/x86.c                |  20 +++++
 include/uapi/linux/kvm.h          |   4 +
 5 files changed, 204 insertions(+)
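
As an illustration (a sketch, not part of the patch: it assumes uapi headers
with this series applied, elides error checking on the file descriptors, and
follows the E2BIG convention from the documentation hunk below), the intended
calling convention from userspace looks like this:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        struct kvm_cpuid2 *cpuid;
        int nent = 1;   /* deliberately too small, to demonstrate E2BIG */

        /*
         * A real user would check KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_CPUID)
         * on the kvm fd first.
         */
        for (;;) {
                cpuid = calloc(1, sizeof(*cpuid) +
                                  nent * sizeof(struct kvm_cpuid_entry2));
                cpuid->nent = nent;
                if (ioctl(vcpu, KVM_GET_SUPPORTED_HV_CPUID, cpuid) == 0)
                        break;          /* cpuid->nent now holds the real count */
                if (errno != E2BIG)
                        return 1;       /* unexpected failure */
                free(cpuid);            /* buffer too small: grow and retry */
                nent *= 2;
        }

        for (unsigned int i = 0; i < cpuid->nent; i++)
                printf("0x%08x: eax=%08x ebx=%08x ecx=%08x edx=%08x\n",
                       cpuid->entries[i].function, cpuid->entries[i].eax,
                       cpuid->entries[i].ebx, cpuid->entries[i].ecx,
                       cpuid->entries[i].edx);
        return 0;
}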

Comments

Roman Kagan Dec. 11, 2018, 12:50 p.m. UTC | #1
On Mon, Dec 10, 2018 at 06:21:56PM +0100, Vitaly Kuznetsov wrote:
> With every new Hyper-V enlightenment we implement we're forced to add a
> KVM_CAP_HYPERV_* capability. While this approach works, it is fairly
> inconvenient: the majority of the enlightenments we implement have
> corresponding CPUID feature bit(s), and userspace has to know these anyway
> to be able to expose the feature to the guest.
> 
> Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
> cap to rule them all!") returning all Hyper-V CPUID feature leaves.
> 
> Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
> Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
> 0x40000001) and we would probably confuse userspace in case we decide to
> return these twice.
> 
> KVM_CAP_HYPERV_CPUID's number is interim: we intend to drop
> KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v1:
> - Document KVM_GET_SUPPORTED_HV_CPUID and KVM_CAP_HYPERV_CPUID.
> - Fix error handling in kvm_vcpu_ioctl_get_hv_cpuid()
> ---
>  Documentation/virtual/kvm/api.txt |  57 ++++++++++++++
>  arch/x86/kvm/hyperv.c             | 121 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/hyperv.h             |   2 +
>  arch/x86/kvm/x86.c                |  20 +++++
>  include/uapi/linux/kvm.h          |   4 +
>  5 files changed, 204 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index cd209f7730af..5b72de0af24d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3753,6 +3753,63 @@ Coalesced pio is based on coalesced mmio. There is little difference
>  between coalesced mmio and pio except that coalesced pio records accesses
>  to I/O ports.
>  
> +4.117 KVM_GET_SUPPORTED_HV_CPUID
> +
> +Capability: KVM_CAP_HYPERV_CPUID
> +Architectures: x86
> +Type: vcpu ioctl
> +Parameters: struct kvm_cpuid2 (in/out)
> +Returns: 0 on success, -1 on error
> +
> +struct kvm_cpuid2 {
> +	__u32 nent;
> +	__u32 padding;
> +	struct kvm_cpuid_entry2 entries[0];
> +};
> +
> +struct kvm_cpuid_entry2 {
> +	__u32 function;
> +	__u32 index;
> +	__u32 flags;
> +	__u32 eax;
> +	__u32 ebx;
> +	__u32 ecx;
> +	__u32 edx;
> +	__u32 padding[3];
> +};
> +
> +This ioctl returns x86 CPUID feature leaves related to Hyper-V emulation in
> +KVM.  Userspace can use the information returned by this ioctl to construct
> +CPUID information presented to guests consuming Hyper-V enlightenments (e.g.
> +Windows or Hyper-V guests).
> +
> +CPUID feature leaves returned by this ioctl are defined by the Hyper-V Top
> +Level Functional Specification (TLFS). These leaves can't be obtained with
> +the KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM
> +feature leaves (0x40000000, 0x40000001).
> +
> +Currently, the following CPUID leaves are returned:
> + HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
> + HYPERV_CPUID_INTERFACE
> + HYPERV_CPUID_VERSION
> + HYPERV_CPUID_FEATURES
> + HYPERV_CPUID_ENLIGHTMENT_INFO
> + HYPERV_CPUID_IMPLEMENT_LIMITS
> + HYPERV_CPUID_NESTED_FEATURES
> +
> +The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
> +has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).

IOW the output of ioctl(KVM_GET_SUPPORTED_HV_CPUID) depends on
whether ioctl(KVM_ENABLE_CAP, KVM_CAP_HYPERV_ENLIGHTENED_VMCS) has
already been called on that vcpu?  I wonder if this fits the intended
usage?

Roman.

> +
> +Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
> +with the 'nent' field indicating the number of entries in the variable-size
> +array 'entries'.  If the number of entries is too low to describe all Hyper-V
> +feature leaves, an error (E2BIG) is returned. If the number is greater than
> +or equal to the number of Hyper-V feature leaves, the 'nent' field is adjusted
> +to the number of valid entries in the 'entries' array, which is then filled.
> +
> +The 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently
> +reserved; userspace should not expect to get any particular value there.
> +
>  5. The kvm_run structure
>  ------------------------
>  
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index cecf907e4c49..04c3cdf3389e 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1804,3 +1804,124 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
>  		return kvm_hv_eventfd_deassign(kvm, args->conn_id);
>  	return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
>  }
> +
> +int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> +				struct kvm_cpuid_entry2 __user *entries)
> +{
> +	uint16_t evmcs_ver = kvm_x86_ops->nested_get_evmcs_version(vcpu);
> +	struct kvm_cpuid_entry2 cpuid_entries[] = {
> +		{ .function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS },
> +		{ .function = HYPERV_CPUID_INTERFACE },
> +		{ .function = HYPERV_CPUID_VERSION },
> +		{ .function = HYPERV_CPUID_FEATURES },
> +		{ .function = HYPERV_CPUID_ENLIGHTMENT_INFO },
> +		{ .function = HYPERV_CPUID_IMPLEMENT_LIMITS },
> +		{ .function = HYPERV_CPUID_NESTED_FEATURES },
> +	};
> +	int i, nent = ARRAY_SIZE(cpuid_entries);
> +
> +	/* Skip NESTED_FEATURES if eVMCS is not supported */
> +	if (!evmcs_ver)
> +		--nent;
> +
> +	if (cpuid->nent < nent)
> +		return -E2BIG;
> +
> +	if (cpuid->nent > nent)
> +		cpuid->nent = nent;
> +
> +	for (i = 0; i < nent; i++) {
> +		struct kvm_cpuid_entry2 *ent = &cpuid_entries[i];
> +		u32 signature[3];
> +
> +		switch (ent->function) {
> +		case HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS:
> +			memcpy(signature, "Linux KVM Hv", 12);
> +
> +			ent->eax = HYPERV_CPUID_NESTED_FEATURES;
> +			ent->ebx = signature[0];
> +			ent->ecx = signature[1];
> +			ent->edx = signature[2];
> +			break;
> +
> +		case HYPERV_CPUID_INTERFACE:
> +			memcpy(signature, "Hv#1\0\0\0\0\0\0\0\0", 12);
> +			ent->eax = signature[0];
> +			break;
> +
> +		case HYPERV_CPUID_VERSION:
> +			/*
> +			 * We implement some Hyper-V 2016 functions so let's use
> +			 * this version.
> +			 */
> +			ent->eax = 0x00003839;
> +			ent->ebx = 0x000A0000;
> +			break;
> +
> +		case HYPERV_CPUID_FEATURES:
> +			ent->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
> +			ent->eax |= HV_MSR_TIME_REF_COUNT_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
> +			ent->eax |= HV_MSR_SYNTIMER_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_VP_INDEX_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_RESET_AVAILABLE;
> +			ent->eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
> +			ent->eax |= HV_X64_MSR_GUEST_IDLE_AVAILABLE;
> +			ent->eax |= HV_X64_ACCESS_FREQUENCY_MSRS;
> +			ent->eax |= HV_X64_ACCESS_REENLIGHTENMENT;
> +
> +			ent->ebx |= HV_X64_POST_MESSAGES;
> +			ent->ebx |= HV_X64_SIGNAL_EVENTS;
> +
> +			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> +			ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +			ent->edx |= HV_STIMER_DIRECT_MODE_AVAILABLE;
> +
> +			break;
> +
> +		case HYPERV_CPUID_ENLIGHTMENT_INFO:
> +			ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
> +			ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
> +			ent->eax |= HV_X64_SYSTEM_RESET_RECOMMENDED;
> +			ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
> +			ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
> +			ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
> +			ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> +
> +			/*
> +			 * Default number of spinlock retry attempts, matches
> +			 * HyperV 2016.
> +			 */
> +			ent->ebx = 0x00000FFF;
> +
> +			break;
> +
> +		case HYPERV_CPUID_IMPLEMENT_LIMITS:
> +			/* Maximum number of virtual processors */
> +			ent->eax = KVM_MAX_VCPUS;
> +			/*
> +			 * Maximum number of logical processors, matches
> +			 * HyperV 2016.
> +			 */
> +			ent->ebx = 64;
> +
> +			break;
> +
> +		case HYPERV_CPUID_NESTED_FEATURES:
> +			ent->eax = evmcs_ver;
> +
> +			break;
> +
> +		default:
> +			break;
> +		}
> +	}
> +
> +	if (copy_to_user(entries, cpuid_entries,
> +			 nent * sizeof(struct kvm_cpuid_entry2)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 0e66c12ed2c3..25428f5966e6 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -95,5 +95,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
>  void kvm_hv_init_vm(struct kvm *kvm);
>  void kvm_hv_destroy_vm(struct kvm *kvm);
>  int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
> +int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> +				struct kvm_cpuid_entry2 __user *entries);
>  
>  #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b21b5ceb8d26..142776435166 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2998,6 +2998,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_HYPERV_SEND_IPI:
>  	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>  	case KVM_CAP_HYPERV_STIMER_DIRECT:
> +	case KVM_CAP_HYPERV_CPUID:
>  	case KVM_CAP_PCI_SEGMENT:
>  	case KVM_CAP_DEBUGREGS:
>  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
> @@ -4191,6 +4192,25 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		r = kvm_x86_ops->set_nested_state(vcpu, user_kvm_nested_state, &kvm_state);
>  		break;
>  	}
> +	case KVM_GET_SUPPORTED_HV_CPUID: {
> +		struct kvm_cpuid2 __user *cpuid_arg = argp;
> +		struct kvm_cpuid2 cpuid;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
> +			goto out;
> +
> +		r = kvm_vcpu_ioctl_get_hv_cpuid(vcpu, &cpuid,
> +						cpuid_arg->entries);
> +		if (r)
> +			goto out;
> +
> +		r = -EFAULT;
> +		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
> +			goto out;
> +		r = 0;
> +		break;
> +	}
>  	default:
>  		r = -EINVAL;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b8da14cee8e5..9d0038eae002 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -976,6 +976,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_EXCEPTION_PAYLOAD 164
>  #define KVM_CAP_ARM_VM_IPA_SIZE 165
>  #define KVM_CAP_HYPERV_STIMER_DIRECT 166
> +#define KVM_CAP_HYPERV_CPUID 167
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -1422,6 +1423,9 @@ struct kvm_enc_region {
>  #define KVM_GET_NESTED_STATE         _IOWR(KVMIO, 0xbe, struct kvm_nested_state)
>  #define KVM_SET_NESTED_STATE         _IOW(KVMIO,  0xbf, struct kvm_nested_state)
>  
> +/* Available with KVM_CAP_HYPERV_CPUID */
> +#define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc0, struct kvm_cpuid2)
> +
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
>  	/* Guest initialization commands */
> -- 
> 2.19.2
>
Vitaly Kuznetsov Dec. 11, 2018, 1:28 p.m. UTC | #2
Roman Kagan <rkagan@virtuozzo.com> writes:

> On Mon, Dec 10, 2018 at 06:21:56PM +0100, Vitaly Kuznetsov wrote:

>> +
>> +Currently, the following CPUID leaves are returned:
>> + HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
>> + HYPERV_CPUID_INTERFACE
>> + HYPERV_CPUID_VERSION
>> + HYPERV_CPUID_FEATURES
>> + HYPERV_CPUID_ENLIGHTMENT_INFO
>> + HYPERV_CPUID_IMPLEMENT_LIMITS
>> + HYPERV_CPUID_NESTED_FEATURES
>> +
>> +The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
>> +has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
>
> IOW the output of ioctl(KVM_GET_SUPPORTED_HV_CPUID) depends on
> whether ioctl(KVM_ENABLE_CAP, KVM_CAP_HYPERV_ENLIGHTENED_VMCS) has
> already been called on that vcpu?  I wonder if this fits the intended
> usage?

I added HYPERV_CPUID_NESTED_FEATURES to the list (and made the new ioctl
per-vCPU and not per-VM) for consistency. *In theory*
KVM_CAP_HYPERV_ENLIGHTENED_VMCS is also enabled per-vCPU, so some
hypothetical userspace can later check enabled eVMCS versions (which can
differ across vCPUs!) with KVM_GET_SUPPORTED_HV_CPUID. We will also have
direct TLB flush and other nested features there, so to avoid adding new
KVM_CAP_* for them we need the CPUID.

Another thing I'm thinking about is something like an 'hv_all' CPU flag for
QEMU which would enable everything by setting guest CPUIDs to what
KVM_GET_SUPPORTED_HV_CPUID returns. In that case it would also be
convenient to have HYPERV_CPUID_NESTED_FEATURES properly filled (or not
filled when eVMCS was not enabled).
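
To illustrate the ordering dependency (a sketch, assuming the KVM_ENABLE_CAP
convention for this capability, where args[0] points to a uint16_t that
receives the enabled eVMCS version; 'vcpu' and 'cpuid' as in the earlier
sketch):

        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_HYPERV_ENLIGHTENED_VMCS,
        };
        uint16_t evmcs_version;

        cap.args[0] = (__u64)(uintptr_t)&evmcs_version;
        if (ioctl(vcpu, KVM_ENABLE_CAP, &cap) == 0) {
                /*
                 * Only from this point on does KVM_GET_SUPPORTED_HV_CPUID
                 * include HYPERV_CPUID_NESTED_FEATURES, with eax set to
                 * the eVMCS version that was just enabled.
                 */
                ioctl(vcpu, KVM_GET_SUPPORTED_HV_CPUID, cpuid);
        }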
Roman Kagan Dec. 11, 2018, 2:02 p.m. UTC | #3
On Tue, Dec 11, 2018 at 02:28:14PM +0100, Vitaly Kuznetsov wrote:
> Roman Kagan <rkagan@virtuozzo.com> writes:
> 
> > On Mon, Dec 10, 2018 at 06:21:56PM +0100, Vitaly Kuznetsov wrote:
> 
> >> +
> >> +Currently, the following CPUID leaves are returned:
> >> + HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
> >> + HYPERV_CPUID_INTERFACE
> >> + HYPERV_CPUID_VERSION
> >> + HYPERV_CPUID_FEATURES
> >> + HYPERV_CPUID_ENLIGHTMENT_INFO
> >> + HYPERV_CPUID_IMPLEMENT_LIMITS
> >> + HYPERV_CPUID_NESTED_FEATURES
> >> +
> >> +The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
> >> +has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
> >
> > IOW the output of ioctl(KVM_GET_SUPPORTED_HV_CPUID) depends on
> > whether ioctl(KVM_ENABLE_CAP, KVM_CAP_HYPERV_ENLIGHTENED_VMCS) has
> > already been called on that vcpu?  I wonder if this fits the intended
> > usage?
> 
> I added HYPERV_CPUID_NESTED_FEATURES to the list (and made the new ioctl
> per-vCPU and not per-VM) for consistency. *In theory*
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is also enabled per-vCPU, so some
> hypothetical userspace can later check enabled eVMCS versions (which can
> differ across vCPUs!) with KVM_GET_SUPPORTED_HV_CPUID. We will also have
> direct TLB flush and other nested features there, so to avoid adding new
> KVM_CAP_* for them we need the CPUID.

This is different from how KVM_GET_SUPPORTED_CPUID is used: QEMU assumes
that its output doesn't change between calls, and even caches the result,
calling the ioctl only once.

> Another thing I'm thinking about is something like an 'hv_all' CPU flag for
> QEMU which would enable everything by setting guest CPUIDs to what
> KVM_GET_SUPPORTED_HV_CPUID returns. In that case it would also be
> convenient to have HYPERV_CPUID_NESTED_FEATURES properly filled (or not
> filled when eVMCS was not enabled).

I think this is orthogonal to the way you obtain capability info from
the kernel.

Roman.
Vitaly Kuznetsov Dec. 11, 2018, 3:04 p.m. UTC | #4
Roman Kagan <rkagan@virtuozzo.com> writes:

> On Tue, Dec 11, 2018 at 02:28:14PM +0100, Vitaly Kuznetsov wrote:
>> Roman Kagan <rkagan@virtuozzo.com> writes:
>> 
>> > On Mon, Dec 10, 2018 at 06:21:56PM +0100, Vitaly Kuznetsov wrote:
>> 
>> >> +
>> >> +Currently, the following CPUID leaves are returned:
>> >> + HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
>> >> + HYPERV_CPUID_INTERFACE
>> >> + HYPERV_CPUID_VERSION
>> >> + HYPERV_CPUID_FEATURES
>> >> + HYPERV_CPUID_ENLIGHTMENT_INFO
>> >> + HYPERV_CPUID_IMPLEMENT_LIMITS
>> >> + HYPERV_CPUID_NESTED_FEATURES
>> >> +
>> >> +The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
>> >> +has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
>> >
>> > IOW the output of ioctl(KVM_GET_SUPPORTED_HV_CPUID) depends on
>> > whether ioctl(KVM_ENABLE_CAP, KVM_CAP_HYPERV_ENLIGHTENED_VMCS) has
>> > already been called on that vcpu?  I wonder if this fits the intended
>> > usage?
>> 
>> I added HYPERV_CPUID_NESTED_FEATURES to the list (and made the new ioctl
>> per-vCPU and not per-VM) for consistency. *In theory*
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is also enabled per-vCPU, so some
>> hypothetical userspace can later check enabled eVMCS versions (which can
>> differ across vCPUs!) with KVM_GET_SUPPORTED_HV_CPUID. We will also have
>> direct TLB flush and other nested features there, so to avoid adding new
>> KVM_CAP_* for them we need the CPUID.
>
> This is different from how KVM_GET_SUPPORTED_CPUID is used: QEMU assumes
> that its output doesn't change between calls, and even caches the result,
> calling the ioctl only once.
>

Yes, I'm not sure if we have to have full consistency between
KVM_GET_SUPPORTED_CPUID and KVM_GET_SUPPORTED_HV_CPUID.

>> Another thing I'm thinking about is something like an 'hv_all' CPU flag for
>> QEMU which would enable everything by setting guest CPUIDs to what
>> KVM_GET_SUPPORTED_HV_CPUID returns. In that case it would also be
>> convenient to have HYPERV_CPUID_NESTED_FEATURES properly filled (or not
>> filled when eVMCS was not enabled).
>
> I think this is orthogonal to the way you obtain capability info from
> the kernel.

Not necessarily. If very dumb userspace does 'host passthrough' for
Hyper-V features without doing anything (e.g. not enabling Enlightened
VMCS), it will just put the result of KVM_GET_SUPPORTED_HV_CPUID in
guest-facing CPUIDs and it will all work. In case eVMCS was previously
enabled, it again just copies everything and this still works.

We probably don't need this for QEMU though. If you think it would be
better to have HYPERV_CPUID_NESTED_FEATURES returned regardless of eVMCS
enablement, I'm ready to budge :)
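
A sketch of what that 'very dumb' passthrough could look like
(query_hv_cpuid() is a hypothetical helper wrapping the E2BIG retry loop from
the earlier sketch; real userspace would merge these entries with the
architectural CPUID leaves before setting them):

        struct kvm_cpuid2 *hv = query_hv_cpuid(vcpu);

        /* Expose to the guest exactly what KVM reported as supported. */
        if (ioctl(vcpu, KVM_SET_CPUID2, hv) < 0)
                perror("KVM_SET_CPUID2");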
Roman Kagan Dec. 11, 2018, 3:10 p.m. UTC | #5
On Tue, Dec 11, 2018 at 04:04:01PM +0100, Vitaly Kuznetsov wrote:
> Roman Kagan <rkagan@virtuozzo.com> writes:
> 
> > On Tue, Dec 11, 2018 at 02:28:14PM +0100, Vitaly Kuznetsov wrote:
> >> Roman Kagan <rkagan@virtuozzo.com> writes:
> >> 
> >> > On Mon, Dec 10, 2018 at 06:21:56PM +0100, Vitaly Kuznetsov wrote:
> >> 
> >> >> +
> >> >> +Currently, the following CPUID leaves are returned:
> >> >> + HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
> >> >> + HYPERV_CPUID_INTERFACE
> >> >> + HYPERV_CPUID_VERSION
> >> >> + HYPERV_CPUID_FEATURES
> >> >> + HYPERV_CPUID_ENLIGHTMENT_INFO
> >> >> + HYPERV_CPUID_IMPLEMENT_LIMITS
> >> >> + HYPERV_CPUID_NESTED_FEATURES
> >> >> +
> >> >> +The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
> >> >> +has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
> >> >
> >> > IOW the output of ioctl(KVM_GET_SUPPORTED_HV_CPUID) depends on
> >> > whether ioctl(KVM_ENABLE_CAP, KVM_CAP_HYPERV_ENLIGHTENED_VMCS) has
> >> > already been called on that vcpu?  I wonder if this fits the intended
> >> > usage?
> >> 
> >> I added HYPERV_CPUID_NESTED_FEATURES to the list (and made the new ioctl
> >> per-vCPU and not per-VM) for consistency. *In theory*
> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is also enabled per-vCPU, so some
> >> hypothetical userspace can later check enabled eVMCS versions (which can
> >> differ across vCPUs!) with KVM_GET_SUPPORTED_HV_CPUID. We will also have
> >> direct TLB flush and other nested features there, so to avoid adding new
> >> KVM_CAP_* for them we need the CPUID.
> >
> > This is different from how KVM_GET_SUPPORTED_CPUID is used: QEMU assumes
> > that its output doesn't change between calls, and even caches the result,
> > calling the ioctl only once.
> >
> 
> Yes, I'm not sure if we have to have full consistency between
> KVM_GET_SUPPORTED_CPUID and KVM_GET_SUPPORTED_HV_CPUID.

Neither do I.  I just noticed the difference and thought it might
matter.

> >> Another thing I'm thinking about is something like an 'hv_all' CPU flag for
> >> QEMU which would enable everything by setting guest CPUIDs to what
> >> KVM_GET_SUPPORTED_HV_CPUID returns. In that case it would also be
> >> convenient to have HYPERV_CPUID_NESTED_FEATURES properly filled (or not
> >> filled when eVMCS was not enabled).
> >
> > I think this is orthogonal to the way you obtain capability info from
> > the kernel.
> 
> Not necessarily. If very dumb userspace does 'host passthrough' for
> Hyper-V features without doing anything (e.g. not enabling Enlightened
> VMCS), it will just put the result of KVM_GET_SUPPORTED_HV_CPUID in
> guest-facing CPUIDs and it will all work. In case eVMCS was previously
> enabled, it again just copies everything and this still works.
> 
> We probably don't need this for QEMU though. If you think it would be
> better to have HYPERV_CPUID_NESTED_FEATURES returned regardless of eVMCS
> enablement, I'm ready to budge :)

I have no opinion on this.  I hope Paolo, who requested the feature, can
explain the desired semantics :)

Roman.

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index cd209f7730af..5b72de0af24d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3753,6 +3753,63 @@ Coalesced pio is based on coalesced mmio. There is little difference
 between coalesced mmio and pio except that coalesced pio records accesses
 to I/O ports.
 
+4.117 KVM_GET_SUPPORTED_HV_CPUID
+
+Capability: KVM_CAP_HYPERV_CPUID
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_cpuid2 (in/out)
+Returns: 0 on success, -1 on error
+
+struct kvm_cpuid2 {
+	__u32 nent;
+	__u32 padding;
+	struct kvm_cpuid_entry2 entries[0];
+};
+
+struct kvm_cpuid_entry2 {
+	__u32 function;
+	__u32 index;
+	__u32 flags;
+	__u32 eax;
+	__u32 ebx;
+	__u32 ecx;
+	__u32 edx;
+	__u32 padding[3];
+};
+
+This ioctl returns x86 CPUID feature leaves related to Hyper-V emulation in
+KVM.  Userspace can use the information returned by this ioctl to construct
+CPUID information presented to guests consuming Hyper-V enlightenments (e.g.
+Windows or Hyper-V guests).
+
+CPUID feature leaves returned by this ioctl are defined by the Hyper-V Top
+Level Functional Specification (TLFS). These leaves can't be obtained with
+the KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM
+feature leaves (0x40000000, 0x40000001).
+
+Currently, the following CPUID leaves are returned:
+ HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
+ HYPERV_CPUID_INTERFACE
+ HYPERV_CPUID_VERSION
+ HYPERV_CPUID_FEATURES
+ HYPERV_CPUID_ENLIGHTMENT_INFO
+ HYPERV_CPUID_IMPLEMENT_LIMITS
+ HYPERV_CPUID_NESTED_FEATURES
+
+The HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS
+has been enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
+
+Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
+with the 'nent' field indicating the number of entries in the variable-size
+array 'entries'.  If the number of entries is too low to describe all Hyper-V
+feature leaves, an error (E2BIG) is returned. If the number is greater than
+or equal to the number of Hyper-V feature leaves, the 'nent' field is adjusted
+to the number of valid entries in the 'entries' array, which is then filled.
+
+The 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently
+reserved; userspace should not expect to get any particular value there.
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index cecf907e4c49..04c3cdf3389e 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1804,3 +1804,124 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
 		return kvm_hv_eventfd_deassign(kvm, args->conn_id);
 	return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
 }
+
+int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
+				struct kvm_cpuid_entry2 __user *entries)
+{
+	uint16_t evmcs_ver = kvm_x86_ops->nested_get_evmcs_version(vcpu);
+	struct kvm_cpuid_entry2 cpuid_entries[] = {
+		{ .function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS },
+		{ .function = HYPERV_CPUID_INTERFACE },
+		{ .function = HYPERV_CPUID_VERSION },
+		{ .function = HYPERV_CPUID_FEATURES },
+		{ .function = HYPERV_CPUID_ENLIGHTMENT_INFO },
+		{ .function = HYPERV_CPUID_IMPLEMENT_LIMITS },
+		{ .function = HYPERV_CPUID_NESTED_FEATURES },
+	};
+	int i, nent = ARRAY_SIZE(cpuid_entries);
+
+	/* Skip NESTED_FEATURES if eVMCS is not supported */
+	if (!evmcs_ver)
+		--nent;
+
+	if (cpuid->nent < nent)
+		return -E2BIG;
+
+	if (cpuid->nent > nent)
+		cpuid->nent = nent;
+
+	for (i = 0; i < nent; i++) {
+		struct kvm_cpuid_entry2 *ent = &cpuid_entries[i];
+		u32 signature[3];
+
+		switch (ent->function) {
+		case HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS:
+			memcpy(signature, "Linux KVM Hv", 12);
+
+			ent->eax = HYPERV_CPUID_NESTED_FEATURES;
+			ent->ebx = signature[0];
+			ent->ecx = signature[1];
+			ent->edx = signature[2];
+			break;
+
+		case HYPERV_CPUID_INTERFACE:
+			memcpy(signature, "Hv#1\0\0\0\0\0\0\0\0", 12);
+			ent->eax = signature[0];
+			break;
+
+		case HYPERV_CPUID_VERSION:
+			/*
+			 * We implement some Hyper-V 2016 functions so let's use
+			 * this version.
+			 */
+			ent->eax = 0x00003839;
+			ent->ebx = 0x000A0000;
+			break;
+
+		case HYPERV_CPUID_FEATURES:
+			ent->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
+			ent->eax |= HV_MSR_TIME_REF_COUNT_AVAILABLE;
+			ent->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+			ent->eax |= HV_MSR_SYNTIMER_AVAILABLE;
+			ent->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
+			ent->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
+			ent->eax |= HV_X64_MSR_VP_INDEX_AVAILABLE;
+			ent->eax |= HV_X64_MSR_RESET_AVAILABLE;
+			ent->eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
+			ent->eax |= HV_X64_MSR_GUEST_IDLE_AVAILABLE;
+			ent->eax |= HV_X64_ACCESS_FREQUENCY_MSRS;
+			ent->eax |= HV_X64_ACCESS_REENLIGHTENMENT;
+
+			ent->ebx |= HV_X64_POST_MESSAGES;
+			ent->ebx |= HV_X64_SIGNAL_EVENTS;
+
+			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
+			ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+			ent->edx |= HV_STIMER_DIRECT_MODE_AVAILABLE;
+
+			break;
+
+		case HYPERV_CPUID_ENLIGHTMENT_INFO:
+			ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+			ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
+			ent->eax |= HV_X64_SYSTEM_RESET_RECOMMENDED;
+			ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
+			ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+			ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
+			ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
+
+			/*
+			 * Default number of spinlock retry attempts, matches
+			 * HyperV 2016.
+			 */
+			ent->ebx = 0x00000FFF;
+
+			break;
+
+		case HYPERV_CPUID_IMPLEMENT_LIMITS:
+			/* Maximum number of virtual processors */
+			ent->eax = KVM_MAX_VCPUS;
+			/*
+			 * Maximum number of logical processors, matches
+			 * HyperV 2016.
+			 */
+			ent->ebx = 64;
+
+			break;
+
+		case HYPERV_CPUID_NESTED_FEATURES:
+			ent->eax = evmcs_ver;
+
+			break;
+
+		default:
+			break;
+		}
+	}
+
+	if (copy_to_user(entries, cpuid_entries,
+			 nent * sizeof(struct kvm_cpuid_entry2)))
+		return -EFAULT;
+
+	return 0;
+}
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 0e66c12ed2c3..25428f5966e6 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -95,5 +95,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
 int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
+int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
+				struct kvm_cpuid_entry2 __user *entries);
 
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b21b5ceb8d26..142776435166 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2998,6 +2998,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_SEND_IPI:
 	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
 	case KVM_CAP_HYPERV_STIMER_DIRECT:
+	case KVM_CAP_HYPERV_CPUID:
 	case KVM_CAP_PCI_SEGMENT:
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
@@ -4191,6 +4192,25 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = kvm_x86_ops->set_nested_state(vcpu, user_kvm_nested_state, &kvm_state);
 		break;
 	}
+	case KVM_GET_SUPPORTED_HV_CPUID: {
+		struct kvm_cpuid2 __user *cpuid_arg = argp;
+		struct kvm_cpuid2 cpuid;
+
+		r = -EFAULT;
+		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
+			goto out;
+
+		r = kvm_vcpu_ioctl_get_hv_cpuid(vcpu, &cpuid,
+						cpuid_arg->entries);
+		if (r)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+			goto out;
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b8da14cee8e5..9d0038eae002 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -976,6 +976,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXCEPTION_PAYLOAD 164
 #define KVM_CAP_ARM_VM_IPA_SIZE 165
 #define KVM_CAP_HYPERV_STIMER_DIRECT 166
+#define KVM_CAP_HYPERV_CPUID 167
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1422,6 +1423,9 @@ struct kvm_enc_region {
 #define KVM_GET_NESTED_STATE         _IOWR(KVMIO, 0xbe, struct kvm_nested_state)
 #define KVM_SET_NESTED_STATE         _IOW(KVMIO,  0xbf, struct kvm_nested_state)
 
+/* Available with KVM_CAP_HYPERV_CPUID */
+#define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc0, struct kvm_cpuid2)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */